## Sentiment Analysis

Spark NLP can handle sentiment analysis too:

```python
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, LemmatizerModel, ViveknSentimentModel
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, FloatType

# Initialize Spark NLP (starts a SparkSession with Spark NLP available)
spark = sparknlp.start()

# Build the annotators; only the lemmatizer and the sentiment model are pre-trained
document_assembler = DocumentAssembler().setInputCol("article_content").setOutputCol("document")
sentence_detector = SentenceDetector().setInputCols(["document"]).setOutputCol("sentences")
tokenizer = Tokenizer().setInputCols(["sentences"]).setOutputCol("tokens")
lemmatizer = LemmatizerModel.pretrained().setInputCols(["tokens"]).setOutputCol("lemmas")
# SentimentDetector is rule-based and needs a keyword dictionary (setDictionary),
# so this version assumes the pre-trained Vivekn model instead; it takes
# sentences plus tokens (here the lemmas) and emits one annotation per sentence
sentiment_detector = ViveknSentimentModel.pretrained().setInputCols(["sentences", "lemmas"]).setOutputCol("sentiment")

# UDF: score of one sentiment annotation. Assumes the model stores it as a string
# under the "confidence" metadata key; returns None if the key is missing
def get_sentiment_score(annotation):
    confidence = annotation["metadata"].get("confidence")
    return float(confidence) if confidence is not None else None

get_sentiment_score_udf = udf(get_sentiment_score, FloatType())

# UDF: label of one sentiment annotation ("positive" or "negative")
def get_sentiment_category(annotation):
    return annotation["result"]

get_sentiment_category_udf = udf(get_sentiment_category, StringType())

# Define the pipeline (stages run in list order)
pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, lemmatizer, sentiment_detector])

# Fit and apply; input_df is assumed to be an existing DataFrame with an article_content column
output = pipeline.fit(input_df).transform(input_df)

# One row per sentiment annotation (i.e. per sentence), keeping the whole annotation struct
output = output.selectExpr("article_content", "explode(sentiment) as sentiment_info")

# Add columns for sentiment score and category
output = output.withColumn("sentiment_score", get_sentiment_score_udf("sentiment_info"))
output = output.withColumn("sentiment_category", get_sentiment_category_udf("sentiment_info"))

# Show output DataFrame
output.show(truncate=False)
```

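In this example, we start by initializing Spark NLP and building the annotators for document assembly, sentence detection, tokenization, lemmatization, and sentiment detection. Only the lemmatizer and the sentiment model are loaded as pre-trained models; the other stages are rule-based and need no pre-trained weights.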

We then define two UDFs that extract a sentiment score and a sentiment category from each sentiment annotation produced by the sentiment model. These values describe individual sentences, not whole articles.
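After the sentiment column is exploded, each row holds a single Spark NLP annotation: a struct with fields such as `result` (the label) and `metadata` (a map whose values are strings). The sketch below shows the extraction logic on a plain dict standing in for that struct. The sample values are made up, and it assumes the model stores its score under a `confidence` metadata key, returning `None` when that key is absent.

```python
from typing import Any, Mapping, Optional


def sentiment_category(annotation: Mapping[str, Any]) -> str:
    """Return the label stored in the annotation's `result` field."""
    return annotation["result"]


def sentiment_score(annotation: Mapping[str, Any]) -> Optional[float]:
    """Return the confidence from metadata as a float, or None if missing.

    Assumption: the model writes its score under the "confidence" key.
    Metadata values are strings, hence the float() conversion.
    """
    confidence = annotation["metadata"].get("confidence")
    return float(confidence) if confidence is not None else None


# Made-up example standing in for one exploded annotation row
example = {
    "annotatorType": "sentiment",
    "begin": 0,
    "end": 44,
    "result": "positive",
    "metadata": {"sentence": "0", "confidence": "0.75"},
    "embeddings": [],
}

print(sentiment_category(example), sentiment_score(example))  # positive 0.75
```

Inside a PySpark UDF the struct arrives as a `Row`, which also supports `row["field"]` lookups, and the metadata map arrives as a Python dict, so the same functions can be wrapped with `udf(...)`.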

Next, we assemble these stages into a Pipeline, fit it on the input DataFrame, and transform that DataFrame to get an output DataFrame with the sentiment annotations added.
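Each stage reads the columns named in its input columns and writes its output column, and the pipeline runs stages in list order, so a stage only works if earlier stages (or the source DataFrame) already provide its inputs. The plain-Python sketch below checks that wiring. It uses a hypothetical `(name, inputs, output)` tuple per stage rather than Spark NLP objects, with column names taken from the example; the exact inputs of the sentiment stage depend on which sentiment annotator is used.

```python
from typing import List, Sequence, Tuple

# Hypothetical stage description: (stage name, input columns, output column)
Stage = Tuple[str, Sequence[str], str]


def missing_inputs(stages: List[Stage], source_cols: Sequence[str]) -> List[str]:
    """Return a message for every input column not produced before its stage.

    Stages run in list order, so each input column must be in the source
    DataFrame or be the output column of an earlier stage.
    """
    available = set(source_cols)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append(f"{name}: missing input column '{col}'")
        available.add(output)
    return problems


stages: List[Stage] = [
    ("document_assembler", ["article_content"], "document"),
    ("sentence_detector", ["document"], "sentences"),
    ("tokenizer", ["sentences"], "tokens"),
    ("lemmatizer", ["tokens"], "lemmas"),
    ("sentiment_detector", ["sentences", "lemmas"], "sentiment"),
]

print(missing_inputs(stages, ["article_content"]))  # []

# Swapping the tokenizer and the lemmatizer breaks the chain
swapped = [stages[0], stages[1], stages[3], stages[2], stages[4]]
print(missing_inputs(swapped, ["article_content"]))  # ["lemmatizer: missing input column 'tokens'"]
```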

Finally, we explode the sentiment annotations so that each one gets its own row, and use the UDFs to add sentiment_score and sentiment_category columns to the output DataFrame.

Note that this example assumes the sentiment model returns one annotation per sentence, so an article with several sentences yields several rows. To get the overall sentiment of an entire article, you need to aggregate these per-sentence results.
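One way to do that aggregation, sketched in plain Python under our own assumptions: sign each sentence's score by its category (+ for positive, - for negative), average the signed scores per article, and label the article by the sign of the mean. The averaging rule and the `article_sentiment` helper are illustrative choices, not part of Spark NLP; the sample rows reuse values from the output table.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def article_sentiment(
    rows: Iterable[Tuple[str, str, float]],
) -> Dict[str, Tuple[str, float]]:
    """Aggregate (article, category, score) sentence rows into one result per article.

    Scores are signed (+ for "positive", - otherwise) and averaged per article;
    a mean above zero is "positive", below zero "negative", exactly zero "neutral".
    """
    signed: Dict[str, List[float]] = defaultdict(list)
    for article, category, score in rows:
        signed[article].append(score if category == "positive" else -score)

    result = {}
    for article, scores in signed.items():
        mean = sum(scores) / len(scores)
        label = "positive" if mean > 0 else "negative" if mean < 0 else "neutral"
        result[article] = (label, mean)
    return result


rows = [
    ("Asynchronous Web Scraping With Python AIOHTTP", "positive", 1.0),
    ("Asynchronous Web Scraping With Python AIOHTTP", "positive", 1.0),
    ("How to Monitor Python Functions on AWS Lambda with Sentry", "positive", 0.5),
    ("How to Monitor Python Functions on AWS Lambda with Sentry", "positive", 0.5),
    ("How to Monitor Python Functions on AWS Lambda with Sentry", "negative", 0.5),
]

for article, (label, mean) in article_sentiment(rows).items():
    print(f"{label} {mean:.3f}  {article}")
```

In PySpark the same rule can be written without a UDF by grouping on article_content and averaging `F.when(F.col("sentiment_category") == "positive", F.col("sentiment_score")).otherwise(-F.col("sentiment_score"))`, where `F` is `pyspark.sql.functions`.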

```
+-------------------------------------------------------------+---------------+---------------+------------------+
|article_content                                              |sentiment_info |sentiment_score|sentiment_category|
+-------------------------------------------------------------+---------------+---------------+------------------+
|Asynchronous Web Scraping With Python AIOHTTP                |[positive, 1.0]|1.0            |positive          |
|Asynchronous Web Scraping With Python AIOHTTP                |[positive, 1.0]|1.0            |positive          |
|Automating Excel with Python Video Overview - Mouse Vs Python|[positive, 1.0]|1.0            |positive          |
|How to Monitor Python Functions on AWS Lambda with Sentry    |[positive, 0.5]|0.5            |positive          |
|How to Monitor Python Functions on AWS Lambda with Sentry    |[positive, 0.5]|0.5            |positive          |
|How to Monitor Python Functions on AWS Lambda with Sentry    |[negative, 0.5]|0.5            |negative          |
+-------------------------------------------------------------+---------------+---------------+------------------+
```