# <div style="padding:20px;color:white;margin:0;font-size:100%;text-align:left;display:fill;border-radius:5px;background-color:#6A79BA;overflow:hidden">Sentiment Analysis</div>

## Transformers pipeline

In [21]:
import pandas as pd
from transformers import pipeline
pd.set_option('display.max_colwidth', 1000)

In [9]:
# Load the Excel file into a DataFrame
df = pd.read_excel('data/nytimes_google2024h2.xlsx')
# Create the sentiment-analysis pipeline
sentiment_analyzer = pipeline('sentiment-analysis')

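def compute_sentiment_score(text, max_length=512):
    """Return a signed confidence score: positive for POSITIVE, negative for NEGATIVE."""
    # Empty or non-string cells (e.g. NaN from blank Excel cells) score as 0.0
    if not isinstance(text, str) or not text:
        return 0.0
    try:
        # Truncate the input to its first max_length characters (characters, not tokens)
        truncated_text = text[:max_length]
        result = sentiment_analyzer(truncated_text)[0]
        label = result['label']
        score = result['score']
        return score if label == 'POSITIVE' else -score
    except Exception:
        # Any pipeline failure falls back to a neutral score
        return 0.0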

# Compute the sentiment scores
df['sentiment_title'] = df['Title'].apply(compute_sentiment_score)
df['sentiment_content'] = df['Content'].apply(compute_sentiment_score)

# Save the results to CSV
df.to_csv('data/nytimes_google2024h2_with_sentiment_pipeline.csv', index=False, encoding='utf-8-sig')
df

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
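The warning above can be avoided by naming the model and revision explicitly. The following is a minimal sketch, assuming the default checkpoint and revision reported in the warning are the ones intended. `PIPELINE_CONFIG` and `build_analyzer` are illustrative names, not part of the original notebook.

```python
# Model name and revision copied from the warning message above.
PIPELINE_CONFIG = {
    "task": "sentiment-analysis",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_analyzer():
    """Create the pipeline with a pinned model (downloads weights on first use)."""
    # Imported lazily so the configuration can be inspected without loading transformers.
    from transformers import pipeline

    return pipeline(**PIPELINE_CONFIG)
```

Calling `sentiment_analyzer = build_analyzer()` would then stand in for the bare `pipeline('sentiment-analysis')` call, so later runs keep using the same checkpoint.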


Unnamed: 0,Title,Tag,Created Date,url,Content,sentiment_title,sentiment_content
0,"The tech giant’s revenue also grew 15 percent,...",GOOGL,2024-07-30T20:51:51+0000,https://www.nytimes.com/2024/07/30/technology/...,Microsoft closed its first full fiscal year of...,-0.985860,-0.998521
1,The platform first known for viral videos now ...,GOOGL,2024-07-30T16:08:33+0000,https://www.nytimes.com/2024/07/30/technology/...,"Two years ago, YouTube abandoned its audacious...",0.989845,-0.998729
2,The parent company of Facebook and Instagram f...,GOOGL,2024-07-30T15:34:32+0000,https://www.nytimes.com/2024/07/30/technology/...,"Meta, the parent company of Facebook and Insta...",-0.997072,-0.988796
3,"Modern Love in miniature, featuring reader-sub...",GOOGL,2024-07-30T15:26:38+0000,https://www.nytimes.com/2024/07/30/style/tiny-...,"“I’m in love with a woman,” my mother revealed...",-0.932973,0.993388
4,The governor of Michigan on why her promises t...,GOOGL,2024-07-30T09:05:04+0000,https://www.nytimes.com/2024/07/30/opinion/ezr...,We are deep in the pivotal week of Vice Presid...,0.999443,-0.920729
...,...,...,...,...,...,...,...
265,The coronavirus pandemic schooled the world in...,GOOGL,2024-06-02T07:00:34+0000,https://www.nytimes.com/2024/06/02/business/ec...,Southern California appeared to be under siege...,0.817366,-0.994784
266,Ditch the dye; live with style.,GOOGL,2024-06-01T11:50:02+0000,https://www.nytimes.com/2024/06/01/books/read-...,"Dear readers,",0.947948,0.999183
267,Google appears to have turned off its new A.I....,GOOGL,2024-06-01T09:04:10+0000,https://www.nytimes.com/2024/06/01/technology/...,"When Sundar Pichai, Google’s chief executive, ...",-0.999009,0.995842
268,"Since Google overhauled its search engine, pub...",GOOGL,2024-06-01T09:02:28+0000,https://www.nytimes.com/2024/06/01/technology/...,When Frank Pine searched Google for a link to ...,-0.997930,-0.999112
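Nearly every score in the table sits close to -1 or +1. This is expected: the default model is a two-class classifier, and the pipeline reports the confidence of the winning label, which is always at least 0.5. The sketch below isolates the label-to-sign mapping used above and shows one possible rescaling, `2 * P(positive) - 1`, that spreads values over the full range. The rescaling and both function names are illustrative assumptions, not something the notebook does.

```python
def signed_score(result):
    """Map a result such as {'label': 'NEGATIVE', 'score': 0.99} to a signed confidence."""
    score = float(result["score"])
    return score if result["label"] == "POSITIVE" else -score


def signed_probability(result):
    """Alternative scaling (an assumption): 2 * P(positive) - 1, covering all of [-1, 1]."""
    score = float(result["score"])
    # For a two-class model, P(positive) is the score itself or its complement.
    p_positive = score if result["label"] == "POSITIVE" else 1.0 - score
    return 2.0 * p_positive - 1.0


example = {"label": "NEGATIVE", "score": 0.75}
print(signed_score(example))        # -0.75
print(signed_probability(example))  # -0.5
```

With the signed confidence, a barely negative prediction still scores at or below -0.5. With the rescaled version, the same prediction lands near 0.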


## VaderSentiment

In [20]:
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Load the Excel file into a DataFrame
df = pd.read_excel('data/nytimes_google2024h2.xlsx')

# Initialize the sentiment analyzer
analyzer = SentimentIntensityAnalyzer()

# Function to compute the sentiment score
def compute_sentiment(text):
    try:
        sentiment = analyzer.polarity_scores(text)
        print('sentiment: ', sentiment)
        # The compound score is a normalized value between -1 and 1
        return sentiment['compound']
    except Exception:
        # Cells that cannot be scored (e.g. non-string values) fall back to 0.0
        return 0.0

# Apply the sentiment function to the 'Title' and 'Content' columns
df['sentiment_title'] = df['Title'].apply(compute_sentiment)
df['sentiment_content'] = df['Content'].apply(compute_sentiment)

# Save the DataFrame with the new sentiment columns to a new CSV file
df.to_csv('data/nytimes_google2024h2_with_sentiment_vader.csv', index=False, encoding='utf-8-sig')
df

sentiment:  {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
sentiment:  {'neg': 0.0, 'neu': 0.804, 'pos': 0.196, 'compound': 0.5267}
sentiment:  {'neg': 0.113, 'neu': 0.887, 'pos': 0.0, 'compound': -0.4939}
sentiment:  {'neg': 0.126, 'neu': 0.632, 'pos': 0.241, 'compound': 0.4588}
sentiment:  {'neg': 0.0, 'neu': 0.852, 'pos': 0.148, 'compound': 0.3818}
sentiment:  {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
sentiment:  {'neg': 0.0, 'neu': 0.66, 'pos': 0.34, 'compound': 0.8968}
sentiment:  {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
sentiment:  {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
sentiment:  {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
sentiment:  {'neg': 0.0, 'neu': 0.873, 'pos': 0.127, 'compound': 0.4939}
sentiment:  {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
sentiment:  {'neg': 0.091, 'neu': 0.679, 'pos': 0.23, 'compound': 0.4019}
sentiment:  {'neg': 0.037, 'neu': 0.963, 'pos': 0.0, 'compound': -0.0516}
...

Unnamed: 0,Title,Tag,Created Date,url,Content,sentiment_title,sentiment_content
0,"The tech giant’s revenue also grew 15 percent, but Wall Street is watching whether its investment in A.I. is paying off for its cloud computing business.",GOOGL,2024-07-30T20:51:51+0000,https://www.nytimes.com/2024/07/30/technology/microsoft-earnings-profit.html,Microsoft closed its first full fiscal year of aggressive artificial intelligence investment with a mixed bag of results for people worried about how much big tech companies are spending on A.I.,0.0000,0.0772
1,"The platform first known for viral videos now attracts more viewers on TVs than Netflix, Disney+ or Amazon Prime Video.",GOOGL,2024-07-30T16:08:33+0000,https://www.nytimes.com/2024/07/30/technology/youtube-streaming-tv.html,"Two years ago, YouTube abandoned its audacious plan to beat Hollywood at its own game.",0.5267,-0.2732
2,The parent company of Facebook and Instagram faced allegations that it had collected facial identification information on millions of users in violation of a state law.,GOOGL,2024-07-30T15:34:32+0000,https://www.nytimes.com/2024/07/30/technology/meta-texas-privacy-settlement.html,"Meta, the parent company of Facebook and Instagram, agreed to a record $1.4 billion settlement with Texas on Tuesday, over allegations that it had illegally collected facial recognition information on millions of users in violation of state law.",-0.4939,-0.2732
3,"Modern Love in miniature, featuring reader-submitted stories of no more than 100 words.",GOOGL,2024-07-30T15:26:38+0000,https://www.nytimes.com/2024/07/30/style/tiny-modern-love-stories-i-make-no-apologies.html,"“I’m in love with a woman,” my mother revealed in a handwritten letter before moving cross-country. At her going-away party, she whispered, “Do you still love me?” Her eyes searched mine for approval. We struggled during my teen years. Silence carried the hurt, hers and mine. Physically unwell, suffering after a painful divorce, she needed me. I kept her secret until she was ready to come out. I leaned close: “Mom, I knew. I’ll always love you.” The truth helped us heal. Recently, she texted a photo with a blissful smile: “This is me; I make no apologies.” — Lisa Mccarty",0.4588,0.9485
4,The governor of Michigan on why her promises to fix roads and Roe resonate with voters.,GOOGL,2024-07-30T09:05:04+0000,https://www.nytimes.com/2024/07/30/opinion/ezra-klein-podcast-gretchen-whitmer.html,"We are deep in the pivotal week of Vice President Kamala Harris’s veepstakes. It is reported that she will make her decision by Aug. 7. And as somebody who wanted to see a sort of mini-primary for Democrats — who made the argument that at the very least, there should be a sort of contest of the vice-presidential candidates and town halls and forums in an organized way — what has emerged in a disorganized way is much more like what I had hoped to see than I’d ever expected.",0.3818,0.1761
...,...,...,...,...,...,...,...
265,The coronavirus pandemic schooled the world in the essential role of global supply chains. Have we learned anything from it?,GOOGL,2024-06-02T07:00:34+0000,https://www.nytimes.com/2024/06/02/business/economy/covid-pandemic-global-supply-chains.html,Southern California appeared to be under siege from a blockade.,0.0000,0.0000
266,Ditch the dye; live with style.,GOOGL,2024-06-01T11:50:02+0000,https://www.nytimes.com/2024/06/01/books/read-like-wind-going-gray.html,"Dear readers,",0.0000,0.3818
267,Google appears to have turned off its new A.I. Overviews for a number of searches as it works to minimize errors.,GOOGL,2024-06-01T09:04:10+0000,https://www.nytimes.com/2024/06/01/technology/google-ai-overviews-rollback.html,"When Sundar Pichai, Google’s chief executive, introduced a generative artificial intelligence feature for the company’s search engine last month, he and his colleagues demonstrated the new capability with six text-based queries that the public could try out.",-0.2732,0.4767
268,"Since Google overhauled its search engine, publishers have tried to assess the danger to their brittle business models while calling for government intervention.",GOOGL,2024-06-01T09:02:28+0000,https://www.nytimes.com/2024/06/01/technology/google-ai-search-publishers.html,"When Frank Pine searched Google for a link to a news article two months ago, he encountered paragraphs generated by artificial intelligence about the topic at the top of his results. To see what he wanted, he had to scroll past them.",-0.5267,0.5994
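The compound scores above are continuous, so they are often bucketed into labels before comparison with other models. The sketch below uses the cutoffs suggested in the VADER documentation (at or above 0.05 is positive, at or below -0.05 is negative). The helper name `label_from_compound` is an illustrative assumption.

```python
def label_from_compound(compound, threshold=0.05):
    """Bucket a VADER compound score into 'positive', 'negative', or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores taken from the printed output above.
for value in (0.5267, -0.4939, 0.0, -0.0516):
    print(value, label_from_compound(value))
```

Applied to the DataFrame, this could be used as `df['sentiment_title'].apply(label_from_compound)`.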


## FinBERT

In [18]:
from transformers import pipeline

# Initialize the FinBERT model
nlp = pipeline("sentiment-analysis", model="yiyanghkust/finbert-tone", tokenizer="yiyanghkust/finbert-tone")

def compute_finbert_sentiment(text):
    try:
        # Split the text into 512-character chunks (characters, not tokens)
        max_length = 512
        chunks = [text[i:i + max_length] for i in range(0, len(text), max_length)]
        sentiments = []
        for chunk in chunks:
            result = nlp(chunk)
            label = result[0]['label']
            # Map the label to a numeric score and collect it
            if label == 'Positive':
                sentiments.append(1)
            elif label == 'Neutral':
                sentiments.append(0)
            elif label == 'Negative':
                sentiments.append(-1)
        # Average the scores over all chunks; an empty string yields None
        return sum(sentiments) / len(sentiments) if sentiments else None
    except Exception as e:
        # Non-string cells (e.g. NaN from a blank Excel cell) fail at len(text) and score 0.0
        print(f"Error processing text: {e}")
        return 0.0

# Score the text columns in the DataFrame
df['sentiment_title'] = df['Title'].apply(compute_finbert_sentiment)
df['sentiment_content'] = df['Content'].apply(compute_finbert_sentiment)

# Display the results
df

Error processing text: object of type 'float' has no len()


Unnamed: 0,Title,Tag,Created Date,url,Content,sentiment_title,sentiment_content
0,"The tech giant’s revenue also grew 15 percent, but Wall Street is watching whether its investment in A.I. is paying off for its cloud computing business.",GOOGL,2024-07-30T20:51:51+0000,https://www.nytimes.com/2024/07/30/technology/microsoft-earnings-profit.html,Microsoft closed its first full fiscal year of aggressive artificial intelligence investment with a mixed bag of results for people worried about how much big tech companies are spending on A.I.,0.0,-1.0
1,"The platform first known for viral videos now attracts more viewers on TVs than Netflix, Disney+ or Amazon Prime Video.",GOOGL,2024-07-30T16:08:33+0000,https://www.nytimes.com/2024/07/30/technology/youtube-streaming-tv.html,"Two years ago, YouTube abandoned its audacious plan to beat Hollywood at its own game.",0.0,0.0
2,The parent company of Facebook and Instagram faced allegations that it had collected facial identification information on millions of users in violation of a state law.,GOOGL,2024-07-30T15:34:32+0000,https://www.nytimes.com/2024/07/30/technology/meta-texas-privacy-settlement.html,"Meta, the parent company of Facebook and Instagram, agreed to a record $1.4 billion settlement with Texas on Tuesday, over allegations that it had illegally collected facial recognition information on millions of users in violation of state law.",-1.0,-1.0
3,"Modern Love in miniature, featuring reader-submitted stories of no more than 100 words.",GOOGL,2024-07-30T15:26:38+0000,https://www.nytimes.com/2024/07/30/style/tiny-modern-love-stories-i-make-no-apologies.html,"“I’m in love with a woman,” my mother revealed in a handwritten letter before moving cross-country. At her going-away party, she whispered, “Do you still love me?” Her eyes searched mine for approval. We struggled during my teen years. Silence carried the hurt, hers and mine. Physically unwell, suffering after a painful divorce, she needed me. I kept her secret until she was ready to come out. I leaned close: “Mom, I knew. I’ll always love you.” The truth helped us heal. Recently, she texted a photo with a blissful smile: “This is me; I make no apologies.” — Lisa Mccarty",0.0,-0.5
4,The governor of Michigan on why her promises to fix roads and Roe resonate with voters.,GOOGL,2024-07-30T09:05:04+0000,https://www.nytimes.com/2024/07/30/opinion/ezra-klein-podcast-gretchen-whitmer.html,"We are deep in the pivotal week of Vice President Kamala Harris’s veepstakes. It is reported that she will make her decision by Aug. 7. And as somebody who wanted to see a sort of mini-primary for Democrats — who made the argument that at the very least, there should be a sort of contest of the vice-presidential candidates and town halls and forums in an organized way — what has emerged in a disorganized way is much more like what I had hoped to see than I’d ever expected.",1.0,0.0
...,...,...,...,...,...,...,...
265,The coronavirus pandemic schooled the world in the essential role of global supply chains. Have we learned anything from it?,GOOGL,2024-06-02T07:00:34+0000,https://www.nytimes.com/2024/06/02/business/economy/covid-pandemic-global-supply-chains.html,Southern California appeared to be under siege from a blockade.,0.0,0.0
266,Ditch the dye; live with style.,GOOGL,2024-06-01T11:50:02+0000,https://www.nytimes.com/2024/06/01/books/read-like-wind-going-gray.html,"Dear readers,",0.0,0.0
267,Google appears to have turned off its new A.I. Overviews for a number of searches as it works to minimize errors.,GOOGL,2024-06-01T09:04:10+0000,https://www.nytimes.com/2024/06/01/technology/google-ai-overviews-rollback.html,"When Sundar Pichai, Google’s chief executive, introduced a generative artificial intelligence feature for the company’s search engine last month, he and his colleagues demonstrated the new capability with six text-based queries that the public could try out.",1.0,1.0
268,"Since Google overhauled its search engine, publishers have tried to assess the danger to their brittle business models while calling for government intervention.",GOOGL,2024-06-01T09:02:28+0000,https://www.nytimes.com/2024/06/01/technology/google-ai-search-publishers.html,"When Frank Pine searched Google for a link to a news article two months ago, he encountered paragraphs generated by artificial intelligence about the topic at the top of his results. To see what he wanted, he had to scroll past them.",-1.0,0.0
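The `object of type 'float' has no len()` message in the output above is consistent with a blank cell, which pandas loads as a float NaN. The sketch below separates the chunk-averaging logic from the model, so missing cells are skipped up front and the logic can be checked without downloading FinBERT. `average_chunk_sentiment`, `LABEL_TO_SCORE`, and `stub_classify` are illustrative names. Returning `None` for missing text instead of 0.0 is a design assumption that keeps missing data distinct from neutral text.

```python
LABEL_TO_SCORE = {"Positive": 1, "Neutral": 0, "Negative": -1}


def average_chunk_sentiment(text, classify, chunk_size=512):
    """Average label scores over fixed-size character chunks; None when there is no text."""
    # Skip missing cells: a float NaN is not a string and has no len().
    if not isinstance(text, str) or not text.strip():
        return None
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    scores = [LABEL_TO_SCORE[classify(chunk)] for chunk in chunks]
    return sum(scores) / len(scores)


def stub_classify(chunk):
    """Hypothetical stand-in for the FinBERT pipeline, used only to exercise the helper."""
    return "Negative" if "loss" in chunk else "Neutral"


print(average_chunk_sentiment(float("nan"), stub_classify))        # None
print(average_chunk_sentiment("loss" + "x" * 1000, stub_classify)) # -0.5
```

The second call splits the text into two chunks, one Negative and one Neutral, which gives -0.5, the same kind of fractional value seen in row 3 of the table. With the real model, `classify` could be `lambda chunk: nlp(chunk)[0]['label']`.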
