
Sentiment analysis of articles from insights.blackcoffer.com

In [6]:
import requests
import pandas as pd
from bs4 import BeautifulSoup,NavigableString


In [7]:

# Download the file
url = "https://raw.githubusercontent.com/prabhmeharbedi/Article-Sentiment-Analysis/main/output_file.csv"
response = requests.get(url, timeout=30)

# Stop here if the download failed, so later cells never run with `data` undefined
response.raise_for_status()

# Save the content to a local file
with open("output_file.csv", "wb") as f:
    f.write(response.content)

# Read the local file using pandas
data = pd.read_csv("output_file.csv")

# Only the last expression in a cell is displayed, so show the tail
data.tail()

Unnamed: 0,URL_ID,URL
109,146,https://insights.blackcoffer.com/blockchain-fo...
110,147,https://insights.blackcoffer.com/the-future-of...
111,148,https://insights.blackcoffer.com/big-data-anal...
112,149,https://insights.blackcoffer.com/business-anal...
113,150,https://insights.blackcoffer.com/challenges-an...


In [8]:
# Create a folder for the downloaded HTML files (no error if it already exists)
import os
os.makedirs('data', exist_ok=True)

In [9]:
import os
import requests
import pandas as pd
from bs4 import BeautifulSoup, NavigableString

# Assuming 'data' is your DataFrame with columns 'URL_ID' and 'URL'
# If you want to fetch data directly from the CSV file, you can use data = pd.read_csv('../input/nlp-text-analysis/Data.csv')

# Create an empty list to store dictionaries containing title and text for each article
article_data = []

for url in data['URL']:
    title = url.split('/')[3]
    file_name = title + '.html'
    file_path = './data/' + file_name

    # Scrape the page and cache it on disk, but only if it is not cached already
    if not os.path.exists(file_path):
        r = requests.get(url, headers={"User-Agent": "XY"})
        if r.status_code != 200:
            # Skip pages that could not be fetched instead of caching an error page
            continue
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write(r.content.decode('utf-8'))

    # Read the cached HTML back from disk
    with open(file_path, 'r', encoding='utf-8') as f:
        htmlcontent = f.read()

    soup = BeautifulSoup(htmlcontent, 'html.parser')
    article_content = soup.find('div', attrs={'class': 'td-post-content'})

    if article_content is None:
        continue

    article_text = ''
    for element in article_content:
        if not isinstance(element, NavigableString):
            text = element.text
            article_text += text

    # Append a dictionary containing title and text to the 'article_data' list
    article_data.append({'Title': title, 'Text': article_text})

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(article_data)

# Display the resulting DataFrame
print(df.head())


                                               Title  \
0       ai-in-healthcare-to-improve-patient-outcomes   
1    what-if-the-creation-is-taking-over-the-creator   
2  what-jobs-will-robots-take-from-humans-in-the-...   
3  will-machine-replace-the-human-in-the-future-o...   
4                 will-ai-replace-us-or-work-with-us   

                                                Text  
0  \n/* custom css */\n.tdb_single_content{\n    ...  
1  Human minds, a fascination in itself carrying ...  
2  IntroductionAI is rapidly evolving in the empl...  
3  “Anything that could give rise to smarter-than...  
4  “Machine intelligence is the last invention th...  


In [10]:
df.head()

Unnamed: 0,Title,Text
0,ai-in-healthcare-to-improve-patient-outcomes,\n/* custom css */\n.tdb_single_content{\n ...
1,what-if-the-creation-is-taking-over-the-creator,"Human minds, a fascination in itself carrying ..."
2,what-jobs-will-robots-take-from-humans-in-the-...,IntroductionAI is rapidly evolving in the empl...
3,will-machine-replace-the-human-in-the-future-o...,“Anything that could give rise to smarter-than...
4,will-ai-replace-us-or-work-with-us,“Machine intelligence is the last invention th...
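Row 0's text starts with `/* custom css */`, which suggests that an inline `<style>` block inside the article container was scraped along with the prose. Below is a minimal sketch of one way to skip such blocks, using only the standard library's `html.parser`. The class name `VisibleTextExtractor` and the sample HTML are hypothetical; with BeautifulSoup, the analogous step would be calling `decompose()` on each `style` and `script` tag before reading the text.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <style> or <script>."""

    SKIPPED_TAGS = {"style", "script"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the collected chunks and collapse runs of whitespace
        return " ".join(" ".join(self._chunks).split())


# Hypothetical sample that mimics an article container with inline CSS
sample_html = (
    '<div class="td-post-content">'
    "<style>/* custom css */ .tdb_single_content{margin:0}</style>"
    "<p>AI can improve patient outcomes.</p>"
    "<script>console.log('x')</script>"
    "<p>It also reduces costs.</p>"
    "</div>"
)

extractor = VisibleTextExtractor()
extractor.feed(sample_html)
extractor.close()
clean_text = extractor.text()
print(clean_text)
```

Whether the real pages need this depends on their markup, so it is worth checking a few cached files in `./data/` before relying on it.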


The goal of the sentiment analysis is to compute the following four scores for each article:
1. **Positive Score:**
   - **Range:** 0 to 1
   - **Interpretation:** A measure of the positive sentiment in the text. A
   higher positive score indicates a more positive sentiment.

2. **Negative Score:**
   - **Range:** 0 to 1
   - **Interpretation:** A measure of the negative sentiment in the text. A
   higher negative score indicates a more negative sentiment.

3. **Polarity Score:**
   - **Range:** -1 to 1
   - **Interpretation:** A measure of the overall sentiment of the text.
     - Negative values (closer to -1) indicate negative sentiment.
     - Positive values (closer to 1) indicate positive sentiment.
     - 0 indicates a neutral sentiment.

4. **Subjectivity Score:**
   - **Range:** 0 to 1
   - **Interpretation:** A measure of the subjectivity of the text.
     - 0 indicates a highly objective (or factual) text.
     - 1 indicates a highly subjective (or opinionated) text.
     - Values in between indicate varying degrees of subjectivity.

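Together, these scores summarize each article's emotional tone and how opinionated it is.
In this notebook, the positive and negative scores come from NLTK's VADER analyzer
(`SentimentIntensityAnalyzer`), and the polarity and subjectivity scores come from TextBlob.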
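
To make the ranges above concrete, here is a toy sketch. It is not how VADER or TextBlob compute their scores; the word lists `TOY_POSITIVE` and `TOY_NEGATIVE` and the function `toy_scores` are hypothetical and purely illustrative. It only shows why share-style scores fall between 0 and 1 while a signed score falls between -1 and 1. Subjectivity is left out because it needs a different kind of lexicon.

```python
# Hypothetical toy word lists, far smaller than any real sentiment lexicon
TOY_POSITIVE = {"good", "great", "improve", "benefit"}
TOY_NEGATIVE = {"bad", "risk", "harm", "fail"}


def toy_scores(text):
    """Return toy positive/negative shares and a signed polarity for a text."""
    tokens = text.lower().split()
    if not tokens:
        return {"positive": 0.0, "negative": 0.0, "polarity": 0.0}

    pos = sum(token in TOY_POSITIVE for token in tokens)
    neg = sum(token in TOY_NEGATIVE for token in tokens)
    matched = pos + neg

    return {
        # Shares of all tokens, so each lies in [0, 1]
        "positive": pos / len(tokens),
        "negative": neg / len(tokens),
        # Balance of matched words, so it lies in [-1, 1]; 0 when nothing matched
        "polarity": (pos - neg) / matched if matched else 0.0,
    }


scores = toy_scores("great tools improve care but carry some risk")
print(scores)
```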

In [11]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import pandas as pd
import nltk

# Assuming 'df' is the DataFrame you created with columns 'Title' and 'Text'

# Download the stopwords corpus if not already downloaded
nltk.download('stopwords')

# Download the punkt tokenizer for word_tokenize
nltk.download('punkt')

# Get the list of English stop words
stop_words = set(stopwords.words('english'))

# Create empty lists to store the processed text and titles
processed_texts = []
titles = []

# Iterate through each row in the DataFrame
for index, row in df.iterrows():
    title = row['Title']
    text = row['Text']

    # Tokenize the text
    word_tokens = word_tokenize(text)

    # Remove stop words
    filtered_tokens = [w for w in word_tokens if w.lower() not in stop_words]

    # Join the filtered tokens back into a sentence
    new_sentence = ' '.join(filtered_tokens)

    # Append the processed text and title to the lists
    processed_texts.append(new_sentence)
    titles.append(title)

# Create a new DataFrame with the processed text and titles
df_processed = pd.DataFrame({'Title': titles, 'Text': processed_texts})

# Display the resulting DataFrame
print(df_processed.head())


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


                                               Title  \
0       ai-in-healthcare-to-improve-patient-outcomes   
1    what-if-the-creation-is-taking-over-the-creator   
2  what-jobs-will-robots-take-from-humans-in-the-...   
3  will-machine-replace-the-human-in-the-future-o...   
4                 will-ai-replace-us-or-work-with-us   

                                                Text  
0  / * custom css * / .tdb_single_content { margi...  
1  Human minds , fascination carrying potential t...  
2  IntroductionAI rapidly evolving employment sec...  
3  “ Anything could give rise smarter-than-human ...  
4  “ Machine intelligence last invention humanity...  


In [12]:
df_processed.head()

Unnamed: 0,Title,Text
0,ai-in-healthcare-to-improve-patient-outcomes,/ * custom css * / .tdb_single_content { margi...
1,what-if-the-creation-is-taking-over-the-creator,"Human minds , fascination carrying potential t..."
2,what-jobs-will-robots-take-from-humans-in-the-...,IntroductionAI rapidly evolving employment sec...
3,will-machine-replace-the-human-in-the-future-o...,“ Anything could give rise smarter-than-human ...
4,will-ai-replace-us-or-work-with-us,“ Machine intelligence last invention humanity...
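
The processed text still contains standalone punctuation tokens such as `,` and `“`, because the stop-word list covers only words. Below is a minimal sketch of filtering both in one pass. The small `stop_words` set and the token list are hypothetical stand-ins for the NLTK stop words and the `word_tokenize` output used above.

```python
# Hypothetical stand-ins for stopwords.words('english') and word_tokenize(text)
stop_words = {"a", "in", "itself", "is", "the", "to"}
word_tokens = [
    "Human", "minds", ",", "a", "fascination", "in", "itself",
    "“", "smarter-than-human", "”",
]


def keep_token(token):
    """Keep a token unless it is a stop word or contains no letters or digits."""
    if token.lower() in stop_words:
        return False
    # Pure punctuation has no alphanumeric character; hyphenated words still pass
    return any(ch.isalnum() for ch in token)


filtered_tokens = [t for t in word_tokens if keep_token(t)]
print(filtered_tokens)
```

VADER treats some punctuation, such as exclamation marks, as an intensity cue, so whether to drop punctuation before scoring is a judgment call rather than a clear improvement.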


In [13]:
from nltk.sentiment import SentimentIntensityAnalyzer
import pandas as pd
import nltk
df3 = df_processed.copy()
# Assuming 'df_processed' is the DataFrame with columns 'Title' and 'Text'

# Download the VADER lexicon for sentiment analysis if not already downloaded
nltk.download('vader_lexicon')

# Initialize the SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()

# Create new lists to store the positive and negative scores
positive_scores = []
negative_scores = []

# Iterate through each row in the DataFrame
for index, row in df_processed.iterrows():
    text = row['Text']

    # Calculate the sentiment scores
    sentiment_scores = sia.polarity_scores(text)

    # Append the positive and negative scores to the respective lists
    positive_scores.append(sentiment_scores['pos'])
    negative_scores.append(sentiment_scores['neg'])

# Create new columns 'Positive_Score' and 'Negative_Score' in the DataFrame
df3['Positive_Score'] = positive_scores
df3['Negative_Score'] = negative_scores

# Display the resulting DataFrame (df3 holds the new score columns)
print(df3.head())



[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


In [14]:
df3.head()

Unnamed: 0,Title,Text,Positive_Score,Negative_Score
0,ai-in-healthcare-to-improve-patient-outcomes,/ * custom css * / .tdb_single_content { margi...,0.157,0.047
1,what-if-the-creation-is-taking-over-the-creator,"Human minds , fascination carrying potential t...",0.204,0.094
2,what-jobs-will-robots-take-from-humans-in-the-...,IntroductionAI rapidly evolving employment sec...,0.153,0.079
3,will-machine-replace-the-human-in-the-future-o...,“ Anything could give rise smarter-than-human ...,0.23,0.061
4,will-ai-replace-us-or-work-with-us,“ Machine intelligence last invention humanity...,0.199,0.062
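
VADER's `polarity_scores` also returns `neu` and `compound` values, which this notebook does not store. VADER documents `pos`, `neu`, and `neg` as proportions that add up to roughly 1, so the neutral share can be estimated from the table above. This is a small sketch that assumes the property holds up to rounding; the variable names are illustrative.

```python
# (Positive_Score, Negative_Score) for rows 0-4 of df3.head()
scores = [
    (0.157, 0.047),
    (0.204, 0.094),
    (0.153, 0.079),
    (0.23, 0.061),
    (0.199, 0.062),
]

# Assumption: pos + neu + neg is approximately 1, so neu is what remains
neutral_share = [round(1 - pos - neg, 3) for pos, neg in scores]

for row_index, share in enumerate(neutral_share):
    print(f"row {row_index}: estimated neutral share = {share}")
```

Under that assumption, roughly 70% to 80% of each article is scored as neutral, which fits long-form text where most sentences carry no strong sentiment words.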


In [15]:
data.at[0,'URL']

'https://insights.blackcoffer.com/ai-in-healthcare-to-improve-patient-outcomes/'
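
The scraping loop takes the title from `url.split('/')[3]`, which works for URLs shaped like the one above, where the slug is the first path segment. Below is a slightly more defensive sketch using `urllib.parse`; `slug_from_url` is a hypothetical helper, not part of the notebook. It returns the last non-empty path segment, so it gives the same result for single-segment URLs but would differ for deeper paths.

```python
from urllib.parse import urlparse


def slug_from_url(url):
    """Return the last non-empty path segment of a URL, or '' if there is none."""
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    return segments[-1] if segments else ""


url = "https://insights.blackcoffer.com/ai-in-healthcare-to-improve-patient-outcomes/"
slug = slug_from_url(url)
print(slug)
```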

In [16]:
from textblob import TextBlob

df4 = df3.copy()
# Create new lists to store the polarity and subjectivity scores
polarity_scores = []
subjectivity_scores = []

# Iterate through each row in the DataFrame
for index, row in df_processed.iterrows():
    text = row['Text']

    # Calculate the polarity and subjectivity scores using TextBlob
    blob = TextBlob(text)
    polarity_score = blob.sentiment.polarity
    subjectivity_score = blob.sentiment.subjectivity

    # Append the scores to the respective lists
    polarity_scores.append(polarity_score)
    subjectivity_scores.append(subjectivity_score)

# Create new columns 'Polarity_Score' and 'Subjectivity_Score' in the DataFrame
df4['Polarity_Score'] = polarity_scores
df4['Subjectivity_Score'] = subjectivity_scores

# Display the resulting DataFrame
print(df4.head())


                                               Title  \
0       ai-in-healthcare-to-improve-patient-outcomes   
1    what-if-the-creation-is-taking-over-the-creator   
2  what-jobs-will-robots-take-from-humans-in-the-...   
3  will-machine-replace-the-human-in-the-future-o...   
4                 will-ai-replace-us-or-work-with-us   

                                                Text  Positive_Score  \
0  / * custom css * / .tdb_single_content { margi...           0.157   
1  Human minds , fascination carrying potential t...           0.204   
2  IntroductionAI rapidly evolving employment sec...           0.153   
3  “ Anything could give rise smarter-than-human ...           0.230   
4  “ Machine intelligence last invention humanity...           0.199   

   Negative_Score  Polarity_Score  Subjectivity_Score  
0           0.047        0.062695            0.502872  
1           0.094        0.070871            0.404217  
2           0.079        0.085447            0.477847  
3           0.061        0.134173            0.491477  
4           0.062        0.026700            0.502438  

In [17]:
df4.head()

Unnamed: 0,Title,Text,Positive_Score,Negative_Score,Polarity_Score,Subjectivity_Score
0,ai-in-healthcare-to-improve-patient-outcomes,/ * custom css * / .tdb_single_content { margi...,0.157,0.047,0.062695,0.502872
1,what-if-the-creation-is-taking-over-the-creator,"Human minds , fascination carrying potential t...",0.204,0.094,0.070871,0.404217
2,what-jobs-will-robots-take-from-humans-in-the-...,IntroductionAI rapidly evolving employment sec...,0.153,0.079,0.085447,0.477847
3,will-machine-replace-the-human-in-the-future-o...,“ Anything could give rise smarter-than-human ...,0.23,0.061,0.134173,0.491477
4,will-ai-replace-us-or-work-with-us,“ Machine intelligence last invention humanity...,0.199,0.062,0.0267,0.502438
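
One possible way to read the polarity column is to bucket it into labels. The sketch below uses a threshold of 0.05, which is an arbitrary assumption for illustration and not a value taken from TextBlob or from this notebook; `label_polarity` is a hypothetical helper.

```python
def label_polarity(polarity, threshold=0.05):
    """Map a polarity in [-1, 1] to a coarse label using a symmetric threshold."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Polarity_Score values for rows 0-4 of df4.head()
polarity_scores = [0.062695, 0.070871, 0.085447, 0.134173, 0.0267]

labels = [label_polarity(p) for p in polarity_scores]
print(labels)
```

With this threshold, the first four articles are labeled positive and the fifth neutral. All five values are close to zero, so the labels are sensitive to the threshold chosen.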


In [18]:
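# Save the results; index=False avoids an extra 'Unnamed: 0' column when the file is read back
df4.to_csv('convoproj.csv', index=False)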