BBC News dataset from Kaggle: https://www.kaggle.com/datasets/gpreda/bbc-news

## Loading and Displaying the BBC News Dataset

In this section, we will load the BBC News dataset from a CSV file and display the first few rows to get an initial understanding of the data.

In [7]:
import pandas as pd

#Load and display dataset

file_path = '/content/bbc_news.csv'
df = pd.read_csv(file_path)


df.head()


| | title | pubDate | guid | link | description |
|---|---|---|---|---|---|
| 0 | Ukraine: Angry Zelensky vows to punish Russian... | Mon, 07 Mar 2022 08:01:56 GMT | https://www.bbc.co.uk/news/world-europe-60638042 | https://www.bbc.co.uk/news/world-europe-606380... | The Ukrainian president says the country will ... |
| 1 | War in Ukraine: Taking cover in a town under a... | Sun, 06 Mar 2022 22:49:58 GMT | https://www.bbc.co.uk/news/world-europe-60641873 | https://www.bbc.co.uk/news/world-europe-606418... | Jeremy Bowen was on the frontline in Irpin, as... |
| 2 | Ukraine war 'catastrophic for global food' | Mon, 07 Mar 2022 00:14:42 GMT | https://www.bbc.co.uk/news/business-60623941 | https://www.bbc.co.uk/news/business-60623941?a... | One of the world's biggest fertiliser firms sa... |
| 3 | Manchester Arena bombing: Saffie Roussos's par... | Mon, 07 Mar 2022 00:05:40 GMT | https://www.bbc.co.uk/news/uk-60579079 | https://www.bbc.co.uk/news/uk-60579079?at_medi... | The parents of the Manchester Arena bombing's ... |
| 4 | Ukraine conflict: Oil price soars to highest l... | Mon, 07 Mar 2022 08:15:53 GMT | https://www.bbc.co.uk/news/business-60642786 | https://www.bbc.co.uk/news/business-60642786?a... | Consumers are feeling the impact of higher ene... |
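Beyond `head()`, a few structural checks help before any text processing, since missing descriptions or repeated titles would affect later steps. The following is a minimal sketch on a hypothetical two-row frame named `sample`, with placeholder values rather than real dataset rows; the same calls could be run on `df`.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset (placeholder values, not real rows)
sample = pd.DataFrame({
    "title": ["First placeholder title", "Second placeholder title"],
    "description": ["A placeholder description.", None],
})

n_rows, n_cols = sample.shape                           # overall size
missing_per_column = sample.isna().sum()                # NaN count for each column
duplicate_titles = sample["title"].duplicated().sum()   # repeated titles, if any

print(f"{n_rows} rows, {n_cols} columns")
print(missing_per_column)
print(f"duplicate titles: {duplicate_titles}")
```

A non-zero count for `description` would mean the text-cleaning step needs to cope with missing values.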


### 2. Sentiment Analysis on BBC News Dataset

Next, we run sentiment analysis on the article descriptions in the BBC News dataset. The process has five parts:

1. **Import Libraries**: Load necessary libraries for data manipulation and sentiment analysis.
2. **Define Functions**: Create functions to clean text, analyze sentiment, and categorize sentiment polarity.
3. **Clean Descriptions**: Apply text cleaning functions to the dataset.
4. **Analyze Sentiment**: Calculate sentiment polarity (from -1 to 1) and subjectivity (from 0 to 1) with TextBlob, and categorize the results.
5. **Save Results**: Export the processed data with sentiment analysis results to a CSV file.

The final output is a CSV file containing each article's title and cleaned description, along with its sentiment scores and category.
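Step 4 yields two numbers per description, which have to end up in two separate columns. A minimal sketch of that unpacking pattern follows. It uses a hypothetical stand-in scorer, `toy_scores`, with a made-up word list instead of TextBlob so that it runs without extra dependencies; the `demo` frame is also illustrative.

```python
import pandas as pd

def toy_scores(text):
    """Hypothetical stand-in scorer: returns a (polarity, subjectivity) pair."""
    words = text.split()
    positive = sum(w in {"good", "great"} for w in words)
    negative = sum(w in {"bad", "awful"} for w in words)
    total = max(len(words), 1)  # avoid dividing by zero on empty text
    return (positive - negative) / total, (positive + negative) / total

demo = pd.DataFrame({"text": ["good news today", "bad awful weather", "markets open"]})

# apply() returns one tuple per row; zip(*...) transposes those tuples
# into one sequence of polarities and one sequence of subjectivities.
polarity, subjectivity = zip(*demo["text"].apply(toy_scores))
demo["polarity"] = polarity
demo["subjectivity"] = subjectivity

print(demo)
```

The `zip(*...)` idiom needs at least one row, because unpacking an empty result into two names raises a `ValueError`.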


In [8]:
import pandas as pd
from textblob import TextBlob
import re

#Function to clean text
def clean_text(text):
    if not isinstance(text, str):  # Missing values (NaN) are floats; treat them as empty text
        return ''
    text = re.sub(r'\s+', ' ', text)  # Collapse whitespace (newlines become spaces)
    text = re.sub(r'\[.*?\]', '', text)  # Remove text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # Remove URLs
    text = re.sub(r'<.*?>+', '', text)  # Remove HTML tags
    text = re.sub(r'[^A-Za-z0-9\s]', '', text)  # Remove special characters
    text = re.sub(r'\s+', ' ', text).strip()  # Tidy gaps left by the removals above
    text = text.lower()  # Convert to lowercase
    return text

#Function to get sentiment polarity and subjectivity
def get_sentiment(text):
    blob = TextBlob(text)
    polarity = blob.sentiment.polarity
    subjectivity = blob.sentiment.subjectivity
    return polarity, subjectivity

#Function to categorize polarity in order to use it in visualization
def categorize_polarity(polarity):
    if polarity > 0:
        return 'Positive'
    elif polarity < 0:
        return 'Negative'
    else:
        return 'Neutral'

#Clean the description column
df['cleaned_description'] = df['description'].apply(clean_text)

#Apply the sentiment function to the cleaned descriptions
df['polarity'], df['subjectivity'] = zip(*df['cleaned_description'].apply(get_sentiment))

#Categorize polarity
df['sentiment_category'] = df['polarity'].apply(categorize_polarity)

#Save the sentiment analysis results to a CSV file
output_csv_path = 'sentiment_analysis_results.csv'
df[['title', 'cleaned_description', 'polarity', 'subjectivity', 'sentiment_category']].to_csv(output_csv_path, index=False)

print(f"Sentiment analysis results saved to {output_csv_path}")


Sentiment analysis results saved to sentiment_analysis_results.csv
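To see what each substitution in `clean_text` does on its own, the sketch below applies the same five patterns one at a time to short, made-up sample strings. The `samples` and `results` names and the sample sentences are illustrative assumptions, not part of the notebook.

```python
import re

# (pattern, replacement, made-up input) for each cleaning step
samples = {
    "whitespace": (r'\s+', ' ', "Oil   price\n soars"),
    "square brackets": (r'\[.*?\]', '', "Report [updated] published"),
    "URLs": (r'https?://\S+|www\.\S+', '', "Read more at https://www.bbc.co.uk/news today"),
    "HTML tags": (r'<.*?>+', '', "<p>Breaking</p> news"),
    "special characters": (r'[^A-Za-z0-9\s]', '', "world's biggest, again!"),
}

results = {
    name: re.sub(pattern, replacement, text)
    for name, (pattern, replacement, text) in samples.items()
}

for name, cleaned in results.items():
    print(f"{name:>18}: {cleaned!r}")
```

Two effects are worth noting. Removing a bracketed span or a URL leaves the surrounding spaces behind, so whitespace collapsing is most effective when it also runs after the other substitutions. Stripping special characters also drops apostrophes, which is why "world's" appears as "worlds" in the cleaned descriptions.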


### 3. Display the Processed Results

In this step, we display the first few rows of the processed dataset: the title, cleaned description, sentiment polarity, subjectivity, and sentiment category.


In [10]:
#Display the processed results

df[['title', 'cleaned_description', 'polarity', 'subjectivity', 'sentiment_category']].head()



| | title | cleaned_description | polarity | subjectivity | sentiment_category |
|---|---|---|---|---|---|
| 0 | Ukraine: Angry Zelensky vows to punish Russian... | the ukrainian president says the country will ... | 0.0 | 0.0 | Neutral |
| 1 | War in Ukraine: Taking cover in a town under a... | jeremy bowen was on the frontline in irpin as ... | 0.0 | 0.0 | Neutral |
| 2 | Ukraine war 'catastrophic for global food' | one of the worlds biggest fertiliser firms say... | 0.0 | 0.0 | Neutral |
| 3 | Manchester Arena bombing: Saffie Roussos's par... | the parents of the manchester arena bombings y... | -0.075 | 0.05 | Negative |
| 4 | Ukraine conflict: Oil price soars to highest l... | consumers are feeling the impact of higher ene... | 0.25 | 0.5 | Positive |
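Since the category column is intended for visualization, a natural first summary is a count per category. The sketch below does this for the five category labels displayed above, held in a hypothetical Series named `categories`. Five rows say nothing about the full dataset, so the numbers only illustrate the calls.

```python
import pandas as pd

# Category labels copied from the five rows displayed above
categories = pd.Series(
    ["Neutral", "Neutral", "Neutral", "Negative", "Positive"],
    name="sentiment_category",
)

counts = categories.value_counts()                # absolute counts per category
shares = categories.value_counts(normalize=True)  # fraction of rows per category

print(counts)
print(shares.round(2))
```

On the full frame, the equivalent would be `df['sentiment_category'].value_counts()`, whose result can be passed to a bar chart.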
