# Sentiment Analysis Project

This notebook covers exploratory data analysis, visualization, text preprocessing, and initial model training for the sentiment analysis project. Model evaluation is left as a follow-up step.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')

In [2]:
# Load the dataset (the cells below rely on its `text` and `label` columns)
data = pd.read_csv('../data/dataset.csv')

# Display the first few rows of the dataset
data.head()

In [3]:
# Check for missing values
data.isnull().sum()
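
If the check above reports missing values, one common follow-up is to drop rows that lack a text or a label before plotting or training. The sketch below is an assumption about how that might look, not a step taken in this notebook. It uses a small made-up frame named `toy` with the same `text` and `label` columns so that it runs on its own.

```python
import pandas as pd

# Made-up stand-in for `data`, using the column names from this notebook.
toy = pd.DataFrame({
    "text": ["great product", None, "terrible service", "   ", "okay, I guess"],
    "label": ["positive", "negative", None, "neutral", "neutral"],
})

# Flag rows whose text is missing or contains only whitespace.
text_missing = toy["text"].fillna("").str.strip().eq("")

# Flag rows whose label is missing.
label_missing = toy["label"].isna()

# Keep only rows where both the text and its label are usable.
clean = toy[~(text_missing | label_missing)].reset_index(drop=True)
print(clean)
```

Whether dropping is appropriate depends on how many rows are affected; that decision is left open here.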

In [4]:
# Visualize the distribution of sentiment labels
plt.figure(figsize=(8, 6))
sns.countplot(x='label', data=data)
plt.title('Distribution of Sentiment Labels')
plt.xlabel('Sentiment Label')
plt.ylabel('Count')
plt.show()

In [5]:
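# Data Preprocessing
# Assumption: this notebook runs from a subdirectory of the project root
# (the dataset path above starts with '..'), so the root is added to
# sys.path to make the `src` package importable.
import sys
sys.path.append('..')

from src.data_preprocessing import preprocess_data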

# Preprocess the text data
processed_data = preprocess_data(data['text'])

# Display the first few processed texts
processed_data.head()
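
`preprocess_data` lives in `src/data_preprocessing.py`, which this notebook does not show, so its exact steps are unknown here. As a rough illustration of what a text-cleaning step of this kind often does, the following sketch lowercases the text, removes URLs and non-letter characters, and collapses whitespace. The function name `clean_text` and the specific rules are assumptions, not the project's implementation.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Hypothetical cleaner: lowercase, drop URLs and non-letters, squeeze spaces."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)         # remove links before stripping punctuation
    text = NON_LETTER_PATTERN.sub(" ", text)  # digits and punctuation become spaces
    return WHITESPACE_PATTERN.sub(" ", text).strip()


# Made-up inputs for illustration only.
samples = ["I LOVED it!!! See https://example.com", "  Not   worth $20... "]
cleaned = [clean_text(s) for s in samples]
print(cleaned)  # ['i loved it see', 'not worth']
```

On a pandas Series without missing values, the same function could be applied with `data['text'].apply(clean_text)`.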

In [6]:
# Model Training
from src.model_training import train_model

# Train the sentiment analysis model
model = train_model(processed_data, data['label'])
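
`train_model` is defined in `src/model_training.py`, so its internals are not visible here. One plausible shape for such a function, offered purely as an assumption, is a scikit-learn pipeline that converts text into TF-IDF features and fits a logistic regression classifier. The name `train_model_sketch` and the toy inputs are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def train_model_sketch(texts, labels):
    """Hypothetical stand-in for train_model: TF-IDF features + logistic regression."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigrams and bigrams
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    pipeline.fit(texts, labels)
    return pipeline


# Made-up training examples, not taken from the project's dataset.
toy_texts = ["great movie", "loved it", "awful plot", "hated it"]
toy_labels = ["positive", "positive", "negative", "negative"]

toy_model = train_model_sketch(toy_texts, toy_labels)
print(toy_model.predict(["loved the movie"]))
```

Bundling the vectorizer and the classifier in one pipeline object means that a single `joblib.dump` call persists both together, so the same text transformation is applied at prediction time.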

In [7]:
# Save the trained model
import joblib
joblib.dump(model, 'sentiment_model.pkl')

## Conclusion

In this notebook, we explored the dataset, visualized the distribution of sentiment labels, preprocessed the text data, trained a sentiment analysis model, and saved it to `sentiment_model.pkl`. The next steps are to evaluate the model on held-out data and to use it for predictions on new text.
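
The evaluation step mentioned above could start with a held-out split. The following sketch assumes a scikit-learn style model and uses made-up data; the pipeline it builds is an assumption and may differ from what `train_model` produces. In this notebook the same pattern would require splitting before `train_model` is called, so that the test rows are never seen during training.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Made-up examples standing in for the processed text and labels.
texts = [
    "great movie", "loved it", "wonderful acting", "really enjoyable",
    "awful plot", "hated it", "terrible acting", "really boring",
]
labels = ["positive"] * 4 + ["negative"] * 4

# Stratify so both classes appear in the training and test portions.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Assumed model type; the project's train_model may build something different.
eval_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
eval_model.fit(X_train, y_train)

# Score the model only on rows it did not see during fitting.
y_pred = eval_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred, zero_division=0))
```

With only a handful of toy rows the reported numbers mean little; the point is the order of operations: split, fit on the training portion, then score on the held-out portion.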