<a href="https://colab.research.google.com/github/jessicasmelton/YTCommentAnalysis/blob/main/Step%204%3A%20Data%20Visualizations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Data Visualizations**

**Sentiment Distribution Bar Chart:**

This program generates a bar chart displaying the distribution of sentiment categories (positive, negative, and neutral) in YouTube comments. The sentiment analysis results are read from a CSV file, and the counts of each sentiment category are displayed both in the console and in a bar chart.


---

**Output**

* Console output listing the number of positive, negative, and neutral comments.

* A bar chart showing the number of comments in each sentiment category (positive, negative, neutral).


---
**Notes**

* The program assumes the CSV file contains a column named Sentiment. If the column is missing, the program stops with a KeyError.

* You can customize the appearance of the bar chart by changing figsize in plt.figure, color in sentiment_counts.plot, and the text passed to plt.title, plt.xlabel, and plt.ylabel.
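
As a rough illustration of where each of those settings goes, the sketch below wraps the chart in a small function. The helper name plot_sentiment_counts, its default values, and the sample counts are assumptions made for this example; they are not part of the original program.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_sentiment_counts(counts, figsize=(10, 6), colors=("green", "gray", "red"),
                          title="Sentiment Distribution of All Comments"):
    """Hypothetical helper: draw a bar chart of sentiment counts and return its axes."""
    fig, ax = plt.subplots(figsize=figsize)  # figsize controls the chart size
    counts.plot(kind="bar", color=list(colors), ax=ax)  # one color per bar, in order
    ax.set_title(title)
    ax.set_xlabel("Sentiment")
    ax.set_ylabel("Number of Comments")
    ax.tick_params(axis="x", labelrotation=0)  # keep category labels horizontal
    return ax


# Made-up counts, listed in the same order as the colors
sample_counts = pd.Series({"Positive": 12, "Neutral": 5, "Negative": 3})
ax = plot_sentiment_counts(sample_counts, figsize=(8, 4),
                           title="Sample Sentiment Distribution")
```

Calling plt.show() afterwards displays the chart.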


In [None]:
# Sentiment Distribution Bar Chart

# Import necessary libraries
import pandas as pd  # Library for data manipulation and analysis
import matplotlib.pyplot as plt  # Library for plotting data

# Load the data from the specified CSV file
file_path = 'INSERT YOUR FILE PATH HERE.csv'  # Replace with the path to your sentiment CSV file
df = pd.read_csv(file_path)

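# Count the comments in each sentiment category, in a fixed order so that
# the bar colors below (green, gray, red) always match their labels
sentiment_counts = (
    df['Sentiment']
    .value_counts()
    .reindex(['Positive', 'Neutral', 'Negative'], fill_value=0)
)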

# Display the counts of each sentiment category
print("Number of Positive Comments:", sentiment_counts.get('Positive', 0))
print("Number of Negative Comments:", sentiment_counts.get('Negative', 0))
print("Number of Neutral Comments:", sentiment_counts.get('Neutral', 0))

# Create a bar chart for sentiment distribution
plt.figure(figsize=(10, 6))
sentiment_counts.plot(kind='bar', color=['green', 'gray', 'red'])  # Create bar chart with specified colors
plt.title('Sentiment Distribution of All Comments')  # Set the title of the chart
plt.xlabel('Sentiment')  # Set the x-axis label
plt.ylabel('Number of Comments')  # Set the y-axis label
plt.xticks(rotation=0)  # Set the rotation of x-axis labels
plt.show()  # Display the chart

**Sentiment Polarity Over Time Line Chart:**

This program generates a line chart showing the average sentiment polarity of YouTube comments over time. The sentiment analysis results are read from a CSV file, and the average sentiment polarity is calculated for each month. The resulting data is visualized in a line chart to observe trends over time.
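
To make the monthly averaging concrete, here is a minimal sketch on made-up data (the three sample rows are assumptions for illustration only). The two January comments average to about 0.2, and the single February comment keeps its own polarity of 0.3.

```python
import pandas as pd

# Made-up comments: two in January 2024 and one in February 2024
sample = pd.DataFrame({
    "Date Published": ["2024-01-05", "2024-01-20", "2024-02-10"],
    "Polarity": [0.5, -0.1, 0.3],
})

# Convert the text dates to datetimes, then reduce each one to its month
sample["Date Published"] = pd.to_datetime(sample["Date Published"])
sample["Month"] = sample["Date Published"].dt.to_period("M")

# One average polarity value per month
monthly_sentiment = sample.groupby("Month")["Polarity"].mean()
print(monthly_sentiment)
```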


---

**Output**

* A line chart of the average sentiment polarity of comments over time, with one data point per month.

* No console output; the chart is the only result.


---

**Notes**

* The program assumes the CSV file contains columns named Date Published and Polarity. If either column is missing, the program stops with a KeyError.

* You can customize the appearance of the line chart by changing figsize in plt.figure, marker in monthly_sentiment.plot, the text passed to plt.title, plt.xlabel, and plt.ylabel, and the grid setting in plt.grid.
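
If you would rather fail early with a readable message when a column is absent, one option is a small check run right after loading the file. The helper require_columns below is hypothetical (it is not part of the original program), and the sample frame is made up to trigger the error.

```python
import pandas as pd


def require_columns(df, required):
    """Hypothetical helper: raise a clear error if any expected column is absent."""
    missing = [name for name in required if name not in df.columns]
    if missing:
        raise KeyError(f"CSV file is missing required column(s): {missing}")


# Made-up frame that lacks the Polarity column
sample = pd.DataFrame({"Date Published": ["2024-01-05"]})

try:
    require_columns(sample, ["Date Published", "Polarity"])
except KeyError as error:
    print(error)
```

The same check could be reused in the other programs by passing their own column names.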

In [None]:
# Sentiment Polarity Over Time Line Chart

# Import necessary libraries
import pandas as pd  # Library for data manipulation and analysis
import matplotlib.pyplot as plt  # Library for plotting data

# Load the data from the specified CSV file
file_path = 'INSERT YOUR FILE PATH HERE.csv'  # Replace with the path to your sentiment CSV file
df = pd.read_csv(file_path)

# Ensure the 'Date Published' column is in datetime format
df['Date Published'] = pd.to_datetime(df['Date Published'])

# Extract the month and year for each comment
df['Month'] = df['Date Published'].dt.to_period('M')

# Calculate the average sentiment polarity for each month
monthly_sentiment = df.groupby('Month')['Polarity'].mean()

# Create the line chart to visualize average sentiment polarity over time
plt.figure(figsize=(12, 6))  # Set the size of the figure
monthly_sentiment.plot(kind='line', marker='o')  # Create a line plot with markers for each data point
plt.title('Average Sentiment Polarity Over Time')  # Set the title of the chart
plt.xlabel('Month')  # Set the x-axis label
plt.ylabel('Average Sentiment Polarity')  # Set the y-axis label
plt.grid(True)  # Enable grid for better readability
plt.show()  # Display the chart

**Word Cloud of Frequent Words:**

This program generates a word cloud of the most frequent words in YouTube comments. A word cloud visually represents the frequency of words in a text, with more frequent words displayed in larger font sizes. This visualization helps identify common themes and keywords in the comments.
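
The idea behind the sizing can be approximated with a plain word count. The sketch below uses made-up comments and a deliberately simple tokenizer; the wordcloud library applies its own tokenization, stopword filtering, and scaling, so treat this only as an illustration of the underlying frequencies.

```python
from collections import Counter

# Made-up comments standing in for the Comment Text column
comments = ["Great video", "great tutorial, thanks", "Thanks for the video"]

# Lowercase each comment, split on whitespace, and strip basic punctuation
words = []
for comment in comments:
    for word in comment.lower().split():
        words.append(word.strip(".,!?"))

# Words with higher counts would be drawn larger in a word cloud
word_counts = Counter(words)
print(word_counts.most_common(3))
```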


---


**Output**

* A word cloud image in which words that appear more often in the comments are drawn in larger font sizes.

---

**Notes**

* The program assumes the CSV file contains a column named Comment Text. If the column is missing, the program stops with a KeyError.

* You can customize the appearance of the word cloud by changing parameters such as width, height, and background_color in the WordCloud constructor.

In [None]:
# Word Cloud of Frequent Words

# Import necessary libraries
import pandas as pd  # Library for data manipulation and analysis
from wordcloud import WordCloud  # Library for generating word clouds
import matplotlib.pyplot as plt  # Library for plotting data

# Load the data from the specified CSV file
file_path = 'INSERT YOUR FILE PATH HERE.csv'  # Replace with the path to your comments CSV file
df = pd.read_csv(file_path)

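# Concatenate all comments into a single string, skipping empty cells
# (pandas reads them as NaN, which would make join fail with a TypeError)
all_comments_text = ' '.join(df['Comment Text'].dropna().astype(str))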

# Generate the word cloud from the concatenated comments
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(all_comments_text)

# Display the generated word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')  # Display the word cloud with bilinear interpolation
plt.axis('off')  # Hide the axes
plt.title('Most Frequent Words in All Comments')  # Add a title to the plot
plt.show()  # Show the plot