# IMDB Word Cloud

## Step 1: Import Libraries

In [None]:
from pathlib import Path
import pandas as pd

## Step 2: Understanding the Dataset
Before cleaning the text, let's look at the dataset. It contains two columns:
- review: The text of the movie review
- sentiment: Whether the review is positive or negative

In [None]:
working_dir = Path(r"D:\CODE\DATA")
datafile = working_dir / "IMDB-Dataset.csv"

df = pd.read_csv(datafile)

In [None]:
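# A notebook cell only displays its last expression, so print both explicitly
print(df.columns)       # column names
print(df['review'][0])  # text of the first review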
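
A quick structural check can also help before cleaning. The sketch below uses a tiny hypothetical `demo_df` as a stand-in for the real data, so its values are illustrative only. It assumes the two column names described above and that the labels are the strings "positive" and "negative".

```python
import pandas as pd

# Hypothetical stand-in for the IMDB data, using the same two column names
demo_df = pd.DataFrame({
    "review": ["Loved it", "Too long", "A classic"],
    "sentiment": ["positive", "negative", "positive"],
})

print(demo_df.shape)                              # (rows, columns)
label_counts = demo_df["sentiment"].value_counts()
print(label_counts)                               # reviews per sentiment label
print(demo_df["review"].str.len().describe())     # review length statistics
```

In the notebook, the same calls can be run on `df` instead of `demo_df`.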

## Step 3: Cleaning the Text Data
Before generating the word cloud, we need to clean the text data, which involves:
1. Removing punctuation
2. Converting text to lowercase
3. Removing stopwords, i.e., common words like "the", "is", "and"

- **re.sub():** Removes every character that is not a letter or whitespace, which drops punctuation and numbers
- **STOPWORDS:** A set of common English stopwords that ships with the `wordcloud` package

In [None]:
import re
from wordcloud import STOPWORDS

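# Combine all reviews into one long string
text = ' '.join(df['review'].astype(str).tolist())
# Keep only letters and whitespace (drops punctuation and numbers)
text = re.sub(r'[^A-Za-z\s]', '', text)
text = text.lower()

# Remove common English stopwords
stopwords = set(STOPWORDS)
text = ' '.join(word for word in text.split() if word not in stopwords)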
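
The cleaning steps above can also be wrapped in a small helper. The sketch below is one possible version, not the notebook's own code: `clean_text` and `DEMO_STOPWORDS` are hypothetical names, and the tiny stopword set is only a stand-in for `STOPWORDS` so the example runs without extra installs. It also adds one step that the cell above does not have: removing HTML tags first, on the assumption that the reviews contain markup such as `<br />` (common in scraped review text), which would otherwise survive as the stray word "br".

```python
import re

# Small stand-in for wordcloud.STOPWORDS (assumption, for illustration only)
DEMO_STOPWORDS = {"the", "is", "and", "a", "it", "was", "this"}

def clean_text(raw, stopwords=DEMO_STOPWORDS):
    """Strip HTML tags, keep letters only, lowercase, and drop stopwords."""
    no_html = re.sub(r"<[^>]+>", " ", raw)         # e.g. "<br />" line breaks
    letters = re.sub(r"[^A-Za-z\s]", "", no_html)  # drop punctuation and digits
    words = letters.lower().split()
    return " ".join(w for w in words if w not in stopwords)

sample = "This movie was GREAT!<br /><br />The plot is a 10/10, and it rocks."
cleaned = clean_text(sample)
print(cleaned)
```

In the notebook, passing `stopwords=set(STOPWORDS)` would apply the full stopword list.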

## Step 4: Generating the Word Cloud
- **WordCloud():** Builds the word cloud from the cleaned text

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt

In [None]:
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)

In [None]:
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')  # hide the axes around the image
plt.title("IMDB Movie Reviews Word Cloud")
plt.show()
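
Word size in the cloud broadly tracks how often a word occurs, so a plain frequency count is a useful sanity check on the picture. The sketch below uses `collections.Counter` on a hypothetical `sample_text`. The counts may not match the cloud exactly, because `WordCloud` also groups frequent two-word phrases by default.

```python
from collections import Counter

# Hypothetical cleaned text; in the notebook this would be the `text` variable
sample_text = "movie great movie plot great movie"

word_counts = Counter(sample_text.split())
top_words = word_counts.most_common(2)  # the two most frequent words
print(top_words)
```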

## Step 5: Customizing the Word Cloud
We can customize the word cloud with options such as:
1. Maximum number of words
2. Color scheme
3. Shape of the cloud

**max_words**: Limits how many words are drawn  
**colormap**: Sets the Matplotlib colormap used to color the words  
**mask**: Sets the shape of the cloud (not used in the cells below)

In [None]:
wordcloud = WordCloud(width=800, height=400, 
                      background_color='white', 
                      max_words=100, 
                      colormap='coolwarm').generate(text)

In [None]:
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Customized IMDB Movie Reviews Word Cloud")
plt.show()
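
The third option, the shape of the cloud, is controlled by the `mask` parameter of `WordCloud`. A mask is an array in which white (255) pixels are blocked out and every other pixel is free to draw on. The sketch below builds a circular mask with NumPy; `make_circle_mask` is a hypothetical helper name, and the size and radius are arbitrary choices.

```python
import numpy as np

def make_circle_mask(size=400):
    """Return a size x size uint8 array: 0 inside the circle, 255 outside."""
    y, x = np.ogrid[:size, :size]
    center = (size - 1) / 2
    radius = size * 0.45
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return (255 * outside).astype(np.uint8)

circle_mask = make_circle_mask(400)
print(circle_mask.shape)
```

The array can then be passed as `WordCloud(mask=circle_mask, background_color='white').generate(text)`. When a mask is given, `width` and `height` are ignored and the mask's shape is used instead.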


## Real-Life Applications of Word Clouds
### Sentiment Analysis
Imagine we have hundreds of customer reviews. By creating two word clouds, one for positive words like "great" and "friendly" and another for negative words like "late" and "broken", we can quickly see what customers like or dislike.
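
With a labeled dataset like the one used here, one way to get the two clouds is to split the reviews by the `sentiment` column and build one text per label. The sketch below uses a hypothetical `demo_df` and assumes the labels are the strings "positive" and "negative".

```python
import pandas as pd

# Hypothetical stand-in for the review data
demo_df = pd.DataFrame({
    "review": ["great acting", "boring plot", "friendly and great", "broken pacing"],
    "sentiment": ["positive", "negative", "positive", "negative"],
})

# One combined string of reviews per sentiment label
text_by_sentiment = {
    label: " ".join(group["review"].astype(str))
    for label, group in demo_df.groupby("sentiment")
}
print(text_by_sentiment["positive"])
print(text_by_sentiment["negative"])
```

Each string can then be cleaned and passed to `WordCloud().generate(...)` to produce its own cloud.
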
### Social Media Analysis
Word clouds built from collected hashtags and keywords visually highlight what is being talked about the most on social media.
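
As a rough illustration, hashtags can be pulled out with a regular expression and counted. The `posts` list below is made-up sample data, and lowercasing the tags so that `#MovieNight` and `#movienight` count together is a design choice, not a requirement.

```python
import re
from collections import Counter

# Made-up sample posts for illustration
posts = [
    "Loved the finale! #MovieNight #Oscars",
    "Red carpet looks #Oscars",
    "Popcorn ready #movienight",
]

# Extract every hashtag and normalize case before counting
hashtags = [tag.lower() for post in posts for tag in re.findall(r"#\w+", post)]
hashtag_counts = Counter(hashtags)
print(hashtag_counts)
```

The resulting counts can be drawn directly with `WordCloud().generate_from_frequencies(hashtag_counts)`.
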
### Real-Time Data
In live customer chats or support systems, a word cloud can quickly surface common issues like "delivery delay" or "payment error", which helps teams respond faster.