### Using NLP for Text Data Quality
**Objective**: Enhance text data quality using NLP techniques.

**Task**: Removing Stopwords

**Steps**:
1. Dataset: Use a dataset of product descriptions.
2. Stopword Removal: Use an NLP library (e.g., NLTK) to remove stopwords from the descriptions.
3. Assess Impact: Compare word frequencies before and after removal to gauge the effect.

In [None]:
import pandas as pd
import nltk
from nltk.corpus import stopwords
from collections import Counter

# Step 0: Download stopwords if not already downloaded
nltk.download('stopwords')

# Step 1: Sample dataset of product descriptions
data = {
    'description': [
        "This is an excellent product with great value and features.",
        "The product is not only affordable but also very reliable.",
        "Experience the best quality with this new item on the market.",
        "A must-have item for your daily use, easy to handle and durable.",
        "Affordable price with premium quality and excellent customer support."
    ]
}
df = pd.DataFrame(data)

# Step 2: Remove stopwords
stop_words = set(stopwords.words('english'))

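def remove_stopwords(text):
    # Lowercase and strip edge punctuation so a stopword followed by a comma
    # or period is still matched, and kept words carry no trailing punctuation
    tokens = [word.strip('.,;:!?"()') for word in text.lower().split()]
    filtered = [word for word in tokens if word and word not in stop_words]
    return ' '.join(filtered)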

df['cleaned_description'] = df['description'].apply(remove_stopwords)

# Step 3: Analyze word frequency before and after removal
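def get_word_freq(text_series):
    # Normalize case and edge punctuation so "This"/"this" count as one word
    # and "features." is counted as "features"
    all_words = [word.strip('.,;:!?"()') for word in ' '.join(text_series).lower().split()]
    return Counter(word for word in all_words if word)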

freq_before = get_word_freq(df['description'])
freq_after = get_word_freq(df['cleaned_description'])

print("Word Frequencies BEFORE stopword removal:\n", freq_before.most_common(10))
print("\nWord Frequencies AFTER stopword removal:\n", freq_after.most_common(10))
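
The top-10 lists give a qualitative picture; a few summary numbers can make the impact easier to compare. The following is a minimal sketch and not part of the original exercise: `tokenize` and `summarize_impact` are hypothetical helpers, and `demo_stop_words` is a small illustrative stand-in for NLTK's English list so that the snippet runs without a download. In the notebook, the `stop_words` set built from `stopwords.words('english')` could be passed instead.

```python
from collections import Counter

# Punctuation stripped from token edges (an assumption; extend as needed)
PUNCTUATION = '.,;:!?"()'

def tokenize(text):
    """Lowercase, split on whitespace, and strip punctuation from token edges."""
    tokens = (word.strip(PUNCTUATION) for word in text.lower().split())
    return [word for word in tokens if word]

def summarize_impact(texts, stop_words):
    """Compare token counts and vocabulary size before and after stopword removal."""
    before = [word for text in texts for word in tokenize(text)]
    after = [word for word in before if word not in stop_words]
    return {
        "tokens_before": len(before),
        "tokens_after": len(after),
        "vocab_before": len(set(before)),
        "vocab_after": len(set(after)),
        "share_removed": 1 - len(after) / len(before) if before else 0.0,
        "top_after": Counter(after).most_common(3),
    }

# Illustrative stand-in for the NLTK stopword list (not the real list)
demo_stop_words = {"this", "is", "an", "with", "and", "the",
                   "not", "only", "but", "also", "very"}
demo_texts = [
    "This is an excellent product with great value and features.",
    "The product is not only affordable but also very reliable.",
]

summary = summarize_impact(demo_texts, demo_stop_words)
print(summary)
```

A large `share_removed` alongside a modest drop in vocabulary size suggests that removal mostly discarded high-frequency function words rather than content terms, which is the intended effect.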
