# 📝 Pandas Notebook 5: Text Magic
"What customers are saying about your lemonade!"

## 🎯 Today's Adventure  
1. **Text Cleaning**: Fixing messy reviews  
2. **Keyword Search**: Finding "sweet" or "sour" comments  
3. **Apply()**: Custom rating adjustments  
4. **Real Use**: Analyzing 1000+ reviews  

## 🧼 Cleaning Reviews  
Like fixing spelling mistakes in your guest book:  

In [1]:
import pandas as pd

reviews = pd.DataFrame({
    "Comment": [
        "Best lemonade EVER!!!",
        "too   sweet  ",
        "Lemonade was sour",
        "pricey but good"
    ],
    "Rating": [5, 3, 2, 4]
})

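# Lowercase everything and collapse repeated whitespace into a single space
# (leading/trailing spaces are not removed, so row 1 keeps one trailing space)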
reviews["Clean_Text"] = (
    reviews["Comment"]
    .str.lower()           # "BEST" → "best"
    .str.replace(r'\s+', ' ', regex=True)  # Fix extra spaces
)
print(reviews[["Comment", "Clean_Text"]])

                 Comment             Clean_Text
0  Best lemonade EVER!!!  best lemonade ever!!!
1          too   sweet               too sweet 
2      Lemonade was sour      lemonade was sour
3        pricey but good        pricey but good
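
The cleaned "too sweet " above still ends with a space, because collapsing whitespace does not trim the ends. A minimal sketch of one way to handle that, assuming you also want leading and trailing spaces gone, is to chain `.str.strip()`. The `comments` and `clean` names here are just for illustration:

```python
import pandas as pd

# Same four comments as the reviews table above
comments = pd.Series([
    "Best lemonade EVER!!!",
    "too   sweet  ",
    "Lemonade was sour",
    "pricey but good",
])

clean = (
    comments
    .str.lower()                           # "BEST" -> "best"
    .str.replace(r"\s+", " ", regex=True)  # collapse runs of whitespace
    .str.strip()                           # trim leading/trailing spaces
)
print(clean.tolist())
# ['best lemonade ever!!!', 'too sweet', 'lemonade was sour', 'pricey but good']
```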


## 🔍 Finding Flavor Mentions  
"Which reviews talk about sweetness?"  

In [2]:
sweet_reviews = reviews[
    reviews["Clean_Text"].str.contains("sweet|yummy", regex=True)
]
print("\nSweet mentions:\n", sweet_reviews)


Sweet mentions:
          Comment  Rating  Clean_Text
1  too   sweet         3  too sweet 
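
Real review data is often messier than this table: mixed case, missing comments, and words like "sweetener" that merely contain "sweet". The sketch below uses made-up reviews (the `raw` Series is hypothetical) to show how the `case` and `na` arguments of `str.contains`, plus `\b` word boundaries in the pattern, might deal with those cases:

```python
import pandas as pd

# Hypothetical messy reviews: mixed case, a near-miss word, a missing value
raw = pd.Series(["Too SWEET", "sweetener aftertaste", None, "Yummy!"])

# case=False ignores capitalization; na=False counts missing reviews as "no match"
loose = raw.str.contains("sweet|yummy", case=False, na=False)

# \b word boundaries stop "sweet" from matching inside "sweetener"
strict = raw.str.contains(r"\b(?:sweet|yummy)\b", case=False, na=False)

print(loose.tolist())   # [True, True, False, True]
print(strict.tolist())  # [True, False, False, True]
```

Whether "sweetener" should count as a sweetness mention is a judgment call; the loose pattern casts a wider net, the strict one matches whole words only.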


## ✏️ Rewarding Nice Reviews  
Give +1 star to reviews that mention "best" or "good" (notice that a 5-star review ends up at 6):  

In [3]:
def bonus_star(text):
    if "best" in text or "good" in text:
        return 1
    return 0

reviews["Bonus"] = reviews["Clean_Text"].apply(bonus_star)
reviews["Final_Rating"] = reviews["Rating"] + reviews["Bonus"]
print("\nWith bonus stars:\n", reviews)


With bonus stars:
                  Comment  Rating             Clean_Text  Bonus  Final_Rating
0  Best lemonade EVER!!!       5  best lemonade ever!!!      1             6
1          too   sweet         3             too sweet       0             3
2      Lemonade was sour       2      lemonade was sour      0             2
3        pricey but good       4        pricey but good      1             5
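
Row 0 lands on 6 stars. If ratings are meant to stay on a 1-5 scale (an assumption, since the notebook does not say), one possible fix is to cap the sum with `clip`. The sketch also swaps `apply()` for a vectorized `str.contains`, which gives the same 1/0 bonus here:

```python
import pandas as pd

reviews = pd.DataFrame({
    "Clean_Text": [
        "best lemonade ever!!!",
        "too sweet ",
        "lemonade was sour",
        "pricey but good",
    ],
    "Rating": [5, 3, 2, 4],
})

# Vectorized version of bonus_star: True/False becomes 1/0
reviews["Bonus"] = reviews["Clean_Text"].str.contains("best|good").astype(int)

# clip(upper=5) caps the total, assuming a 5-star maximum
reviews["Final_Rating"] = (reviews["Rating"] + reviews["Bonus"]).clip(upper=5)

print(reviews[["Rating", "Bonus", "Final_Rating"]])
```

`apply()` is still the better fit when the rule is too custom for a single pattern, for example when it depends on several conditions written in plain Python.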


## 📊 Analyzing Massive Feedback  

In [4]:
# Simulate big data
big_reviews = pd.DataFrame({
    "Review": ["great"]*500 + ["ok"]*300 + ["bad"]*200,
    "Stars": [5]*500 + [3]*300 + [1]*200
})

# Count how often each exact review text appears, then look up our keywords
keywords = ["great", "ok", "bad"]
counts = big_reviews["Review"].value_counts()
print("\nReview Word Counts:\n", counts.loc[keywords])


Review Word Counts:
 Review
great    500
ok       300
bad      200
Name: count, dtype: int64
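
`value_counts()` works here because every simulated review is a single word. For multi-word reviews it would count whole sentences instead. A rough sketch of counting individual words, using a small made-up `multi` Series, is to split each review and explode the lists into one row per word:

```python
import pandas as pd

# Hypothetical multi-word reviews
multi = pd.Series(["great taste great price", "ok taste", "bad price"])

word_counts = (
    multi
    .str.split()      # each review becomes a list of words
    .explode()        # one row per word
    .value_counts()   # tally each word across all reviews
)
print(word_counts)
```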


## ✏️ Review Manager Practice  
1. Clean: Make all text lowercase and remove "!!!"  
2. Find reviews containing "lemon" or "price"  
3. Create a "Length" column counting characters per review (spaces and punctuation included)  

*(Solutions in the next cell!)*  

In [5]:
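# 1. Lowercase the comments and remove every run of "!"
reviews["Cleaner"] = (
    reviews["Comment"].str.lower().str.replace(r'!+', '', regex=True)
)

# 2. Keep reviews whose cleaned text contains "lemon" or "price"
#    ("price" also matches inside "pricey")
print(reviews[reviews["Clean_Text"].str.contains("lemon|price")])

# 3. Count characters per comment (spaces and punctuation included)
reviews["Length"] = reviews["Comment"].str.len()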

                 Comment  Rating             Clean_Text  Bonus  Final_Rating  \
0  Best lemonade EVER!!!       5  best lemonade ever!!!      1             6   
2      Lemonade was sour       2      lemonade was sour      0             2   
3        pricey but good       4        pricey but good      1             5   

              Cleaner  
0  best lemonade ever  
2   lemonade was sour  
3     pricey but good  
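
The solution cell never prints the new "Length" column, and `str.len()` counts every character rather than letters only. The sketch below shows both numbers side by side. The "Letters" column is an extra added for illustration, and it assumes "letters" means A-Z and a-z:

```python
import pandas as pd

reviews = pd.DataFrame({
    "Comment": [
        "Best lemonade EVER!!!",
        "too   sweet  ",
        "Lemonade was sour",
        "pricey but good",
    ]
})

# Every character counts: letters, spaces, and punctuation
reviews["Length"] = reviews["Comment"].str.len()

# Letters only: count the matches of a letter pattern in each comment
reviews["Letters"] = reviews["Comment"].str.count(r"[A-Za-z]")

print(reviews)
```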
