### Using NLP for Text Data Quality
**Objective**: Enhance text data quality using NLP techniques.

**Task**: Handling Noisy Text Data

**Steps**:
1. Dataset: Obtain a dataset of customer reviews that contain noise (e.g., random characters, repeated punctuation, emoji).
2. Clean Data: Use regex patterns to strip the noise from the text.
3. Evaluate: Compare each review before and after cleaning to check that the noise was removed.
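
One way to make the comparison in step 3 measurable, rather than purely visual, is to compute the share of characters in a review that count as noise. The sketch below is only an illustration: the `noise_ratio` helper and the definition of noise (any character that is not an ASCII letter or whitespace) are assumptions, not part of the task statement.

```python
import re

# Assumption: "noise" means any character that is not an ASCII letter or whitespace.
NOISE_PATTERN = re.compile(r'[^a-zA-Z\s]')

def noise_ratio(text):
    """Return the fraction of characters in `text` that count as noise."""
    if not text:
        return 0.0  # Avoid dividing by zero on empty strings
    return len(NOISE_PATTERN.findall(text)) / len(text)

# A raw review and a cleaned version of it
before = "Terrible serv!ce@#%^. Never @coming again..."
after = "terrible servce never coming again"

print(f"Noise before cleaning: {noise_ratio(before):.2f}")
print(f"Noise after cleaning : {noise_ratio(after):.2f}")
```

A ratio of zero after cleaning shows only that no flagged characters remain. It does not show that the remaining words are spelled correctly.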

In [1]:
# write your code from here
import pandas as pd
import re

# Step 1: Create a sample dataset with noisy customer reviews
data = {
    'ReviewID': [1, 2, 3, 4, 5],
    'CustomerReview': [
        "Loooved it!!! 😍😍 will buy again...!!!$$$$",
        "Terrible serv!ce@#%^. Never @coming again...",
        "5 stars!!!*****     Awesome product \n\n\n",
        "w0rst experienc3 eveR...!!! :(",
        "Th1s pr0duct is S0000 G00D!!! <3 <3 <3"
    ]
}

df = pd.DataFrame(data)

# Step 2: Define cleaning function using regex
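def clean_text(text):
    """Lowercase the text and keep only letters separated by single spaces."""
    text = text.lower()                             # Convert to lowercase
    text = re.sub(r'[^a-z0-9\s]', '', text)         # Drop punctuation, symbols, and emoji
    text = re.sub(r'\d+', '', text)                 # Drop digits (also strips leetspeak: "w0rst" -> "wrst")
    text = re.sub(r'\s+', ' ', text).strip()        # Collapse whitespace runs (incl. newlines) and trim
    return text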

# Step 3: Apply cleaning function
df['CleanedReview'] = df['CustomerReview'].apply(clean_text)

# Step 4: Compare original and cleaned reviews
print("Original vs Cleaned Reviews:\n")
for original, cleaned in zip(df['CustomerReview'], df['CleanedReview']):
    print(f"Original: {original}")
    print(f"Cleaned : {cleaned}")
    print("-" * 60)

Original vs Cleaned Reviews:

Original: Loooved it!!! 😍😍 will buy again...!!!$$$$
Cleaned : loooved it will buy again
------------------------------------------------------------
Original: Terrible serv!ce@#%^. Never @coming again...
Cleaned : terrible servce never coming again
------------------------------------------------------------
Original: 5 stars!!!*****     Awesome product 



Cleaned : stars awesome product
------------------------------------------------------------
Original: w0rst experienc3 eveR...!!! :(
Cleaned : wrst experienc ever
------------------------------------------------------------
Original: Th1s pr0duct is S0000 G00D!!! <3 <3 <3
Cleaned : ths prduct is s gd
------------------------------------------------------------
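
The output above shows a side effect of deleting every digit: words written with look-alike digits lose letters ("w0rst" becomes "wrst", "pr0duct" becomes "prduct"). A possible refinement, sketched below, maps such digits back to letters before the digits are removed. The `LEET_MAP` table and the function names `normalize_leet` and `clean_text_keep_leet` are hypothetical, and the mapping is a guess (for instance, "1" could stand for "i" or "l"), so it would need tuning against real data.

```python
import re

# Hypothetical digit-to-letter table; extend or trim it to match the data.
LEET_MAP = str.maketrans({'0': 'o', '1': 'i', '3': 'e'})

def normalize_leet(text):
    """Replace look-alike digits with letters, but only inside tokens that contain a letter."""
    def fix(match):
        token = match.group(0)
        # Purely numeric tokens such as "5" are left untouched.
        if any(ch.isalpha() for ch in token):
            return token.translate(LEET_MAP)
        return token
    return re.sub(r'\w+', fix, text)

def clean_text_keep_leet(text):
    """Variant of the cleaning step that restores leetspeak before stripping noise."""
    text = normalize_leet(text.lower())         # Lowercase, then map 0/1/3 to o/i/e inside words
    text = re.sub(r'[^a-z\s]', '', text)        # Drop everything except letters and whitespace
    text = re.sub(r'\s+', ' ', text).strip()    # Collapse whitespace runs and trim the ends
    return text

print(clean_text_keep_leet("w0rst experienc3 eveR...!!! :("))          # worst experience ever
print(clean_text_keep_leet("Th1s pr0duct is S0000 G00D!!! <3 <3 <3"))  # this product is soooo good
```

Standalone numbers such as the "5" in "5 stars" are still removed, as in the original function. Whether a rating like that should be kept is a separate decision that depends on how the cleaned text will be used.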
