### Using NLP for Text Data Quality
**Objective**: Enhance text data quality using NLP techniques.

**Task**: Spelling Corrections

**Steps**:
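1. Dataset: Load a dataset of text reviews that contain spelling errors.
2. Apply Corrections: Use a spell-checker from an NLP library to correct spelling mistakes.
3. Verify Improvements: Compare the original and corrected text to confirm that quality actually improved; an automatic spell-checker can also introduce new errors, for example by changing a word that was already spelled correctly.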

In [1]:
# Install required library
!pip install -q textblob

# Import necessary modules
from textblob import TextBlob
import pandas as pd
import unittest

# Sample dataset with spelling errors
data = {
    'reviews': [
        "This prodct is amazng and worth evry penny!",
        "Batery life is terible, not recmmended.",
        "Exellent performnce and beautifull desgn.",
        "The screen is dull and the camra is bad."
    ]
}
df = pd.DataFrame(data)

# Correct the spelling of one string; return '' for non-string input or on any error
def correct_spelling(text):
    try:
        if not isinstance(text, str):
            raise TypeError("Input must be a string.")
        return str(TextBlob(text).correct())
    except TypeError as te:
        print(f"TypeError: {te} — Skipping value: {text}")
        return ''
    except Exception as e:
        print(f"Unexpected error while processing '{text}': {e}")
        return ''

# Apply correction to the dataset
df['corrected_reviews'] = df['reviews'].apply(correct_spelling)

# Display original and corrected reviews
for original, corrected in zip(df['reviews'], df['corrected_reviews']):
    print(f"Original : {original}")
    print(f"Corrected: {corrected}")
    print('-' * 60)

# Unit test for the spelling correction function
class TestSpellingCorrection(unittest.TestCase):
    
    def test_typical_sentence(self):
        text = "This is a grreat phne!"
        corrected = correct_spelling(text)
        self.assertIn("great", corrected)
        self.assertIn("phone", corrected)  # fails: TextBlob returns "pine" (see the test output below)
    
    def test_empty_string(self):
        self.assertEqual(correct_spelling(""), "")
    
    def test_non_string_input(self):
        self.assertEqual(correct_spelling(12345), "")
    
    def test_large_input(self):
        text = "amazng " * 1000  # Simulate long input
        corrected = correct_spelling(text)
        self.assertGreater(len(corrected), 0)

# Run unit tests
if __name__ == "__main__":
    unittest.main(argv=[''], exit=False)



.

Original : This prodct is amazng and worth evry penny!
Corrected: His product is amazing and worth very penny!
------------------------------------------------------------
Original : Batery life is terible, not recmmended.
Corrected: Watery life is terrible, not recommended.
------------------------------------------------------------
Original : Exellent performnce and beautifull desgn.
Corrected: Excellent performance and beautiful design.
------------------------------------------------------------
Original : The screen is dull and the camra is bad.
Corrected: The screen is dull and the camera is bad.
------------------------------------------------------------
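
The output above contains real fixes ("prodct" to "product") alongside miscorrections ("This" to "His", "Batery" to "Watery", "evry" to "very"). One way to make the verification step concrete is to list every word the corrector changed so a person can review each change. The sketch below uses only the standard library; the helper name `changed_words` and the simple letters-only tokenizer are assumptions for illustration, not part of TextBlob.

```python
import difflib
import re
from typing import List, Tuple

WORD_RE = re.compile(r"[A-Za-z']+")  # assumption: words are runs of letters/apostrophes


def changed_words(original: str, corrected: str) -> List[Tuple[str, str]]:
    """Return (before, after) pairs for each word that differs between the two texts."""
    before = WORD_RE.findall(original)
    after = WORD_RE.findall(corrected)
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of words on both sides: pair them one-to-one
            changes.extend(zip(before[i1:i2], after[j1:j2]))
        else:
            # Insertions, deletions, or uneven replacements: report the whole span
            changes.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return changes


# Pairs copied from the output above
pairs = [
    ("This prodct is amazng and worth evry penny!",
     "His product is amazing and worth very penny!"),
    ("Batery life is terible, not recmmended.",
     "Watery life is terrible, not recommended."),
]
for original, corrected in pairs:
    for old, new in changed_words(original, corrected):
        print(f"{old!r} -> {new!r}")
```

The listing does not decide which changes are wrong; it only narrows the review to the words that were touched, which is usually a small fraction of the text.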


..F
FAIL: test_typical_sentence (__main__.TestSpellingCorrection)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/ipykernel_60129/1975256686.py", line 49, in test_typical_sentence
    self.assertIn("phone", corrected)
AssertionError: 'phone' not found in 'His is a great pine!'

----------------------------------------------------------------------
Ran 4 tests in 0.279s

FAILED (failures=1)


TypeError: Input must be a string. — Skipping value: 12345
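
Several of the miscorrections recorded above hit words that were already correct ("This") or that belong to the product domain. A possible mitigation, sketched here under assumptions, is to skip words on a protected list and pass only the remaining words to the corrector. The function `correct_unprotected`, the `protected` set, and the `toy_corrector` stand-in are all hypothetical; the stand-in simply replays word-level changes seen in the output so the example runs without TextBlob.

```python
import re
from typing import Callable, Iterable

WORD_RE = re.compile(r"[A-Za-z']+")  # assumption: words are runs of letters/apostrophes


def correct_unprotected(text: str,
                        correct_word: Callable[[str], str],
                        protected: Iterable[str]) -> str:
    """Apply correct_word to every word except those in protected (case-insensitive)."""
    keep = {word.lower() for word in protected}

    def fix(match: "re.Match[str]") -> str:
        word = match.group(0)
        return word if word.lower() in keep else correct_word(word)

    # Punctuation and spacing are left untouched because only word matches are replaced
    return WORD_RE.sub(fix, text)


# Hypothetical stand-in for a real spell-checker, replaying changes from the output above
TOY_CORRECTIONS = {"This": "His", "prodct": "product", "amazng": "amazing", "evry": "very"}


def toy_corrector(word: str) -> str:
    return TOY_CORRECTIONS.get(word, word)


review = "This prodct is amazng and worth evry penny!"
print(correct_unprotected(review, toy_corrector, protected={"this"}))
# -> This product is amazing and worth very penny!
```

With TextBlob installed, a per-word corrector such as `lambda w: str(TextBlob(w).correct())` could be passed in place of `toy_corrector`. A protected list only prevents changes to known-good words; it does not fix a wrong guess such as "evry" to "very", so the corrected text still needs review.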
