# Text Normalization

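Text normalization is the process of transforming text into a consistent, standard form, so that superficial differences such as letter case, punctuation, or spacing do not make equivalent strings look different. This reduces noise in the input and often improves the accuracy of text processing tasks. Common normalization techniques include converting text to lowercase, removing punctuation, expanding contractions, and trimming whitespace.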

## Why Normalize Text?

Normalization helps by:
- Reducing variability in the text.
- Ensuring consistency across different text inputs.
- Improving the performance of downstream NLP tasks.
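
As a small illustration of the first point, the sketch below counts a made-up list of tokens with and without lowercasing. The token list is a hypothetical sample, not data from this course.

```python
from collections import Counter

# Hypothetical sample: the same words written in different surface forms
raw_tokens = ["NLP", "nlp", "Nlp", "course", "Course"]

# Without normalization, every surface form is counted separately
raw_counts = Counter(raw_tokens)

# Lowercasing first merges the variants into one entry per word
normalized_counts = Counter(token.lower() for token in raw_tokens)

print(len(raw_counts))     # 5 distinct entries
print(normalized_counts)   # Counter({'nlp': 3, 'course': 2})
```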

## Common Text Normalization Techniques

1. **Lowercasing**: Convert all characters in the text to lowercase to ensure uniformity.
2. **Removing Punctuation**: Strip punctuation marks from the text to focus on the words.
3. **Handling Contractions**: Expand contractions to their full forms (e.g., "don't" to "do not").
4. **Removing Whitespace**: Strip leading and trailing whitespace from the text.

### Lowercasing

In [1]:
# Define a sample text
text = "Hello, I am Farzad Asgari, and welcome to the NLPy Course. We WILL learn a LOT about NLP!"

# Convert the text to lowercase
normalized_text = text.lower()

# Print the normalized text
print("Lowercased Text:", normalized_text)

Lowercased Text: hello, i am farzad asgari, and welcome to the nlpy course. we will learn a lot about nlp!
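
For text that may contain non-English characters, `str.casefold()` is a more aggressive alternative to `str.lower()` when the goal is case-insensitive comparison. The sketch below uses the German word "Straße" as an assumed example; it is not part of the course text.

```python
# lower() leaves "ß" unchanged, while casefold() maps it to "ss"
word_a = "Straße"
word_b = "STRASSE"

print(word_a.lower() == word_b.lower())        # False: "straße" vs "strasse"
print(word_a.casefold() == word_b.casefold())  # True: both become "strasse"
```

For plain English text the two methods give the same result, so `lower()` is usually sufficient.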


### Removing Punctuation

In [2]:
import string

# Define a sample text
text = "Hello, I am Farzad Asgari, and welcome to the NLPy course. We will learn a lot about NLP!"

# Remove punctuation using Python's string module
normalized_text = text.translate(str.maketrans('', '', string.punctuation))

# Print the text without punctuation
print("Text Without Punctuation:", normalized_text)

Text Without Punctuation: Hello I am Farzad Asgari and welcome to the NLPy course We will learn a lot about NLP
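
Note that `string.punctuation` only contains ASCII punctuation, so characters such as curly quotes or an ellipsis are left in place. One possible way to cover them, sketched below, is to also drop every character whose Unicode category starts with `P`. The helper name `remove_all_punctuation` and the sample string are assumptions made for illustration.

```python
import string
import unicodedata

def remove_all_punctuation(text):
    """Drop ASCII punctuation plus any character in a Unicode 'P*' category."""
    return "".join(
        ch for ch in text
        if ch not in string.punctuation
        and not unicodedata.category(ch).startswith("P")
    )

# Hypothetical sample containing curly quotes, an ellipsis, and guillemets
sample = "“Hello”… it’s «fine»!"

# The ASCII-only approach removes just the final "!"
ascii_only = sample.translate(str.maketrans('', '', string.punctuation))
print(ascii_only)                      # “Hello”… it’s «fine»

# The Unicode-aware helper removes all of the punctuation marks
print(remove_all_punctuation(sample))  # Hello its fine
```

The output also shows a side effect worth keeping in mind: removing the apostrophe turns "it's" into "its".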


### Handling Contractions
To handle contractions, you can use the third-party `contractions` library. If it is not installed, install it from a notebook cell first:

In [3]:
!pip install contractions



In [4]:
import contractions

# Define a sample text with contractions
text = "Hello, I am Farzad Asgari, and welcome to the NLPy course. We won't learn much without practice."

# Expand contractions
expanded_text = contractions.fix(text)

# Print the expanded text
print("Expanded Contractions:", expanded_text)

Expanded Contractions: Hello, I am Farzad Asgari, and welcome to the NLPy course. We will not learn much without practice.
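
To see roughly how this kind of expansion can work, here is a dependency-free sketch built on a lookup table and a regular expression. The mapping `CONTRACTION_MAP` and the function `expand_contractions` are hypothetical and deliberately tiny; this is not how the `contractions` library is implemented, and it covers far fewer forms.

```python
import re

# Hypothetical, deliberately small mapping for illustration only
CONTRACTION_MAP = {
    "won't": "will not",
    "can't": "cannot",
    "don't": "do not",
    "it's": "it is",    # ambiguous: could also mean "it has"
    "we're": "we are",
}

# One case-insensitive pattern that matches any key as a whole word
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTION_MAP) + r")\b",
    flags=re.IGNORECASE,
)

def expand_contractions(text):
    def replace(match):
        word = match.group(0)
        expanded = CONTRACTION_MAP[word.lower()]
        # Preserve an initial capital letter, e.g. "Don't" -> "Do not"
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded
    return _PATTERN.sub(replace, text)

print(expand_contractions("We won't stop, and it's fine. Don't worry."))
# We will not stop, and it is fine. Do not worry.
```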


### Removing Whitespace

In [5]:
# Define a sample text with extra whitespace
text = "   Hello, I am  Farzad Asgari, and welcome to the NLPy course. We will learn a lot about NLP!   "

# Remove leading and trailing whitespace
normalized_text = text.strip()

# Print the text without leading and trailing whitespace
print("Text Without Leading and Trailing Whitespace:", normalized_text)

Text Without Leading and Trailing Whitespace: Hello, I am  Farzad Asgari, and welcome to the NLPy course. We will learn a lot about NLP!
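
Notice that `strip()` leaves the double space inside the sentence untouched. If repeated internal whitespace should also be collapsed, one common idiom is to split on whitespace and rejoin with single spaces, as in this minimal sketch (the sample string is made up for illustration):

```python
# Sample with leading/trailing spaces, a double space, a tab, and a newline
text = "   Hello, I am  Farzad Asgari.\tWelcome\n to the course.   "

# split() with no arguments splits on any run of whitespace and drops
# leading/trailing whitespace; join() puts single spaces back
collapsed = " ".join(text.split())

print(collapsed)  # Hello, I am Farzad Asgari. Welcome to the course.
```

This also replaces tabs and newlines with spaces, which may or may not be desirable depending on the task.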


## Combining Normalization Steps
You can combine multiple normalization techniques in a single function to standardize text effectively.

In [6]:
import string
import contractions

def normalize_text(text):
    # Convert to lowercase
    text = text.lower()
    # Expand contractions before removing punctuation: stripping the
    # apostrophes first would turn "won't" into "wont"
    text = contractions.fix(text)
    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Remove leading and trailing whitespace
    text = text.strip()
    return text

# Define a sample text
text = "   Hello, and welcome to the NLPy Course! We WON'T learn much without practice.   "

# Apply the normalization function
normalized_text = normalize_text(text)

# Print the normalized text
print("Normalized Text:", normalized_text)

Normalized Text: hello and welcome to the nlpy course we will not learn much without practice
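
Because the steps run in sequence, their order can change the result. The sketch below illustrates this with a generic `apply_steps` helper and a toy two-entry contraction mapping; both are hypothetical names introduced for this example and stand in for whatever steps a real pipeline would use.

```python
import string

# Translation table that deletes every ASCII punctuation character
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def remove_punctuation(text):
    return text.translate(_PUNCT_TABLE)

def expand_toy_contractions(text):
    # Hypothetical two-entry mapping, only for illustrating step order
    for short, full in {"we'll": "we will", "won't": "will not"}.items():
        text = text.replace(short, full)
    return text

def apply_steps(text, steps):
    """Apply each normalization step in the given order."""
    for step in steps:
        text = step(text)
    return text

sample = "we'll see, but we won't rush."

expand_first = apply_steps(sample, [expand_toy_contractions, remove_punctuation])
strip_first = apply_steps(sample, [remove_punctuation, expand_toy_contractions])

print(expand_first)  # we will see but we will not rush
print(strip_first)   # well see but we wont rush
```

With punctuation removed first, "we'll" collapses into the unrelated word "well", which a simple lookup can no longer recover.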
