## Clean Data
Purpose of Data Cleaning: Data cleaning improves data quality by identifying and correcting errors, inconsistencies, and missing values.

#### Common Practices

- Handling NULL Values: Replace NULL (missing) values with a default value, or fill them using a statistical measure such as the mean or median.

In [None]:
# Handling NULL values in a DataFrame using Pandas
df.fillna(0, inplace=True)  # Replace NULL values with 0
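A single constant such as 0 can distort numeric columns and makes little sense for text columns. `DataFrame.fillna` also accepts a dict that maps column names to fill values, so each column can get its own default. A minimal sketch; the column names, data, and choice of defaults are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing entry per column
df = pd.DataFrame({
    "city": ["Oslo", None, "Lima"],
    "score": [7.0, np.nan, 9.0],
})

# Map each column to its own fill value:
# a placeholder label for text, the median for numbers
defaults = {"city": "unknown", "score": df["score"].median()}
filled = df.fillna(value=defaults)
print(filled)
```

The median (8.0 here) is less sensitive to outliers than the mean, which is one reason it is a common default for skewed numeric columns.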

- Handling Special Characters: Remove or replace characters such as punctuation and symbols that can interfere with parsing, matching, or joins during analysis.

In [None]:
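# Removing special characters (anything other than letters and digits) from a column using Pandas
# regex=True is required: since pandas 2.0 the default is a literal (non-regex) match
df['column_name'] = df['column_name'].str.replace('[^a-zA-Z0-9]', '', regex=True)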
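The pattern `[^a-zA-Z0-9]` also deletes spaces, which merges separate words. If word boundaries matter, adding a space to the allowed character set keeps them. A small comparison on hypothetical data:

```python
import pandas as pd

# Hypothetical column containing punctuation, symbols, and spaces
df = pd.DataFrame({"column_name": ["A-12!", "b_7 x", "#C3"]})

# regex=True makes pandas treat the pattern as a regular expression
# (pandas >= 2.0 defaults to regex=False, i.e. a literal match)
alnum_only = df["column_name"].str.replace(r"[^a-zA-Z0-9]", "", regex=True)

# Variant that also keeps spaces between words
keep_spaces = df["column_name"].str.replace(r"[^a-zA-Z0-9 ]", "", regex=True)

print(alnum_only.tolist())   # ['A12', 'b7x', 'C3']
print(keep_spaces.tolist())  # ['A12', 'b7 x', 'C3']
```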

- Trimming Spaces: Remove leading and trailing whitespace from text values.

In [None]:
# Trimming spaces from a column in a DataFrame using Pandas
df['column_name'] = df['column_name'].str.strip()
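`str.strip()` only touches the ends of each string; repeated spaces or tabs inside a value remain. One common approach, sketched here on hypothetical data, is to collapse every run of whitespace into a single space before stripping:

```python
import pandas as pd

# Hypothetical values with extra whitespace at the ends and in the middle
df = pd.DataFrame({"column_name": ["  New   York ", "Paris\t", " San  Jose"]})

# Replace each run of whitespace (spaces, tabs, newlines) with one space,
# then strip what is left at the ends
cleaned = (
    df["column_name"]
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)
print(cleaned.tolist())  # ['New York', 'Paris', 'San Jose']
```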

- Inconsistent Formatting: Standardize data formats (for example, letter case) so that equivalent values are represented identically.

In [None]:
# Converting text data to lowercase in a column using Pandas
df['column_name'] = df['column_name'].str.lower()
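Lowercasing fixes differences in letter case, but the same value is often spelled in several ways ("Yes", "y", "YES "). A minimal sketch, assuming a hypothetical yes/no column, normalizes case and whitespace and then maps every known spelling to one canonical value:

```python
import pandas as pd

# Hypothetical column with inconsistent spellings of the same answers
df = pd.DataFrame({"subscribed": ["Yes", " y", "NO", "n ", "yes"]})

# Normalize case and whitespace first, then map each known spelling
# to a canonical value; spellings missing from the dict become NaN
canonical = {"yes": True, "y": True, "no": False, "n": False}
df["subscribed"] = df["subscribed"].str.strip().str.lower().map(canonical)
print(df["subscribed"].tolist())  # [True, True, False, False, True]
```

Because unmapped spellings turn into NaN rather than passing through silently, a follow-up `isna()` check shows which values still need a rule.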

- Removing Duplicates: Identify and remove duplicate records from the dataset.

In [None]:
# Removing duplicates from a DataFrame using Pandas
df.drop_duplicates(inplace=True)
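By default, `drop_duplicates` only removes rows that are identical in every column. When records should be unique by a key, the `subset` and `keep` parameters control which columns are compared and which copy survives. A sketch with a hypothetical email key, keeping each address's earliest signup:

```python
import pandas as pd

# Hypothetical records: the same address appears with different case/whitespace
df = pd.DataFrame({
    "email": ["a@x.com", "A@x.com ", "b@x.com", "a@x.com"],
    "signup": ["2024-01-05", "2024-02-10", "2024-01-07", "2024-03-01"],
})

# Rows differing only in case or whitespace are not exact duplicates,
# so normalize the key column before comparing
df["email"] = df["email"].str.strip().str.lower()

# subset= compares only the key column; after sorting by date,
# keep="first" retains each address's earliest signup
deduped = df.sort_values("signup").drop_duplicates(subset="email", keep="first")
print(deduped)
```

ISO-formatted date strings sort chronologically as plain text, which is why the sketch can sort them without parsing.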

- Imputing Data: Fill missing values with estimated or calculated values, such as the column mean.

In [None]:
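# Imputing missing values with the column mean in a DataFrame using Pandas
mean_value = df['column_name'].mean()  # mean() skips NaN by default
# Assign the result back; chained fillna(..., inplace=True) may not update df in recent pandas
df['column_name'] = df['column_name'].fillna(mean_value)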
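A single global mean can blur real differences between groups. When a categorical column explains much of the variation, one option (a swapped-in technique, group-wise mean imputation) is to fill each gap with the mean of its own group. A minimal sketch; the `region` and `price` columns are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one gap per region
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "price": [10.0, np.nan, 30.0, np.nan, 50.0],
})

# Mean of each region, broadcast back to every row of that region
group_mean = df.groupby("region")["price"].transform("mean")

# Fill each gap with its own region's mean instead of the global mean
df["price"] = df["price"].fillna(group_mean)
print(df["price"].tolist())  # [10.0, 10.0, 30.0, 40.0, 50.0]
```

Here the global mean would be 30.0 for both gaps; the group-wise version instead yields 10.0 for the north row and 40.0 for the south row.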

#### Validating Data

- Definition: Data validation checks that the data meets defined quality standards or criteria, such as completeness, valid value ranges, or expected formats.

In [None]:
# Validating data to check for NULL values in a DataFrame using Pandas
if df.isnull().values.any():
    print("Data contains NULL values.")
else:
    print("Data contains no NULL values.")
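Checking for NULL values covers only completeness. Validation usually combines several kinds of rules. A minimal sketch, assuming hypothetical `age` and `email` columns, expresses each rule as a boolean mask that is True where a row fails, then reports the failing row labels per rule. The age range and the email pattern are illustrative assumptions; the pattern is a rough plausibility check, not a full email-address specification:

```python
import pandas as pd

# Hypothetical data with one missing age, one out-of-range age, one bad email
df = pd.DataFrame({
    "age": [25, -3, 40, None],
    "email": ["a@x.com", "bad-address", "c@y.org", "d@z.net"],
})

# Each mask is True where a row FAILS the rule (thresholds are illustrative)
rules = {
    "age_missing": df["age"].isna(),
    "age_out_of_range": df["age"].notna() & ~df["age"].between(0, 120),
    "email_malformed": ~df["email"].str.contains(
        r"^[^@\s]+@[^@\s]+\.[^@\s]+$", regex=True
    ),
}

# Collect the row labels that fail each rule
failures = {name: mask[mask].index.tolist() for name, mask in rules.items()}
is_valid = not any(failures.values())

print(failures)  # {'age_missing': [3], 'age_out_of_range': [1], 'email_malformed': [1]}
print("Data is valid." if is_valid else "Data failed validation.")
```

Keeping rules in a dict makes it easy to add checks without changing the reporting code, and the per-rule row lists show where to look rather than just whether something failed.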