# Python Tutorial: Data Cleanup

Cleaning up data is a crucial step in data analysis and preparation, and Python offers several libraries and techniques for it. This tutorial walks through three common cleanup tasks: removing duplicates, handling missing values, and removing outliers. Each task comes with an example, an exercise, and a solution.

### 1. Removing duplicates:
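Duplicate data can skew analysis results and should be removed. For a list of hashable values, the simplest approach is to convert it to a `set`, which keeps one copy of each value but does not guarantee that the original order is preserved.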

#### Example:
```python
data = [1, 2, 3, 4, 5, 1, 2, 3]

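# Remove duplicates by converting to a set and back to a list.
# A set is unordered, so the result may not keep the original order.
clean_data = list(set(data))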

print(clean_data)
```
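
When the order of the items matters, a set is not enough. The following is a minimal sketch of one order-preserving alternative; it relies on the fact that dictionary keys are unique and keep insertion order (guaranteed since Python 3.7). The variable name `ordered_clean` is only illustrative.

```python
data = [1, 2, 3, 4, 5, 1, 2, 3]

# Dictionary keys are unique and keep insertion order, so this drops
# repeats while keeping the first occurrence of each value.
ordered_clean = list(dict.fromkeys(data))

print(ordered_clean)  # [1, 2, 3, 4, 5]
```

Like `set`, this approach requires the items to be hashable.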

#### Exercise:
Given the list `data = ['apple', 'banana', 'apple', 'orange', 'banana']`, remove duplicates and print the cleaned list.

#### Solution:
```python
data = ['apple', 'banana', 'apple', 'orange', 'banana']

# Remove duplicates; the order of the result is not guaranteed
clean_data = list(set(data))

print(clean_data)
```
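
Duplicates also show up as repeated rows in tabular data. As a sketch, assuming the data lives in a pandas DataFrame, `duplicated()` flags rows that repeat an earlier row and `drop_duplicates()` removes them. The column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical data: the first and third rows are identical
df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'apple', 'orange'],
    'price': [1.0, 0.5, 1.0, 0.8],
})

# Boolean mask marking rows that repeat an earlier row
dup_mask = df.duplicated()

# Keep the first occurrence of each row and renumber the index
clean_df = df.drop_duplicates().reset_index(drop=True)

print(dup_mask.tolist())
print(clean_df)
```

Unlike the `set` approach, `drop_duplicates()` keeps the remaining rows in their original order.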

### 2. Handling missing values:
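Missing values are common in real-world datasets and need to be handled to avoid errors in analysis. The two usual options are to drop the affected rows or to fill the gaps with a substitute value such as the column mean.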

#### Example:
```python
import pandas as pd

data = {'A': [1, 2, None, 4], 'B': [5, None, 7, 8]}
df = pd.DataFrame(data)

# Drop rows with missing values
clean_df = df.dropna()

print(clean_df)
```
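
Before dropping anything, it often helps to see how much data is missing and where. The sketch below reuses the same DataFrame and counts missing values per column with `isna()`; the variable names `missing_per_column` and `rows_with_missing` are illustrative.

```python
import pandas as pd

data = {'A': [1, 2, None, 4], 'B': [5, None, 7, 8]}
df = pd.DataFrame(data)

# Number of missing values in each column
missing_per_column = df.isna().sum()

# Rows that contain at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

Here each column has one missing value, in different rows, so `dropna()` discards two of the four rows. On a small dataset that is a large share of the data, which is one reason to consider filling instead.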

#### Exercise:
Given the DataFrame `df` from the example above, fill the missing values with the mean of each column.

#### Solution:
```python
# Fill missing values with column means
clean_df = df.fillna(df.mean())

print(clean_df)
```
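
To see what the fill actually does, here is a self-contained version of the same idea that prints the column means first. `DataFrame.mean()` skips missing values by default, so column `A` averages 1, 2, and 4, and column `B` averages 5, 7, and 8. The names `column_means` and `filled` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [5, None, 7, 8]})

# mean() ignores NaN by default: A -> 7/3, B -> 20/3
column_means = df.mean()

# Each missing cell is replaced by the mean of its own column
filled = df.fillna(column_means)

print(column_means)
print(filled)
```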

### 3. Removing outliers:
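Outliers can significantly affect analysis results because they distort statistics such as the mean and standard deviation. Two common ways to detect them are the Z-score, which measures how many standard deviations a value lies from the mean, and the Interquartile Range (IQR).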

#### Example:
```python
import numpy as np

data = np.array([1, 2, 3, 100, 5, 6, 7, 8])

# Remove outliers based on Z-score:
# keep only values whose absolute Z-score is below the threshold
threshold = 2
mean = np.mean(data)
std_dev = np.std(data)
z_scores = np.abs((data - mean) / std_dev)
clean_data = data[z_scores < threshold]

print(clean_data)
```
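
The result of a Z-score filter depends heavily on the threshold, because the outlier itself inflates the mean and standard deviation it is measured against. The sketch below, using the same array, compares a threshold of 2 with a threshold of 3; the names `kept_at_2` and `kept_at_3` are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 100, 5, 6, 7, 8])

# The value 100 pulls the mean up to 16.5 and the standard deviation
# to about 31.6, so its own Z-score is only about 2.64.
z_scores = np.abs((data - data.mean()) / data.std())

kept_at_2 = data[z_scores < 2]
kept_at_3 = data[z_scores < 3]

print(np.round(z_scores, 2))
print(kept_at_2)
print(kept_at_3)
```

With a threshold of 2 the value 100 is removed, but with a threshold of 3 it stays in the data. On small samples it is worth printing the Z-scores before choosing a cutoff.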

#### Exercise:
Given the array `data = np.array([1, 2, 3, 100, 5, 6, 7, 8])`, remove outliers using the Interquartile Range (IQR) method.

#### Solution:
```python
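import numpy as np

data = np.array([1, 2, 3, 100, 5, 6, 7, 8])

# Quartiles and interquartile range
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
# Keep values within 1.5 * IQR of the quartiles
lower_bound = q1 - (1.5 * iqr)
upper_bound = q3 + (1.5 * iqr)
clean_data = data[(data >= lower_bound) & (data <= upper_bound)]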

print(clean_data)
```
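
The same steps can be wrapped in a small helper so they are easy to reuse on other arrays. This is a sketch rather than a standard API: the function name `remove_outliers_iqr` and its `k` parameter are hypothetical, and it also returns the bounds so the cutoffs can be inspected.

```python
import numpy as np

def remove_outliers_iqr(values, k=1.5):
    """Return the values inside [Q1 - k*IQR, Q3 + k*IQR] and the bounds."""
    values = np.asarray(values)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = (values >= lower) & (values <= upper)
    return values[mask], (lower, upper)

data = np.array([1, 2, 3, 100, 5, 6, 7, 8])
clean, bounds = remove_outliers_iqr(data)

print(bounds)
print(clean)
```

For this array the quartiles are 2.75 and 7.25, so the IQR is 4.5 and the bounds are -4.0 and 14.0. Only the value 100 falls outside them.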

### Conclusion:
Data cleanup is an essential step in preparing data for analysis. In this tutorial, we covered methods for removing duplicates, handling missing values, and removing outliers in Python. Practice these techniques with various datasets to become proficient in data cleanup.