# Python Tutorial: Incorrect Values

This tutorial shows how to clean up incorrect values in Python using the pandas library. We'll cover methods for identifying and correcting incorrect data in a dataset.

### 1. Identifying Incorrect Values
Before we can clean up incorrect values, we need to identify them. Here are some common types of incorrect values:

- Missing values: These are usually represented as `None`, `NaN`, or empty strings.
- Outliers: Values that are significantly different from the rest of the data.
- Data format errors: Values that don't match the expected format, such as strings in a numerical column.
- Inconsistent values: Values that contradict known constraints or the rest of the data, such as a temperature recorded as -500°C, which is below absolute zero.
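
As a rough illustration of how these four categories might be flagged in practice, here is a minimal sketch. The `temp_c` column, its sample values, and the choice of the 1.5 × IQR rule for outliers are assumptions made for this example, not part of any fixed recipe.

```python
import pandas as pd

# Hypothetical sample: one missing reading, one non-numeric entry,
# one physically impossible temperature, and one unusually large value.
raw = pd.DataFrame({
    'temp_c': ['21.5', '22.0', None, 'n/a', '-500', '23.1', '20.8', '95.0'],
})

# Missing values: isna() flags None and NaN (but not empty strings).
missing_mask = raw['temp_c'].isna()

# Format errors: entries that are present but cannot be parsed as numbers.
parsed = pd.to_numeric(raw['temp_c'], errors='coerce')
format_error_mask = parsed.isna() & ~missing_mask

# Inconsistent values: anything below absolute zero cannot be a real reading.
impossible_mask = parsed < -273.15

# Outliers: 1.5 * IQR rule, computed on the parseable, possible values only.
valid = parsed[~impossible_mask].dropna()
q1 = valid.quantile(0.25)
q3 = valid.quantile(0.75)
iqr = q3 - q1
outlier_mask = (
    ((parsed < q1 - 1.5 * iqr) | (parsed > q3 + 1.5 * iqr)) & ~impossible_mask
)

print(raw.assign(missing=missing_mask, bad_format=format_error_mask,
                 impossible=impossible_mask, outlier=outlier_mask))
```

Keeping the masks separate makes it possible to apply a different cleaning strategy to each kind of problem.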

### 2. Cleaning Incorrect Values
Once we've identified the incorrect values, we can clean them using various techniques. Here are some common methods:

#### a. Removing Missing Values
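We can remove rows or columns with missing values using the pandas `dropna()` method. Note that `dropna()` only recognizes `None` and `NaN` as missing; empty strings must first be converted to one of these.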

```python
import pandas as pd

# Create a DataFrame with missing values
data = {'A': [1, 2, None, 4], 'B': ['x', 'y', None, 'z']}
df = pd.DataFrame(data)

# Drop rows with missing values
clean_df = df.dropna()
print(clean_df)
```
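
Dropping every row that contains any missing value can discard more data than intended. The sketch below shows a few alternatives using the `how` and `subset` parameters of `dropna()`, plus `fillna()` for filling instead of dropping. The sample data and the fill values (the column median and the placeholder `'unknown'`) are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, None, 4], 'B': ['x', 'y', None, 'z']})

# Drop a row only when every column in it is missing.
all_missing_dropped = df.dropna(how='all')

# Drop rows based on column 'A' alone, ignoring gaps in other columns.
a_present = df.dropna(subset=['A'])

# Fill gaps instead of dropping: the median for the numeric column,
# a placeholder label for the text column.
filled = df.fillna({'A': df['A'].median(), 'B': 'unknown'})

print(all_missing_dropped)
print(a_present)
print(filled)
```

Which option fits depends on how much data you can afford to lose and whether a filled-in value would distort later analysis.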

#### b. Handling Outliers
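We can handle outliers by replacing them with a more representative value, such as the median or mean of the column. The cutoff of 10 used below is arbitrary and chosen only for this example.

```python
# Sample data with one outlier (20)
df = pd.DataFrame({'A': [1, 2, 20, 4], 'B': ['x', 'y', 'z', 'w']})

# Replace values above the cutoff with the column median
median = df['A'].median()
df['A'] = df['A'].mask(df['A'] > 10, median)
print(df)
```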
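
A hard-coded cutoff only works when you already know what counts as extreme. One common alternative, sketched here under the assumption that the 1.5 × IQR rule suits the data, derives the bounds from the data itself. The sample values are made up for the example.

```python
import pandas as pd

values = pd.Series([1, 2, 20, 4, 3, 2, 5], dtype='float64')

# Compute the interquartile range and the usual 1.5 * IQR fences.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag anything outside the fences as an outlier.
is_outlier = (values < lower) | (values > upper)

# Use the median of the non-outlying values as the replacement,
# so the outliers do not influence their own substitute.
replacement = values[~is_outlier].median()
cleaned = values.mask(is_outlier, replacement)

print(cleaned)
```

If extreme values should be capped rather than replaced, `values.clip(lower, upper)` is a possible variation.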

#### c. Correcting Data Format Errors
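We can use regular expressions or built-in functions such as `astype()` and `pd.to_numeric()` to correct data format errors. `astype()` raises an error when it meets a value it cannot convert, whereas `pd.to_numeric()` with `errors='coerce'` turns such values into `NaN`.

```python
# Simulate a numeric column that was read in as text, with one bad entry
df['A'] = ['1', '2', 'three', '4']

# Convert to numeric; values that cannot be parsed become NaN
df['A'] = pd.to_numeric(df['A'], errors='coerce')
print(df)
```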
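
When values are almost numeric but carry extra characters, a regular expression can strip the noise before conversion. This is a minimal sketch; the `prices` series and the pattern (keep only digits, the decimal point, and the minus sign) are assumptions for illustration and would need adjusting for other formats, such as decimal commas.

```python
import pandas as pd

prices = pd.Series(['$1,200', ' 35 ', '12.5 USD', 'unknown'])

# Remove every character that is not a digit, a decimal point, or a minus sign.
stripped = prices.str.replace(r'[^0-9.\-]', '', regex=True)

# Convert what is left; entries with nothing parseable become NaN.
numeric = pd.to_numeric(stripped, errors='coerce')

print(numeric)
```

The entry `'unknown'` has no numeric characters, so it ends up as `NaN` and can then be handled like any other missing value.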

#### d. Dealing with Inconsistent Values
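We can manually correct inconsistent values or use heuristics to infer the correct values. The example below assumes that column `A` should contain only positive numbers.

```python
# Simulate an invalid entry, assuming column A must be positive
df.loc[1, 'A'] = -2

# Replace non-positive values with NaN so they can be handled as missing
df['A'] = df['A'].where(df['A'] > 0)
print(df)
```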
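
As one possible heuristic, inconsistent labels can be mapped to a canonical form, and implausible numbers can be replaced with a value inferred from similar rows. In this sketch the `city` and `temp_c` columns, the `canonical` mapping, the plausible range of -90 to 60 °C, and the per-city median fill are all assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['NYC', 'nyc ', 'New York', 'Boston', 'boston'],
    'temp_c': [21.0, -500.0, 19.5, 18.0, 17.5],
})

# Hypothetical lookup table from normalized spellings to one canonical label.
canonical = {'nyc': 'New York', 'new york': 'New York', 'boston': 'Boston'}
df['city'] = df['city'].str.strip().str.lower().map(canonical)

# Assumed plausible range for an air temperature; values outside become NaN.
plausible = df['temp_c'].between(-90, 60)
df['temp_c'] = df['temp_c'].where(plausible)

# Heuristic fill: use the median of the remaining readings for the same city.
city_median = df.groupby('city')['temp_c'].transform('median')
df['temp_c'] = df['temp_c'].fillna(city_median)

print(df)
```

A filled-in value is a guess, so it is often worth keeping a flag column that records which rows were corrected.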

### Exercises:
1. Create a DataFrame with missing values and remove them.
2. Create a DataFrame with outliers and replace them with the median.
3. Create a DataFrame with data format errors and correct them.
4. Create a DataFrame with inconsistent values and correct them.

### Solutions:
1. 
```python
import pandas as pd
data = {'A': [1, 2, None, 4], 'B': ['x', 'y', None, 'z']}
df = pd.DataFrame(data)
clean_df = df.dropna()
print(clean_df)
```

2. 
```python
import pandas as pd
data = {'A': [1, 2, 20, 4], 'B': ['x', 'y', 'z', 'w']}
df = pd.DataFrame(data)
median = df['A'].median()
df['A'] = df['A'].apply(lambda x: median if x > 10 else x)
print(df)
```

3. 
```python
import pandas as pd
data = {'A': ['1', '2', 'three', '4'], 'B': ['x', 'y', 'z', 'w']}
df = pd.DataFrame(data)
df['A'] = pd.to_numeric(df['A'], errors='coerce')
print(df)
```

4. 
```python
import pandas as pd
data = {'A': [1, -2, 3, 4], 'B': ['x', 'y', 'z', 'w']}
df = pd.DataFrame(data)
df['A'] = df['A'].apply(lambda x: x if x > 0 else None)
print(df)
```

This should give you a good starting point for cleaning up incorrect values in your datasets using Python!
