# Datacamp Personal Takeaways 
## Cleaning Data With Python
***

## 1.- Exploring your data.
Use pandas to inspect the shape, structure, and current state of the data.

```python
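# Import the libraries needed
import pandas as pd

# Assumes df is a DataFrame that has already been loaded (e.g. with pd.read_csv).
df.head()      # first 5 rows
df.tail()      # last 5 rows
df.shape       # (number of rows, number of columns)
df.columns     # column labels
df.info()      # dtypes and non-null counts per column

# If the column name is a valid Python identifier (no spaces or special
# characters) and does not clash with a DataFrame method, dot notation works.
df.columnname.value_counts()

# Summary statistics for a column.
df.columnname.describe()

# Quick visual checks (plotting requires matplotlib).
df.columnname.plot(kind="hist")
df.boxplot(column="columnname1", by="columnname2")
df.plot(kind="scatter", x="columnname1", y="columnname2")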
```
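
To see what these inspection calls return, here is a minimal sketch on a small made-up DataFrame. The column names `species` and `weight` and all values are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "bird"],
    "weight": [4.0, 20.0, 5.0, 0.5],
})

n_rows, n_cols = df.shape               # (4, 2)
counts = df["species"].value_counts()   # frequency of each category
stats = df["weight"].describe()         # count, mean, std, min, quartiles, max

print(n_rows, n_cols)
print(counts)
print(stats)
```

Bracket notation (`df["species"]`) is used here because it works for any column name, including ones with spaces.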

## 2.- Tidying data for analysis.
Follow Hadley Wickham's tidy data principles:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.

**Remember!**

These rules are for data cleaning, not for data visualization!

### Using melt.
```python
# Melt airquality: airquality_melt
airquality_melt = pd.melt(airquality, id_vars='Date', var_name="measurement", value_name="reading")
```
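
A minimal sketch of what `pd.melt` does, assuming a tiny stand-in for `airquality` with a `Date` column and two measurement columns. The toy values and the `Ozone`/`Wind` columns are assumptions for illustration, not the course dataset.

```python
import pandas as pd

# Hypothetical wide-format data: one column per measurement.
airquality = pd.DataFrame({
    "Date": ["2020-05-01", "2020-05-02"],
    "Ozone": [41.0, 36.0],
    "Wind": [7.4, 8.0],
})

# Wide -> long: every non-id column becomes (measurement, reading) pairs.
airquality_melt = pd.melt(
    airquality,
    id_vars="Date",
    var_name="measurement",
    value_name="reading",
)

print(airquality_melt)
```

Each of the 2 rows is repeated once per melted column, so the result has 4 rows and the columns `Date`, `measurement`, and `reading`.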

### Using pivot.
```python
# Pivot airquality_melt: airquality_pivot
airquality_pivot = airquality_melt.pivot(index="Date", columns="measurement", values="reading")
```

### Reset index after pivot.
```python
# Reset the index of airquality_pivot: airquality_pivot_reset
airquality_pivot_reset = airquality_pivot.reset_index()
```
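
Pivoting is the inverse of melting. The sketch below starts from hypothetical long-format data (the values are made up) and shows that `pivot` moves `Date` into the index, while `reset_index` turns it back into a regular column.

```python
import pandas as pd

# Hypothetical long-format data, as produced by a melt.
airquality_melt = pd.DataFrame({
    "Date": ["2020-05-01", "2020-05-02", "2020-05-01", "2020-05-02"],
    "measurement": ["Ozone", "Ozone", "Wind", "Wind"],
    "reading": [41.0, 36.0, 7.4, 8.0],
})

# Long -> wide: one column per distinct value of 'measurement'.
airquality_pivot = airquality_melt.pivot(
    index="Date", columns="measurement", values="reading"
)

# 'Date' is now the index; reset_index makes it an ordinary column again.
airquality_pivot_reset = airquality_pivot.reset_index()

print(airquality_pivot.index.name)
print(airquality_pivot_reset)
```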

### Pivot duplicate values.
```python
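# Pivot table airquality_dup, averaging duplicate readings: airquality_pivot
# aggfunc="mean" needs no extra import; np.mean also works after `import numpy as np`.
airquality_pivot = airquality_dup.pivot_table(index="Date", columns="measurement", values="reading", aggfunc="mean")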

# Reset the index of airquality_pivot
airquality_pivot = airquality_pivot.reset_index()
```
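
Why `pivot_table` instead of `pivot`: a rough sketch, assuming a made-up `airquality_dup` in which one `Date`/`measurement` pair appears twice. Plain `pivot` cannot decide which reading to keep and raises a `ValueError`, while `pivot_table` aggregates the duplicates.

```python
import pandas as pd

# Hypothetical data: the (2020-05-01, Ozone) pair is duplicated.
airquality_dup = pd.DataFrame({
    "Date": ["2020-05-01", "2020-05-01", "2020-05-01", "2020-05-02"],
    "measurement": ["Ozone", "Ozone", "Wind", "Ozone"],
    "reading": [40.0, 42.0, 7.4, 36.0],
})

# pivot() refuses to reshape when index/column pairs are not unique.
try:
    airquality_dup.pivot(index="Date", columns="measurement", values="reading")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() resolves duplicates with an aggregation function (mean here).
airquality_pivot = airquality_dup.pivot_table(
    index="Date", columns="measurement", values="reading", aggfunc="mean"
).reset_index()

print(pivot_failed)
print(airquality_pivot)
```

The two Ozone readings for the duplicated date are averaged, and combinations with no data (Wind on the second date) come out as `NaN`.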

### Creating new columns with string slicing.
```python
# Create the 'gender' column
tb_melt['gender'] = tb_melt.variable.str[0]

# Create the 'age_group' column
tb_melt['age_group'] = tb_melt.variable.str[1:]
```
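
A small sketch of the slicing above, assuming `variable` holds codes whose first character is the gender and whose remainder is the age group. The codes `m014` and `f1524` are hypothetical examples.

```python
import pandas as pd

# Hypothetical melted data: 'variable' encodes gender + age group.
tb_melt = pd.DataFrame({
    "variable": ["m014", "f1524"],
    "value": [2, 5],
})

# .str applies Python-style indexing and slicing to every string in the column.
tb_melt["gender"] = tb_melt["variable"].str[0]       # first character
tb_melt["age_group"] = tb_melt["variable"].str[1:]   # everything after it

print(tb_melt)
```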

### Creating new columns with string splitting.
```python
# Create the 'str_split' column
ebola_melt['str_split'] = ebola_melt["type_country"].str.split("_")

# Create the 'type' column
ebola_melt['type'] = ebola_melt["str_split"].str.get(0)

# Create the 'country' column
ebola_melt['country'] = ebola_melt["str_split"].str.get(1)
```
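
As an alternative sketch, `str.split` with `expand=True` returns the pieces as separate columns, which skips the intermediate `str_split` column. This assumes every `type_country` value contains an underscore; the sample values are made up, and `n=1` limits the split to the first underscore.

```python
import pandas as pd

# Hypothetical melted data with 'type_country' values like 'Cases_Guinea'.
ebola_melt = pd.DataFrame({
    "type_country": ["Cases_Guinea", "Deaths_Liberia"],
    "counts": [49.0, 12.0],
})

# expand=True yields a DataFrame with one column per piece;
# n=1 splits only on the first underscore, so there are exactly two pieces.
ebola_melt[["type", "country"]] = ebola_melt["type_country"].str.split(
    "_", n=1, expand=True
)

print(ebola_melt)
```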