### Cleaning data with pandas!

What we will learn in this notebook:

- How to drop empty rows and columns
- How to drop rows with null values
- How to replace null values
- How to find and replace placeholder values
- How to change column types
- How to save a dataframe to a CSV file

Links to documentation for the functions introduced:

- [`.isna()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isna.html)
- [`.dropna()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html)
- [`.fillna()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html)
- [`.replace()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html)
- [`.astype()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html)
- [`.to_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html)

First, let's import pandas and open `../data/simple_data.csv` as a dataframe. Then, let's look at the data!
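Here is a minimal sketch of that loading step. The contents of `../data/simple_data.csv` are not reproduced here, so the CSV text below is a made-up stand-in that only borrows the column names used in this notebook. In the notebook itself you would pass the file path to `pd.read_csv()` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for ../data/simple_data.csv; every value here is made up.
csv_text = """participant_id,empty_column,missing_values,missing_values_2,placeholder_values,placeholder_values_2
1,,10,5,7,3
2,,,6,NO_DATA,4
,,,,,
4,,12,,9,NO_DATA
"""

# In the notebook this would be: df = pd.read_csv("../data/simple_data.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Look at the first few rows
print(df.head())
```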

#### Checking for null values

Uh-oh. At first glance, this dataset looks pretty bad! We can check for missing data with the `.isna()` method. You can call it on the whole dataframe, a single column, or a single row.

Let's check if there's missing data in the whole dataset using the `.isna()` function.
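As a small sketch, here is `.isna()` on a tiny made-up dataframe (the values are illustrative, not the notebook's real data). The result has the same shape as the dataframe, with `True` wherever a value is missing.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataframe standing in for the notebook's data
df = pd.DataFrame({
    "participant_id": [1.0, 2.0, np.nan],
    "missing_values": [10.0, np.nan, np.nan],
})

# True wherever a value is missing, False everywhere else
missing_mask = df.isna()
print(missing_mask)
```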

That helps, kind of, but it's a lot of `True` and `False` to read through. Let's chain `.sum()` after `.isna()` to count the missing values in each column.
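A quick sketch of the chained call, again on made-up values. Because `True` counts as 1 and `False` as 0, summing the boolean result gives the number of missing values per column.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataframe standing in for the notebook's data
df = pd.DataFrame({
    "participant_id": [1.0, 2.0, np.nan],
    "missing_values": [10.0, np.nan, np.nan],
})

# One count of missing values per column
missing_counts = df.isna().sum()
print(missing_counts)
```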

Let's check if there's missing data in row 2 using the `.iloc[]` indexer we learned last week.
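The same check on a single row might look like the sketch below, using made-up values. Keep in mind that `.iloc[]` counts from zero, so position 2 is the third row.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataframe standing in for the notebook's data
df = pd.DataFrame({
    "participant_id": [1.0, 2.0, 3.0],
    "missing_values": [10.0, 11.0, np.nan],
    "missing_values_2": [5.0, np.nan, 7.0],
})

# Select the row at position 2, then flag its missing values
row_missing = df.iloc[2].isna()
print(row_missing)
```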

This data is a mess! Let's clean it up.

#### Dropping empty columns

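There's a column, aptly titled `empty_column`, that has no values at all! We can drop it using the `.dropna()` method with two parameters: `axis` tells it whether to look at rows or columns, and `how='all'` tells it to drop only the ones where every value is missing.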
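A minimal sketch of that call on made-up data. Note that `missing_values` survives here because it is only partly empty.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataframe with one completely empty column
df = pd.DataFrame({
    "participant_id": [1.0, 2.0, 3.0],
    "empty_column": [np.nan, np.nan, np.nan],
    "missing_values": [10.0, np.nan, 12.0],
})

# Drop columns, but only those where every value is missing
df = df.dropna(axis="columns", how="all")
print(df.columns.tolist())
```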

#### Dropping empty rows

It also looks like there's a row with no data at all! We can drop that using `.dropna()` too, but this time we can set our axis to `rows`.
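The row version could be sketched like this, on made-up data. The sketch spells the row axis as `axis="index"`, which is the name listed in the pandas documentation (`axis=0` means the same thing).

```python
import numpy as np
import pandas as pd

# Tiny made-up dataframe whose middle row is completely empty
df = pd.DataFrame({
    "participant_id": [1.0, np.nan, 3.0],
    "missing_values": [10.0, np.nan, np.nan],
})

# Drop rows, but only those where every value is missing
df = df.dropna(axis="index", how="all")
print(df)
```

The last row is kept because it still has a `participant_id`, even though its `missing_values` entry is empty.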

#### Dropping rows with missing values

Sometimes data can be incomplete. We can drop rows with incomplete data by using `.dropna()` in a slightly different way from above. 

Let's remove rows that have `NaN`, or a null value, in the `missing_values` column.
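One way to do this, sketched on made-up data, is the `subset` parameter of `.dropna()`, which limits the check to the columns you name. This is an assumption about the approach; filtering with a condition would work too.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataframe standing in for the notebook's data
df = pd.DataFrame({
    "participant_id": [1.0, 2.0, 3.0],
    "missing_values": [10.0, np.nan, 12.0],
    "missing_values_2": [np.nan, 6.0, 7.0],
})

# Drop rows only when `missing_values` is null; other columns are ignored
df = df.dropna(subset=["missing_values"])
print(df)
```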

#### Filling missing values

Sometimes we don't want to drop rows just because they're missing values. Sometimes we feel like we can adequately replace NaNs with an actual value.

Let's replace the `NaN` values in the `missing_values_2` column with `.fillna()`!
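Here is a small sketch on made-up data. The fill value of `0` is purely an assumption for illustration; the right replacement depends on what the column actually measures.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataframe standing in for the notebook's data
df = pd.DataFrame({
    "participant_id": [1.0, 2.0, 3.0],
    "missing_values_2": [5.0, np.nan, 7.0],
})

# Fill the gaps in one column with 0 (an illustrative choice)
df["missing_values_2"] = df["missing_values_2"].fillna(0)
print(df)
```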

#### Dropping placeholder values by condition

Sometimes people put things in the data that shouldn't be there at all. In our case, the `NO_DATA` entries in `placeholder_values` are basically a `NaN` but worse: pandas doesn't treat them as missing, so we can't drop rows with them using `.dropna()` like we were able to for our other columns.

Instead, we can ask for a simple slice of the data using conditionals!
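A minimal sketch of that conditional slice, on made-up data: build a True/False condition for each row, then use it to keep only the rows where it is `True`.

```python
import pandas as pd

# Tiny made-up dataframe with a placeholder string mixed into the numbers
df = pd.DataFrame({
    "participant_id": [1.0, 2.0, 3.0],
    "placeholder_values": ["7", "NO_DATA", "9"],
})

# True for rows we want to keep, False for the placeholder rows
keep = df["placeholder_values"] != "NO_DATA"

# Slice the dataframe down to the rows where the condition holds
df = df[keep]
print(df)
```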

#### Replacing placeholder values

Other times we want to replace placeholder values. To do this, we can use the `.replace()` function.
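A sketch of `.replace()` on made-up data. Swapping `NO_DATA` for `NaN` is an assumption made for illustration; you could substitute any value that makes sense for your analysis.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataframe with a placeholder string mixed into the numbers
df = pd.DataFrame({
    "participant_id": [1.0, 2.0, 3.0],
    "placeholder_values_2": ["3", "NO_DATA", "4"],
})

# Turn the placeholder into a real missing value (an illustrative choice)
df["placeholder_values_2"] = df["placeholder_values_2"].replace("NO_DATA", np.nan)
print(df)
```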

#### Fixing column types

Our data's looking much better! There's one last thing we need to do to make sure that it's ready for analysis. Let's check the types of each column. You can do this by appending `.dtypes` to your dataframe's variable name.
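For example, on a tiny made-up dataframe (note that `.dtypes` is an attribute, so it takes no parentheses):

```python
import pandas as pd

# Tiny made-up dataframe: one numeric column, one column of number-like strings
df = pd.DataFrame({
    "participant_id": [1.0, 2.0],
    "placeholder_values": ["7", "9"],
})

# One type per column
column_types = df.dtypes
print(column_types)
```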

Notice there are two different data types being used in this dataframe: `float64`, and `object`. Different types have different rules. These rules can help us create guardrails for ourselves. 

For instance, we probably want to be able to do math on all the numbers in the `placeholder_values` and `placeholder_values_2` columns, but right now they're stored as strings. So let's fix that! Use the `.astype()` method to convert `placeholder_values` and `placeholder_values_2` from a `str` to a `float`.
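One way to sketch this, on made-up data, is to pass `.astype()` a dictionary that maps column names to their new types. This only works once the `NO_DATA` placeholders are gone, since that string can't be turned into a number.

```python
import pandas as pd

# Tiny made-up dataframe whose numbers are stored as strings
df = pd.DataFrame({
    "placeholder_values": ["7", "9"],
    "placeholder_values_2": ["3", "4"],
})

# Convert both columns to floats in one call
df = df.astype({"placeholder_values": float, "placeholder_values_2": float})

# Math now works on these columns
total = df["placeholder_values"].sum()
print(total)
```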

Now check the types of each column again by using `.dtypes`. Notice a change?

    participant_id          float64
    missing_values          float64
    missing_values_2        float64
    placeholder_values      float64
    placeholder_values_2    float64
    dtype: object

One last thing! Right now, the column `participant_id` is a `float64`. We usually don't want or expect to do math on identification numbers, so let's convert that to a `str`.
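A small sketch of that conversion on made-up IDs. One thing to watch for: converting a float straight to a string keeps the decimal point, so `1.0` becomes `"1.0"` rather than `"1"`.

```python
import pandas as pd

# Tiny made-up dataframe with float IDs
df = pd.DataFrame({"participant_id": [1.0, 2.0]})

# Store the IDs as text so nobody does arithmetic on them by accident
df["participant_id"] = df["participant_id"].astype(str)
print(df["participant_id"].tolist())
```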

#### Saving the data

Wow, we just went through a lot! Let's save this data so we don't have to go through this sort of trouble again! Just use the `.to_csv()` function.
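A minimal sketch of the save step. The output file name and the temporary folder are hypothetical choices for illustration; in practice you would pick a path next to your other data. Passing `index=False` is also an optional choice here, which keeps the row numbers out of the file.

```python
import os
import tempfile

import pandas as pd

# Tiny made-up dataframe standing in for the cleaned data
df = pd.DataFrame({
    "participant_id": ["1.0", "2.0"],
    "placeholder_values": [7.0, 9.0],
})

# Hypothetical output location; swap in your own path
out_path = os.path.join(tempfile.gettempdir(), "simple_data_clean.csv")

# Write the dataframe to disk without the row index
df.to_csv(out_path, index=False)
```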