# Data Cleaning

## Modifying Data Values

In [11]:
import pandas as pd

**Cleaning Functions**

**Setting Individual Values**

In [6]:
my_list = [1, 2, 3]
my_list[2]

3

In [7]:
my_list[2] = -42
my_list

[1, 2, -42]

In other words, when we write `my_list[2]` on the *left* side of the assignment operator (a single equals sign), whatever we put on the right side is assigned *into the entry with index 2* of the list.

As you may recall, this same logic can also be extended to two dimensions in `numpy` arrays. Consider the following: 

In [9]:
import numpy as np
my_array = np.array([[1, 2], [3, 4]])
my_array

array([[1, 2],
       [3, 4]])

In [10]:
my_array[1,1] = -42
my_array

array([[  1,   2],
       [  3, -42]])

And finally, we can also extend this logic to our `pandas` DataFrames. For example, using `.iloc`, we can make the same kinds of manipulations we just made with a `numpy` array:

In [17]:
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
df

|   | a | b |
|---|---|---|
| 0 | 1 | 5 |
| 1 | 2 | 6 |
| 2 | 3 | 7 |
| 3 | 4 | 8 |


In [18]:
df.iloc[1,1] = -42
df

|   | a | b |
|---|---|---|
| 0 | 1 | 5 |
| 1 | 2 | -42 |
| 2 | 3 | 7 |
| 3 | 4 | 8 |
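As a small aside, here is a sketch of how the same single-cell edit might look with label-based accessors instead of positions. The `.loc` and `.at` calls are not part of the walkthrough above; they are shown only for comparison, and all three lines target the same cell.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})

# Position-based: row position 1, column position 1 (the 'b' column)
df.iloc[1, 1] = -42

# Label-based: row label 1, column label 'b' (same cell here,
# because the default row labels are 0, 1, 2, 3)
df.loc[1, 'b'] = -42

# .at is a label-based accessor for exactly one cell
df.at[1, 'b'] = -42

print(df)
```

With the default integer index, positions and labels coincide, so all three forms touch the same cell. They diverge once the rows are re-ordered or the index is something other than 0, 1, 2, and so on.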


But this alone is only somewhat useful. After all, our datasets are usually very large, and we rarely know in advance the exact positions of the cells we want to change. Thankfully, in `pandas` we can pass boolean vectors to `.loc` to identify all rows that meet certain conditions and assign values to those specific cells. For example, suppose we wanted to set `b` to 0 for all rows where `a` is even. We could do:

In [25]:
# Recall that x % 2 gives the remainder after 
# dividing x by 2

df.loc[df.a % 2 == 0, 'b'] = 0
df

|   | a | b |
|---|---|---|
| 0 | 1 | 5 |
| 1 | 2 | 0 |
| 2 | 3 | 7 |
| 3 | 4 | 0 |


See how the boolean vector in the first position selected the rows where `a` was even (where `a % 2` is zero), the second entry (`'b'`) selected the column `b`, and then we assigned 0 into those cells? It's just a generalization of the kinds of assignments we did above with lists and numpy arrays, using boolean vectors and column labels instead of indices!
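To make the two pieces of that selection easier to see, here is a minimal sketch that builds the boolean vector as its own variable before using it. The name `is_even` is just an illustrative choice, and the starting values assume the `.iloc` edit above has already been applied.

```python
import pandas as pd

# State of the DataFrame after the earlier .iloc edit
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, -42, 7, 8]})

# The condition yields one True/False value per row
is_even = df['a'] % 2 == 0
print(is_even.tolist())  # [False, True, False, True]

# First entry picks the rows, second entry picks the column
df.loc[is_even, 'b'] = 0
print(df['b'].tolist())  # [5, 0, 7, 0]
```

Naming the mask can also help when the same condition is needed more than once.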

Great! But now suppose we don't just want to set certain values to a constant, but instead want to, say, double the value of `b` in every row where `a` is odd. We can do that too by assigning values that "fit" into the cells on the left of the assignment operator (i.e. by making sure the values we assign have the same dimensions as the cells into which we're trying to assign them):

In [26]:
df.loc[df.a % 2 == 1, 'b'] = df.loc[df.a % 2 == 1, 'b'] * 2
df

|   | a | b |
|---|---|---|
| 0 | 1 | 10 |
| 1 | 2 | 0 |
| 2 | 3 | 14 |
| 3 | 4 | 0 |


Because the values on the right have the same shape (and the same row labels) as the cells selected on the left, `pandas` can insert each one into its matching cell.
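The sketch below spells out the "same shape" idea by checking the selection before assigning into it, and then shows an augmented-assignment shorthand that should give the same result. The variable names (`is_odd`, `selected`, `shorthand`) are illustrative, and the starting values assume the even rows have already been set to 0.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 0, 7, 0]})
shorthand = df.copy()  # untouched copy for the second approach

is_odd = df['a'] % 2 == 1

# The right-hand side is a Series with one value per selected row
selected = df.loc[is_odd, 'b']
print(selected.shape)  # (2,)

# Two values go into two cells
df.loc[is_odd, 'b'] = selected * 2
print(df['b'].tolist())  # [10, 0, 14, 0]

# Shorthand: read, multiply, and write back in one statement
shorthand.loc[is_odd, 'b'] *= 2
print(shorthand['b'].tolist())  # [10, 0, 14, 0]
```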

And that's how you can make edits by hand, instead of by relying on methods like `.replace()`!
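For comparison, here is a rough sketch of one edit done both ways: once with `.replace()` and once by hand with a boolean vector and `.loc`. The specific edit (turning every 0 in `b` into -1) is a made-up example, not something from the walkthrough above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [10, 0, 14, 0]})

# Method-based: returns a new Series with every 0 swapped for -1
replaced = df['b'].replace(0, -1)

# By hand: select the matching cells and assign into them
by_hand = df.copy()
by_hand.loc[by_hand['b'] == 0, 'b'] = -1

print(replaced.tolist())      # [10, -1, 14, -1]
print(by_hand['b'].tolist())  # [10, -1, 14, -1]
```

The by-hand version is longer here, but its condition can be anything that produces a boolean vector, including conditions on other columns.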

## Managing Data Types