### Any time a NaN is encountered, replace it with a scalar value:

In [1]:
# df.my_feature = df.my_feature.fillna(df.my_feature.mean())  # fill one column with its mean
# df = df.fillna(0)                                           # fill every NaN in the frame with 0
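The snippets above are commented out because `df` is not defined yet. Here is a runnable sketch on a small hypothetical frame, assuming a single numeric column called `my_feature` to match the placeholder name. Note that `fillna` returns a new object, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 'my_feature' mirrors the placeholder column name above.
df = pd.DataFrame({"my_feature": [1.0, np.nan, 3.0, np.nan, 5.0]})

# .mean() skips NaNs, so the fill value is the mean of 1, 3 and 5, i.e. 3.0.
df["my_feature"] = df["my_feature"].fillna(df["my_feature"].mean())
print(df["my_feature"].tolist())  # [1.0, 3.0, 3.0, 3.0, 5.0]
```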

**To drop any rows that have missing data.**

In [3]:
# df1.dropna(how='any')
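A minimal sketch contrasting `how='any'` (the default) with `how='all'`, using a small hypothetical `df1`. Like `fillna`, `dropna` returns a new frame and leaves `df1` unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical df1: row 0 is complete, row 1 is partly missing, row 2 is all NaN.
df1 = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [4.0, 5.0, np.nan],
})

# how='any': drop a row if ANY of its values is NaN (only row 0 survives).
any_dropped = df1.dropna(how="any")

# how='all': drop a row only if ALL of its values are NaN (rows 0 and 1 survive).
all_dropped = df1.dropna(how="all")

print(any_dropped.index.tolist())  # [0]
print(all_dropped.index.tolist())  # [0, 1]
```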

**Filling missing data**

In [4]:
# df1.fillna(value=5)
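A short sketch of the `value` argument on a hypothetical `df1`: a scalar fills every NaN with the same value, while a dict mapping column names to values lets each column get its own fill.

```python
import numpy as np
import pandas as pd

# Hypothetical df1 with one NaN in each column.
df1 = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})

# Scalar: every NaN in the frame becomes 5.
filled_scalar = df1.fillna(value=5)

# Dict: column 'a' NaNs become 0, column 'b' NaNs become -1.
filled_per_column = df1.fillna(value={"a": 0, "b": -1})

print(filled_scalar)
print(filled_per_column)
```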

#### When a NaN is encountered, replace it with the nearest valid value before it (forward fill) or after it (backward fill)

In [5]:
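# df.ffill()         # fill forward: each NaN takes the previous valid value (older code: df.fillna(method='ffill'))
# df.bfill()         # fill backward: each NaN takes the next valid value (older code: df.fillna(method='bfill'))
# df.ffill(limit=5)  # forward-fill at most 5 consecutive NaNs; fillna(limit=5) alone raises an error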
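Forward and backward filling can be sketched on a small Series. This uses `Series.ffill` and `Series.bfill`, which are equivalent to the older `fillna(method='ffill')` / `fillna(method='bfill')` form (deprecated since pandas 2.1). On a DataFrame the same methods work column by column.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

forward = s.ffill()          # [1, 1, 1, 1, 5]: each NaN takes the previous valid value
backward = s.bfill()         # [1, 5, 5, 5, 5]: each NaN takes the next valid value
limited = s.ffill(limit=2)   # [1, 1, 1, NaN, 5]: at most 2 consecutive NaNs are filled

print(limited.tolist())
```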

#### Interpolate

In [8]:
import pandas as pd
import numpy as np

In [9]:
s = pd.Series([0, 1, np.nan, 3])

In [10]:
s

0    0.0
1    1.0
2    NaN
3    3.0
dtype: float64

In [11]:
s.interpolate()

0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64
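A slightly larger sketch of the default (linear) interpolation: interior gaps are filled evenly between the known neighbours, treating the points as equally spaced, while a leading NaN has no earlier value to interpolate from and stays NaN by default.

```python
import numpy as np
import pandas as pd

s2 = pd.Series([np.nan, 0.0, np.nan, np.nan, 6.0])

# 0 -> 6 across three steps fills the two interior gaps with 2 and 4.
result = s2.interpolate()
print(result.tolist())  # [nan, 0.0, 2.0, 4.0, 6.0]
```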

## Cleaning/filling missing data

The `fillna` method can “fill in” NA values with non-null data in a couple of ways, illustrated below:

**Replace NA with a scalar value**

In [13]:
# df2 = df2.fillna(0)  # replace every NaN in df2 with 0
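A runnable version on a hypothetical `df2`, showing that `fillna(0)` returns a filled copy and leaves the original frame untouched unless you assign the result back.

```python
import numpy as np
import pandas as pd

# Hypothetical df2 with three missing values across two columns.
df2 = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, np.nan, 6.0]})

filled = df2.fillna(0)  # new frame; df2 itself still has its NaNs
print(filled)
```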

## Dropping Data

In [15]:
# df = df.dropna(axis=0)  # drop every row that contains a NaN
# df = df.dropna(axis=1)  # drop every column that contains a NaN
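A minimal sketch of the two axes on a hypothetical frame with one NaN: `axis=0` removes the row holding it, `axis=1` removes the column holding it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "complete": [1, 2, 3],
    "gappy": [1.0, np.nan, 3.0],
})

rows_kept = df.dropna(axis=0)  # drops row 1, which has a NaN in 'gappy'
cols_kept = df.dropna(axis=1)  # drops column 'gappy', which contains a NaN

print(rows_kept)
print(cols_kept)
```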

**Keep only rows that have at least 4 non-NaN values (rows with fewer are dropped):**

In [16]:
# df = df.dropna(axis=0, thresh=4)
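Because `thresh` is the number of non-NaN values a row needs in order to be *kept*, it is easy to read it backwards. A small hypothetical five-column frame makes the cutoff concrete.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [
        [1, 2, 3, 4, 5],      # 5 non-NaN values -> kept
        [1, 2, 3, 4, nan],    # 4 non-NaN values -> kept (meets thresh=4)
        [1, 2, 3, nan, nan],  # 3 non-NaN values -> dropped
    ],
    columns=list("abcde"),
)

kept = df.dropna(axis=0, thresh=4)
print(kept.index.tolist())  # [0, 1]
```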

### Example

In [17]:
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'year': [2012, 2012, 2013, 2014, 2014], 
        'reports': [4, 24, 31, 2, 3]}

In [18]:
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])

In [19]:
df

|            | name  | reports | year |
|------------|-------|---------|------|
| Cochice    | Jason | 4       | 2012 |
| Pima       | Molly | 24      | 2012 |
| Santa Cruz | Tina  | 31      | 2013 |
| Maricopa   | Jake  | 2       | 2014 |
| Yuma       | Amy   | 3       | 2014 |


#### Drop a row

In [20]:
df.drop(["Cochice","Pima"])

|            | name  | reports | year |
|------------|-------|---------|------|
| Santa Cruz | Tina  | 31      | 2013 |
| Maricopa   | Jake  | 2       | 2014 |
| Yuma       | Amy   | 3       | 2014 |


#### Drop a column

**Note: axis=1 denotes that we are referring to a column, not a row**

In [22]:
df.drop('reports', axis=1)

|            | name  | year |
|------------|-------|------|
| Cochice    | Jason | 2012 |
| Pima       | Molly | 2012 |
| Santa Cruz | Tina  | 2013 |
| Maricopa   | Jake  | 2014 |
| Yuma       | Amy   | 2014 |


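#### Filling holes in the index

After rows are dropped, the index has gaps. `reset_index(drop=True)` renumbers the rows from 0 and discards the old index instead of inserting it as a new column.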

In [23]:
# df = df.reset_index(drop=True)
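A small sketch of resetting a default integer index after a drop, on a hypothetical one-column frame. Without `drop=True`, the old index would be kept as an extra column named `index`.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jason", "Molly", "Tina", "Jake"]})

# Dropping the row labelled 1 leaves the index as [0, 2, 3].
df = df.drop([1])

# Renumber rows 0..n-1 and discard the old labels.
df = df.reset_index(drop=True)
print(df.index.tolist())  # [0, 1, 2]
```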

### More wrangling

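If your data types aren't what you expected, convert them explicitly with the top-level pandas functions:

* `pd.to_datetime()`
* `pd.to_numeric()`
* `pd.to_timedelta()`

For example, with `errors='coerce'`, values that cannot be parsed become NaN instead of raising an error: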

In [24]:
# df.Age = pd.to_numeric(df.Age, errors='coerce')  # unparseable values become NaN
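A runnable sketch of all three conversions on a hypothetical frame whose columns arrived as strings. The column names `Joined` and `Session` and their values are made up for illustration; `Age` matches the snippet above.

```python
import pandas as pd

# Hypothetical raw frame where every column is stored as strings.
df = pd.DataFrame({
    "Age": ["34", "27", "unknown"],
    "Joined": ["2014-01-05", "2015-07-20", "2016-03-11"],
    "Session": ["1 days 02:00:00", "0 days 00:30:00", "3 days 00:00:00"],
})

# errors='coerce' turns 'unknown' into NaN, so Age becomes a float column.
df.Age = pd.to_numeric(df.Age, errors="coerce")
df.Joined = pd.to_datetime(df.Joined)      # strings -> datetime64 values
df.Session = pd.to_timedelta(df.Session)   # strings -> timedelta64 durations

print(df.dtypes)
```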

After fixing up your data types, suppose you want to see all the unique values present in a particular Series. Call **.unique()** on it to get an array of the distinct values, or, if you'd like to know how many times each distinct value appears, call **.value_counts()**. Both are Series methods: a DataFrame has no `.unique()` at all, and `DataFrame.value_counts()` (pandas 1.1+) counts unique *rows* rather than the values of a single column: