# Dealing with missing data
---

If your data has gaps, you can either fill them in or delete the affected rows or columns entirely. Pandas provides methods for both.

In [1]:
import numpy as np
import pandas as pd

# This DataFrame holds the contract details for our transfer targets, where known.
# 'np.nan' is NumPy's "not a number" value.
# Here it marks missing data.

df = pd.DataFrame(
    {
        'Wage': [150000, 123000, np.nan],
        'GoalBonus': [4000, np.nan, np.nan],
        'ImageRights': [50000, 70000, 100000],
    },
    index=['Konda', 'Makho', 'Grey'],
    columns=['Wage', 'GoalBonus', 'ImageRights'],
)

df

Unnamed: 0,Wage,GoalBonus,ImageRights
Konda,150000.0,4000.0,50000
Makho,123000.0,,70000
Grey,,,100000
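
Before deciding what to do about the gaps, it can help to count them. The sketch below rebuilds the same DataFrame and uses `.isna()` to tally missing values per column and per player. The variable names (`missing_mask`, `missing_per_column`, `missing_per_player`) are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same contract data as above
df = pd.DataFrame(
    {
        'Wage': [150000, 123000, np.nan],
        'GoalBonus': [4000, np.nan, np.nan],
        'ImageRights': [50000, 70000, 100000],
    },
    index=['Konda', 'Makho', 'Grey'],
)

# Boolean mask: True wherever a value is missing
missing_mask = df.isna()

# Summing the mask counts the True values
missing_per_column = missing_mask.sum()        # one count per column
missing_per_player = missing_mask.sum(axis=1)  # one count per row

print(missing_per_column)
print(missing_per_player)
```

These counts are what the row and column removals in the next section act on.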


## Removing rows and columns with missing data

If you decide to bin the players with missing data, use the `.dropna()` method. By default, it drops every row that contains at least one missing value.

In [2]:
df.dropna()

Unnamed: 0,Wage,GoalBonus,ImageRights
Konda,150000.0,4000.0,50000
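
Dropping every incomplete row may be stricter than you need. As a minimal sketch, assuming you only care whether the wage is known, the `subset` argument limits the check to the columns you name. Note that `.dropna()` returns a new DataFrame and leaves `df` untouched. The name `with_wage` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'Wage': [150000, 123000, np.nan],
        'GoalBonus': [4000, np.nan, np.nan],
        'ImageRights': [50000, 70000, 100000],
    },
    index=['Konda', 'Makho', 'Grey'],
)

# Only drop rows where 'Wage' is missing; gaps in other columns are tolerated
with_wage = df.dropna(subset=['Wage'])

print(with_wage)

# The original DataFrame still has all three players
print(len(df))
```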


To do the same for columns, add the argument `axis=1`.

In [3]:
df.dropna(axis=1)

Unnamed: 0,ImageRights
Konda,50000
Makho,70000
Grey,100000
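
If losing every column with a single gap is too aggressive, `how='all'` drops only the columns that are missing entirely. The sketch below adds a hypothetical `ReleaseClause` column with no known values (it is not in the original data) to show the difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'Wage': [150000, 123000, np.nan],
        'GoalBonus': [4000, np.nan, np.nan],
        'ImageRights': [50000, 70000, 100000],
    },
    index=['Konda', 'Makho', 'Grey'],
)

# Hypothetical column for illustration: no values known for any player
df_extra = df.assign(ReleaseClause=np.nan)

# how='all' removes a column only if every value in it is missing
trimmed = df_extra.dropna(axis=1, how='all')

print(trimmed.columns.tolist())
```

Here `Wage` and `GoalBonus` survive because each has at least one known value.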


`.dropna()` can also take the argument `thresh`, which sets the minimum number of non-missing values a row needs in order to be kept. Makho has 2 known values (1 missing), whereas Grey has only 1 (2 missing). Below, `thresh=2` allows Makho into our dataset but continues to exclude Grey.

In [4]:
df.dropna(thresh=2)

Unnamed: 0,Wage,GoalBonus,ImageRights
Konda,150000.0,4000.0,50000
Makho,123000.0,,70000
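
Because `thresh` counts known values rather than missing ones, it is easy to get backwards. The sketch below counts the non-missing values per row by hand and checks that filtering on that count matches `dropna(thresh=2)`. The names `known_per_player` and `kept_manually` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'Wage': [150000, 123000, np.nan],
        'GoalBonus': [4000, np.nan, np.nan],
        'ImageRights': [50000, 70000, 100000],
    },
    index=['Konda', 'Makho', 'Grey'],
)

# Number of known (non-missing) values for each player
known_per_player = df.notna().sum(axis=1)
print(known_per_player)

# Keep the rows with at least 2 known values, done by hand
kept_manually = df[known_per_player >= 2]

# Same selection using thresh
kept_with_thresh = df.dropna(thresh=2)

print(kept_manually.equals(kept_with_thresh))
```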


## Filling in missing data

Sometimes, deleting rows and columns is a bit drastic. You may instead want to simply fill the gaps. Use the `.fillna()` method, passing the desired value as the argument.

In [5]:
df.fillna(value=0)

Unnamed: 0,Wage,GoalBonus,ImageRights
Konda,150000.0,4000.0,50000
Makho,123000.0,0.0,70000
Grey,0.0,0.0,100000
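
A single fill value is not always sensible for every column. A missing goal bonus might reasonably be 0, but a wage of 0 probably is not. As a sketch under that assumption, `.fillna()` also accepts a dictionary mapping column names to fill values, and columns left out of the dictionary keep their gaps.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'Wage': [150000, 123000, np.nan],
        'GoalBonus': [4000, np.nan, np.nan],
        'ImageRights': [50000, 70000, 100000],
    },
    index=['Konda', 'Makho', 'Grey'],
)

# Fill only 'GoalBonus'; 'Wage' is not listed, so Grey's wage stays missing
bonus_filled = df.fillna(value={'GoalBonus': 0})

print(bonus_filled)
```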


You might want to be a bit smarter than filling with 0s. For example, you could fill the gaps in a column with that column's average. `.mean()` skips missing values by default, so the average below comes from the two known wages.

In [6]:
df['Wage'].fillna(value=df['Wage'].mean())

Konda    150000.0
Makho    123000.0
Grey     136500.0
Name: Wage, dtype: float64
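
Like `.dropna()`, `.fillna()` returns a new object, so the result above is lost unless you assign it. The sketch below stores the filled wages back in the column, and then extends the same idea to every column by passing `df.mean()`, which fills each column with its own average. Working on a copy named `df_filled` is an illustrative choice, not part of the original notebook.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'Wage': [150000, 123000, np.nan],
        'GoalBonus': [4000, np.nan, np.nan],
        'ImageRights': [50000, 70000, 100000],
    },
    index=['Konda', 'Makho', 'Grey'],
)

# Work on a copy so the original data stays available
df_filled = df.copy()

# Assign the result back to keep the filled wages
df_filled['Wage'] = df_filled['Wage'].fillna(df_filled['Wage'].mean())

# df.mean() gives one average per column; fillna matches them by column name
all_filled = df.fillna(df.mean())

print(df_filled)
print(all_filled)
```

With only one known goal bonus, the column average is just that single value, so treat mean-filling with caution on sparse columns.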