# Missing Data (NaN Values)

In [3]:
import pandas as pd
import numpy as np

employee_details = [
        { "idno": 101, "name" : "Ravi",  "salary": 125000.00 },
        { "idno": 102, "name" : np.nan,  "salary": np.nan },
        { "idno": 103, "name" : np.nan,  "salary": 150000.00 },
        { "idno": 104, "name" : "Krishna",  "salary": 105000.00 },
        { "idno": 105, "name" : "Prasad",  "salary": np.nan },
    ]
employee_details

[{'idno': 101, 'name': 'Ravi', 'salary': 125000.0},
 {'idno': 102, 'name': nan, 'salary': nan},
 {'idno': 103, 'name': nan, 'salary': 150000.0},
 {'idno': 104, 'name': 'Krishna', 'salary': 105000.0},
 {'idno': 105, 'name': 'Prasad', 'salary': nan}]

In [4]:
# Converting into Pandas DataFrame Object

df = pd.DataFrame(employee_details)

df

Unnamed: 0,idno,name,salary
0,101,Ravi,125000.0
1,102,,
2,103,,150000.0
3,104,Krishna,105000.0
4,105,Prasad,


### DataFrame.dropna() — What does it do?

* Purpose: Remove rows or columns that contain missing values (NaN) from a DataFrame.

* Helps clean your data by getting rid of incomplete entries.

#### Basic syntax:

df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

#### Parameters explained:

| Parameter | Description                                                                                                   |
| --------- | ------------------------------------------------------------------------------------------------------------- |
| `axis`    | 0 or `'index'` (default): drop rows with missing values<br>1 or `'columns'`: drop columns with missing values |
| `how`     | `'any'` (default): drop if **any** value is `NaN`<br>`'all'`: drop only if **all** values are `NaN`           |
| `thresh`  | Minimum number of **non-NA values** a row/column must have to be kept. Cannot be combined with `how`          |
| `subset`  | Limit checking to specific columns (if dropping rows) or rows (if dropping columns)                           |
| `inplace` | If `True`, do operation inplace and return `None`. Default is `False` (returns a new DataFrame)               |
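
The `how` and `subset` parameters from the table are not exercised in the cells that follow, so here is a minimal sketch of both. It rebuilds the same employee data so that it runs on its own; the variable names (`all_nan_rows_dropped`, `both_missing_dropped`, `salary_known`) are only illustrative.

```python
import numpy as np
import pandas as pd

# Same employee data as above, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "idno": [101, 102, 103, 104, 105],
    "name": ["Ravi", np.nan, np.nan, "Krishna", "Prasad"],
    "salary": [125000.0, np.nan, 150000.0, 105000.0, np.nan],
})

# how='all': drop a row only if every value in it is NaN.
# No row qualifies here, because idno is always present.
all_nan_rows_dropped = df.dropna(how="all")

# subset + how='all': look only at name and salary,
# and drop rows where both of them are NaN.
both_missing_dropped = df.dropna(subset=["name", "salary"], how="all")

# subset alone: drop rows where salary is NaN, ignoring gaps elsewhere.
salary_known = df.dropna(subset=["salary"])

print(all_nan_rows_dropped.index.tolist())   # [0, 1, 2, 3, 4]
print(both_missing_dropped.index.tolist())   # [0, 2, 3, 4]
print(salary_known.index.tolist())           # [0, 2, 3]
```

Restricting the check with `subset` is usually gentler than a bare `dropna()`, since a gap in an unimportant column no longer removes the whole row.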


In [5]:
# Examples of the dropna function.

df

Unnamed: 0,idno,name,salary
0,101,Ravi,125000.0
1,102,,
2,103,,150000.0
3,104,Krishna,105000.0
4,105,Prasad,


In [7]:
df.dropna()  # Remove every row that contains at least one NaN.

Unnamed: 0,idno,name,salary
0,101,Ravi,125000.0
3,104,Krishna,105000.0


In [11]:
df.dropna(axis=1)  # Remove every column that contains at least one NaN.

Unnamed: 0,idno
0,101
1,102
2,103
3,104
4,105


In [17]:
df.dropna(thresh=2)  # Keep only rows with at least 2 non-NaN values (row 1 has just one, so it is dropped).

Unnamed: 0,idno,name,salary
0,101,Ravi,125000.0
2,103,,150000.0
3,104,Krishna,105000.0
4,105,Prasad,
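
Because `thresh` counts non-NaN values rather than NaNs, it can help to compute that count explicitly. The sketch below rebuilds the same data and checks that `dropna(thresh=2)` matches a manual filter on the per-row count; `non_missing` and `manual` are illustrative names, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "idno": [101, 102, 103, 104, 105],
    "name": ["Ravi", np.nan, np.nan, "Krishna", "Prasad"],
    "salary": [125000.0, np.nan, 150000.0, 105000.0, np.nan],
})

# Number of non-NaN values in each row.
non_missing = df.notna().sum(axis=1)
print(non_missing.tolist())          # [3, 1, 2, 3, 2]

# thresh=2 keeps rows whose count is at least 2 ...
kept = df.dropna(thresh=2)

# ... which is the same as filtering on the count by hand.
manual = df[non_missing >= 2]
print(kept.equals(manual))           # True

# Raising the threshold to 3 keeps only fully populated rows.
complete = df.dropna(thresh=3)
print(complete.index.tolist())       # [0, 3]
```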


### DataFrame.fillna() — What does it do?

* Used to fill missing values (NaN) in a DataFrame with a specified value or method.

* Helps replace missing data to avoid errors in analysis or modeling.

#### Basic syntax:

df.fillna(value=None, method=None, axis=None, inplace=False, limit=None)

#### Parameters explained:

| Parameter | Description                                                                                                            |
| --------- | ---------------------------------------------------------------------------------------------------------------------- |
| `value`   | Scalar, dict, Series, or DataFrame to fill `NaN` with. Example: `0` or `{'A': 1, 'B': 2}`                              |
| `method`  | `'ffill'` or `'pad'` to forward fill (use previous value), `'bfill'` or `'backfill'` to backward fill (use next value). Deprecated since pandas 2.1; prefer `df.ffill()` / `df.bfill()` |
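| `axis`    | 0 or `'index'` (default): propagate fills down each column, row to row<br>1 or `'columns'`: propagate across each row, column to column. A scalar `value` without `limit` fills the same cells either way |
| `inplace` | If `True`, modify the DataFrame in place and return `None`. Default is `False` (returns a new DataFrame)         |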
| `limit`   | Max number of consecutive NaNs to fill                                                                                 |
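
The cells below only fill with a single scalar, so here is a minimal sketch of the other options in the table: a per-column dict for `value`, directional fills, and `limit`. It uses `df.ffill()` and `df.bfill()` rather than the `method` argument, on the assumption that a reasonably recent pandas is installed; the result names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "idno": [101, 102, 103, 104, 105],
    "name": ["Ravi", np.nan, np.nan, "Krishna", "Prasad"],
    "salary": [125000.0, np.nan, 150000.0, 105000.0, np.nan],
})

# Dict value: a different replacement for each column.
per_column = df.fillna({"name": "Unknown", "salary": 0.0})

# Forward fill: each NaN takes the previous non-NaN value in its column.
forward = df.ffill()
print(forward["name"].tolist())
# ['Ravi', 'Ravi', 'Ravi', 'Krishna', 'Prasad']

# Backward fill: each NaN takes the next non-NaN value in its column.
# The last salary stays NaN because nothing follows it.
backward = df.bfill()
print(backward["name"].tolist())
# ['Ravi', 'Krishna', 'Krishna', 'Krishna', 'Prasad']

# limit=1: fill at most one consecutive NaN per gap,
# so the second missing name (row 2) is left as NaN.
limited = df.ffill(limit=1)
print(limited["name"].isna().tolist())
# [False, False, True, False, False]
```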


In [18]:
df

Unnamed: 0,idno,name,salary
0,101,Ravi,125000.0
1,102,,
2,103,,150000.0
3,104,Krishna,105000.0
4,105,Prasad,


In [19]:
df.fillna(value='Not Given By Employee')

Unnamed: 0,idno,name,salary
0,101,Ravi,125000.0
1,102,Not Given By Employee,Not Given By Employee
2,103,Not Given By Employee,150000.0
3,104,Krishna,105000.0
4,105,Prasad,Not Given By Employee


In [20]:
df.fillna(value='Not Given By Employee', axis=1)  # With a scalar value, axis does not change the result.

Unnamed: 0,idno,name,salary
0,101,Ravi,125000.0
1,102,Not Given By Employee,Not Given By Employee
2,103,Not Given By Employee,150000.0
3,104,Krishna,105000.0
4,105,Prasad,Not Given By Employee
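
Filling every column with the same string also writes text into the numeric `salary` column, which generally stops it from being a float column. As a hedged alternative, the sketch below fills each column with a value of its own type; using the column mean for `salary` is just one possible choice for illustration, not something the data calls for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "idno": [101, 102, 103, 104, 105],
    "name": ["Ravi", np.nan, np.nan, "Krishna", "Prasad"],
    "salary": [125000.0, np.nan, 150000.0, 105000.0, np.nan],
})

# One string for everything: inspect what happens to the salary dtype.
mixed = df.fillna("Not Given By Employee")
print(mixed.dtypes)

# Per-column fill: text for name, the mean of the known salaries for salary.
# Series.mean() skips NaN by default.
salary_mean = df["salary"].mean()
typed = df.fillna({"name": "Not Given By Employee", "salary": salary_mean})
print(typed.dtypes)
print(typed)
```

With the per-column version, `salary` stays numeric, so later calls such as `typed["salary"].sum()` keep working.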
