# Missing Money

In this activity, you’ll identify and handle missing values in a dataset. 

Instructions:

1. Import the Pandas and `pathlib` libraries.

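2. Use `Path` with the `read_csv` function to read the CSV file into the DataFrame. Use the `index_col`, `parse_dates`, and `infer_datetime_format` parameters to set the Date column as the index. Note that `infer_datetime_format` is deprecated as of pandas 2.0, so the solution in Step 2 sets only `index_col` and `parse_dates`.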

3. Confirm that Pandas properly imported the DataFrame by using the `head` function to view the first five rows.

4. Determine the total number of missing values by using the `isnull` function together with the `sum` function.

5. Drop the rows that have missing values by using the `dropna` function.

6. Confirm that all the missing values have been removed by running the `isnull` function. 

References:

[Pandas read_csv function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)

[Pandas isnull function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.isnull.html)


## Step 1: Import the Pandas and `pathlib` libraries.

In [1]:
import pandas as pd
from pathlib import Path

## Step 2: Use `Path` with the `read_csv` function to read the CSV file into the DataFrame. Use the `index_col`, `parse_dates`, and `infer_datetime_format` parameters to set the Date column as the index.

In [4]:
# Read in the CSV file called "money_flows.csv" using the Path module
# The CSV file is located in the Resources folder
# Set the index to the column "Date"
# Set parse_dates=True so that the index is parsed as datetimes
# (infer_datetime_format is deprecated as of pandas 2.0 and is omitted here)
money_flow_df = pd.read_csv(
    Path('01_missing_money/Resources/money_flows.csv'),
    index_col='Date',
    parse_dates=True
)
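
## Step 3: Confirm that Pandas properly imported the DataFrame by using the `head` function to view the first five rows.

money_flow_df.head()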



Date,Total Payments
2020-01-01,
2020-01-02,1.04
2020-01-03,1.65
2020-01-03,1.65
2020-01-03,1.65
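
As a quick way to see what `index_col` and `parse_dates` do without the real file, the following minimal sketch reads a tiny in-memory CSV. The three rows and the names `csv_text` and `sample_df` are invented for illustration; they only mimic the shape of the output above.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for money_flows.csv (assumption: not the real data).
csv_text = StringIO(
    "Date,Total Payments\n"
    "2020-01-01,\n"
    "2020-01-02,1.04\n"
    "2020-01-03,1.65\n"
)

# index_col picks the Date column as the index; parse_dates=True parses it as datetimes.
sample_df = pd.read_csv(csv_text, index_col="Date", parse_dates=True)

# The index should be a DatetimeIndex rather than plain strings.
print(type(sample_df.index).__name__)

# The empty field in the first row is read as a missing value (NaN).
print(sample_df["Total Payments"].isna().tolist())
```

Checking the index type right after loading is a cheap way to confirm that the dates were parsed before moving on to the missing-value steps.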


## Step 4: Determine the total number of missing values by using the `isnull` function together with the `sum` function.

In [5]:
money_flow_df.isnull()

Date,Total Payments
2020-01-01,True
2020-01-02,False
2020-01-03,False
2020-01-03,False
2020-01-03,False
...,...
2020-12-26,False
2020-12-27,False
2020-12-28,False
2020-12-29,False


In [6]:
money_flow_df.notnull()

Date,Total Payments
2020-01-01,False
2020-01-02,True
2020-01-03,True
2020-01-03,True
2020-01-03,True
...,...
2020-12-26,True
2020-12-27,True
2020-12-28,True
2020-12-29,True


In [7]:
money_flow_df.isna()

Date,Total Payments
2020-01-01,True
2020-01-02,False
2020-01-03,False
2020-01-03,False
2020-01-03,False
...,...
2020-12-26,False
2020-12-27,False
2020-12-28,False
2020-12-29,False


In [8]:
money_flow_df.notna()

Date,Total Payments
2020-01-01,False
2020-01-02,True
2020-01-03,True
2020-01-03,True
2020-01-03,True
...,...
2020-12-26,True
2020-12-27,True
2020-12-28,True
2020-12-29,True
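
The four outputs above come in matching pairs: `isnull` and `isna` produce the same mask, and `notnull` and `notna` produce its inverse. The sketch below checks this on a small made-up DataFrame (`demo_df` is a hypothetical name, not part of the activity).

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
demo_df = pd.DataFrame({"Total Payments": [np.nan, 1.04, 1.65]})

# isnull is an alias of isna, so both masks are identical.
same_mask = demo_df.isnull().equals(demo_df.isna())

# notna (and its alias notnull) is the element-wise negation of isna.
inverse_mask = demo_df.notna().equals(~demo_df.isna())
alias_mask = demo_df.notnull().equals(demo_df.notna())

print(same_mask, inverse_mask, alias_mask)
```

Because the pairs are interchangeable, the choice between them is mostly a matter of style; the remaining cells use `isna`.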


In [9]:
# Count the missing values in each column
money_flow_df.isna().sum()

Total Payments    10
dtype: int64

In [10]:
# Calculate the fraction of values that are missing in each column
money_flow_df.isna().mean()

Total Payments    0.027174
dtype: float64

In [11]:
money_flow_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 368 entries, 2020-01-01 to 2020-12-30
Data columns (total 1 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Total Payments  358 non-null    float64
dtypes: float64(1)
memory usage: 5.8 KB
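
The three outputs above are consistent with one another: 10 missing values out of 368 rows is a share of about 0.027174, which leaves 358 non-null values. The following sketch reproduces that arithmetic on a synthetic column with the same counts; the values themselves are invented and `synthetic_df` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Synthetic column with the same counts as above: 368 rows, 10 of them missing.
values = np.arange(368, dtype=float)
values[:10] = np.nan
synthetic_df = pd.DataFrame({"Total Payments": values})

# Summing a Boolean mask counts the True values.
missing_count = int(synthetic_df["Total Payments"].isna().sum())

# The mean of a Boolean mask is the count of True values divided by the row count.
missing_share = float(synthetic_df["Total Payments"].isna().mean())

# count() reports the non-null values, the same figure that info() displays.
non_null_count = int(synthetic_df["Total Payments"].count())

print(missing_count, round(missing_share, 6), non_null_count)
```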


## Step 5: Drop the rows that have missing values by using the `dropna` function.

In [12]:
# Use the dropna function to eliminate the rows with missing values from the DataFrame
# dropna returns a new DataFrame; money_flow_df itself is unchanged by this call
money_flow_df.dropna()

Date,Total Payments
2020-01-02,1.04
2020-01-03,1.65
2020-01-03,1.65
2020-01-03,1.65
2020-01-04,2.02
...,...
2020-12-26,210.13
2020-12-27,211.08
2020-12-28,213.27
2020-12-29,217.28


In [13]:
# Drop only the rows where the Total Payments column is missing
money_flow_df.dropna(subset = ['Total Payments'])

Date,Total Payments
2020-01-02,1.04
2020-01-03,1.65
2020-01-03,1.65
2020-01-03,1.65
2020-01-04,2.02
...,...
2020-12-26,210.13
2020-12-27,211.08
2020-12-28,213.27
2020-12-29,217.28
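
With a single column, `dropna()` and `dropna(subset=['Total Payments'])` give the same result, as the two outputs above show. The difference only appears when a DataFrame has more than one column. The sketch below assumes a hypothetical second column, `Notes`, which is not part of the activity's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; the "Notes" column is invented for illustration.
two_col_df = pd.DataFrame(
    {
        "Total Payments": [np.nan, 1.04, 1.65],
        "Notes": ["late", None, "ok"],
    }
)

# Without subset, a row is dropped if any column is missing: rows 0 and 1 go.
dropped_any = two_col_df.dropna()

# With subset, only the listed columns are checked: only row 0 goes.
dropped_subset = two_col_df.dropna(subset=["Total Payments"])

print(len(dropped_any), len(dropped_subset))
```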


In [15]:
# Drop the rows with missing values in place, so that money_flow_df itself is modified
money_flow_df.dropna(inplace=True)
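
The difference between a plain `dropna()` call and `dropna(inplace=True)` is easy to miss, so here is a small sketch on made-up data. The names `original_df` and `cleaned_df` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
original_df = pd.DataFrame({"Total Payments": [np.nan, 1.04, 1.65]})

# dropna() returns a new DataFrame and leaves the original untouched.
cleaned_df = original_df.dropna()
rows_before = len(original_df)
rows_in_copy = len(cleaned_df)

# With inplace=True, the DataFrame itself is modified.
original_df.dropna(inplace=True)
rows_after = len(original_df)

print(rows_before, rows_in_copy, rows_after)
```

Reassigning the result, as in `df = df.dropna()`, is an equally valid alternative to `inplace=True` and makes the change explicit.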

## Step 6: Confirm that all the missing values have been removed by running the `isnull` function.

In [16]:
# Count the remaining missing values; every column should report 0
money_flow_df.isna().sum()

Total Payments    0
dtype: int64
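
For DataFrames with many columns, reading the per-column counts can get tedious. As one possible shortcut, the sketch below collapses the check into a single number or a single Boolean. The data and the name `check_df` are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only, with missing rows already dropped.
check_df = pd.DataFrame({"Total Payments": [np.nan, 1.04, 1.65]}).dropna()

# Total missing values across every column; 0 means nothing is missing.
total_missing = int(check_df.isna().sum().sum())

# Equivalent Boolean check: True if any value in any column is missing.
has_missing = bool(check_df.isna().any().any())

print(total_missing, has_missing)
```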