**In this notebook, we will learn how to:**
* **Parse dates and times from chronological data**
* **Use the date as the index with the "set_index()" method**
* **Fill missing values with the "fillna()" method**
* **Use the fill methods ("ffill()", "bfill()") and the axis and limit parameters**
* **Estimate missing values with the "interpolate()" method**
* **Interpolate with the "time" method**
* **Remove missing values with the "dropna()" method**
* **Handle missing rows (dates) in a dataframe**

In [5]:
import pandas as pd

df=pd.read_csv("Practice Files/weather_data2.csv")
df

,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/4/2017,,9.0,Sunny
2,1/5/2017,28.0,,Snow
3,1/6/2017,,7.0,
4,1/7/2017,32.0,,Rain
5,1/8/2017,,,Sunny
6,1/9/2017,,,
7,1/10/2017,34.0,8.0,Cloudy
8,1/11/2017,40.0,12.0,Sunny


In [8]:
df.day[0] , type( df.day[0] )

('1/1/2017', str)

> **The default integer index adds nothing here, and "day" is still stored as a plain string.**
> **Therefore we will parse "day" as a date and then set it as the index.**

#### Parsing Date and time from the chronological data

In [16]:
df=pd.read_csv("Practice Files/weather_data2.csv",parse_dates=["day"])
df

,day,temperature,windspeed,event
0,2017-01-01,32.0,6.0,Rain
1,2017-01-04,,9.0,Sunny
2,2017-01-05,28.0,,Snow
3,2017-01-06,,7.0,
4,2017-01-07,32.0,,Rain
5,2017-01-08,,,Sunny
6,2017-01-09,,,
7,2017-01-10,34.0,8.0,Cloudy
8,2017-01-11,40.0,12.0,Sunny


In [17]:
# The type of "day" is now a pandas Timestamp instead of a string
df.day[0] , type( df.day[0] )

(Timestamp('2017-01-01 00:00:00'), pandas._libs.tslibs.timestamps.Timestamp)
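
The same conversion can also be done after loading, with "pd.to_datetime()". The sketch below is only an illustration: the sample strings are made up in the style of the "day" column, and passing an explicit format is an assumption about how the file is written (month/day/year).

```python
import pandas as pd

# Hypothetical sample strings in the same month/day/year style as the "day" column
raw_days = pd.Series(["1/1/2017", "1/4/2017", "1/11/2017"])

# An explicit format avoids any month-first vs day-first ambiguity
parsed = pd.to_datetime(raw_days, format="%m/%d/%Y")

print(parsed.dtype)
print(parsed[0], type(parsed[0]))
```

With the format spelled out, "1/4/2017" is read as 4 January rather than 1 April.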

#### Use date as an index using "set_index()" method

In [18]:
# Set "day" as the index (inplace=True modifies df directly)
df.set_index("day",inplace=True)
df

day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,,9.0,Sunny
2017-01-05,28.0,,Snow
2017-01-06,,7.0,
2017-01-07,32.0,,Rain
2017-01-08,,,Sunny
2017-01-09,,,
2017-01-10,34.0,8.0,Cloudy
2017-01-11,40.0,12.0,Sunny
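
Parsing and indexing can probably be combined into one "read_csv()" call. A minimal sketch, assuming a small inline CSV (hypothetical, standing in for the practice file) so that it runs on its own:

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for "weather_data2.csv"
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,,9,Sunny
1/5/2017,28,,Snow
"""

# parse_dates converts the column; index_col makes it the index in the same step
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=["day"], index_col="day")

print(type(indexed.index))
print(indexed)
```

This avoids the separate "set_index()" step and leaves no extra integer index behind.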


#### Use of "fillna()" method

In [19]:
new_df = df.fillna(0)
new_df

day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,0.0,9.0,Sunny
2017-01-05,28.0,0.0,Snow
2017-01-06,0.0,7.0,0
2017-01-07,32.0,0.0,Rain
2017-01-08,0.0,0.0,Sunny
2017-01-09,0.0,0.0,0
2017-01-10,34.0,8.0,Cloudy
2017-01-11,40.0,12.0,Sunny


In [31]:
#NOTE
new_df = df.fillna(0,limit=1)
new_df

# limit sets the maximum number of NaN values that are filled in each column; with limit=1 only the first NaN of every column is replaced.

day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,0.0,9.0,Sunny
2017-01-05,28.0,0.0,Snow
2017-01-06,,7.0,0
2017-01-07,32.0,,Rain
2017-01-08,,,Sunny
2017-01-09,,,
2017-01-10,34.0,8.0,Cloudy
2017-01-11,40.0,12.0,Sunny
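
To see that "limit" counts NaN values per column, here is a small sketch on a made-up numeric frame (the values are hypothetical, not the weather file):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with several NaN values per column
demo = pd.DataFrame({
    "temperature": [32.0, np.nan, 28.0, np.nan, np.nan],
    "windspeed": [6.0, 9.0, np.nan, np.nan, 8.0],
})

# limit=1 fills at most one NaN in each column, counting from the top
filled = demo.fillna(0, limit=1)

print(filled)
print(filled.isna().sum())  # NaN values that are still left in each column
```

Each column gets exactly one 0, at its first NaN; the later NaN values stay untouched.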


> **Here we can see that 0 makes no sense as a value in the "event" column.**

In [21]:
# To fix this, pass a dictionary that maps each column to its own fill value

new_df = df.fillna({
    "temperature":0, 
    "windspeed":0,
    "event":"no event"
})
new_df

day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,0.0,9.0,Sunny
2017-01-05,28.0,0.0,Snow
2017-01-06,0.0,7.0,no event
2017-01-07,32.0,0.0,Rain
2017-01-08,0.0,0.0,Sunny
2017-01-09,0.0,0.0,no event
2017-01-10,34.0,8.0,Cloudy
2017-01-11,40.0,12.0,Sunny


> **Still, 0 is not a realistic estimate for a missing temperature or windspeed.**

> **This is where the fill methods come in.**

In [39]:
# Forward fill: each NaN takes the last valid value above it
new_df = df.ffill()
new_df

# The axis parameter of "ffill()" and "bfill()" controls whether values are copied down the rows (default) or across the columns

day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,32.0,9.0,Sunny
2017-01-05,28.0,9.0,Snow
2017-01-06,28.0,7.0,Snow
2017-01-07,32.0,7.0,Rain
2017-01-08,32.0,7.0,Sunny
2017-01-09,32.0,7.0,Sunny
2017-01-10,34.0,8.0,Cloudy
2017-01-11,40.0,12.0,Sunny
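
The effect of "axis" and "limit" on "ffill()" can be sketched on a tiny made-up frame (hypothetical values). Note that filling across columns copies a temperature into windspeed here, so it mostly makes sense when the columns hold comparable quantities.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame
demo = pd.DataFrame({
    "temperature": [1.0, np.nan, np.nan, 4.0],
    "windspeed": [10.0, 20.0, np.nan, np.nan],
})

down = demo.ffill()                  # default: copy values down each column
capped = demo.ffill(limit=1)         # copy down, but at most one step per gap
across = demo.ffill(axis="columns")  # copy values left to right within a row

print(down)
print(capped)
print(across)
```

In "across", the first column can never be filled because nothing sits to its left.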


In [25]:
# Backward fill: each NaN takes the next valid value below it
new_df = df.bfill()
new_df

day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,28.0,9.0,Sunny
2017-01-05,28.0,7.0,Snow
2017-01-06,32.0,7.0,Rain
2017-01-07,32.0,8.0,Rain
2017-01-08,34.0,8.0,Sunny
2017-01-09,34.0,8.0,Cloudy
2017-01-10,34.0,8.0,Cloudy
2017-01-11,40.0,12.0,Sunny


> **The gaps are filled now, but copying a neighbouring value is still only a rough estimate.**

> **For a better estimate, we use "interpolate()".**

#### Use of "interpolate()" method

In [37]:
new_df = df.interpolate()
new_df  # Any warning pandas prints here can be ignored for now; only the numeric columns are interpolated



day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,30.0,9.0,Sunny
2017-01-05,28.0,8.0,Snow
2017-01-06,30.0,7.0,
2017-01-07,32.0,7.25,Rain
2017-01-08,32.666667,7.5,Sunny
2017-01-09,33.333333,7.75,
2017-01-10,34.0,8.0,Cloudy
2017-01-11,40.0,12.0,Sunny


> **The default method is linear interpolation: each missing value is placed on a straight line between the nearest known values above and below it, treating all rows as equally spaced.**
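
The arithmetic behind linear interpolation can be checked by hand. A minimal sketch on a made-up temperature series (hypothetical values chosen to resemble the column above):

```python
import numpy as np
import pandas as pd

# Hypothetical temperatures with one single gap and one double gap
temps = pd.Series([28.0, np.nan, 32.0, np.nan, np.nan, 34.0])

# Default method="linear": rows are treated as equally spaced
linear = temps.interpolate()

# The same arithmetic by hand for the double gap between 32 and 34:
# step k of n + 1 equal steps, where n is the number of missing values
n = 2
by_hand = [32.0 + (34.0 - 32.0) * k / (n + 1) for k in (1, 2)]

print(linear.tolist())
print(by_hand)
```

A single gap lands on the midpoint of its neighbours; a longer gap is divided into equal steps.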

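> **But the dates are not equally spaced (2017-01-02 and 2017-01-03 are missing), so the estimate for 2017-01-04 is off.**

> **Therefore, we will use the "time" method, which weights the estimate by the actual time gaps in the index.**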

In [40]:
new_df = df.interpolate(method='time')
new_df



day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,29.0,9.0,Sunny
2017-01-05,28.0,8.0,Snow
2017-01-06,30.0,7.0,
2017-01-07,32.0,7.25,Rain
2017-01-08,32.666667,7.5,Sunny
2017-01-09,33.333333,7.75,
2017-01-10,34.0,8.0,Cloudy
2017-01-11,40.0,12.0,Sunny


> Now the estimates respect the gaps between dates: the temperature for 2017-01-04 becomes 29.0 instead of 30.0.
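
Where the 29.0 comes from can be worked out by hand. A small sketch using three rows in the style of the weather data (built inline here so it runs on its own):

```python
import numpy as np
import pandas as pd

# Three rows from the weather data; 2 and 3 January are absent from the index
index = pd.to_datetime(["2017-01-01", "2017-01-04", "2017-01-05"])
temps = pd.Series([32.0, np.nan, 28.0], index=index)

by_position = temps.interpolate()            # ignores the dates
by_time = temps.interpolate(method="time")   # uses the dates

# By hand: 4 January is 3 of the 4 days from 1 January to 5 January
by_hand = 32.0 + (28.0 - 32.0) * 3 / 4

print(by_position.iloc[1], by_time.iloc[1], by_hand)
```

The "time" method needs a DatetimeIndex, which is why "day" was set as the index earlier.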

#### Use of "dropna()" method

In [42]:
new_df = df.dropna() # removes every row that contains at least one NA value
new_df

day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-10,34.0,8.0,Cloudy
2017-01-11,40.0,12.0,Sunny


In [46]:
new_df = df.dropna(how="all") 
new_df

#‘any’ : If any NA values are present, drop that row or column.
#‘all’ : If all values are NA, drop that row or column.

day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,,9.0,Sunny
2017-01-05,28.0,,Snow
2017-01-06,,7.0,
2017-01-07,32.0,,Rain
2017-01-08,,,Sunny
2017-01-10,34.0,8.0,Cloudy
2017-01-11,40.0,12.0,Sunny


In [47]:
new_df = df.dropna(thresh=1) # "thresh" is the minimum number of non-NA values a row needs to be kept; thresh=1 keeps any row with at least one value
new_df

day,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-04,,9.0,Sunny
2017-01-05,28.0,,Snow
2017-01-06,,7.0,
2017-01-07,32.0,,Rain
2017-01-08,,,Sunny
2017-01-10,34.0,8.0,Cloudy
2017-01-11,40.0,12.0,Sunny
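
How "thresh" relates to the number of non-NA values per row can be sketched on a small made-up frame (hypothetical values):

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose rows hold 3, 2, 1 and 0 non-NA values
demo = pd.DataFrame({
    "temperature": [32.0, np.nan, np.nan, np.nan],
    "windspeed": [6.0, 9.0, np.nan, np.nan],
    "event": ["Rain", "Sunny", "Sunny", None],
})

non_na = demo.notna().sum(axis=1)      # non-NA count for every row
keep_any_data = demo.dropna(thresh=1)  # needs at least 1 non-NA value
keep_two = demo.dropna(thresh=2)       # needs at least 2 non-NA values
same_as_all = demo.dropna(how="all")   # drops only fully empty rows

print(non_na.tolist())
print(keep_any_data.index.tolist(), keep_two.index.tolist())
```

In this sketch thresh=1 and how="all" keep the same rows, which matches the two identical outputs above.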


#### Handling Missing rows in dataframe

> Some dates are missing from the "day" index: 2017-01-02 and 2017-01-03 have no rows at all.

In [49]:
# date_range already returns a DatetimeIndex, so it can be passed to reindex directly
new_index = pd.date_range("2017-01-01", "2017-01-11")
new_df = df.reindex(new_index)
new_df

,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-02,,,
2017-01-03,,,
2017-01-04,,9.0,Sunny
2017-01-05,28.0,,Snow
2017-01-06,,7.0,
2017-01-07,32.0,,Rain
2017-01-08,,,Sunny
2017-01-09,,,
2017-01-10,34.0,8.0,Cloudy
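
The missing dates do not have to be spotted by eye. A minimal sketch, assuming a small made-up frame with a DatetimeIndex, that builds the full daily range from the data itself and lists what is absent:

```python
import pandas as pd

# Hypothetical frame with a gap of two days in its index
index = pd.to_datetime(["2017-01-01", "2017-01-04", "2017-01-05"])
demo = pd.DataFrame({"temperature": [32.0, 30.0, 28.0]}, index=index)

# Full daily range between the first and last date present in the data
full_range = pd.date_range(demo.index.min(), demo.index.max(), freq="D")

# Dates that are in the full range but not in the data
missing = full_range.difference(demo.index)

# reindex adds the missing dates as rows filled with NaN
reindexed = demo.reindex(full_range)

print(list(missing))
print(reindexed)
```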


In [50]:
new_df.interpolate(method="time",inplace=True)
new_df



,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-02,31.0,7.0,
2017-01-03,30.0,8.0,
2017-01-04,29.0,9.0,Sunny
2017-01-05,28.0,8.0,Snow
2017-01-06,30.0,7.0,
2017-01-07,32.0,7.25,Rain
2017-01-08,32.666667,7.5,Sunny
2017-01-09,33.333333,7.75,
2017-01-10,34.0,8.0,Cloudy


In [51]:
new_df.fillna({
    "event":"No Event"
})

,temperature,windspeed,event
2017-01-01,32.0,6.0,Rain
2017-01-02,31.0,7.0,No Event
2017-01-03,30.0,8.0,No Event
2017-01-04,29.0,9.0,Sunny
2017-01-05,28.0,8.0,Snow
2017-01-06,30.0,7.0,No Event
2017-01-07,32.0,7.25,Rain
2017-01-08,32.666667,7.5,Sunny
2017-01-09,33.333333,7.75,No Event
2017-01-10,34.0,8.0,Cloudy
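
The three steps of this section (reindex, interpolate, fill the text column) can be put together as below. This is a sketch on a small made-up frame, not the practice file, and interpolating only the numeric columns is a choice made here so that the text column is never passed to "interpolate()".

```python
import pandas as pd

# Hypothetical frame with two missing dates (2 and 3 January)
index = pd.to_datetime(["2017-01-01", "2017-01-04", "2017-01-05"])
demo = pd.DataFrame({
    "temperature": [32.0, 29.0, 28.0],
    "windspeed": [6.0, 9.0, 8.0],
    "event": ["Rain", "Sunny", "Snow"],
}, index=index)

# 1. Add the missing dates as empty rows
full = demo.reindex(pd.date_range("2017-01-01", "2017-01-05", freq="D"))

# 2. Interpolate the numeric columns only, weighted by time
numeric_cols = ["temperature", "windspeed"]
full[numeric_cols] = full[numeric_cols].interpolate(method="time")

# 3. Give the text column an explicit placeholder
full = full.fillna({"event": "No Event"})

print(full)
```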
