# More Pandas Datetime Functions/Variable Types


In [2]:
import pandas as pd
url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQcpVvVioO23cndDwr1UmKhndrSq6ES6ZUKZ4fkBBqIAavd1_coVPO_yeOye-Ub-cAWlkX3psJvOU8o/pub?output=csv"
df = pd.read_csv(url)
df['datetime'] = pd.to_datetime(df['date'])
# set 'datetime' as the index
df = df.set_index('datetime')
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1462 entries, 2013-01-01 to 2017-01-01
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   date          1462 non-null   object 
 1   meantemp      1462 non-null   float64
 2   humidity      1462 non-null   float64
 3   wind_speed    1462 non-null   float64
 4   meanpressure  1462 non-null   float64
dtypes: float64(4), object(1)
memory usage: 68.5+ KB


| datetime | date | meantemp | humidity | wind_speed | meanpressure |
|---|---|---|---|---|---|
| 2013-01-01 | 2013-01-01 | 10.0 | 84.5 | 0.0 | 1015.666667 |
| 2013-01-02 | 2013-01-02 | 7.4 | 92.0 | 2.98 | 1017.8 |
| 2013-01-03 | 2013-01-03 | 7.166667 | 87.0 | 4.633333 | 1018.666667 |
| 2013-01-04 | 2013-01-04 | 8.666667 | 71.333333 | 1.233333 | 1017.166667 |
| 2013-01-05 | 2013-01-05 | 6.0 | 86.833333 | 3.7 | 1016.5 |
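
If the Google Sheets URL is unavailable, the same conversion can be sketched on a small inline copy of the five rows shown above. This assumes the sheet's `date` column holds ISO `YYYY-MM-DD` strings, as the printed head suggests, so `format="%Y-%m-%d"` is passed explicitly rather than letting pandas infer it. As in the original cell, `set_index` moves only the new `datetime` column into the index, so `date` remains an ordinary (object) column.

```python
import io

import pandas as pd

# The five rows printed by df.head() above, as inline CSV text
csv_text = """date,meantemp,humidity,wind_speed,meanpressure
2013-01-01,10.0,84.5,0.0,1015.666667
2013-01-02,7.4,92.0,2.98,1017.8
2013-01-03,7.166667,87.0,4.633333,1018.666667
2013-01-04,8.666667,71.333333,1.233333,1017.166667
2013-01-05,6.0,86.833333,3.7,1016.5
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Parse the string dates into datetime64 values (ISO format assumed)
sample["datetime"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

# Move the parsed column into the index; 'date' stays as a regular column
sample = sample.set_index("datetime")

print(type(sample.index).__name__)                         # DatetimeIndex
print(sample.loc[pd.Timestamp("2013-01-03"), "humidity"])  # 87.0
```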


In [3]:
# Create a time delta of 3 days
delta_3d = pd.to_timedelta(3, unit='D')
delta_3d

Timedelta('3 days 00:00:00')
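
A `Timedelta` can be built in several equivalent ways, and adding it to or subtracting it from a `Timestamp` shifts the date by that amount. The minimal sketch below uses 2017-01-01, the last date in the DatetimeIndex reported by `df.info()` above; the other dates are chosen only for illustration.

```python
import pandas as pd

# Three equivalent ways to build a 3-day Timedelta
delta_a = pd.to_timedelta(3, unit="D")
delta_b = pd.Timedelta(days=3)
delta_c = pd.Timedelta("3 days")
print(delta_a == delta_b == delta_c)  # True

# Shift a Timestamp backwards and forwards by the delta
last_day = pd.Timestamp("2017-01-01")  # last date in the index above
print(last_day - delta_a)  # 2016-12-29 00:00:00
print(last_day + delta_a)  # 2017-01-04 00:00:00

# Subtracting two Timestamps gives a Timedelta back
gap = pd.Timestamp("2017-01-04") - pd.Timestamp("2016-12-29")
print(gap.days)  # 6
```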

## Example:
For the most humid day in the weather data, what was the **average wind speed** over the period from 3 days before to 3 days after the most humid day?

In [5]:
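# Option 1: compute the window endpoints, then slice with .loc
# find the index label (date) with the highest humidity
max_date = df['humidity'].idxmax()

# 3 days BEFORE the most humid day
pre_max = max_date - delta_3d
print(pre_max)

# 3 days AFTER the most humid day
post_max = max_date + delta_3d
post_max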

2016-12-29 00:00:00


Timestamp('2017-01-04 00:00:00')

In [6]:
# mean wind speed over every row whose index falls between pre_max and post_max (inclusive)
mean_windspeed = df.loc[pre_max:post_max, 'wind_speed'].mean()
mean_windspeed

4.89791666675
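
Option 1 can be reproduced end to end on a small hypothetical DataFrame (`toy`, with made-up humidity and wind values, not the real weather data). The toy data mirrors one feature of the original: the most humid day is also the last row (2017-01-01, three days after the printed `pre_max`), so the 7-day window runs past the end of the index and the slice quietly returns fewer rows.

```python
import pandas as pd

# Hypothetical daily data; humidity peaks on the last day, 2017-01-01
idx = pd.date_range("2016-12-25", "2017-01-01", freq="D")
toy = pd.DataFrame(
    {
        "humidity": [60, 65, 70, 75, 80, 85, 90, 95],
        "wind_speed": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    },
    index=idx,
)

delta_3d = pd.to_timedelta(3, unit="D")
max_date = toy["humidity"].idxmax()  # index label of the highest humidity

# Label slice on a sorted DatetimeIndex: endpoints need not exist in the index
window = toy.loc[max_date - delta_3d : max_date + delta_3d, "wind_speed"]

# Only 2016-12-29 .. 2017-01-01 exist, so the slice holds 4 rows, not 7
print(len(window))    # 4
print(window.mean())  # 6.5
```

Because the window is silently truncated, it can be worth checking `len(window)` when the number of days averaged matters.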

In [7]:
# Option 2: build the explicit list of dates from pre_max to post_max
date_range = pd.date_range(pre_max, post_max)
date_range

DatetimeIndex(['2016-12-29', '2016-12-30', '2016-12-31', '2017-01-01',
               '2017-01-02', '2017-01-03', '2017-01-04'],
              dtype='datetime64[ns]', freq='D')
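
`pd.date_range` can describe the same window either by its two endpoints or by a start date plus a number of periods; both endpoints are included when they land on the frequency grid. The sketch below hard-codes the `pre_max`/`post_max` dates printed earlier and adds a `freq="2D"` variant purely for illustration.

```python
import pandas as pd

start = pd.Timestamp("2016-12-29")
end = pd.Timestamp("2017-01-04")

# Start + end: every day between the two endpoints, both included
by_end = pd.date_range(start, end)  # freq defaults to daily

# Start + number of periods gives the same 7 days
by_periods = pd.date_range(start, periods=7, freq="D")

# A coarser frequency steps every 2 days between the same endpoints
every_other = pd.date_range(start, end, freq="2D")

print(by_end.equals(by_periods))                 # True
print(len(by_end))                               # 7
print(list(every_other.strftime("%Y-%m-%d")))
```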

In [8]:
# this will give an error
df.loc[date_range,'wind_speed'].mean()

KeyError: "[Timestamp('2017-01-02 00:00:00'), Timestamp('2017-01-03 00:00:00'), Timestamp('2017-01-04 00:00:00')] not in index"

- Can you see what caused the error? Our date range extends beyond the end of the data's index: the last date in `df` is 2017-01-01, so 2017-01-02 through 2017-01-04 do not exist as index labels.

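- Using `.loc` with our pre and post dates did not cause an error because `.loc[pre_max:post_max]` is a slice: on a sorted DatetimeIndex it returns every row whose index falls between the two endpoints (inclusive), even when an endpoint itself is missing. Passing `date_range` instead asks for each date in the list individually, and `.loc` raises a `KeyError` if any of them is absent.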

- The difference is subtle, but understanding your options and how each one works will give you more versatility when writing code.
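
Here is a sketch of three ways to get a wind-speed mean without the `KeyError`, again on a hypothetical `toy` DataFrame whose index ends on 2017-01-01 like the weather data (the values are invented). The first reproduces the error above with a list of labels, then uses a slice (Option 1), an index intersection, and `reindex`.

```python
import pandas as pd

# Hypothetical daily data ending on 2017-01-01, like the weather DataFrame
idx = pd.date_range("2016-12-25", "2017-01-01", freq="D")
toy = pd.DataFrame({"wind_speed": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]}, index=idx)

# 7 requested days; 2017-01-02 .. 2017-01-04 are not in toy.index
date_range = pd.date_range("2016-12-29", "2017-01-04")

# A list of labels with missing dates raises KeyError (the error above)
raised_key_error = False
try:
    toy.loc[date_range, "wind_speed"]
except KeyError:
    raised_key_error = True

# Fix 1: slice between the endpoints instead of listing every date
sliced = toy.loc[date_range[0] : date_range[-1], "wind_speed"]

# Fix 2: keep only the requested dates that actually exist in the index
present = date_range.intersection(toy.index)
listed = toy.loc[present, "wind_speed"]

# Fix 3: reindex to the full range; missing dates become NaN and mean() skips them
reindexed = toy["wind_speed"].reindex(date_range)

print(raised_key_error)                                # True
print(sliced.mean(), listed.mean(), reindexed.mean())  # 6.5 6.5 6.5
print(reindexed.isna().sum())                          # 3
```

Which fix fits best depends on intent: the slice is shortest, `intersection` makes the filtering explicit, and `reindex` keeps a row for every requested date, which helps when the gaps themselves matter.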