# **Introduction to Dates & Time with pandas**

This jupyter notebook can be found on my GitHub account: https://github.com/mbonnemaison/Learning-Python
### **pandas** is a Python library that facilitates the analysis of data organized in tables.

### Sources:
- Installation instructions, an introduction to pandas, and the user guide: https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html
- *Python for Data Analysis* by Wes McKinney (2nd edition used here): Chapter 5 (Introduction) and Chapter 11 (Time Series)
- Video on data analysis (see the comments to jump to the part you're interested in): https://www.youtube.com/watch?v=r-uOLxNrNk8&list=RDCMUC8butISFwT-Wl7EV0hUK0BQ&index=3

## Introduction to Time & Dates
Some of the elementary data structures for working with date & time data are:

- **Timestamps**: specific instants in time
- **Timedeltas**: durations, i.e. the difference between two timestamps (for example, 3 days and 2 hours)

### **Timestamp**
Python provides date and time functionality in the **datetime** module, which contains three commonly used classes:

- **date**: to work with dates (day, month, year)
- **time**: to work with times (hours, minutes, seconds, microseconds)
- **datetime**: to work with components of both date and time

***Timestamp*** is the pandas equivalent of Python's datetime.datetime object and is interchangeable with it in most cases. It is the type used for the entries that make up a DatetimeIndex and other time-series-oriented data structures in pandas.
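To make the "interchangeable in most cases" point concrete, here is a minimal sketch. The variable names are only illustrative; it assumes pandas is installed and uses an arbitrary example date.

```python
from datetime import datetime

import pandas as pd

py_dt = datetime(2021, 4, 4, 14, 5, 9)

# Build a pandas Timestamp from a standard-library datetime
ts = pd.Timestamp(py_dt)

# Timestamp is a subclass of datetime.datetime, so it passes isinstance checks
is_datetime = isinstance(ts, datetime)

# Convert back to a plain datetime when another library requires one
back = ts.to_pydatetime()

print(is_datetime, ts == py_dt, type(back).__name__)
```

Because of the subclass relationship, a Timestamp can usually be passed to code written for datetime objects, while `to_pydatetime()` covers the cases where an exact datetime is needed.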

### **What day is today? What time is it?**
**Method 1**: datetime module

In [1]:
from datetime import datetime
# Import the datetime class from the datetime module

In [3]:
now = datetime.now()

In [4]:
now

datetime.datetime(2021, 4, 4, 14, 5, 9, 636458)

In [7]:
now.date()

datetime.date(2021, 4, 4)

In [8]:
now.time()

datetime.time(14, 5, 9, 636458)

In [9]:
mydate = datetime(2021,10,4, 23, 12, 34)

In [10]:
mydate

datetime.datetime(2021, 10, 4, 23, 12, 34)

In [11]:
mydate.time()

datetime.time(23, 12, 34)

**Method 2**: pandas library

In [12]:
import pandas as pd

In [15]:
now = pd.to_datetime('now', utc = True)

In [14]:
now

Timestamp('2021-04-04 18:06:54.715536')

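In [18]:
now_utc = now.tz_localize('US/Eastern')
# tz_localize attaches a time zone to a naive Timestamp.
# `now` is already timezone-aware (UTC), so this call fails; use tz_convert instead.

TypeError: Cannot localize tz-aware Timestamp, use tz_convert for conversions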

In [172]:
now_utc

Timestamp('2021-04-04 15:38:48.619917+0000', tz='UTC')

In [16]:
now_est = now.tz_convert('US/Eastern')

In [17]:
now_est

Timestamp('2021-04-04 14:08:00.308670-0400', tz='US/Eastern')
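The difference between `tz_localize` and `tz_convert` can be sketched with a fixed timestamp instead of the current time. This is an illustrative example: the timestamp value is arbitrary, and `'America/New_York'` is used as the canonical name for the US Eastern zone (an assumption about which zone database name is available on your system).

```python
import pandas as pd

# A naive Timestamp carries no time zone information
naive = pd.Timestamp('2021-04-04 18:06:54')

# tz_localize: declare which zone a naive Timestamp belongs to (the clock time is unchanged)
utc = naive.tz_localize('UTC')

# tz_convert: express the same instant in another zone (the clock time changes)
eastern = utc.tz_convert('America/New_York')

# Localizing an already tz-aware Timestamp raises a TypeError
message = None
try:
    utc.tz_localize('America/New_York')
except TypeError as err:
    message = str(err)

print(utc, eastern, message, sep='\n')
```

Both `utc` and `eastern` describe the same instant; only the wall-clock representation differs.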

### **Convert strings to Datetimes**
**Method #1**: Strings can be converted to dates using **datetime.strptime**.

Note: Information on format codes can be found here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

In [21]:
datetime.strptime('2021/4/25 5:46:23', '%Y/%m/%d %H:%M:%S')

datetime.datetime(2021, 4, 25, 5, 46, 23)
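The same format string works in both directions: `strptime` parses text into a datetime and `strftime` formats a datetime back into text. A small sketch using only the standard library (variable names are illustrative):

```python
from datetime import datetime

fmt = '%Y/%m/%d %H:%M:%S'

# Parse: string -> datetime (non-padded fields such as "4" and "5" are accepted)
parsed = datetime.strptime('2021/4/25 5:46:23', fmt)

# Format: datetime -> string (fields are zero-padded on output)
formatted = parsed.strftime(fmt)

print(parsed)
print(formatted)
```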

**Method #2**: Strings can be converted to dates using **pd.to_datetime**.

Note: Information on format codes can be found here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

In [105]:
import pandas as pd

In [22]:
pd.to_datetime('2021-02-19 22:45:56')

Timestamp('2021-02-19 22:45:56')

In [24]:
pd.to_datetime('2021-02-19 22:45:56', format = '%Y-%m-%d %H:%M:%S')
# The format must describe the whole string, including the time part

Timestamp('2021-02-19 22:45:56')
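An explicit `format` matters most when a string is ambiguous. The sketch below, with a made-up date string, shows the same text parsed as day-first and as month-first:

```python
import pandas as pd

text = '04/03/2021'

# Read as day/month/year: 4 March 2021
day_first = pd.to_datetime(text, format='%d/%m/%Y')

# Read as month/day/year: 3 April 2021
month_first = pd.to_datetime(text, format='%m/%d/%Y')

print(day_first, month_first)
```

Stating the format also documents which convention the data source uses.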

### **Convert a list of dates from string to datetime or Timestamp**
**Method 1**: using the datetime module

In [26]:
date_list_str = ['2021-03-14', '2020-12-25', '2025-02-19']

In [27]:
[datetime.strptime(x, '%Y-%m-%d') for x in date_list_str]

[datetime.datetime(2021, 3, 14, 0, 0),
 datetime.datetime(2020, 12, 25, 0, 0),
 datetime.datetime(2025, 2, 19, 0, 0)]

**Method 2**: using pandas

In [108]:
date_list_str = ['2021-03-14', '2020-12-25', '2025-02-19']

In [28]:
pd.to_datetime(date_list_str)

DatetimeIndex(['2021-03-14', '2020-12-25', '2025-02-19'], dtype='datetime64[ns]', freq=None)

### **Dealing with missing values**
**Method 1**: using the datetime module

In [32]:
date_list_str2 = ['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', None]

In [33]:
[datetime.strptime(x, '%Y-%m-%d') for x in date_list_str2]

TypeError: strptime() argument 1 must be str, not None
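With the datetime module, missing values have to be handled by hand. One possible workaround is a small helper that passes `None` through unchanged; `parse_or_none` is a hypothetical name introduced here for illustration, not part of the standard library.

```python
from datetime import datetime


def parse_or_none(text, fmt='%Y-%m-%d'):
    """Parse text with strptime, returning None for missing values."""
    if text is None:
        return None
    return datetime.strptime(text, fmt)


date_list_str2 = ['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', None]

# Missing entries stay None instead of raising an exception
parsed = [parse_or_none(x) for x in date_list_str2]
print(parsed)
```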

**Method 2**: using pandas

In [38]:
date_list_str2 = ['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', None]

In [39]:
pd.to_datetime(date_list_str2)

DatetimeIndex(['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', 'NaT'], dtype='datetime64[ns]', freq=None)

**NaT** means Not a Time: it is the pandas marker for a missing datetime value, analogous to NaN for numbers.
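Invalid strings can be treated like missing values as well. As a sketch, passing `errors='coerce'` to `pd.to_datetime` turns entries that cannot be parsed into NaT instead of raising an error (the input list is made up for illustration):

```python
import pandas as pd

raw = ['2021-03-14', 'not a date', None]

# errors='coerce' replaces unparseable entries with NaT
parsed = pd.to_datetime(raw, errors='coerce')

# isna() flags the NaT entries; dropna() removes them
missing = parsed.isna()
valid = parsed.dropna()

print(parsed)
print(list(missing))
```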

### **Reading data from a csv file using pandas**
More information on data here: https://github.com/mbonnemaison/adelego

In [40]:
data = pd.read_csv("data3months.csv", sep = '\t')

In [42]:
data

| | Date | Equipment | Type | Value | Unit |
| --- | --- | --- | --- | --- | --- |
| 0 | 2021-01-01 00:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 38000000000 | % |
| 1 | 2021-01-01 01:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 36500000000 | % |
| 2 | 2021-01-01 02:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 35500000000 | % |
| 3 | 2021-01-01 03:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 35000000000 | % |
| 4 | 2021-01-01 03:15:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 35000000000 | % |
| ... | ... | ... | ... | ... | ... |
| 15045 | 2021-03-31 19:00:00 | 5MultiSensor 6 (ZW100) | UV | 0 | |
| 15046 | 2021-03-31 20:00:00 | 5MultiSensor 6 (ZW100) | UV | 0 | |
| 15047 | 2021-03-31 21:00:00 | 5MultiSensor 6 (ZW100) | UV | 0 | |
| 15048 | 2021-03-31 22:00:00 | 5MultiSensor 6 (ZW100) | UV | 0 | |


In [43]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15050 entries, 0 to 15049
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Date       15050 non-null  object
 1   Equipment  15050 non-null  object
 2   Type       15050 non-null  object
 3   Value      15050 non-null  object
 4   Unit       6582 non-null   object
dtypes: object(5)
memory usage: 588.0+ KB


In [46]:
data['Date']

0        2021-01-01 00:00:00
1        2021-01-01 01:00:00
2        2021-01-01 02:00:00
3        2021-01-01 03:00:00
4        2021-01-01 03:15:00
                ...         
15045    2021-03-31 19:00:00
15046    2021-03-31 20:00:00
15047    2021-03-31 21:00:00
15048    2021-03-31 22:00:00
15049    2021-03-31 23:00:00
Name: Date, Length: 15050, dtype: object

In [47]:
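# A DataFrame column such as data['Date'] is a pandas Series; for comparison, a Series built from a list:
pd.Series([1,2,3,4,5])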

0    1
1    2
2    3
3    4
4    5
dtype: int64

### **Convert "Date" from string to timestamp**
**Method 1**: using the datetime module

In [48]:
data["Date"] = [datetime.strptime(x, '%Y-%m-%d %H:%M:%S') for x in data["Date"]]

In [49]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15050 entries, 0 to 15049
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       15050 non-null  datetime64[ns]
 1   Equipment  15050 non-null  object        
 2   Type       15050 non-null  object        
 3   Value      15050 non-null  object        
 4   Unit       6582 non-null   object        
dtypes: datetime64[ns](1), object(4)
memory usage: 588.0+ KB


In [51]:
data["Date"][23]

Timestamp('2021-01-01 22:00:00')

**Method 2**: using pandas

In [52]:
data3 = pd.read_csv("data3months.csv", sep = '\t')

In [53]:
data3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15050 entries, 0 to 15049
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Date       15050 non-null  object
 1   Equipment  15050 non-null  object
 2   Type       15050 non-null  object
 3   Value      15050 non-null  object
 4   Unit       6582 non-null   object
dtypes: object(5)
memory usage: 588.0+ KB


In [54]:
data3["Date"] = pd.to_datetime(data3["Date"])

In [55]:
data3["Date"][0]

Timestamp('2021-01-01 00:00:00')
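The conversion can also happen while the file is read, using the `parse_dates` argument of `pd.read_csv`. The sketch below uses a small in-memory, tab-separated sample instead of the real file; the sample rows and the reduced set of columns are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample mimicking the structure of the data file
sample = (
    "Date\tType\tValue\n"
    "2021-01-01 00:00:00\tHUMIDITY\t38.0\n"
    "2021-01-01 01:00:00\tHUMIDITY\t36.5\n"
    "\tHUMIDITY\t35.5\n"
)

# parse_dates converts the listed columns to datetimes during loading;
# the empty Date field in the last row becomes NaT
df = pd.read_csv(io.StringIO(sample), sep='\t', parse_dates=['Date'])

print(df.dtypes)
print(df)
```

With a real file, `io.StringIO(sample)` would simply be replaced by the file path.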

**Missing values in a DataFrame**: pd.to_datetime converts missing dates in a column to NaT.

In [56]:
data4 = pd.read_csv("data3months-Copy1.csv", sep = '\t')

In [57]:
data4.head(10)

| | Date | Equipment | Type | Value | Unit |
| --- | --- | --- | --- | --- | --- |
| 0 | | 5MultiSensor 6 (ZW100) | HUMIDITY | 38000000000 | % |
| 1 | | 5MultiSensor 6 (ZW100) | HUMIDITY | 36500000000 | % |
| 2 | | 5MultiSensor 6 (ZW100) | HUMIDITY | 35500000000 | % |
| 3 | | 5MultiSensor 6 (ZW100) | HUMIDITY | 35000000000 | % |
| 4 | | 5MultiSensor 6 (ZW100) | HUMIDITY | 35000000000 | % |
| 5 | 2021-01-01 04:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 34250000000 | % |
| 6 | 2021-01-01 05:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 34000000000 | % |
| 7 | 2021-01-01 06:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 33000000000 | % |
| 8 | 2021-01-01 07:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 33250000000 | % |
| 9 | 2021-01-01 08:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 35500000000 | % |


In [58]:
data4.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15050 entries, 0 to 15049
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Date       15045 non-null  object
 1   Equipment  15050 non-null  object
 2   Type       15050 non-null  object
 3   Value      15050 non-null  object
 4   Unit       6582 non-null   object
dtypes: object(5)
memory usage: 588.0+ KB


In [59]:
data4["Date"] = pd.to_datetime(data4["Date"])

In [60]:
data4.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15050 entries, 0 to 15049
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       15045 non-null  datetime64[ns]
 1   Equipment  15050 non-null  object        
 2   Type       15050 non-null  object        
 3   Value      15050 non-null  object        
 4   Unit       6582 non-null   object        
dtypes: datetime64[ns](1), object(4)
memory usage: 588.0+ KB


In [62]:
data4["Date"][33]

Timestamp('2021-01-02 07:00:00')

### **Data manipulations with Timestamps in pandas**
**Select rows**

In [None]:
data.loc[(data['Date'] > '2021-02-01') & (data['Date'] < '2021-02-02') & (data['Type'] == 'HUMIDITY')]

In [64]:
data.iloc[1:3]

| | Date | Equipment | Type | Value | Unit |
| --- | --- | --- | --- | --- | --- |
| 1 | 2021-01-01 01:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 36500000000 | % |
| 2 | 2021-01-01 02:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 35500000000 | % |
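The row selection above combines several boolean conditions with `&`. Here is a self-contained sketch on a tiny, made-up DataFrame (the values are hypothetical), together with an alternative that uses the dates as the index:

```python
import pandas as pd

# Hypothetical sample with the same kind of columns as the sensor data
df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-31 23:00', '2021-02-01 06:00',
                            '2021-02-01 18:00', '2021-02-02 01:00']),
    'Type': ['HUMIDITY', 'HUMIDITY', 'UV', 'HUMIDITY'],
    'Value': [38.0, 36.5, 0.0, 35.0],
})

# Each condition yields a boolean Series; & keeps rows where all are True.
# The date strings are converted to Timestamps for the comparison.
mask = (df['Date'] > '2021-02-01') & (df['Date'] < '2021-02-02') & (df['Type'] == 'HUMIDITY')
selected = df.loc[mask]

# Alternative: with a DatetimeIndex, a date string selects that whole day
by_day = df.set_index('Date').loc['2021-02-01']

print(selected)
print(by_day)
```

The parentheses around each condition are required because `&` binds more tightly than the comparison operators.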


**Sort values**

In [65]:
data.sort_values(by = ["Date"], ascending=True)

| | Date | Equipment | Type | Value | Unit |
| --- | --- | --- | --- | --- | --- |
| 0 | 2021-01-01 00:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 38000000000 | % |
| 12856 | 2021-01-01 00:00:00 | 5MultiSensor 6 (ZW100) | UV | 0 | |
| 10662 | 2021-01-01 00:00:00 | 5MultiSensor 6 (ZW100) | TEMPERATURE | 17475000000 | °C |
| 2194 | 2021-01-01 00:00:00 | 5MultiSensor 6 (ZW100) | BRIGHTNESS | 0 | Lux |
| 12857 | 2021-01-01 01:00:00 | 5MultiSensor 6 (ZW100) | UV | 0 | |
| ... | ... | ... | ... | ... | ... |
| 12854 | 2021-03-31 22:00:00 | 5MultiSensor 6 (ZW100) | TEMPERATURE | 22700000000 | °C |
| 12855 | 2021-03-31 23:00:00 | 5MultiSensor 6 (ZW100) | TEMPERATURE | 22550000000 | °C |
| 2193 | 2021-03-31 23:00:00 | 5MultiSensor 6 (ZW100) | HUMIDITY | 42500000000 | % |
| 4387 | 2021-03-31 23:00:00 | 5MultiSensor 6 (ZW100) | BRIGHTNESS | 0 | Lux |


### **Generate Timestamps at fixed frequency**
*Fixed-frequency* data consists of data points that occur at regular intervals, such as every 5 minutes.

In [66]:
tsff = pd.date_range(start = '1/1/2021', periods = 50, freq = '4h')

In [67]:
tsff

DatetimeIndex(['2021-01-01 00:00:00', '2021-01-01 04:00:00',
               '2021-01-01 08:00:00', '2021-01-01 12:00:00',
               '2021-01-01 16:00:00', '2021-01-01 20:00:00',
               '2021-01-02 00:00:00', '2021-01-02 04:00:00',
               '2021-01-02 08:00:00', '2021-01-02 12:00:00',
               '2021-01-02 16:00:00', '2021-01-02 20:00:00',
               '2021-01-03 00:00:00', '2021-01-03 04:00:00',
               '2021-01-03 08:00:00', '2021-01-03 12:00:00',
               '2021-01-03 16:00:00', '2021-01-03 20:00:00',
               '2021-01-04 00:00:00', '2021-01-04 04:00:00',
               '2021-01-04 08:00:00', '2021-01-04 12:00:00',
               '2021-01-04 16:00:00', '2021-01-04 20:00:00',
               '2021-01-05 00:00:00', '2021-01-05 04:00:00',
               '2021-01-05 08:00:00', '2021-01-05 12:00:00',
               '2021-01-05 16:00:00', '2021-01-05 20:00:00',
               '2021-01-06 00:00:00', '2021-01-06 04:00:00',
               ...
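`pd.date_range` needs a frequency plus any two of `start`, `end` and `periods`. A short sketch with small ranges so the whole result is easy to inspect (the dates are arbitrary):

```python
import pandas as pd

# start + number of periods: four timestamps, 5 minutes apart
every_5_min = pd.date_range(start='2021-01-01', periods=4, freq='5min')

# start + end: every 60 minutes, both endpoints included
hourly = pd.date_range(start='2021-01-01', end='2021-01-01 03:00', freq='60min')

print(every_5_min)
print(hourly)
```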

## **Timedeltas**
A Timedelta represents a duration: the difference between two dates or times.

In [69]:
pd.Timedelta(weeks = 1, days = 4, hours = 5)

Timedelta('11 days 05:00:00')
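A Timedelta can be broken down again into its parts or expressed as a single number. A brief sketch using the same duration as above:

```python
import pandas as pd

td = pd.Timedelta(weeks=1, days=4, hours=5)

# Whole days, and the remainder of the last day in seconds
days = td.days
remainder_seconds = td.seconds

# The entire duration expressed in seconds
total = td.total_seconds()

# Named fields (days, hours, minutes, ...)
hours_part = td.components.hours

print(days, remainder_seconds, total, hours_part)
```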

### **Timedelta operations**
**Add time to Timestamps**

In [70]:
ts = pd.to_datetime('2021/3/23 23:20:00') + pd.Timedelta(days=-3)
# Adding a negative Timedelta moves the Timestamp back in time (here, by 3 days)

In [71]:
ts

Timestamp('2021-03-20 23:20:00')

**Difference between Timestamps generates a Timedelta**

In [72]:
delta = pd.to_datetime('2021/3/23 23:20:00') - pd.to_datetime('2021/3/20 2:34:14')

In [73]:
delta

Timedelta('3 days 20:45:46')
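The same subtraction works on a whole column. As an illustration, `Series.diff()` gives the time elapsed between consecutive readings; the three timestamps below are a hypothetical sample.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime([
    '2021-01-01 02:00:00',
    '2021-01-01 03:00:00',
    '2021-01-01 03:15:00',
]))

# Difference between each timestamp and the previous one (first entry is NaT)
gaps = dates.diff()

# The same gaps expressed in minutes
gap_minutes = gaps.dt.total_seconds() / 60

print(gaps)
print(gap_minutes)
```

This is a quick way to spot irregular sampling, such as a 15-minute reading in otherwise hourly data.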

**Adding Timedeltas**

In [74]:
td1 = pd.Timedelta(weeks = 3, days = 3, hours = 3)
td2 = pd.Timedelta(weeks = 1, days = 1, hours = 1)

In [75]:
td1+td2

Timedelta('32 days 04:00:00')

### **Convert strings to Timedelta**

In [86]:
pd.to_timedelta('233:23:23')

Timedelta('9 days 17:23:23')
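`pd.to_timedelta` also accepts strings with a day component, plain numbers with a unit, and lists. A minimal sketch with made-up values:

```python
import pandas as pd

# String with days and a clock-style remainder
a = pd.to_timedelta('1 days 06:05:01')

# A number plus a unit: 90 minutes
b = pd.to_timedelta(90, unit='min')

# A list of strings becomes a TimedeltaIndex
c = pd.to_timedelta(['00:15:00', '01:00:00'])

print(a, b, c, sep='\n')
```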

## **Going further**
### ***Time periods***
A time period represents a span of time with a defined start and end timestamp.

*Periods* can be thought of as special cases of intervals.

Examples of periods: the month of March 2021 or the year 2020.
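Taking the month of March 2021 as an example, the sketch below shows how a Period exposes its start and end, and how a Timestamp maps to the period that contains it (the timestamp value is arbitrary):

```python
import pandas as pd

# The month of March 2021 as a single Period
march = pd.Period('2021-03', freq='M')

# A Period knows where it starts and ends
start = march.start_time
end = march.end_time

# A Timestamp can be mapped to the Period that contains it
ts = pd.Timestamp('2021-03-23 23:20:00')
containing = ts.to_period('M')

print(start, end, containing)
```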

### **Generate Time Periods**

In [None]:
tp = pd.Period(2020, freq='A-OCT')
# A-OCT is an annual frequency whose years end in October:
# this period runs from 2019-11-01 to 2020-10-31.

In [None]:
tp

### **Generate Time Periods at fixed frequency**

In [153]:
tp2 = pd.period_range(start='2017-01-01', end='2018-01-01', freq='M')

In [154]:
tp2

PeriodIndex(['2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06',
             '2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12',
             '2018-01'],
            dtype='period[M]', freq='M')
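A PeriodIndex can be converted to timestamps when a specific instant is needed. A short sketch over three months, where `how` selects the start or the end of each period:

```python
import pandas as pd

months = pd.period_range(start='2017-01', end='2017-03', freq='M')

# First instant of each month (the default)
starts = months.to_timestamp()

# Last instant of each month
ends = months.to_timestamp(how='end')

print(starts)
print(ends)
```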

In [181]:
tp3 = pd.period_range(start='2021-01-01', periods = 50, freq='A-FEB')

In [182]:
tp3

PeriodIndex(['2021', '2022', '2023', '2024', '2025', '2026', '2027', '2028',
             '2029', '2030', '2031', '2032', '2033', '2034', '2035', '2036',
             '2037', '2038', '2039', '2040', '2041', '2042', '2043', '2044',
             '2045', '2046', '2047', '2048', '2049', '2050', '2051', '2052',
             '2053', '2054', '2055', '2056', '2057', '2058', '2059', '2060',
             '2061', '2062', '2063', '2064', '2065', '2066', '2067', '2068',
             '2069', '2070'],
            dtype='period[A-FEB]', freq='A-FEB')

In [183]:
tp3[5] - tp3[2]

<3 * YearEnds: month=2>

In [184]:
type(tp3[5] - tp3[2])

pandas._libs.tslibs.offsets.YearEnd

In [185]:
tp4 = pd.date_range(start='2021-01-01', periods = 50, freq='A-FEB')
# date_range returns Timestamps (the last day of each February), not Periods

In [186]:
tp4

DatetimeIndex(['2021-02-28', '2022-02-28', '2023-02-28', '2024-02-29',
               '2025-02-28', '2026-02-28', '2027-02-28', '2028-02-29',
               '2029-02-28', '2030-02-28', '2031-02-28', '2032-02-29',
               '2033-02-28', '2034-02-28', '2035-02-28', '2036-02-29',
               '2037-02-28', '2038-02-28', '2039-02-28', '2040-02-29',
               '2041-02-28', '2042-02-28', '2043-02-28', '2044-02-29',
               '2045-02-28', '2046-02-28', '2047-02-28', '2048-02-29',
               '2049-02-28', '2050-02-28', '2051-02-28', '2052-02-29',
               '2053-02-28', '2054-02-28', '2055-02-28', '2056-02-29',
               '2057-02-28', '2058-02-28', '2059-02-28', '2060-02-29',
               '2061-02-28', '2062-02-28', '2063-02-28', '2064-02-29',
               '2065-02-28', '2066-02-28', '2067-02-28', '2068-02-29',
               '2069-02-28', '2070-02-28'],
              dtype='datetime64[ns]', freq='A-FEB')

In [187]:
tp4[5] - tp4[2]

Timedelta('1096 days 00:00:00')
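The two results differ because subtracting Periods counts how many periods lie between them, while subtracting Timestamps measures elapsed time. The same contrast can be sketched with monthly data (the dates are chosen only for illustration):

```python
import pandas as pd

# Periods: the difference is a count of periods (an offset object)
period_diff = pd.Period('2021-03', freq='M') - pd.Period('2021-01', freq='M')
months_apart = period_diff.n

# Timestamps: the difference is an elapsed duration (a Timedelta)
time_diff = pd.Timestamp('2021-03-31') - pd.Timestamp('2021-01-31')

print(period_diff, months_apart, time_diff)
```

Two months apart corresponds to 59 days here, but the day count would change with the months involved, whereas the period count would not.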