# **Introduction to Dates & Time with pandas**

This jupyter notebook can be found on my GitHub account: https://github.com/mbonnemaison/Learning-Python
### **pandas** is a Python library that facilitates the analysis of data organized in tables.

### Sources:
- Installation instructions, an introduction to pandas, and the user guide: https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html
- Python for Data Analysis by Wes McKinney (2nd edition used here) - Chapter 5 (Introduction), Chapter 11 (Time Series)
- Video on Data Analysis (see the video's comments for links to the part you're interested in): https://www.youtube.com/watch?v=r-uOLxNrNk8&list=RDCMUC8butISFwT-Wl7EV0hUK0BQ&index=3

## Introduction to Time & Dates
Some of the elementary data structures for working with date & time data are:

- **Timestamps** : specific instants in time
- **Timedeltas**: durations of time, such as the difference between two timestamps.

### **Timestamp**

***Timestamp*** is the pandas equivalent of Python's datetime.datetime object and is interchangeable with it in most cases. It is the type used for the entries that make up a DatetimeIndex and other time-series-oriented data structures in pandas.
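
As a quick illustration of that interchangeability, here is a minimal sketch (the variable names are only for this example). A Timestamp can be built from a string or from a `datetime.datetime`, and it passes an `isinstance` check against `datetime.datetime`:

```python
import datetime
import pandas as pd

# Build a Timestamp from a string and from a standard-library datetime
ts_from_str = pd.Timestamp('2021-10-23 04:34:02')
ts_from_dt = pd.Timestamp(datetime.datetime(2021, 10, 23, 4, 34, 2))

# Both describe the same instant, and Timestamp subclasses datetime.datetime
same_instant = ts_from_str == ts_from_dt
is_datetime = isinstance(ts_from_str, datetime.datetime)
print(same_instant, is_datetime)

# Convert back to a plain datetime.datetime when a library requires one
back_to_dt = ts_from_str.to_pydatetime()
print(back_to_dt)
```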

In [1]:
import pandas as pd

### **Convert strings to Datetimes**
Strings can be converted to dates using **pd.to_datetime**.

Note: Information on format codes can be found here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

In [3]:
mytimestamp = '2021/10/23 4:34:2'

In [5]:
type(mytimestamp)

str

In [8]:
mytimestamp = pd.to_datetime(mytimestamp)

In [10]:
type(mytimestamp)

pandas._libs.tslibs.timestamps.Timestamp

In [11]:
pd.to_datetime('2021-02-19 22:45:56', format = '%Y-%m-%d %H:%M:%S')

Timestamp('2021-02-19 22:45:56')
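
An explicit format matters most when a string is ambiguous. The sketch below uses a made-up day-first string (not taken from the data in this notebook) to show how the format codes decide which field is the day and which is the month:

```python
import pandas as pd

# Hypothetical day-first string: '03/04/2021' could be read as March 4th or April 3rd
ambiguous = '03/04/2021 22:45:56'

# The format string states that the day comes first
parsed = pd.to_datetime(ambiguous, format='%d/%m/%Y %H:%M:%S')
print(parsed.month, parsed.day)
```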

### **Convert a list of dates from string to Timestamp**

In [12]:
date_list_str = ['2021-03-14', '2020-12-25', '2025-02-19']

In [13]:
date_list_str

['2021-03-14', '2020-12-25', '2025-02-19']

In [14]:
pd.to_datetime(date_list_str)

DatetimeIndex(['2021-03-14', '2020-12-25', '2025-02-19'], dtype='datetime64[ns]', freq=None)

### **Dealing with missing values**

In [15]:
date_list_str2 = ['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', None]

In [16]:
date_list_str2

['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', None]

In [17]:
pd.to_datetime(date_list_str2)

DatetimeIndex(['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', 'NaT'], dtype='datetime64[ns]', freq=None)

**NaT** means "Not a Time". It is the datetime counterpart of NaN and marks a missing value.
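
Strings that cannot be parsed can also be turned into NaT rather than raising an error. This is a small sketch with an invented list, using the `errors='coerce'` option of `pd.to_datetime`:

```python
import pandas as pd

# Hypothetical input mixing a valid date, an unparseable string, and a missing value
raw = ['2021-03-14', 'not a date', None]

# errors='coerce' converts anything unparseable to NaT instead of raising
parsed = pd.to_datetime(raw, errors='coerce')

# isna() flags the NaT entries
missing_mask = parsed.isna().tolist()
print(missing_mask)
```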

### **Reading data from a csv file using pandas**
More information on data here: https://github.com/mbonnemaison/adelego

In [41]:
data = pd.read_csv("24h_2021-03-14.csv",  sep = '\t')

In [24]:
data

Unnamed: 0,Date,Equipment,Parameter,Value,Unit
0,2021-03-14 00:10:00,5MultiSensor 6 (ZW100),HUMIDITY,21000000000,%
1,2021-03-14 01:10:00,5MultiSensor 6 (ZW100),HUMIDITY,20750000000,%
2,2021-03-14 03:10:00,5MultiSensor 6 (ZW100),HUMIDITY,20,%
3,2021-03-14 03:25:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%
4,2021-03-14 03:40:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%
...,...,...,...,...,...
431,2021-03-14 22:55:00,5MultiSensor 6 (ZW100),UV,0,
432,2021-03-14 23:10:00,5MultiSensor 6 (ZW100),UV,0,
433,2021-03-14 23:25:00,5MultiSensor 6 (ZW100),UV,0,
434,2021-03-14 23:40:00,5MultiSensor 6 (ZW100),UV,0,


In [25]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 436 entries, 0 to 435
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Date       436 non-null    object
 1   Equipment  436 non-null    object
 2   Parameter  436 non-null    object
 3   Value      436 non-null    object
 4   Unit       258 non-null    object
dtypes: object(5)
memory usage: 17.2+ KB


In [26]:
#Select a column
data['Date']

0      2021-03-14 00:10:00
1      2021-03-14 01:10:00
2      2021-03-14 03:10:00
3      2021-03-14 03:25:00
4      2021-03-14 03:40:00
              ...         
431    2021-03-14 22:55:00
432    2021-03-14 23:10:00
433    2021-03-14 23:25:00
434    2021-03-14 23:40:00
435    2021-03-14 23:55:00
Name: Date, Length: 436, dtype: object

In [27]:
#Select a cell
data['Date'][0]

'2021-03-14 00:10:00'

In [28]:
data['Parameter'].value_counts()

UV             86
HUMIDITY       86
TEMPERATURE    86
BRIGHTNESS     86
SABOTAGE       46
PRESENCE       46
Name: Parameter, dtype: int64

### **Convert values in the "Date" column from string to Timestamp**

In [29]:
data["Date"] = pd.to_datetime(data["Date"])

In [30]:
data["Date"][0]

Timestamp('2021-03-14 00:10:00')

In [31]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 436 entries, 0 to 435
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       436 non-null    datetime64[ns]
 1   Equipment  436 non-null    object        
 2   Parameter  436 non-null    object        
 3   Value      436 non-null    object        
 4   Unit       258 non-null    object        
dtypes: datetime64[ns](1), object(4)
memory usage: 17.2+ KB
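
Once a column has the datetime64 dtype, its components can be read through the `.dt` accessor. The sketch below uses a small hand-made DataFrame as a stand-in for the CSV data (the `Hour` and `Weekday` column names are only illustrative):

```python
import pandas as pd

# Hypothetical stand-in for the sensor data read from the CSV file
df = pd.DataFrame({
    'Date': ['2021-03-14 00:10:00', '2021-03-14 03:25:00', '2021-03-14 23:40:00'],
    'Value': [21, 21, 20],
})
df['Date'] = pd.to_datetime(df['Date'])

# The .dt accessor exposes datetime components for the whole column at once
df['Hour'] = df['Date'].dt.hour
df['Weekday'] = df['Date'].dt.day_name()
print(df)
```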


***Missing values in DataFrame...***

In [32]:
dataNaT = pd.read_csv("24h_2021-03-14_NaT.csv", sep = '\t')

In [33]:
dataNaT.head(10)

Unnamed: 0,Date,Equipment,Parameter,Value,Unit
0,,5MultiSensor 6 (ZW100),HUMIDITY,21000000000,%
1,,5MultiSensor 6 (ZW100),HUMIDITY,20750000000,%
2,,5MultiSensor 6 (ZW100),HUMIDITY,20,%
3,,5MultiSensor 6 (ZW100),HUMIDITY,21,%
4,,5MultiSensor 6 (ZW100),HUMIDITY,21,%
5,2021-03-14 03:55:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%
6,2021-03-14 04:10:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%
7,2021-03-14 04:25:00,5MultiSensor 6 (ZW100),HUMIDITY,20,%
8,2021-03-14 04:40:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%
9,2021-03-14 04:55:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%


In [34]:
dataNaT.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 436 entries, 0 to 435
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Date       431 non-null    object
 1   Equipment  436 non-null    object
 2   Parameter  436 non-null    object
 3   Value      436 non-null    object
 4   Unit       258 non-null    object
dtypes: object(5)
memory usage: 17.2+ KB


In [35]:
dataNaT["Date"] = pd.to_datetime(dataNaT["Date"])

In [36]:
dataNaT.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 436 entries, 0 to 435
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       431 non-null    datetime64[ns]
 1   Equipment  436 non-null    object        
 2   Parameter  436 non-null    object        
 3   Value      436 non-null    object        
 4   Unit       258 non-null    object        
dtypes: datetime64[ns](1), object(4)
memory usage: 17.2+ KB


In [38]:
dataNaT.head(10)

Unnamed: 0,Date,Equipment,Parameter,Value,Unit
0,NaT,5MultiSensor 6 (ZW100),HUMIDITY,21000000000,%
1,NaT,5MultiSensor 6 (ZW100),HUMIDITY,20750000000,%
2,NaT,5MultiSensor 6 (ZW100),HUMIDITY,20,%
3,NaT,5MultiSensor 6 (ZW100),HUMIDITY,21,%
4,NaT,5MultiSensor 6 (ZW100),HUMIDITY,21,%
5,2021-03-14 03:55:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%
6,2021-03-14 04:10:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%
7,2021-03-14 04:25:00,5MultiSensor 6 (ZW100),HUMIDITY,20,%
8,2021-03-14 04:40:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%
9,2021-03-14 04:55:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%


In [37]:
dataNaT["Date"][33]

Timestamp('2021-03-14 10:55:00')
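
A common next step is to count the missing dates and, if appropriate, drop those rows. This is a minimal sketch on an invented DataFrame; whether dropping is the right choice depends on the analysis:

```python
import pandas as pd

# Hypothetical data with one missing date
df = pd.DataFrame({
    'Date': [None, '2021-03-14 03:55:00', '2021-03-14 04:10:00'],
    'Value': [21, 21, 20],
})
df['Date'] = pd.to_datetime(df['Date'])

# Count the NaT entries, then keep only rows with a valid date
n_missing = int(df['Date'].isna().sum())
clean = df.dropna(subset=['Date'])
print(n_missing, len(clean))
```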

### **Data manipulations with Timestamps in pandas**
**Select rows**

In [39]:
data.iloc[1:3]

Unnamed: 0,Date,Equipment,Parameter,Value,Unit
1,2021-03-14 01:10:00,5MultiSensor 6 (ZW100),HUMIDITY,20750000000,%
2,2021-03-14 03:10:00,5MultiSensor 6 (ZW100),HUMIDITY,20,%


In [None]:
data.loc[(data['Date'] > '2021-03-14 03:14:15') & (data['Date'] < '2021-03-14 15:00:00') & (data['Parameter'] == 'HUMIDITY')]
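
The same kind of selection can be tried on a small hand-made DataFrame. The sketch below is only an illustration with invented rows; it names the time window explicitly with `pd.Timestamp` objects and combines the conditions with `&`:

```python
import pandas as pd

# Hypothetical miniature version of the sensor data
df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-03-14 01:10:00', '2021-03-14 03:25:00',
                            '2021-03-14 04:10:00', '2021-03-14 16:00:00']),
    'Parameter': ['HUMIDITY', 'HUMIDITY', 'UV', 'HUMIDITY'],
})

start = pd.Timestamp('2021-03-14 03:14:15')
end = pd.Timestamp('2021-03-14 15:00:00')

# Each condition yields a boolean Series; & keeps rows where all are True
in_window = (df['Date'] > start) & (df['Date'] < end)
selected = df.loc[in_window & (df['Parameter'] == 'HUMIDITY')]
print(selected)
```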

**Sort values**

In [43]:
#sort_values returns a new sorted DataFrame; data itself is left unchanged
data.sort_values(by = ["Date"], ascending=True)

Unnamed: 0,Date,Equipment,Parameter,Value,Unit
0,2021-03-14 00:10:00,5MultiSensor 6 (ZW100),HUMIDITY,21000000000,%
350,2021-03-14 00:10:00,5MultiSensor 6 (ZW100),UV,0,
264,2021-03-14 00:10:00,5MultiSensor 6 (ZW100),TEMPERATURE,20525000000,°C
86,2021-03-14 00:10:00,5MultiSensor 6 (ZW100),BRIGHTNESS,0,Lux
351,2021-03-14 01:10:00,5MultiSensor 6 (ZW100),UV,0,
...,...,...,...,...,...
84,2021-03-14 23:40:00,5MultiSensor 6 (ZW100),HUMIDITY,20,%
85,2021-03-14 23:55:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%
349,2021-03-14 23:55:00,5MultiSensor 6 (ZW100),TEMPERATURE,175,°C
171,2021-03-14 23:55:00,5MultiSensor 6 (ZW100),BRIGHTNESS,0,Lux


In [44]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 436 entries, 0 to 435
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Date       436 non-null    object
 1   Equipment  436 non-null    object
 2   Parameter  436 non-null    object
 3   Value      436 non-null    object
 4   Unit       258 non-null    object
dtypes: object(5)
memory usage: 17.2+ KB
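
Note that `sort_values` does not modify the DataFrame it is called on. A short sketch with invented rows shows that the sorted result has to be assigned to a name to be kept:

```python
import pandas as pd

# Hypothetical unsorted data
df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-03-14 23:55:00', '2021-03-14 00:10:00',
                            '2021-03-14 03:10:00']),
    'Value': [21, 20, 19],
})

# The sorted copy keeps the original index labels; df keeps its original order
sorted_df = df.sort_values(by='Date', ascending=True)
print(sorted_df.index.tolist(), df.index.tolist())
```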


### **Generate Timestamps at fixed frequency**
A *fixed-frequency* series consists of data points that occur at regular intervals, such as every 5 minutes.

In [45]:
tsff = pd.date_range(start = '1/1/2021', periods = 50, freq = '4h')

In [46]:
tsff

DatetimeIndex(['2021-01-01 00:00:00', '2021-01-01 04:00:00',
               '2021-01-01 08:00:00', '2021-01-01 12:00:00',
               '2021-01-01 16:00:00', '2021-01-01 20:00:00',
               '2021-01-02 00:00:00', '2021-01-02 04:00:00',
               '2021-01-02 08:00:00', '2021-01-02 12:00:00',
               '2021-01-02 16:00:00', '2021-01-02 20:00:00',
               '2021-01-03 00:00:00', '2021-01-03 04:00:00',
               '2021-01-03 08:00:00', '2021-01-03 12:00:00',
               '2021-01-03 16:00:00', '2021-01-03 20:00:00',
               '2021-01-04 00:00:00', '2021-01-04 04:00:00',
               '2021-01-04 08:00:00', '2021-01-04 12:00:00',
               '2021-01-04 16:00:00', '2021-01-04 20:00:00',
               '2021-01-05 00:00:00', '2021-01-05 04:00:00',
               '2021-01-05 08:00:00', '2021-01-05 12:00:00',
               '2021-01-05 16:00:00', '2021-01-05 20:00:00',
               '2021-01-06 00:00:00', '2021-01-06 04:00:00',
               '2021-01-
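
A range can be described either by a start and a number of periods, or by a start and an end. The sketch below (the variable names are only illustrative) checks that both descriptions give the same 4-hour grid:

```python
import pandas as pd

# Grid defined by a start and a number of periods
by_periods = pd.date_range(start='2021-01-01', periods=50, freq='4h')

# Equivalent grid defined by a start and an end instead of a count
by_end = pd.date_range(start='2021-01-01', end=by_periods[-1], freq='4h')

# Spacing between consecutive entries
step = by_periods[1] - by_periods[0]
print(len(by_periods), by_periods[-1], step)
```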

## **Timedeltas**
Timedelta represents the temporal difference between two datetime objects.

In [47]:
pd.Timedelta(weeks = 1, days = 4, hours = 5)

Timedelta('11 days 05:00:00')
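
A Timedelta can be broken down again through its attributes. A minimal sketch using the same duration as above:

```python
import pandas as pd

td = pd.Timedelta(weeks=1, days=4, hours=5)

whole_days = td.days                 # number of full days
leftover_seconds = td.seconds        # seconds beyond the full days (5 hours here)
total_seconds = td.total_seconds()   # the whole duration expressed in seconds
print(whole_days, leftover_seconds, total_seconds)
```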

### **Timedelta operations**
**Add time to Timestamps** (a negative Timedelta subtracts time)

In [48]:
ts = pd.to_datetime('2021/3/23 23:20:00') + pd.Timedelta(days=-3)

In [49]:
ts

Timestamp('2021-03-20 23:20:00')

**Difference between Timestamps generates a Timedelta**

In [50]:
delta = pd.to_datetime('2021/3/23 23:20:00') - pd.to_datetime('2021/3/20 2:34:14')

In [51]:
delta

Timedelta('3 days 20:45:46')
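
To express such a difference in a single unit, one option is to divide it by a Timedelta of that unit. This is a small sketch with the same two instants as above:

```python
import pandas as pd

delta = pd.Timestamp('2021-03-23 23:20:00') - pd.Timestamp('2021-03-20 02:34:14')

# Dividing by a one-hour Timedelta gives the duration as a float number of hours
hours = delta / pd.Timedelta(hours=1)
print(delta.days, round(hours, 2))
```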

**Adding Timedeltas**

In [52]:
td1 = pd.Timedelta(weeks = 3, days = 3, hours = 3)
td2 = pd.Timedelta(weeks = 1, days = 1, hours = 1)

In [53]:
td1+td2

Timedelta('32 days 04:00:00')

### **Convert strings to Timedelta**

In [54]:
pd.to_timedelta('23:23:23')

Timedelta('0 days 23:23:23')
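
`pd.to_timedelta` also accepts other inputs. The sketch below shows a string with a day count, a number with a unit, and a list of strings (all values are invented for illustration):

```python
import pandas as pd

# A string that includes a day count
from_text = pd.to_timedelta('1 days 06:05:01')

# A plain number combined with a unit
from_number = pd.to_timedelta(90, unit='min')

# A list of strings becomes a TimedeltaIndex
from_list = pd.to_timedelta(['00:30:00', '01:15:00'])
total = from_list[0] + from_list[1]
print(from_text, from_number, total)
```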

## **Practice**

In [None]:
cities = pd.read_csv('top12.csv')

In [None]:
cities

In [None]:
cities.info()

**Question 1**: How would you convert the Incorporated date from string to Timestamp?

In [None]:
cities['Incorporated'] = pd.to_datetime(cities['Incorporated'], format= '%m/%d/%Y')
cities.info()

In [None]:
cities

**Question 2**: How many days separate the incorporation dates of Philadelphia and Dallas?

In [None]:
cities['Incorporated'][7] - cities['Incorporated'][4]

## **Problems**
### **Problem 1: Timestamp limitation**
New York City was incorporated on September 2nd 1664. Convert this date into a Timestamp.

In [None]:
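#With nanosecond-resolution Timestamps this raises OutOfBoundsDatetime (1664 predates pd.Timestamp.min)
NYC = pd.to_datetime('9-2-1664')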

Timestamp limitations: https://pandas-docs.github.io/pandas-docs-travis/user_guide/timeseries.html#timeseries-timestamp-limits
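
The bounds can be inspected directly. The sketch below also shows one workaround described in the pandas user guide, which is to represent an out-of-range date as a `Period` (the name `nyc_period` is only illustrative):

```python
import pandas as pd

# Bounds of the nanosecond-resolution Timestamp
earliest = pd.Timestamp.min
latest = pd.Timestamp.max
print(earliest, latest)

# A daily Period can represent dates outside those bounds
nyc_period = pd.Period('1664-09-02', freq='D')
print(nyc_period.year, nyc_period.month, nyc_period.day)
```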

#### Python ***datetime*** module
Python provides the date and time functionality in the **datetime** module that contains three popular classes:

- **date class**: to work with dates (day, month, year)
- **time class**: to work with times (hours, minutes, seconds, microseconds)
- **datetime class**: to work with components of both date and time
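
A short sketch of how the three classes relate, using an arbitrary example date and time: a `date` and a `time` can be combined into a `datetime`, and a `datetime` can be split back into its two parts.

```python
from datetime import date, time, datetime

d = date(2021, 4, 4)      # calendar date only
t = time(14, 5, 9)        # time of day only

# Combine both parts into a single datetime
dt = datetime.combine(d, t)
print(dt)

# Split the datetime back into its date and time parts
print(dt.date(), dt.time())
```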

In [56]:
from datetime import datetime
NYC2 = datetime(1664,9,2)

In [None]:
NYC2

***Convert strings to datetime.datetime objects***

In [60]:
NYC3 = datetime.strptime('2/9/1664', '%d/%m/%Y')

In [None]:
NYC3

datetime.datetime(1664, 9, 2, 0, 0)

***Working with a list of dates***

In [None]:
date_list_str = ['2021-03-14', '2020-12-25', '2025-02-19']

In [None]:
[datetime.strptime(x, '%Y-%m-%d') for x in date_list_str]

### **Problem 2: Time zone**
What time is it now?

In [None]:
now = pd.to_datetime('now')

In [None]:
now

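In [None]:
#tz_localize attaches a time zone to a naive Timestamp without changing the clock time
now_eastern = now.tz_localize('US/Eastern')

In [None]:
now_eastern

In [None]:
#tz_convert expresses the same instant in another time zone
now_pacific = now_eastern.tz_convert('US/Pacific')

In [None]:
now_pacific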
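
Because "now" changes on every run, the difference between localizing and converting is easier to see with a fixed instant. This is a minimal sketch with an arbitrary timestamp, assuming the time zone database is available to pandas:

```python
import pandas as pd

naive = pd.Timestamp('2021-04-04 14:05:09')   # no time zone attached

# tz_localize attaches a zone; the clock time stays 14:05:09
eastern = naive.tz_localize('US/Eastern')

# tz_convert keeps the instant and changes the clock time
pacific = eastern.tz_convert('US/Pacific')
utc = eastern.tz_convert('UTC')
print(eastern, pacific, utc)
```

The three time-zone-aware values compare as equal because they describe the same instant.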

Note on databases: MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. By default, the current time zone for each connection is the server's time. This conversion does not occur for other types such as DATETIME.

#### Python **datetime** module

In [3]:
now = datetime.now()

In [4]:
now

datetime.datetime(2021, 4, 4, 14, 5, 9, 636458)

In [7]:
now.date()

datetime.date(2021, 4, 4)

In [8]:
now.time()

datetime.time(14, 5, 9, 636458)
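
A datetime can also be turned back into text with the format codes linked earlier. The sketch below uses a fixed, arbitrary datetime so the result does not depend on when it is run:

```python
from datetime import datetime

moment = datetime(2021, 4, 4, 14, 5, 9)

# strftime formats a datetime as text; strptime parses text back
formatted = moment.strftime('%Y-%m-%d %H:%M')
round_trip = datetime.strptime(formatted, '%Y-%m-%d %H:%M')

# isoformat gives the ISO 8601 representation
iso = moment.isoformat()
print(formatted, iso)
```

The round trip drops the seconds because the chosen format string does not include them.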