# **Introduction to Dates & Times with pandas**

This Jupyter notebook can be found on my GitHub account: https://github.com/mbonnemaison/Learning-Python/tree/master/Learning_pandas

### Sources:
- Information to install pandas, introduce pandas and the user guide: https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html
- Python for Data Analysis by Wes McKinney (2nd edition used here) - Chapter 5 (Introduction), Chapter 11 (Time Series)

### **pandas** is a Python library that facilitates the analysis of tabular data.

In [2]:
import pandas as pd

Some of the elementary data structures for working with date & time data are:

- **Timestamp**: a specific instant in time
- **Timedelta**: an absolute duration of time, such as the difference between two timestamps
- **Period**: a fixed span of time, such as a given month or year
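
As a rough illustration of the three structures listed above, the sketch below builds one of each. The variable names and the chosen dates are illustrative assumptions and are not taken from the dataset used later in this notebook.

```python
import pandas as pd

# A Timestamp is a single instant in time
ts = pd.Timestamp("2021-03-14 03:25:00")

# A Timedelta is a duration; it can be added to a Timestamp
td = pd.Timedelta(hours=2, minutes=30)
later = ts + td

# A Period is a span of time, here the whole month of March 2021
per = pd.Period("2021-03", freq="M")

print(later)
print(per.start_time, per.end_time)
# Check whether the instant falls inside the span
print(per.start_time <= ts <= per.end_time)
```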

## **Timestamp**
***Timestamp*** is the pandas equivalent of Python's ***datetime.datetime*** object and is interchangeable with it in most cases: a Timestamp can be used almost anywhere a ***datetime*** object is expected.
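
A small sketch of what "interchangeable" can mean in practice: a Timestamp is itself a kind of `datetime.datetime`, compares equal to the matching `datetime` object, and can be converted back to a plain one. The variable names are only for illustration.

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp("2021-10-23 04:34:02")
dt = datetime(2021, 10, 23, 4, 34, 2)

# pd.Timestamp is a subclass of datetime.datetime
is_datetime = isinstance(ts, datetime)

# A Timestamp and a datetime describing the same instant compare equal
same_instant = ts == dt

# Convert back to a plain Python datetime object when needed
back_to_python = ts.to_pydatetime()

print(is_datetime, same_instant, type(back_to_python))
```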

### **Convert strings to timestamps**
Strings can be converted to dates using **pd.to_datetime**.

Note: Information on format codes can be found here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

In [3]:
mytimestamp = '2021/10/23 4:34:2'

In [4]:
mytimestamp

'2021/10/23 4:34:2'

In [5]:
pd.to_datetime(mytimestamp)

Timestamp('2021-10-23 04:34:02')

In [6]:
pd.to_datetime('02-19-2021 22:45:56', format = '%m-%d-%Y %H:%M:%S')

Timestamp('2021-02-19 22:45:56')
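
One reason to pass an explicit `format` is that some strings are ambiguous. The sketch below, using a made-up date string, shows the same text read as month-first and as day-first.

```python
import pandas as pd

# "02-03-2021" could mean February 3rd or March 2nd
ambiguous = "02-03-2021"

month_first = pd.to_datetime(ambiguous, format="%m-%d-%Y")
day_first = pd.to_datetime(ambiguous, format="%d-%m-%Y")

print(month_first)  # read as February 3rd, 2021
print(day_first)    # read as March 2nd, 2021
```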

### **Convert a list of dates from string to Timestamp**

In [7]:
date_list_str = ['2021-03-14', '2020-12-25', '2025-2-19']

In [8]:
date_list_str

['2021-03-14', '2020-12-25', '2025-2-19']

In [9]:
[pd.to_datetime(x) for x in date_list_str]

[Timestamp('2021-03-14 00:00:00'),
 Timestamp('2020-12-25 00:00:00'),
 Timestamp('2025-02-19 00:00:00')]

In [12]:
pd.to_datetime(date_list_str)

DatetimeIndex(['2021-03-14', '2020-12-25', '2025-02-19'], dtype='datetime64[ns]', freq=None)

### **Dealing with missing values**

In [13]:
date_list_str2 = ['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', None]

In [13]:
date_list_str2

['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', None]

In [14]:
pd.to_datetime(date_list_str2)

DatetimeIndex(['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', 'NaT'], dtype='datetime64[ns]', freq=None)

**NaT** stands for "Not a Time"; it is the pandas marker for a missing date or time value.
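
A minimal sketch of how missing dates can be detected and removed once they have been converted. The list of dates here is a shortened, illustrative version of the one above.

```python
import pandas as pd

dates = pd.to_datetime(["2021-03-14", None, "2020-12-25"])

# Boolean mask: True where the value is NaT
missing_mask = dates.isna()

# Count the missing values
n_missing = int(missing_mask.sum())

# Keep only the valid dates
valid_dates = dates.dropna()

print(missing_mask)
print(n_missing)
print(valid_dates)
```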

### **Reading data from a csv file using pandas**

In [15]:
data = pd.read_csv("24h_2021-03-14.csv",  sep = '\t')

Link to user guide for **pd.read_csv()**: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv

In [16]:
data

     Date                 Equipment               Parameter  Value        Unit
0    2021-03-14 00:10:00  5MultiSensor 6 (ZW100)  HUMIDITY   21000000000  %
1    2021-03-14 01:10:00  5MultiSensor 6 (ZW100)  HUMIDITY   20750000000  %
2    2021-03-14 03:10:00  5MultiSensor 6 (ZW100)  HUMIDITY   20           %
3    2021-03-14 03:25:00  5MultiSensor 6 (ZW100)  HUMIDITY   21           %
4    2021-03-14 03:40:00  5MultiSensor 6 (ZW100)  HUMIDITY   21           %
...  ...                  ...                     ...        ...          ...
431  2021-03-14 22:55:00  5MultiSensor 6 (ZW100)  UV         0
432  2021-03-14 23:10:00  5MultiSensor 6 (ZW100)  UV         0
433  2021-03-14 23:25:00  5MultiSensor 6 (ZW100)  UV         0
434  2021-03-14 23:40:00  5MultiSensor 6 (ZW100)  UV         0


In [17]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 436 entries, 0 to 435
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Date       436 non-null    object
 1   Equipment  436 non-null    object
 2   Parameter  436 non-null    object
 3   Value      436 non-null    object
 4   Unit       258 non-null    object
dtypes: object(5)
memory usage: 17.2+ KB


### **Select columns**

In [19]:
data['Date']
#The output is a Series, i.e. a 1-column table

'2021-03-14 03:55:00'

### **Convert values in the "Date" column from string to Timestamp**

In [20]:
data["Date"] = pd.to_datetime(data["Date"])

In [21]:
data["Date"]

0     2021-03-14 00:10:00
1     2021-03-14 01:10:00
2     2021-03-14 03:10:00
3     2021-03-14 03:25:00
4     2021-03-14 03:40:00
              ...        
431   2021-03-14 22:55:00
432   2021-03-14 23:10:00
433   2021-03-14 23:25:00
434   2021-03-14 23:40:00
435   2021-03-14 23:55:00
Name: Date, Length: 436, dtype: datetime64[ns]
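
Once a column holds Timestamps, its components can be read through the `.dt` accessor. The sketch below uses a small stand-in Series rather than the real `data["Date"]` column, so the values are illustrative only.

```python
import pandas as pd

# Stand-in for a converted "Date" column
dates = pd.Series(
    pd.to_datetime(
        ["2021-03-14 00:10:00", "2021-03-14 03:25:00", "2021-03-14 23:55:00"]
    )
)

# Extract the hour of each Timestamp
hours = dates.dt.hour

# Extract the name of the weekday of each Timestamp
day_names = dates.dt.day_name()

print(hours.tolist())
print(day_names.tolist())
```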

In [22]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 436 entries, 0 to 435
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       436 non-null    datetime64[ns]
 1   Equipment  436 non-null    object        
 2   Parameter  436 non-null    object        
 3   Value      436 non-null    object        
 4   Unit       258 non-null    object        
dtypes: datetime64[ns](1), object(4)
memory usage: 17.2+ KB
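
As an alternative to converting the column after loading, `pd.read_csv` accepts a `parse_dates` argument that converts the listed columns while reading. The sketch below reads from an in-memory string instead of the real file, and the three-column layout is a simplified assumption.

```python
import io

import pandas as pd

# Tab-separated text standing in for a file on disk
csv_text = (
    "Date\tParameter\tValue\n"
    "2021-03-14 00:10:00\tHUMIDITY\t21\n"
    "2021-03-14 03:25:00\tUV\t0\n"
)

# parse_dates converts the "Date" column to Timestamps at read time
sample = pd.read_csv(io.StringIO(csv_text), sep="\t", parse_dates=["Date"])

print(sample.dtypes)
```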


***Missing values in DataFrame...***

In [None]:
dataNaT = pd.read_csv("24h_2021-03-14_NaT.csv", sep = '\t')

In [None]:
dataNaT.head(10)

In [None]:
dataNaT.info()

In [None]:
dataNaT["Date"] = pd.to_datetime(dataNaT["Date"])

In [None]:
dataNaT.head(10)

In [None]:
dataNaT.info()
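
The same idea can be sketched on a tiny DataFrame built by hand. The values are made up and only stand in for a file with a missing date; the point is to show how NaT entries can be counted and dropped after conversion.

```python
import pandas as pd

# Small stand-in DataFrame with one missing date
df = pd.DataFrame(
    {
        "Date": ["2021-03-14 00:10:00", None, "2021-03-14 03:10:00"],
        "Value": [21, 20, 19],
    }
)

# The missing entry becomes NaT after conversion
df["Date"] = pd.to_datetime(df["Date"])

# Count the rows without a date
n_missing = int(df["Date"].isna().sum())

# Drop the rows where the date is missing
clean = df.dropna(subset=["Date"])

print(n_missing)
print(clean)
```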

### **Generate Timestamps at fixed frequency**
A *fixed-frequency* series consists of data points that occur at regular intervals, for example every 5 minutes.

In [23]:
pd.date_range(start = '2021-01-01 02:00:05', periods = 50, freq = '4h')

DatetimeIndex(['2021-01-01 02:00:05', '2021-01-01 06:00:05',
               '2021-01-01 10:00:05', '2021-01-01 14:00:05',
               '2021-01-01 18:00:05', '2021-01-01 22:00:05',
               '2021-01-02 02:00:05', '2021-01-02 06:00:05',
               '2021-01-02 10:00:05', '2021-01-02 14:00:05',
               '2021-01-02 18:00:05', '2021-01-02 22:00:05',
               '2021-01-03 02:00:05', '2021-01-03 06:00:05',
               '2021-01-03 10:00:05', '2021-01-03 14:00:05',
               '2021-01-03 18:00:05', '2021-01-03 22:00:05',
               '2021-01-04 02:00:05', '2021-01-04 06:00:05',
               '2021-01-04 10:00:05', '2021-01-04 14:00:05',
               '2021-01-04 18:00:05', '2021-01-04 22:00:05',
               '2021-01-05 02:00:05', '2021-01-05 06:00:05',
               '2021-01-05 10:00:05', '2021-01-05 14:00:05',
               '2021-01-05 18:00:05', '2021-01-05 22:00:05',
               '2021-01-06 02:00:05', '2021-01-06 06:00:05',
               '2021-01-
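
Instead of a number of periods, `pd.date_range` can also take a `start` and an `end`. The sketch below generates one Timestamp every 5 minutes over one hour; the dates are arbitrary.

```python
import pandas as pd

# One Timestamp every 5 minutes, both endpoints included
every_5_min = pd.date_range(
    start="2021-01-01 00:00:00", end="2021-01-01 01:00:00", freq="5min"
)

# The spacing between consecutive Timestamps is a Timedelta
step = every_5_min[1] - every_5_min[0]

print(len(every_5_min))
print(step)
```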

## **Timedeltas**
Timedelta represents the temporal difference between two datetime objects.

In [24]:
pd.Timedelta(weeks = 1, days = 4, hours = 5)

Timedelta('11 days 05:00:00')

### **Timedelta operations**
**Add time to Timestamps**

In [25]:
pd.to_datetime('2021/3/23 3:20:00') + pd.Timedelta(days = 3, hours = 7)

Timestamp('2021-03-26 10:20:00')

**Difference between Timestamps generates a Timedelta**

In [26]:
pd.to_datetime('2021/3/23 23:20:00') - pd.to_datetime('2021/3/20 2:34:14')

Timedelta('3 days 20:45:46')

In [28]:
# How many days were there in February 2020 (a leap year)?
t1 = pd.to_datetime('2020-2-01 12:00:00')
t2 = pd.to_datetime('2020-3-01 12:00:00')
t2 - t1

Timedelta('29 days 00:00:00')
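
A Timedelta exposes its length through attributes and methods such as `.days` and `.total_seconds()`. The sketch below repeats the February calculation for a leap year and a non-leap year.

```python
import pandas as pd

# Length of February in a leap year and in a non-leap year
feb_2020 = pd.Timestamp("2020-03-01") - pd.Timestamp("2020-02-01")
feb_2021 = pd.Timestamp("2021-03-01") - pd.Timestamp("2021-02-01")

print(feb_2020.days)             # whole days in February 2020
print(feb_2021.days)             # whole days in February 2021
print(feb_2020.total_seconds())  # same duration expressed in seconds
```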

*Falsehoods programmers believe about time* - this link lists misconceptions we have about time: https://gist.github.com/timvisee/fcda9bbdff88d45cc9061606b4b923ca

**Adding Timedeltas**

In [29]:
td1 = pd.Timedelta(weeks = 3, days = 3, hours = 3)
td2 = pd.Timedelta(weeks = 1, days = 1, hours = 1)

In [30]:
td2+td1

Timedelta('32 days 04:00:00')

### **Convert strings to Timedelta**

In [29]:
pd.to_timedelta('4 days 45:53:23')

Timedelta('5 days 21:53:23')
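
`pd.to_timedelta` also accepts numbers with a `unit`, and lists of strings. A short sketch with made-up values:

```python
import pandas as pd

# A number plus a unit: 90 minutes
from_number = pd.to_timedelta(90, unit="min")

# A list of strings; a missing entry becomes NaT
from_list = pd.to_timedelta(["1 days", "0 days 06:00:00", None])

print(from_number)
print(from_list)
```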

### **Generate Timedeltas at fixed frequency**

In [31]:
pd.timedelta_range(start = '1 day', periods = 20, freq = '10H')

TimedeltaIndex(['1 days 00:00:00', '1 days 10:00:00', '1 days 20:00:00',
                '2 days 06:00:00', '2 days 16:00:00', '3 days 02:00:00',
                '3 days 12:00:00', '3 days 22:00:00', '4 days 08:00:00',
                '4 days 18:00:00', '5 days 04:00:00', '5 days 14:00:00',
                '6 days 00:00:00', '6 days 10:00:00', '6 days 20:00:00',
                '7 days 06:00:00', '7 days 16:00:00', '8 days 02:00:00',
                '8 days 12:00:00', '8 days 22:00:00'],
               dtype='timedelta64[ns]', freq='10H')

## **Time periods**

*Periods* can be thought of as special cases of intervals.

Examples of periods: the month of March 2021 or the year 2020.

### **Generate Time Periods**

In [32]:
pd.Period(2020, freq = 'A-OCT')

Period('2020', 'A-OCT')
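
A short sketch of a few things a Period can do: expose its boundaries, shift by whole periods, and be obtained from a Timestamp. The monthly frequency and the dates are illustrative choices.

```python
import pandas as pd

march = pd.Period("2021-03", freq="M")

# Adding an integer shifts the period by that many months
april = march + 1

# Boundaries and length of the period
print(march.start_time)
print(march.end_time)
print(march.days_in_month)

# Find the monthly period that contains a given Timestamp
ts_period = pd.Timestamp("2021-03-14 03:25:00").to_period("M")
print(ts_period)
```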

### **Generate Time Periods at fixed frequency**

In [32]:
pd.period_range(start='2000-01-01', end='2020-01-01', freq='M')

PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06',
             '2000-07', '2000-08', '2000-09', '2000-10',
             ...
             '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09',
             '2019-10', '2019-11', '2019-12', '2020-01'],
            dtype='period[M]', length=241, freq='M')

## **Going further**

### **Timestamp limitation**
New York City was incorporated on September 2nd 1664. Convert this date into a Timestamp.

In [33]:
pd.to_datetime('9-2-1664')

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1664-09-02 00:00:00

Timestamp limitations: https://pandas-docs.github.io/pandas-docs-travis/user_guide/timeseries.html#timeseries-timestamp-limits
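
The bounds can be inspected directly with `pd.Timestamp.min` and `pd.Timestamp.max`. The sketch below assumes the default nanosecond resolution; newer pandas versions also support coarser resolutions, in which case the conversion above may not raise. For dates outside the bounds, one option described in the pandas user guide is to use a Period instead.

```python
import pandas as pd

# Earliest and latest instants a nanosecond-resolution Timestamp can hold
print(pd.Timestamp.min)
print(pd.Timestamp.max)

# A daily Period can represent a date outside the Timestamp bounds
nyc = pd.Period("1664-09-02", freq="D")
print(nyc)
```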

#### Python ***datetime*** module
Python provides date and time functionality in the built-in **datetime** module, which contains the following commonly used classes:

- **date class**: to work with dates (year, month, day)
- **time class**: to work with times (hours, minutes, seconds, microseconds)
- **datetime class**: to work with both date and time components
- **timedelta class**: to work with durations, i.e. differences between dates or times

In [39]:
from datetime import datetime

In [40]:
datetime(1664,9,2)

datetime.datetime(1664, 9, 2, 0, 0)

***Convert strings to datetime.datetime objects***

In [41]:
datetime.strptime('2/9/1664', '%d/%m/%Y')

datetime.datetime(1664, 9, 2, 0, 0)

***Working with a list of dates***

In [None]:
date_list_str = ['2021-03-14', '2020-12-25', '2025-02-19']

In [None]:
[datetime.strptime(x, '%Y-%m-%d') for x in date_list_str]

***Convert Incorporated dates into datetime.datetime objects***

In [45]:
us_cities = pd.read_csv('top12.csv')
us_cities

    Cities         State         Population  Density(/sq mi)  Incorporated
0   New York City  New York      8336817     28317            9/2/1664
1   Los Angeles    California    3979576     8484             4/4/1850
2   Chicago        Illinois      2693976     11900            3/4/1837
3   Houston        Texas         2320268     3613             6/5/1837
4   Phoenix        Arizona       1680992     3120             2/25/1881
5   Philadelphia   Pennsylvania  1584064     11683            10/25/1701
6   San Antonio    Texas         1547253     3238             6/5/1837
7   San Diego      California    1423851     4325             3/27/1850
8   Dallas         Texas         1343573     3866             2/2/1856
9   San Jose       California    1021795     5777             3/27/1850


In [46]:
us_cities.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Cities           12 non-null     object
 1   State            12 non-null     object
 2   Population       12 non-null     int64 
 3   Density(/sq mi)  12 non-null     int64 
 4   Incorporated     12 non-null     object
dtypes: int64(2), object(3)
memory usage: 608.0+ bytes


In [None]:
us_cities['Incorporated']

In [44]:
[datetime.strptime(x, '%m/%d/%Y') for x in us_cities['Incorporated']]

[datetime.datetime(1664, 9, 2, 0, 0),
 datetime.datetime(1850, 4, 4, 0, 0),
 datetime.datetime(1837, 3, 4, 0, 0),
 datetime.datetime(1837, 6, 5, 0, 0),
 datetime.datetime(1881, 2, 25, 0, 0),
 datetime.datetime(1701, 10, 25, 0, 0),
 datetime.datetime(1837, 6, 5, 0, 0),
 datetime.datetime(1850, 3, 27, 0, 0),
 datetime.datetime(1856, 2, 2, 0, 0),
 datetime.datetime(1850, 3, 27, 0, 0),
 datetime.datetime(1839, 12, 27, 0, 0),
 datetime.datetime(1832, 2, 9, 0, 0)]

### **Time zone**
What time is it now?

In [None]:
now = pd.to_datetime('now')
now

In [None]:
now_utc = now.tz_localize('UTC')

In [None]:
now_utc

In [None]:
now_est = now_utc.tz_convert('US/Eastern')

In [None]:
now_est
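
As far as I know, whether `pd.to_datetime('now')` returns UTC or local time has changed between pandas versions, so localizing its result to UTC may not always be correct. A sketch that avoids the question is to request a time-zone-aware Timestamp directly. The zone name `America/New_York` and the fixed example dates are my own choices for illustration.

```python
import pandas as pd

# Current time as a time-zone-aware Timestamp in UTC
now_utc = pd.Timestamp.now(tz="UTC")

# Same instant, expressed in US Eastern time
now_eastern = now_utc.tz_convert("America/New_York")

# Converting changes the wall-clock reading, not the instant itself
print(now_utc == now_eastern)

# The UTC offset depends on the date because of daylight saving time
winter = pd.Timestamp("2021-01-15 12:00:00", tz="UTC").tz_convert("America/New_York")
summer = pd.Timestamp("2021-07-15 12:00:00", tz="UTC").tz_convert("America/New_York")
print(winter)  # 5 hours behind UTC
print(summer)  # 4 hours behind UTC
```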