# **Introduction to Dates & Times with pandas**

This Jupyter notebook is available on my GitHub account: https://github.com/mbonnemaison/Learning-Python/tree/master/Learning_pandas
### **pandas** is a Python library for analyzing data organized in tables.

### Sources:
- Installation instructions, an introduction to pandas, and the user guide: https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html
- Python for Data Analysis by Wes McKinney (2nd edition used here) - Chapter 5 (Introduction), Chapter 11 (Time Series)

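The elementary pandas data structures for working with date and time data include:

- **Timestamp**: a specific instant in time.
- **Timedelta**: a duration, i.e. the difference between two instants (for example, 3 days and 7 hours).
- **Period**: a span of time such as a particular month or year.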

## **Timestamp**
***Timestamp*** is the pandas equivalent of Python's datetime.datetime object and can be used in place of a ***datetime*** object in most cases.

In [1]:
import pandas as pd
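
The interchangeability mentioned above follows from `pd.Timestamp` being a subclass of `datetime.datetime`. A minimal sketch, using an arbitrary example date:

```python
from datetime import datetime
import pandas as pd

ts = pd.Timestamp(2021, 10, 23, 4, 34, 2)

# A Timestamp passes isinstance checks against datetime.datetime
is_datetime = isinstance(ts, datetime)

# It compares equal to a plain datetime describing the same instant
same_instant = ts == datetime(2021, 10, 23, 4, 34, 2)

# to_pydatetime() returns a plain datetime.datetime when one is required
plain = ts.to_pydatetime()
```

The "most cases" caveat matters mainly for nanosecond precision, which `datetime.datetime` cannot store.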

### **Convert strings to timestamps**
Strings can be converted to dates using **pd.to_datetime**.

Note: the format codes (such as `%m` or `%Y`) are documented here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

In [2]:
mytimestamp = '2021/10/23 4:34:2'

In [3]:
mytimestamp

'2021/10/23 4:34:2'

In [4]:
pd.to_datetime(mytimestamp)

Timestamp('2021-10-23 04:34:02')

In [5]:
pd.to_datetime('02-19-2021 22:45:56', format = '%m-%d-%Y %H:%M:%S')

Timestamp('2021-02-19 22:45:56')
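
An explicit `format` is most useful when the string is ambiguous, for example when the day comes first. The sketch below uses made-up strings and also shows `errors='coerce'`, which turns unparseable input into `NaT` instead of raising an error:

```python
import pandas as pd

# Day-first string: the explicit format removes the day/month ambiguity
ts = pd.to_datetime('19/02/2021 22:45:56', format='%d/%m/%Y %H:%M:%S')

# With errors='coerce', a string that does not match the format becomes NaT
bad = pd.to_datetime('not a date', format='%d/%m/%Y %H:%M:%S', errors='coerce')

print(ts)
print(bad)
```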

### **Convert a list of dates from string to Timestamp**

In [6]:
date_list_str = ['2021-03-14', '2020-12-25', '2025-02-19']

In [7]:
date_list_str

['2021-03-14', '2020-12-25', '2025-02-19']

In [8]:
[pd.to_datetime(x) for x in date_list_str]

[Timestamp('2021-03-14 00:00:00'),
 Timestamp('2020-12-25 00:00:00'),
 Timestamp('2025-02-19 00:00:00')]

In [11]:
pd.to_datetime(date_list_str)

DatetimeIndex(['2021-03-14', '2020-12-25', '2025-02-19'], dtype='datetime64[ns]', freq=None)
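
Passing the whole list to `pd.to_datetime` returns a `DatetimeIndex`, which exposes date components for all elements at once. A short sketch with the same three dates:

```python
import pandas as pd

idx = pd.to_datetime(['2021-03-14', '2020-12-25', '2025-02-19'])

years = list(idx.year)            # year of each date
weekdays = list(idx.day_name())   # weekday names (English by default)
earliest = idx.min()              # earliest date in the index
ordered = idx.sort_values()       # chronologically sorted copy

print(years, weekdays, earliest)
```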

### **Dealing with missing values**

In [12]:
date_list_str2 = ['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', None]

In [13]:
date_list_str2

['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', None]

In [14]:
pd.to_datetime(date_list_str2)

DatetimeIndex(['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', 'NaT'], dtype='datetime64[ns]', freq=None)

**NaT** means Not a Time
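
`NaT` plays the same role for dates that `NaN` plays for numbers. A minimal sketch of how missing dates can be detected and filtered out (the dates are illustrative):

```python
import pandas as pd

idx = pd.to_datetime(['2021-03-14', None, '2021-04-14'])

# isna() flags the missing entries
mask = idx.isna()

# Keep only the valid dates (idx.dropna() gives the same result)
clean = idx[~mask]

# Like NaN, NaT is not equal to itself, so use isna() rather than ==
nat_equals_itself = pd.NaT == pd.NaT
```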

### **Reading data from a csv file using pandas**
The CSV file comes from this project: https://github.com/mbonnemaison/adelego

In [15]:
data = pd.read_csv("24h_2021-03-14.csv",  sep = '\t')

In [16]:
data

Unnamed: 0,Date,Equipment,Parameter,Value,Unit
0,2021-03-14 00:10:00,5MultiSensor 6 (ZW100),HUMIDITY,21000000000,%
1,2021-03-14 01:10:00,5MultiSensor 6 (ZW100),HUMIDITY,20750000000,%
2,2021-03-14 03:10:00,5MultiSensor 6 (ZW100),HUMIDITY,20,%
3,2021-03-14 03:25:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%
4,2021-03-14 03:40:00,5MultiSensor 6 (ZW100),HUMIDITY,21,%
...,...,...,...,...,...
431,2021-03-14 22:55:00,5MultiSensor 6 (ZW100),UV,0,
432,2021-03-14 23:10:00,5MultiSensor 6 (ZW100),UV,0,
433,2021-03-14 23:25:00,5MultiSensor 6 (ZW100),UV,0,
434,2021-03-14 23:40:00,5MultiSensor 6 (ZW100),UV,0,


API reference for **pd.read_csv()**: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv

In [17]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 436 entries, 0 to 435
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Date       436 non-null    object
 1   Equipment  436 non-null    object
 2   Parameter  436 non-null    object
 3   Value      436 non-null    object
 4   Unit       258 non-null    object
dtypes: object(5)
memory usage: 17.2+ KB


### **Select columns**

In [18]:
data['Date']
# The output is a Series, i.e. a single labeled column

0      2021-03-14 00:10:00
1      2021-03-14 01:10:00
2      2021-03-14 03:10:00
3      2021-03-14 03:25:00
4      2021-03-14 03:40:00
              ...         
431    2021-03-14 22:55:00
432    2021-03-14 23:10:00
433    2021-03-14 23:25:00
434    2021-03-14 23:40:00
435    2021-03-14 23:55:00
Name: Date, Length: 436, dtype: object

### **Convert values in the "Date" column from string to Timestamp**

In [None]:
data['Date'] = pd.to_datetime(data["Date"])

In [None]:
data["Date"]

In [None]:
data.info()
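
Once a column holds datetimes instead of strings, the `.dt` accessor and date comparisons become available. The sketch below does not read the CSV file; it builds a small, hypothetical DataFrame with the same `Date` column layout:

```python
import pandas as pd

# Hypothetical stand-in for the sensor data
df = pd.DataFrame({
    'Date': ['2021-03-14 00:10:00', '2021-03-14 01:10:00', '2021-03-14 03:10:00'],
    'Value': [21, 20, 20],
})

# Convert the string column to datetimes
df['Date'] = pd.to_datetime(df['Date'])

# Extract a component of every date through the .dt accessor
hours = df['Date'].dt.hour

# Filter rows by date; the string is parsed for the comparison
late = df[df['Date'] >= '2021-03-14 01:00:00']

print(df.dtypes)
```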

***Missing values in DataFrame...***

In [None]:
dataNaT = pd.read_csv("24h_2021-03-14_NaT.csv", sep = '\t')

In [None]:
dataNaT.head(10)

In [None]:
dataNaT.info()

In [None]:
dataNaT["Date"] = pd.to_datetime(dataNaT["Date"])

In [None]:
dataNaT.head(10)

In [None]:
dataNaT.info()
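
The same steps can be tried without the CSV file. The sketch below assumes a tiny tab-separated table (hypothetical data) in which one date is missing, and counts and drops the resulting `NaT` values:

```python
import io
import pandas as pd

# Hypothetical tab-separated data; the second row has no date
raw = "Date\tValue\n2021-03-14 00:10:00\t21\n\t20\n2021-03-14 03:10:00\t20\n"

df = pd.read_csv(io.StringIO(raw), sep='\t')
df['Date'] = pd.to_datetime(df['Date'])   # the empty field becomes NaT

n_missing = int(df['Date'].isna().sum())  # number of missing dates
complete = df.dropna(subset=['Date'])     # rows that have a valid date

# Alternative: let read_csv parse the column while loading
df2 = pd.read_csv(io.StringIO(raw), sep='\t', parse_dates=['Date'])
```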

### **Generate Timestamps at fixed frequency**
A *fixed-frequency* series has data points at regular intervals, for example every 5 minutes.

In [None]:
pd.date_range(start = '1/1/2021', periods = 50, freq = '4h')
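
`pd.date_range` accepts either a number of `periods` or an `end` date. A shorter sketch of both variants, with arbitrary dates:

```python
import pandas as pd

# Six timestamps, four hours apart
rng = pd.date_range(start='2021-01-01', periods=6, freq='4h')
step = rng[1] - rng[0]   # spacing between consecutive timestamps

# Same frequency, but bounded by an end date (both ends are included)
by_end = pd.date_range(start='2021-01-01', end='2021-01-02', freq='4h')

print(rng)
```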

## **Timedeltas**
A Timedelta represents a duration: the difference between two dates or times.

In [None]:
pd.Timedelta(weeks = 1, days = 4, hours = 5)

### **Timedelta operations**
**Add time to Timestamps**

In [None]:
pd.to_datetime('2021/3/23 3:20:00') + pd.Timedelta(days = 3, hours = 7)

**Difference between Timestamps generates a Timedelta**

In [None]:
pd.to_datetime('2021/3/23 23:20:00') - pd.to_datetime('2021/3/20 2:34:14')

**Subtracting Timedeltas**

In [None]:
td1 = pd.Timedelta(weeks = 3, days = 3, hours = 3)
td2 = pd.Timedelta(weeks = 1, days = 1, hours = 1)

In [None]:
td2-td1
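
Timedeltas can be added as well as subtracted, and a negative result is displayed in a normalized form that can be surprising. A small sketch with the same two values:

```python
import pandas as pd

td1 = pd.Timedelta(weeks=3, days=3, hours=3)   # 24 days 3 hours
td2 = pd.Timedelta(weeks=1, days=1, hours=1)   # 8 days 1 hour

total = td1 + td2   # 32 days 4 hours
diff = td2 - td1    # negative: minus 16 days 2 hours

# Negative Timedeltas are stored as negative days plus a positive time,
# so diff.days is -17 and the remaining time is +22 hours
seconds = diff.total_seconds()
magnitude = abs(diff)

print(total, diff, magnitude)
```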

### **Convert strings to Timedelta**

In [None]:
pd.to_timedelta('4 days 45:53:23')
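
Note that 45 hours exceeds one day, so the parsed value is normalized. The sketch below inspects the result and also converts a number with an explicit `unit` (the value 90 is arbitrary):

```python
import pandas as pd

td = pd.to_timedelta('4 days 45:53:23')

# 45 hours roll over into 1 day and 21 hours
days = td.days
hours = td.components.hours
minutes = td.components.minutes

# Numbers can be converted too, given a unit
ninety_minutes = pd.to_timedelta(90, unit='min')

print(td, ninety_minutes)
```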

### **Generate Timedeltas at fixed frequency**

In [None]:
pd.timedelta_range(start = '1 day', periods = 50, freq = '10h')

## **Time periods**

A *Period* represents a span of time with a fixed frequency, such as a day, a month, or a year, rather than a single instant.

Examples of periods: the month of March 2021 or the year 2020.

### **Generate Time Periods**

In [None]:
pd.Period(2020)

### **Generate Time Periods at fixed frequency**

In [None]:
pd.period_range(start='2000-01-01', end='2020-01-01', freq='M')
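
Because a Period is a span, it has a start and an end, and integer arithmetic moves it by whole units of its frequency. A minimal sketch using March 2021:

```python
import pandas as pd

march = pd.Period('2021-03', freq='M')

# A Period knows where it begins and ends
start, end = march.start_time, march.end_time

# Check whether a Timestamp falls inside the period
inside = start <= pd.Timestamp('2021-03-14') <= end

# Adding an integer shifts by that many periods
next_month = march + 1

print(march, start, end, next_month)
```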

## **Going further**
### **Falsehoods programmers believe about time**
This link lists misconceptions we have about time: https://gist.github.com/timvisee/fcda9bbdff88d45cc9061606b4b923ca

**Falsehood: February is always 28 days long**

In [None]:
t1 = pd.to_datetime('2021-3-01 12:00:00')
t2 = pd.to_datetime('2021-2-01 12:00:00')
t1 - t2
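
The result above is 28 days only because 2021 is not a leap year. A short sketch comparing 2021 with 2020 (`february_length` is a helper introduced here for illustration):

```python
import pandas as pd

def february_length(year):
    """Return the length of February in the given year as a Timedelta."""
    return pd.Timestamp(year=year, month=3, day=1) - pd.Timestamp(year=year, month=2, day=1)

feb_2021 = february_length(2021)   # common year
feb_2020 = february_length(2020)   # leap year

# Timestamps expose the leap-year check directly
leap_2020 = pd.Timestamp(year=2020, month=1, day=1).is_leap_year

print(feb_2021, feb_2020)
```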

### **Timestamp limitation**
New York City was incorporated on September 2nd 1664. Convert this date into a Timestamp.

In [None]:
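# In pandas versions that default to nanosecond resolution, this raises OutOfBoundsDatetime:
# such Timestamps only cover roughly the years 1677 to 2262.
NYC = pd.to_datetime('9-2-1664')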

Timestamp limitations: https://pandas-docs.github.io/pandas-docs-travis/user_guide/timeseries.html#timeseries-timestamp-limits
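
The bounds can be inspected directly, and the failure can be caught rather than left to stop the notebook. This is a sketch under the assumption that the installed pandas uses nanosecond resolution by default; newer versions may parse the date without error, so the code handles both outcomes:

```python
import pandas as pd

# Earliest and latest instants a nanosecond-resolution Timestamp can hold
lower, upper = pd.Timestamp.min, pd.Timestamp.max

try:
    nyc = pd.to_datetime('1664-09-02')
    out_of_bounds = False
except pd.errors.OutOfBoundsDatetime:
    # Raised when the date does not fit in the supported range
    nyc = None
    out_of_bounds = True

print(lower, upper, out_of_bounds)
```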

#### Python ***datetime*** module
Python provides date and time functionality in the **datetime** module, which contains the following commonly used classes:

- **date**: to work with dates (year, month, day)
- **time**: to work with times of day (hours, minutes, seconds, microseconds)
- **datetime**: to work with both date and time components
- **timedelta**: to work with durations

In [None]:
from datetime import datetime

In [None]:
datetime(1664,9,2)

***Convert strings to datetime.datetime objects***

In [None]:
datetime.strptime('2/9/1664', '%d/%m/%Y')

***Working with a list of dates***

In [None]:
date_list_str = ['2021-03-14', '2020-12-25', '2025-02-19']

In [None]:
[datetime.strptime(x, '%Y-%m-%d') for x in date_list_str]

***Convert Incorporated dates into datetime.datetime objects***

In [None]:
us_cities = pd.read_csv('top12.csv')

In [None]:
us_cities

In [None]:
us_cities.info()

In [None]:
us_cities['Incorporated']

In [None]:
[datetime.strptime(x, '%m/%d/%Y') for x in us_cities['Incorporated']]
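
If the dates need to stay inside pandas, a daily `Period` is one alternative that is not restricted to the Timestamp range. The sketch below does not read `top12.csv`; it assumes a list holding only New York City's date from above, in the same month/day/year format:

```python
from datetime import datetime
import pandas as pd

# Stand-in for us_cities['Incorporated']; only the NYC date is taken from the text
incorporated = ['9/2/1664']

# Parse with the standard library, which handles years before 1677
parsed = [datetime.strptime(x, '%m/%d/%Y') for x in incorporated]

# Build daily Periods from the parsed components
periods = [pd.Period(year=d.year, month=d.month, day=d.day, freq='D') for d in parsed]

print(parsed, periods)
```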

### **Time zone**
What time is it now?

In [None]:
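# Current UTC time as a naive Timestamp. pd.to_datetime('now') returned UTC in older
# pandas versions but returns local time in pandas 2.0 and later, so UTC is requested explicitly.
now = pd.Timestamp.now(tz='UTC').tz_localize(None)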
now

In [None]:
now_utc = now.tz_localize('UTC')

In [None]:
now_utc

In [None]:
now_est = now_utc.tz_convert('US/Eastern')

In [None]:
now_est
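
Since "now" changes on every run, fixed instants make the conversion easier to follow. The sketch below uses two arbitrary dates and the `America/New_York` zone name (the same zone as `US/Eastern`) to show how the offset from UTC depends on daylight saving time:

```python
import pandas as pd

# Two UTC instants: one in winter, one in summer
winter = pd.Timestamp('2021-01-15 12:00').tz_localize('UTC')
summer = pd.Timestamp('2021-07-15 12:00').tz_localize('UTC')

# tz_convert changes the displayed wall-clock time, not the instant itself
winter_ny = winter.tz_convert('America/New_York')   # standard time, UTC-5
summer_ny = summer.tz_convert('America/New_York')   # daylight time, UTC-4

print(winter_ny, summer_ny)
```

`tz_localize` attaches a zone to a naive Timestamp, whereas `tz_convert` only applies to a Timestamp that already has one.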