# **Introduction to Dates & Time with pandas**

This Jupyter notebook can be found on my GitHub account: https://github.com/mbonnemaison/Learning-Python/tree/master/Learning_pandas
### **pandas** is a Python library that facilitates the analysis of data organized in tables.

### Sources:
- Information to install pandas, introduce pandas and the user guide: https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html
- Python for Data Analysis by Wes McKinney (2nd edition used here) - Chapter 5 (Introduction), Chapter 11 (Time Series)

## **Introduction to Time & Dates**
Some of the elementary data structures for working with date & time data are:

- **Timestamp**: a specific instant in time
- **Timedelta**: a duration, such as the difference between two timestamps

### **Timestamp**

***Timestamp*** is the pandas equivalent of Python's `datetime.datetime` object and is interchangeable with it in most cases.
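The short sketch below illustrates that interchangeability: it converts a `datetime.datetime` into a Timestamp and back. The date used is arbitrary.

```python
from datetime import datetime

import pandas as pd

py_dt = datetime(2021, 3, 14, 8, 30)
ts = pd.Timestamp(py_dt)          # datetime.datetime -> pandas Timestamp
back = ts.to_pydatetime()         # pandas Timestamp -> datetime.datetime

print(isinstance(ts, datetime))   # True: Timestamp is a subclass of datetime.datetime
print(ts == py_dt, back == py_dt) # both represent the same instant
```

Because Timestamp subclasses `datetime.datetime`, most functions that accept a `datetime` also accept a Timestamp.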

In [None]:
import pandas as pd

### **Convert strings to timestamps**
Strings can be converted to dates using **pd.to_datetime**.

Note: information on format codes can be found here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
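As a minimal example of those format codes, the sketch below parses a day-first string. Without an explicit `format`, a string like '05/10/2021' could be read as May 10 instead of October 5; the sample string is made up for illustration.

```python
import pandas as pd

# %d = day, %m = month, %Y = 4-digit year, %H:%M:%S = 24-hour time
ts = pd.to_datetime('05/10/2021 04:34:02', format='%d/%m/%Y %H:%M:%S')
print(ts)  # 5 October 2021 at 04:34:02
```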

In [None]:
mytimestamp = '2021/10/23 4:34:2'

In [None]:
mytimestamp

In [None]:
mytimestamp_real = pd.to_datetime(mytimestamp)

In [None]:
mytimestamp_real

In [None]:
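pd.to_datetime('2021-02-19 22:45:56', format = '%Y-%m-%d %H:%M:%S')
# The format must describe the whole string; '%Y-%m-%d' alone would leave ' 22:45:56' unparsed and raise a ValueError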

In [None]:
pd.to_datetime('20210223232323')
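Once a string has been converted, the individual components of the Timestamp are available as attributes. This sketch reuses the date from the first example with zero-padded time fields.

```python
import pandas as pd

ts = pd.to_datetime('2021-10-23 04:34:02')
print(ts.year, ts.month, ts.day)           # date components
print(ts.hour, ts.minute, ts.second)       # time components
print(ts.day_name())                       # name of the weekday
print(ts.strftime('%Y-%m-%d %H:%M'))       # back to a string, formatted with strftime codes
```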

### **Convert a list of dates from string to Timestamp**

In [None]:
date_list_str = ['2021-03-14', '2020-12-25', '2025-02-19']

In [None]:
date_list_str

In [None]:
pd.to_datetime(date_list_str)
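Converting a list returns a `DatetimeIndex` rather than a single Timestamp. The sketch below shows a few operations that this makes available, using the same three dates.

```python
import pandas as pd

dates = pd.to_datetime(['2021-03-14', '2020-12-25', '2025-02-19'])
print(type(dates).__name__)   # DatetimeIndex
print(dates.min(), dates.max())
print(dates.sort_values())    # chronological order
print(dates.year)             # vectorized access to the year of every element
```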

### **Dealing with missing values**

In [None]:
date_list_str2 = ['2021-03-14', '2020-12-25', '2025-02-19', '2021-04-14', None]

In [None]:
date_list_str2

In [None]:
pd.to_datetime(date_list_str2)

**NaT** stands for "Not a Time": it is the pandas missing-value marker for datetime data.
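NaT also appears when a string cannot be parsed, if `errors='coerce'` is passed to `pd.to_datetime` (by default an unparseable string raises an error). A minimal sketch with a made-up list:

```python
import pandas as pd

raw = ['2021-03-14', None, 'not a date']
# errors='coerce' turns unparseable strings into NaT instead of raising
parsed = pd.to_datetime(raw, errors='coerce')
print(parsed)
print(parsed.isna())   # True where the value is NaT
```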

### **Reading data from a csv file using pandas**
More information about the project the CSV file comes from: https://github.com/mbonnemaison/adelego

In [None]:
data = pd.read_csv("24h_2021-03-14.csv",  sep = '\t')

In [None]:
data

Link to the API reference for **pd.read_csv()**: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv
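`pd.read_csv` can also convert a column while reading, through its `parse_dates` parameter. The sketch below uses a small in-memory, tab-separated sample as a stand-in for the real file; the `Value` column and its contents are hypothetical.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for 24h_2021-03-14.csv
sample = "Date\tValue\n2021-03-14 00:00:00\t1\n2021-03-14 00:05:00\t2\n"
df = pd.read_csv(io.StringIO(sample), sep='\t', parse_dates=['Date'])
print(df.dtypes)   # 'Date' is already a datetime64 column
```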

In [None]:
data.head()

In [None]:
data.info()

### **Select columns**

In [None]:
data['Date']
# The output is a Series, i.e. a single labeled column of the DataFrame

### **Convert values in the "Date" column from string to Timestamp**

In [None]:
data['Date'] = pd.to_datetime(data["Date"])

In [None]:
data["Date"]

In [None]:
data.info()
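After conversion, the `.dt` accessor gives vectorized access to the components of every timestamp in a column. This sketch uses a hypothetical Series in place of `data['Date']`.

```python
import pandas as pd

# Hypothetical column of timestamps, similar to data['Date'] after conversion
dates = pd.Series(pd.to_datetime(['2021-03-14 00:00', '2021-03-14 06:30', '2021-03-14 23:45']))
print(dates.dt.hour)     # hour of each timestamp
print(dates.dt.minute)   # minute of each timestamp
print(dates.dt.date)     # calendar date only (datetime.date objects)
```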

***Missing values in a DataFrame***

In [None]:
dataNaT = pd.read_csv("24h_2021-03-14_NaT.csv", sep = '\t')

In [None]:
dataNaT.head(10)

In [None]:
dataNaT.info()

In [None]:
dataNaT["Date"] = pd.to_datetime(dataNaT["Date"])

In [None]:
dataNaT.info()

In [None]:
dataNaT.head(10)

In [None]:
dataNaT["Date"][33]
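Once missing dates are NaT, they can be counted or dropped. A minimal sketch on a hypothetical three-row DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with one missing date
df = pd.DataFrame({'Date': pd.to_datetime(['2021-03-14 00:00', None, '2021-03-14 00:10'])})
print(df['Date'].isna().sum())       # number of NaT values
clean = df.dropna(subset=['Date'])   # keep only rows with a valid date
print(clean)
```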

### **Generate Timestamps at fixed frequency**
*Fixed-frequency* data consists of data points that occur at regular intervals, such as every 5 minutes.

In [None]:
tsff = pd.date_range(start = '1/1/2021', periods = 50, freq = '4h')

In [None]:
tsff
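`pd.date_range` can be driven either by `periods` or by an `end` date. The sketch below builds the "every 5 minutes" case from the definition above, plus a daily range; the dates are arbitrary.

```python
import pandas as pd

# 4 timestamps, 5 minutes apart
every_5_min = pd.date_range(start='2021-01-01 00:00', periods=4, freq='5min')
# every day from start to end, both included
daily = pd.date_range(start='2021-01-01', end='2021-01-05', freq='D')
print(every_5_min)
print(daily)
```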

## **Timedeltas**
A Timedelta represents a duration, for example the difference between two datetime objects.

In [None]:
pd.Timedelta(weeks = 1, days = 4, hours = 5)
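The same Timedelta can be inspected in several units. Note that `.seconds` only counts the seconds left over after the whole days, while `.total_seconds()` covers the full duration.

```python
import pandas as pd

td = pd.Timedelta(weeks=1, days=4, hours=5)
print(td)                          # 11 days 05:00:00
print(td.days)                     # 11 whole days
print(td.seconds)                  # 18000 seconds beyond the whole days
print(td.total_seconds())          # 968400.0 seconds in total
print(td / pd.Timedelta(hours=1))  # 269.0 hours
```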

### **Timedelta operations**
**Add time to Timestamps**

In [None]:
ts = pd.to_datetime('2021/3/23 3:20:00') + pd.Timedelta(days=3, hours = 7)

In [None]:
ts
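A Timedelta always adds a fixed amount of time. For calendar-based steps such as "one month later", pandas provides `pd.DateOffset`; the sketch below contrasts the two on an arbitrary end-of-month date.

```python
import pandas as pd

ts = pd.Timestamp('2021-01-31')
print(ts + pd.Timedelta(days=30))    # 2021-03-02: exactly 30 days later
print(ts + pd.DateOffset(months=1))  # 2021-02-28: one calendar month later, clipped to the month's end
```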

**Difference between Timestamps generates a Timedelta**

In [None]:
delta = pd.to_datetime('2021/3/23 23:20:00') - pd.to_datetime('2021/3/20 2:34:14')

In [None]:
delta
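The resulting Timedelta can be broken into components or expressed as a number of hours. This sketch recomputes the same difference as above.

```python
import pandas as pd

delta = pd.to_datetime('2021/3/23 23:20:00') - pd.to_datetime('2021/3/20 2:34:14')
print(delta)                          # 3 days 20:45:46
print(delta.components)               # days, hours, minutes, seconds, ...
print(delta / pd.Timedelta(hours=1))  # elapsed time in hours, as a float
```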

**Adding Timedeltas**

In [None]:
td1 = pd.Timedelta(weeks = 3, days = 3, hours = 3)
td2 = pd.Timedelta(weeks = 1, days = 1, hours = 1)

In [None]:
td1+td2
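Timedeltas support the other usual arithmetic and comparison operators too. A sketch with the same two values:

```python
import pandas as pd

td1 = pd.Timedelta(weeks=3, days=3, hours=3)  # 24 days 03:00:00
td2 = pd.Timedelta(weeks=1, days=1, hours=1)  # 8 days 01:00:00
print(td1 - td2)  # 16 days 02:00:00
print(td2 * 2)    # 16 days 02:00:00
print(td1 / td2)  # 3.0: dividing two Timedeltas gives a plain number
print(td1 > td2)  # True
```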

### **Convert strings to Timedelta**

In [None]:
pd.to_timedelta('45:53:23')
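`pd.to_timedelta` also accepts lists of strings and plain numbers combined with a `unit`. A minimal sketch with made-up values:

```python
import pandas as pd

print(pd.to_timedelta('45:53:23'))                       # 1 days 21:53:23
print(pd.to_timedelta(['1 days 02:00:00', '00:30:00']))  # TimedeltaIndex from strings
print(pd.to_timedelta([15, 30, 90], unit='min'))         # numbers interpreted as minutes
```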

## **Going further**
### ***Time periods*** 

*Periods* represent time spans, such as a day, a month or a year; they can be thought of as special cases of intervals.

Example of periods: the month of March 2021 or the year 2020

### **Generate Time Periods**

In [None]:
tp = pd.Period(2020, freq='A-OCT')
# A-OCT is an annual frequency whose year ends in October: this period runs from 11/1/2019 to 10/31/2020.
# (pandas 2.2 and later prefer the spelling 'Y-OCT'.)

In [None]:
tp
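A monthly period, such as the month of March 2021 mentioned above, shows the main Period attributes more simply. The Timestamp used in the membership check is arbitrary.

```python
import pandas as pd

p = pd.Period('2021-03', freq='M')
print(p.start_time)   # 2021-03-01 00:00:00
print(p.end_time)     # last nanosecond of March 31, 2021
print(p + 1)          # 2021-04: arithmetic moves by whole periods

ts = pd.Timestamp('2021-03-14 08:30')
print(p.start_time <= ts <= p.end_time)  # True: the timestamp falls inside March 2021
```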

### **Generate Time Periods at fixed frequency**

In [None]:
tp2 = pd.period_range(start='2000-01-01', end='2020-01-01', freq='A-OCT')

In [None]:
tp2
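Periods and Timestamps can be converted into each other. This sketch uses a monthly `period_range` rather than the fiscal-year alias above.

```python
import pandas as pd

months = pd.period_range(start='2021-01', end='2021-04', freq='M')
print(months)                 # PeriodIndex with 4 monthly periods
print(months.to_timestamp())  # DatetimeIndex: first instant of each month

ts = pd.Timestamp('2021-03-14 08:30')
print(ts.to_period('M'))      # the month that contains the timestamp
```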

## **Problems**
### **Problem 1: Timestamp limitation**
New York City was incorporated on September 2nd 1664. Convert this date into a Timestamp.

In [None]:
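NYC = pd.to_datetime('9-2-1664')
# Expected to fail: with nanosecond resolution, Timestamps only cover about 1677-09-21 to 2262-04-11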

Timestamp limitations: https://pandas-docs.github.io/pandas-docs-travis/user_guide/timeseries.html#timeseries-timestamp-limits
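Besides Python's `datetime` module, another way around this limit is a pandas Period, which is not restricted to the nanosecond Timestamp range. A minimal sketch for the New York City date:

```python
import pandas as pd

# Bounds of a nanosecond-resolution Timestamp
print(pd.Timestamp.min, pd.Timestamp.max)

# A daily Period can represent dates outside those bounds
nyc = pd.Period('1664-09-02', freq='D')
print(nyc, nyc.year, nyc.month, nyc.day)
```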

#### Python ***datetime*** module
Python's standard library provides date and time functionality in the **datetime** module, which contains three commonly used classes:

- **date** class: to work with dates (year, month, day)
- **time** class: to work with times (hours, minutes, seconds, microseconds)
- **datetime** class: to work with both date and time components
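The three classes listed above fit together: a `date` and a `time` can be combined into a `datetime`, and split back apart. The time of day here is arbitrary.

```python
from datetime import date, datetime, time

d = date(1664, 9, 2)
t = time(14, 30)
dt = datetime.combine(d, t)            # date + time -> datetime
print(dt)                              # 1664-09-02 14:30:00
print(dt.date() == d, dt.time() == t)  # True True: split back into its parts
```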

In [None]:
from datetime import datetime
NYC2 = datetime(1664,9,2)

In [None]:
NYC2

***Convert strings to datetime.datetime objects***

In [None]:
NYC3 = datetime.strptime('2/9/1664', '%d/%m/%Y')

In [None]:
NYC3
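`strftime` does the reverse of `strptime`: it formats a `datetime` as a string using the same format codes.

```python
from datetime import datetime

nyc = datetime.strptime('2/9/1664', '%d/%m/%Y')
print(nyc.strftime('%Y-%m-%d'))  # 1664-09-02
print(nyc.strftime('%m/%d/%Y'))  # 09/02/1664
```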

***Working with a list of dates***

In [None]:
date_list_str = ['2021-03-14', '2020-12-25', '2025-02-19']

In [None]:
[datetime.strptime(x, '%Y-%m-%d') for x in date_list_str]

***Convert the Incorporated dates into datetime.datetime objects***

In [None]:
us_cities = pd.read_csv('top12.csv')

In [None]:
us_cities.info()

In [None]:
[datetime.strptime(x, '%m/%d/%Y') for x in us_cities['Incorporated']]
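Instead of a list comprehension, `Series.map` applies the same conversion and keeps the result as a Series aligned with the original rows. This sketch uses a hypothetical Series in place of `us_cities['Incorporated']`; apart from the New York City date, the values are placeholders.

```python
from datetime import datetime

import pandas as pd

# Hypothetical stand-in for us_cities['Incorporated']
incorporated = pd.Series(['09/02/1664', '01/15/1900'])
as_datetimes = incorporated.map(lambda s: datetime.strptime(s, '%m/%d/%Y'))
print(as_datetimes)
```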

### **Problem 2: Time zone**
What time is it now?

In [None]:
now = pd.to_datetime('now')

In [None]:
now

In [None]:
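now_utc = pd.Timestamp.now(tz='UTC')
# Ask for the current time directly in UTC: tz_localize('UTC') only labels a naive time as UTC, which is correct only if it already holds UTC time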

In [None]:
now_utc

In [None]:
now_est = now_utc.tz_convert('US/Eastern')

In [None]:
now_est
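The difference between `tz_localize` (attach a time zone without changing the clock time) and `tz_convert` (show the same instant in another zone) is easier to see on a fixed timestamp. The sketch below uses 'America/New_York', the canonical name for which 'US/Eastern' is a legacy alias; on March 14, 2021 daylight saving time was already in effect, so the offset is UTC-4.

```python
import pandas as pd

naive = pd.Timestamp('2021-03-14 12:00')       # no time zone attached
utc = naive.tz_localize('UTC')                 # declare that 12:00 is UTC; clock time unchanged
eastern = utc.tz_convert('America/New_York')   # same instant, shown in New York time
print(utc)
print(eastern)                                 # 08:00 local time, UTC-4
```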

#### Python **datetime** module

In [None]:
now = datetime.now()

In [None]:
now

In [None]:
now.date()

In [None]:
now.time()

In [None]:
now.hour
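`datetime.now()` returns a naive local time. Passing a `tzinfo` returns a time-zone-aware value instead; the sketch below uses the standard `timezone` class, including a fixed UTC-5 offset that, unlike a named zone, does not follow daylight saving time.

```python
from datetime import datetime, timedelta, timezone

naive = datetime.now()               # local wall-clock time, no tzinfo
aware = datetime.now(timezone.utc)   # current time tagged as UTC
print(naive.tzinfo, aware.tzinfo)    # None UTC
print(aware.utcoffset())             # 0:00:00

# Same instant expressed with a fixed UTC-5 offset
fixed_minus_5 = aware.astimezone(timezone(timedelta(hours=-5)))
print(fixed_minus_5)
```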

## **Practice**

In [None]:
us_cities = pd.read_csv('top12.csv')

In [None]:
us_cities

In [None]:
us_cities.info()

**Question 1**: How would you convert the Incorporated dates from strings to Timestamps?

In [None]:
us_cities['Incorporated'] = pd.to_datetime(us_cities['Incorporated'], format= '%m/%d/%Y')
us_cities.info()

**Question 2**: How many days are there between the incorporation dates of Philadelphia and Dallas?

In [None]:
us_cities

In [None]:
us_cities['Incorporated'][7] - us_cities['Incorporated'][4]
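Positional indices such as 7 and 4 depend on the row order of the file. A sketch of a name-based lookup follows; it assumes a hypothetical 'City' column, and the city names and dates are placeholders rather than real data. The hypothetical helper `incorporated` returns the date for one city.

```python
import pandas as pd

# Hypothetical table: the 'City' column and the dates are placeholders, not real data
cities = pd.DataFrame({
    'City': ['City A', 'City B'],
    'Incorporated': pd.to_datetime(['1900-03-01', '1900-03-11']),
})

def incorporated(name):
    """Hypothetical helper: incorporation date of the city with this name."""
    return cities.loc[cities['City'] == name, 'Incorporated'].iloc[0]

gap = incorporated('City B') - incorporated('City A')
print(gap)       # 10 days 00:00:00
print(gap.days)  # 10, as a plain integer
```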