<a href="https://colab.research.google.com/github/rikanga/Easy-Numpy/blob/main/HandlingDate_and_Times.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Handling Dates and Times**

## 7.1 Converting Strings to Dates

**Problem**

Given a vector of strings representing dates and times, you want to transform them
into time series data.

**Solution**

Use pandas’ to_datetime with the format of the date and/or time specified in the
format parameter:

In [2]:
# Load libraries
import numpy as np
import pandas as pd

In [3]:
# Create date strings
date_strings = np.array([
    '03-04-2005 11:35 PM',
    '23-05-2010 12:01 AM',
    '04-09-2009 09:09 PM',
])

In [5]:
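# Convert each string to a Timestamp, parsing day-first with a 12-hour clock
[pd.to_datetime(date, format='%d-%m-%Y %I:%M %p') for date in date_strings]

[Timestamp('2005-04-03 23:35:00'),
 Timestamp('2010-05-23 00:01:00'),
 Timestamp('2009-09-04 21:09:00')]

In [6]:
# Convert the whole array at once
pd.to_datetime(date_strings, format='%d-%m-%Y %I:%M %p')

DatetimeIndex(['2005-04-03 23:35:00', '2010-05-23 00:01:00',
               '2009-09-04 21:09:00'],
              dtype='datetime64[ns]', freq=None)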
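
Real-world date strings are often messy. As a minimal sketch, assuming a hypothetical input list `raw_strings` whose last entry is not a date, the `errors='coerce'` argument of `to_datetime` turns unparseable values into `NaT` instead of raising an error:

```python
import pandas as pd

# Hypothetical input: the last entry is deliberately not a valid date
raw_strings = ['03-04-2005 11:35 PM', '23-05-2010 12:01 AM', 'not a date']

# Day-first format with a 12-hour clock; values that do not match become NaT
parsed = pd.to_datetime(
    raw_strings,
    format='%d-%m-%Y %I:%M %p',
    errors='coerce',
)

print(parsed)
```

Stating the format explicitly avoids the ambiguity between day-first and month-first strings such as `03-04-2005`.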

## 7.2 Handling Time Zones

**Problem**

You have time series data and want to add or change time zone information.

**Solution**

If not specified, pandas objects have no time zone. However, we can add a time zone
using tz during creation:

In [7]:
# Create datetime with timezone
pd.Timestamp('2017-05-01 06:00:00', tz='Europe/London')

Timestamp('2017-05-01 06:00:00+0100', tz='Europe/London')

In [8]:
# Create datetime
date = pd.Timestamp('2017-05-01 06:00:00')

In [10]:
# Set time zone
date_in_london = date.tz_localize('Europe/London')

In [11]:
# Show datetime
date_in_london

Timestamp('2017-05-01 06:00:00+0100', tz='Europe/London')

In [14]:
# Change the time zone
date_in_london.tz_convert('Africa/Kinshasa')

Timestamp('2017-05-01 06:00:00+0100', tz='Africa/Kinshasa')
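
The converted time above looks unchanged because London (on summer time) and Kinshasa are both at UTC+1 on that date. A small sketch, using `Asia/Tokyo` as an illustrative target zone that is not part of the original example, makes the shift visible:

```python
import pandas as pd

# Time zone-aware timestamp in London (UTC+1 on this date)
london = pd.Timestamp('2017-05-01 06:00:00', tz='Europe/London')

# tz_convert keeps the same instant but expresses it in another zone
tokyo = london.tz_convert('Asia/Tokyo')

print(london)
print(tokyo)
print(london == tokyo)  # same moment in time, different wall-clock reading
```

`tz_localize` attaches a zone to a naive timestamp, whereas `tz_convert` only works on timestamps that already carry one.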

In [16]:
# Create three month-end dates
dates = pd.Series(pd.date_range('2/2/2002', periods=3, freq='M'))

In [17]:
dates

0   2002-02-28
1   2002-03-31
2   2002-04-30
dtype: datetime64[ns]

In [20]:
# Set the time zone for every date in the series
dates.dt.tz_localize('Africa/Kinshasa')

0   2002-02-28 00:00:00+01:00
1   2002-03-31 00:00:00+01:00
2   2002-04-30 00:00:00+01:00
dtype: datetime64[ns, Africa/Kinshasa]

In [22]:
# Load all time zone names
from pytz import all_timezones

# Show the first ten time zones
all_timezones[:10]

['Africa/Abidjan',
 'Africa/Accra',
 'Africa/Addis_Ababa',
 'Africa/Algiers',
 'Africa/Asmara',
 'Africa/Asmera',
 'Africa/Bamako',
 'Africa/Bangui',
 'Africa/Banjul',
 'Africa/Bissau']

## 7.3 Selecting Dates and Times

**Problem**

You have a vector of dates and you want to select one or more.

**Solution**

Use two boolean conditions as the start and end dates:

In [23]:
# Create data frame
df = pd.DataFrame()

In [24]:
# Create datetimes
df['date'] = pd.date_range('1/1/2001', periods=100000, freq='H')

df.head()

                 date
0 2001-01-01 00:00:00
1 2001-01-01 01:00:00
2 2001-01-01 02:00:00
3 2001-01-01 03:00:00
4 2001-01-01 04:00:00


In [26]:
# Select observations between two datetimes
df[(df['date'] > '2002-1-1 01:00:00') & (df['date'] <= '2002-1-1 04:00:00')]

                    date
8762 2002-01-01 02:00:00
8763 2002-01-01 03:00:00
8764 2002-01-01 04:00:00


In [27]:
# Set the date column as the index, keeping it as a column too
df = df.set_index('date', drop=False)

In [28]:
df

                                   date
date
2001-01-01 00:00:00 2001-01-01 00:00:00
2001-01-01 01:00:00 2001-01-01 01:00:00
2001-01-01 02:00:00 2001-01-01 02:00:00
2001-01-01 03:00:00 2001-01-01 03:00:00
2001-01-01 04:00:00 2001-01-01 04:00:00
...                                 ...
2012-05-29 11:00:00 2012-05-29 11:00:00
2012-05-29 12:00:00 2012-05-29 12:00:00
2012-05-29 13:00:00 2012-05-29 13:00:00
2012-05-29 14:00:00 2012-05-29 14:00:00


In [29]:
# Select observations between two datetimes by slicing the index
df['2002-1-1 01:00:00':'2002-1-1 04:00:00']

                                   date
date
2002-01-01 01:00:00 2002-01-01 01:00:00
2002-01-01 02:00:00 2002-01-01 02:00:00
2002-01-01 03:00:00 2002-01-01 03:00:00
2002-01-01 04:00:00 2002-01-01 04:00:00
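
The two approaches treat the start of the range differently: the boolean mask uses a strict `>` on the start, while slicing a datetime index includes both endpoints. A minimal sketch on a hypothetical ten-row frame (`df_small` and its `value` column are illustrative names) shows the difference:

```python
import pandas as pd

# Hypothetical small frame: ten daily timestamps with a placeholder value column
df_small = pd.DataFrame({
    'date': pd.date_range('2001-01-01', periods=10, freq='D'),
    'value': range(10),
})

# Boolean mask: strictly after the start, up to and including the end
mask = (df_small['date'] > '2001-01-03') & (df_small['date'] <= '2001-01-06')
by_mask = df_small[mask]

# Label-based slice on a datetime index: both endpoints are included
by_slice = df_small.set_index('date').loc['2001-01-03':'2001-01-06']

print(len(by_mask), len(by_slice))
```

Using `.loc` for the slice makes it explicit that the selection is by index label rather than by position.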


## 7.4 Breaking Up Date Data into Multiple Features

**Problem**

You have a column of dates and times and you want to create features for year,
month, day, hour, and minute.

**Solution**

Use the time properties of pandas' Series.dt accessor:

In [31]:
# Create data frame
df = pd.DataFrame()

In [32]:
# Create 150 weekly dates
df['date'] = pd.date_range('1/1/2001', periods=150, freq='W')

In [35]:
# Create features for year, month, day, hour, and minute
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['hour'] = df['date'].dt.hour
df['minute'] = df['date'].dt.minute

In [36]:
df

          date  year  month  day  hour  minute
0   2001-01-07  2001      1    7     0       0
1   2001-01-14  2001      1   14     0       0
2   2001-01-21  2001      1   21     0       0
3   2001-01-28  2001      1   28     0       0
4   2001-02-04  2001      2    4     0       0
..         ...   ...    ...  ...   ...     ...
145 2003-10-19  2003     10   19     0       0
146 2003-10-26  2003     10   26     0       0
147 2003-11-02  2003     11    2     0       0
148 2003-11-09  2003     11    9     0       0
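
Because the weekly dates above all fall at midnight, the hour and minute features are always zero. As a sketch with two hypothetical timestamps that carry a time of day (the `events` frame is an illustrative name), the same properties can also be attached in a single `assign` call:

```python
import pandas as pd

# Hypothetical timestamps with non-zero time components
events = pd.DataFrame({
    'date': pd.to_datetime(['2001-01-07 08:30:00', '2003-11-09 17:45:00']),
})

# Build all date-part features in one step
events = events.assign(
    year=events['date'].dt.year,
    month=events['date'].dt.month,
    day=events['date'].dt.day,
    hour=events['date'].dt.hour,
    minute=events['date'].dt.minute,
)

print(events)
```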


## 7.5 Calculating the Difference Between Dates

**Problem**

You have two datetime features and want to calculate the time between them for each
observation.

**Solution**

Subtract the two date features using pandas:

In [38]:
# Create data frame
df = pd.DataFrame()

In [42]:
# Create two datetime features
df['Arrived'] = [pd.Timestamp('01-01-2017'), pd.Timestamp('01-04-2017')]
df['Left'] = [pd.Timestamp('01-01-2017'), pd.Timestamp('01-06-2017')]

In [43]:
# Show the dataframe
df

     Arrived       Left
0 2017-01-01 2017-01-01
1 2017-01-04 2017-01-06


In [44]:
# Calculate the difference between features
df['Left'] - df["Arrived"]

0   0 days
1   2 days
dtype: timedelta64[ns]

In [46]:
# Calculate the duration between features in whole days
(df['Left'] - df['Arrived']).dt.days

0    0
1    2
dtype: int64
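
Whole days discard any partial day. When arrival and departure carry a time of day, a finer-grained duration may be more useful. The sketch below, on a hypothetical `stays` frame with made-up times, converts the timedelta to hours with `dt.total_seconds()`:

```python
import pandas as pd

# Hypothetical arrival and departure times, including time of day
stays = pd.DataFrame({
    'Arrived': pd.to_datetime(['2017-01-01 09:00', '2017-01-04 12:00']),
    'Left': pd.to_datetime(['2017-01-01 15:30', '2017-01-06 18:00']),
})

# Subtracting datetime columns gives a timedelta column
duration = stays['Left'] - stays['Arrived']

# Express the duration in hours rather than whole days
hours = duration.dt.total_seconds() / 3600

print(duration.dt.days)
print(hours)
```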

## 7.6 Encoding Days of the Week

**Problem**

You have a vector of dates and want to know the day of the week for each date.

**Solution**

Use the weekday property of pandas' Series.dt accessor, which returns the day of the week as an integer (Monday is 0, Sunday is 6):


In [47]:
# Create three month-end dates
date = pd.Series(pd.date_range('2/2/2002', periods=3, freq='M'))

In [54]:
# Show the day of the week
date.dt.weekday

0    3
1    6
2    1
dtype: int64
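
If day names are wanted rather than integers, `Series.dt.day_name()` returns them as strings; it replaces the older `weekday_name` property, which was removed in pandas 1.0. A minimal sketch, writing out the same three month-end dates explicitly:

```python
import pandas as pd

# The same three month-end dates, written out explicitly
dates = pd.Series(pd.to_datetime(['2002-02-28', '2002-03-31', '2002-04-30']))

# Day of the week as an integer (Monday is 0) and as a name
weekday_numbers = dates.dt.weekday
weekday_names = dates.dt.day_name()

print(weekday_names)
```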

## 7.7 Creating a Lagged Feature

**Problem**

You want to create a feature that is lagged by n time periods.

**Solution**

Use pandas' shift:


In [55]:
# Create data frame
df = pd.DataFrame()

In [56]:
# Create data
df['date'] = pd.date_range('1/1/2001', periods=5, freq='D')
df["stock_price"] = [1.1, 2.2, 3.3, 4.4, 5.5]

df

        date  stock_price
0 2001-01-01          1.1
1 2001-01-02          2.2
2 2001-01-03          3.3
3 2001-01-04          4.4
4 2001-01-05          5.5


In [57]:
# Lag values by one row
df['previous_days_stock_price'] = df['stock_price'].shift(1)

In [58]:
# Show data frame
df

        date  stock_price  previous_days_stock_price
0 2001-01-01          1.1                        NaN
1 2001-01-02          2.2                        1.1
2 2001-01-03          3.3                        2.2
3 2001-01-04          4.4                        3.3
4 2001-01-05          5.5                        4.4
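
The first row is missing because there is no earlier value to shift into it; lagging by n rows leaves n missing values at the top. The sketch below lags by two rows and then drops the incomplete rows, which is one possible way to handle them and not the only one (the `prices` frame and `price_lag_2` column are illustrative names):

```python
import pandas as pd

# Same shape of data as above, rebuilt so the example is self-contained
prices = pd.DataFrame({
    'date': pd.date_range('2001-01-01', periods=5, freq='D'),
    'stock_price': [1.1, 2.2, 3.3, 4.4, 5.5],
})

# Lag by two rows: the first two values have no predecessor and become NaN
prices['price_lag_2'] = prices['stock_price'].shift(2)

# Keep only the rows where the lagged feature is available
complete = prices.dropna()

print(prices)
print(complete)
```

Note that `shift` moves values by row position, so the lag corresponds to calendar days only when the rows are evenly spaced and sorted by date.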
