**COURSE**: *Machine Learning* in Geosciences<br />
**Professor**: Edier Aristizábal (evaristizabalg@unal.edu.co) <br />
**Credits**: The content of this notebook is taken from several sources: Soner Yıldırım, Bex T, Manuel Hupperich, Youssef Hosni and Piero Paialunga at www.towardsdatascience.com. Every effort has been made to trace copyright holders of the materials used in this book. The author apologizes for any unintentional omissions and would be pleased to add an acknowledgment in future editions.

## Working with Time Series in Pandas

The basic building block for time series data in pandas is the timestamp object (`pd.Timestamp`), which the examples below demonstrate. First, import the libraries:

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns

**Timestamped** data is the most basic type of time series data: it associates values with points in time. In pandas, each point in time is represented by a `pd.Timestamp`, and a sequence of them forms a `DatetimeIndex`.

In [None]:
from datetime import datetime # To manually create dates

In [None]:
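# Three ways to construct a Timestamp
time1 = pd.Timestamp(datetime(2017, 1, 1))  # from a datetime object
time2 = pd.Timestamp(2017, 1, 1)            # from year, month, day
time3 = pd.Timestamp("2012-05-01")          # from a date string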
print(time2)
print(type(time2))
print(time2.year)

2017-01-01 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
2017


In [None]:
day = pd.Timestamp("1976-11-13")
day.day_name()

'Saturday'

In [None]:
# Adding a Timedelta to a Timestamp shifts it forward in time
dayplus = day + pd.Timedelta("2 day")
dayplus

Timestamp('1976-11-15 00:00:00')
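The arithmetic also works in the other direction: subtracting one timestamp from another yields a `pd.Timedelta`. A minimal sketch, where the names `start`, `end` and `elapsed` are illustrative and not part of the original notebook:

```python
import pandas as pd

start = pd.Timestamp("1976-11-13")
end = start + pd.Timedelta("2 day")

# Timestamp minus Timestamp gives a Timedelta
elapsed = end - start

print(elapsed)         # 2 days 00:00:00
print(elapsed.days)    # 2
print(end.day_name())  # Monday
```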

In [None]:
pd.to_datetime("2010/11/12")

Timestamp('2010-11-12 00:00:00')

In [None]:
pd.to_datetime([1349720105, 1349806505, 1349892905, 1349979305, 1350065705], unit="s", origin="1950-01-01")

DatetimeIndex(['1992-10-08 18:15:05', '1992-10-09 18:15:05',
               '1992-10-10 18:15:05', '1992-10-11 18:15:05',
               '1992-10-12 18:15:05'],
              dtype='datetime64[ns]', freq=None)
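In the cell above, `unit="s"` says the integers are seconds and `origin` says where counting starts. The following sketch checks this by hand for the first value; the variable names are illustrative, and the comparison with the default Unix epoch is added here only for contrast:

```python
import pandas as pd

seconds = 1349720105  # first value from the cell above

# Counting starts at 1950-01-01 instead of the Unix epoch
shifted = pd.to_datetime(seconds, unit="s", origin="1950-01-01")

# The same result built manually: origin + elapsed seconds
manual = pd.Timestamp("1950-01-01") + pd.Timedelta(seconds=seconds)

# With the default origin the count starts at 1970-01-01
unix = pd.to_datetime(seconds, unit="s")

print(shifted)            # 1992-10-08 18:15:05
print(shifted == manual)  # True
print(unix)               # 2012-10-08 18:15:05
```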

In [None]:
pd.to_datetime("12-2010-11 00:00", format="%d-%Y-%m %H:%M")

Timestamp('2010-11-12 00:00:00')
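The same format codes work in reverse through `strftime`. As a small sketch (the names `fmt`, `ts` and `text` are illustrative), parsing and then formatting with the same pattern returns the original string:

```python
import pandas as pd

fmt = "%d-%Y-%m %H:%M"  # day-year-month layout, as in the cell above

ts = pd.to_datetime("12-2010-11 00:00", format=fmt)
text = ts.strftime(fmt)  # format the Timestamp back into that layout

print(ts)    # 2010-11-12 00:00:00
print(text)  # 12-2010-11 00:00
```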

In [None]:
pd.Timestamp("2010/11/12")

Timestamp('2010-11-12 00:00:00')

In [None]:
# Raises TypeError: DatetimeIndex needs a list-like collection, not a single string
pd.DatetimeIndex("2010/11/12")

In [None]:
pd.DatetimeIndex(["2018-01-01"])

DatetimeIndex(['2018-01-01'], dtype='datetime64[ns]', freq=None)

In [None]:
pd.DatetimeIndex(["2018-01-01", "2018-01-03", "2018-01-05"], freq="infer")

DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'], dtype='datetime64[ns]', freq='2D')
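Frequency inference only succeeds when the dates are evenly spaced. The sketch below uses `pd.infer_freq` to show both outcomes; the `irregular` index is a made-up counterexample, not data from the notebook:

```python
import pandas as pd

regular = pd.DatetimeIndex(["2018-01-01", "2018-01-03", "2018-01-05"])
irregular = pd.DatetimeIndex(["2018-01-01", "2018-01-03", "2018-01-06"])

# Evenly spaced dates give a frequency string
print(pd.infer_freq(regular))    # 2D

# Uneven spacing cannot be described by one frequency
print(pd.infer_freq(irregular))  # None
```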

In [None]:
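# to_datetime accepts mixed input types and date formats.
# Note: on pandas 2.0 or later this call may also need format="mixed".
dti = pd.to_datetime(["1/1/2018", np.datetime64("2018-01-01"), datetime(2018, 1, 1), "2005/11/23", "2010.12.31", "Jul 31, 2009", "2010-01-10", None])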
dti

DatetimeIndex(['2018-01-01', '2018-01-01', '2018-01-01', '2005-11-23',
               '2010-12-31', '2009-07-31', '2010-01-10',        'NaT'],
              dtype='datetime64[ns]', freq=None)

In [None]:
dti = dti.tz_localize("UTC")
dti

DatetimeIndex(['2018-01-01 00:00:00+00:00', '2018-01-01 00:00:00+00:00',
               '2018-01-01 00:00:00+00:00', '2005-11-23 00:00:00+00:00',
               '2010-12-31 00:00:00+00:00', '2009-07-31 00:00:00+00:00',
               '2010-01-10 00:00:00+00:00',                       'NaT'],
              dtype='datetime64[ns, UTC]', freq=None)

In [None]:
dti.tz_convert("US/Pacific")

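DatetimeIndex(['2017-12-31 16:00:00-08:00', '2017-12-31 16:00:00-08:00',
               '2017-12-31 16:00:00-08:00', '2005-11-22 16:00:00-08:00',
               '2010-12-30 16:00:00-08:00', '2009-07-30 17:00:00-07:00',
               '2010-01-09 16:00:00-08:00',                       'NaT'],
              dtype='datetime64[ns, US/Pacific]', freq=None)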
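The two calls above do different things: `tz_localize` attaches a time zone to naive timestamps without changing the clock time, while `tz_convert` keeps the same instant and changes the clock time. A minimal sketch on a single timestamp, with illustrative variable names:

```python
import pandas as pd

naive = pd.Timestamp("2018-01-01 00:00")

# tz_localize: attach a zone, the clock time stays the same
utc = naive.tz_localize("UTC")

# tz_convert: same instant, expressed in another zone
pacific = utc.tz_convert("US/Pacific")

print(utc)             # 2018-01-01 00:00:00+00:00
print(pacific)         # 2017-12-31 16:00:00-08:00
print(utc == pacific)  # True
```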

In [None]:
# Month-end frequency; pandas 2.2+ deprecates the alias 'M' in favor of 'ME'
index = pd.date_range(start='2017-1-1', periods=12, freq='M')
index

DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
               '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
               '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31'],
              dtype='datetime64[ns]', freq='M')

In [None]:
serie = pd.Series(pd.date_range(start='2017-1-1', periods=12, freq='M'))
serie

0    2017-01-31
1    2017-02-28
2    2017-03-31
3    2017-04-30
4    2017-05-31
5    2017-06-30
6    2017-07-31
7    2017-08-31
8    2017-09-30
9    2017-10-31
10   2017-11-30
11   2017-12-31
dtype: datetime64[ns]

In [None]:
serie = pd.Series(np.random.randn(12), pd.date_range(start='2017-1-1', periods=12, freq='M'))
serie

2017-01-31   -0.407790
2017-02-28   -0.865237
2017-03-31    0.160302
2017-04-30    1.309122
2017-05-31    0.216034
2017-06-30    0.193793
2017-07-31   -1.203934
2017-08-31   -0.971580
2017-09-30    0.974273
2017-10-31   -1.230733
2017-11-30    0.634818
2017-12-31   -1.296220
Freq: M, dtype: float64
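Once a Series has a `DatetimeIndex`, rows can be selected with partial date strings. The sketch below is an illustration under assumptions: it uses the month-start alias `"MS"` and deterministic values from `np.arange` instead of the random month-end series above, so the printed result is reproducible.

```python
import numpy as np
import pandas as pd

# Assumed example data: one value per month start in 2017
idx = pd.date_range(start="2017-01-01", periods=12, freq="MS")
s = pd.Series(np.arange(12.0), index=idx)

# A partial string selects every entry inside that month
march = s.loc["2017-03"]

# String slices include both endpoints
q2 = s.loc["2017-04":"2017-06"]

print(march)
print(q2)
```

Here `march` is a one-row Series rather than a scalar, because the string is coarser than the daily resolution of the index.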

In [3]:
df = pd.DataFrame({'name':['john','mary','peter','jeff','bill'], 'date_of_birth':['2000-01-01', '1999-12-20', '2000-11-01', '1995-02-25', '1992-06-30']})
df

    name date_of_birth
0   john    2000-01-01
1   mary    1999-12-20
2  peter    2000-11-01
3   jeff    1995-02-25
4   bill    1992-06-30


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   name           5 non-null      object
 1   date_of_birth  5 non-null      object
dtypes: object(2)
memory usage: 208.0+ bytes


In [5]:
print(df.index)

RangeIndex(start=0, stop=5, step=1)


In [6]:
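# Parse the string column into datetimes and use the result as the index
datetime_series = pd.to_datetime(df['date_of_birth'])
datetime_index = pd.DatetimeIndex(datetime_series.values)
df2 = df.set_index(datetime_index)
# The dates now live in the index, so drop the redundant column
df2.drop('date_of_birth', axis=1, inplace=True)
df2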

             name
2000-01-01   john
1999-12-20   mary
2000-11-01  peter
1995-02-25   jeff
1992-06-30   bill


In [7]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 5 entries, 2000-01-01 to 1992-06-30
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    5 non-null      object
dtypes: object(1)
memory usage: 80.0+ bytes


In [None]:
print(df2.index)

DatetimeIndex(['2000-01-01', '1999-12-20', '2000-11-01', '1995-02-25',
               '1992-06-30'],
              dtype='datetime64[ns]', freq=None)


In [None]:
df2.sort_index(inplace=True)
df2

             name
1992-06-30   bill
1995-02-25   jeff
1999-12-20   mary
2000-01-01   john
2000-11-01  peter


In [None]:
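# Reindex to daily frequency; dates with no record are filled with NaN
df2.asfreq('D')

             name
1992-06-30   bill
1992-07-01    NaN
1992-07-02    NaN
1992-07-03    NaN
1992-07-04    NaN
...           ...
2000-10-28    NaN
2000-10-29    NaN
2000-10-30    NaN
2000-10-31    NaN
2000-11-01  peter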
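The rows that `asfreq` creates are empty by default, but its `method` parameter can fill them. A short sketch on a two-row frame (the `events` data is a reduced, illustrative version of the table above) compares the default with forward filling:

```python
import pandas as pd

# Assumed toy data: two records three days apart
events = pd.DataFrame(
    {"name": ["bill", "jeff"]},
    index=pd.DatetimeIndex(["1992-06-30", "1992-07-03"]),
)

daily = events.asfreq("D")                   # new rows hold NaN
filled = events.asfreq("D", method="ffill")  # carry the last value forward

print(daily)
print(filled)
```

Forward filling suits values that stay valid until the next observation; for measurements, leaving the gaps as missing is often the safer choice.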


In [None]:
df = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5], "hour": [2, 3]})
pd.to_datetime(df)

0   2015-02-04 02:00:00
1   2016-03-05 03:00:00
dtype: datetime64[ns]

In [None]:
pd.to_datetime(df[["year", "month", "day"]])

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

In [None]:
# errors='raise' (the default) raises a ValueError because 'asd' cannot be parsed
pd.to_datetime(['2009/07/31', 'asd'], errors='raise')

In [None]:
# errors="ignore" returns the input unchanged; it is deprecated since pandas 2.2
pd.to_datetime(["2009/07/31", "asd"], errors="ignore")

Index(['2009/07/31', 'asd'], dtype='object')

In [None]:
pd.to_datetime(["2009/07/31", "asd"], errors="coerce")

DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None)
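In practice, `errors="coerce"` is handy for locating bad entries, and a `try`/`except` block covers the cases where a failure should stop processing. The following is a sketch under assumptions: the names `raw`, `parsed` and `bad` are illustrative, and the input is the same two-element example wrapped in a Series.

```python
import pandas as pd

raw = pd.Series(["2009/07/31", "asd"])

# Unparseable entries become NaT instead of raising
parsed = pd.to_datetime(raw, errors="coerce")

# Use the NaT positions to list the offending inputs
bad = raw[parsed.isna()]
print(bad.tolist())  # ['asd']

# With the default errors="raise", catch the exception explicitly
try:
    pd.to_datetime(raw)
except ValueError as exc:
    print("parse failed:", type(exc).__name__)
```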