# Pandas: Time Series

In [1]:
import numpy as np
import pandas as pd

In [2]:
from datetime import datetime

## Date ranges

Ten days starting on January 1:

In [3]:
pd.date_range('2019-01-01', periods=10, freq='D')

DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
               '2019-01-09', '2019-01-10'],
              dtype='datetime64[ns]', freq='D')

Every day from January 1 through January 15:

In [4]:
pd.date_range('2019-01-01', '2019-01-15', freq='D')

DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
               '2019-01-09', '2019-01-10', '2019-01-11', '2019-01-12',
               '2019-01-13', '2019-01-14', '2019-01-15'],
              dtype='datetime64[ns]', freq='D')

Fifteen evenly spaced dates between January 1 and December 31:

In [5]:
pd.date_range('2019-01-01', '2019-12-31', periods=15)

DatetimeIndex(['2019-01-01', '2019-01-27', '2019-02-22', '2019-03-20',
               '2019-04-15', '2019-05-11', '2019-06-06', '2019-07-02',
               '2019-07-28', '2019-08-23', '2019-09-18', '2019-10-14',
               '2019-11-09', '2019-12-05', '2019-12-31'],
              dtype='datetime64[ns]', freq=None)
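
When `start`, `end` and `periods` are all given, the points are spread linearly over the interval, which is why the result has `freq=None`. The sketch below rebuilds the same dates by hand to make the spacing explicit; the variable names are illustrative only.

```python
import pandas as pd

start = pd.Timestamp('2019-01-01')
end = pd.Timestamp('2019-12-31')
periods = 15

# Step between consecutive points: total span divided by the number of gaps.
step = (end - start) / (periods - 1)

# Rebuild the points by hand and compare them with date_range.
manual = [start + i * step for i in range(periods)]
spaced = pd.date_range(start, end, periods=periods)

print(step == pd.Timedelta(days=26))   # True: 364 days / 14 gaps
print(manual == list(spaced))          # True
```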

Ten month-end dates starting from January 1:

In [6]:
pd.date_range('2019-01-01', periods=10, freq='M')

DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
               '2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31',
               '2019-09-30', '2019-10-31'],
              dtype='datetime64[ns]', freq='M')

Ten days ending on January 1. Note that the end date is included in the result.

In [7]:
pd.date_range(end='2019-01-01', periods=10, freq='D')

DatetimeIndex(['2018-12-23', '2018-12-24', '2018-12-25', '2018-12-26',
               '2018-12-27', '2018-12-28', '2018-12-29', '2018-12-30',
               '2018-12-31', '2019-01-01'],
              dtype='datetime64[ns]', freq='D')

Ten months ending on January 1. Note that here the end date is not included in the result: the `'M'` frequency means month end, and January 1 is not a month end!

In [8]:
pd.date_range(end='2019-01-01', periods=10, freq='M')

DatetimeIndex(['2018-03-31', '2018-04-30', '2018-05-31', '2018-06-30',
               '2018-07-31', '2018-08-31', '2018-09-30', '2018-10-31',
               '2018-11-30', '2018-12-31'],
              dtype='datetime64[ns]', freq='M')

Now using the last day of the month as the end date:

In [9]:
pd.date_range(end='2019-01-31', periods=10, freq='M')

DatetimeIndex(['2018-04-30', '2018-05-31', '2018-06-30', '2018-07-31',
               '2018-08-31', '2018-09-30', '2018-10-31', '2018-11-30',
               '2018-12-31', '2019-01-31'],
              dtype='datetime64[ns]', freq='M')
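
A minimal sketch of why the end date matters, using the `pd.offsets.MonthEnd` offset object (the object behind the month-end frequency) rather than the string alias, since alias spellings vary between pandas versions. A date that is not on the offset is simply not part of the grid, and `rollback`/`rollforward` show its nearest neighbors.

```python
import pandas as pd

month_end = pd.offsets.MonthEnd()
ts = pd.Timestamp('2019-01-01')

# January 1 is not a month end, so a month-end range can never contain it.
on_grid = month_end.is_on_offset(ts)
previous_end = month_end.rollback(ts)      # nearest month end at or before ts
next_end = month_end.rollforward(ts)       # nearest month end at or after ts

print(on_grid)        # False
print(previous_end)   # 2018-12-31 00:00:00
print(next_end)       # 2019-01-31 00:00:00
```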

Ten months, always on the first day of the month:

In [10]:
pd.date_range('2019-01-01', periods=10, freq='MS')

DatetimeIndex(['2019-01-01', '2019-02-01', '2019-03-01', '2019-04-01',
               '2019-05-01', '2019-06-01', '2019-07-01', '2019-08-01',
               '2019-09-01', '2019-10-01'],
              dtype='datetime64[ns]', freq='MS')

Month ends spaced three months apart (a multiple of the monthly frequency):

In [11]:
pd.date_range('2019-01-01', periods=10, freq='3M')

DatetimeIndex(['2019-01-31', '2019-04-30', '2019-07-31', '2019-10-31',
               '2020-01-31', '2020-04-30', '2020-07-31', '2020-10-31',
               '2021-01-31', '2021-04-30'],
              dtype='datetime64[ns]', freq='3M')

Calendar quarters (the `'Q'` frequency):

In [12]:
pd.date_range('2019-01-01', periods=10, freq='Q')

DatetimeIndex(['2019-03-31', '2019-06-30', '2019-09-30', '2019-12-31',
               '2020-03-31', '2020-06-30', '2020-09-30', '2020-12-31',
               '2021-03-31', '2021-06-30'],
              dtype='datetime64[ns]', freq='Q-DEC')

Note that the frequency of the returned result is `'Q-DEC'`, which indicates quarters of a year ending in December.

Ten weeks, starting on January 1 (the default weekly anchor is Sunday, so each date is a Sunday):
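
The anchor month can be changed. The sketch below uses the `pd.offsets.QuarterEnd` offset object with `startingMonth=11`, which should be equivalent to a November-anchored quarterly alias; the offset object is used here only to avoid depending on a particular alias spelling.

```python
import pandas as pd

# Quarter ends for a year that closes in November: Feb, May, Aug, Nov.
quarter_nov = pd.offsets.QuarterEnd(startingMonth=11)
quarters = pd.date_range('2019-01-01', periods=4, freq=quarter_nov)

print(quarters.strftime('%Y-%m-%d').tolist())
# ['2019-02-28', '2019-05-31', '2019-08-31', '2019-11-30']
```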

In [13]:
pd.date_range('2019-01-01', periods=10, freq='W')

DatetimeIndex(['2019-01-06', '2019-01-13', '2019-01-20', '2019-01-27',
               '2019-02-03', '2019-02-10', '2019-02-17', '2019-02-24',
               '2019-03-03', '2019-03-10'],
              dtype='datetime64[ns]', freq='W-SUN')
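
The weekly anchor can be changed with a suffix. A small sketch, assuming we want Mondays instead of the default Sundays:

```python
import pandas as pd

# 'W-MON' anchors the weekly frequency on Mondays.
mondays = pd.date_range('2019-01-01', periods=3, freq='W-MON')

print(mondays.strftime('%Y-%m-%d').tolist())
# ['2019-01-07', '2019-01-14', '2019-01-21']
print(set(mondays.dayofweek))   # {0}, where 0 means Monday
```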

## Business days

Business days are supported too: just use `pd.bdate_range`:

In [14]:
pd.bdate_range('2019-01-01', periods=30)

DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
               '2019-01-11', '2019-01-14', '2019-01-15', '2019-01-16',
               '2019-01-17', '2019-01-18', '2019-01-21', '2019-01-22',
               '2019-01-23', '2019-01-24', '2019-01-25', '2019-01-28',
               '2019-01-29', '2019-01-30', '2019-01-31', '2019-02-01',
               '2019-02-04', '2019-02-05', '2019-02-06', '2019-02-07',
               '2019-02-08', '2019-02-11'],
              dtype='datetime64[ns]', freq='B')

#### But what about holidays?

We start by defining a list with the holidays in São Paulo (January 2019 only):

In [15]:
feriados_sp = [datetime(2019, 1, 1),
               datetime(2019, 1, 25)]

And now we build our list of business days in São Paulo:

In [16]:
pd.bdate_range('2019-01-01', periods=30,
               freq='C',
               holidays=feriados_sp)

DatetimeIndex(['2019-01-02', '2019-01-03', '2019-01-04', '2019-01-07',
               '2019-01-08', '2019-01-09', '2019-01-10', '2019-01-11',
               '2019-01-14', '2019-01-15', '2019-01-16', '2019-01-17',
               '2019-01-18', '2019-01-21', '2019-01-22', '2019-01-23',
               '2019-01-24', '2019-01-28', '2019-01-29', '2019-01-30',
               '2019-01-31', '2019-02-01', '2019-02-04', '2019-02-05',
               '2019-02-06', '2019-02-07', '2019-02-08', '2019-02-11',
               '2019-02-12', '2019-02-13'],
              dtype='datetime64[ns]', freq='C')

It is also possible to generate a list restricted to specific days of the week:

In [17]:
dias_semana = [False,        # Monday: skip
               True,         # Tuesday: include in the results
               True,         # Wednesday: include
               False,        # Thursday: skip
               False,        # Friday: skip
               True,         # Saturday: include
               False]        # Sunday: skip
pd.bdate_range('2019-01-01', periods=30,
               freq='C',
               holidays=feriados_sp,
               weekmask=dias_semana)

DatetimeIndex(['2019-01-02', '2019-01-05', '2019-01-08', '2019-01-09',
               '2019-01-12', '2019-01-15', '2019-01-16', '2019-01-19',
               '2019-01-22', '2019-01-23', '2019-01-26', '2019-01-29',
               '2019-01-30', '2019-02-02', '2019-02-05', '2019-02-06',
               '2019-02-09', '2019-02-12', '2019-02-13', '2019-02-16',
               '2019-02-19', '2019-02-20', '2019-02-23', '2019-02-26',
               '2019-02-27', '2019-03-02', '2019-03-05', '2019-03-06',
               '2019-03-09', '2019-03-12'],
              dtype='datetime64[ns]', freq='C')
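
The `weekmask` argument also accepts a string of three-letter English day abbreviations, which can be easier to read than a list of booleans. A short sketch using the same holidays as above:

```python
from datetime import datetime
import pandas as pd

feriados_sp = [datetime(2019, 1, 1), datetime(2019, 1, 25)]

# Same mask as the boolean list: only Tuesdays, Wednesdays and Saturdays.
dias = pd.bdate_range('2019-01-01', periods=5,
                      freq='C',
                      weekmask='Tue Wed Sat',
                      holidays=feriados_sp)

print(dias.strftime('%Y-%m-%d').tolist())
# ['2019-01-02', '2019-01-05', '2019-01-08', '2019-01-09', '2019-01-12']
```

January 1 is a Tuesday, but it is skipped because it is in the holiday list.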

Last business day of each month (the `'BM'` alias takes no holiday list; for that, `pd.offsets.CustomBusinessMonthEnd` accepts a `holidays` argument):

In [18]:
pd.date_range('2019-01-01', periods=10, freq='BM')

DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-29', '2019-04-30',
               '2019-05-31', '2019-06-28', '2019-07-31', '2019-08-30',
               '2019-09-30', '2019-10-31'],
              dtype='datetime64[ns]', freq='BM')
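
A sketch of the holiday-aware variant, using `pd.offsets.CustomBusinessMonthEnd`. The holiday on January 31 is hypothetical, chosen only so that the effect is visible in the output.

```python
from datetime import datetime
import pandas as pd

# Hypothetical holiday: January 31, 2019 is used here for illustration only.
feriados_exemplo = [datetime(2019, 1, 31)]

fim_mes_util = pd.offsets.CustomBusinessMonthEnd(holidays=feriados_exemplo)
ultimos = pd.date_range('2019-01-01', periods=3, freq=fim_mes_util)

print(ultimos.strftime('%Y-%m-%d').tolist())
# ['2019-01-30', '2019-02-28', '2019-03-29']
```

With the holiday in place, the last business day of January moves back from the 31st to the 30th.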

# Calendars for business-day arithmetic

The first step is to define a custom business-day __calendar__:

In [19]:
feriados = [datetime(2019, 1, 1), datetime(2019, 1, 25)]
calendario_sp = pd.offsets.CustomBusinessDay(holidays=feriados)

Done: now we just do the arithmetic! Adding one business day to December 31, 2018 skips the January 1 holiday:

In [20]:
dt = datetime(2018, 12, 31)
dt + 1 * calendario_sp

Timestamp('2019-01-02 00:00:00')

To add two business days:

In [21]:
dt + 2 * calendario_sp

Timestamp('2019-01-03 00:00:00')

Or five:

In [22]:
dt + 5 * calendario_sp

Timestamp('2019-01-08 00:00:00')

Don't believe it? Let's check!

In [23]:
for dias in range(0, 22):
    print(dt.strftime('%Y-%m-%d'), '+', dias, '=', 
          (dt + dias * calendario_sp).strftime('%Y-%m-%d %a'))

2018-12-31 + 0 = 2018-12-31 Mon
2018-12-31 + 1 = 2019-01-02 Wed
2018-12-31 + 2 = 2019-01-03 Thu
2018-12-31 + 3 = 2019-01-04 Fri
2018-12-31 + 4 = 2019-01-07 Mon
2018-12-31 + 5 = 2019-01-08 Tue
2018-12-31 + 6 = 2019-01-09 Wed
2018-12-31 + 7 = 2019-01-10 Thu
2018-12-31 + 8 = 2019-01-11 Fri
2018-12-31 + 9 = 2019-01-14 Mon
2018-12-31 + 10 = 2019-01-15 Tue
2018-12-31 + 11 = 2019-01-16 Wed
2018-12-31 + 12 = 2019-01-17 Thu
2018-12-31 + 13 = 2019-01-18 Fri
2018-12-31 + 14 = 2019-01-21 Mon
2018-12-31 + 15 = 2019-01-22 Tue
2018-12-31 + 16 = 2019-01-23 Wed
2018-12-31 + 17 = 2019-01-24 Thu
2018-12-31 + 18 = 2019-01-28 Mon
2018-12-31 + 19 = 2019-01-29 Tue
2018-12-31 + 20 = 2019-01-30 Wed
2018-12-31 + 21 = 2019-01-31 Thu


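We can also generate the business days without an explicit _loop_ (this range starts on January 1, so unlike the loop above it does not include 2018-12-31):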

In [24]:
pd.date_range('2019-01-01', periods=22, 
              freq=calendario_sp)

DatetimeIndex(['2019-01-02', '2019-01-03', '2019-01-04', '2019-01-07',
               '2019-01-08', '2019-01-09', '2019-01-10', '2019-01-11',
               '2019-01-14', '2019-01-15', '2019-01-16', '2019-01-17',
               '2019-01-18', '2019-01-21', '2019-01-22', '2019-01-23',
               '2019-01-24', '2019-01-28', '2019-01-29', '2019-01-30',
               '2019-01-31', '2019-02-01'],
              dtype='datetime64[ns]', freq='C')

To _"adjust"_ a date, if necessary, to the nearest previous or next business day (a date that is already a business day is returned unchanged):

In [25]:
feriado = datetime(2019, 1, 25)
dia_util = datetime(2019, 1, 24)

In [26]:
calendario_sp.rollback(feriado)

Timestamp('2019-01-24 00:00:00')

In [27]:
calendario_sp.rollback(dia_util)

Timestamp('2019-01-24 00:00:00')

In [28]:
calendario_sp.rollforward(feriado)

Timestamp('2019-01-28 00:00:00')

In [29]:
calendario_sp.rollforward(dia_util)

Timestamp('2019-01-24 00:00:00')
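
To test whether a date needs adjusting at all, the offset's `is_on_offset` method can be used. A minimal sketch, rebuilding the same calendar so the block runs on its own:

```python
from datetime import datetime
import pandas as pd

feriados = [datetime(2019, 1, 1), datetime(2019, 1, 25)]
calendario_sp = pd.offsets.CustomBusinessDay(holidays=feriados)

# True only for dates that are business days in this calendar.
eh_util_feriado = calendario_sp.is_on_offset(pd.Timestamp('2019-01-25'))  # holiday
eh_util_quinta = calendario_sp.is_on_offset(pd.Timestamp('2019-01-24'))   # Thursday
eh_util_sabado = calendario_sp.is_on_offset(pd.Timestamp('2019-01-26'))   # Saturday

print(eh_util_feriado, eh_util_quinta, eh_util_sabado)   # False True False
```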

#### And how do we count the business days between two dates?

In this case, we need to generate the list of business days and then count them:

In [30]:
d1 = datetime(2019, 1, 2)
d2 = datetime(2019, 2, 1)
todos_dias_uteis = pd.bdate_range(d1, d2, freq=calendario_sp)

In [31]:
todos_dias_uteis

DatetimeIndex(['2019-01-02', '2019-01-03', '2019-01-04', '2019-01-07',
               '2019-01-08', '2019-01-09', '2019-01-10', '2019-01-11',
               '2019-01-14', '2019-01-15', '2019-01-16', '2019-01-17',
               '2019-01-18', '2019-01-21', '2019-01-22', '2019-01-23',
               '2019-01-24', '2019-01-28', '2019-01-29', '2019-01-30',
               '2019-01-31', '2019-02-01'],
              dtype='datetime64[ns]', freq='C')

In [32]:
len(todos_dias_uteis)

22
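
As a cross-check, NumPy offers `np.busday_count`, which counts business days directly without materializing the list. This is an alternative technique, not what the cells above use. Note that its end date is exclusive, so the sketch passes February 2 to include February 1.

```python
import numpy as np

# The end date is exclusive: pass the day after the last date to be counted.
total = np.busday_count('2019-01-02', '2019-02-02',
                        holidays=['2019-01-01', '2019-01-25'])

print(total)   # 22
```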

## Periods

Periods corresponding to the 10 weeks starting on January 1:

In [33]:
pd.date_range('2019-01-01', periods=10, freq='W-SAT').to_period()

PeriodIndex(['2018-12-30/2019-01-05', '2019-01-06/2019-01-12',
             '2019-01-13/2019-01-19', '2019-01-20/2019-01-26',
             '2019-01-27/2019-02-02', '2019-02-03/2019-02-09',
             '2019-02-10/2019-02-16', '2019-02-17/2019-02-23',
             '2019-02-24/2019-03-02', '2019-03-03/2019-03-09'],
            dtype='period[W-SAT]', freq='W-SAT')

(In this case, I chose weekly periods ending on Saturday, `W-SAT`, so that each week runs from Sunday to Saturday.)

Quarters of a year ending in November:
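
A single `pd.Period` makes the week boundaries easy to inspect through its `start_time` and `end_time` attributes. A small sketch for the week containing January 1:

```python
import pandas as pd

# The W-SAT week that contains January 1, 2019.
semana = pd.Period('2019-01-01', freq='W-SAT')

print(semana.start_time.date())   # 2018-12-30 (a Sunday)
print(semana.end_time.date())     # 2019-01-05 (a Saturday)
```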

In [34]:
pd.period_range('2019Q1', freq='Q-NOV', periods=10)

PeriodIndex(['2019Q1', '2019Q2', '2019Q3', '2019Q4', '2020Q1', '2020Q2',
             '2020Q3', '2020Q4', '2021Q1', '2021Q2'],
            dtype='period[Q-NOV]', freq='Q-NOV')

Start dates of those quarters (`to_timestamp` returns the beginning of each period by default):

In [35]:
pd.period_range('2019Q1', freq='Q-NOV', periods=10).to_timestamp()

DatetimeIndex(['2018-12-01', '2019-03-01', '2019-06-01', '2019-09-01',
               '2019-12-01', '2020-03-01', '2020-06-01', '2020-09-01',
               '2020-12-01', '2021-03-01'],
              dtype='datetime64[ns]', freq='QS-DEC')
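
The first quarter starts in December 2018 because, with a year ending in November, the 2019 year runs from December 2018 through November 2019. A short sketch that shows both ends of each quarter through the `start_time` and `end_time` attributes of the `PeriodIndex`:

```python
import pandas as pd

trimestres = pd.period_range('2019Q1', freq='Q-NOV', periods=2)

inicios = [ts.date().isoformat() for ts in trimestres.start_time]
fins = [ts.date().isoformat() for ts in trimestres.end_time]

print(inicios)   # ['2018-12-01', '2019-03-01']
print(fins)      # ['2019-02-28', '2019-05-31']
```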

## Available frequencies


<img src="date_time_frequencies.png" />

## Functions

`shift`

We start with an arbitrary date range (in this case, 10 monthly dates starting on January 1, always on the first day of the month):

In [36]:
pd.date_range('2019-01-01', periods=10, freq='MS')

DatetimeIndex(['2019-01-01', '2019-02-01', '2019-03-01', '2019-04-01',
               '2019-05-01', '2019-06-01', '2019-07-01', '2019-08-01',
               '2019-09-01', '2019-10-01'],
              dtype='datetime64[ns]', freq='MS')

Now we _shift_ these dates two months forward:

In [37]:
pd.date_range('2019-01-01', periods=10, freq='MS').shift(2)

DatetimeIndex(['2019-03-01', '2019-04-01', '2019-05-01', '2019-06-01',
               '2019-07-01', '2019-08-01', '2019-09-01', '2019-10-01',
               '2019-11-01', '2019-12-01'],
              dtype='datetime64[ns]', freq='MS')

Or we shift them by just two days:

In [38]:
pd.date_range('2019-01-01', periods=10, freq='MS').shift(2, freq='D')

DatetimeIndex(['2019-01-03', '2019-02-03', '2019-03-03', '2019-04-03',
               '2019-05-03', '2019-06-03', '2019-07-03', '2019-08-03',
               '2019-09-03', '2019-10-03'],
              dtype='datetime64[ns]', freq=None)

We can also move _"back"_ two days:

In [39]:
pd.date_range('2019-01-01', periods=10, freq='MS').shift(-2, freq='D')

DatetimeIndex(['2018-12-30', '2019-01-30', '2019-02-27', '2019-03-30',
               '2019-04-29', '2019-05-30', '2019-06-29', '2019-07-30',
               '2019-08-30', '2019-09-29'],
              dtype='datetime64[ns]', freq=None)

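Without a `freq` argument, `shift` uses the index's own frequency; here a daily range moves back one day:

In [40]: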
pd.date_range('2019-01-01', freq='D', periods=40).shift(-1)

DatetimeIndex(['2018-12-31', '2019-01-01', '2019-01-02', '2019-01-03',
               '2019-01-04', '2019-01-05', '2019-01-06', '2019-01-07',
               '2019-01-08', '2019-01-09', '2019-01-10', '2019-01-11',
               '2019-01-12', '2019-01-13', '2019-01-14', '2019-01-15',
               '2019-01-16', '2019-01-17', '2019-01-18', '2019-01-19',
               '2019-01-20', '2019-01-21', '2019-01-22', '2019-01-23',
               '2019-01-24', '2019-01-25', '2019-01-26', '2019-01-27',
               '2019-01-28', '2019-01-29', '2019-01-30', '2019-01-31',
               '2019-02-01', '2019-02-02', '2019-02-03', '2019-02-04',
               '2019-02-05', '2019-02-06', '2019-02-07', '2019-02-08'],
              dtype='datetime64[ns]', freq='D')
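
On a `Series` or `DataFrame`, `shift` behaves differently depending on whether `freq` is passed. The sketch below, with made-up values, contrasts the two cases: without `freq` the data slides along a fixed index, while with `freq` the index itself moves.

```python
import pandas as pd

s = pd.Series([10, 20, 30],
              index=pd.date_range('2019-01-01', periods=3, freq='D'))

# No freq: the index stays put and the values slide down one position.
valores_deslocados = s.shift(1)

# With freq: the values stay put and the index moves forward one day.
indice_deslocado = s.shift(1, freq='D')

print(valores_deslocados)   # first value becomes NaN
print(indice_deslocado)     # index now starts on 2019-01-02
```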

# Dates as indexes of Series and DataFrames

In [41]:
valores = np.random.random(1000)
datas = pd.bdate_range('2000-01-01', periods=len(valores), freq='B')

In [42]:
df = pd.DataFrame(valores, 
                  index=datas,
                  columns=['Valor'])

In [43]:
df.head()

| | Valor |
|---|---|
| 2000-01-03 | 0.817091 |
| 2000-01-04 | 0.993833 |
| 2000-01-05 | 0.856333 |
| 2000-01-06 | 0.560181 |
| 2000-01-07 | 0.500848 |


In [44]:
df.loc[datetime(2000, 1, 7)]

Valor    0.500848
Name: 2000-01-07 00:00:00, dtype: float64

In [45]:
df.loc[df.index.year == 2001].head()

| | Valor |
|---|---|
| 2001-01-01 | 0.494556 |
| 2001-01-02 | 0.392503 |
| 2001-01-03 | 0.264153 |
| 2001-01-04 | 0.733385 |
| 2001-01-05 | 0.607034 |


In [46]:
df.loc[datetime(2002, 3, 15):].head()

| | Valor |
|---|---|
| 2002-03-15 | 0.415603 |
| 2002-03-18 | 0.840851 |
| 2002-03-19 | 0.774827 |
| 2002-03-20 | 0.654283 |
| 2002-03-21 | 0.533706 |


In [47]:
df.loc['2000'].tail()

| | Valor |
|---|---|
| 2000-12-25 | 0.144871 |
| 2000-12-26 | 0.452999 |
| 2000-12-27 | 0.809906 |
| 2000-12-28 | 0.220128 |
| 2000-12-29 | 0.679126 |


In [48]:
df.loc['2002-03'].head()

| | Valor |
|---|---|
| 2002-03-01 | 0.468342 |
| 2002-03-04 | 0.862451 |
| 2002-03-05 | 0.652964 |
| 2002-03-06 | 0.202664 |
| 2002-03-07 | 0.42504 |


In [49]:
df.loc['2002-03':'2002-05'].head()

| | Valor |
|---|---|
| 2002-03-01 | 0.468342 |
| 2002-03-04 | 0.862451 |
| 2002-03-05 | 0.652964 |
| 2002-03-06 | 0.202664 |
| 2002-03-07 | 0.42504 |


In [50]:
df.loc['2002-03':'2002-05'].tail()

| | Valor |
|---|---|
| 2002-05-27 | 0.962692 |
| 2002-05-28 | 0.731573 |
| 2002-05-29 | 0.095614 |
| 2002-05-30 | 0.288362 |
| 2002-05-31 | 0.7586 |


In [51]:
df.loc['2002-03':'2002-05-03'].tail()

| | Valor |
|---|---|
| 2002-04-29 | 0.182837 |
| 2002-04-30 | 0.402494 |
| 2002-05-01 | 0.57793 |
| 2002-05-02 | 0.501713 |
| 2002-05-03 | 0.604839 |
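
The selections above rely on partial string indexing: a string such as `'2002-03'` matches every row in that month, and slices built from such strings include both endpoints. A small deterministic sketch (the frame `df_exemplo` and its values are made up for illustration):

```python
import pandas as pd

idx = pd.bdate_range('2002-04-29', '2002-05-10')
df_exemplo = pd.DataFrame({'Valor': range(len(idx))}, index=idx)

# A month string selects every row in that month.
maio = df_exemplo.loc['2002-05']

# Slices are inclusive on both ends: 2002-05-03 is part of the result.
fatia = df_exemplo.loc['2002-04':'2002-05-03']

print(len(maio))          # 8 business days from May 1 to May 10
print(fatia.index[-1])    # 2002-05-03 00:00:00
```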


## Still curious?

http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html