# Time series in Pandas

- Overview of the general concepts of "time" in Pandas
- Timestamp and DatetimeIndex objects and their constructors: Timestamp(), to_datetime() and date_range()
- Period and PeriodIndex objects and their constructors: Period() and period_range()
- Timedelta and TimedeltaIndex objects and their constructors: Timedelta(), to_timedelta() and timedelta_range()
- Methods and attributes common to time series
- Reading time series from a file with the parse_dates parameter
- Slicing time series with the loc[] and iloc[] indexers
- The reindex() and resample() methods
- Exercise!



## Overview of the general concepts of "time" in Pandas

Pandas handles three main concepts related to "time":

1) **Date time**: a specific date and time, similar to datetime.datetime from the standard library's datetime module. Its main objects are:
    - Timestamp
    - DatetimeIndex

2) **Time span**: a span of time, i.e. a period with a start, an end and a frequency. Its main objects are:
    - Period
    - PeriodIndex

3) **Time delta**: an absolute difference in time (a duration). Its main objects are:
    - Timedelta
    - TimedeltaIndex
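
As a quick orientation, the three concepts map onto three scalar types. The following is a minimal sketch; the variable names are illustrative and not part of the original lesson.

```python
import pandas as pd

# Date time: a single instant
instant = pd.Timestamp("2020-01-01 10:00")

# Time span: a whole interval, here the month of January 2020
span = pd.Period("2020-01", freq="M")

# Time delta: a duration, independent of any calendar position
duration = pd.Timedelta(days=1, hours=2)

print(type(instant).__name__, type(span).__name__, type(duration).__name__)

# An instant shifted by a duration is still an instant,
# and it may or may not fall inside a given span.
shifted = instant + duration
print(shifted)
print(span.start_time <= shifted <= span.end_time)
```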



## Timestamp and DatetimeIndex

- A **Timestamp** object represents a single point in time. It is the Pandas equivalent of the *datetime* object from Python's standard *datetime* module, but it is much more flexible (just as a *Series* is much more flexible than a Python *list*).

- A **DatetimeIndex** object is simply a collection of Timestamp objects.
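
The relationship with the standard library can be checked directly. This short sketch (values chosen only for illustration) shows that a Timestamp is usable as a standard datetime, that the elements of a DatetimeIndex are Timestamps, and one example of the extra flexibility, nanosecond resolution.

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2020-05-09 18:44:43")

# Timestamp is a subclass of datetime.datetime, so it can be used
# wherever a standard datetime is expected.
print(isinstance(ts, datetime.datetime))

# Every element of a DatetimeIndex is a Timestamp.
idx = pd.DatetimeIndex(["2020-05-09", "2020-05-10"])
print(type(idx[0]).__name__)

# One extra over the standard datetime: nanosecond resolution.
fine = pd.Timestamp("2020-05-09 18:44:43.000000001")
print(fine.nanosecond)
```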

In [1]:
import pandas as pd

There are three main constructors for Timestamp and DatetimeIndex objects. Let's look at them one at a time:

#### Timestamp() constructor

In [2]:
# pass a string

pd.Timestamp("30/10/1983") # the first argument is the value to convert into a Timestamp object

Timestamp('1983-10-30 00:00:00')

In [3]:
# all of these strings are valid and are converted to the same Timestamp

print(pd.Timestamp("30-10-1983"))

print(pd.Timestamp("30 Oct, 1983"))

print(pd.Timestamp("30-10/1983"))

print(pd.Timestamp("30-10-83"))

1983-10-30 00:00:00
1983-10-30 00:00:00
1983-10-30 00:00:00
1983-10-30 00:00:00


In [4]:
# pass an integer (or a float)

pd.Timestamp(436380629, unit = "ns")

# The "unit" parameter defaults to "ns": the number is read as an offset in nanoseconds
# from midnight (UTC) of January 1, 1970 (the "epoch").

Timestamp('1970-01-01 00:00:00.436380629')

In [5]:
# usually one specifies unit = "s" (seconds since the epoch)

pd.Timestamp(436380629.5, unit = "s")


Timestamp('1983-10-30 16:50:29.500000')

In [6]:
# pass integers meaning year, month, day, hour, minute, second

pd.Timestamp(2020, 5, 9, 18, 44, 43)

Timestamp('2020-05-09 18:44:43')

In [7]:
# or state explicitly what each integer represents

pd.Timestamp(year = 2020, month = 5, day = 9, hour = 18, minute = 44, second= 43)

Timestamp('2020-05-09 18:44:43')

#### to_datetime() constructor
- more flexible than pd.Timestamp()

- accepts many kinds of input to convert into Timestamp objects: not only ints, floats, strings and datetime objects, but also lists, arrays, Series and DataFrames
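
The DataFrame case is the least obvious of these inputs. As a small sketch with made-up data: when a DataFrame has columns named "year", "month" and "day", to_datetime() assembles one timestamp per row.

```python
import pandas as pd

# Hypothetical data: one column per date component.
parts = pd.DataFrame({"year": [1983, 2020], "month": [10, 5], "day": [30, 9]})

# to_datetime() builds one timestamp per row from the
# "year", "month" and "day" columns.
dates = pd.to_datetime(parts)
print(dates)
```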

In [8]:
print(pd.to_datetime("30/10/1983"))

print(pd.to_datetime(1589049883, unit = "s"))

1983-10-30 00:00:00
2020-05-09 18:44:43
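
A caveat about day-first strings such as "30/10/1983": they parse as intended because 30 cannot be a month. When both fields could be a month and no format is given, pandas reads the string month-first by default. The sketch below uses a made-up ambiguous date to show the dayfirst and format parameters of to_datetime().

```python
import pandas as pd

ambiguous = "01/02/2020"

# Default: the first field is taken as the month (January 2nd).
month_first = pd.to_datetime(ambiguous)
print(month_first)

# dayfirst=True: the first field is taken as the day (February 1st).
day_first = pd.to_datetime(ambiguous, dayfirst=True)
print(day_first)

# An explicit format removes the ambiguity altogether.
explicit = pd.to_datetime(ambiguous, format="%d/%m/%Y")
print(explicit)
```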


In [9]:
# pass a list

pd.to_datetime(["30/10/1983", "30/10/1984"])
# the result is a DatetimeIndex because the input is a collection of dates

DatetimeIndex(['1983-10-30', '1984-10-30'], dtype='datetime64[ns]', freq=None)

In [10]:
# passing a list to the pd.Timestamp constructor raises an error

#pd.Timestamp(["30/10/1983", "30/10/1984"])

In [11]:
# create a Series of strings that represent dates (its dtype is still object)

pd.Series(data = ["1981", "30/10/1983", "30/10/1984", "15/02/1999 08:45"])

0                1981
1          30/10/1983
2          30/10/1984
3    15/02/1999 08:45
dtype: object

In [12]:
# pass the Series to to_datetime() to convert it into a datetime Series

pd.to_datetime(pd.Series(data = ["1981", "30/10/1983", "30/10/1984", "15/02/1999 08:45"]))

0   1981-01-01 00:00:00
1   1983-10-30 00:00:00
2   1984-10-30 00:00:00
3   1999-02-15 08:45:00
dtype: datetime64[ns]

In [13]:
# create a Series of strings, not all of which are real dates

date = pd.Series(data = ["1981", "30/10/1983", "30/10/1984", "35/13/1999 08:45"])

In [14]:
# the "errors" parameter controls invalid values: "coerce" turns them into NaT

pd.to_datetime(date, errors = "coerce")

0   1981-01-01
1   1983-10-30
2   1984-10-30
3          NaT
dtype: datetime64[ns]

#### date_range() constructor
- used to build a DatetimeIndex (i.e. a "collection" of Timestamp objects)

- has three key parameters ("start", "end" and "periods"), at least two of which must be set; "freq" defaults to daily ("D") when only two of them are given
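
To make the rule concrete, here is a minimal sketch (dates chosen only for illustration) in which three different pairs of parameters describe the same five days.

```python
import pandas as pd

# start + end: freq defaults to daily
by_start_end = pd.date_range(start="2019-01-01", end="2019-01-05")

# start + periods
by_start_periods = pd.date_range(start="2019-01-01", periods=5, freq="D")

# end + periods: the range is built backwards from "end"
by_end_periods = pd.date_range(end="2019-01-05", periods=5, freq="D")

print(by_start_end.equals(by_start_periods))
print(by_start_end.equals(by_end_periods))
```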

In [15]:
import pandas as pd

# use "start" and "end"
pd.date_range(start = "2019/01/01", end = "2019/12/31", freq = "QS-FEB")

DatetimeIndex(['2019-02-01', '2019-05-01', '2019-08-01', '2019-11-01'], dtype='datetime64[ns]', freq='QS-FEB')

Aliases for the "freq" parameter: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases

In [16]:

# use "start" and "periods"

pd.date_range(start = "2019/01/01", periods = 56, freq = "W-Tue")

DatetimeIndex(['2019-01-01', '2019-01-08', '2019-01-15', '2019-01-22',
               '2019-01-29', '2019-02-05', '2019-02-12', '2019-02-19',
               '2019-02-26', '2019-03-05', '2019-03-12', '2019-03-19',
               '2019-03-26', '2019-04-02', '2019-04-09', '2019-04-16',
               '2019-04-23', '2019-04-30', '2019-05-07', '2019-05-14',
               '2019-05-21', '2019-05-28', '2019-06-04', '2019-06-11',
               '2019-06-18', '2019-06-25', '2019-07-02', '2019-07-09',
               '2019-07-16', '2019-07-23', '2019-07-30', '2019-08-06',
               '2019-08-13', '2019-08-20', '2019-08-27', '2019-09-03',
               '2019-09-10', '2019-09-17', '2019-09-24', '2019-10-01',
               '2019-10-08', '2019-10-15', '2019-10-22', '2019-10-29',
               '2019-11-05', '2019-11-12', '2019-11-19', '2019-11-26',
               '2019-12-03', '2019-12-10', '2019-12-17', '2019-12-24',
               '2019-12-31', '2020-01-07', '2020-01-14', '2020-01-21'],
     

In [17]:

# use "end" and "periods"

pd.date_range(end = "2020/01/01", periods = 12, freq = "N")

DatetimeIndex(['2019-12-31 23:59:59.999999989',
               '2019-12-31 23:59:59.999999990',
               '2019-12-31 23:59:59.999999991',
               '2019-12-31 23:59:59.999999992',
               '2019-12-31 23:59:59.999999993',
               '2019-12-31 23:59:59.999999994',
               '2019-12-31 23:59:59.999999995',
               '2019-12-31 23:59:59.999999996',
               '2019-12-31 23:59:59.999999997',
               '2019-12-31 23:59:59.999999998',
               '2019-12-31 23:59:59.999999999',
                         '2020-01-01 00:00:00'],
              dtype='datetime64[ns]', freq='N')

In [18]:

# use all three ("start", "end" and "periods")
# in this case "freq" cannot be set: the points are evenly spaced between start and end

pd.date_range(start = "2019/01/01", end = "2019/12/31", periods = 364)

DatetimeIndex([          '2019-01-01 00:00:00',
               '2019-01-02 00:03:58.016528925',
               '2019-01-03 00:07:56.033057851',
               '2019-01-04 00:11:54.049586776',
               '2019-01-05 00:15:52.066115702',
               '2019-01-06 00:19:50.082644628',
               '2019-01-07 00:23:48.099173553',
               '2019-01-08 00:27:46.115702479',
               '2019-01-09 00:31:44.132231405',
               '2019-01-10 00:35:42.148760330',
               ...
               '2019-12-21 23:24:17.851239672',
               '2019-12-22 23:28:15.867768596',
               '2019-12-23 23:32:13.884297524',
               '2019-12-24 23:36:11.900826448',
               '2019-12-25 23:40:09.917355372',
               '2019-12-26 23:44:07.933884300',
               '2019-12-27 23:48:05.950413224',
               '2019-12-28 23:52:03.966942152',
               '2019-12-29 23:56:01.983471076',
                         '2019-12-31 00:00:00'],
              dtype=

## Period and PeriodIndex
- a "period" is a span of time, for example a whole day, a whole month or a whole year
- not to be confused with a "Timestamp", which refers to a single point in time, a precise instant

- a PeriodIndex is a collection of Period objects, just as a DatetimeIndex is a collection of Timestamp objects

In [19]:
import pandas as pd

In [20]:
pd.Timestamp("2020-01-01") # refers to a specific "instant"

Timestamp('2020-01-01 00:00:00')

#### Period() constructor

In [21]:
pd.Period("2020-01-01") # refers to "the whole day"

Period('2020-01-01', 'D')

In [22]:
a = pd.Period("2020-01-01")

a.end_time

Timestamp('2020-01-01 23:59:59.999999999')
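
The same idea holds for other frequencies. The sketch below (the monthly period and the test instant are illustrative choices) shows the boundaries of a monthly Period, integer arithmetic on periods, and a check of whether an instant falls inside the period.

```python
import pandas as pd

month = pd.Period("2020-01", freq="M")  # the whole month of January 2020

# A period knows where it starts and where it ends.
print(month.start_time)
print(month.end_time)

# Adding an integer moves by that many periods of the same frequency.
next_month = month + 1
print(next_month)

# Checking whether an instant falls inside the period.
ts = pd.Timestamp("2020-01-15 09:30")
inside = month.start_time <= ts <= month.end_time
print(inside)
```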

#### period_range() constructor

- similar to date_range()

In [23]:
pd.period_range(start = "2019-01-01", end = "2019-12-31", freq = "D")

PeriodIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
             '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
             '2019-01-09', '2019-01-10',
             ...
             '2019-12-22', '2019-12-23', '2019-12-24', '2019-12-25',
             '2019-12-26', '2019-12-27', '2019-12-28', '2019-12-29',
             '2019-12-30', '2019-12-31'],
            dtype='period[D]', length=365, freq='D')

###### a PeriodIndex can be converted to a DatetimeIndex with to_timestamp()

In [24]:
pd.period_range(start = "2019-01-01", end = "2019-12-31", freq = "D").to_timestamp()

DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
               '2019-01-09', '2019-01-10',
               ...
               '2019-12-22', '2019-12-23', '2019-12-24', '2019-12-25',
               '2019-12-26', '2019-12-27', '2019-12-28', '2019-12-29',
               '2019-12-30', '2019-12-31'],
              dtype='datetime64[ns]', length=365, freq='D')

###### or the other way round, with the to_period() method

In [25]:
pd.date_range(start = "2019-01-01", end = "2019-12-31", freq = "D").to_period()


PeriodIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
             '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
             '2019-01-09', '2019-01-10',
             ...
             '2019-12-22', '2019-12-23', '2019-12-24', '2019-12-25',
             '2019-12-26', '2019-12-27', '2019-12-28', '2019-12-29',
             '2019-12-30', '2019-12-31'],
            dtype='period[D]', length=365, freq='D')
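
to_period() also accepts a target frequency. As a sketch of why that is useful, the daily timestamps of 2019 can be mapped to the month that contains each of them.

```python
import pandas as pd

days = pd.date_range(start="2019-01-01", end="2019-12-31", freq="D")

# Map every day to the month that contains it.
months = days.to_period("M")
print(months[:3])

# 365 daily timestamps collapse into 12 distinct monthly periods.
print(months.nunique())
```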

## Timedelta and TimedeltaIndex
- tied to the general concept of **time delta**: an absolute difference in time
- in practice, the amount of time between two precise instants
- here too, a TimedeltaIndex is simply a collection of Timedelta objects

In [26]:
import pandas as pd
data1 = pd.Timestamp("2020-01-11 12:00:00")


In [27]:
data2 = pd.Timestamp("2020-01-01 10:00:00")

In [28]:
data2 - data1    # difference in time between two precise instants (negative here, because data2 comes before data1)

Timedelta('-11 days +22:00:00')
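
The representation of a negative Timedelta can be surprising. This sketch reuses the two timestamps above to show how to read it and that the arithmetic round-trips.

```python
import pandas as pd

data1 = pd.Timestamp("2020-01-11 12:00:00")
data2 = pd.Timestamp("2020-01-01 10:00:00")

delta = data2 - data1

# '-11 days +22:00:00' means -11 days plus 22 hours,
# i.e. 10 days and 2 hours backwards in time.
print(abs(delta))
print(delta.total_seconds())

# Adding the delta back to the first timestamp recovers the second one.
print(data1 + delta == data2)
```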

#### Timedelta() constructor


In [29]:
# specify the "delta" in units of time (weeks, days, hours, ...), but not months or years!


pd.Timedelta( days = 1, minutes = 50)

Timedelta('1 days 00:50:00')
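
Months and years are excluded because they do not have a fixed length. For calendar-aware shifts pandas provides pd.DateOffset, which is not covered in this lesson; the sketch below contrasts it with a fixed 30-day Timedelta, using a date chosen only for illustration.

```python
import pandas as pd

start = pd.Timestamp("2020-01-31")

# A Timedelta is a fixed duration: exactly 30 days later.
fixed = start + pd.Timedelta(days=30)
print(fixed)

# A DateOffset follows the calendar: "one month later" is clipped
# to the last valid day of February (2020 is a leap year).
calendar = start + pd.DateOffset(months=1)
print(calendar)
```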

#### to_timedelta() constructor
- like to_datetime(), it also accepts lists or Series (unlike Timedelta())

In [30]:
pd.to_timedelta(['1 days 01:10:22', '45s', '1 W 2 days'])

TimedeltaIndex(['1 days 01:10:22', '0 days 00:00:45', '9 days 00:00:00'], dtype='timedelta64[ns]', freq=None)

In [31]:
# like to_datetime(), it has an "errors" parameter

pd.to_timedelta(['1 days 01:10:22', '45s', '1 W 2 days', 'ciao'], errors = "coerce")

TimedeltaIndex(['1 days 01:10:22', '0 days 00:00:45', '9 days 00:00:00', NaT], dtype='timedelta64[ns]', freq=None)

#### timedelta_range() constructor

In [32]:
pd.timedelta_range(start='1 W', periods=5)

TimedeltaIndex(['7 days', '8 days', '9 days', '10 days', '11 days'], dtype='timedelta64[ns]', freq='D')

In [33]:
pd.timedelta_range(start='1 W', periods=5, closed = "right")

TimedeltaIndex(['8 days', '9 days', '10 days', '11 days'], dtype='timedelta64[ns]', freq='D')

In [34]:
pd.timedelta_range(start='1 day', end='2 days', freq='6H')

TimedeltaIndex(['1 days 00:00:00', '1 days 06:00:00', '1 days 12:00:00',
                '1 days 18:00:00', '2 days 00:00:00'],
               dtype='timedelta64[ns]', freq='6H')

In [35]:
pd.timedelta_range(start='1 day', end='20 days', periods=4) # the values are linearly spaced between start and end

TimedeltaIndex(['1 days 00:00:00', '7 days 08:00:00', '13 days 16:00:00',
                '20 days 00:00:00'],
               dtype='timedelta64[ns]', freq=None)

## Accessing datetime attributes via .dt

In [36]:
import pandas as pd
pd.date_range(start = "2020/01/01", end = "2020-12-31")

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
               '2020-01-09', '2020-01-10',
               ...
               '2020-12-22', '2020-12-23', '2020-12-24', '2020-12-25',
               '2020-12-26', '2020-12-27', '2020-12-28', '2020-12-29',
               '2020-12-30', '2020-12-31'],
              dtype='datetime64[ns]', length=366, freq='D')

In [37]:
s = pd.Series(pd.date_range(start = "2020/01/01", end = "2020-12-31"))
s

0     2020-01-01
1     2020-01-02
2     2020-01-03
3     2020-01-04
4     2020-01-05
         ...    
361   2020-12-27
362   2020-12-28
363   2020-12-29
364   2020-12-30
365   2020-12-31
Length: 366, dtype: datetime64[ns]

In [38]:
df = s.to_frame(name = "2020")
df

Unnamed: 0,2020
0,2020-01-01
1,2020-01-02
2,2020-01-03
3,2020-01-04
4,2020-01-05
...,...
361,2020-12-27
362,2020-12-28
363,2020-12-29
364,2020-12-30


In [39]:
df["giorno"]  = df["2020"].dt.day_name()
df

Unnamed: 0,2020,giorno
0,2020-01-01,Wednesday
1,2020-01-02,Thursday
2,2020-01-03,Friday
3,2020-01-04,Saturday
4,2020-01-05,Sunday
...,...,...
361,2020-12-27,Sunday
362,2020-12-28,Monday
363,2020-12-29,Tuesday
364,2020-12-30,Wednesday


In [40]:
df["2020"].dt.year

0      2020
1      2020
2      2020
3      2020
4      2020
       ... 
361    2020
362    2020
363    2020
364    2020
365    2020
Name: 2020, Length: 366, dtype: int64
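
Because the results of .dt are ordinary Series, they can be used directly as boolean masks. A short sketch on the first week of 2020 (a smaller range than the one above, to keep the output readable):

```python
import pandas as pd

s = pd.Series(pd.date_range(start="2020-01-01", end="2020-01-07"))

# Numeric components are exposed as attributes of .dt
weekdays = s.dt.dayofweek.tolist()  # Monday=0 ... Sunday=6
print(weekdays)

# Keep only Saturdays and Sundays.
weekend = s[s.dt.dayofweek >= 5]
weekend_names = weekend.dt.day_name().tolist()
print(weekend_names)
```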

## Methods and attributes of Timestamp objects
- the **.dt** accessor lets us operate on a Series that contains datetime values
- the same attributes and methods are available directly on Timestamp (or DatetimeIndex) objects, without .dt

In [41]:
# reuse the example from the previous lesson

import pandas as pd

df= pd.Series(data = "valore random", index = pd.date_range(start = "2020/01/01", end = "2020-12-31")).to_frame("2020")
df

Unnamed: 0,2020
2020-01-01,valore random
2020-01-02,valore random
2020-01-03,valore random
2020-01-04,valore random
2020-01-05,valore random
...,...
2020-12-27,valore random
2020-12-28,valore random
2020-12-29,valore random
2020-12-30,valore random


In [42]:
df.index

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
               '2020-01-09', '2020-01-10',
               ...
               '2020-12-22', '2020-12-23', '2020-12-24', '2020-12-25',
               '2020-12-26', '2020-12-27', '2020-12-28', '2020-12-29',
               '2020-12-30', '2020-12-31'],
              dtype='datetime64[ns]', length=366, freq='D')

In [43]:
giorno = df.index[0]
giorno

Timestamp('2020-01-01 00:00:00', freq='D')

In [44]:
giorno.day

1

In [45]:
giorno.is_month_end

False

In [46]:
giorno.day_name()

'Wednesday'

In [47]:
giorno.month_name()

'January'

*these attributes and methods also work on the whole DatetimeIndex object*

In [48]:
df.index.day_name()

Index(['Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', 'Monday',
       'Tuesday', 'Wednesday', 'Thursday', 'Friday',
       ...
       'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday',
       'Monday', 'Tuesday', 'Wednesday', 'Thursday'],
      dtype='object', length=366)

In [49]:
df.insert(0, column = "day_name", value = df.index.day_name())
df

Unnamed: 0,day_name,2020
2020-01-01,Wednesday,valore random
2020-01-02,Thursday,valore random
2020-01-03,Friday,valore random
2020-01-04,Saturday,valore random
2020-01-05,Sunday,valore random
...,...,...
2020-12-27,Sunday,valore random
2020-12-28,Monday,valore random
2020-12-29,Tuesday,valore random
2020-12-30,Wednesday,valore random


In [50]:
df[df["day_name"] == "Friday"]

Unnamed: 0,day_name,2020
2020-01-03,Friday,valore random
2020-01-10,Friday,valore random
2020-01-17,Friday,valore random
2020-01-24,Friday,valore random
2020-01-31,Friday,valore random
2020-02-07,Friday,valore random
2020-02-14,Friday,valore random
2020-02-21,Friday,valore random
2020-02-28,Friday,valore random
2020-03-06,Friday,valore random


## Reading a time series from an external file

In [51]:
import pandas as pd

In [52]:
df = pd.read_csv("FB.csv")
df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2019-05-16,185.050003,188.580002,185.050003,186.990005,186.990005,12953100
1,2019-05-17,184.839996,187.580002,184.279999,185.300003,185.300003,10485400
2,2019-05-20,181.880005,184.229996,181.369995,182.720001,182.720001,10352000
3,2019-05-21,184.570007,185.699997,183.889999,184.820007,184.820007,7502800
4,2019-05-22,184.729996,186.740005,183.610001,185.320007,185.320007,9213800
...,...,...,...,...,...,...,...
248,2020-05-11,210.889999,215.000000,210.369995,213.179993,213.179993,12911900
249,2020-05-12,213.289993,215.279999,210.000000,210.100006,210.100006,14704600
250,2020-05-13,209.429993,210.779999,202.110001,205.100006,205.100006,20684600
251,2020-05-14,202.559998,206.929993,200.690002,206.809998,206.809998,17178900


In [53]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 253 entries, 0 to 252
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       253 non-null    object 
 1   Open       253 non-null    float64
 2   High       253 non-null    float64
 3   Low        253 non-null    float64
 4   Close      253 non-null    float64
 5   Adj Close  253 non-null    float64
 6   Volume     253 non-null    int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 14.0+ KB


In [54]:
df.set_index("Date", inplace = True)

In [55]:
df.index

Index(['2019-05-16', '2019-05-17', '2019-05-20', '2019-05-21', '2019-05-22',
       '2019-05-23', '2019-05-24', '2019-05-28', '2019-05-29', '2019-05-30',
       ...
       '2020-05-04', '2020-05-05', '2020-05-06', '2020-05-07', '2020-05-08',
       '2020-05-11', '2020-05-12', '2020-05-13', '2020-05-14', '2020-05-15'],
      dtype='object', name='Date', length=253)

In [56]:
df.index = pd.to_datetime(df.index)

The conversion can be done directly by read_csv(), using the parse_dates and index_col parameters

In [57]:
df = pd.read_csv("FB.csv", parse_dates= ["Date"], index_col = "Date")
df

Date,Open,High,Low,Close,Adj Close,Volume
2019-05-16,185.050003,188.580002,185.050003,186.990005,186.990005,12953100
2019-05-17,184.839996,187.580002,184.279999,185.300003,185.300003,10485400
2019-05-20,181.880005,184.229996,181.369995,182.720001,182.720001,10352000
2019-05-21,184.570007,185.699997,183.889999,184.820007,184.820007,7502800
2019-05-22,184.729996,186.740005,183.610001,185.320007,185.320007,9213800
...,...,...,...,...,...,...
2020-05-11,210.889999,215.000000,210.369995,213.179993,213.179993,12911900
2020-05-12,213.289993,215.279999,210.000000,210.100006,210.100006,14704600
2020-05-13,209.429993,210.779999,202.110001,205.100006,205.100006,20684600
2020-05-14,202.559998,206.929993,200.690002,206.809998,206.809998,17178900


In [58]:
df.index

DatetimeIndex(['2019-05-16', '2019-05-17', '2019-05-20', '2019-05-21',
               '2019-05-22', '2019-05-23', '2019-05-24', '2019-05-28',
               '2019-05-29', '2019-05-30',
               ...
               '2020-05-04', '2020-05-05', '2020-05-06', '2020-05-07',
               '2020-05-08', '2020-05-11', '2020-05-12', '2020-05-13',
               '2020-05-14', '2020-05-15'],
              dtype='datetime64[ns]', name='Date', length=253, freq=None)
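
The two approaches can be compared without the file itself. The sketch below inlines the first three rows of the data shown above through io.StringIO, which is an assumption made only to keep the example self-contained.

```python
import io

import pandas as pd

# A few rows in the same layout as FB.csv, inlined so that the
# example runs without the real file.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2019-05-16,185.050003,188.580002,185.050003,186.990005,186.990005,12953100
2019-05-17,184.839996,187.580002,184.279999,185.300003,185.300003,10485400
2019-05-20,181.880005,184.229996,181.369995,182.720001,182.720001,10352000
"""

# Without parse_dates the "Date" column is read as plain strings.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["Date"].dtype)

# With parse_dates and index_col the index is a DatetimeIndex right away.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
print(type(parsed.index).__name__)
print(parsed.index.day_name().tolist())
```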

## Reading a time series from an external file, part 2

In [59]:
import pandas as pd
pd.read_csv("FB.csv", parse_dates = ["Date"])


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2019-05-16,185.050003,188.580002,185.050003,186.990005,186.990005,12953100
1,2019-05-17,184.839996,187.580002,184.279999,185.300003,185.300003,10485400
2,2019-05-20,181.880005,184.229996,181.369995,182.720001,182.720001,10352000
3,2019-05-21,184.570007,185.699997,183.889999,184.820007,184.820007,7502800
4,2019-05-22,184.729996,186.740005,183.610001,185.320007,185.320007,9213800
...,...,...,...,...,...,...,...
248,2020-05-11,210.889999,215.000000,210.369995,213.179993,213.179993,12911900
249,2020-05-12,213.289993,215.279999,210.000000,210.100006,210.100006,14704600
250,2020-05-13,209.429993,210.779999,202.110001,205.100006,205.100006,20684600
251,2020-05-14,202.559998,206.929993,200.690002,206.809998,206.809998,17178900


In [60]:
df= pd.read_csv("FB2.csv", parse_dates = ["Date"])
df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2019-05-16 08_AM,185.050003,188.580002,185.050003,186.990005,186.990005,12953100
1,2019-05-17 08_AM,184.839996,187.580002,184.279999,185.300003,185.300003,10485400
2,2019-05-20 08_AM,181.880005,184.229996,181.369995,182.720001,182.720001,10352000
3,2019-05-21 08_AM,184.570007,185.699997,183.889999,184.820007,184.820007,7502800
4,2019-05-22 08_AM,184.729996,186.740005,183.610001,185.320007,185.320007,9213800
...,...,...,...,...,...,...,...
248,2020-05-11 08_AM,210.889999,215.000000,210.369995,213.179993,213.179993,12911900
249,2020-05-12 08_AM,213.289993,215.279999,210.000000,210.100006,210.100006,14704600
250,2020-05-13 08_AM,209.429993,210.779999,202.110001,205.100006,205.100006,20684600
251,2020-05-14 08_AM,202.559998,206.929993,200.690002,206.809998,206.809998,17178900


In [61]:
df.info()

# the "Date" column still has dtype object

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 253 entries, 0 to 252
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       253 non-null    object 
 1   Open       253 non-null    float64
 2   High       253 non-null    float64
 3   Low        253 non-null    float64
 4   Close      253 non-null    float64
 5   Adj Close  253 non-null    float64
 6   Volume     253 non-null    int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 14.0+ KB


Let's try the to_datetime() function

In [62]:
pd.to_datetime(df["Date"])

ParserError: Unknown string format: 2019-05-16 08_AM

The strings in the "Date" column need an explicit format:

- "2019-05-16 08_AM" follows the pattern year-month-day hour_AM/PM
- the format codes are listed here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

In [63]:
pd.to_datetime(df["Date"], format = "%Y-%m-%d %I_%p")

0     2019-05-16 08:00:00
1     2019-05-17 08:00:00
2     2019-05-20 08:00:00
3     2019-05-21 08:00:00
4     2019-05-22 08:00:00
              ...        
248   2020-05-11 08:00:00
249   2020-05-12 08:00:00
250   2020-05-13 08:00:00
251   2020-05-14 08:00:00
252   2020-05-15 08:00:00
Name: Date, Length: 253, dtype: datetime64[ns]

This can also be done directly while reading the file

In [64]:
import datetime

In [65]:
df = pd.read_csv("FB2.csv", parse_dates = ["Date"], date_parser = lambda x: datetime.datetime.strptime(x, "%Y-%m-%d %I_%p"))
df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2019-05-16 08:00:00,185.050003,188.580002,185.050003,186.990005,186.990005,12953100
1,2019-05-17 08:00:00,184.839996,187.580002,184.279999,185.300003,185.300003,10485400
2,2019-05-20 08:00:00,181.880005,184.229996,181.369995,182.720001,182.720001,10352000
3,2019-05-21 08:00:00,184.570007,185.699997,183.889999,184.820007,184.820007,7502800
4,2019-05-22 08:00:00,184.729996,186.740005,183.610001,185.320007,185.320007,9213800
...,...,...,...,...,...,...,...
248,2020-05-11 08:00:00,210.889999,215.000000,210.369995,213.179993,213.179993,12911900
249,2020-05-12 08:00:00,213.289993,215.279999,210.000000,210.100006,210.100006,14704600
250,2020-05-13 08:00:00,209.429993,210.779999,202.110001,205.100006,205.100006,20684600
251,2020-05-14 08:00:00,202.559998,206.929993,200.690002,206.809998,206.809998,17178900


In [66]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 253 entries, 0 to 252
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       253 non-null    datetime64[ns]
 1   Open       253 non-null    float64       
 2   High       253 non-null    float64       
 3   Low        253 non-null    float64       
 4   Close      253 non-null    float64       
 5   Adj Close  253 non-null    float64       
 6   Volume     253 non-null    int64         
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 14.0 KB
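
Note that recent pandas releases (2.0 onwards) deprecate the date_parser argument of read_csv(). An alternative that does not depend on the pandas version is to read the column as text and convert it afterwards with to_datetime() and an explicit format. The sketch below uses a few inlined rows shaped like FB2.csv, reduced to three columns for brevity.

```python
import io

import pandas as pd

# Rows shaped like FB2.csv, inlined for illustration.
csv_text = """Date,Open,Close
2019-05-16 08_AM,185.050003,186.990005
2019-05-17 08_AM,184.839996,185.300003
"""

# Step 1: read the file without any date parsing.
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: convert the column with the same format string used above.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d %I_%p")

print(df["Date"].tolist())
```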


## loc[] and iloc[] with DatetimeIndex objects

In [67]:
import pandas as pd
df = pd.read_csv("FB.csv", parse_dates= ["Date"], index_col = "Date")
df

Date,Open,High,Low,Close,Adj Close,Volume
2019-05-16,185.050003,188.580002,185.050003,186.990005,186.990005,12953100
2019-05-17,184.839996,187.580002,184.279999,185.300003,185.300003,10485400
2019-05-20,181.880005,184.229996,181.369995,182.720001,182.720001,10352000
2019-05-21,184.570007,185.699997,183.889999,184.820007,184.820007,7502800
2019-05-22,184.729996,186.740005,183.610001,185.320007,185.320007,9213800
...,...,...,...,...,...,...
2020-05-11,210.889999,215.000000,210.369995,213.179993,213.179993,12911900
2020-05-12,213.289993,215.279999,210.000000,210.100006,210.100006,14704600
2020-05-13,209.429993,210.779999,202.110001,205.100006,205.100006,20684600
2020-05-14,202.559998,206.929993,200.690002,206.809998,206.809998,17178900


In [68]:
df.index

DatetimeIndex(['2019-05-16', '2019-05-17', '2019-05-20', '2019-05-21',
               '2019-05-22', '2019-05-23', '2019-05-24', '2019-05-28',
               '2019-05-29', '2019-05-30',
               ...
               '2020-05-04', '2020-05-05', '2020-05-06', '2020-05-07',
               '2020-05-08', '2020-05-11', '2020-05-12', '2020-05-13',
               '2020-05-14', '2020-05-15'],
              dtype='datetime64[ns]', name='Date', length=253, freq=None)

**Selection by position**: *iloc[]*
- the same rules seen for ordinary "Index" objects apply

In [69]:
df.iloc[0]

Open         1.850500e+02
High         1.885800e+02
Low          1.850500e+02
Close        1.869900e+02
Adj Close    1.869900e+02
Volume       1.295310e+07
Name: 2019-05-16 00:00:00, dtype: float64

In [70]:
df.iloc[0:2] # right endpoint excluded

Date,Open,High,Low,Close,Adj Close,Volume
2019-05-16,185.050003,188.580002,185.050003,186.990005,186.990005,12953100
2019-05-17,184.839996,187.580002,184.279999,185.300003,185.300003,10485400


In [71]:
df.iloc[-3:-1] # right endpoint excluded

Date,Open,High,Low,Close,Adj Close,Volume
2020-05-13,209.429993,210.779999,202.110001,205.100006,205.100006,20684600
2020-05-14,202.559998,206.929993,200.690002,206.809998,206.809998,17178900


In [72]:
df.iloc[[0,1,-1]]

Date,Open,High,Low,Close,Adj Close,Volume
2019-05-16,185.050003,188.580002,185.050003,186.990005,186.990005,12953100
2019-05-17,184.839996,187.580002,184.279999,185.300003,185.300003,10485400
2020-05-15,205.270004,211.339996,204.119995,210.880005,210.880005,19375200


**Selection by label (index label)**: *.loc[]*

In [73]:
df.head()

Date,Open,High,Low,Close,Adj Close,Volume
2019-05-16,185.050003,188.580002,185.050003,186.990005,186.990005,12953100
2019-05-17,184.839996,187.580002,184.279999,185.300003,185.300003,10485400
2019-05-20,181.880005,184.229996,181.369995,182.720001,182.720001,10352000
2019-05-21,184.570007,185.699997,183.889999,184.820007,184.820007,7502800
2019-05-22,184.729996,186.740005,183.610001,185.320007,185.320007,9213800


In [74]:
df.loc["2019-05-17"]

Open         1.848400e+02
High         1.875800e+02
Low          1.842800e+02
Close        1.853000e+02
Adj Close    1.853000e+02
Volume       1.048540e+07
Name: 2019-05-17 00:00:00, dtype: float64

In [75]:
df.loc["2019-05-17": "2019-05-22"] # right endpoint included

Date,Open,High,Low,Close,Adj Close,Volume
2019-05-17,184.839996,187.580002,184.279999,185.300003,185.300003,10485400
2019-05-20,181.880005,184.229996,181.369995,182.720001,182.720001,10352000
2019-05-21,184.570007,185.699997,183.889999,184.820007,184.820007,7502800
2019-05-22,184.729996,186.740005,183.610001,185.320007,185.320007,9213800


In [76]:
df.loc["2019-06"] # all the observations for June 2019

Date,Open,High,Low,Close,Adj Close,Volume
2019-06-03,175.0,175.050003,161.009995,164.149994,164.149994,56059600
2019-06-04,163.710007,168.279999,160.839996,167.5,167.5,46044300
2019-06-05,167.479996,168.720001,164.630005,168.169998,168.169998,19758300
2019-06-06,168.300003,169.699997,167.229996,168.330002,168.330002,12446400
2019-06-07,170.169998,173.869995,168.839996,173.350006,173.350006,16917300
2019-06-10,174.75,177.860001,173.800003,174.820007,174.820007,14767900
2019-06-11,178.479996,179.979996,176.789993,178.100006,178.100006,15266600
2019-06-12,178.380005,179.270004,172.880005,175.039993,175.039993,17681500
2019-06-13,175.529999,178.029999,174.610001,177.470001,177.470001,12253600
2019-06-14,180.509995,181.839996,180.0,181.330002,181.330002,16773700
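
This kind of selection is known as partial string indexing. The sketch below reproduces it on a synthetic business-day Series, which stands in for the stock data as an assumption made to keep the example self-contained, and also shows that the endpoints of a label slice are inclusive and need not be present in the index.

```python
import pandas as pd

# Synthetic business-day series standing in for the stock data.
idx = pd.date_range(start="2019-05-30", end="2019-07-02", freq="B")
s = pd.Series(range(len(idx)), index=idx)

# A "year-month" string selects every row of that month.
june = s.loc["2019-06"]
print(june.index.min(), june.index.max(), len(june))

# In a label slice both endpoints are included, and they do not
# need to exist in the index (June 29 and 30 fall on a weekend).
around_weekend = s.loc["2019-06-28":"2019-06-30"]
print(around_weekend.index.tolist())
```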


In [77]:
df.loc["2020-01-01":] # all the observations from January 1, 2020 onwards

Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,206.750000,209.789993,206.270004,209.779999,209.779999,12077100
2020-01-03,207.210007,210.399994,206.949997,208.669998,208.669998,11188400
2020-01-06,206.699997,212.779999,206.520004,212.600006,212.600006,17058900
2020-01-07,212.820007,214.580002,211.750000,213.059998,213.059998,14912400
2020-01-08,213.000000,216.240005,212.610001,215.220001,215.220001,13475000
...,...,...,...,...,...,...
2020-05-11,210.889999,215.000000,210.369995,213.179993,213.179993,12911900
2020-05-12,213.289993,215.279999,210.000000,210.100006,210.100006,14704600
2020-05-13,209.429993,210.779999,202.110001,205.100006,205.100006,20684600
2020-05-14,202.559998,206.929993,200.690002,206.809998,206.809998,17178900


I can get the same result as the cell above with the **truncate()** method

In [78]:
df.truncate(before = "2020-01-01")

Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,206.750000,209.789993,206.270004,209.779999,209.779999,12077100
2020-01-03,207.210007,210.399994,206.949997,208.669998,208.669998,11188400
2020-01-06,206.699997,212.779999,206.520004,212.600006,212.600006,17058900
2020-01-07,212.820007,214.580002,211.750000,213.059998,213.059998,14912400
2020-01-08,213.000000,216.240005,212.610001,215.220001,215.220001,13475000
...,...,...,...,...,...,...
2020-05-11,210.889999,215.000000,210.369995,213.179993,213.179993,12911900
2020-05-12,213.289993,215.279999,210.000000,210.100006,210.100006,14704600
2020-05-13,209.429993,210.779999,202.110001,205.100006,205.100006,20684600
2020-05-14,202.559998,206.929993,200.690002,206.809998,206.809998,17178900


In [79]:
df.truncate(after = "2020-01-01")
# df.loc[:"2020-01-01"]

Date,Open,High,Low,Close,Adj Close,Volume
2019-05-16,185.050003,188.580002,185.050003,186.990005,186.990005,12953100
2019-05-17,184.839996,187.580002,184.279999,185.300003,185.300003,10485400
2019-05-20,181.880005,184.229996,181.369995,182.720001,182.720001,10352000
2019-05-21,184.570007,185.699997,183.889999,184.820007,184.820007,7502800
2019-05-22,184.729996,186.740005,183.610001,185.320007,185.320007,9213800
...,...,...,...,...,...,...
2019-12-24,206.300003,206.789993,205.000000,205.119995,205.119995,6046300
2019-12-26,205.570007,207.820007,205.309998,207.789993,207.789993,9350700
2019-12-27,208.669998,208.929993,206.589996,208.100006,208.100006,10284200
2019-12-30,207.860001,207.899994,203.899994,204.410004,204.410004,10524300
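
The equivalence between truncate() and slicing with loc[] can be checked on a small example. This is a sketch under stated assumptions: `demo` is a made-up frame with one row per day at midnight and a sorted index. With intraday timestamps the two approaches may differ at the boundary, because a date string in a loc[] slice covers the whole day.

```python
import pandas as pd

# Hypothetical daily data around the turn of the year
idx = pd.date_range("2019-12-28", "2020-01-05", freq="D")
demo = pd.DataFrame({"Close": range(len(idx))}, index=idx)

from_2020 = demo.truncate(before="2020-01-01")   # drop rows before the date
until_2020 = demo.truncate(after="2020-01-01")   # drop rows after the date

# Compare with the slicing syntax used earlier
same_start = from_2020.equals(demo.loc["2020-01-01":])
same_end = until_2020.equals(demo.loc[:"2020-01-01"])
print(same_start, same_end)
```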


In [80]:
df.loc[["2019-05-20", "2019-05-28"]]

KeyError: "None of [Index(['2019-05-20', '2019-05-28'], dtype='object', name='Date')] are in the [index]"

The KeyError above tells me that my list was interpreted as a plain Index of strings (dtype='object'); I first need to convert it to a DatetimeIndex object

In [81]:
pd.to_datetime(["2019-05-20", "2019-05-28"])

DatetimeIndex(['2019-05-20', '2019-05-28'], dtype='datetime64[ns]', freq=None)

In [82]:
df.loc[pd.to_datetime(["2019-05-20", "2019-05-28"])]

,Open,High,Low,Close,Adj Close,Volume
2019-05-20,181.880005,184.229996,181.369995,182.720001,182.720001,10352000
2019-05-28,181.539993,184.710007,181.449997,184.309998,184.309998,14843300
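
Selecting several individual dates by converting them explicitly can be sketched as follows. The frame `demo` is a made-up stand-in with business-day dates. Whether a plain list of strings is accepted directly may depend on the pandas version, so converting with to_datetime() is the safer route.

```python
import pandas as pd

# Hypothetical data indexed by business days (Mon-Fri), like a stock series
idx = pd.date_range("2019-05-16", "2019-05-31", freq="B")
demo = pd.DataFrame({"Close": range(len(idx))}, index=idx)

# Convert the labels to a DatetimeIndex before passing them to loc[]
wanted = pd.to_datetime(["2019-05-20", "2019-05-28"])
picked = demo.loc[wanted]
print(picked)
```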


## reindex() method

In [83]:
import pandas as pd

df = pd.read_csv("FB.csv", parse_dates= ["Date"], index_col = "Date").round(2)
df

Date,Open,High,Low,Close,Adj Close,Volume
2019-05-16,185.05,188.58,185.05,186.99,186.99,12953100
2019-05-17,184.84,187.58,184.28,185.30,185.30,10485400
2019-05-20,181.88,184.23,181.37,182.72,182.72,10352000
2019-05-21,184.57,185.70,183.89,184.82,184.82,7502800
2019-05-22,184.73,186.74,183.61,185.32,185.32,9213800
...,...,...,...,...,...,...
2020-05-11,210.89,215.00,210.37,213.18,213.18,12911900
2020-05-12,213.29,215.28,210.00,210.10,210.10,14704600
2020-05-13,209.43,210.78,202.11,205.10,205.10,20684600
2020-05-14,202.56,206.93,200.69,206.81,206.81,17178900


In [84]:
df.index.min()

Timestamp('2019-05-16 00:00:00')

In [85]:
df.index.max()

Timestamp('2020-05-15 00:00:00')

In [86]:
nuovo_indice = pd.date_range(start = df.index.min(), end = df.index.max(), freq = "D")

In [87]:
df.reindex(nuovo_indice, method = "ffill")

,Open,High,Low,Close,Adj Close,Volume
2019-05-16,185.05,188.58,185.05,186.99,186.99,12953100
2019-05-17,184.84,187.58,184.28,185.30,185.30,10485400
2019-05-18,184.84,187.58,184.28,185.30,185.30,10485400
2019-05-19,184.84,187.58,184.28,185.30,185.30,10485400
2019-05-20,181.88,184.23,181.37,182.72,182.72,10352000
...,...,...,...,...,...,...
2020-05-11,210.89,215.00,210.37,213.18,213.18,12911900
2020-05-12,213.29,215.28,210.00,210.10,210.10,14704600
2020-05-13,209.43,210.78,202.11,205.10,205.10,20684600
2020-05-14,202.56,206.93,200.69,206.81,206.81,17178900
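
What `method = "ffill"` does during reindexing is easier to see on three rows. The sketch below reuses the first three closing prices shown above; the name `demo` and the single-column layout are assumptions made for brevity.

```python
import pandas as pd

# Thursday, Friday and the following Monday: the weekend is missing
idx = pd.to_datetime(["2019-05-16", "2019-05-17", "2019-05-20"])
demo = pd.DataFrame({"Close": [186.99, 185.30, 182.72]}, index=idx)

all_days = pd.date_range(demo.index.min(), demo.index.max(), freq="D")

with_gaps = demo.reindex(all_days)                # new weekend rows are NaN
filled = demo.reindex(all_days, method="ffill")   # new rows copy the last known value
print(with_gaps)
print(filled)
```

Without `method`, the days added by the new index are left empty; with forward fill, Saturday and Sunday repeat Friday's values, as in the output above.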


## resample() method
- useful for changing the "frequency" of a time series: for example, going from a daily frequency to a monthly one
- works behind the scenes in a way similar to the groupby method
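
The similarity with groupby can be illustrated with a small sketch. The series `demo` is made up, and `pd.Grouper(freq=...)` is used here as one way of grouping by time bins; it is not necessarily how resample() is implemented internally.

```python
import pandas as pd

# Hypothetical daily series spanning three calendar months
idx = pd.date_range("2019-05-16", "2019-07-15", freq="D")
demo = pd.Series(range(len(idx)), index=idx, dtype="float64")

by_resample = demo.resample("MS").mean()                 # one bin per month start
by_groupby = demo.groupby(pd.Grouper(freq="MS")).mean()  # same bins via groupby

print(by_resample)
print(by_groupby)
```

Both calls split the rows into monthly bins and then aggregate each bin, so the two results carry the same labels and values.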

In [88]:
import pandas as pd

df = pd.read_csv("FB.csv", parse_dates= ["Date"], index_col = "Date").round(2)
df

Date,Open,High,Low,Close,Adj Close,Volume
2019-05-16,185.05,188.58,185.05,186.99,186.99,12953100
2019-05-17,184.84,187.58,184.28,185.30,185.30,10485400
2019-05-20,181.88,184.23,181.37,182.72,182.72,10352000
2019-05-21,184.57,185.70,183.89,184.82,184.82,7502800
2019-05-22,184.73,186.74,183.61,185.32,185.32,9213800
...,...,...,...,...,...,...
2020-05-11,210.89,215.00,210.37,213.18,213.18,12911900
2020-05-12,213.29,215.28,210.00,210.10,210.10,14704600
2020-05-13,209.43,210.78,202.11,205.10,205.10,20684600
2020-05-14,202.56,206.93,200.69,206.81,206.81,17178900


In [89]:
df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq = "D"), method = "ffill")
df

,Open,High,Low,Close,Adj Close,Volume
2019-05-16,185.05,188.58,185.05,186.99,186.99,12953100
2019-05-17,184.84,187.58,184.28,185.30,185.30,10485400
2019-05-18,184.84,187.58,184.28,185.30,185.30,10485400
2019-05-19,184.84,187.58,184.28,185.30,185.30,10485400
2019-05-20,181.88,184.23,181.37,182.72,182.72,10352000
...,...,...,...,...,...,...
2020-05-11,210.89,215.00,210.37,213.18,213.18,12911900
2020-05-12,213.29,215.28,210.00,210.10,210.10,14704600
2020-05-13,209.43,210.78,202.11,205.10,205.10,20684600
2020-05-14,202.56,206.93,200.69,206.81,206.81,17178900


I switch from a daily frequency to a monthly one with the resample method

In [90]:
mesi = df.resample("MS") # DatetimeIndexResampler object (similar to a groupby object)

In [91]:
mesi.groups

{Timestamp('2019-05-01 00:00:00', freq='MS'): 16,
 Timestamp('2019-06-01 00:00:00', freq='MS'): 46,
 Timestamp('2019-07-01 00:00:00', freq='MS'): 77,
 Timestamp('2019-08-01 00:00:00', freq='MS'): 108,
 Timestamp('2019-09-01 00:00:00', freq='MS'): 138,
 Timestamp('2019-10-01 00:00:00', freq='MS'): 169,
 Timestamp('2019-11-01 00:00:00', freq='MS'): 199,
 Timestamp('2019-12-01 00:00:00', freq='MS'): 230,
 Timestamp('2020-01-01 00:00:00', freq='MS'): 261,
 Timestamp('2020-02-01 00:00:00', freq='MS'): 290,
 Timestamp('2020-03-01 00:00:00', freq='MS'): 321,
 Timestamp('2020-04-01 00:00:00', freq='MS'): 351,
 Timestamp('2020-05-01 00:00:00', freq='MS'): 366}

In [92]:
mesi.get_group('2019-08-31 00:00:00')

KeyError: '2019-08-31 00:00:00'

In [93]:
mesi.last()


,Open,High,Low,Close,Adj Close,Volume
2019-05-01,180.28,180.54,177.16,177.47,177.47,15226500
2019-06-01,190.55,193.2,189.94,193.0,193.0,16378900
2019-07-01,196.95,198.76,192.68,194.23,194.23,14593500
2019-08-01,186.78,186.8,183.46,185.67,185.67,10774500
2019-09-01,177.87,178.67,176.85,178.08,178.08,10740000
2019-10-01,196.7,198.09,188.25,191.65,191.65,42286500
2019-11-01,201.6,203.8,201.21,201.64,201.64,7985200
2019-12-01,204.0,205.56,203.6,205.25,205.25,8953500
2020-01-01,208.43,208.69,201.06,201.91,201.91,31359900
2020-02-01,182.7,192.74,181.82,192.47,192.47,32583500


In [94]:
mesi.mean() 

,Open,High,Low,Close,Adj Close,Volume
2019-05-01,183.180625,184.98125,181.9125,182.99,182.99,10682910.0
2019-06-01,181.793667,184.025,179.595333,181.935667,181.935667,20083520.0
2019-07-01,199.210645,201.475161,196.998065,199.582581,199.582581,15723680.0
2019-08-01,185.420645,187.095806,182.878387,184.556774,184.556774,13488290.0
2019-09-01,186.553667,187.662333,183.934667,185.648667,185.648667,13620140.0
2019-10-01,184.387742,186.573548,182.096774,184.438387,184.438387,13767720.0
2019-11-01,195.345667,197.042,194.109333,195.825667,195.825667,12897970.0
2019-12-01,202.548065,203.706774,200.552258,202.157097,202.157097,13793810.0
2020-01-01,216.316452,217.969677,214.480968,216.468387,216.468387,15303650.0
2020-02-01,207.684138,210.101379,205.300345,207.808621,207.808621,17164650.0


The output of the previous cell looks as if it referred to a single day rather than to the whole month, so let's try another route:

In [95]:
df.resample("M", kind = "period").mean()

,Open,High,Low,Close,Adj Close,Volume
2019-05,183.180625,184.98125,181.9125,182.99,182.99,10682910.0
2019-06,181.793667,184.025,179.595333,181.935667,181.935667,20083520.0
2019-07,199.210645,201.475161,196.998065,199.582581,199.582581,15723680.0
2019-08,185.420645,187.095806,182.878387,184.556774,184.556774,13488290.0
2019-09,186.553667,187.662333,183.934667,185.648667,185.648667,13620140.0
2019-10,184.387742,186.573548,182.096774,184.438387,184.438387,13767720.0
2019-11,195.345667,197.042,194.109333,195.825667,195.825667,12897970.0
2019-12,202.548065,203.706774,200.552258,202.157097,202.157097,13793810.0
2020-01,216.316452,217.969677,214.480968,216.468387,216.468387,15303650.0
2020-02,207.684138,210.101379,205.300345,207.808621,207.808621,17164650.0
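
Recent pandas releases appear to deprecate the `kind` argument of resample(), so it is worth checking the installed version. A possible alternative, sketched here on a made-up frame `demo`, is to aggregate on month-start bins and then relabel the result with to_period():

```python
import pandas as pd

# Hypothetical daily data spanning three calendar months
idx = pd.date_range("2019-05-16", "2019-07-15", freq="D")
demo = pd.DataFrame({"Close": range(len(idx))}, index=idx)

monthly = demo.resample("MS").mean()   # labelled with the first day of each month
monthly = monthly.to_period("M")       # relabelled as monthly periods
print(monthly)
```

The aggregated values are unchanged; only the index switches from timestamps to monthly periods.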


In [96]:
df.resample("W", kind = "period").mean()

,Open,High,Low,Close,Adj Close,Volume
2019-05-13/2019-05-19,184.8925,187.83,184.4725,185.7225,185.7225,11102320.0
2019-05-20/2019-05-26,182.941429,184.494286,181.575714,182.415714,182.415714,9465786.0
2019-05-27/2019-06-02,181.612857,182.571429,179.428571,180.425714,180.425714,12958530.0
2019-06-03/2019-06-09,169.285714,171.908571,165.747143,169.742857,169.742857,26437210.0
2019-06-10/2019-06-16,178.381429,180.094286,176.868571,178.488571,178.488571,15755810.0
2019-06-17/2019-06-23,189.03,191.327143,187.161429,189.701429,189.701429,24476810.0
2019-06-24/2019-06-30,190.91,193.765714,189.3,191.085714,191.085714,15051940.0
2019-07-01/2019-07-07,195.01,196.612857,193.638571,195.942857,195.942857,11034170.0
2019-07-08/2019-07-14,198.922857,202.685714,197.984286,201.934286,201.934286,14967370.0
2019-07-15/2019-07-21,202.715714,203.517143,200.014286,200.772857,200.772857,12642600.0


In [97]:
df.resample("W", kind = "period").agg({"Open": "min", "Close": "max"})

,Open,Close
2019-05-13/2019-05-19,184.84,186.99
2019-05-20/2019-05-26,181.88,185.32
2019-05-27/2019-06-02,180.28,184.31
2019-06-03/2019-06-09,163.71,173.35
2019-06-10/2019-06-16,174.75,181.33
2019-06-17/2019-06-23,185.01,191.14
2019-06-24/2019-06-30,189.54,193.0
2019-07-01/2019-07-07,193.0,197.2
2019-07-08/2019-07-14,194.97,204.87
2019-07-15/2019-07-21,200.15,203.91
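
Passing a dictionary to agg() applies a different aggregation to each column. The sketch below uses a made-up frame `demo` covering two full Monday-to-Sunday weeks and omits `kind`, so the weekly bins are labelled with the Sunday that closes each week rather than with a period.

```python
import pandas as pd

# Hypothetical data: 2019-05-13 is a Monday, 2019-05-26 a Sunday
idx = pd.date_range("2019-05-13", "2019-05-26", freq="D")
demo = pd.DataFrame({"Open": range(14), "Close": range(100, 114)}, index=idx)

# Lowest Open and highest Close within each week
weekly = demo.resample("W").agg({"Open": "min", "Close": "max"})
print(weekly)
```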
