# Working with dates and times
To represent dates and times, Pandas uses the following types:
- Date times (Timestamp) - dtype ***datetime64[ns]***
- Time deltas (Timedelta) - dtype ***timedelta64[ns]***
- Time spans (Period) - dtype ***period[freq]***
- Date offsets (DateOffset) - a special type with no dedicated dtype
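
As a quick orientation, the sketch below builds one scalar of each kind and checks which dtype a Series holding it reports. The variable names are illustrative only, and the exact resolution shown in the datetime and timedelta dtypes may differ between Pandas versions.

```python
import pandas as pd

# One scalar of each kind from the list above
stamp = pd.Timestamp("2020-06-09 08:30:20")
delta = pd.Timedelta("1 days 02:00:00")
span = pd.Period("2020-06", freq="M")
offset = pd.DateOffset(months=1)

# dtype reported by a Series that holds each scalar
stamp_dtype = str(pd.Series([stamp]).dtype)
delta_dtype = str(pd.Series([delta]).dtype)
span_dtype = str(pd.Series([span]).dtype)
print(stamp_dtype, delta_dtype, span_dtype)

# A DateOffset is not stored under its own dtype; it is applied to timestamps
shifted = stamp + offset
print(shifted)
```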

In [2]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt

# Datetime
The standard-library datetime type represents a date and time in Python. Pandas calls its own equivalent a timestamp, but it can also work with plain datetime objects.

## Why?
Unlike the Pandas timestamp, the Python datetime object does not support nanoseconds.

In [93]:
from datetime import datetime as dt

now = dt.now()
print(type(now))
# print(now.nanosecond) # <-- no such attribute, datetime supports only microseconds
now

<class 'datetime.datetime'>


datetime.datetime(2021, 1, 9, 12, 51, 42, 392396)

In [94]:
# convert a datetime to a Timestamp
now_timestamp = pd.Timestamp(now)
print(type(now_timestamp))
print(now_timestamp.nanosecond) # 0, because the original datetime had no nanoseconds
now_timestamp

<class 'pandas._libs.tslibs.timestamps.Timestamp'>
0


Timestamp('2021-01-09 12:51:42.392396')

Dates and times are very often used as the ***index*** of a DataFrame.
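
A minimal sketch of why a date-based index is convenient: rows can be selected with date strings, including partial ones. The DataFrame and its values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical daily measurements indexed by date
index = pd.date_range("2020-01-30", periods=5, freq="D")
readings = pd.DataFrame({"value": [10, 11, 12, 13, 14]}, index=index)

# Partial string indexing: all rows that fall in February 2020
february = readings.loc["2020-02"]
print(february)

# Label-based slicing with date strings includes both endpoints
around_month_end = readings.loc["2020-01-31":"2020-02-01"]
print(around_month_end)
```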

# Timestamp
The Pandas timestamp is built on the more efficient numpy.datetime64 data type. In Pandas we therefore work with Timestamp rather than datetime.

In Pandas a timestamp denotes a single point in time with nanosecond precision.

In [17]:
timestamp = pd.Timestamp(year=2020, month=6, day=9, hour=8, minute=30, second=20, microsecond=79, nanosecond=99)
print(timestamp)
print(type(timestamp))
timestamp.nanosecond

2020-06-09 08:30:20.000079099
<class 'pandas._libs.tslibs.timestamps.Timestamp'>


99

In [16]:
# micro- and nanoseconds are printed only when they are non-zero
print(pd.Timestamp('2019-8-1'))
print(pd.Timestamp(2020, 6, 9, 12))
print(pd.Timestamp('2020-06-09 00:00:00'))
print(pd.Timestamp('August 9, 2020 13:45'))
print(pd.Timestamp('2020-01-01T14'))
print(pd.Timestamp(300)) # <--- number of nanoseconds after the UNIX epoch (January 1, 1970)
print(pd.Timestamp(1513393355.5))

2019-08-01 00:00:00
2020-06-09 12:00:00
2020-06-09 00:00:00
2020-08-09 13:45:00
2020-01-01 14:00:00
1970-01-01 00:00:00.000000300
1970-01-01 00:00:01.513393355
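
The last two lines of output show that a bare number is read as nanoseconds since the epoch. If the number is meant as seconds, the unit argument of the constructor can say so. A small sketch of the difference:

```python
import pandas as pd

# A bare number is read as nanoseconds since 1970-01-01
as_nanoseconds = pd.Timestamp(300)

# With unit="s" the number is read as seconds since the epoch
as_seconds = pd.Timestamp(1513393355.5, unit="s")

print(as_nanoseconds)
print(as_seconds)
```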


In [33]:
# A missing (NaN) value has its own special object: NaT
nan_dt = pd.Timestamp(np.nan)
print(type(nan_dt))
nan_dt

<class 'pandas._libs.tslibs.nattype.NaTType'>


NaT
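
NaT behaves much like NaN does for floats. The sketch below, using made-up input values, shows how it appears when parsing incomplete data and how to detect it.

```python
import pandas as pd

# Missing entries become NaT when a list of dates is parsed
parsed = pd.to_datetime(["2020-01-09", None, "2020-01-11"])
print(parsed)

# NaT never compares equal, not even to itself, so detect it with isna
nat_equals_itself = pd.NaT == pd.NaT
missing_mask = pd.Series(parsed).isna()
print(nat_equals_itself)
print(missing_mask.tolist())
```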

In [44]:
# A sequence of Pandas timestamps
sample_timestamps = pd.date_range("2020-01-09", freq="D", periods=3) # date_range builds a DatetimeIndex
sample_timestamps

DatetimeIndex(['2020-01-09', '2020-01-10', '2020-01-11'], dtype='datetime64[ns]', freq='D')

In [49]:
df = pd.DataFrame(sample_timestamps, columns=["times"])
df

       times
0 2020-01-09
1 2020-01-10
2 2020-01-11


In [51]:
# So the column dtype is NumPy's datetime64[ns]
df.dtypes

times    datetime64[ns]
dtype: object
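
Once a column has a datetime dtype, its components can be read through the .dt accessor. A short sketch on the same three dates (the variable names are illustrative):

```python
import pandas as pd

times = pd.Series(pd.date_range("2020-01-09", freq="D", periods=3))

# Component access through the .dt accessor
days = times.dt.day.tolist()
weekdays = times.dt.day_name().tolist()
print(days)
print(weekdays)
```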

# Time zones
Timestamp can also work with time zones. By default a timestamp is timezone-naive, but a time zone name (for example one listed by the pytz library) can be passed to the constructor through the tz argument.
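
Besides the constructor argument, a zone can be attached to an existing naive timestamp or an aware timestamp can be converted to another zone. The sketch below assumes the Europe/Bratislava zone purely as an example.

```python
import pandas as pd

# No tz argument: the timestamp is naive
naive = pd.Timestamp("2020-06-09 08:30")
print(naive.tz)

# Attach a zone to a naive timestamp (the wall-clock time stays the same)
local = naive.tz_localize("Europe/Bratislava")

# Convert an aware timestamp to another zone (the instant stays the same)
utc = local.tz_convert("UTC")
print(local)
print(utc)
```

In June Bratislava observes summer time (UTC+2), so the converted value is two hours earlier on the clock while still denoting the same instant.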

In [36]:
import pytz
len(pytz.all_timezones) # number of available time zone names

593

In [38]:
# Sample of time zone names
pytz.all_timezones[:10]

['Africa/Abidjan',
 'Africa/Accra',
 'Africa/Addis_Ababa',
 'Africa/Algiers',
 'Africa/Asmara',
 'Africa/Asmera',
 'Africa/Bamako',
 'Africa/Bangui',
 'Africa/Banjul',
 'Africa/Bissau']

In [42]:
ts = pd.Timestamp(1513393355, tz="Europe/Bratislava")
ts

Timestamp('1970-01-01 01:00:01.513393355+0100', tz='Europe/Bratislava')

In [52]:
# use with a DataFrame
sample_timestamps = pd.date_range("2020-01-09", freq="D", periods=3, tz="Etc/GMT+1") # Etc/GMT+1 means UTC-01:00, the sign is inverted in Etc/ names
sample_timestamps

DatetimeIndex(['2020-01-09 00:00:00-01:00', '2020-01-10 00:00:00-01:00',
               '2020-01-11 00:00:00-01:00'],
              dtype='datetime64[ns, Etc/GMT+1]', freq='D')

In [53]:
df = pd.DataFrame(sample_timestamps, columns=["times"])
df

                      times
0 2020-01-09 00:00:00-01:00
1 2020-01-10 00:00:00-01:00
2 2020-01-11 00:00:00-01:00


In [54]:
df.times # Pandas keeps the time zone as well

0   2020-01-09 00:00:00-01:00
1   2020-01-10 00:00:00-01:00
2   2020-01-11 00:00:00-01:00
Name: times, dtype: datetime64[ns, Etc/GMT+1]

In [96]:
# A real dataset example
df = pd.read_csv("dataset/timestamps_dataset.csv")
df.head()
df.timestamp

0        2018-11-13 09:39:52.794
1        2018-11-13 09:39:52.813
2        2018-11-13 09:39:52.840
3        2018-11-13 09:39:54.800
4        2018-11-13 09:39:54.867
                  ...           
10416    2018-11-13 10:00:36.872
10417    2018-11-13 10:00:36.956
10418    2018-11-13 10:00:36.989
10419    2018-11-13 10:00:37.037
10420    2018-11-13 10:00:37.054
Name: timestamp, Length: 10421, dtype: object

In [100]:
df = pd.read_csv("dataset/timestamps_dataset.csv", parse_dates=["timestamp"])
df.timestamp

0       2018-11-13 09:39:52.794
1       2018-11-13 09:39:52.813
2       2018-11-13 09:39:52.840
3       2018-11-13 09:39:54.800
4       2018-11-13 09:39:54.867
                  ...          
10416   2018-11-13 10:00:36.872
10417   2018-11-13 10:00:36.956
10418   2018-11-13 10:00:36.989
10419   2018-11-13 10:00:37.037
10420   2018-11-13 10:00:37.054
Name: timestamp, Length: 10421, dtype: datetime64[ns]
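
If the file has already been loaded without parse_dates, the column can be converted afterwards with pd.to_datetime. The sketch below uses a hypothetical two-row CSV held in memory instead of the dataset file, so it runs on its own.

```python
import io
import pandas as pd

# Hypothetical two-row CSV that mimics the layout of the dataset above
csv_text = (
    "timestamp,value\n"
    "2018-11-13 09:39:52.794,1\n"
    "2018-11-13 09:39:52.813,2\n"
)

raw = pd.read_csv(io.StringIO(csv_text))
was_datetime = pd.api.types.is_datetime64_any_dtype(raw["timestamp"])
print(raw["timestamp"].dtype)  # strings, not datetimes

# Convert the column after loading
raw["timestamp"] = pd.to_datetime(raw["timestamp"])
is_datetime = pd.api.types.is_datetime64_any_dtype(raw["timestamp"])
print(raw["timestamp"].dtype)
```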

# Time Delta
What if we want to subtract timestamps? The difference between two timestamps is a Timedelta.

Timedelta exists as a data type in Python itself (datetime.timedelta), not only in Pandas.

In [106]:
x = pd.Timestamp(year=2020, month=6, day=9, hour=8, minute=30, second=20, microsecond=79, nanosecond=99)
y = pd.Timestamp(year=2020, month=6, day=9, hour=8, minute=30, second=20, microsecond=79, nanosecond=89)
result=x-y
result

Timedelta('0 days 00:00:00.000000010')

In [69]:
# Negative difference
y-x

Timedelta('-1 days +23:59:59.999999990')
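
The printed form of a negative Timedelta can look surprising: only the days field carries the sign and the remaining fields are positive. The sketch below rebuilds two timestamps that are 10 nanoseconds apart and inspects the negative difference.

```python
import pandas as pd

x = pd.Timestamp("2020-06-09 08:30:20.000079099")
y = pd.Timestamp("2020-06-09 08:30:20.000079089")

negative = y - x
print(negative)             # days is -1, the remaining fields are positive
print(negative.components)
print(abs(negative))        # magnitude of the difference
```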

In [108]:
# Constructor in Pandas
td1 = pd.Timedelta("1 days 00:42:00.89834") # here built from a string
print(td1)

# Constructor in Python
from datetime import timedelta

td2 = timedelta(days=55, seconds=3621, microseconds=992006)
print(td2)

td1 + td2

1 days 00:42:00.898340
55 days, 1:00:21.992006


Timedelta('56 days 01:42:22.890346')
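
The Pandas constructor also accepts keyword arguments, and the two timedelta types convert into each other. A minimal sketch with illustrative variable names:

```python
import pandas as pd
from datetime import timedelta

# The same duration built from a string and from keyword arguments
td_string = pd.Timedelta("1 days 00:42:00.89834")
td_keywords = pd.Timedelta(days=1, minutes=42, microseconds=898340)
print(td_string == td_keywords)

# Conversions between the Pandas and the standard-library type
from_python = pd.Timedelta(timedelta(days=55, seconds=3621, microseconds=992006))
to_python = td_string.to_pytimedelta()
print(type(from_python), type(to_python))

# Whole duration as a float number of seconds
print(td_string.total_seconds())
```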

In [109]:
from datetime import datetime as dt

# they can be added to / subtracted from a Timestamp directly
ts = pd.Timestamp(dt.now())
ts

Timestamp('2021-01-09 13:09:24.721773')

In [110]:
ts + td2

Timestamp('2021-03-05 14:09:46.713779')

In [83]:
df.timestamp + td2

0       2019-01-07 10:40:14.786006
1       2019-01-07 10:40:14.805006
2       2019-01-07 10:40:14.832006
3       2019-01-07 10:40:16.792006
4       2019-01-07 10:40:16.859006
                   ...            
10416   2019-01-07 11:00:58.864006
10417   2019-01-07 11:00:58.948006
10418   2019-01-07 11:00:58.981006
10419   2019-01-07 11:00:59.029006
10420   2019-01-07 11:00:59.046006
Name: timestamp, Length: 10421, dtype: datetime64[ns]
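
Subtraction works column-wise as well. A common use is the gap between consecutive rows, sketched here on three hypothetical timestamps in the same format as the dataset:

```python
import pandas as pd

# Hypothetical timestamps in the same format as the dataset above
stamps = pd.Series(pd.to_datetime([
    "2018-11-13 09:39:52.794",
    "2018-11-13 09:39:52.813",
    "2018-11-13 09:39:54.800",
]))

# Gap between each row and the previous one; the first row has no predecessor
gaps = stamps.diff()
print(gaps)

# The same gaps expressed in seconds
gap_seconds = gaps.dt.total_seconds()
print(gap_seconds.tolist())
```

The result is a timedelta column whose first entry is NaT, because there is nothing to subtract from the first row.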