# Time Series 

#### In general, time series serve two purposes
- First, they help us to learn about the <strong> underlying process </strong> that generated the data.
- Second, we would like to be able to <strong> forecast future values of the same or related series using existing data </strong>. When
we measure temperature, precipitation or wind, we would like to learn about
more complex things, such as the weather or the climate of a region and how various
factors interact. At the same time, we might be interested in weather forecasting.

#### In this chapter we will explore the time series capabilities of Pandas.
- Apart from its powerful core data structures, the Series and the DataFrame, Pandas comes
with helper functions for dealing with time-related data. With its extensive built-in
optimizations, Pandas is capable of handling large time series with millions of data
points with ease.

## In this section:
#### Python supports date and time handling through the datetime library
#### We can use DataFrame and Series objects
#### Use the to_datetime function

  - df = pd.DataFrame({'year': [2015, 2016],
                  'month': [2, 3],
                    'day': [4, 5]
                  })

    pd.to_datetime(df)

  - ts = pd.Series(np.random.randn(3), index=["2000-01-01", "2000-01-02", "2000-01-03"])

    tsindex = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"])

#### With Pandas we can create a range of dates with the date_range function

# Time series primer

## 1. Working with date and time objects

###### Python supports date and time handling in the datetime and time modules from the standard library:

In [41]:
import datetime 
%matplotlib inline

In [2]:
datetime.datetime(2000,1,1)

datetime.datetime(2000, 1, 1, 0, 0)

- Sometimes, dates are given or expected as strings, so a conversion from or to strings is necessary, which is realized by two functions: strptime and strftime, respectively:

In [3]:
type(datetime.datetime(2000,1,1))

datetime.datetime

In [4]:
datetime.datetime(2000,1,1).strftime("%Y%m%d")

'20000101'

In [5]:
type(datetime.datetime(2000,1,1).strftime("%Y%m%d"))

str

In [6]:
datetime.datetime.strptime("2000/1/1", "%Y/%m/%d")

datetime.datetime(2000, 1, 1, 0, 0)

In [7]:
type(datetime.datetime.strptime("2000/1/1", "%Y/%m/%d"))

datetime.datetime
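The two functions are inverses of each other when the same format string is used on both sides. The following is a minimal sketch using only the standard library; the variable names and the chosen format string are illustrative assumptions, not part of the original cells.

```python
import datetime

# An example moment; the value itself is arbitrary.
stamp = datetime.datetime(2000, 1, 1, 13, 30)

# datetime -> str with strftime ("format")
text = stamp.strftime("%Y-%m-%d %H:%M")

# str -> datetime with strptime ("parse"), using the same format string
parsed = datetime.datetime.strptime(text, "%Y-%m-%d %H:%M")

print(text)             # 2000-01-01 13:30
print(parsed == stamp)  # True
```

The round trip only recovers the original value when the format string keeps every field that matters; dropping `%H:%M` here would lose the time of day.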

- Real-world data usually comes in all kinds of shapes, and it would be great if we did
not need to remember the exact date format specifiers for parsing. Thankfully, Pandas
abstracts away a lot of the friction when dealing with strings representing dates or
times. One of these helper functions is to_datetime:

In [8]:
import numpy as np
import pandas as pd

In [9]:
df = pd.DataFrame({'year': [2015, 2016],
                  'month': [2, 3],
                    'day': [4, 5]
                  })
pd.to_datetime(df)

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

In [10]:
pd.to_datetime("04:08:06 7-8-2000")

Timestamp('2000-07-08 04:08:06')

In [11]:
pd.to_datetime("15:05:06 4th of July of 2012")

Timestamp('2012-07-04 15:05:06')

In [12]:
pd.to_datetime("13.01.2000")

Timestamp('2000-01-13 00:00:00')

In [13]:
pd.to_datetime("7/8/2000")

Timestamp('2000-07-08 00:00:00')

- The last string can refer to August 7th or July 8th, depending on the region. To <strong> disambiguate </strong> this case, to_datetime can be passed the keyword argument <strong> dayfirst </strong>:

In [14]:
pd.to_datetime("7/8/2000", dayfirst=True) 

Timestamp('2000-08-07 00:00:00')
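When the layout of the input is known in advance, an explicit `format` string is another way to remove the ambiguity. The sketch below compares the three readings of the same string; the variable names are illustrative assumptions.

```python
import pandas as pd

ambiguous = "7/8/2000"

# Default parsing reads the string month-first.
month_first = pd.to_datetime(ambiguous)

# dayfirst=True flips the interpretation.
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# An explicit format leaves nothing to guess.
explicit = pd.to_datetime(ambiguous, format="%d/%m/%Y")

print(month_first, day_first, explicit)
```

An explicit format is stricter than `dayfirst`: a string that does not match the format raises an error instead of being parsed some other way.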

- Timestamp objects can be seen as Pandas' version of datetime objects and, indeed, the Timestamp class is a subclass of datetime:


In [15]:
issubclass(pd.Timestamp, datetime.datetime)

True

- Which means they can be used interchangeably in many cases:

In [16]:
ts = pd.to_datetime(715784800000854700)
ts

Timestamp('1992-09-06 13:06:40.000854700')

In [17]:
ts.year, ts.month, ts.day, ts.dayofweek, ts.dayofyear,ts.days_in_month,ts.daysinmonth, ts.weekday(), ts.today()

(1992, 9, 6, 6, 250, 30, 30, 6, Timestamp('2017-12-25 21:53:47.252985'))
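A short sketch of what "interchangeably" can look like in practice: a Timestamp compares equal to the matching datetime, can be subtracted from one, and can be converted back to a plain datetime. It also shows that a bare integer is read as nanoseconds since the Unix epoch by default, while `unit="s"` switches to seconds. The variable names are assumptions made for illustration.

```python
import datetime
import pandas as pd

stamp = pd.Timestamp("2000-01-01")
plain = datetime.datetime(2000, 1, 1)

is_datetime = isinstance(stamp, datetime.datetime)  # Timestamp is a datetime
same_moment = stamp == plain                        # comparison across types
gap = stamp - datetime.datetime(1999, 12, 31)       # result is a pandas Timedelta
back = stamp.to_pydatetime()                        # plain datetime again

# The integer used above, truncated to whole seconds and read with unit="s".
from_seconds = pd.to_datetime(715784800, unit="s")

print(is_datetime, same_moment, gap, type(back).__name__, from_seconds)
```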

- Timestamp objects are an important part of the time series capabilities of Pandas, since timestamps are the building blocks of DatetimeIndex objects:

In [18]:
index = [pd.Timestamp("2000-01-01"),
pd.Timestamp("2000-01-02"),
pd.Timestamp("2000-01-03")]

In [19]:
index

[Timestamp('2000-01-01 00:00:00'),
 Timestamp('2000-01-02 00:00:00'),
 Timestamp('2000-01-03 00:00:00')]

In [20]:
ts = pd.Series(np.random.randn(len(index)), index=index)
ts

2000-01-01    0.089240
2000-01-02   -1.472215
2000-01-03    0.109225
dtype: float64

In [21]:
ts.index

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03'], dtype='datetime64[ns]', freq=None)

- There are a few things to note here: we create a list of <strong> Timestamp objects </strong> and pass
it to the <strong> Series constructor </strong> as the <strong>index</strong>. <strong>This list of timestamps gets converted into a
DatetimeIndex on the fly. If we had passed only the date strings, we would not
get a DatetimeIndex, just an Index:</strong>

In [22]:
ts = pd.Series(np.random.randn(len(index)), index=["2000-01-01", "2000-01-02", "2000-01-03"])
ts

2000-01-01   -0.569081
2000-01-02   -1.144054
2000-01-03    0.617624
dtype: float64

In [23]:
ts.index

Index(['2000-01-01', '2000-01-02', '2000-01-03'], dtype='object')

- However, the to_datetime function is flexible enough to be of help if all we have
is a list of date strings:

In [24]:
index = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"])

In [25]:
index

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03'], dtype='datetime64[ns]', freq=None)

- Another thing to note is that while we have a DatetimeIndex, the freq and tz
attributes are both None. We will learn about the utility of both attributes later
in this chapter.
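The difference between the two kinds of index can be checked side by side. This is a minimal sketch; the names `dates`, `plain` and `converted` are assumptions introduced for the example.

```python
import numpy as np
import pandas as pd

dates = ["2000-01-01", "2000-01-02", "2000-01-03"]

# Passing the raw strings keeps them as ordinary labels.
plain = pd.Series(np.arange(3), index=dates)

# Converting first yields a DatetimeIndex.
converted = pd.Series(np.arange(3), index=pd.to_datetime(dates))

print(type(plain.index).__name__)
print(type(converted.index).__name__)
print(converted.index.freq, converted.index.tz)
```

Both `freq` and `tz` are `None` for the converted index because `to_datetime` was given neither a frequency nor a time zone.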

- With to_datetime we are able to convert a variety of strings, and even lists of strings,
into Timestamp or DatetimeIndex objects. Sometimes we are not explicitly given all
the information about a series and we have to generate sequences of timestamps at
fixed intervals ourselves.

### Pandas offers another great utility function for this task: date_range

pd.date_range(start="2000-01-01", periods=3, freq='S')

#### S Secondly frequency

In [27]:
pd.date_range(start="2000-01-01", periods=4, freq='H')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:00:00',
               '2000-01-01 02:00:00', '2000-01-01 03:00:00'],
              dtype='datetime64[ns]', freq='H')

#### H Hourly frequency

In [28]:
pd.date_range(start="2000-01-01", periods=3, freq='M')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31'], dtype='datetime64[ns]', freq='M')

#### M Month end frequency

In [29]:
pd.date_range(start="2000-01-01", periods=3, freq='T')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 00:01:00',
               '2000-01-01 00:02:00'],
              dtype='datetime64[ns]', freq='T')

#### T Minutely frequency

- <strong> periods  </strong>: integer or None, default None
    If None, must specify start and end.
    
- <strong> freq  </strong>: string or DateOffset, default 'D' (calendar daily)
    Frequency strings can have multiples, e.g. '5H'
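A brief sketch of the two parameters described above: a range bounded by `start` and `end` with no `periods`, and a range that uses a multiple of a base frequency. The lowercase spelling `"5h"` is used on the assumption that it is accepted by both older and newer pandas releases, as the `'1D1h1min10s'` example in this notebook suggests.

```python
import pandas as pd

# periods is omitted, so both start and end must be given; freq defaults to daily.
by_end = pd.date_range(start="2000-01-01", end="2000-01-03")

# Three timestamps, five hours apart.
by_periods = pd.date_range(start="2000-01-01", periods=3, freq="5h")

print(by_end)
print(by_periods)
```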

The <strong> freq </strong> parameter allows us to specify a multitude of options. Pandas has been
used successfully in finance and economics, not least because it is really simple to
work with business dates as well. As an example, to get an index with the first three
business days of the millennium, the B offset alias can be used:

In [30]:
pd.date_range(start="2000-01-01", periods=3, freq='B')

DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05'], dtype='datetime64[ns]', freq='B')

#### B Business day frequency

In [31]:
pd.date_range(start="2000-01-01", periods=5, freq='1D1h1min10s')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-02 01:01:10',
               '2000-01-03 02:02:20', '2000-01-04 03:03:30',
               '2000-01-05 04:04:40'],
              dtype='datetime64[ns]', freq='90070S')

- Moreover, the offset aliases can be used in combination as well. The call above
generates a datetime index with five elements, each one day, one hour, one minute
and ten seconds apart.
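The same spacing can also be written as a `pd.Timedelta`, which makes the size of the step easy to inspect. This is a sketch under the assumption that `date_range` accepts a Timedelta for `freq`; the names `step` and `idx` are illustrative.

```python
import pandas as pd

# One day, one hour, one minute and ten seconds, expressed as a Timedelta.
step = pd.Timedelta(days=1, hours=1, minutes=1, seconds=10)

idx = pd.date_range(start="2000-01-01", periods=5, freq=step)

print(step.total_seconds())  # 90070.0
print(idx[1])
print(idx[-1])
```

The total of 90070 seconds corresponds to the `freq` that Pandas reports in the output of the combined alias.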

In [32]:
pd.date_range(start="2000-01-01", periods=5, freq='12BH')

DatetimeIndex(['2000-01-03 09:00:00', '2000-01-04 13:00:00',
               '2000-01-06 09:00:00', '2000-01-07 13:00:00',
               '2000-01-11 09:00:00'],
              dtype='datetime64[ns]', freq='12BH')

- If we want to index data every 12 hours of our business time, which by default starts
at 9 AM and ends at 5 PM, we simply prefix the BH alias with 12, as in the call above.
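The business-hour arithmetic can be followed step by step with an offset object instead of a string alias. The sketch below uses `pd.offsets.BusinessHour`, chosen here as an assumption to avoid depending on how a particular pandas version spells the alias.

```python
import pandas as pd

# Twelve business hours per step, with the default 09:00-17:00 business day.
twelve_business_hours = pd.offsets.BusinessHour(12)

idx = pd.date_range(start="2000-01-01", periods=5, freq=twelve_business_hours)

# 2000-01-01 is a Saturday, so the range starts on Monday at 09:00.
# Twelve business hours later is 8 hours on Monday plus 4 hours on Tuesday.
print(idx[0])
print(idx[1])
```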

In [33]:
ts.index

Index(['2000-01-01', '2000-01-02', '2000-01-03'], dtype='object')

In [34]:
pd.date_range(start="2000-01-01", periods=5, freq='W-FRI')

DatetimeIndex(['2000-01-07', '2000-01-14', '2000-01-21', '2000-01-28',
               '2000-02-04'],
              dtype='datetime64[ns]', freq='W-FRI')

- Some frequencies accept an anchoring suffix, which lets us express intervals such as every <strong> Friday </strong> or the <strong> second Tuesday </strong> of every month:

In [35]:
pd.date_range(start="12-12-2017",periods=6,freq="WOM-2TUE")

DatetimeIndex(['2017-12-12', '2018-01-09', '2018-02-13', '2018-03-13',
               '2018-04-10', '2018-05-08'],
              dtype='datetime64[ns]', freq='WOM-2TUE')
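A quick way to confirm what an anchored alias produces is to inspect the weekday and the day of the month of each generated date. This is a small sketch; the variable names are assumptions made for the example.

```python
import pandas as pd

fridays = pd.date_range(start="2000-01-01", periods=5, freq="W-FRI")
second_tuesdays = pd.date_range(start="2017-12-12", periods=6, freq="WOM-2TUE")

# dayofweek counts from Monday=0, so Tuesday is 1 and Friday is 4.
all_fridays = bool((fridays.dayofweek == 4).all())
all_tuesdays = bool((second_tuesdays.dayofweek == 1).all())

# The second Tuesday of a month always falls on day 8 through 14.
in_second_week = bool(
    ((second_tuesdays.day >= 8) & (second_tuesdays.day <= 14)).all()
)

print(all_fridays, all_tuesdays, in_second_week)
```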

Reference: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases

In [42]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://github.com/TrinhDinhPhuc/Miscellaneous-Python-Project/blob/master/Time_Series/series.png")


<br><br><center><h1 style="font-size:2em;color:#2467C0">Offset Aliases</h1></center>
<br>
<table>
<col width="550">
<col width="450">
<tr>
<td><img src="https://github.com/TrinhDinhPhuc/Miscellaneous-Python-Project/blob/master/Time_Series/series.png" align="middle" style="width:550px;height:360px;"/></td>
<td>