In [2]:
import numpy as np
import pandas as pd

### Working With Time Series


<img src = images/time.png height=500 width = 600>
In addition to the above:

For time deltas or durations, Pandas provides the `Timedelta` type. `Timedelta` is a more efficient replacement for Python's native `datetime.timedelta` type, and is based on `numpy.timedelta64`. The associated index class is `TimedeltaIndex`, which can be created with the `to_timedelta` function.

In [3]:
pd.to_timedelta(np.arange(12), 'D')

TimedeltaIndex([ '0 days',  '1 days',  '2 days',  '3 days',  '4 days',
                 '5 days',  '6 days',  '7 days',  '8 days',  '9 days',
                '10 days', '11 days'],
               dtype='timedelta64[ns]', freq=None)
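As a minimal sketch of the scalar type behind this index, a single `Timedelta` can be built from keyword arguments and compared directly with a native `datetime.timedelta`. The variable names (`delta`, `native`, `durations`) are illustrative and not part of the original notebook.

```python
import datetime
import pandas as pd

# Build one duration from keyword arguments.
delta = pd.Timedelta(days=1, hours=6)
print(delta)                   # 1 days 06:00:00
print(delta.total_seconds())   # 108000.0

# pd.Timedelta subclasses datetime.timedelta, so the two compare equal.
native = datetime.timedelta(days=1, hours=6)
print(delta == native)         # True

# to_timedelta also parses duration strings into a TimedeltaIndex.
durations = pd.to_timedelta(['1 days', '12 hours', '90 min'])
print(durations.max())         # 1 days 00:00:00
```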

In [4]:
pd.to_datetime(['4th July, 2017','04-05-2013', '01-13-2012', '13-01-2012'])

DatetimeIndex(['2017-07-04', '2013-04-05', '2012-01-13', '2012-01-13'], dtype='datetime64[ns]', freq=None)

Notice that in the above example both '01-13-2012' and '13-01-2012' have been converted to '2012-01-13' (YYYY-MM-DD format).

In [5]:
pd.to_datetime(['4th July, 2017','04-05-2013', '01-13-2012', '13-01-2012'], dayfirst = True)

DatetimeIndex(['2017-07-04', '2013-05-04', '2012-01-13', '2012-01-13'], dtype='datetime64[ns]', freq=None)

By default, `dayfirst` is set to `False`, which means '04-05-2013' is read as MM-DD-YYYY and converted to '2013-04-05'. However, '13-01-2012' is read as DD-MM-YYYY, because 13 cannot be a month.

Setting `dayfirst` to `True` tells Pandas to interpret the first part as the day, but this is not strictly enforced. For example, when parsing '01-13-2012' with `dayfirst` set to `True`, Pandas finds that 13 cannot be a month, so 13 is interpreted as the day instead.
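Because `dayfirst` is only a hint, one way to remove the ambiguity is to pass an explicit `format`. The sketch below contrasts the two approaches on single strings; the variable names are illustrative, and it assumes that a string which does not match the given `format` raises a `ValueError`.

```python
import pandas as pd

ambiguous = '04-05-2013'

# Default: the first part is taken as the month (MM-DD-YYYY).
month_first = pd.to_datetime(ambiguous)
print(month_first)   # 2013-04-05 00:00:00

# dayfirst=True: the first part is taken as the day (DD-MM-YYYY).
day_first = pd.to_datetime(ambiguous, dayfirst=True)
print(day_first)     # 2013-05-04 00:00:00

# An explicit format leaves nothing to guess.
explicit = pd.to_datetime('13-01-2012', format='%d-%m-%Y')
print(explicit)      # 2012-01-13 00:00:00

# Unlike dayfirst, format is strict: month 13 is rejected, not reinterpreted.
rejected = False
try:
    pd.to_datetime('01-13-2012', format='%d-%m-%Y')
except ValueError:
    rejected = True
print(rejected)      # True
```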

In [6]:
date = pd.to_datetime('4th July, 2017')
date

Timestamp('2017-07-04 00:00:00')

In [7]:
date.strftime('%A')

'Tuesday'

In [8]:
print(date.dayofweek)
print(date.dayofyear)

1
185



### Providing `format` parameter value

In Pandas, the `format` parameter of the `to_datetime` function accepts the same format codes as Python's [datetime module](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).

Some standard values are listed below:

| Format  | Meaning                | Example          |
|---------|------------------------|------------------|
| ``%a``  | Abbreviated day name   | Sun, Mon         |
| ``%A``  | Full day name          | Sunday, Monday   |
| ``%w``  | Weekday as a number, where 0 is Sunday and 6 is Saturday | 0, 1, 6 |
| ``%d``  | Day of the month       | 01, 02, 31       |
| ``%b``  | Abbreviated month name | Jan, Feb         |
| ``%B``  | Full month name        | January          |
| ``%m``  | Month as a number      | 01 for January   |
| ``%y``  | Two-digit year         | 88, 94           |
| ``%Y``  | Four-digit year        | 2013, 2014       |
| ``%H``  | Hour (24-hour clock)   | 00 to 23         |
| ``%I``  | Hour (12-hour clock)   | 01 to 12         |
| ``%p``  | AM or PM               | AM, PM           |
| ``%M``  | Minutes                | 00 to 59         |
| ``%S``  | Seconds                | 00 to 59         |
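
The format codes in the table can be combined into a full format string. The following sketch parses a day-first string with a 12-hour clock and then formats the result back into text; the sample string and the name `stamp` are illustrative, and the day and month names assume the default English locale.

```python
import pandas as pd

# Parse a day-first string with a 12-hour clock by spelling out the layout.
stamp = pd.to_datetime('04/07/2017 02:30 PM', format='%d/%m/%Y %I:%M %p')
print(stamp)                              # 2017-07-04 14:30:00

# The same codes work in the other direction with strftime.
print(stamp.strftime('%A, %d %B %Y'))     # Tuesday, 04 July 2017
print(stamp.strftime('%y-%m-%d %H:%M'))   # 17-07-04 14:30
```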




In [9]:
date + pd.to_timedelta(np.arange(12), 'D')

DatetimeIndex(['2017-07-04', '2017-07-05', '2017-07-06', '2017-07-07',
               '2017-07-08', '2017-07-09', '2017-07-10', '2017-07-11',
               '2017-07-12', '2017-07-13', '2017-07-14', '2017-07-15'],
              dtype='datetime64[ns]', freq=None)

In [10]:
index = date + pd.to_timedelta(np.arange(12), 'H')
index

DatetimeIndex(['2017-07-04 00:00:00', '2017-07-04 01:00:00',
               '2017-07-04 02:00:00', '2017-07-04 03:00:00',
               '2017-07-04 04:00:00', '2017-07-04 05:00:00',
               '2017-07-04 06:00:00', '2017-07-04 07:00:00',
               '2017-07-04 08:00:00', '2017-07-04 09:00:00',
               '2017-07-04 10:00:00', '2017-07-04 11:00:00'],
              dtype='datetime64[ns]', freq=None)

In [11]:
data = pd.Series(range(12), index=index)
data

2017-07-04 00:00:00     0
2017-07-04 01:00:00     1
2017-07-04 02:00:00     2
2017-07-04 03:00:00     3
2017-07-04 04:00:00     4
2017-07-04 05:00:00     5
2017-07-04 06:00:00     6
2017-07-04 07:00:00     7
2017-07-04 08:00:00     8
2017-07-04 09:00:00     9
2017-07-04 10:00:00    10
2017-07-04 11:00:00    11
dtype: int64

In [12]:
data['2017-07-04 00:00:00':'2017-07-04 07:00:00']

2017-07-04 00:00:00    0
2017-07-04 01:00:00    1
2017-07-04 02:00:00    2
2017-07-04 03:00:00    3
2017-07-04 04:00:00    4
2017-07-04 05:00:00    5
2017-07-04 06:00:00    6
2017-07-04 07:00:00    7
dtype: int64

In [13]:
data['2017']

2017-07-04 00:00:00     0
2017-07-04 01:00:00     1
2017-07-04 02:00:00     2
2017-07-04 03:00:00     3
2017-07-04 04:00:00     4
2017-07-04 05:00:00     5
2017-07-04 06:00:00     6
2017-07-04 07:00:00     7
2017-07-04 08:00:00     8
2017-07-04 09:00:00     9
2017-07-04 10:00:00    10
2017-07-04 11:00:00    11
dtype: int64

In [14]:
from datetime import datetime
dates = pd.to_datetime([datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708'])
dates

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
               '2015-07-08'],
              dtype='datetime64[ns]', freq=None)

In [15]:
pd.date_range('2015-07-03', '2015-07-10')

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [16]:
pd.date_range('2015-07-03', periods = 5)

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07'],
              dtype='datetime64[ns]', freq='D')

In [17]:
pd.date_range('2015-07-03', periods = 5, freq = 'H')

DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 01:00:00',
               '2015-07-03 02:00:00', '2015-07-03 03:00:00',
               '2015-07-03 04:00:00'],
              dtype='datetime64[ns]', freq='H')

In [18]:
pd.date_range('2015-07-03', periods = 5, freq = '2H30T')  # 2 hour 30 minute period

DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 02:30:00',
               '2015-07-03 05:00:00', '2015-07-03 07:30:00',
               '2015-07-03 10:00:00'],
              dtype='datetime64[ns]', freq='150T')

#### Extracting date part from datetime

In [19]:
import numpy as np
a = np.random.randint(1,11, 5)
b = pd.date_range('2015-07-03', periods = 5, freq = '2H30T') 
data = {'a':a, 'date':b}
df = pd.DataFrame(data)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
a       5 non-null int32
date    5 non-null datetime64[ns]
dtypes: datetime64[ns](1), int32(1)
memory usage: 100.0 bytes


In [20]:
df.head()

|   | a | date                |
|---|---|---------------------|
| 0 | 7 | 2015-07-03 00:00:00 |
| 1 | 7 | 2015-07-03 02:30:00 |
| 2 | 3 | 2015-07-03 05:00:00 |
| 3 | 9 | 2015-07-03 07:30:00 |
| 4 | 9 | 2015-07-03 10:00:00 |


In [21]:
df['new_date'] = pd.to_datetime(df['date']).dt.date  # use dt.time to extract the time part
df


|   | a | date                | new_date   |
|---|---|---------------------|------------|
| 0 | 7 | 2015-07-03 00:00:00 | 2015-07-03 |
| 1 | 7 | 2015-07-03 02:30:00 | 2015-07-03 |
| 2 | 3 | 2015-07-03 05:00:00 | 2015-07-03 |
| 3 | 9 | 2015-07-03 07:30:00 | 2015-07-03 |
| 4 | 9 | 2015-07-03 10:00:00 | 2015-07-03 |


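Note, however, that the `dtype` of column `new_date` has changed: `.dt.date` returns Python `datetime.date` objects, so the column is stored with `object` dtype rather than `datetime64[ns]`.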
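The dtype of the extracted column can be checked directly. The sketch below rebuilds a small frame like the one above (without the random column) and compares `.dt.date` with `.dt.normalize()`, an alternative that is not used in the original notebook: it resets the time to midnight while keeping a datetime dtype. The column name `day` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2015-07-03', periods=5, freq='150min')})

# .dt.date yields Python datetime.date objects, stored as object dtype.
df['new_date'] = df['date'].dt.date

# .dt.normalize() drops the time of day but keeps a datetime dtype,
# so the .dt accessor and date arithmetic remain available.
df['day'] = df['date'].dt.normalize()

print(df.dtypes)
print(type(df['new_date'].iloc[0]))
```

If later steps rely on datetime operations such as resampling or the `.dt` accessor, the normalized column is usually the more convenient of the two.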