# Dates, Times, Classes

## Python datetime objects

In [1]:
import datetime
# Format matching strings such as '2015-08-21 06:59:35'
twitter_datetime_format = '%Y-%m-%d %H:%M:%S'
# Twitter API time string (from the CREATED_AT field)
twitter_example = '2015-08-21 06:59:35'
datetime_inst = datetime.datetime.strptime(twitter_example, twitter_datetime_format)
datetime_inst


datetime.datetime(2015, 8, 21, 6, 59, 35)

`datetime_inst` is a special kind of object, which specifies a particular moment in time, independently of the format it was written in.

In [2]:
type(datetime_inst)

datetime.datetime

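A `datetime` instance has six attributes which together define an instant in time: year, month, day, hour, minute, and second. Only the first three must be supplied when constructing one directly; the time fields default to 0.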

In [4]:
#datetime.datetime(2015, 8, 21, 6, 59, 35)
print(datetime_inst.year)
print(datetime_inst.month)
print(datetime_inst.day)
print(datetime_inst.hour)
print(datetime_inst.minute)
print(datetime_inst.second)

2015
8
21
6
59
35


You can think of a `datetime` instance as a sort of `tuple` whose 6 elements (year, month, day, hour, minute, second) are accessed by attribute names rather than integer indexes.
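
The tuple analogy can be made concrete with a minimal sketch. The variable names `midnight`, `moment`, and `fields` are illustrative and not part of the original notebook. It shows that the time fields default to 0 when only a date is given, and that `timetuple()` exposes the same six fields by position.

```python
import datetime

# Only year, month and day are required; the time fields default to 0.
midnight = datetime.datetime(2015, 8, 21)
print(midnight)                                          # 2015-08-21 00:00:00
print(midnight.hour, midnight.minute, midnight.second)   # 0 0 0

# timetuple() gives positional access; the first six entries are
# year, month, day, hour, minute, second.
moment = datetime.datetime(2015, 8, 21, 6, 59, 35)
fields = tuple(moment.timetuple())[:6]
print(fields)                                            # (2015, 8, 21, 6, 59, 35)
```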

In addition, other calendar properties can be computed, for example the day of the week as an `int`, with Monday represented as day 0. August 21, 2015 was a Friday:

In [6]:
print(datetime_inst.weekday())

4
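
As a small sketch of related weekday helpers (the variable name `moment` is illustrative), `isoweekday()` uses the ISO convention in which Monday is 1, and the `%A` format code gives the weekday name, which depends on the current locale.

```python
import datetime

moment = datetime.datetime(2015, 8, 21, 6, 59, 35)

print(moment.weekday())      # 4  (Monday is 0)
print(moment.isoweekday())   # 5  (Monday is 1, the ISO 8601 convention)

# %A is the full weekday name in the current locale
# ("Friday" in an English or default C locale).
print(moment.strftime('%A'))
```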


The idea of a `datetime` object is that it is independent of any string format in which an instant of time might be represented. From a `datetime` instance, you can generate a string in any format you like, or one representing any portion of the information. First, portions:

In [7]:
# the time  of day
print(datetime_inst.time())
# the date
print(datetime_inst.date())

06:59:35
2015-08-21


Now formats:

In [9]:
print(datetime_inst.strftime(twitter_datetime_format))
european_datetime_format = '%d/%m/%Y %H:%M'
print(datetime_inst.strftime(european_datetime_format))
american_date_format = '%b %d, %Y'
print(datetime_inst.strftime(american_date_format))

2015-08-21 06:59:35
21/08/2015 06:59
Aug 21, 2015
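
Because `strptime` and `strftime` are inverses for a given format, a string can be parsed and regenerated without loss. The sketch below checks that round trip and also shows the built-in ISO 8601 helpers `isoformat()` and `fromisoformat()` (the latter assumes Python 3.7 or later). The names `original`, `parsed`, `round_trip`, and `iso_string` are illustrative.

```python
import datetime

twitter_datetime_format = '%Y-%m-%d %H:%M:%S'
original = '2015-08-21 06:59:35'

# Parse the string, then write it back out with the same format.
parsed = datetime.datetime.strptime(original, twitter_datetime_format)
round_trip = parsed.strftime(twitter_datetime_format)
print(round_trip == original)    # True

# ISO 8601 output needs no format string at all.
iso_string = parsed.isoformat()
print(iso_string)                # 2015-08-21T06:59:35
print(datetime.datetime.fromisoformat(iso_string) == parsed)   # True
```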


You can also print a `datetime` object with a default format.

In [11]:
print(datetime_inst)

2015-08-21 06:59:35


Notice this is different than what you get if you just evaluate an expression in Python and let Python print back the value for you:

In [13]:
datetime_inst

datetime.datetime(2015, 8, 21, 6, 59, 35)

This is due to a standard feature of Python objects. They have more than one string representation associated with them, returned by two different methods, `__repr__` and `__str__`. 

In [16]:
datetime_inst.__repr__()

'datetime.datetime(2015, 8, 21, 6, 59, 35)'

Note the quotes. Both `__repr__` and `__str__` return strings, but the two strings serve different functions.

In [15]:
datetime_inst.__str__()

'2015-08-21 06:59:35'

The `__repr__` method returns a string containing code that you can execute to create another `datetime` instance just like this one. The `__str__` method returns a "pretty" string designed to be easy to read when printed to a screen. Often there is no need for the two strings to differ, but the difference sometimes comes in handy. One place where it matters is the interactive Python interpreter, which echoes the `__repr__` string of the value an expression returns. Another is the `print` function, which prints the `__str__` string of the object it is given.
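
The same two methods can be defined on any class. The sketch below uses a hypothetical `Temperature` class, invented purely for illustration, and then checks that evaluating the `__repr__` string of a `datetime` really does rebuild an equal instance.

```python
import datetime

class Temperature:
    """A hypothetical class used only to illustrate __repr__ and __str__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        # Code-like string: evaluating it would rebuild the object.
        return f'Temperature({self.degrees!r})'

    def __str__(self):
        # Reader-friendly string, used by print().
        return f'{self.degrees} degrees'

reading = Temperature(21.5)
print(repr(reading))   # Temperature(21.5)
print(reading)         # 21.5 degrees

# The repr of a datetime is executable code, given the datetime module.
moment = datetime.datetime(2015, 8, 21, 6, 59, 35)
rebuilt = eval(repr(moment), {'datetime': datetime})
print(rebuilt == moment)   # True
```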

Another thing you can do with `datetime`s is arithmetic.  You just have to remember that time
intervals are a different type from time instants (`datetime`s).  Time intervals can be added to each other
and to `datetime`s, but `datetime`s cannot be added to each other.

In [34]:
# The time interval class
from datetime import timedelta

In [38]:
one_hour = timedelta(hours=1)

In [39]:
print(datetime_inst)
print(datetime_inst + one_hour)

2015-08-21 06:59:35
2015-08-21 07:59:35


In [43]:
print(datetime_inst + 2 * one_hour)

2015-08-21 08:59:35


In [42]:
# Adding two datetimes is not defined; uncommenting the last line raises
# TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'datetime.datetime'
# datetime_inst + datetime_inst
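
Although two `datetime`s cannot be added, they can be subtracted, and the result is a `timedelta`. The following sketch uses two illustrative instants (`start` and `end` are not from the original notebook) to show both behaviours.

```python
import datetime

start = datetime.datetime(2015, 8, 21, 6, 59, 35)
end = datetime.datetime(2015, 8, 22, 8, 0, 0)

# Subtracting two instants gives the interval between them.
elapsed = end - start
print(type(elapsed).__name__)    # timedelta
print(elapsed)                   # 1 day, 1:00:25
print(elapsed.total_seconds())   # 90025.0

# Adding two instants has no meaning and raises TypeError.
try:
    start + end
    addition_failed = False
except TypeError as err:
    addition_failed = True
    print('TypeError:', err)
```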

## Pandas Timestamp objects

In [17]:
import pandas as pd
import numpy as np
import random

In [18]:
# Let's cook up some data sampled hourly over a 72-hour period
num_periods, freq = 72, 'H'
rng = pd.date_range('1/1/2011', periods=num_periods, freq=freq)
# We'll think of it as widgets sold in each hour, and we'll cook up numbers between 0 and 9999.
# So, 72 different sales figures, one for each hour, ranging from 0 to 9999 inclusive.
S = random.sample(range(10000), num_periods)

The time range `rng` is just a sequence of time stamps, actually a fairly complex kind of `pandas` internal object.

In [19]:
rng

DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
               '2011-01-01 02:00:00', '2011-01-01 03:00:00',
               '2011-01-01 04:00:00', '2011-01-01 05:00:00',
               '2011-01-01 06:00:00', '2011-01-01 07:00:00',
               '2011-01-01 08:00:00', '2011-01-01 09:00:00',
               '2011-01-01 10:00:00', '2011-01-01 11:00:00',
               '2011-01-01 12:00:00', '2011-01-01 13:00:00',
               '2011-01-01 14:00:00', '2011-01-01 15:00:00',
               '2011-01-01 16:00:00', '2011-01-01 17:00:00',
               '2011-01-01 18:00:00', '2011-01-01 19:00:00',
               '2011-01-01 20:00:00', '2011-01-01 21:00:00',
               '2011-01-01 22:00:00', '2011-01-01 23:00:00',
               '2011-01-02 00:00:00', '2011-01-02 01:00:00',
               '2011-01-02 02:00:00', '2011-01-02 03:00:00',
               '2011-01-02 04:00:00', '2011-01-02 05:00:00',
               '2011-01-02 06:00:00', '2011-01-02 07:00:00',
               '2011-01-

Our sample `S` is just a sequence of 72 different integers.

In [20]:
len(S)

72

In [21]:
S[:10]

[3918, 8721, 7259, 3330, 9316, 8553, 5633, 5336, 6722, 2460]

Now put the data together into a time-stamped sales column, associating each sales figure with a particular time stamp.

In [23]:
ts = pd.Series(S, index=rng)
ts
# Not necessary in this case, but if you're reading in raw time series data with independent
# time stamps (say, Tweets), it's often good practice to ensure it's in earliest->latest order
# using the `sort_index` method, which returns a new sorted Series.
# ts = ts.sort_index()

2011-01-01 00:00:00    3918
2011-01-01 01:00:00    8721
2011-01-01 02:00:00    7259
2011-01-01 03:00:00    3330
2011-01-01 04:00:00    9316
                       ... 
2011-01-03 19:00:00    3981
2011-01-03 20:00:00    7243
2011-01-03 21:00:00    3328
2011-01-03 22:00:00    9692
2011-01-03 23:00:00     592
Freq: H, Length: 72, dtype: int64
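
The `sort_index` step mentioned in the cell's comments can be sketched on a tiny, deliberately shuffled series. The values and the names `stamps`, `unsorted_ts`, and `sorted_ts` are made up for illustration.

```python
import pandas as pd

# Three time stamps given out of chronological order.
stamps = pd.to_datetime(['2011-01-01 02:00', '2011-01-01 00:00', '2011-01-01 01:00'])
unsorted_ts = pd.Series([30, 10, 20], index=stamps)

# sort_index returns a new Series ordered earliest -> latest.
sorted_ts = unsorted_ts.sort_index()
print(sorted_ts.index.is_monotonic_increasing)   # True
print(sorted_ts.tolist())                        # [10, 20, 30]

# The original Series is left unchanged.
print(unsorted_ts.tolist())                      # [30, 10, 20]
```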

One thing you can do with a `pandas` `Timestamp` object is turn it into a Python `datetime` object.  They store very similar kinds of information, but the `pandas` object has some extra capabilities and extra information (nanosecond resolution, for example, which a `datetime` cannot hold). Going from the `pandas` object to a `datetime` object is always possible:

In [24]:
ts.index[0]

Timestamp('2011-01-01 00:00:00', freq='H')

In [28]:
ts.index[0].to_pydatetime()

datetime.datetime(2011, 1, 1, 0, 0)
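
The conversion also works in the other direction. The sketch below (with illustrative names `stamp`, `as_datetime`, and `back`) converts a `Timestamp` to a `datetime` and back, and shows that a `Timestamp` is itself a subclass of `datetime.datetime`, which is why the two behave so similarly.

```python
import datetime
import pandas as pd

stamp = pd.Timestamp('2011-01-01 00:00:00')

# pandas -> standard library
as_datetime = stamp.to_pydatetime()
print(type(as_datetime) is datetime.datetime)   # True

# standard library -> pandas
back = pd.Timestamp(as_datetime)
print(back == stamp)                            # True

# A Timestamp is a datetime subclass with extra features.
print(isinstance(stamp, datetime.datetime))     # True
```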

You can run the same method on the entire index to produce a `numpy` array of Python `datetime` instances:

In [33]:
date_array = ts.index.to_pydatetime()
date_array[:10]

array([datetime.datetime(2011, 1, 1, 0, 0),
       datetime.datetime(2011, 1, 1, 1, 0),
       datetime.datetime(2011, 1, 1, 2, 0),
       datetime.datetime(2011, 1, 1, 3, 0),
       datetime.datetime(2011, 1, 1, 4, 0),
       datetime.datetime(2011, 1, 1, 5, 0),
       datetime.datetime(2011, 1, 1, 6, 0),
       datetime.datetime(2011, 1, 1, 7, 0),
       datetime.datetime(2011, 1, 1, 8, 0),
       datetime.datetime(2011, 1, 1, 9, 0)], dtype=object)

You can do time arithmetic with the `numpy` array version:

In [44]:
date_array[:10] + one_hour

array([datetime.datetime(2011, 1, 1, 1, 0),
       datetime.datetime(2011, 1, 1, 2, 0),
       datetime.datetime(2011, 1, 1, 3, 0),
       datetime.datetime(2011, 1, 1, 4, 0),
       datetime.datetime(2011, 1, 1, 5, 0),
       datetime.datetime(2011, 1, 1, 6, 0),
       datetime.datetime(2011, 1, 1, 7, 0),
       datetime.datetime(2011, 1, 1, 8, 0),
       datetime.datetime(2011, 1, 1, 9, 0),
       datetime.datetime(2011, 1, 1, 10, 0)], dtype=object)

You can do the same with the original index object:

In [47]:
ts.index[:10] + one_hour

DatetimeIndex(['2011-01-01 01:00:00', '2011-01-01 02:00:00',
               '2011-01-01 03:00:00', '2011-01-01 04:00:00',
               '2011-01-01 05:00:00', '2011-01-01 06:00:00',
               '2011-01-01 07:00:00', '2011-01-01 08:00:00',
               '2011-01-01 09:00:00', '2011-01-01 10:00:00'],
              dtype='datetime64[ns]', freq='H')

To sum up:  `ts` is a Pandas `Series`; the sequence of Pandas `Timestamp`s we've been working with
is its index (hence, `ts.index` prints out as a `DatetimeIndex`).  So we've shown
that time arithmetic is very simple with a `DatetimeIndex`.
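
As a further sketch of index arithmetic, the example below builds a three-entry `DatetimeIndex` by hand (the names `index`, `shifted`, `same`, and `gaps` are illustrative). It shows that the pandas `Timedelta` and the standard library `timedelta` give the same shift, and that subtracting one index from another yields a `TimedeltaIndex`.

```python
import datetime
import pandas as pd

index = pd.to_datetime(['2011-01-01 00:00', '2011-01-01 01:00', '2011-01-01 02:00'])

# Shift every time stamp by one hour with a pandas Timedelta...
shifted = index + pd.Timedelta(hours=1)
print(shifted[0])              # 2011-01-01 01:00:00

# ...or, equivalently, with a standard library timedelta.
same = index + datetime.timedelta(hours=1)
print(shifted.equals(same))    # True

# Subtracting two indexes gives the intervals between them.
gaps = shifted - index
print(type(gaps).__name__)     # TimedeltaIndex
print(gaps[0])                 # 0 days 01:00:00
```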

Time computations work very much the same if the Pandas `Timestamp` sequence is a
column in a `DataFrame` (a Pandas `Series`).  Let's promote our index to be a column:

In [68]:
ts_df = pd.DataFrame(ts, columns=["Price"])

ts_df2 = ts_df.reset_index(names='Dates')
ts_df2[:5]

                Dates  Price
0 2011-01-01 00:00:00   3918
1 2011-01-01 01:00:00   8721
2 2011-01-01 02:00:00   7259
3 2011-01-01 03:00:00   3330
4 2011-01-01 04:00:00   9316


In [66]:
ts_df2["Dates"] + one_hour

0    2011-01-01 01:00:00
1    2011-01-01 02:00:00
2    2011-01-01 03:00:00
3    2011-01-01 04:00:00
4    2011-01-01 05:00:00
             ...        
67   2011-01-03 20:00:00
68   2011-01-03 21:00:00
69   2011-01-03 22:00:00
70   2011-01-03 23:00:00
71   2011-01-04 00:00:00
Name: Dates, Length: 72, dtype: datetime64[ns]
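
For a column of time stamps, the individual fields seen earlier on a single `datetime` (`hour`, `day`, and so on) are available for the whole column through the `.dt` accessor. This is a minimal sketch on a hand-built three-row column; the names `dates` and `shifted` are illustrative and stand in for `ts_df2["Dates"]`.

```python
import pandas as pd

# A small stand-in for a "Dates" column that crosses midnight.
dates = pd.Series(
    pd.to_datetime(['2011-01-01 22:00', '2011-01-01 23:00', '2011-01-02 00:00']),
    name='Dates',
)

shifted = dates + pd.Timedelta(hours=1)

# Field access applies to every row at once.
print(shifted.dt.hour.tolist())   # [23, 0, 1]
print(shifted.dt.day.tolist())    # [1, 2, 2]
```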