# Dates, Times, Classes

## Python datetime objects

In [26]:
import datetime
# Example: u'2015-08-21 06:59:35'
twitter_datetime_format = '%Y-%m-%d %H:%M:%S'
# Twitter API time string (from the CREATED_AT field)
twitter_example = u'2015-08-21 06:59:35'
datetime_inst = datetime.datetime.strptime(twitter_example, twitter_datetime_format)
datetime_inst


datetime.datetime(2015, 8, 21, 6, 59, 35)

`datetime_inst` is a special kind of object that specifies a particular moment in time, independently of the format in which it was written.

In [27]:
type(datetime_inst)

datetime.datetime

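A `datetime` instance carries six basic attributes (year, month, day, hour, minute, second), which together define an instant in time. Only the first three must be supplied to the constructor; the time fields default to zero.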

In [53]:
# datetime.datetime(2015, 8, 21, 6, 59, 35)
print(datetime_inst.year)
print(datetime_inst.month)
print(datetime_inst.day)
print(datetime_inst.hour)
print(datetime_inst.minute)
print(datetime_inst.second)

2015
8
21
6
59
35


You can think of a `datetime` instance as a sort of `tuple` whose 6 elements (year, month, day, hour, minute, second) are accessed by attribute name rather than by integer index.
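The tuple analogy can be made concrete. The following is a minimal sketch using only the standard library; the names `parsed`, `fields`, `rebuilt`, and `midnight` are introduced here for illustration. It relies on `timetuple()`, which returns a `time.struct_time` whose first six entries are the same six fields.

```python
import datetime

# Parse the same example string used above.
parsed = datetime.datetime.strptime('2015-08-21 06:59:35', '%Y-%m-%d %H:%M:%S')

# timetuple() returns a time.struct_time; its first six entries are
# year, month, day, hour, minute, second.
fields = tuple(parsed.timetuple())[:6]
print(fields)

# Building a datetime directly from those six fields gives an equal instance.
rebuilt = datetime.datetime(*fields)
print(rebuilt == parsed)

# Only year, month, and day are required; the time fields default to 0.
midnight = datetime.datetime(2015, 8, 21)
print(midnight.hour, midnight.minute, midnight.second)
```

Unlike a real `tuple`, the instance itself does not support integer indexing, which is why the sketch goes through `timetuple()` first.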

In addition, other calendar properties can be computed, for example the day of the week as an `int`, with Monday represented as day 0. August 21, 2015 was a Friday:

In [55]:
print(datetime_inst.weekday())

4
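
A few related calendar computations, as a minimal sketch using only the standard library. The variable `friday` and the lookup list `day_names` are hypothetical names introduced for this example; a hard-coded list is used so that the result does not depend on the machine's locale.

```python
import datetime

friday = datetime.datetime(2015, 8, 21, 6, 59, 35)

# weekday() counts from Monday = 0; isoweekday() counts from Monday = 1.
print(friday.weekday())
print(friday.isoweekday())

# Turn the weekday number into a name with a plain lookup list.
day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
print(day_names[friday.weekday()])

# isocalendar() gives the ISO year, ISO week number, and ISO weekday.
iso_year, iso_week, iso_day = friday.isocalendar()
print(iso_year, iso_week, iso_day)
```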


The idea of a `datetime` object is that it is independent of any string format in which an instant of time might be represented. From a `datetime` instance, you can generate a string in any format you like, or a string representing any portion of the information. First, portions:

In [57]:
# the time of day
print(datetime_inst.time())
# the date
print(datetime_inst.date())

06:59:35
2015-08-21


Now formats:

In [58]:
print(datetime_inst.strftime(twitter_datetime_format))
european_datetime_format = '%d/%m/%Y %H:%M'
print(datetime_inst.strftime(european_datetime_format))
american_date_format = '%b %d, %Y'
print(datetime_inst.strftime(american_date_format))

2015-08-21 06:59:35
21/08/2015 06:59
Aug 21, 2015
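
Because a format may show only part of the information, converting to a string and back is not always lossless. The sketch below illustrates this with the standard library; `original`, `same`, and `truncated` are names made up for the example.

```python
import datetime

twitter_format = '%Y-%m-%d %H:%M:%S'
european_format = '%d/%m/%Y %H:%M'

original = datetime.datetime.strptime('2015-08-21 06:59:35', twitter_format)

# A format that includes every field survives the round trip unchanged.
same = datetime.datetime.strptime(original.strftime(twitter_format), twitter_format)
print(same == original)

# The European format above omits seconds, so they come back as zero.
truncated = datetime.datetime.strptime(original.strftime(european_format), european_format)
print(truncated)

# Subtracting two datetimes gives a timedelta: here, the seconds that were lost.
print(original - truncated)
```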


You can also print a `datetime` object in its default format.

In [42]:
print(datetime_inst)

2015-08-21 06:59:35


Notice that this differs from what you get if you just evaluate an expression in Python and let Python print back the value for you:

In [43]:
datetime_inst

datetime.datetime(2015, 8, 21, 6, 59, 35)

This is due to a standard feature of Python objects: they have more than one string representation, returned by two different methods, `__repr__` and `__str__`.

In [45]:
datetime_inst.__repr__()

'datetime.datetime(2015, 8, 21, 6, 59, 35)'

Note the quotes. Both `__repr__` and `__str__` return strings, but the strings serve different purposes.

In [46]:
datetime_inst.__str__()

'2015-08-21 06:59:35'

The `__repr__` method returns a string containing a piece of code that you can execute to create another `datetime` instance just like this one. The `__str__` method returns a "pretty" string designed to be readable and easily comprehended when printed to a screen. Often there is no need for the two strings to differ, but the difference sometimes comes in handy. One place where it matters is the interactive Python interpreter, which always echoes the `__repr__` string of the value an expression returns. Another is `print`, which always writes the `__str__` string of the object it is printing.
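
The same two methods can be defined on your own classes. The following is a minimal sketch; the `Meeting` class and its attributes are hypothetical, invented only to show the pattern of a code-like `__repr__` alongside a reader-friendly `__str__`.

```python
import datetime


class Meeting(object):
    """Hypothetical class pairing a topic with a start time."""

    def __init__(self, topic, start):
        self.topic = topic
        self.start = start

    def __repr__(self):
        # Code-like form: shows how the object could be constructed.
        return 'Meeting({!r}, {!r})'.format(self.topic, self.start)

    def __str__(self):
        # Reader-friendly form, used by print.
        return '{} at {}'.format(self.topic, self.start)


meeting = Meeting('Standup', datetime.datetime(2015, 8, 21, 6, 59, 35))
print(repr(meeting))
print(str(meeting))

# For datetime itself, evaluating the repr string rebuilds an equal object.
print(eval(repr(meeting.start)) == meeting.start)
```

Note that `{!r}` inside `__repr__` asks for the `__repr__` of each attribute, while the plain `{}` inside `__str__` uses each attribute's `__str__`.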

## Pandas Timestamp objects

In [1]:
import pandas as pd
import numpy as np
import random

In [2]:
# Let's cook up some data sampled hourly over a 72-hour period.
num_periods, freq = 72, 'H'
rng = pd.date_range('1/1/2011', periods=num_periods, freq=freq)
# We'll think of it as widgets sold in each hour, and we'll cook up numbers between 0 and 9999.
# So, 72 different sales figures, one for each hour, ranging from 0 to 9999 inclusive.
S = random.sample(range(10000), num_periods)

The time range `rng` is a sequence of time stamps, stored in a fairly complex kind of `pandas` object called a `DatetimeIndex`.

In [6]:
rng

DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
               '2011-01-01 02:00:00', '2011-01-01 03:00:00',
               '2011-01-01 04:00:00', '2011-01-01 05:00:00',
               '2011-01-01 06:00:00', '2011-01-01 07:00:00',
               '2011-01-01 08:00:00', '2011-01-01 09:00:00',
               '2011-01-01 10:00:00', '2011-01-01 11:00:00',
               '2011-01-01 12:00:00', '2011-01-01 13:00:00',
               '2011-01-01 14:00:00', '2011-01-01 15:00:00',
               '2011-01-01 16:00:00', '2011-01-01 17:00:00',
               '2011-01-01 18:00:00', '2011-01-01 19:00:00',
               '2011-01-01 20:00:00', '2011-01-01 21:00:00',
               '2011-01-01 22:00:00', '2011-01-01 23:00:00',
               '2011-01-02 00:00:00', '2011-01-02 01:00:00',
               '2011-01-02 02:00:00', '2011-01-02 03:00:00',
               '2011-01-02 04:00:00', '2011-01-02 05:00:00',
               '2011-01-02 06:00:00', '2011-01-02 07:00:00',
               '2011-01-
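
Rather than reading the full printout, a few properties of the index can be checked directly. This is a minimal sketch, assuming `pandas` is installed; the name `hourly` is introduced for the example, and the frequency is given as a `pd.offsets.Hour()` object instead of the string alias used above.

```python
import pandas as pd

# 72 hourly time stamps starting at midnight on January 1, 2011.
hourly = pd.date_range('2011-01-01', periods=72, freq=pd.offsets.Hour())

print(len(hourly))            # number of time stamps
print(hourly[0])              # first time stamp
print(hourly[-1])             # last time stamp
print(hourly[1] - hourly[0])  # spacing between neighbours
```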

Our sample `S` is just a sequence of 72 different integers.

In [3]:
len(S)

72

In [4]:
S[:10]

[8270, 8003, 3324, 2953, 1565, 4877, 4234, 320, 2448, 3902]

Now put the data together into a time-stamped sales column, associating each sales figure with a particular time stamp.

In [5]:
ts = pd.Series(S, index=rng)
ts
# Not necessary in this case, but if you're reading in raw time series data with independent
# time stamps (say, Tweets), it's often good practice to ensure it's in earliest->latest order
# using the `sort_index` method, which returns a sorted copy:
# ts = ts.sort_index()

2011-01-01 00:00:00    8270
2011-01-01 01:00:00    8003
2011-01-01 02:00:00    3324
2011-01-01 03:00:00    2953
2011-01-01 04:00:00    1565
2011-01-01 05:00:00    4877
2011-01-01 06:00:00    4234
2011-01-01 07:00:00     320
2011-01-01 08:00:00    2448
2011-01-01 09:00:00    3902
2011-01-01 10:00:00     881
2011-01-01 11:00:00    3686
2011-01-01 12:00:00    2135
2011-01-01 13:00:00    1250
2011-01-01 14:00:00    3387
2011-01-01 15:00:00    7923
2011-01-01 16:00:00    3519
2011-01-01 17:00:00     954
2011-01-01 18:00:00    4779
2011-01-01 19:00:00     622
2011-01-01 20:00:00     685
2011-01-01 21:00:00    5503
2011-01-01 22:00:00    2486
2011-01-01 23:00:00    7287
2011-01-02 00:00:00    3795
2011-01-02 01:00:00    1272
2011-01-02 02:00:00    5529
2011-01-02 03:00:00     230
2011-01-02 04:00:00    2106
2011-01-02 05:00:00     854
                       ... 
2011-01-02 18:00:00    7031
2011-01-02 19:00:00    3594
2011-01-02 20:00:00    1583
2011-01-02 21:00:00    2848
2011-01-02 22:00:00 
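
Once the figures are indexed by time, the index can drive sorting, selection, and aggregation. The following is a sketch under stated assumptions: `pandas` is installed, and the seed and the names `hourly`, `sales`, `shuffled`, `restored`, `second_day`, and `daily_totals` are made up for the example, so the numbers will not match the output above.

```python
import random

import pandas as pd

random.seed(0)  # arbitrary seed, only to make the sketch repeatable
hourly = pd.date_range('2011-01-01', periods=72, freq=pd.offsets.Hour())
sales = pd.Series(random.sample(range(10000), 72), index=hourly)

# Scramble the rows, then let sort_index put them back in time order.
shuffled = sales.sample(frac=1, random_state=0)
restored = shuffled.sort_index()
print(restored.equals(sales))

# A date string selects every time stamp that falls on that day.
second_day = sales.loc['2011-01-02']
print(len(second_day))

# resample groups the hourly figures into daily totals.
daily_totals = sales.resample('D').sum()
print(daily_totals)
```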

One thing you can do with a `pandas` `Timestamp` object is turn it into a Python `datetime` object. The two store very similar kinds of information; the `pandas` object has some extra capabilities and extra information. Going from the `pandas` object to a `datetime` object is always possible:

In [11]:
ts.index[0]

Timestamp('2011-01-01 00:00:00', offset='H')

In [10]:
ts.index[0].to_pydatetime()

datetime.datetime(2011, 1, 1, 0, 0)
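
The conversion also works in the other direction. The following is a minimal sketch, assuming `pandas` is installed; `stamp`, `as_datetime`, and `back` are names introduced for the example.

```python
import datetime

import pandas as pd

stamp = pd.Timestamp('2011-01-01 00:00:00')

# pandas Timestamp -> plain Python datetime
as_datetime = stamp.to_pydatetime()
print(type(as_datetime))

# plain Python datetime -> pandas Timestamp
back = pd.Timestamp(as_datetime)
print(back == stamp)

# Timestamp is itself a subclass of datetime.datetime, so the familiar
# attributes (year, month, hour, ...) are available on it too.
print(isinstance(stamp, datetime.datetime))
print(stamp.year, stamp.month, stamp.day)
```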