# Dates, Times, Classes

## Python datetime objects

In [9]:
import datetime
# Example: u'2015-08-21 06:59:35'
twitter_datetime_format = '%Y-%m-%d %H:%M:%S'
# Twitter API time string (from the CREATED_AT field)
twitter_example = u'2015-08-21 06:59:35'
datetime_inst = datetime.datetime.strptime(twitter_example,twitter_datetime_format)
datetime_inst


datetime.datetime(2015, 8, 21, 6, 59, 35)

`datetime_inst` is a special kind of object, which specifies a particular moment in time, independently of the format it was written in.

In [10]:
type(datetime_inst)

datetime.datetime

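A `datetime` instance created this way has six attributes (year, month, day, hour, minute, second), which together define an instant in time to one-second precision. Only the first three are required by the constructor; the time-of-day attributes default to 0.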

In [11]:
#datetime.datetime(2015, 8, 21, 6, 59, 35)
print(datetime_inst.year)
print(datetime_inst.month)
print(datetime_inst.day)
print(datetime_inst.hour)
print(datetime_inst.minute)
print(datetime_inst.second)

2015
8
21
6
59
35


You can think of a datetime instance as a sort of `tuple` whose 6 elements (year,  month, day, hour, minute, second) are accessed by keywords rather than integer indexes.
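As a small sketch of the tuple analogy, the cell below builds `datetime` instances directly rather than by parsing a string. The variable names `midnight` and `same_instant` are illustrative only and do not appear elsewhere in this notebook.

```python
import datetime

# Only year, month, and day are required; the time fields default to 0.
midnight = datetime.datetime(2015, 8, 21)
print(midnight)                  # 2015-08-21 00:00:00

# Keyword arguments make the "named elements" analogy explicit.
same_instant = datetime.datetime(year=2015, month=8, day=21,
                                 hour=6, minute=59, second=35)
print(same_instant.microsecond)  # 0

# timetuple() returns a struct_time; its first six fields are the familiar ones.
first_six = tuple(same_instant.timetuple())[:6]
print(first_six)                 # (2015, 8, 21, 6, 59, 35)
```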

In addition, other calendar properties can be computed, for example the day of the week as an `int`, with Monday represented as day 0. August 21, 2015 was a Friday:

In [12]:
print(datetime_inst.weekday())

4
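For comparison, here is a brief sketch of two related ways to get at the day of the week. The name `friday` is made up for this example; the weekday name produced by `%A` depends on the active locale, so no output is shown for it.

```python
import datetime

friday = datetime.datetime(2015, 8, 21, 6, 59, 35)

print(friday.weekday())       # 4  (Monday is 0)
print(friday.isoweekday())    # 5  (Monday is 1, following ISO 8601)
print(friday.strftime('%A'))  # the locale's full weekday name
```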


The idea of a `datetime` object is that it is independent of any string format in which an instant of time might be represented. From a `datetime` instance, you can generate a string in any format you like, or a string representing any portion of the information. First, portions:

In [13]:
# the time  of day
print(datetime_inst.time())
# the date
print(datetime_inst.date())

06:59:35
2015-08-21


Now formats:

In [14]:
print(datetime_inst.strftime(twitter_datetime_format))
european_datetime_format = '%d/%m/%Y %H:%M'
print(datetime_inst.strftime(european_datetime_format))
american_date_format = '%b %d, %Y'
print(datetime_inst.strftime(american_date_format))

2015-08-21 06:59:35
21/08/2015 06:59
Aug 21, 2015
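The same format string works in both directions. The sketch below parses a European-style string with `strptime` and writes it back out with `strftime`; the variable names `european_string` and `parsed` are chosen just for this illustration.

```python
import datetime

european_datetime_format = '%d/%m/%Y %H:%M'
european_string = '21/08/2015 06:59'

# strptime: string -> datetime (seconds are absent from the format, so they are 0)
parsed = datetime.datetime.strptime(european_string, european_datetime_format)
print(parsed)               # 2015-08-21 06:59:00

# isoformat() is a built-in shortcut for the ISO 8601 layout
print(parsed.isoformat())   # 2015-08-21T06:59:00

# strftime: datetime -> string, recovering the original text
round_trip = parsed.strftime(european_datetime_format)
print(round_trip == european_string)  # True
```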


You can also print a `datetime` object with a default format.

In [15]:
print(datetime_inst)

2015-08-21 06:59:35


Notice this is different than what you get if you just evaluate an expression in Python and let Python print back the value for you:

In [16]:
datetime_inst

datetime.datetime(2015, 8, 21, 6, 59, 35)

This is due to a standard feature of Python objects. They have more than one string representation associated with them, returned by two different methods, `__repr__` and `__str__`. 

In [17]:
datetime_inst.__repr__()

'datetime.datetime(2015, 8, 21, 6, 59, 35)'

Note the quotes.  Both `__repr__` and `__str__` return strings,  but the strings serve a different function.

In [18]:
datetime_inst.__str__()

'2015-08-21 06:59:35'

The `__repr__` method returns a string that contains a piece of code that you can execute to create another `datetime` instance just like this one.  The `__str__` method returns a "pretty" string designed to be readable and easily comprehended when printed to a screen.  Often there is no need for the two strings to be different, but the difference sometimes comes in handy.  One place where the difference makes a difference is when using the Python interpreter.  It always prints the `__repr__` string of the object returned.  Another is the `print` function.  It always prints the `__str__` string of the object it is printing.
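To see the two methods at work in a class of your own, here is a minimal sketch. `Widget` is a hypothetical class invented for this example, not something from `datetime` or `pandas`.

```python
class Widget:
    """Hypothetical class used only to illustrate __repr__ versus __str__."""

    def __init__(self, name, price):
        self.name = name
        self.price = price

    def __repr__(self):
        # Code-like string that could be evaluated to rebuild the object.
        return f"Widget({self.name!r}, {self.price!r})"

    def __str__(self):
        # Reader-friendly string used by print() and str().
        return f"{self.name} (${self.price:.2f})"


w = Widget("gear", 2.5)
print(repr(w))   # Widget('gear', 2.5)
print(w)         # gear ($2.50)
print([w])       # [Widget('gear', 2.5)]
```

Note the last line: a container such as a list shows the `__repr__` string of each item, even when the list itself is passed to `print`.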

Another thing you can do with `datetime`s is arithmetic. You just have to remember that time
intervals are a different type from time instants (`datetime`s). Time intervals can be added to each other
and to `datetime`s, but `datetime`s cannot be added to each other.

In [19]:
# The time interval class
from datetime import timedelta

In [20]:
one_hour = timedelta(hours=1)

In [21]:
print(datetime_inst)
print(datetime_inst + one_hour)

2015-08-21 06:59:35
2015-08-21 07:59:35


In [22]:
print(datetime_inst + 2 * one_hour)

2015-08-21 08:59:35


In [24]:
print(datetime_inst + (5 * one_hour)/2)

2015-08-21 09:29:35


In [23]:
#TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'datetime.datetime'
#datetime_inst + datetime_inst
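Although two `datetime`s cannot be added, they can be subtracted, and the result is a `timedelta`. The following sketch uses made-up names (`start`, `end`, `elapsed`) to show both behaviors.

```python
import datetime
from datetime import timedelta

start = datetime.datetime(2015, 8, 21, 6, 59, 35)
end = start + timedelta(hours=2, minutes=30)

# datetime - datetime gives the interval between the two instants
elapsed = end - start
print(elapsed)                  # 2:30:00
print(type(elapsed).__name__)   # timedelta
print(elapsed.total_seconds())  # 9000.0

# datetime + datetime is not defined
try:
    start + end
except TypeError as err:
    print("TypeError:", err)
```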

## Pandas Timestamp objects

In [148]:
import pandas as pd
import numpy as np
import random

In [149]:
# Let's cook up some data sampled hourly over a 120-day period
num_periods = 120*24
freq = 'H'
rng = pd.date_range('1/1/2024', periods=num_periods, freq=freq)
# We'll think of it as a level recorded in each hour, and we'll cook up the numbers at random.
# So, 2880 different figures, one for each hour, drawn from a standard normal distribution.
S = [np.random.normal() for i in range(num_periods)]

Our sample `S` is just a sequence of values drawn from a normal distribution with mean 0 and standard deviation 1.

In [150]:
S[:10]

[1.422630569914767,
 0.5773597261191238,
 0.8918447148869114,
 0.14568549068802922,
 -1.1777192854766498,
 -0.5505724919707763,
 -0.07589472478032998,
 -0.6940048112327263,
 -0.5906729855174389,
 0.6464778580219521]

In [151]:
len(S)

2880
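An equivalent sample can be drawn in one vectorized call. This is a sketch rather than the cell used above: the seeded generator and the names `generator` and `sample` are assumptions introduced here so that the result is reproducible.

```python
import numpy as np

num_periods = 120 * 24

# The seed value is arbitrary; it only makes the draw repeatable.
generator = np.random.default_rng(seed=0)
sample = generator.normal(loc=0.0, scale=1.0, size=num_periods)

print(len(sample))                  # 2880
# The sample mean and standard deviation are close to 0 and 1, not exactly equal.
print(sample.mean(), sample.std())
```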

The time range `rng` is a sequence of timestamps, stored in a specialized `pandas` object called a `DatetimeIndex`.

In [152]:
rng

DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 01:00:00',
               '2024-01-01 02:00:00', '2024-01-01 03:00:00',
               '2024-01-01 04:00:00', '2024-01-01 05:00:00',
               '2024-01-01 06:00:00', '2024-01-01 07:00:00',
               '2024-01-01 08:00:00', '2024-01-01 09:00:00',
               ...
               '2024-04-29 14:00:00', '2024-04-29 15:00:00',
               '2024-04-29 16:00:00', '2024-04-29 17:00:00',
               '2024-04-29 18:00:00', '2024-04-29 19:00:00',
               '2024-04-29 20:00:00', '2024-04-29 21:00:00',
               '2024-04-29 22:00:00', '2024-04-29 23:00:00'],
              dtype='datetime64[ns]', length=2880, freq='H')

Now put the data together into a time-stamped column, associating each timestamp with a particular value.

In [154]:
ts = pd.Series(S, index=rng)
ts
# Not necessary in this case, but if you're reading in raw time series data with independent
# time stamps (say, Tweets) it's often good practice to ensure it's in earliest->latest order
# using the `sort_index` method (which returns a new, sorted Series).
# ts = ts.sort_index()

2024-01-01 00:00:00    1.422631
2024-01-01 01:00:00    0.577360
2024-01-01 02:00:00    0.891845
2024-01-01 03:00:00    0.145685
2024-01-01 04:00:00   -1.177719
                         ...   
2024-04-29 19:00:00   -1.017564
2024-04-29 20:00:00    1.452105
2024-04-29 21:00:00    0.410665
2024-04-29 22:00:00   -0.791640
2024-04-29 23:00:00   -0.530406
Freq: H, Length: 2880, dtype: float64

One thing you can do with a `pandas` `Timestamp` object is turn it into a Python `datetime` object. They store very similar kinds of information; the `pandas` object has some extra capabilities and extra information. Going from the `pandas` object to a `datetime` object is done with the `to_pydatetime` method:

In [155]:
ts.index[0]

Timestamp('2024-01-01 00:00:00', freq='H')

In [156]:
ts.index[0].to_pydatetime()

datetime.datetime(2024, 1, 1, 0, 0)
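The conversion also works in the other direction. The sketch below, with illustrative names `py_dt` and `stamp`, builds a `Timestamp` from a `datetime` and shows one of the extra methods the `pandas` object offers.

```python
import datetime
import pandas as pd

py_dt = datetime.datetime(2024, 1, 1, 0, 0)

# datetime -> Timestamp
stamp = pd.Timestamp(py_dt)
print(stamp)                                   # 2024-01-01 00:00:00

# Timestamp is a subclass of datetime, so datetime attributes still work
print(isinstance(stamp, datetime.datetime))    # True
print(stamp.year, stamp.weekday())             # 2024 0

# Timestamp -> datetime gets back an equal value
print(stamp.to_pydatetime() == py_dt)          # True

# An extra capability not found on plain datetime objects
print(stamp.day_name())                        # Monday
```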

You can run the same method on the entire index to produce a `numpy` array of Python `datetime` instances:

In [170]:
date_array = ts.index.to_pydatetime()
date_array[:10]

array([datetime.datetime(2024, 1, 1, 0, 0),
       datetime.datetime(2024, 1, 1, 1, 0),
       datetime.datetime(2024, 1, 1, 2, 0),
       datetime.datetime(2024, 1, 1, 3, 0),
       datetime.datetime(2024, 1, 1, 4, 0),
       datetime.datetime(2024, 1, 1, 5, 0),
       datetime.datetime(2024, 1, 1, 6, 0),
       datetime.datetime(2024, 1, 1, 7, 0),
       datetime.datetime(2024, 1, 1, 8, 0),
       datetime.datetime(2024, 1, 1, 9, 0)], dtype=object)

You can do time arithmetic with the `numpy` array version:

In [158]:
date_array[:10] + one_hour

array([datetime.datetime(2024, 1, 1, 1, 0),
       datetime.datetime(2024, 1, 1, 2, 0),
       datetime.datetime(2024, 1, 1, 3, 0),
       datetime.datetime(2024, 1, 1, 4, 0),
       datetime.datetime(2024, 1, 1, 5, 0),
       datetime.datetime(2024, 1, 1, 6, 0),
       datetime.datetime(2024, 1, 1, 7, 0),
       datetime.datetime(2024, 1, 1, 8, 0),
       datetime.datetime(2024, 1, 1, 9, 0),
       datetime.datetime(2024, 1, 1, 10, 0)], dtype=object)

As well as with the original index object:

In [159]:
ts.index[:10] + one_hour

DatetimeIndex(['2024-01-01 01:00:00', '2024-01-01 02:00:00',
               '2024-01-01 03:00:00', '2024-01-01 04:00:00',
               '2024-01-01 05:00:00', '2024-01-01 06:00:00',
               '2024-01-01 07:00:00', '2024-01-01 08:00:00',
               '2024-01-01 09:00:00', '2024-01-01 10:00:00'],
              dtype='datetime64[ns]', freq='H')

To sum up: `ts` is a Pandas `Series`; the sequence of Pandas `Timestamp`s we've been working with
is its index (hence, `ts.index` prints out as a `DatetimeIndex`). So we've shown
that time arithmetic is very simple with a `DatetimeIndex`.
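Here is a compact sketch of that arithmetic on a small index. The names `idx`, `shifted`, and `offsets` are invented for this example, and the frequency is passed as a `timedelta` rather than a string alias to sidestep differences in alias spelling between `pandas` versions.

```python
import datetime
import pandas as pd

one_hour = datetime.timedelta(hours=1)
idx = pd.date_range('2024-01-01', periods=4, freq=one_hour)

# Adding an interval shifts every timestamp in the index
shifted = idx + one_hour
print(shifted[0])               # 2024-01-01 01:00:00

# Subtracting a single Timestamp yields an index of intervals
offsets = idx - idx[0]
print(type(offsets).__name__)   # TimedeltaIndex
print(offsets[-1])              # 0 days 03:00:00
```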

Time computations work very much the same if the Pandas `Timestamp` sequence is a
column in a `DataFrame` (a Pandas `Series`).  Let's promote our index to be a column:

In [160]:
ts_df = pd.DataFrame(ts,columns=["P-Level"])

ts_df2= ts_df.reset_index(names='Dates')
ts_df2[:5]

                Dates   P-Level
0 2024-01-01 00:00:00  1.422631
1 2024-01-01 01:00:00  0.577360
2 2024-01-01 02:00:00  0.891845
3 2024-01-01 03:00:00  0.145685
4 2024-01-01 04:00:00 -1.177719


In [161]:
ts_df2["Dates"] + one_hour

0      2024-01-01 01:00:00
1      2024-01-01 02:00:00
2      2024-01-01 03:00:00
3      2024-01-01 04:00:00
4      2024-01-01 05:00:00
               ...        
2875   2024-04-29 20:00:00
2876   2024-04-29 21:00:00
2877   2024-04-29 22:00:00
2878   2024-04-29 23:00:00
2879   2024-04-30 00:00:00
Name: Dates, Length: 2880, dtype: datetime64[ns]

Let's get all the Tuesday data (Monday is day 0, so Tuesday is day 1):

In [162]:
tuesday_data = ts_df[ts_df.index.weekday == 1]
len(tuesday_data)

408

In [163]:
#  The number of days, len(ts_df)/24 == 120, leaves a remainder of 1 when divided by 7
#  because the data starts and ends on a Monday.
#  Subtract a day's worth of samples to strip off the extra Monday data
(len(ts_df) - 24)/7

408.0
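The same bookkeeping can be checked by counting the hourly samples that fall on each weekday. This is a sketch on a freshly built index (`idx` and `counts` are names introduced here), covering the same 120 days as the data above.

```python
import datetime
import pandas as pd

idx = pd.date_range('2024-01-01', periods=120 * 24,
                    freq=datetime.timedelta(hours=1))

# Number of hourly samples per weekday (0 = Monday, ..., 6 = Sunday)
counts = pd.Series(idx.weekday).value_counts().sort_index()
print(counts)
```

Monday gets 432 samples (18 days) because the range both starts and ends on a Monday; every other weekday gets 408 (17 days).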

In [164]:
## Let's get all the February data (months are numbered from 1, not 0)
february_data = ts_df[ts_df.index.month == 2]
len(february_data)/24

29.0

In [165]:
february_data.iloc[:2], february_data.iloc[-2:],

(                      P-Level
 2024-02-01 00:00:00 -0.706495
 2024-02-01 01:00:00  0.810058,
                       P-Level
 2024-02-29 22:00:00 -0.808593
 2024-02-29 23:00:00 -1.487924)

How do we do the same with `ts_df2` (where the `Timestamp` info is a column)?

In [166]:
# This is an AttributeError: 'Series' object has no attribute 'month'
#ts_df2["Dates"].month == 2

This doesn't work. A `Series` of `Timestamp`s is an ordinary `Series`, not a special type, so it has no `month` attribute of its own. What we do instead is use
the `.dt` accessor, which exposes the `Timestamp` attributes and methods (paralleling the `.str` accessor for string columns):

In [167]:
february_data2 = ts_df2[ts_df2["Dates"].dt.month == 2]

We get hold of the same sequence of values whichever way we represent the data:

In [169]:
(february_data["P-Level"].values == february_data2["P-Level"].values).all()

True
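A few more uses of the `.dt` accessor, sketched on a tiny `Series` that straddles the end of January. The names `dates` and `feb` are hypothetical and chosen only for this example.

```python
import datetime
import pandas as pd

dates = pd.Series(pd.date_range('2024-01-31 22:00', periods=4,
                                freq=datetime.timedelta(hours=1)))

# Each .dt attribute or method is applied elementwise to the column
print(dates.dt.month.tolist())       # [1, 1, 2, 2]
print(dates.dt.hour.tolist())        # [22, 23, 0, 1]
print(dates.dt.day_name().tolist())  # ['Wednesday', 'Wednesday', 'Thursday', 'Thursday']

# The resulting Boolean Series can be used as a row filter
feb = dates[dates.dt.month == 2]
print(len(feb))                      # 2
```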

Consider the case of strings in an index for a moment.

In [140]:
xdf = pd.DataFrame({
           "births":[27,137,513]},index="Bob Dave Bill".split())
xdf

      births
Bob       27
Dave     137
Bill     513


String Indexes can use the `.str` accessor too:

In [143]:
xdf[xdf.index.str[0]  == "B"]

      births
Bob       27
Bill     513


But there is no `.dt` accessor for a `DatetimeIndex`:

In [147]:
# AttributeError: 'DatetimeIndex' object has no attribute 'dt'
# ts_df.index.dt.month == 2

None is needed, because `DatetimeIndex` is a type specialized for time series data: the `Timestamp` attributes and methods used here are available on it directly
and are automatically "vectorized".
To illustrate with one more example: the following gets the mean reading for midnight samples:

In [184]:
ts_df[ts_df.index.hour == 0].mean()

P-Level    0.10471
dtype: float64
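The same idea extends from one hour to all of them by grouping on `index.hour`. The sketch below uses deterministic stand-in data (each value is simply its own hour of day) so the result is easy to verify; `idx`, `df`, and `hourly_mean` are names introduced for this example.

```python
import datetime
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=3 * 24,
                    freq=datetime.timedelta(hours=1))

# Stand-in data: each value equals the hour of day of its timestamp
df = pd.DataFrame({"P-Level": np.asarray(idx.hour, dtype=float)}, index=idx)

# One mean per hour of day, computed across all days
hourly_mean = df.groupby(df.index.hour)["P-Level"].mean()
print(hourly_mean.head(3))
print(len(hourly_mean))   # 24
```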