<h1>datetime library</h1>

* Time is linear
* It progresses along a straight-line trajectory from the Big Bang to now and on into the future. It never stops

<h3>Reasoning about time is important in data analysis</h3>

* Analyzing financial timeseries data
* Looking at commuter transit passenger flows by time of day 
* Understanding web traffic by time of day 
* Examining seasonality in department store purchases

<h3>The datetime library</h3>
* understands the relationship between different points of time
* understands how to do operations on time

<h3>Example:</h3>
<li>Which is greater? "10/24/2017" or "11/24/2016"

In [1]:
d1 = "10/24/2017"
d2 = "11/24/2016"

# get latest date
max(d1,d2)

'11/24/2016'

That's not right: both values are strings, so Python compares them character by character, and "11" sorts after "10"

<li>How much time has passed?

In [2]:
d1 - d2

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Obviously these won't work. We can't do date operations on *strings*. Let's see what happens with **datetime**.

In [3]:
import datetime

d1 = datetime.date(2016,11,24)
d2 = datetime.date(2017,10,24)

max(d1,d2)

datetime.date(2017, 10, 24)

In [4]:
print(d2 - d1)

334 days, 0:00:00


See? **datetime objects** *understand* time.
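
The original strings can also be converted instead of being retyped as numbers. The following is a minimal sketch, assuming the strings are in month/day/year order; the helper name `parse_us_date` is hypothetical, and `strptime` itself is covered later in this notebook.

```python
import datetime

def parse_us_date(text):
    """Parse a 'MM/DD/YYYY' string into a datetime.date (hypothetical helper)."""
    return datetime.datetime.strptime(text, "%m/%d/%Y").date()

d1 = parse_us_date("10/24/2017")
d2 = parse_us_date("11/24/2016")

# Comparison and subtraction now work on dates rather than on characters
latest = max(d1, d2)
elapsed = d1 - d2

print(latest)        # 2017-10-24
print(elapsed.days)  # 334
```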

<h3>The datetime library contains several useful types</h3>
* **date**: stores the date (year, month, day)
* **time**: stores the time of day (hours, minutes, seconds, microseconds)
* **datetime**: stores the date as well as the time (year, month, day, hours, minutes, seconds, microseconds)
* **timedelta**: the duration between two date or datetime objects
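
As a rough illustration of how the four types relate, the sketch below builds one of each and converts between them. The specific values are arbitrary examples chosen for this sketch.

```python
import datetime

d = datetime.date(2017, 8, 22)                  # year, month, day
t = datetime.time(19, 8, 51)                    # hour, minute, second
dt = datetime.datetime(2017, 8, 22, 19, 8, 51)  # date fields, then time fields
delta = datetime.timedelta(days=1, hours=2)     # a duration, not a point in time

# A datetime can be split into its date and time parts and rebuilt from them
assert dt.date() == d
assert dt.time() == t
rebuilt = datetime.datetime.combine(d, t)

print(rebuilt)     # 2017-08-22 19:08:51
print(dt + delta)  # 2017-08-23 21:08:51
```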

<h3>datetime.date</h3>

In [5]:
# get date of start of century and today
century_start = datetime.date(2000,1,1)
today = datetime.date.today()

print(century_start,today,'\n')
print("We are",today - century_start,"days into this century"'\n')

# for cleaner output, use the .days attribute of the timedelta result
print("We are",(today - century_start).days,"days into this century")

2000-01-01 2017-08-22 

We are 6443 days, 0:00:00 days into this century

We are 6443 days into this century


<h3>datetime.datetime</h3>

In [12]:
century_start = datetime.datetime(2000,1,1,0,0,0)
time_now = datetime.datetime.now()

print(century_start,time_now,'\n')
print("We are",time_now - century_start,"days, hours, " 
      "minutes and seconds into this century")

2000-01-01 00:00:00 2017-08-22 19:08:51.134867 

We are 6443 days, 19:08:51.134867 days, hours, minutes and seconds into this century


In [10]:
century_start = datetime.datetime(2000,1,1,0,0,0)
time_now = datetime.datetime.now()

print(century_start,time_now,'\n')
print("We are {} days, hours, " 
      "minutes and seconds into this century".format(time_now - century_start))

2000-01-01 00:00:00 2017-08-22 19:08:32.271796 

We are 6443 days, 19:08:32.271796 days, hours, minutes and seconds into this century


**datetime objects can check validity**

A ValueError exception is raised if the arguments do not form a valid date or time

In [13]:
some_date = datetime.date(2015,2,29)

ValueError: day is out of range for month

In [14]:
# 2015 is NOT a leap year, 2016 is
some_date = datetime.date(2016,2,29)
some_date

datetime.date(2016, 2, 29)

In [15]:
some_time = datetime.datetime(2015,2,28,23,60,0)
some_time

ValueError: minute must be in 0..59
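
Because the constructor raises ValueError, it can double as a validity check. This is a minimal sketch; the function name `is_valid_date` is hypothetical and not part of the datetime library.

```python
import datetime

def is_valid_date(year, month, day):
    """Return True if the arguments form a real calendar date (hypothetical helper)."""
    try:
        datetime.date(year, month, day)
    except ValueError:
        return False
    return True

print(is_valid_date(2015, 2, 29))  # False
print(is_valid_date(2016, 2, 29))  # True
```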

**datetime.timedelta**
* Used to store the duration between two points in time

In [17]:
century_start = datetime.datetime(2000,1,1,0,0,0)
time_now = datetime.datetime.now()

time_since_century_start = time_now - century_start

print("days since century start:",time_since_century_start.days)
print("seconds since century start:" ,time_since_century_start.total_seconds())
print("minutes since century start:",time_since_century_start.total_seconds()/60)
print("hours since century start:",time_since_century_start.total_seconds()/60/60)

days since century start: 6443
seconds since century start: 556744291.859124
minutes since century start: 9279071.530985398
hours since century start: 154651.19218308997
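
The `.days` attribute and `total_seconds()` answer different questions because a timedelta stores only days, seconds, and microseconds internally. The sketch below uses an arbitrary duration to show the difference, and shows that dividing by another timedelta is an alternative to dividing `total_seconds()` by hand.

```python
import datetime

delta = datetime.timedelta(hours=25, minutes=30)

print(delta.days)             # 1      (whole days only)
print(delta.seconds)          # 5400   (the remainder after whole days)
print(delta.total_seconds())  # 91800.0 (the entire duration)

# Dividing one timedelta by another yields a plain float
hours = delta / datetime.timedelta(hours=1)
print(hours)                  # 25.5
```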


In [23]:
# create time object of current time
t1 = datetime.datetime.now().time()

# create timedelta object of 5 seconds
t2 = datetime.timedelta(seconds=5)

print(t1) 
print(t2)

# add 5 seconds to current time
print(t1 + t2)

19:15:20.048757
0:00:05


TypeError: unsupported operand type(s) for +: 'datetime.time' and 'datetime.timedelta'

<h3>datetime.time</h3>

In [19]:
date_and_time_now = datetime.datetime.now()
time_now = date_and_time_now.time()

# date and time
print(date_and_time_now,'\n') 

# just time
print(time_now)

2017-08-22 19:12:32.397839 

19:12:32.397839


<h4>You can do arithmetic operations on datetime objects</h4>
<li>You can use timedelta objects to calculate new dates or times from a given date

In [25]:
# get today's date as a datetime.date object
today = datetime.date.today()
print(today)

2017-08-22


In [26]:
# add 5 days to today's date
five_days_later = today + datetime.timedelta(days = 5)
print(five_days_later)

2017-08-27


In [27]:
now = datetime.datetime.today()
print(now)

2017-08-22 19:21:49.859243


In [28]:
five_minutes_and_five_seconds_later = now + datetime.timedelta(minutes = 5,
                                                               seconds = 5)
print(five_minutes_and_five_seconds_later)

2017-08-22 19:26:54.859243


In [29]:
five_minutes_and_five_seconds_earlier = now + datetime.timedelta(minutes = -5,
                                                                 seconds = -5)
print(five_minutes_and_five_seconds_earlier)

2017-08-22 19:16:44.859243


***You can't add a timedelta to a TIME object.***

* If you do, you'll get a TypeError exception

In [30]:
# Return the time component of current time (drop day)
time_now = datetime.datetime.now().time() 
print(time_now)

19:22:34.917735


In [33]:
# add 30 seconds
thirty_seconds = datetime.timedelta(seconds = 30)

time_later = time_now + thirty_seconds

TypeError: unsupported operand type(s) for +: 'datetime.time' and 'datetime.timedelta'

Bug or feature? A time object carries no date, so it would be unclear what should happen if the addition crossed midnight into another day

But this is Python! We can always get around something by writing a new function!

Let's write a small function to get around this problem

In [34]:
# Add a timedelta to a time object by way of a temporary datetime:
#   - build an artificial datetime on 1/1/500 with the hour, minute, and
#     second from the 1st arg (microseconds are dropped)
#   - add the delta argument to it, THEN return only the time component
def add_to_time(time_object,time_delta):
    import datetime
    
    # create datetime object on 1/1/500 carrying the time from the 1st argument 
    temp_datetime_object = datetime.datetime(500,1,1,
                                             time_object.hour,
                                             time_object.minute,
                                             time_object.second)
    #print(temp_datetime_object)
    return (temp_datetime_object + time_delta).time()

In [35]:
# Test it by adding 30 seconds to right now
time_now = datetime.datetime.now().time()
thirty_seconds = datetime.timedelta(seconds = 30)

print(time_now,'\n\n',add_to_time(time_now,thirty_seconds))

19:25:28.220427 

 19:25:58
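
A variant of the same idea can be sketched with `datetime.datetime.combine`, which keeps the microseconds. The name `add_to_time_combined` and the anchor date are assumptions made for this sketch; as with the function above, the result silently wraps around midnight.

```python
import datetime

def add_to_time_combined(time_object, time_delta):
    """Add a timedelta to a time, wrapping around midnight (hypothetical variant)."""
    # The anchor date is arbitrary; only the time of day is returned
    anchor = datetime.datetime.combine(datetime.date(2000, 1, 1), time_object)
    return (anchor + time_delta).time()

late = datetime.time(23, 59, 45, 500000)
wrapped = add_to_time_combined(late, datetime.timedelta(seconds=30))
print(wrapped)  # 00:00:15.500000
```

Whether wrapping is acceptable depends on the application. If crossing into the next day matters, working with full datetime objects avoids the ambiguity.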


<h2>datetime and strings</h2>

More often than not, a program will need to get the date or time from a string:
* From a website (bus/train timings)
* From a file (date or datetime associated with a stock price)
* From the user (via the input() function)

Python needs to parse the string so that it correctly creates a date or time object


<h4>datetime.strptime</h4>

* This parses a date and/or time from a string and creates a datetime object (call .date() or .time() on the result to get the other types)
* The programmer needs to tell the function what format the string is using
* See http://pubs.opengroup.org/onlinepubs/009695399/functions/strptime.html for how to specify the format

In [36]:
test_date = '01-Apr-03'

# create datetime object from the given date string above
date_object = datetime.datetime.strptime(test_date,'%d-%b-%y')
print(date_object)

2003-04-01 00:00:00
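
The format string has to mirror the input exactly. The sketch below tries a few common layouts and shows what happens on a mismatch; the sample strings are made up for illustration, and `%b` matches month abbreviations in the current locale (English is assumed here).

```python
import datetime

samples = [
    ("01-Apr-03", "%d-%b-%y"),                     # day, abbreviated month, 2-digit year
    ("2017-08-22 19:08:51", "%Y-%m-%d %H:%M:%S"),  # 4-digit year and 24-hour time
    ("08/22/2017", "%m/%d/%Y"),                    # month/day/4-digit year
]

for text, fmt in samples:
    parsed = datetime.datetime.strptime(text, fmt)
    print(text, "->", parsed)

# A string that does not match the format raises ValueError
try:
    datetime.datetime.strptime("22/08/2017", "%m/%d/%Y")
    mismatch_detected = False
except ValueError as error:
    mismatch_detected = True
    print("could not parse:", error)
```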


In [37]:
# Unfortunately, there is no strptime equivalent for timedelta, so we have to be creative!
bus_travel_time = '2:15:30'

# split string into 3 strings via .split() and store each section into a proper variable
hours, minutes, seconds = bus_travel_time.split(':')

# create a timedelta object from the values parsed above
x = datetime.timedelta(hours = int(hours),minutes = int(minutes),seconds = int(seconds))
print(x)

2:15:30


In [39]:
# Function that will do this for a particular format
def get_timedelta(time_string):
    import datetime
    
    hours, minutes, seconds = time_string.split(':')
    return datetime.timedelta(hours = int(hours),minutes = int(minutes), seconds = int(seconds))

In [41]:
travel_time = '4:40:36'
print(get_timedelta(travel_time))

4:40:36


<h4>datetime.strftime</h4>
* This is the reverse of strptime: it converts a datetime object to a string in a specified format

In [45]:
now = datetime.datetime.now()

# turn current datetime into a string
string_now = now.strftime('%m/%d/%y %H:%M:%S')
print(now,'\n\n',string_now)

2017-07-21 15:39:55.809175 

 07/21/17 15:39:55


In [43]:
# Or you can use the default conversion
print(str(now)) 

2017-07-21 15:39:36.393175
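
Since strftime and strptime share the same format codes, a value written with one can be read back with the other. The sketch below uses a fixed example timestamp (an assumption, so that the output is reproducible) and also shows two other standard ways of producing a string.

```python
import datetime

moment = datetime.datetime(2017, 7, 21, 15, 39, 55)
fmt = "%m/%d/%y %H:%M:%S"

# Format, then parse with the same format string to get the original value back
text = moment.strftime(fmt)
round_trip = datetime.datetime.strptime(text, fmt)
print(text)                  # 07/21/17 15:39:55
print(round_trip == moment)  # True

# isoformat() and f-string format specs are alternatives to calling strftime
iso_text = moment.isoformat()
print(iso_text)              # 2017-07-21T15:39:55
print(f"{moment:%Y-%m-%d}")  # 2017-07-21
```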


In [165]:
# return dictionary of word counts
# Assume strings have at most 1 punctuation symbol at the end or 1 punctuation symbol at the beginning 
# ignore punctuation that appears anywhere else (only a trailing symbol is removed here)
def word_distribution(string):
    # split string into a list of words (split() also skips repeated spaces)
    split_string = string.split()
    
    # initiate dictionary
    result = {}    
    
    # for each item in the list
    for i in split_string:
        # if it ends in a punctuation symbol, remove that symbol
        if not i[-1].isalpha():  
            i = i[:-1]
            
        # lowercase before the lookup so 'Hello' and 'hello' share one key
        i = i.lower()
        
        # if word isn't in dictionary yet, initiate count = 1
        if i not in result:
            result[i] = 1
        # if word is in dictionary, add to its count
        else:
            result[i] += 1
            
    return result

In [166]:
text_string = 'Hello. How are you? Please say hello if you don’t love me!'

word_distribution(text_string)

{'are': 1,
 'don’t': 1,
 'hello': 2,
 'how': 1,
 'if': 1,
 'love': 1,
 'me': 1,
 'please': 1,
 'say': 1,
 'you': 2}

In [93]:
text_string = "That's when I saw Jane (John's sister)!"

word_distribution(text_string)

{"(john's": 1,
 'i': 1,
 'jane': 1,
 'saw': 1,
 'sister)': 1,
 "that's": 1,
 'when': 1}
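
The leftover brackets in the output above come from punctuation that the function does not strip. One possible way to handle it, sketched here with `collections.Counter` and `string.punctuation`, is to strip punctuation from both ends of each word. The name `word_distribution_stripped` is hypothetical, and `string.punctuation` covers ASCII symbols only.

```python
import string
from collections import Counter

def word_distribution_stripped(text):
    """Count words after stripping ASCII punctuation from both ends (hypothetical variant)."""
    counts = Counter()
    for token in text.split():
        # strip() removes leading and trailing punctuation; apostrophes inside a word stay
        word = token.strip(string.punctuation).lower()
        if word:
            counts[word] += 1
    return dict(counts)

result = word_distribution_stripped("That's when I saw Jane (John's sister)!")
print(result)
```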