## Working with Dates and Times Modules

#### Looking at some built-in Python functionality (non-pandas)
datetime is part of the Python standard library, but it is not loaded automatically, so you still need to import it.

In [1]:
import pandas as pd
import datetime as dt   # standard-library module, not an external package; it still has to be imported explicitly

## Review of Python's datetime Module
A module is a Python source file that Python loads when you import it, a bit like an internal library. The datetime module contains a datetime class (same name as the module); calling it creates a datetime object. A date object is its simpler sibling: it stores only the year, month, and day, with no time component.

In [2]:
someday = dt.date(year = 2010, month = 4, day = 12)

In [3]:
# type someday. and press Tab to see all methods/attributes available for date objects
someday.year
someday.month
someday.day

12

In [4]:
dt.datetime(2010, 1, 20, 17, 13, 57)  # hour, minute, second are optional; if omitted they default to 0 (midnight)
str(dt.datetime(2010, 1, 20, 17, 13, 57))

'2010-01-20 17:13:57'

In [5]:
str(someday)  # just the date because it's a date object

'2010-04-12'

In [6]:
sometime = dt.datetime(2010, 1, 20, 17, 13, 57)

In [7]:
sometime.year
sometime.month
sometime.day
sometime.hour
sometime.minute
sometime.second

57

In [8]:
# Useful when working with dates and you want to do revenue by month
# No need for big complex IF statement. Just extract month value and create a new series
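The revenue-by-month idea above can be sketched as follows. The DataFrame, its column names, and the figures are made up for illustration, and the sketch uses the .dt accessor covered later in these notes.

```python
import pandas as pd

# Hypothetical sales records; column names and values are invented for this example.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2010-01-05", "2010-01-20", "2010-02-03", "2010-02-17"]),
    "revenue": [100.0, 250.0, 75.0, 125.0],
})

# Extract the month number from each date into a new column ...
sales["month"] = sales["date"].dt.month

# ... then aggregate on it. No if/else chain is needed.
revenue_by_month = sales.groupby("month")["revenue"].sum()
print(revenue_by_month)
```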

## The pandas Timestamp Object
The pandas version of the datetime object. The time component is optional and defaults to midnight (00:00:00).

In [9]:
pd.Timestamp(ts_input = "2015-03-31")
pd.Timestamp(ts_input = "2015/03/31")
pd.Timestamp("2013, 11, 04")
pd.Timestamp("1/1/2015")
pd.Timestamp(ts_input = "12/19/2015")
pd.Timestamp(ts_input = "4/3/2000")
pd.Timestamp(ts_input = "2021-03-08 08:35:15")
pd.Timestamp("2021-03-08 6:13:29 PM")

Timestamp('2021-03-08 18:13:29')

In [10]:
# Can also pass python datetime objects
pd.Timestamp(dt.date(2015, 1, 1))

Timestamp('2015-01-01 00:00:00')

In [11]:
pd.Timestamp(dt.datetime(2000, 2, 3, 21, 35, 22 ))

Timestamp('2000-02-03 21:35:22')

In [12]:
# Why use a Timestamp instead of a datetime object? More flexibility, more methods and attributes, and it plugs into the rest of pandas.
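A small sketch of what the Timestamp adds, assuming a reasonably recent pandas version. It checks that a Timestamp still behaves like a datetime and then reads a few attributes that plain datetime objects do not have.

```python
import datetime as dt
import pandas as pd

ts = pd.Timestamp("2015-03-31")

# A Timestamp is a subclass of datetime.datetime, so it works wherever a datetime does.
is_datetime = isinstance(ts, dt.datetime)

# It also adds calendar-aware extras that plain datetime objects lack.
quarter = ts.quarter          # which quarter of the year the date falls in
month_end = ts.is_month_end   # True when the date is the last day of its month
weekday = ts.day_name()       # weekday name as a string

print(is_datetime, quarter, month_end, weekday)
```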

## The pandas DatetimeIndex Object
A collection of pandas Timestamps. Useful as the index of a Series or DataFrame.

In [13]:
dates = ["2016-01-02", "2016-04-12", "2009-09-07"]
pd.DatetimeIndex(data = dates) # converts the strings to pandas Timestamps, then stores them in a new index object

DatetimeIndex(['2016-01-02', '2016-04-12', '2009-09-07'], dtype='datetime64[ns]', freq=None)

In [14]:
type(pd.DatetimeIndex(dates))

pandas.core.indexes.datetimes.DatetimeIndex

In [15]:
dates = [dt.date(2016, 1, 10), dt.date(1994, 6, 13), dt.date(2003, 12, 29)]
dtIndex = pd.DatetimeIndex(dates)

In [16]:
# Let's create a series
values = [100, 200, 300]
pd.Series(data = values, index = dtIndex)

2016-01-10    100
1994-06-13    200
2003-12-29    300
dtype: int64
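One reason a DatetimeIndex is useful as an index: rows can be selected with date strings. The sketch below rebuilds the same three-row Series and sorts it first, since slicing by date range is most predictable on a sorted index.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2016-01-10", "1994-06-13", "2003-12-29"])
s = pd.Series([100, 200, 300], index=dates).sort_index()

# A partial date string selects every row that falls inside that period.
only_2016 = s.loc["2016"]

# A label slice on dates includes both end points.
up_to_2005 = s.loc["1990":"2005"]

print(only_2016)
print(up_to_2005)
```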

## The pd.to_datetime() Method
A convenient function for converting an existing object (string, date, datetime, list, Series, etc.) into a pandas time-related object. Unlike pd.Timestamp, pd.to_datetime also accepts collections: pass it a list and it returns a DatetimeIndex. **The most common use case is passing an existing Series of strings and getting back a Series of datetime64 values (Timestamps).**
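When the strings all share one layout, passing an explicit format removes any guessing. A minimal sketch, with made-up input strings, showing how the same text can mean two different dates depending on the format given:

```python
import pandas as pd

ambiguous = "03/04/2015"

# Month first (US style) versus day first (European style).
as_us = pd.to_datetime(ambiguous, format="%m/%d/%Y")
as_eu = pd.to_datetime(ambiguous, format="%d/%m/%Y")

# The common case: convert a whole Series of strings in one call.
col = pd.to_datetime(pd.Series(["2015-01-03", "2014-02-08"]), format="%Y-%m-%d")

print(as_us, as_eu)
print(col.dtype)
```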

In [17]:
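pd.to_datetime("2001-04-19") # returns a Timestamp
pd.to_datetime(dt.date(2015, 1, 1)) # Timestamp
pd.to_datetime(dt.datetime(2015, 1, 1, 14, 25, 20)) # Timestamp
# A list returns a DatetimeIndex. Newer pandas versions (2.0+) may need format = "mixed" for a list with mixed layouts like this one
pd.to_datetime(["2015-01-03", "2014/02/08", "2016", "July 4th, 1996"])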

DatetimeIndex(['2015-01-03', '2014-02-08', '2016-01-01', '1996-07-04'], dtype='datetime64[ns]', freq=None)

In [18]:
# defaults to a string Series
times = pd.Series(data = ["2015-01-03", "2014/02/08", "2016", "July 4th, 1996"])
times

0        2015-01-03
1        2014/02/08
2              2016
3    July 4th, 1996
dtype: object

In [19]:
pd.to_datetime(arg = times) # dtype is now datetime64. Normalizes structure

0   2015-01-03
1   2014-02-08
2   2016-01-01
3   1996-07-04
dtype: datetime64[ns]

In [20]:
# if data is bad, need to be careful
dates2 = pd.Series(data = ["July 4th, 1996", "10/4/1991", "Hello", "2015-02-31"])
dates2

0    July 4th, 1996
1         10/4/1991
2             Hello
3        2015-02-31
dtype: object

In [21]:
# errors = "coerce" turns values that can't be parsed into NaT. The default, "raise", throws an exception instead
pd.to_datetime(dates2, errors = "coerce")

0   1996-07-04
1   1991-10-04
2          NaT
3          NaT
dtype: datetime64[ns]

In [22]:
# Working with UNIX time, which is a way to store time in seconds
# from/since 1970-01-01
pd.to_datetime(arg = [1349720105, 1349806505, 1349979305, 1349892905, 1350065705], unit= "s")

DatetimeIndex(['2012-10-08 18:15:05', '2012-10-09 18:15:05',
               '2012-10-11 18:15:05', '2012-10-10 18:15:05',
               '2012-10-12 18:15:05'],
              dtype='datetime64[ns]', freq=None)

In [23]:
# How to filter on NaN or NaT?
dates2 = pd.to_datetime(arg = dates2, errors = "coerce")
dates2

0   1996-07-04
1   1991-10-04
2          NaT
3          NaT
dtype: datetime64[ns]

In [24]:
dates_df = pd.DataFrame(data = dates2)

In [25]:
dates_df

,0
0,1996-07-04
1,1991-10-04
2,NaT
3,NaT


In [26]:
dates_df.dtypes
dates_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 1 columns):
0    2 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 112.0 bytes


In [27]:
dates_df[0].isnull()

0    False
1    False
2     True
3     True
Name: 0, dtype: bool
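To actually filter on NaT, the boolean mask can be used directly, or dropna() can do it in one step. A short sketch with its own small Series (ISO-formatted strings are used here so the parse behaves the same across pandas versions):

```python
import pandas as pd

raw = pd.Series(["1996-07-04", "1991-10-04", "Hello", "2015-02-31"])
parsed = pd.to_datetime(raw, errors="coerce")  # bad values become NaT

# Keep only the rows that parsed successfully ...
clean = parsed[parsed.notna()]

# ... or, equivalently, drop the missing ones.
also_clean = parsed.dropna()

# And the reverse: look at the original strings that failed to parse.
failed = raw[parsed.isna()]
print(clean)
print(failed)
```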

## Create Range of Dates with the pd.date_range() Method, Part 1
Returns a DatetimeIndex.

In [28]:
times = pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "D")

In [29]:
type(times)

pandas.core.indexes.datetimes.DatetimeIndex

In [30]:
times[0]

Timestamp('2016-01-01 00:00:00', freq='D')

In [31]:
# freq = "D", "1D", "2D" (two days), "H" hour, "6H", "B" business days, "W" (same as "W-SUN"), "W-FRI", etc.
# "M" month end, "MS" month start, "A" annual (same as "A-DEC"), "A-SEP", etc.
pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "D")
pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "B")
pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "W")
pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "W-FRI")
pd.date_range(start = "2016-01-01", end = "2016-01-10", freq = "6H")
pd.date_range(start = "2016-01-01", end = "2016-12-31", freq = "M")
pd.date_range(start = "2016-01-01", end = "2016-12-31", freq = "MS")
pd.date_range(start = "2016-01-01", end = "2050-12-31", freq = "A-SEP")

DatetimeIndex(['2016-09-30', '2017-09-30', '2018-09-30', '2019-09-30',
               '2020-09-30', '2021-09-30', '2022-09-30', '2023-09-30',
               '2024-09-30', '2025-09-30', '2026-09-30', '2027-09-30',
               '2028-09-30', '2029-09-30', '2030-09-30', '2031-09-30',
               '2032-09-30', '2033-09-30', '2034-09-30', '2035-09-30',
               '2036-09-30', '2037-09-30', '2038-09-30', '2039-09-30',
               '2040-09-30', '2041-09-30', '2042-09-30', '2043-09-30',
               '2044-09-30', '2045-09-30', '2046-09-30', '2047-09-30',
               '2048-09-30', '2049-09-30', '2050-09-30'],
              dtype='datetime64[ns]', freq='A-SEP')
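A quick way to get a feel for the frequency aliases is to count how many dates each one generates over the same window. This sketch sticks to aliases that, as far as I know, are spelled the same in old and new pandas releases; recent releases (2.2 and later) rename some of the others, for example "M" to "ME" and "H" to "h".

```python
import pandas as pd

start, end = "2016-01-01", "2016-01-10"  # 2016-01-01 was a Friday

daily = pd.date_range(start=start, end=end, freq="D")
business = pd.date_range(start=start, end=end, freq="B")     # Monday to Friday only
fridays = pd.date_range(start=start, end=end, freq="W-FRI")  # one date per week, on Friday
month_starts = pd.date_range(start="2016-01-01", end="2016-12-31", freq="MS")

for name, rng in [("D", daily), ("B", business), ("W-FRI", fridays), ("MS", month_starts)]:
    print(name, len(rng))
```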

## Create Range of Dates with the pd.date_range() Method, Part 2

In [32]:
# start and periods (the number of results/timestamps we want to generate)
pd.date_range(start = "2012-09-12", periods = 25, freq = "D")
pd.date_range(start = "2012-09-09", periods = 50, freq = "B") # business days
pd.date_range(start = "19810529", periods = 200, freq = "M")
pd.date_range(start = "2012-09-09", periods = 50, freq = "W-Tue")
pd.date_range(start = "2012-09-12", periods = 50, freq = "6H")
pd.date_range(start = "1981-05-29", end = "2018-04-21", freq = "D")

DatetimeIndex(['1981-05-29', '1981-05-30', '1981-05-31', '1981-06-01',
               '1981-06-02', '1981-06-03', '1981-06-04', '1981-06-05',
               '1981-06-06', '1981-06-07',
               ...
               '2018-04-12', '2018-04-13', '2018-04-14', '2018-04-15',
               '2018-04-16', '2018-04-17', '2018-04-18', '2018-04-19',
               '2018-04-20', '2018-04-21'],
              dtype='datetime64[ns]', length=13477, freq='D')

## Create Range of Dates with the pd.date_range() Method, Part 3

In [33]:
pd.date_range(end = "1999-12-31", periods = 40, freq = "D")
pd.date_range(end = "2018-04-20", start = "2000-01-01", freq = "W-SUN")
pd.date_range(end = "2018-04-20", periods = 50, freq = "MS")
pd.date_range(end = "2018-04-20", periods = 100, freq = "8H")

DatetimeIndex(['2018-03-18 00:00:00', '2018-03-18 08:00:00',
               '2018-03-18 16:00:00', '2018-03-19 00:00:00',
               '2018-03-19 08:00:00', '2018-03-19 16:00:00',
               '2018-03-20 00:00:00', '2018-03-20 08:00:00',
               '2018-03-20 16:00:00', '2018-03-21 00:00:00',
               '2018-03-21 08:00:00', '2018-03-21 16:00:00',
               '2018-03-22 00:00:00', '2018-03-22 08:00:00',
               '2018-03-22 16:00:00', '2018-03-23 00:00:00',
               '2018-03-23 08:00:00', '2018-03-23 16:00:00',
               '2018-03-24 00:00:00', '2018-03-24 08:00:00',
               '2018-03-24 16:00:00', '2018-03-25 00:00:00',
               '2018-03-25 08:00:00', '2018-03-25 16:00:00',
               '2018-03-26 00:00:00', '2018-03-26 08:00:00',
               '2018-03-26 16:00:00', '2018-03-27 00:00:00',
               '2018-03-27 08:00:00', '2018-03-27 16:00:00',
               '2018-03-28 00:00:00', '2018-03-28 08:00:00',
               '2018-03-

## The .dt Accessor
Works like the .str accessor for strings (.str.xxxxx): on a Series of datetimes, the date attributes and methods are reached through .dt.
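A side-by-side sketch of the two accessors, using two tiny made-up Series:

```python
import pandas as pd

names = pd.Series(["alice", "bob"])
dates = pd.Series(pd.to_datetime(["2000-01-01", "2000-01-25"]))

upper = names.str.upper()        # string methods live under .str
days = dates.dt.day              # datetime attributes live under .dt
day_names = dates.dt.day_name()  # some are methods, so they need parentheses

print(upper.tolist(), days.tolist(), day_names.tolist())
```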

In [34]:
# Create a DTIndex
bunch_of_dates = pd.date_range(start = "2000-01-01", end = "2010-12-31", freq = "24D")
bunch_of_dates

DatetimeIndex(['2000-01-01', '2000-01-25', '2000-02-18', '2000-03-13',
               '2000-04-06', '2000-04-30', '2000-05-24', '2000-06-17',
               '2000-07-11', '2000-08-04',
               ...
               '2010-05-20', '2010-06-13', '2010-07-07', '2010-07-31',
               '2010-08-24', '2010-09-17', '2010-10-11', '2010-11-04',
               '2010-11-28', '2010-12-22'],
              dtype='datetime64[ns]', length=168, freq='24D')

In [35]:
# Next, create a pd Series of those dates. Each one is a pandas TS
s = pd.Series(data = bunch_of_dates)
s.head()

0   2000-01-01
1   2000-01-25
2   2000-02-18
3   2000-03-13
4   2000-04-06
dtype: datetime64[ns]

In [37]:
# Want to extract the day. Can't just do s.day. Need s.dt.day
s.dt.day  # returns a new Series
s.dt.month  # can see which month is most common
s.dt.dayofweek  # Maybe what day of week a stock performs best, or tours occur, etc.
s.dt.day_name()  # replaces s.dt.weekday_name, which was removed in pandas 1.0

0       Saturday
1        Tuesday
2         Friday
3         Monday
4       Thursday
5         Sunday
6      Wednesday
7       Saturday
8        Tuesday
9         Friday
10        Monday
11      Thursday
12        Sunday
13     Wednesday
14      Saturday
15       Tuesday
16        Friday
17        Monday
18      Thursday
19        Sunday
20     Wednesday
21      Saturday
22       Tuesday
23        Friday
24        Monday
25      Thursday
26        Sunday
27     Wednesday
28      Saturday
29       Tuesday
         ...    
138       Sunday
139    Wednesday
140     Saturday
141      Tuesday
142       Friday
143       Monday
144     Thursday
145       Sunday
146    Wednesday
147     Saturday
148      Tuesday
149       Friday
150       Monday
151     Thursday
152       Sunday
153    Wednesday
154     Saturday
155      Tuesday
156       Friday
157       Monday
158     Thursday
159       Sunday
160    Wednesday
161     Saturday
162      Tuesday
163       Friday
164       Monday
165     Thursd

In [42]:
mask_quarter = s.dt.is_quarter_start
s[mask_quarter]

0     2000-01-01
19    2001-04-01
38    2002-07-01
137   2009-01-01
dtype: datetime64[ns]

In [45]:
mask_month = s.dt.is_month_start
s[mask_month]

0     2000-01-01
19    2001-04-01
38    2002-07-01
104   2006-11-01
109   2007-03-01
137   2009-01-01
142   2009-05-01
dtype: datetime64[ns]

## The pandas-datareader Library
Lets us fetch datasets from the internet by querying online data sources. It used to be part of pandas, but it grew large enough that it was split off into a separately developed package. It can pull from sources such as FRED, World Bank, Morningstar, OECD Statistics, Eurostat, etc. The set of supported sources changes between releases, so check the documentation for the version you install.

Install instructions: open a terminal, run source activate root, then run conda install pandas-datareader

Documentation: https://pandas-datareader.readthedocs.io/en/latest/remote_data.html

## Import financial dataset with pandas_datareader Library

In [58]:
import pandas as pd
import datetime as dt
import pandas_datareader.data as web  # we can use web.xxxx 
# import pandas_datareader as pdr

In [71]:
# DataReader params: name = the stock symbol of the company to look up
# data_source = the provider (Yahoo, Google, etc.); start/end = date range
company = "TCEHY" # BABA, MSFT
start = "2010-01-01"
end = "2018-12-31"  # Future dates are fine

# Returns a DataFrame; with this source the index is a (Symbol, Date) MultiIndex
stocks_tcehy = web.DataReader(name = company, data_source = "morningstar", start = start, end = end)
stocks_tcehy.head()
stocks_tcehy.dtypes
type(stocks_tcehy)

pandas.core.frame.DataFrame

In [63]:
stocks_tcehy.values

array([[4.344000e+00, 4.366000e+00, 4.320000e+00, 4.320000e+00,
        0.000000e+00],
       [4.416000e+00, 4.416000e+00, 4.366000e+00, 4.366000e+00,
        6.148000e+04],
       [4.452000e+00, 4.480000e+00, 4.410000e+00, 4.474000e+00,
        7.569000e+04],
       ...,
       [5.147000e+01, 5.174000e+01, 5.106000e+01, 5.129000e+01,
        3.168302e+06],
       [5.139000e+01, 5.174000e+01, 5.111000e+01, 5.173000e+01,
        1.996679e+06],
       [5.051000e+01, 5.106000e+01, 5.036000e+01, 5.104000e+01,
        2.961017e+06]])

In [64]:
stocks_tcehy.columns

Index(['Close', 'High', 'Low', 'Open', 'Volume'], dtype='object')

In [68]:
stocks_tcehy.index

MultiIndex(levels=[['TCEHY'], [2010-01-01 00:00:00, 2010-01-04 00:00:00, 2010-01-05 00:00:00, 2010-01-06 00:00:00, 2010-01-07 00:00:00, 2010-01-08 00:00:00, 2010-01-11 00:00:00, 2010-01-12 00:00:00, 2010-01-13 00:00:00, 2010-01-14 00:00:00, 2010-01-15 00:00:00, 2010-01-18 00:00:00, 2010-01-19 00:00:00, 2010-01-20 00:00:00, 2010-01-21 00:00:00, 2010-01-22 00:00:00, 2010-01-25 00:00:00, 2010-01-26 00:00:00, 2010-01-27 00:00:00, 2010-01-28 00:00:00, 2010-01-29 00:00:00, 2010-02-01 00:00:00, 2010-02-02 00:00:00, 2010-02-03 00:00:00, 2010-02-04 00:00:00, 2010-02-05 00:00:00, 2010-02-08 00:00:00, 2010-02-09 00:00:00, 2010-02-10 00:00:00, 2010-02-11 00:00:00, 2010-02-12 00:00:00, 2010-02-15 00:00:00, 2010-02-16 00:00:00, 2010-02-17 00:00:00, 2010-02-18 00:00:00, 2010-02-19 00:00:00, 2010-02-22 00:00:00, 2010-02-23 00:00:00, 2010-02-24 00:00:00, 2010-02-25 00:00:00, 2010-02-26 00:00:00, 2010-03-01 00:00:00, 2010-03-02 00:00:00, 2010-03-03 00:00:00, 2010-03-04 00:00:00, 2010-03-05 00:00:00, 201

In [69]:
stocks_tcehy.axes

[MultiIndex(levels=[['TCEHY'], [2010-01-01 00:00:00, 2010-01-04 00:00:00, 2010-01-05 00:00:00, 2010-01-06 00:00:00, 2010-01-07 00:00:00, 2010-01-08 00:00:00, 2010-01-11 00:00:00, 2010-01-12 00:00:00, 2010-01-13 00:00:00, 2010-01-14 00:00:00, 2010-01-15 00:00:00, 2010-01-18 00:00:00, 2010-01-19 00:00:00, 2010-01-20 00:00:00, 2010-01-21 00:00:00, 2010-01-22 00:00:00, 2010-01-25 00:00:00, 2010-01-26 00:00:00, 2010-01-27 00:00:00, 2010-01-28 00:00:00, 2010-01-29 00:00:00, 2010-02-01 00:00:00, 2010-02-02 00:00:00, 2010-02-03 00:00:00, 2010-02-04 00:00:00, 2010-02-05 00:00:00, 2010-02-08 00:00:00, 2010-02-09 00:00:00, 2010-02-10 00:00:00, 2010-02-11 00:00:00, 2010-02-12 00:00:00, 2010-02-15 00:00:00, 2010-02-16 00:00:00, 2010-02-17 00:00:00, 2010-02-18 00:00:00, 2010-02-19 00:00:00, 2010-02-22 00:00:00, 2010-02-23 00:00:00, 2010-02-24 00:00:00, 2010-02-25 00:00:00, 2010-02-26 00:00:00, 2010-03-01 00:00:00, 2010-03-02 00:00:00, 2010-03-03 00:00:00, 2010-03-04 00:00:00, 2010-03-05 00:00:00, 20

## Selecting from a DataFrame with a DatetimeIndex

In [82]:
stocks_tcehy = web.DataReader(name = company, data_source = "morningstar", start = start, end = end)
stocks_tcehy.head()

Symbol,Date,Close,High,Low,Open,Volume
TCEHY,2010-01-01,4.344,4.366,4.32,4.32,0
TCEHY,2010-01-04,4.416,4.416,4.366,4.366,61480
TCEHY,2010-01-05,4.452,4.48,4.41,4.474,75690
TCEHY,2010-01-06,4.482,4.482,4.47,4.48,56010
TCEHY,2010-01-07,4.3,4.3,4.254,4.298,49355


In [89]:
stocks_tcehy.iloc[3]

Close         4.482
High          4.482
Low           4.470
Open          4.480
Volume    56010.000
Name: (TCEHY, 2010-01-06 00:00:00), dtype: float64

In [90]:
stocks_tcehy.index.names

FrozenList(['Symbol', 'Date'])

In [91]:
type(stocks_tcehy.index)

pandas.core.indexes.multi.MultiIndex

In [97]:
stocks_tcehy.index.get_level_values(1)  # only called on MI objects

DatetimeIndex(['2010-01-01', '2010-01-04', '2010-01-05', '2010-01-06',
               '2010-01-07', '2010-01-08', '2010-01-11', '2010-01-12',
               '2010-01-13', '2010-01-14',
               ...
               '2018-04-09', '2018-04-10', '2018-04-11', '2018-04-12',
               '2018-04-13', '2018-04-16', '2018-04-17', '2018-04-18',
               '2018-04-19', '2018-04-20'],
              dtype='datetime64[ns]', name='Date', length=2166, freq=None)

In [111]:
# Using .loc[] while Symbol is still part of the MI needs a tuple with both levels
# The line below no longer works here because I dropped Symbol with inplace = True
# stocks_tcehy.loc[("TCEHY", "2010-01-06"), "High"]

### Had to remove "Symbol" from the index

In [106]:
stocks_tcehy.to_numpy()  # get_values() was removed in pandas 1.0; use to_numpy() or .values
stocks_tcehy.reset_index(level = "Symbol", drop = True, inplace = True)

In [107]:
stocks_tcehy.head()

Date,Close,High,Low,Open,Volume
2010-01-01,4.344,4.366,4.32,4.32,0
2010-01-04,4.416,4.416,4.366,4.366,61480
2010-01-05,4.452,4.48,4.41,4.474,75690
2010-01-06,4.482,4.482,4.47,4.48,56010
2010-01-07,4.3,4.3,4.254,4.298,49355


In [120]:
# Now that Symbol is no longer in the index, I can use .loc with a date string
stocks_tcehy.loc["2010-01-06"] # returns a Series
stocks_tcehy.iloc[5:300].max()
stocks_tcehy.iloc[:500].max()
stocks_tcehy.loc["2016-01-01"]  # .ix is deprecated: use .loc for labels and .iloc for positions


Close     19.62
High      19.62
Low       19.62
Open      19.62
Volume     0.00
Name: 2016-01-01 00:00:00, dtype: float64

In [145]:
stocks_tcehy.loc["2012-01-01" : "2012-12-31"]

Date,Close,High,Low,Open,Volume
2012-01-02,4.0280,4.028,4.028,4.028,0
2012-01-03,4.1040,4.156,4.078,4.078,635555
2012-01-04,4.0860,4.120,4.016,4.016,641255
2012-01-05,4.0620,4.078,4.024,4.044,182805
2012-01-06,3.9900,4.032,3.958,3.958,166165
2012-01-09,4.0240,4.026,3.958,3.958,582665
2012-01-10,4.1700,4.188,4.142,4.142,181595
2012-01-11,4.3200,4.338,4.256,4.256,91285
2012-01-12,4.3400,4.358,4.306,4.358,326820
2012-01-13,4.3480,4.398,4.322,4.326,466600


In [124]:
# pd.DateOffset() allows for a custom frequency
# Challenge: What was the stock price on my birthday, year over year?
# First create a date range with my birthday in every year
birthdays = pd.date_range(start = "1981-05-29", end = "2018-12-31", 
             freq = pd.DateOffset(years = 1))

In [131]:
# Then check which dates in the stocks index fall on a birthday, using .isin()
mask = stocks_tcehy.index.isin(values = birthdays)  # returns a boolean array

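### Difference between using .loc[] versus just [] to filter
With a boolean mask, [] and .loc[] select the same rows, so for reading the two are interchangeable. The difference shows up when assigning: df.loc[mask, column] = value changes the original DataFrame in a single step, while df[mask][column] = value assigns to a temporary copy and can trigger a SettingWithCopyWarning. That is why .loc[] is the safer habit.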
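A small sketch of the comparison on a made-up three-row price table (the values are placeholders, not real quotes):

```python
import pandas as pd

idx = pd.to_datetime(["2012-05-28", "2012-05-29", "2012-05-30"])
prices = pd.DataFrame({"Close": [5.50, 5.536, 5.60]}, index=idx)
mask = prices.index.isin(pd.to_datetime(["2012-05-29"]))

# For reading, both forms select the same rows.
same = prices[mask].equals(prices.loc[mask])

# For writing, .loc takes the row mask and the column together,
# so the assignment lands on the original frame.
prices.loc[mask, "Close"] = 0.0

print(same)
print(prices)
```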

In [138]:
# My attempt
# type(stocks_tcehy[mask])
stocks_tcehy[mask]

Date,Close,High,Low,Open,Volume
2012-05-29,5.536,5.574,5.51,5.512,140700
2013-05-29,7.786,7.846,7.754,7.77,164220
2014-05-29,14.25,14.3,14.1,14.255,191854
2015-05-29,19.96,20.15,19.93,19.995,248755
2017-05-29,35.565,35.565,35.565,35.565,0


In [140]:
# Teacher's method
# type(stocks_tcehy.loc[mask])
stocks_tcehy.loc[mask]

Date,Close,High,Low,Open,Volume
2012-05-29,5.536,5.574,5.51,5.512,140700
2013-05-29,7.786,7.846,7.754,7.77,164220
2014-05-29,14.25,14.3,14.1,14.255,191854
2015-05-29,19.96,20.15,19.93,19.995,248755
2017-05-29,35.565,35.565,35.565,35.565,0


## Timestamp Object Attributes

In [175]:
# DataReader params: name = the stock symbol of the company to look up
# data_source = the provider (Yahoo, Google, etc.); start/end = date range
company = "TCEHY" # BABA, MSFT
start = "2010-01-01"
end = "2018-12-31"  # Future dates are fine

# Returns a DataFrame; the next line drops Symbol so that only the DatetimeIndex remains
stocks_tcehy = web.DataReader(name = company, data_source = "morningstar", start = start, end = end)
stocks_tcehy.reset_index(level = "Symbol", drop = True, inplace = True)
stocks_tcehy.head()

Date,Close,High,Low,Open,Volume
2010-01-01,4.344,4.366,4.32,4.32,0
2010-01-04,4.416,4.416,4.366,4.366,61480
2010-01-05,4.452,4.48,4.41,4.474,75690
2010-01-06,4.482,4.482,4.47,4.48,56010
2010-01-07,4.3,4.3,4.254,4.298,49355


In [176]:
someday = stocks_tcehy.index[500]
someday # now a TS object

Timestamp('2011-12-02 00:00:00')

In [177]:
someday.day
someday.month
someday.year
someday.day_name()  # replaces .weekday_name, which was removed in pandas 1.0
someday.is_month_end  # bool
someday.is_month_start # bool

False

In [181]:
# Let's use these attributes to modify the original DF
# Add a new column that stores the weekday name of each date
# loc = position of the new column; value = the values that populate it.
# stocks_tcehy.index.day_name() looks at every Timestamp in the index and
# returns the weekday names, which insert() then lines up row by row.
# (.weekday_name was removed in pandas 1.0; day_name() replaces it.)
# Note: running this cell a second time raises the ValueError shown below,
# because the column already exists after the first run.

stocks_tcehy.columns
stocks_tcehy.insert(loc = 0, column = "Day of Week", value = stocks_tcehy.index.day_name())

ValueError: cannot insert Day of Week, already exists

In [182]:
stocks_tcehy.head()

Date,Day of Week,Close,High,Low,Open,Volume
2010-01-01,Friday,4.344,4.366,4.32,4.32,0
2010-01-04,Monday,4.416,4.416,4.366,4.366,61480
2010-01-05,Tuesday,4.452,4.48,4.41,4.474,75690
2010-01-06,Wednesday,4.482,4.482,4.47,4.48,56010
2010-01-07,Thursday,4.3,4.3,4.254,4.298,49355


In [183]:
# let's add another column to the right of "Day of Week" (loc = 1).
# Let's call it "Is Start of Month"
stocks_tcehy.insert(loc = 1, column = "Is Start of Month", value = stocks_tcehy.index.is_month_start)


In [186]:
# Want to see the values for those that are on the start of a month
stocks_tcehy["Is Start of Month"]
stocks_tcehy[stocks_tcehy["Is Start of Month"]]

Date,Day of Week,Is Start of Month,Close,High,Low,Open,Volume
2010-01-01,Friday,True,4.3440,4.366,4.320,4.320,0
2010-02-01,Monday,True,3.6600,3.692,3.660,3.678,37825
2010-03-01,Monday,True,3.9720,4.000,3.800,3.800,27950
2010-04-01,Thursday,True,3.9860,4.006,3.800,3.800,56675
2010-06-01,Tuesday,True,3.8060,3.884,3.800,3.800,133505
2010-07-01,Thursday,True,3.3400,3.340,3.262,3.320,254029
2010-09-01,Wednesday,True,3.7200,3.800,3.652,3.800,50190
2010-10-01,Friday,True,4.3720,4.400,4.360,4.400,67890
2010-11-01,Monday,True,4.8100,4.840,4.780,4.794,536620
2010-12-01,Wednesday,True,4.5220,4.566,4.474,4.510,192640


## The .truncate() Method
Convenient method to filter rows from a Series/DataFrame by index range. Typically used on objects with a sorted DatetimeIndex. Can be called on Series and DataFrames. It takes two bounds, before and after: rows before the first and after the second are dropped, and both bounds themselves are kept.
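A minimal sketch on synthetic data (ten consecutive days with placeholder values) showing that truncate() and a .loc[] slice with the same bounds give the same rows:

```python
import pandas as pd

idx = pd.date_range(start="2012-06-01", periods=10, freq="D")
df = pd.DataFrame({"Close": range(10)}, index=idx)

# Keep everything from June 3 through June 6, inclusive on both ends.
truncated = df.truncate(before="2012-06-03", after="2012-06-06")

# The equivalent label slice.
sliced = df.loc["2012-06-03":"2012-06-06"]

print(truncated)
```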

In [187]:
# DataReader params: name = the stock symbol of the company to look up
# data_source = the provider (Yahoo, Google, etc.); start/end = date range
company = "TCEHY" # BABA, MSFT
start = "2010-01-01"
end = "2018-12-31"  # Future dates are fine

# Returns a DataFrame; the next line drops Symbol so that only the DatetimeIndex remains
stocks_tcehy = web.DataReader(name = company, data_source = "morningstar", start = start, end = end)
stocks_tcehy.reset_index(level = "Symbol", drop = True, inplace = True)
# stocks_tcehy.sort_index()
stocks_tcehy.head()

Date,Close,High,Low,Open,Volume
2010-01-01,4.344,4.366,4.32,4.32,0
2010-01-04,4.416,4.416,4.366,4.366,61480
2010-01-05,4.452,4.48,4.41,4.474,75690
2010-01-06,4.482,4.482,4.47,4.48,56010
2010-01-07,4.3,4.3,4.254,4.298,49355


In [189]:
stocks_tcehy.truncate(before = "2011-02-05", after = "2011-02-28")
stocks_tcehy.truncate(before = "2012-06-07", after = "2013-02-28")

Date,Close,High,Low,Open,Volume
2012-06-07,5.7260,5.8140,5.6300,5.6300,935875
2012-06-08,5.7240,5.7340,5.6560,5.6560,148125
2012-06-11,5.8240,5.9410,5.8240,5.9200,182120
2012-06-12,5.8520,5.8780,5.7700,5.8780,104725
2012-06-13,5.7960,5.8380,5.7960,5.8200,166620
2012-06-14,5.8340,5.8612,5.7600,5.7600,209515
2012-06-15,5.9400,5.9700,5.8840,5.8840,61280
2012-06-18,6.0100,6.0800,5.9800,6.0800,113960
2012-06-19,6.1440,6.2020,6.0940,6.0940,244200
2012-06-20,6.1700,6.1700,6.1200,6.1200,215710


In [191]:
# Another way to pull the list
stocks_tcehy.loc["2012-06-07" : "2013-02-28"]

Date,Close,High,Low,Open,Volume
2012-06-07,5.7260,5.8140,5.6300,5.6300,935875
2012-06-08,5.7240,5.7340,5.6560,5.6560,148125
2012-06-11,5.8240,5.9410,5.8240,5.9200,182120
2012-06-12,5.8520,5.8780,5.7700,5.8780,104725
2012-06-13,5.7960,5.8380,5.7960,5.8200,166620
2012-06-14,5.8340,5.8612,5.7600,5.7600,209515
2012-06-15,5.9400,5.9700,5.8840,5.8840,61280
2012-06-18,6.0100,6.0800,5.9800,6.0800,113960
2012-06-19,6.1440,6.2020,6.0940,6.0940,244200
2012-06-20,6.1700,6.1700,6.1200,6.1200,215710


## The pd.DateOffset Objects
Ways to modify existing dates. So far we have retrieved dates but haven't changed them: add days, subtract weeks, add years, etc. dt.datetime.now() gives the current local time; it also accepts an optional tz argument if you need a specific timezone. pd.DateOffset takes days, months, years, etc. as keyword arguments, e.g. pd.DateOffset(days = 5).
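A short sketch on a single Timestamp showing how DateOffset does calendar arithmetic, next to a fixed-length pd.Timedelta for comparison. The date is chosen arbitrarily to land on a month end in a leap year.

```python
import pandas as pd

ts = pd.Timestamp("2016-01-31")

plus_days = ts + pd.DateOffset(days=5)     # plain day arithmetic
plus_month = ts + pd.DateOffset(months=1)  # calendar-aware: clipped to the last valid day of February
minus_year = ts - pd.DateOffset(years=1)   # same month and day, one year earlier
fixed = ts + pd.Timedelta(days=30)         # always exactly 30 days, whatever the month length

print(plus_days, plus_month, minus_year, fixed)
```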

In [206]:
stocks_goog = web.DataReader(name = "GOOG", data_source = "morningstar", 
              start = dt.date(2000, 1, 1), end = dt.datetime.now())
stocks_goog.reset_index(level = 0, inplace = True)

stocks_goog.head()

Date,Symbol,Close,High,Low,Open,Volume
2014-03-27,GOOG,556.9312,566.4451,551.4064,566.4451,13087
2014-03-28,GOOG,558.457,564.8794,557.1406,559.7136,41115
2014-03-31,GOOG,555.4453,565.4478,555.4054,565.3381,10801
2014-04-01,GOOG,565.6074,566.8939,557.1805,557.1805,7953
2014-04-02,GOOG,565.4478,603.1743,560.651,579.1702,147099


In [208]:
stocks_goog.index  # Have to make sure DF isn't MI, otherwise will return MI instead of DTI

DatetimeIndex(['2014-03-27', '2014-03-28', '2014-03-31', '2014-04-01',
               '2014-04-02', '2014-04-03', '2014-04-04', '2014-04-07',
               '2014-04-08', '2014-04-09',
               ...
               '2018-04-12', '2018-04-13', '2018-04-16', '2018-04-17',
               '2018-04-18', '2018-04-19', '2018-04-20', '2018-04-23',
               '2018-04-24', '2018-04-25'],
              dtype='datetime64[ns]', name='Date', length=1065, freq=None)

In [210]:
# Want to take current dates and add 5 days to each
stocks_goog.index + pd.DateOffset(days = 5)

DatetimeIndex(['2014-04-01', '2014-04-02', '2014-04-05', '2014-04-06',
               '2014-04-07', '2014-04-08', '2014-04-09', '2014-04-12',
               '2014-04-13', '2014-04-14',
               ...
               '2018-04-17', '2018-04-18', '2018-04-21', '2018-04-22',
               '2018-04-23', '2018-04-24', '2018-04-25', '2018-04-28',
               '2018-04-29', '2018-04-30'],
              dtype='datetime64[ns]', name='Date', length=1065, freq=None)

In [221]:
stocks_baba = web.DataReader(name = "BABA", data_source = "morningstar",
                            start = dt.date(2000, 1, 1), end = dt.datetime.now())
stocks_baba.reset_index(level = 0, inplace = True)
stocks_baba.sort_index(inplace = True)  # DataReader sorts index by default FYI

In [223]:
stocks_baba.head()

Date,Symbol,Close,High,Low,Open,Volume
2014-09-19,BABA,93.89,99.7,89.95,92.7,271879435
2014-09-22,BABA,89.89,92.95,89.5,92.7,66657827
2014-09-23,BABA,87.17,90.48,86.62,88.94,39009788
2014-09-24,BABA,90.57,90.57,87.22,88.47,32088108
2014-09-25,BABA,88.92,91.5,88.5,91.09,28597506


In [235]:
# Want to add or subtract weeks or years. Doesn't have inplace.
stocks_baba.index + pd.DateOffset(weeks = 2)
stocks_baba.index - pd.DateOffset(years = 1)

# Can even add time
stocks_baba.index + pd.DateOffset(hours = 6)

# Can do multiple params
stocks_baba.index - pd.DateOffset(years = 1, months = 3, days = 10, hours = 3, minutes = 30)

DatetimeIndex(['2013-06-08 20:30:00', '2013-06-11 20:30:00',
               '2013-06-12 20:30:00', '2013-06-13 20:30:00',
               '2013-06-14 20:30:00', '2013-06-15 20:30:00',
               '2013-06-18 20:30:00', '2013-06-19 20:30:00',
               '2013-06-20 20:30:00', '2013-06-21 20:30:00',
               ...
               '2017-01-05 20:30:00', '2017-01-06 20:30:00',
               '2017-01-07 20:30:00', '2017-01-08 20:30:00',
               '2017-01-09 20:30:00', '2017-01-12 20:30:00',
               '2017-01-13 20:30:00', '2017-01-14 20:30:00',
               '2017-01-15 20:30:00', '2017-01-16 20:30:00'],
              dtype='datetime64[ns]', name='Date', length=941, freq=None)

## More fun with pd.DateOffset Objects
What if we wanted to round dates to the nearest end/start of a month or quarter?
That needs a different set of DateOffsets, which live in the pandas.tseries.offsets module.

pd.tseries.offsets.MonthEnd(), .MonthBegin(), etc.

Adding (+) moves each date forward to the next month end.
Ex. 2000-01-03 + MonthEnd() gives 2000-01-31.
Subtracting (-) moves each date back to the previous month end.
Ex. 2000-01-03 - MonthEnd() gives 1999-12-31.
If the original date is already on a month end, + jumps to the end of the following month.
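The rules above can be checked on single Timestamps. This sketch also uses rollforward(), which (unlike +) leaves a date alone when it already sits on a month end; that is one option if the jump to the following month is not wanted.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

mid = pd.Timestamp("2000-01-03")   # somewhere inside January
edge = pd.Timestamp("2000-01-31")  # already a month end

next_end = mid + MonthEnd()            # forward to the end of January
prev_end = mid - MonthEnd()            # back to the end of December
jumped = edge + MonthEnd()             # already on a month end, so it moves to the end of February
stayed = MonthEnd().rollforward(edge)  # rollforward keeps a date that is already on the offset

print(next_end, prev_end, jumped, stayed)
```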

### An extra import statement makes the syntax shorter
from pandas.tseries.offsets import *

In [254]:
import pandas as pd
import datetime as dt
import pandas_datareader.data as web
from pandas.tseries.offsets import *  # Now can call MonthEnd(), MonthBegin() directly

In [241]:
stocks_amzn = web.DataReader(name = "AMZN", data_source = "morningstar",
               start = dt.datetime(2000, 1, 1), end = dt.datetime.now())
stocks_amzn.reset_index(level = 0, drop = False, inplace = True)

stocks_amzn.head()

Date,Symbol,Close,High,Low,Open,Volume
2000-01-03,AMZN,89.375,89.5625,79.5,89.375,16120899
2000-01-04,AMZN,82.375,91.5,81.75,87.4688,17467200
2000-01-05,AMZN,69.75,75.125,68.0,70.5,38451400
2000-01-06,AMZN,63.0,72.6875,63.0,71.125,18724400
2000-01-07,AMZN,69.6875,70.5,66.1875,69.0,10425700


In [245]:
stocks_amzn.info()
stocks_amzn.describe()  # Summary of the numeric columns (mean, std, min, max, etc.). Needs the parentheses: without them you only get the method itself

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4782 entries, 2000-01-03 to 2018-05-01
Data columns (total 6 columns):
Symbol    4782 non-null object
Close     4782 non-null float64
High      4782 non-null float64
Low       4782 non-null float64
Open      4782 non-null float64
Volume    4782 non-null int64
dtypes: float64(4), int64(1), object(1)
memory usage: 261.5+ KB


,Close,High,Low,Open,Volume
count,4782.0,4782.0,4782.0,4782.0,4782.0
mean,232.57462,235.056904,229.814597,232.574402,6440737.0
std,306.388277,308.9832,303.460317,306.583519,5352246.0
min,5.97,6.1,5.51,5.95,0.0
25%,38.9425,39.6125,38.28,38.9625,3549868.0
50%,83.1525,84.5,81.5,83.0606,5417458.0
75%,299.4975,303.1125,296.14,299.7525,7833632.0
max,1598.39,1638.1,1590.89,1634.01,104404600.0


In [249]:
# "Round" each date to the end/start of its month or quarter using the
# DateOffsets in the pandas.tseries.offsets module.

# pd.tseries.offsets.MonthEnd()
# + = roll forward to the next month end, e.g. 2000-01-03 -> 2000-01-31
# - = roll back to the previous month end, e.g. 2000-01-03 -> 1999-12-31
# A date already on a month end moves to the next (+) or previous (-) month end
stocks_amzn.index + pd.tseries.offsets.MonthEnd()

DatetimeIndex(['2000-01-31', '2000-01-31', '2000-01-31', '2000-01-31',
               '2000-01-31', '2000-01-31', '2000-01-31', '2000-01-31',
               '2000-01-31', '2000-01-31',
               ...
               '2018-04-30', '2018-04-30', '2018-04-30', '2018-04-30',
               '2018-04-30', '2018-04-30', '2018-04-30', '2018-04-30',
               '2018-05-31', '2018-05-31'],
              dtype='datetime64[ns]', name='Date', length=4782, freq=None)
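
Notice that 2018-04-30 jumped to 2018-05-31 above. If dates that already sit on a month end should stay where they are, one option is an offset with n = 0, or the offset's rollforward method. The snippet below is a small sketch on a two-date index (the dates are picked for illustration, not taken from the stock data).

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

dates = pd.DatetimeIndex(["2000-01-03", "2000-01-31"])

# Default n=1: a date already on a month end jumps to the next month end
moved = dates + MonthEnd()   # ['2000-01-31', '2000-02-29']

# n=0: roll forward only if the date is not already on a month end
kept = dates + MonthEnd(0)   # ['2000-01-31', '2000-01-31']

# rollforward does the same for a single Timestamp
rolled = MonthEnd().rollforward(pd.Timestamp("2000-01-31"))  # 2000-01-31

print(moved)
print(kept)
print(rolled)
```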

In [250]:
stocks_amzn.index + pd.tseries.offsets.MonthBegin()  # + = Next beginning of month

DatetimeIndex(['2000-02-01', '2000-02-01', '2000-02-01', '2000-02-01',
               '2000-02-01', '2000-02-01', '2000-02-01', '2000-02-01',
               '2000-02-01', '2000-02-01',
               ...
               '2018-05-01', '2018-05-01', '2018-05-01', '2018-05-01',
               '2018-05-01', '2018-05-01', '2018-05-01', '2018-05-01',
               '2018-05-01', '2018-06-01'],
              dtype='datetime64[ns]', name='Date', length=4782, freq=None)

In [251]:
stocks_amzn.index - pd.tseries.offsets.MonthBegin() # previous beginning of month

DatetimeIndex(['2000-01-01', '2000-01-01', '2000-01-01', '2000-01-01',
               '2000-01-01', '2000-01-01', '2000-01-01', '2000-01-01',
               '2000-01-01', '2000-01-01',
               ...
               '2018-04-01', '2018-04-01', '2018-04-01', '2018-04-01',
               '2018-04-01', '2018-04-01', '2018-04-01', '2018-04-01',
               '2018-04-01', '2018-04-01'],
              dtype='datetime64[ns]', name='Date', length=4782, freq=None)

In [252]:
stocks_amzn.index

DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06',
               '2000-01-07', '2000-01-10', '2000-01-11', '2000-01-12',
               '2000-01-13', '2000-01-14',
               ...
               '2018-04-18', '2018-04-19', '2018-04-20', '2018-04-23',
               '2018-04-24', '2018-04-25', '2018-04-26', '2018-04-27',
               '2018-04-30', '2018-05-01'],
              dtype='datetime64[ns]', name='Date', length=4782, freq=None)

In [261]:
# With `from pandas.tseries.offsets import *` in place, the offsets can be
# called directly without the long pd.tseries.offsets prefix.
# A "non-vectorized DateOffset" PerformanceWarning may show up here. It means pandas
# is applying the offset one element at a time instead of using a fast vectorized
# path. The results are still correct, just slower to compute.

stocks_amzn.index - MonthEnd()      # previous month end
stocks_amzn.index + BMonthBegin()   # next business month start (BMonthEnd for the end)
stocks_amzn.index + QuarterEnd()    # next quarter end (use - for the previous one)
stocks_amzn.index + QuarterBegin()  # next quarter start
stocks_amzn.index - BQuarterEnd()   # previous business quarter end (only this last result is displayed)



DatetimeIndex(['1999-12-31', '1999-12-31', '1999-12-31', '1999-12-31',
               '1999-12-31', '1999-12-31', '1999-12-31', '1999-12-31',
               '1999-12-31', '1999-12-31',
               ...
               '2018-03-30', '2018-03-30', '2018-03-30', '2018-03-30',
               '2018-03-30', '2018-03-30', '2018-03-30', '2018-03-30',
               '2018-03-30', '2018-03-30'],
              dtype='datetime64[ns]', name='Date', length=4782, freq=None)
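
One thing worth checking with the quarter offsets: to my knowledge QuarterBegin defaults to startingMonth=3, so its quarter starts fall on Mar 1, Jun 1, Sep 1 and Dec 1 rather than on calendar quarters. The sketch below compares the default with startingMonth=1 on a single date; verify against your own pandas version before relying on it.

```python
import pandas as pd
from pandas.tseries.offsets import QuarterBegin, QuarterEnd, BQuarterEnd

day = pd.Timestamp("2000-01-03")

default_begin = day + QuarterBegin()                  # 2000-03-01 (startingMonth defaults to 3)
calendar_begin = day + QuarterBegin(startingMonth=1)  # 2000-04-01 (quarters start Jan/Apr/Jul/Oct)
quarter_end = day + QuarterEnd()                      # 2000-03-31
prev_bq_end = day - BQuarterEnd()                     # 1999-12-31, the last business day of the quarter

print(default_begin.date(), calendar_begin.date())
print(quarter_end.date(), prev_bq_end.date())
```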

In [266]:
stocks_amzn.index + YearEnd()    # rolls forward to the next year end
stocks_amzn.index - YearBegin()  # rolls back to the previous beginning of year
stocks_amzn.index - BYearEnd()   # rolls back to the previous business year end (only this result is displayed)



DatetimeIndex(['1999-12-31', '1999-12-31', '1999-12-31', '1999-12-31',
               '1999-12-31', '1999-12-31', '1999-12-31', '1999-12-31',
               '1999-12-31', '1999-12-31',
               ...
               '2017-12-29', '2017-12-29', '2017-12-29', '2017-12-29',
               '2017-12-29', '2017-12-29', '2017-12-29', '2017-12-29',
               '2017-12-29', '2017-12-29'],
              dtype='datetime64[ns]', name='Date', length=4782, freq=None)

## The Timedelta Object
A Timestamp is a single moment that can be marked on a calendar. A Timedelta is a duration: a timespan, or the difference between two moments. It has no year or date associated with it; it only measures a distance in time. The name comes from delta (Δ), the mathematical symbol for change.

There are two ways to create a Timedelta: subtract one Timestamp from another to get the distance between them, or call the pd.Timedelta constructor directly.

In [272]:
timeA = pd.Timestamp("2016-03-31 04:35:16 PM")
timeB = pd.Timestamp("2016-03-20 02:16:29 AM")

In [276]:
timeB - timeA  # negative duration: "-12 days +09:41:13" means 12 days back, then 9:41:13 forward

Timedelta('-12 days +09:41:13')

In [273]:
timeA - timeB  # returns a Timedelta object. It's a duration.

Timedelta('11 days 14:18:47')
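
A Timedelta also exposes its parts as attributes, which helps when reading the negative representation. A short sketch using the same two timestamps (forward and backward are just illustrative names):

```python
import pandas as pd

timeA = pd.Timestamp("2016-03-31 04:35:16 PM")
timeB = pd.Timestamp("2016-03-20 02:16:29 AM")

forward = timeA - timeB    # Timedelta('11 days 14:18:47')
backward = timeB - timeA   # Timedelta('-12 days +09:41:13')

print(forward.days)             # 11, the whole-day part
print(forward.seconds)          # 51527, the leftover seconds (14:18:47)
print(forward.total_seconds())  # 1001927.0, the full duration in seconds
print(backward.days)            # -12
print(backward == -forward)     # True: same span, opposite direction
print(abs(backward))            # 11 days 14:18:47
```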

In [274]:
type(timeA)

pandas._libs.tslib.Timestamp

In [279]:
# Timedelta has no years (or months) parameter because their length varies. Use days or weeks instead.
# This is different from DateOffset, which does accept years.
pd.Timedelta(days = 3, minutes = 45, hours = 12, weeks = 8)

Timedelta('59 days 12:45:00')

In [282]:
pd.Timedelta("5 minutes")
pd.Timedelta("6 hours 12 minutes")
pd.Timedelta("14 days 6 hours 12 minutes")

Timedelta('14 days 06:12:00')
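
The keyword and string forms build the same object, and the "no years" limitation matters in practice: 365 days is a fixed span, while a DateOffset of one year follows the calendar. A minimal sketch (the start date is chosen only because 2016 is a leap year):

```python
import pandas as pd

# Keyword arguments and a string describe the same duration
from_kwargs = pd.Timedelta(days=3, minutes=45, hours=12, weeks=8)
from_string = pd.Timedelta("59 days 12:45:00")
print(from_kwargs == from_string)  # True

start = pd.Timestamp("2016-01-01")
plus_365_days = start + pd.Timedelta(days=365)  # 2016-12-31, one day short in a leap year
plus_one_year = start + pd.DateOffset(years=1)  # 2017-01-01, calendar-aware

print(plus_365_days.date(), plus_one_year.date())
```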

## Timedeltas in a Dataset

In [286]:
shipping = pd.read_csv("ecommerce.csv", index_col = "ID", 
            parse_dates = ["order_date", "delivery_date"])
shipping.head()

ID,order_date,delivery_date
1,1998-05-24,1999-02-05
2,1992-04-22,1998-03-06
4,1991-02-10,1992-08-26
5,1992-07-21,1997-11-20
7,1993-09-02,1998-06-10


In [291]:
# Find the duration between the two dates.
# Subtracting one datetime column from the other returns a Series of Timedeltas (TDs),
# which can be added to the DataFrame as a new column.
shipping['delivery_date'] - shipping['order_date']  # returns a series of TDs
shipping["delivery_time"] = shipping['delivery_date'] - shipping['order_date']

In [293]:
shipping.head()  # delivery_time is a Timedelta (TD) column next to the two Timestamp (TS) columns

ID,order_date,delivery_date,delivery_time
1,1998-05-24,1999-02-05,257 days
2,1992-04-22,1998-03-06,2144 days
4,1991-02-10,1992-08-26,563 days
5,1992-07-21,1997-11-20,1948 days
7,1993-09-02,1998-06-10,1742 days


In [299]:
shipping["twice_as_long"] = shipping['delivery_date'] + shipping["delivery_time"]  # datetime + Timedelta returns a datetime Series

In [300]:
shipping.head()

ID,order_date,delivery_date,delivery_time,twice_as_long
1,1998-05-24,1999-02-05,257 days,1999-10-20
2,1992-04-22,1998-03-06,2144 days,2004-01-18
4,1991-02-10,1992-08-26,563 days,1994-03-12
5,1992-07-21,1997-11-20,1948 days,2003-03-22
7,1993-09-02,1998-06-10,1742 days,2003-03-18


In [301]:
shipping.dtypes

order_date        datetime64[ns]
delivery_date     datetime64[ns]
delivery_time    timedelta64[ns]
twice_as_long     datetime64[ns]
dtype: object
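
A timedelta64[ns] column can be turned into plain numbers through the .dt accessor, which is handy for plotting or averaging. The sketch below rebuilds the first three rows of the dataset by hand (the sample DataFrame and the delivery_days / delivery_weeks column names are made up for illustration) so it runs without ecommerce.csv.

```python
import pandas as pd

# First three rows of the shipping data, typed in by hand
sample = pd.DataFrame(
    {
        "order_date": pd.to_datetime(["1998-05-24", "1992-04-22", "1991-02-10"]),
        "delivery_date": pd.to_datetime(["1999-02-05", "1998-03-06", "1992-08-26"]),
    },
    index=pd.Index([1, 2, 4], name="ID"),
)
sample["delivery_time"] = sample["delivery_date"] - sample["order_date"]

# Whole days as integers
sample["delivery_days"] = sample["delivery_time"].dt.days

# Dividing by a Timedelta gives a float count of that unit
sample["delivery_weeks"] = sample["delivery_time"] / pd.Timedelta(weeks=1)

print(sample[["delivery_time", "delivery_days", "delivery_weeks"]])
```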

In [306]:
mask = shipping["delivery_time"] > "3000 days"   # the string is parsed as a Timedelta; returns a Boolean Series that can filter rows
shipping[mask]

ID,order_date,delivery_date,delivery_time,twice_as_long
32,1990-01-20,1998-07-24,3107 days,2007-01-25
130,1990-04-02,1999-08-16,3423 days,2008-12-29
151,1991-01-29,1999-08-05,3110 days,2008-02-09
229,1990-04-13,1998-11-17,3140 days,2007-06-23
314,1990-03-07,1999-12-25,3580 days,2009-10-13
331,1990-09-18,1999-12-19,3379 days,2009-03-20
348,1990-02-27,1999-01-04,3233 days,2007-11-11
392,1990-12-24,1999-12-04,3267 days,2008-11-13
590,1990-03-25,1998-12-20,3192 days,2007-09-16
634,1991-04-04,1999-07-21,3030 days,2007-11-06
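
The same filter can be written with an explicit pd.Timedelta instead of a string, which makes the type of the comparison obvious. A small sketch on three hand-typed rows from the dataset (sample, threshold and slow are illustrative names):

```python
import pandas as pd

# Three rows copied from the shipping data (IDs 1, 32 and 130)
sample = pd.DataFrame(
    {
        "order_date": pd.to_datetime(["1998-05-24", "1990-01-20", "1990-04-02"]),
        "delivery_date": pd.to_datetime(["1999-02-05", "1998-07-24", "1999-08-16"]),
    },
    index=pd.Index([1, 32, 130], name="ID"),
)
sample["delivery_time"] = sample["delivery_date"] - sample["order_date"]

threshold = pd.Timedelta(days=3000)
slow = sample[sample["delivery_time"] > threshold]  # keep rows above the threshold

print(slow)
```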


In [308]:
shipping[mask].max()  # max of each column

order_date       1991-07-03 00:00:00
delivery_date    1999-12-25 00:00:00
delivery_time     3583 days 00:00:00
twice_as_long    2009-10-13 00:00:00
dtype: object

In [310]:
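shipping["delivery_time"].max()  # longest delivery time (not displayed; only the last expression is shown)
shipping['delivery_time'].min()  # shortest delivery time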

Timedelta('8 days 00:00:00')

In [316]:
now = dt.datetime.now()
now

datetime.datetime(2018, 5, 8, 6, 50, 34, 378168)

In [323]:
now - shipping['order_date'].loc[1]  # .loc[1] looks up the order with ID 1 by label; the result is a Timedelta

Timedelta('7289 days 06:50:34.378168')

In [326]:
now - shipping['order_date'].head()

ID
1   7289 days 06:50:34.378168
2   9512 days 06:50:34.378168
4   9949 days 06:50:34.378168
5   9422 days 06:50:34.378168
7   9014 days 06:50:34.378168
Name: order_date, dtype: timedelta64[ns]
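
Because now changes on every run, the results above are not reproducible. One option is to subtract from a fixed reference Timestamp and keep only whole days. This sketch uses 2018-05-08 as the reference (the date of the run above) and three hand-typed order dates; reference, age and age_days are illustrative names.

```python
import pandas as pd

order_dates = pd.Series(
    pd.to_datetime(["1998-05-24", "1992-04-22", "1991-02-10"]),
    index=pd.Index([1, 2, 4], name="ID"),
    name="order_date",
)

# Fixed reference at midnight; pd.Timestamp.now().normalize() would give today's midnight instead
reference = pd.Timestamp("2018-05-08")

age = reference - order_dates  # Series of Timedeltas
age_days = age.dt.days         # whole days as integers: 7289, 9512, 9949

print(age_days)
```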