# TimeSeries

## What is time series data?

Time series data is an important form of structured data in many different fields, such as **finance, economics, ecology, neuroscience, and physics.** 

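Many time series are fixed frequency, meaning data points occur at regular intervals. Time series can also be irregular, without a fixed unit of time or offset between units. How time series data is marked depends on the application, and it is usually one of the following:

- Timestamps, specific instants in time
- Fixed periods, such as the month January 2007 or the full year 2010
- Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals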
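
In pandas, these three kinds of time reference correspond roughly to Timestamp, Period, and Interval objects. The following is a minimal sketch with illustrative values; the variable names are assumptions and do not come from the cells below.

```python
import pandas as pd

# A timestamp: one specific instant in time
instant = pd.Timestamp("2007-01-15 09:30")

# A fixed period: the whole month of January 2007
january = pd.Period("2007-01", freq="M")

# An interval: an arbitrary span given by start and end timestamps
span = pd.Interval(pd.Timestamp("2007-01-10"), pd.Timestamp("2007-01-20"))

# A period behaves like an interval with calendar-defined bounds
print(january.start_time, january.end_time)
print(january.start_time <= instant <= january.end_time)
print(instant in span)
```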

# Datetime



In [2]:
from datetime import datetime

now = datetime.now()

now

datetime.datetime(2022, 8, 14, 7, 31, 32, 510514)

In [3]:
now.year, now.month, now.day

(2022, 8, 14)

In [5]:
now = datetime.now()
format = "%A, %d %b %Y %H:%M:%S %Z"
now.strftime(format)

'Sunday, 14 Aug 2022 07:36:15 '

In [7]:
registration_date = datetime(2020, 8, 15)
format = "RSVP by %A, %d-%b-%Y"
registration_date.strftime(format)

'RSVP by Saturday, 15-Aug-2020'

In [8]:
submission_date = datetime.fromtimestamp(1595485143)
format = "Submit before %A, %#I%p %d/%m/%Y"
# %#I drops the leading zero from the hour on Windows; use %-I on Linux and macOS
submission_date.strftime(format)
# The hour shown depends on the local timezone of the machine

'Submit before Thursday, 12PM 23/07/2020'

Commonly used classes in the datetime module are:

1. date Class
2. time Class
3. datetime Class
4. timedelta Class

## datetime.date

In [9]:
## Date object to represent a date
## method 1
import datetime
d = datetime.date(2019, 4, 13)
print(d)


## method 2
from datetime import date
a = date(2019, 4, 13)
print(a)

2019-04-13
2019-04-13


In [5]:
## Get current date from "date" module
from datetime import date

today = date.today() 

print(today)
print("Current year:", today.year)
print("Current month:", today.month)
print("Current day:", today.day)

2022-08-13
Current year: 2022
Current month: 8
Current day: 13


In [1]:
## Get the date from timestamp
from datetime import date

timestamp = date.fromtimestamp(1326244364)
print("Date =", timestamp)

Date = 2012-01-11


## datetime.time

A time object instantiated from the time class represents a local time of day, independent of any particular date.

In [11]:
from datetime import time

# time(hour = 0, minute = 0, second = 0)
a = time()
print("a =", a)

# time(hour, minute and second)
b = time(11, 34, 56)
print("b =", b)

# time(hour, minute and second)
c = time(hour = 11, minute = 34, second = 56)
print("c =", c)

# time(hour, minute, second, microsecond)
d = time(11, 34, 56, 234566)
print("d =", d)

a = 00:00:00
b = 11:34:56
c = 11:34:56
d = 11:34:56.234566


In [8]:
## microsecond : its default value 0
from datetime import time

a = time(11, 34, 56)

print("hour =", a.hour)
print("minute =", a.minute)
print("second =", a.second)
print("microsecond =", a.microsecond)

hour = 11
minute = 34
second = 56
microsecond = 0


## datetime.datetime

The datetime module has a class named datetime that can contain information from both **date and time objects.**




In [13]:
from datetime import datetime

#datetime(year, month, day)
a = datetime(2018, 11, 28)
print(a)

# datetime(year, month, day, hour, minute, second, microsecond)
b = datetime(2017, 11, 28, 23, 55, 59, 342380)
print(b)

2018-11-28 00:00:00
2017-11-28 23:55:59.342380
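
To make the relationship between the three classes concrete, the sketch below builds a datetime from a separate date and time and then splits it back apart. The values are illustrative.

```python
from datetime import date, time, datetime

d = date(2018, 11, 28)
t = time(23, 55, 59)

# Build a datetime from separate date and time objects
dt = datetime.combine(d, t)
print(dt)            # 2018-11-28 23:55:59

# Split it back into its date and time parts
print(dt.date())     # 2018-11-28
print(dt.time())     # 23:55:59
```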


## datetime.timedelta

**timedelta represents the temporal difference between two datetime objects:**

In [11]:
## method 1 : without timedelta
from datetime import datetime, date

t1 = date(year = 2018, month = 7, day = 12)
t2 = date(year = 2017, month = 12, day = 23)
t3 = t1 - t2
print("t3 =", t3)

t4 = datetime(year = 2018, month = 7, day = 12, hour = 7, minute = 9, second = 33)
t5 = datetime(year = 2019, month = 6, day = 10, hour = 5, minute = 55, second = 13)
t6 = t4 - t5
print("t6 =", t6)

print("type of t3 =", type(t3)) 
print("type of t6 =", type(t6))  

t3 = 201 days, 0:00:00
t6 = -333 days, 1:14:20
type of t3 = <class 'datetime.timedelta'>
type of t6 = <class 'datetime.timedelta'>


In [12]:
## method 2 : without timedelta
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
print(delta.days)
print(delta.seconds)

926
56700



**datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)**
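
Whatever combination of arguments is passed, a timedelta stores only days, seconds, and microseconds, and folds everything else into those three fields. A small sketch with arbitrary values:

```python
from datetime import timedelta

# weeks, hours and minutes are folded into days and seconds
delta = timedelta(weeks=1, hours=36, minutes=90)

print(delta)                  # 8 days, 13:30:00
print(delta.days)             # 8
print(delta.seconds)          # 48600
print(delta.total_seconds())  # 739800.0
```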

You can add (or subtract) a timedelta or multiple thereof to a datetime object to yield a new shifted object:

In [18]:
from datetime import timedelta
timedelta(12)

datetime.timedelta(days=12)

In [14]:
from datetime import timedelta
start = datetime(2011, 1, 7)

In [8]:
start + timedelta(12)

datetime.datetime(2011, 1, 19, 0, 0)

In [11]:
start - 2 * timedelta(12)

datetime.datetime(2010, 12, 14, 0, 0)

In [17]:
## Printing negative timedelta object

from datetime import timedelta

t1 = timedelta(seconds = 33)
t2 = timedelta(seconds = 54)
t3 = t1 - t2

print("t3 =", t3)
print("t3 =", abs(t3))

t3 = -1 day, 23:59:39
t3 = 0:00:21
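
The output -1 day, 23:59:39 looks odd because only the days field of a timedelta may be negative; the seconds and microseconds fields are always non-negative. When a single signed number is wanted, total_seconds() is usually easier to read. A short sketch using the same values:

```python
from datetime import timedelta

diff = timedelta(seconds=33) - timedelta(seconds=54)

# Internally: -1 day plus 86379 seconds, which adds up to -21 seconds
print(diff.days, diff.seconds)   # -1 86379
print(diff.total_seconds())      # -21.0
```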


## Parsing

### Convert between datetime and string

In [21]:
from datetime import datetime
stamp = datetime(2011, 1, 3)

print(str(stamp))

print(stamp.strftime('%Y-%m-%d'))

2011-01-03 00:00:00
2011-01-03


In [22]:
value = '2011-01-03'

datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [23]:
datestrs = ['7/6/2011', '8/6/2011']
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]

### Parse from string to datetime with dateutil

In [25]:
from dateutil.parser import parse
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [26]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

In international locales, day appearing before month is very common, so you can pass
**dayfirst=True** to indicate this:

In [27]:
parse('6/12/2011', dayfirst=True)

datetime.datetime(2011, 12, 6, 0, 0)
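
The same ambiguity can be made explicit with strptime format strings, which avoids any guessing by the parser. This is a sketch using only the standard library, not a replacement for dateutil:

```python
from datetime import datetime

value = "6/12/2011"

# Month first (US convention): June 12, 2011
month_first = datetime.strptime(value, "%m/%d/%Y")

# Day first (common internationally): 6 December 2011
day_first = datetime.strptime(value, "%d/%m/%Y")

print(month_first)  # 2011-06-12 00:00:00
print(day_first)    # 2011-12-06 00:00:00
```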

## Python timezones (pytz)

In [15]:
from pytz import common_timezones, all_timezones

print("Number of Common Timezones : ", len(common_timezones))
print("Number of All Timezones : ", len(all_timezones))
print("Difference between all timezones and common timezones : ", list(set(all_timezones) - set(common_timezones)))

Number of Common Timezones :  439
Number of All Timezones :  594
Difference between all timezones and common timezones :  ['Australia/Canberra', 'GB-Eire', 'Asia/Thimbu', 'Etc/UTC', 'Pacific/Samoa', 'Asia/Dacca', 'PST8PDT', 'Etc/GMT+12', 'Etc/GMT-3', 'Japan', 'Jamaica', 'Etc/GMT0', 'Hongkong', 'America/Porto_Acre', 'America/Montreal', 'Asia/Chungking', 'EST5EDT', 'America/Louisville', 'GMT+0', 'Australia/Queensland', 'Australia/NSW', 'Portugal', 'Africa/Asmera', 'NZ', 'Asia/Katmandu', 'NZ-CHAT', 'Greenwich', 'Singapore', 'Canada/Saskatchewan', 'Etc/Universal', 'US/Aleutian', 'Asia/Harbin', 'Brazil/DeNoronha', 'Europe/Nicosia', 'Asia/Saigon', 'America/Godthab', 'Etc/GMT-14', 'Asia/Rangoon', 'MST7MDT', 'Australia/Tasmania', 'Etc/GMT+8', 'Etc/UCT', 'CET', 'Antarctica/South_Pole', 'Eire', 'America/Indianapolis', 'Etc/Zulu', 'Canada/Yukon', 'Cuba', 'Asia/Calcutta', 'America/Atka', 'WET', 'Mexico/BajaNorte', 'Brazil/West', 'Asia/Ujung_Pandang', 'Zulu', 'America/Ensenada', 'Etc/GMT+5', 'Austr

In [36]:
import pytz

format = "Timezone: %H:%M:%S %Z"

# Naive datetime: %Z renders as an empty string
now = datetime.now()
print(now.strftime(format))

# Aware datetime in UTC: %Z renders as "UTC"
now = datetime.now(pytz.utc)
print(now.strftime(format))


Timezone: 22:57:50 
Timezone: 14:57:50 UTC


In [16]:
import datetime
import pytz

d = datetime.datetime.now()
timezone = pytz.timezone("Asia/Singapore")
d_aware = timezone.localize(d)
print(d_aware)
print(d_aware.tzinfo)

2022-08-14 07:50:10.640654+08:00
Asia/Singapore
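
Attaching a timezone to a naive datetime keeps the wall-clock reading, whereas converting an aware datetime to another zone keeps the instant and changes the reading. The sketch below shows the distinction with the standard library only. The fixed +08:00 offset stands in for Asia/Singapore, which is an assumption that holds only because Singapore does not currently observe daylight saving time.

```python
from datetime import datetime, timedelta, timezone

sgt = timezone(timedelta(hours=8), "SGT")  # fixed offset, illustrative only

naive = datetime(2022, 8, 14, 7, 50, 10)
print(naive.tzinfo)  # None: a naive datetime carries no timezone

# Attach the zone: the clock reading stays 07:50:10
aware = naive.replace(tzinfo=sgt)
print(aware)         # 2022-08-14 07:50:10+08:00

# Convert to UTC: same instant, different clock reading
in_utc = aware.astimezone(timezone.utc)
print(in_utc)        # 2022-08-13 23:50:10+00:00
```

With pytz zones, localize() as used above is the right call; replace(tzinfo=...) can attach an incorrect historical offset.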


## Pandas datetime library

### Pandas to_datetime

In [None]:
import pandas as pd
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']
pd.to_datetime(datestrs)

In [38]:
## It also handles values that should be considered missing ( None , empty string, etc.):
idx = pd.to_datetime(datestrs + [None])
idx

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [39]:
idx[2]

NaT
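
NaT also appears when a string cannot be parsed, provided to_datetime is told to coerce errors instead of raising. A minimal sketch with a deliberately invalid entry:

```python
import pandas as pd

raw = ["2011-07-06 12:00:00", "not a date", None]

# errors="coerce" turns unparseable entries into NaT instead of raising
idx = pd.to_datetime(raw, errors="coerce")

print(idx)
print(pd.isna(idx))  # boolean mask of the missing entries
```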

### Pandas date_range()

pandas.date_range() is one of the general functions in Pandas which is used to return a fixed frequency DatetimeIndex.

Syntax: pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)

In [42]:
# importing pandas as pd
import pandas as pd
  
per1 = pd.date_range(start ='1-1-2018', 
         end ='1-05-2018', freq ='5H')
  
for val in per1:
    print(val)

2018-01-01 00:00:00
2018-01-01 05:00:00
2018-01-01 10:00:00
2018-01-01 15:00:00
2018-01-01 20:00:00
2018-01-02 01:00:00
2018-01-02 06:00:00
2018-01-02 11:00:00
2018-01-02 16:00:00
2018-01-02 21:00:00
2018-01-03 02:00:00
2018-01-03 07:00:00
2018-01-03 12:00:00
2018-01-03 17:00:00
2018-01-03 22:00:00
2018-01-04 03:00:00
2018-01-04 08:00:00
2018-01-04 13:00:00
2018-01-04 18:00:00
2018-01-04 23:00:00


In [43]:
# importing pandas as pd
import pandas as pd
  
# Specify start and periods, the number of periods (days).
dRan1 = pd.date_range(start ='1-1-2018', periods = 13)
  
# Specify end and periods, the number of periods (days).
dRan2 = pd.date_range(end ='1-1-2018', periods = 13)
  
# Specify start, end, and periods; the frequency 
# is generated automatically (linearly spaced).
dRan3 = pd.date_range(start ='01-03-2017', 
            end ='1-1-2018', periods = 13)
  
print(dRan1, "\n\n", dRan2, '\n\n', dRan3)

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
               '2018-01-09', '2018-01-10', '2018-01-11', '2018-01-12',
               '2018-01-13'],
              dtype='datetime64[ns]', freq='D') 

 DatetimeIndex(['2017-12-20', '2017-12-21', '2017-12-22', '2017-12-23',
               '2017-12-24', '2017-12-25', '2017-12-26', '2017-12-27',
               '2017-12-28', '2017-12-29', '2017-12-30', '2017-12-31',
               '2018-01-01'],
              dtype='datetime64[ns]', freq='D') 

 DatetimeIndex(['2017-01-03 00:00:00', '2017-02-02 06:00:00',
               '2017-03-04 12:00:00', '2017-04-03 18:00:00',
               '2017-05-04 00:00:00', '2017-06-03 06:00:00',
               '2017-07-03 12:00:00', '2017-08-02 18:00:00',
               '2017-09-02 00:00:00', '2017-10-02 06:00:00',
               '2017-11-01 12:00:00', '2017-12-01 18:00:00',
               '2018-01-01 00:00:00'],
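
When start, end, and periods are all given, the spacing works out to (end - start) / (periods - 1). A quick check of the third range above, written as a sketch with plain Timestamp arithmetic:

```python
import pandas as pd

start = pd.Timestamp("2017-01-03")
end = pd.Timestamp("2018-01-01")
periods = 13

# 363 days split into 12 equal steps
step = (end - start) / (periods - 1)
print(step)               # 30 days 06:00:00

# The second and last points of the range follow from the step
print(start + step)       # 2017-02-02 06:00:00
print(start + 12 * step)  # 2018-01-01 00:00:00
```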
       

In [53]:
# importing pandas as pd
import pandas as pd
  
# Specify start and periods, the number of periods (days).
dRan1 = pd.date_range(start ='1-1-2018', 
       periods = 13, tz=pytz.timezone('Asia/Singapore'))
  
dRan1

DatetimeIndex(['2018-01-01 00:00:00+08:00', '2018-01-02 00:00:00+08:00',
               '2018-01-03 00:00:00+08:00', '2018-01-04 00:00:00+08:00',
               '2018-01-05 00:00:00+08:00', '2018-01-06 00:00:00+08:00',
               '2018-01-07 00:00:00+08:00', '2018-01-08 00:00:00+08:00',
               '2018-01-09 00:00:00+08:00', '2018-01-10 00:00:00+08:00',
               '2018-01-11 00:00:00+08:00', '2018-01-12 00:00:00+08:00',
               '2018-01-13 00:00:00+08:00'],
              dtype='datetime64[ns, Asia/Singapore]', freq='D')

### Pandas period_range() 

pandas.period_range() is one of the general functions in Pandas which is used to return a fixed frequency PeriodIndex, with day (calendar) as the default frequency.

In [54]:
# importing pandas as pd
import pandas as pd

# period_range with freq = day
per1 = pd.period_range(start ='2018-12-20', end ='2019-01-01', freq ='D')

# period_range with freq = month
per2 = pd.period_range(start ='2018-12-20',end ='2019-12-01', freq ='M')

print(per1, "\n\n", per2)

PeriodIndex(['2018-12-20', '2018-12-21', '2018-12-22', '2018-12-23',
             '2018-12-24', '2018-12-25', '2018-12-26', '2018-12-27',
             '2018-12-28', '2018-12-29', '2018-12-30', '2018-12-31',
             '2019-01-01'],
            dtype='period[D]', freq='D') 

 PeriodIndex(['2018-12', '2019-01', '2019-02', '2019-03', '2019-04', '2019-05',
             '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11',
             '2019-12'],
            dtype='period[M]', freq='M')
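
Each element of a PeriodIndex is a Period, which stands for a whole span of time rather than a single instant. A minimal sketch of what that means in practice, using one illustrative month:

```python
import pandas as pd

p = pd.Period("2019-01", freq="M")

# The period covers everything from its start to its end
print(p.start_time)  # 2019-01-01 00:00:00
print(p.end_time)    # last representable instant of 31 January 2019

# Adding an integer shifts by whole periods
print(p + 1)         # 2019-02
```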


### TimeDelta Ranges (pd.timedelta_range)

Pandas provides a function named **timedelta_range()**, analogous to **date_range()** and **period_range()**, to create a range of time deltas.

In [55]:
pd.timedelta_range(start="1 day", periods=10)

TimedeltaIndex([ '1 days',  '2 days',  '3 days',  '4 days',  '5 days',
                 '6 days',  '7 days',  '8 days',  '9 days', '10 days'],
               dtype='timedelta64[ns]', freq='D')

In [56]:
pd.timedelta_range(start="1 day", periods=10, freq="30D")

TimedeltaIndex([  '1 days',  '31 days',  '61 days',  '91 days', '121 days',
                '151 days', '181 days', '211 days', '241 days', '271 days'],
               dtype='timedelta64[ns]', freq='30D')

In [57]:
pd.timedelta_range(start="1 day", periods=10, freq="10H")

TimedeltaIndex(['1 days 00:00:00', '1 days 10:00:00', '1 days 20:00:00',
                '2 days 06:00:00', '2 days 16:00:00', '3 days 02:00:00',
                '3 days 12:00:00', '3 days 22:00:00', '4 days 08:00:00',
                '4 days 18:00:00'],
               dtype='timedelta64[ns]', freq='10H')

In [58]:
pd.timedelta_range(start="1 day", end="2 day", freq="4H")

TimedeltaIndex(['1 days 00:00:00', '1 days 04:00:00', '1 days 08:00:00',
                '1 days 12:00:00', '1 days 16:00:00', '1 days 20:00:00',
                '2 days 00:00:00'],
               dtype='timedelta64[ns]', freq='4H')

In [59]:
pd.timedelta_range(start="1 hour", end="2 hour", freq="10min")

TimedeltaIndex(['0 days 01:00:00', '0 days 01:10:00', '0 days 01:20:00',
                '0 days 01:30:00', '0 days 01:40:00', '0 days 01:50:00',
                '0 days 02:00:00'],
               dtype='timedelta64[ns]', freq='10T')

In [60]:
pd.timedelta_range(start="1 min", end="5 min", freq="T")

TimedeltaIndex(['0 days 00:01:00', '0 days 00:02:00', '0 days 00:03:00',
                '0 days 00:04:00', '0 days 00:05:00'],
               dtype='timedelta64[ns]', freq='T')

In [61]:
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("1 days")

TimedeltaIndex([ '2 days',  '3 days',  '4 days',  '5 days',  '6 days',
                 '7 days',  '8 days',  '9 days', '10 days', '11 days'],
               dtype='timedelta64[ns]', freq='D')

We can shift a time delta range by adding a time delta to it or subtracting one from it.

In [66]:
pd.timedelta_range(start="1 day", periods=10) - pd.Timedelta("1 days")

TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days', '5 days',
                '6 days', '7 days', '8 days', '9 days'],
               dtype='timedelta64[ns]', freq='D')

In [63]:
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2 days")

TimedeltaIndex([ '3 days',  '4 days',  '5 days',  '6 days',  '7 days',
                 '8 days',  '9 days', '10 days', '11 days', '12 days'],
               dtype='timedelta64[ns]', freq='D')

In [64]:
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2D5H")

TimedeltaIndex([ '3 days 05:00:00',  '4 days 05:00:00',  '5 days 05:00:00',
                 '6 days 05:00:00',  '7 days 05:00:00',  '8 days 05:00:00',
                 '9 days 05:00:00', '10 days 05:00:00', '11 days 05:00:00',
                '12 days 05:00:00'],
               dtype='timedelta64[ns]', freq='D')

In [19]:
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2D5H30min")

TimedeltaIndex([ '3 days 05:30:00',  '4 days 05:30:00',  '5 days 05:30:00',
                 '6 days 05:30:00',  '7 days 05:30:00',  '8 days 05:30:00',
                 '9 days 05:30:00', '10 days 05:30:00', '11 days 05:30:00',
                '12 days 05:30:00'],
               dtype='timedelta64[ns]', freq='D')
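
A Timedelta can also be built from keyword arguments, which avoids having to remember the string abbreviations. The sketch below mirrors the last cell on a shorter range; the names offset, base, and shifted are illustrative.

```python
import pandas as pd

offset = pd.Timedelta(days=2, hours=5, minutes=30)

base = pd.timedelta_range(start="1 day", periods=3)
shifted = base + offset

print(shifted[0])   # 3 days 05:30:00
print(shifted[-1])  # 5 days 05:30:00
```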

### Pandas TimeZones

Pandas lets us specify a timezone when creating date ranges, timestamps, etc.


In [69]:
rng = pd.date_range(start="1-1-2020", periods=5, freq="M")
print(rng.tz)

None


In [70]:
rng = pd.date_range(start="1-1-2020", periods=5, freq="M", tz="US/Eastern")
print(rng.tz)

US/Eastern


In [71]:
ts = pd.Timestamp("1-1-2020")
print(ts)
print(ts.tz)

2020-01-01 00:00:00
None


In [72]:
ts = pd.Timestamp("1-1-2020", tz="Asia/Calcutta")
print(ts)
print(ts.tz)

2020-01-01 00:00:00+05:30
Asia/Calcutta


### tz_localize()

Pandas provides the tz_localize() method to set a timezone on a date range or timestamp that does not already have one. It returns a new date range or timestamp with the timezone passed to tz_localize() set.


In [27]:
rng = pd.date_range(start="1-1-2020", periods=5, freq="M")
print(rng)
print("Timezone : ", rng.tz)

DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
               '2020-05-31'],
              dtype='datetime64[ns]', freq='M')
Timezone :  None


In [28]:
rng = rng.tz_localize("US/Eastern")
print(rng)
print("Timezone : ", rng.tz)

DatetimeIndex(['2020-01-31 00:00:00-05:00', '2020-02-29 00:00:00-05:00',
               '2020-03-31 00:00:00-04:00', '2020-04-30 00:00:00-04:00',
               '2020-05-31 00:00:00-04:00'],
              dtype='datetime64[ns, US/Eastern]', freq=None)
Timezone :  US/Eastern


In [29]:
ts = pd.Timestamp("1-1-2020")
print(ts)
print("Timezone : ", ts.tz)

2020-01-01 00:00:00
Timezone :  None


In [30]:
ts = ts.tz_localize("US/Central")
print(ts)
print("Timezone : ", ts.tz)

2020-01-01 00:00:00-06:00
Timezone :  US/Central


In [74]:
import datetime
import pytz

my_birthday = datetime.datetime(1985, 10, 20, 17, 55)
brothers_birthday = datetime.datetime(1992, 6, 25, 18, 30)

In [75]:
indy = pytz.timezone("America/Indianapolis")
my_birthday = indy.localize(my_birthday)
brothers_birthday = indy.localize(brothers_birthday)

In [76]:
brothers_birthday

datetime.datetime(1992, 6, 25, 18, 30, tzinfo=<DstTzInfo 'America/Indianapolis' EST-1 day, 19:00:00 STD>)

### tz_convert()

We can convert date ranges and timestamps from one timezone to another using the tz_convert() method.

In [77]:
ts = pd.Timestamp("1-1-2020", tz="US/Central")
print(ts)
print("Timezone : ", ts.tz)

2020-01-01 00:00:00-06:00
Timezone :  US/Central


In [78]:
ts = ts.tz_convert("US/Eastern")
print(ts)
print("Timezone : ", ts.tz)

2020-01-01 01:00:00-05:00
Timezone :  US/Eastern


In [79]:
rng = pd.date_range(start="1-1-2020", periods=5, freq="D", tz="US/Eastern")
print(rng)
print("Timezone : ", rng.tz)

DatetimeIndex(['2020-01-01 00:00:00-05:00', '2020-01-02 00:00:00-05:00',
               '2020-01-03 00:00:00-05:00', '2020-01-04 00:00:00-05:00',
               '2020-01-05 00:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq='D')
Timezone :  US/Eastern


In [80]:
rng = rng.tz_convert("US/Central")
print(rng)
print("Timezone : ", rng.tz)

DatetimeIndex(['2019-12-31 23:00:00-06:00', '2020-01-01 23:00:00-06:00',
               '2020-01-02 23:00:00-06:00', '2020-01-03 23:00:00-06:00',
               '2020-01-04 23:00:00-06:00'],
              dtype='datetime64[ns, US/Central]', freq='D')
Timezone :  US/Central


In [81]:
rng = rng.tz_convert("Asia/Calcutta")
print(rng)
print("Timezone : ", rng.tz)

DatetimeIndex(['2020-01-01 10:30:00+05:30', '2020-01-02 10:30:00+05:30',
               '2020-01-03 10:30:00+05:30', '2020-01-04 10:30:00+05:30',
               '2020-01-05 10:30:00+05:30'],
              dtype='datetime64[ns, Asia/Calcutta]', freq='D')
Timezone :  Asia/Calcutta
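
The two methods are easy to confuse: tz_localize() attaches a timezone to naive data and keeps the clock reading, while tz_convert() needs timezone-aware data and keeps the instant. A sketch contrasting them on a single timestamp:

```python
import pandas as pd

naive = pd.Timestamp("2020-01-01 00:00")

# Localize: midnight is now interpreted as midnight in US/Central
central = naive.tz_localize("US/Central")

# Convert: the same instant expressed in US/Eastern
eastern = central.tz_convert("US/Eastern")

print(central)             # 2020-01-01 00:00:00-06:00
print(eastern)             # 2020-01-01 01:00:00-05:00
print(central == eastern)  # True

# Calling tz_localize on data that is already aware raises TypeError
try:
    central.tz_localize("US/Eastern")
except TypeError as exc:
    print("TypeError:", exc)
```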


## Time Series Basics

A basic kind of time series object in pandas is a Series indexed by timestamps, which
is often represented external to pandas as Python strings or datetime objects:

In [82]:
import pandas as pd
import numpy as np
from datetime import datetime

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]

ts = pd.Series(np.random.randn(6), index=dates)

ts

2011-01-02    0.487512
2011-01-05    1.070614
2011-01-07   -2.051296
2011-01-08   -3.027782
2011-01-10   -0.398243
2011-01-12    0.291199
dtype: float64

In [86]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

Like other Series, arithmetic operations between differently indexed time series automatically align on the dates:

In [87]:
ts + ts[::2]

2011-01-02    0.975024
2011-01-05         NaN
2011-01-07   -4.102593
2011-01-08         NaN
2011-01-10   -0.796486
2011-01-12         NaN
dtype: float64
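
If the NaN values produced by alignment are unwanted, Series.add() accepts a fill_value that is used wherever one side has no entry for a date. A small sketch with fixed values in place of the random ones; the name s is illustrative:

```python
import pandas as pd

dates = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07"])
s = pd.Series([1.0, 2.0, 3.0], index=dates)

# Plain addition: dates missing from either side become NaN
print(s + s[::2])

# add() with fill_value treats a missing value as 0 instead
total = s.add(s[::2], fill_value=0)
print(total)
```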

Recall that ts[::2] selects every second element in ts.

pandas stores timestamps using NumPy’s datetime64 data type at the nanosecond resolution:

In [88]:
ts.index.dtype

dtype('<M8[ns]')

Scalar values from a DatetimeIndex are pandas Timestamp objects:

In [89]:
stamp = ts.index[0]

stamp

Timestamp('2011-01-02 00:00:00')

## Indexing, Selection, Subsetting

In [90]:
stamp = ts.index[2]

ts[stamp]

-2.0512964015566757

In [93]:
ts['1/10/2011']

-0.39824305725472137

In [94]:
ts['20110110']

-0.39824305725472137

For longer time series, a year or only a year and month can be passed to easily select
slices of data:

In [95]:
longer_ts = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000', periods=1000))

longer_ts

2000-01-01   -0.362929
2000-01-02   -0.078025
2000-01-03    1.374693
2000-01-04   -0.379321
2000-01-05   -0.404504
                ...   
2002-09-22    0.527831
2002-09-23    1.350248
2002-09-24   -0.570526
2002-09-25   -0.575170
2002-09-26   -0.367157
Freq: D, Length: 1000, dtype: float64

In [96]:
longer_ts['2001']

2001-01-01   -1.440389
2001-01-02    0.878728
2001-01-03    0.068348
2001-01-04    0.384738
2001-01-05   -1.354334
                ...   
2001-12-27   -0.629507
2001-12-28   -0.468092
2001-12-29   -0.930306
2001-12-30    0.812587
2001-12-31   -0.043869
Freq: D, Length: 365, dtype: float64

In [97]:
longer_ts['2001-05']

2001-05-01   -2.273047
2001-05-02   -1.099983
2001-05-03    0.971888
2001-05-04    0.833413
2001-05-05    0.647367
2001-05-06    0.868880
2001-05-07   -0.450436
2001-05-08   -3.219856
2001-05-09   -0.715843
2001-05-10    0.471729
2001-05-11   -3.239578
2001-05-12    0.217159
2001-05-13   -0.043131
2001-05-14   -0.001210
2001-05-15    0.038861
2001-05-16    1.080035
2001-05-17    0.739001
2001-05-18   -1.462857
2001-05-19    0.678147
2001-05-20   -1.311016
2001-05-21   -0.010343
2001-05-22   -0.518714
2001-05-23    0.433617
2001-05-24    0.612924
2001-05-25   -0.892959
2001-05-26   -0.180982
2001-05-27    0.173832
2001-05-28    1.001356
2001-05-29   -0.083821
2001-05-30   -2.032613
2001-05-31   -0.230824
Freq: D, dtype: float64

In [98]:
ts[datetime(2011, 1, 7):]

2011-01-07   -2.051296
2011-01-08   -3.027782
2011-01-10   -0.398243
2011-01-12    0.291199
dtype: float64

Slicing by date has an equivalent instance method, truncate, that slices a Series between two dates:

In [100]:
ts

2011-01-02    0.487512
2011-01-05    1.070614
2011-01-07   -2.051296
2011-01-08   -3.027782
2011-01-10   -0.398243
2011-01-12    0.291199
dtype: float64

In [101]:
ts.truncate(after='1/9/2011')

2011-01-02    0.487512
2011-01-05    1.070614
2011-01-07   -2.051296
2011-01-08   -3.027782
dtype: float64
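
Because the index is sorted, the slice bounds do not have to be dates that actually occur in the series. The sketch below, with made-up values in place of the random ones, shows a range query and the matching truncate() call:

```python
import pandas as pd

dates = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07",
                        "2011-01-08", "2011-01-10", "2011-01-12"])
s = pd.Series(range(6), index=dates)

# Neither bound is in the index; both ends of the slice are inclusive
window = s["2011-01-06":"2011-01-11"]
print(window)

# truncate() expresses the same selection with before/after
same = s.truncate(before="2011-01-06", after="2011-01-11")
print(window.equals(same))  # True
```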

In [102]:
dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')

long_df = pd.DataFrame(np.random.randn(100, 4),index=dates,columns=['Colorado', 'Texas','New York', 'Ohio'])

long_df.loc['5-2001']

| | Colorado | Texas | New York | Ohio |
|---|---|---|---|---|
| 2001-05-02 | -0.272906 | 0.11022 | 0.222743 | 0.236006 |
| 2001-05-09 | -2.010424 | 0.340716 | 2.09308 | 0.150531 |
| 2001-05-16 | -0.923903 | 0.187398 | -1.839861 | 1.043822 |
| 2001-05-23 | -0.216278 | 0.23551 | -2.032027 | -0.483069 |
| 2001-05-30 | 0.001162 | 0.198646 | 0.078993 | -0.373432 |


## Time Series with Duplicate Indices

In [107]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000','1/2/2000', '1/3/2000'])

dup_ts = pd.Series(np.arange(5), index=dates)
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

We can tell that the index is not unique by checking its is_unique property:

In [108]:
dup_ts.index.is_unique

False

Indexing into this time series will now either produce scalar values or slices depending on whether a timestamp is duplicated:

In [109]:
dup_ts['1/3/2000']

4

In [110]:
dup_ts['1/2/2000']

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32

Suppose you wanted to aggregate the data having non-unique timestamps. One way
to do this is to use groupby and pass level=0:

In [111]:
grouped = dup_ts.groupby(level=0)
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32

In [112]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
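
Aggregating is one option; another is to keep a single row per timestamp. The following sketch uses Index.duplicated() on the same data as above; the names repeats and first_only are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["1/1/2000", "1/2/2000", "1/2/2000",
                          "1/2/2000", "1/3/2000"])
dup_ts = pd.Series(np.arange(5), index=dates)

# Boolean mask marking every repeat after the first occurrence
repeats = dup_ts.index.duplicated(keep="first")

# Keep only the first value seen for each timestamp
first_only = dup_ts[~repeats]
print(first_only)
print(first_only.index.is_unique)  # True
```

Passing keep="last" instead would retain the final value recorded for each timestamp.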