# Datetime module

## Date Datatypes

pandas data types used to represent time series data:

* **datetime64[ns]**:  
    - This data type is mainly used for representing timestamps. 
    - A timestamp can be at day level, such as '2022-05-20' or May 1st, 2022. 
    - The precision of a timestamp can go down to the nanosecond level.

* **timedelta64[ns]**: 
    - This data type can be used for expressing differences in times. 
    - The units can be 
        - days, 
        - hours, 
        - minutes, and so on. 
    - It can take on both positive and negative values. 
        - if we subtract a future date from today, we will end up having a negative timedelta value.

* **period[freq]**: 
    - This data type represents fixed durations such as 
        - month, 
        - quarter, and 
        - year. 
    - It is similar to timedelta, but a period is a specific span on the calendar rather than a length of time.
        - A period[M] value can be 2022-01 (January 2022) but cannot be "1 month" or "2 months".
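
As a rough illustration of the three data types above, the sketch below builds one small Series of each kind. It assumes pandas is installed; the variable names (`stamps`, `deltas`, `months`) are made up for this example, and the exact timestamp resolution shown by `dtype` can differ between pandas versions.

```python
import pandas as pd

# Timestamps: parsing date strings yields a datetime64 Series.
stamps = pd.Series(pd.to_datetime(["2022-05-20", "2022-05-01"]))

# Differences between timestamps are timedeltas and may be negative.
deltas = stamps - pd.Timestamp("2022-05-10")

# Periods label whole calendar spans, here calendar months.
months = stamps.dt.to_period("M")

print(stamps.dtype)    # a datetime64 dtype
print(deltas.dtype)    # a timedelta64 dtype
print(months.dtype)    # period[M]
print(deltas.iloc[1])  # negative, because 2022-05-01 is before 2022-05-10
```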

### Manipulating date and time data using Python’s datetime module

The module has five main classes:
- datetime 
    - allows us to manipulate dates and times together (month, day, year, hour, minute, second, microsecond).
- date
    - allows us to manipulate dates only (month, day, year).
- time
    - allows us to manipulate time only (hour, minute, second, microsecond).
- timedelta 
    - used for measuring duration, the difference between two dates or times.
- tzinfo
    - used for dealing with time zones. We won’t be covering this one in this tutorial.

### Datetime is both a module and a class within that module

Two ways to create a datetime object:

1. Pass the arguments to datetime, starting with the largest time unit and ending with the smallest 
    - (year, month, day, hour, minute, second).
2. To get the current date and time, use
    - the .now() method.

In [33]:
# import the datetime class (not just the module) and timezone
from datetime import datetime
from datetime import timezone

In [57]:
# Create a datetime object for a custom date.
my_date = datetime(year=2022, month=1, day=14)
print(my_date)

2022-01-14 00:00:00


In [58]:
# Create a date-specific object, import date and time from datetime

from datetime import time, date

my_time = time(hour=15, minute=30, second=24)
print(my_time)

my_date = date(year=2022, month=1, day=14)
print(my_date)

15:30:24
2022-01-14


In [59]:
# Combine the two, use combine() function.

date_time = datetime.combine(my_date, my_time)
print(date_time)

2022-01-14 15:30:24


In [5]:
# create datetime object 
datetime_object1 = datetime(2021, 6, 1, 15, 23, 25)
datetime_object1

datetime.datetime(2021, 6, 1, 15, 23, 25)

##### Today’s date and time in different formats

In [6]:
# get current date
datetime_object2 = datetime.now()
datetime_object2

datetime.datetime(2023, 10, 29, 6, 56, 47, 189756)

In [31]:
# Get current Time and Date
current_time = datetime.now()
print(current_time)

2023-10-29 09:43:55.824011


##### Get only either the Date or Time
Using now() returns both date and time together. 

If we want just the date or just the time, we can call the date() or time() method on the datetime object to get the respective value.

In [56]:
curr_date_time = datetime.now()
print("Date: ", curr_date_time.date())
print("Time: ", curr_date_time.time())

Date:  2023-10-30
Time:  21:50:28.298392


##### Unix timestamp / Epoch time calculation

Unix timestamps are commonly used for files in operating systems. Often they show up in datasets as well.

First, we can get the current Unix timestamp

In [156]:
from datetime import timezone
dt_now = datetime.now(timezone.utc)
print(dt_now)
print(dt_now.tzinfo)
print(dt_now.timestamp()) # the unix timestamp.

2023-11-05 00:48:08.774456+00:00
UTC
1699145288.774456


In [37]:
current_time_rec = datetime.now()
print(current_time_rec)
current_time_utc = datetime.now(tz=timezone.utc)
print('To check UTC time to my local time:' , current_time_utc)

2023-10-29 09:47:36.852275
To check UTC time to my local time: 2023-10-29 07:47:36.852493+00:00


UTC time is 2 hours less than my local time.

##### To always generate a UTC datetime regardless of the local time zone

In [175]:
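# returns a naive datetime (tzinfo is None); deprecated since Python 3.12 in favour of datetime.now(timezone.utc)
datetime.utcnow()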

datetime.datetime(2023, 11, 5, 5, 13, 39, 715636)
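
The value returned by utcnow() carries no time zone information, which makes it easy to misread as local time. A minimal sketch of the difference between a naive and an aware "now", using only the standard library (the variable names are illustrative):

```python
from datetime import datetime, timezone

aware_utc = datetime.now(timezone.utc)  # aware: carries tzinfo=UTC
naive_local = datetime.now()            # naive: no time zone attached

print(aware_utc.tzinfo)    # UTC
print(naive_local.tzinfo)  # None

# An aware value converts to a Unix timestamp and back without ambiguity.
round_trip = datetime.fromtimestamp(aware_utc.timestamp(), tz=timezone.utc)
print(round_trip.tzinfo)   # UTC
```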

##### Convert Unix timestamp to datetime format

In [176]:
import pytz

In [177]:
for tz in pytz.all_timezones:
    print(tz)

Africa/Abidjan
Africa/Accra
Africa/Addis_Ababa
Africa/Algiers
Africa/Asmara
Africa/Asmera
Africa/Bamako
Africa/Bangui
Africa/Banjul
Africa/Bissau
Africa/Blantyre
Africa/Brazzaville
Africa/Bujumbura
Africa/Cairo
Africa/Casablanca
Africa/Ceuta
Africa/Conakry
Africa/Dakar
Africa/Dar_es_Salaam
Africa/Djibouti
Africa/Douala
Africa/El_Aaiun
Africa/Freetown
Africa/Gaborone
Africa/Harare
Africa/Johannesburg
Africa/Juba
Africa/Kampala
Africa/Khartoum
Africa/Kigali
Africa/Kinshasa
Africa/Lagos
Africa/Libreville
Africa/Lome
Africa/Luanda
Africa/Lubumbashi
Africa/Lusaka
Africa/Malabo
Africa/Maputo
Africa/Maseru
Africa/Mbabane
Africa/Mogadishu
Africa/Monrovia
Africa/Nairobi
Africa/Ndjamena
Africa/Niamey
Africa/Nouakchott
Africa/Ouagadougou
Africa/Porto-Novo
Africa/Sao_Tome
Africa/Timbuktu
Africa/Tripoli
Africa/Tunis
Africa/Windhoek
America/Adak
America/Anchorage
America/Anguilla
America/Antigua
America/Araguaina
America/Argentina/Buenos_Aires
America/Argentina/Catamarca
America/Argentina/ComodRivad

In [178]:
utc_timestamp = 1377050861.206272
unix_ts_dt = datetime.fromtimestamp(utc_timestamp, timezone.utc)
print(unix_ts_dt)
print(unix_ts_dt.astimezone(pytz.timezone("Africa/Johannesburg")))
print(unix_ts_dt.astimezone(pytz.timezone("Africa/Gaborone")))

2013-08-21 02:07:41.206272+00:00
2013-08-21 04:07:41.206272+02:00
2013-08-21 04:07:41.206272+02:00


# Time zone settings

In [179]:
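import pytz
johannesburg_tz = pytz.timezone("Africa/Johannesburg")  # avoid shadowing datetime.timezone
d = datetime.now()                 # naive local time
dtz = johannesburg_tz.localize(d)  # attach the zone without shifting the clock time
print(dtz.tzinfo)
print(dtz)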

Africa/Johannesburg
2023-11-05 02:29:48.271419+02:00


In [180]:
shanghai_dt = dtz.astimezone(pytz.timezone("Asia/Shanghai"))
print(shanghai_dt)

2023-11-05 08:29:48.271419+08:00


The standard-library zoneinfo module (ZoneInfo, available since Python 3.9) solves many of the issues seen with pytz.
- Don’t use pytz in new code; use ZoneInfo.
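
A minimal sketch of the same kind of conversion done with ZoneInfo instead of pytz. It reuses the Unix timestamp from the earlier pytz example; the variable names are illustrative, and on systems without a system time zone database (for example Windows) the separate tzdata package is assumed to be installed.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# The Unix timestamp used earlier, read as an aware UTC datetime.
utc_dt = datetime.fromtimestamp(1377050861.206272, tz=timezone.utc)

# Convert to other zones with astimezone().
joburg_dt = utc_dt.astimezone(ZoneInfo("Africa/Johannesburg"))
shanghai_dt = utc_dt.astimezone(ZoneInfo("Asia/Shanghai"))

# Attach a zone to a wall-clock time by passing tzinfo directly;
# unlike pytz, no localize() call is needed.
local_dt = datetime(2023, 11, 5, 2, 29, 48, tzinfo=ZoneInfo("Africa/Johannesburg"))

print(joburg_dt)
print(shanghai_dt)
print(local_dt)  # 2023-11-05 02:29:48+02:00
```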

### Using Time Zone In Practice

In [182]:
!pip install python-dateutil



In [183]:
from dateutil import tz
import dateutil.parser

In [184]:
# Last ride
last_no_tz=datetime(2017,12,30,15,9,3)
# tzinfo=None is the default, so this is still a naive datetime
last_tz=datetime(2017,12,30,15,9,3,tzinfo=None)
print(last_no_tz)
print(last_tz)

2017-12-30 15:09:03
2017-12-30 15:09:03


In [185]:
# First ride
first_no_tz=datetime(2017,10,1,15,23,25)
first_tz=datetime(2017,10,1,15,23,25)
print(first_no_tz)
print(first_tz)

2017-10-01 15:23:25
2017-10-01 15:23:25


##### Get timezones of other areas

In [186]:
# Generate the current time as an aware UTC datetime, then convert it to US/Eastern.

import pytz
utc = pytz.utc
eastern = pytz.timezone('US/Eastern')
# datetime.utcnow().astimezone(utc) would treat the naive value as local time,
# so ask for an aware UTC value directly.
date = datetime.now(utc)
date

datetime.datetime(2023, 11, 5, 3, 20, 30, 187568, tzinfo=<UTC>)

In [187]:
date.astimezone(eastern)

datetime.datetime(2023, 11, 4, 23, 20, 30, 187568, tzinfo=<DstTzInfo 'US/Eastern' EDT-1 day, 20:00:00 DST>)

##### Use Python datetime methods to format time zones

In [188]:
datetime.strftime(date, "%Z")

'UTC'

In [189]:
datetime.strftime(date.astimezone(eastern), "%Z")

'EDT'

### Date Palindromes
You may have heard that February 2, 2020 was the first true date palindrome in 909 years. In other words, 02/02/2020 is the same backward and forward.

In [190]:
datetime.strftime(datetime(2020, 2, 2, 2, 2, 2), "%Y%m%d")

'20200202'

In [192]:
datetime.strftime(datetime(2020, 2, 2, 2, 2, 2), "%Y%m%d")[::-1]

'20200202'
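
The two cells above can be folded into a small check. This is only a sketch: `is_palindrome_date` is a hypothetical helper name, and the default format assumes the YYYYMMDD layout used above.

```python
from datetime import date

def is_palindrome_date(day: date, fmt: str = "%Y%m%d") -> bool:
    """Return True if the formatted date reads the same in both directions."""
    digits = day.strftime(fmt)
    return digits == digits[::-1]

print(is_palindrome_date(date(2020, 2, 2)))            # True
print(is_palindrome_date(date(2020, 2, 2), "%m%d%Y"))  # True
print(is_palindrome_date(date(2020, 2, 3)))            # False
```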

### Conversion datetime object

If you don’t need the sub-second (microsecond) information
- drop it using the split() method. 

The object returned by datetime.now() is a datetime instance
- convert it to a string before applying split().

In [39]:
# split('.') returns a list of strings; its first element is the part before the microseconds
current_time_without_ms = str(datetime.now()).split('.')[0]
print(current_time_without_ms)

2023-10-29 09:54:24
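
As an alternative to the string split, the microseconds can be dropped without leaving the datetime type. A small sketch with a fixed example value (the variable names are illustrative):

```python
from datetime import datetime

moment = datetime(2023, 10, 29, 9, 54, 24, 123456)

# Zero out the microseconds; the result is still a datetime object.
trimmed = moment.replace(microsecond=0)
print(trimmed)  # 2023-10-29 09:54:24

# Or ask isoformat() to stop at whole seconds when a string is the goal.
as_text = moment.isoformat(sep=" ", timespec="seconds")
print(as_text)  # 2023-10-29 09:54:24
```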


# TL;DR: the cheatsheet

###  Parse date strings

In [195]:
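fmt = '%Y-%m-%dT%H:%M:%S%z'  # named fmt to avoid shadowing the built-in format()
datestring = '2016-09-20T16:43:45-07:00'

# python 2.7
# d = dateutil.parser.parse(datestring) 

# python 3.7+ (earlier versions reject the colon in the -07:00 offset)
d = datetime.strptime(datestring, fmt)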
d

datetime.datetime(2016, 9, 20, 16, 43, 45, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200)))

# Parsing and Formatting

### How to handle date and time strings with the datetime module

The module converts strings to datetime objects and vice versa. 

The datetime class includes two useful methods:
- strptime()
    - use strptime() to read strings containing date and time information and convert them to datetime objects.
    - strptime() cannot guess the layout of an arbitrary string, so we have to indicate the time format ourselves.
    - strptime() takes two arguments:
        - string: the date and time as text
        - format: the format codes describing how that text is laid out
    - format codes that help strptime() interpret our string input (the %-d style variants are platform-specific strftime() extensions):
        - %a (e.g. Sun): Weekday as locale’s abbreviated name.
        - %A (e.g. Sunday): Weekday as locale’s full name.
        - %w (e.g. 0): Weekday as a decimal number, where 0 is Sunday and 6 is Saturday.
        - %d (e.g. 08): Day of the month as a zero-padded decimal number.
        - %-d (e.g. 8): Day of the month as a decimal number. (Platform specific)
        - %b (e.g. Sep): Month as locale’s abbreviated name.
        - %B (e.g. September): Month as locale’s full name.
        - %m (e.g. 09): Month as a zero-padded decimal number.
        - %-m (e.g. 9): Month as a decimal number. (Platform specific)
        - %y (e.g. 13): Year without century as a zero-padded decimal number.
        - %Y (e.g. 2013): Year with century as a decimal number.
        - %H (e.g. 07): Hour (24-hour clock) as a zero-padded decimal number.
        - %-H (e.g. 7): Hour (24-hour clock) as a decimal number. (Platform specific)
        - %I (e.g. 07): Hour (12-hour clock) as a zero-padded decimal number.
        - %-I (e.g. 7): Hour (12-hour clock) as a decimal number. (Platform specific)
        - %p (e.g. AM): Locale’s equivalent of either AM or PM.
        - %M (e.g. 06): Minute as a zero-padded decimal number.
        - %-M (e.g. 6): Minute as a decimal number. (Platform specific)
        - %S (e.g. 05): Second as a zero-padded decimal number.
        - %-S (e.g. 5): Second as a decimal number. (Platform specific)
        - %f (e.g. 000000): Microsecond as a decimal number, zero-padded to 6 digits.
        - %z (e.g. +0000): UTC offset in the form ±HHMM[SS[.ffffff]] (empty string if the object is naive).
        - %Z (e.g. UTC): Time zone name (empty string if the object is naive).
        - %j (e.g. 251): Day of the year as a zero-padded decimal number.
        - %-j (e.g. 251): Day of the year as a decimal number. (Platform specific)
        - %U (e.g. 36): Week number of the year (Sunday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0.
        - %-U (e.g. 36): Week number of the year (Sunday as the first day of the week) as a decimal number. All days in a new year preceding the first Sunday are considered to be in week 0. (Platform specific)
        - %W (e.g. 35): Week number of the year (Monday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Monday are considered to be in week 0.
        - %-W (e.g. 35): Week number of the year (Monday as the first day of the week) as a decimal number. All days in a new year preceding the first Monday are considered to be in week 0. (Platform specific)
        - %c (e.g. Sun Sep 8 07:06:05 2013): Locale’s appropriate date and time representation.
        - %x (e.g. 09/08/13): Locale’s appropriate date representation.
        - %X (e.g. 07:06:05): Locale’s appropriate time representation.
        - %% (e.g. %): A literal '%' character.
- strftime()
    - use strftime() to convert datetime objects back into strings.
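
Because strptime() only accepts text that matches the given format, a mismatch raises ValueError. The sketch below wraps that behaviour in a hypothetical helper, `parse_or_none`, and then formats the parsed value back into a string with strftime(); the printed month and AM/PM text assume an English locale.

```python
from datetime import datetime

def parse_or_none(text, fmt):
    """Hypothetical helper: return a datetime, or None if text does not match fmt."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

parsed = parse_or_none("2021-08-06 16:00:00", "%Y-%m-%d %H:%M:%S")
print(parsed)                                   # 2021-08-06 16:00:00
print(parse_or_none("06/08/2021", "%Y-%m-%d"))  # None, the layout does not match
print(parsed.strftime("%d %b %Y, %I:%M %p"))    # 06 Aug 2021, 04:00 PM
```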

##### Use the above list to customize different date formats.

In [135]:
d = datetime.now() #today's datetime
d

datetime.datetime(2023, 11, 5, 2, 20, 42, 782905)

In [136]:
print(d.strftime("%A %d/%m/%Y")) # date to string

Sunday 05/11/2023


#####  Two popular strings being converted to date format

### Parsing

Use this when a date is stored as a string and you want to parse it into a datetime object. 

##### Use Python built-in method to parse dates, strptime.

In [23]:
date_string = '2020-11-27'

# Create date object with time format yyyy-mm-dd
date = datetime.strptime(date_string, "%Y-%m-%d")
date

datetime.datetime(2020, 11, 27, 0, 0)

In [167]:
datetime.strptime("2020-01-01 14:00", "%Y-%m-%d %H:%M")

datetime.datetime(2020, 1, 1, 14, 0)

In [52]:
date_example_1 = "2021-08-06 16:00:00"
date_example_2 = "2021/08/06 16/00/00"
date_example_3 = "06 August 21 - 16:00:00"
date_example_4 = "06 August 21 - 4 PM"
date_example_5 = "Aug 2021, 06 / 4 PM"
date_time_obj_example_1 = datetime.strptime(date_example_1, '%Y-%m-%d %H:%M:%S')
date_time_obj_example_2 = datetime.strptime(date_example_2, '%Y/%m/%d %H/%M/%S')
date_time_obj_example_3 = datetime.strptime(date_example_3, '%d %B %y - %H:%M:%S')
date_time_obj_example_4 = datetime.strptime(date_example_4, '%d %B %y - %I %p')
date_time_obj_example_5 = datetime.strptime(date_example_5, '%b %Y, %d / %I %p')
print(date_time_obj_example_1)
print(date_time_obj_example_2)
print(date_time_obj_example_3)
print(date_time_obj_example_4)
print(date_time_obj_example_5)

2021-08-06 16:00:00
2021-08-06 16:00:00
2021-08-06 16:00:00
2021-08-06 16:00:00
2021-08-06 16:00:00


Each call takes a date string (for example "2020-01-01 14:00") together with a matching format and parses it to a datetime object.

In [137]:
date_string = '2016-02-01 12:00PM'
print(datetime.strptime(date_string, '%Y-%m-%d %I:%M%p'))

2016-02-01 12:00:00


In [138]:
date_string = '02/01/2016'
d2 = datetime.strptime(date_string, '%m/%d/%Y')
print(d2)

2016-02-01 00:00:00


### dateutil module to Parse datetime object

##### dateutil is a great library that extends Python’s datetime functionality.

In [168]:
!pip install python-dateutil



In [169]:
import dateutil.parser as parser

In [170]:
parser.parse("2020-01-01 14:00")

datetime.datetime(2020, 1, 1, 14, 0)

In [171]:
parser.parse("01-01-2020 2:00pm")

datetime.datetime(2020, 1, 1, 14, 0)

In [172]:
parser.parse("2020-01-01T14:00")

datetime.datetime(2020, 1, 1, 14, 0)

dateutil isn’t always perfect when parsing dates. 
- For example, 2/1/2020 could be either 
    - Feb. 1 or 
    - Jan. 2.
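
One way to resolve that ambiguity is the `dayfirst` argument of dateutil's parser. A short sketch, assuming python-dateutil is installed (`ambiguous` is just an example variable):

```python
from dateutil import parser

ambiguous = "2/1/2020"

# Default: the first number is read as the month (February 1).
month_first = parser.parse(ambiguous)
print(month_first)  # 2020-02-01 00:00:00

# dayfirst=True reads the first number as the day (January 2).
day_first = parser.parse(ambiguous, dayfirst=True)
print(day_first)    # 2020-01-02 00:00:00
```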
    

##### When to use dateutil and strptime
- The dateutil module works best when parsing unstructured text with a variety of date formats. 
- If the date format is known, it’s best to use strptime.

### Formatting

Use this when you start with a datetime object and want to build a formatted string. 

##### Python provides the strftime method for this.

In [53]:
example_time = "2021-08-06 16:00:00"
datetime_object = datetime.strptime(example_time, '%Y-%m-%d %H:%M:%S')
date_example_1 = datetime_object.strftime('%Y-%m-%d %H:%M:%S')
date_example_2 = datetime_object.strftime('%Y/%m/%d %H/%M/%S')
date_example_3 = datetime_object.strftime('%d %B %y - %H:%M:%S')
date_example_4 = datetime_object.strftime('%d %B %y - %I %p')
date_example_5 = datetime_object.strftime('%b %Y, %d / %I %p')
print(date_example_1)
print(date_example_2)
print(date_example_3)
print(date_example_4)
print(date_example_5)

2021-08-06 16:00:00
2021/08/06 16/00/00
06 August 21 - 16:00:00
06 August 21 - 04 PM
Aug 2021, 06 / 04 PM


In [174]:
date = datetime.strptime("2020-01-01 14:00", "%Y-%m-%d %H:%M")
datetime.strftime(date, "%a %B %d, %Y %I:%M %p")

'Wed January 01, 2020 02:00 PM'

### Accessing Individual Components with datetime module

- Access individual components of a datetime object through its attributes, such as
    - year, 
    - day, and 
    - hour

In [60]:
from datetime import datetime

date_time = datetime(year=2022, month=1, day=14, 
                     hour=15, minute=30, second=24)
print(date_time)

print("Year: ", date_time.year)
print("Month: ", date_time.month)
print("Day: ", date_time.day)
print("Hour: ", date_time.hour)
print("Minutes: ", date_time.minute)
print("Seconds: ", date_time.second)

2022-01-14 15:30:24
Year:  2022
Month:  1
Day:  14
Hour:  15
Minutes:  30
Seconds:  24


In [61]:
# extract year data
year = datetime_object2.year
year

2023

In [62]:
# extract month data
month = datetime_object2.month
month

10

In [63]:
# extract day data
day = datetime_object2.day 
day

29

In [64]:
# extract hourly data
hour = datetime_object2.hour
hour

6

- Find the day of the week using the .weekday() method.
    - Python counts weekdays from 0, starting on Monday.
        - 0 = Monday
        - 1 = Tuesday
        - 2 = Wednesday
        - 3 = Thursday
        - 4 = Friday
        - 5 = Saturday
        - 6 = Sunday
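
The numbering depends on which method or format code is used. A brief comparison on a fixed Sunday, using only the standard library (the day name assumes an English locale):

```python
from datetime import datetime

sunday = datetime(2023, 10, 29)

print(sunday.weekday())       # 6  (Monday=0 ... Sunday=6)
print(sunday.isoweekday())    # 7  (Monday=1 ... Sunday=7)
print(sunday.strftime("%w"))  # 0  (Sunday=0 ... Saturday=6)
print(sunday.strftime("%A"))  # Sunday
```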

In [16]:
# extract weekday data
datetime_object2.weekday()

6

In [44]:
print(f"{year}-{month:02d}-{day:02d} {hour:02d}")

2023-10-29 06


### Working with timestamp

Looking into the Unix timestamp, also known as the POSIX timestamp.

Devices often send timestamps to servers instead of formatted date and time strings, as a single number takes less space.

A Unix timestamp is simply the number of seconds elapsed since 00:00:00 UTC on January 1st, 1970.
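
That definition can be checked by hand: subtracting the epoch from an aware UTC datetime gives the same number as timestamp(). A minimal sketch (`epoch` and `moment` are illustrative names):

```python
from datetime import datetime, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
moment = datetime(2021, 9, 10, 16, 54, 23, tzinfo=timezone.utc)

# Seconds elapsed since the epoch, computed by plain subtraction.
seconds_since_epoch = (moment - epoch).total_seconds()

print(seconds_since_epoch)  # 1631292863.0
print(moment.timestamp())   # 1631292863.0
```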

In [46]:
# to convert 2021-09-10 16:54:23 UTC time to a timestamp

example_time = "2021-09-10 16:54:23"
datetime_object = datetime.strptime(example_time, '%Y-%m-%d %H:%M:%S')
utc_time = datetime_object.replace(tzinfo=timezone.utc)
current_timestamp = datetime.timestamp(utc_time)
print(current_timestamp)

1631292863.0


In [47]:
# to convert a timestamp back to a date and time, use the fromtimestamp() method

timestamp = 1631292863
time_and_date_utc = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(time_and_date_utc)

2021-09-10 16:54:23+00:00


# Timedeltas, DateOffsets and Periods in Datetime

- These are specialised time objects designed to make timestamp arithmetic easy and to suit particular needs for marking time in your data.
- They exist as different time objects in Python and pandas.

### Timedeltas 
- Objects representing the difference between two points in time (two timestamps).
    - Can be expressed in several different units of measure.
    - Can be either positive or negative.
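
A small sketch of positive and negative timedeltas with the standard library. The dates are arbitrary examples; note how a negative value is stored as negative days plus a positive remainder.

```python
from datetime import datetime, timedelta

earlier = datetime(2022, 5, 20, 12, 0)
later = datetime(2022, 5, 21, 18, 30)

forward = later - earlier
backward = earlier - later

print(forward)                   # 1 day, 6:30:00
print(backward)                  # -2 days, 17:30:00
print(backward.total_seconds())  # -109800.0
print(abs(backward) == forward)  # True

# Negative values are normalised to negative days plus a positive remainder.
print(repr(timedelta(hours=-7)))  # datetime.timedelta(days=-1, seconds=61200)
```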

### How to measure time span

##### Do arithmetic on different datetime objects.

In [10]:
# find out the number of seconds elapsed between the previous two datetimes.

duration = datetime_object2 - datetime_object1
duration

datetime.timedelta(days=879, seconds=56002, microseconds=189756)

In [9]:
duration.total_seconds()

76001602.189756

##### Difference in datetime calculation

In [139]:
from datetime import timedelta
d = datetime.now()
date_string = '2/01/2016'
d2 = datetime.strptime(date_string, '%m/%d/%Y')
print(d - d2)

2834 days, 2:29:48.271419


##### Print the difference of two datetimes in days, weeks or years

In [140]:
date_diff = (d - d2)/timedelta(days=1)
print('date_diff = {} days'.format(date_diff))

date_diff = 2834.1040309192013 days


In [141]:
date_diff = (d - d2)/timedelta(weeks=1)
print('date_diff = {} weeks'.format(date_diff))

date_diff = 404.8720044170288 weeks


In [142]:
date_diff = (d - d2)/timedelta(days=365)
print('date_diff = {} years'.format(date_diff))

date_diff = 7.764668577860825 years


##### Working With Timedelta


Timedelta class comes in handy in situations where we want to measure duration as it represents the amount of time between two dates or times.

It is also useful when we want to add to or subtract from dates or times.
- to find out what the date and time was in the past (or will be in the future)
    - subtract a timedelta from, or add one to, our datetime object.

timedelta can be used with any number of weeks, days, hours, minutes.

It can be used with time units as small as a microsecond or as large as 2.7 million years!
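
These limits are exposed as class attributes, so they can be inspected directly. A quick sketch (the 365.25-day year is only an approximation used for the conversion):

```python
from datetime import timedelta

print(timedelta.resolution)  # 0:00:00.000001, the smallest step
print(timedelta.max.days)    # 999999999

approx_years = timedelta.max.days / 365.25
print(approx_years)          # roughly 2.7 million

# Keyword arguments are normalised into days, seconds and microseconds.
span = timedelta(weeks=1, days=2, hours=3, minutes=4)
print(span)                  # 9 days, 3:04:00
```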

##### How timedelta objects and datetime objects can be used together to do mathematical operations.

In [19]:
# import timedelta
from datetime import timedelta

In [21]:
# create a 27 day timedelta
td = timedelta(days=27)

#add 27 days to current date
days_later_27 = datetime_object2 + td
days_later_27

datetime.datetime(2023, 11, 25, 6, 56, 47, 189756)

##### Find what the date and time will be in the future, or was in the past, by adding or subtracting a certain number of days or hours

In [48]:
# Timedelta option using
start_time_and_date = "2021-08-10 17:45:00"
date_time_obj = datetime.strptime(start_time_and_date, '%Y-%m-%d %H:%M:%S')

# Calculating future date
future_time_and_date = date_time_obj + timedelta(days=2, hours=3, minutes=22)
# Calculating past date
past_time_and_date = date_time_obj + timedelta(days=-4, hours=-2, minutes=-13)

print(past_time_and_date)
print(future_time_and_date)

2021-08-06 15:32:00
2021-08-12 21:07:00


##### Adding a Certain Number of Days

In [65]:
# Create a date/time object
date_time = datetime(year=2022, month=1, day=14, hour=15, minute=30, second=24)
print(date_time)

# Add 2 days
date_time = date_time + timedelta(days=2)
print("Date after adding 2 days: ", date_time)

2022-01-14 15:30:24
Date after adding 2 days:  2022-01-16 15:30:24


##### Adding Other Date/Time Parameters like hours , years

In [66]:
# Add 2 days, 3 hours and 20 seconds
date_time = date_time + timedelta(days=2, hours=3, seconds=20)
print("Date after adding 2 days, 3 hours and 20 seconds: ", date_time)

Date after adding 2 days, 3 hours and 20 seconds:  2022-01-18 18:30:44


##### Subtracting a Certain Number of Days

In [67]:
# Create a date/time object
date_time = datetime(year=2022, month=2, day=1, hour=15, minute=30, second=24)
print(date_time)

# Subtracting two days
date_time = date_time + timedelta(days=-2)
print("Date after subtracting 2 days: ", date_time)

2022-02-01 15:30:24
Date after subtracting 2 days:  2022-01-30 15:30:24


##### Mixed Calculations: adds days, and hours but subtracts minutes

In [68]:
# Create a date/time object
date_time_mixed = datetime(year=2022, month=2, day=1, hour=15, minute=30, second=24)
print(date_time_mixed)

# Adding 3 days, subtracting 4 minutes, and adding 2 hours
date_time_mixed = date_time_mixed + timedelta(days=3, minutes=-4, hours=2)
print("Date after adding 3 days, subtracting 4 minutes, and adding 2 hours: ", date_time_mixed)

2022-02-01 15:30:24
Date after adding 3 days, subtracting 4 minutes, and adding 2 hours:  2022-02-04 17:26:24


##### Find the difference in days, hours, minutes and seconds between two dates or timestamps

In [50]:
date1 = "2021-08-06 15:32:00"
date2 = "2021-08-12 21:07:00"

date_time_obj_1 = datetime.strptime(date1, '%Y-%m-%d %H:%M:%S')
date_time_obj_2 = datetime.strptime(date2, '%Y-%m-%d %H:%M:%S')

diff = date_time_obj_2 - date_time_obj_1
print(diff)

6 days, 5:35:00


# Timedeltas, DateOffsets and Periods in Pandas

##### Create a Timedelta by subtracting one timestamp from another

In [83]:
import pandas as pd

date_1 = pd.to_datetime('2023-11-01 02:32:45')
date_2 = pd.to_datetime('2023-10-15 14:02:34')

In [84]:
date_1 - date_2

Timedelta('16 days 12:30:11')

##### Datetime plus/minus a certain period of time

In [143]:
print(d + timedelta(seconds=1)) # today + one second
print(d + timedelta(minutes=1)) # today + one minute
print(d + timedelta(hours=1)) # today + one hour
print(d + timedelta(days=1)) # today + one day
print(d + timedelta(weeks=1)) # today + one week
print(d + timedelta(days=1)*365) # today + one year

2023-11-05 02:29:49.271419
2023-11-05 02:30:48.271419
2023-11-05 03:29:48.271419
2023-11-06 02:29:48.271419
2023-11-12 02:29:48.271419
2024-11-04 02:29:48.271419


##### Datetime comparisons

In [144]:
# d is no more than 6 years (assume each year has 365 days) after d2?
print(d < (d2 +(timedelta(days=365*6))))
# d is more than 6 years (assume each year has 52 weeks) after d2?
print(d > (d2 +(timedelta(weeks=52*6))))
# d2 is not the same date as d?
print(d != d2)
# d2 is the same date as d?
print(d == d2) 

False
True
True
False


##### Constructing a Timedelta object using pandas.Timedelta
- Specify the duration you would like the timedelta to have as argument(s):
    - use a string, or 
    - provide integers as keyword arguments.
    - Other input forms are covered under "Parsing" in the pandas time deltas documentation.

Possible values for the unit argument (used when the value is passed as a number):
* 'W', 'D', 'T', 'S', 'L', 'U', or 'N'
* 'days' or 'day'
* 'hours', 'hour', 'hr', or 'h'
* 'minutes', 'minute', 'min', or 'm'
* 'seconds', 'second', or 'sec'
* 'milliseconds', 'millisecond', 'millis', or 'milli'
* 'microseconds', 'microsecond', 'micros', or 'micro'
* 'nanoseconds', 'nanosecond', 'nanos', 'nano', or 'ns'.

kwargs
- Available kwargs: 
    - days, 
    - seconds, 
    - microseconds, 
    - milliseconds, 
    - minutes, 
    - hours, 
    - weeks
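
A sketch of the three construction styles side by side, assuming pandas is installed. The names are illustrative, and `unit="h"` is one of the unit spellings listed above.

```python
import pandas as pd

from_string = pd.Timedelta("1 days 2 hours")
from_kwargs = pd.Timedelta(days=1, hours=2)
from_number = pd.Timedelta(26, unit="h")

print(from_string)                                # 1 days 02:00:00
print(from_string == from_kwargs == from_number)  # True
```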

In [85]:
pd.Timedelta('3 days')

Timedelta('3 days 00:00:00')

In [86]:
pd.Timedelta('3 day')

Timedelta('3 days 00:00:00')

In [87]:
pd.Timedelta('3 d')

Timedelta('3 days 00:00:00')

In [89]:
pd.Timedelta(days=3)

Timedelta('3 days 00:00:00')

##### Use a Timedelta to increase a Timestamp’s value by a specific amount of time

In [90]:
date = pd.to_datetime('2023-12-21')

In [91]:
timedelt = pd.Timedelta(days=4)

In [92]:
date + timedelt

Timestamp('2023-12-25 00:00:00')

##### Extract Timedeltas from Timestamps and do operations with entire date ranges without looping

In [93]:
drange  = pd.date_range(end='2023-02', periods=4, freq="M")  # month-end frequency; pandas 2.2+ spells this "ME"
drange

DatetimeIndex(['2022-10-31', '2022-11-30', '2022-12-31', '2023-01-31'], dtype='datetime64[ns]', freq='M')

In [94]:
tdelt = pd.Timedelta(hours= 9)

In [95]:
drange - tdelt

DatetimeIndex(['2022-10-30 15:00:00', '2022-11-29 15:00:00',
               '2022-12-30 15:00:00', '2023-01-30 15:00:00'],
              dtype='datetime64[ns]', freq=None)

##### Create a new column based on the original date column

In [98]:
# Note: despite the column name, this adds 51 weeks (357 days), not 30 years
sa_temperatures['date_plus_30_years'] = sa_temperatures['date'] + pd.Timedelta(weeks=51)

In [99]:
sa_temperatures

Unnamed: 0,Region,Country,State,City,AvgTemperature,date,Month,Day,Year,day_of_week,days_of_the_week,date_plus_30_years
197797,Africa,South Africa,,Capetown,66.8,1995-01-01,1,1,1995,6,Sunday,1995-12-24
197798,Africa,South Africa,,Capetown,67.8,1995-01-02,1,2,1995,0,Monday,1995-12-25
197799,Africa,South Africa,,Capetown,66.9,1995-01-03,1,3,1995,1,Tuesday,1995-12-26
197800,Africa,South Africa,,Capetown,69.5,1995-01-04,1,4,1995,2,Wednesday,1995-12-27
197801,Africa,South Africa,,Capetown,70.6,1995-01-05,1,5,1995,3,Thursday,1995-12-28
...,...,...,...,...,...,...,...,...,...,...,...,...
207058,Africa,South Africa,,Capetown,64.0,2020-05-09,5,9,2020,5,Saturday,2021-05-01
207059,Africa,South Africa,,Capetown,59.1,2020-05-10,5,10,2020,6,Sunday,2021-05-02
207060,Africa,South Africa,,Capetown,64.0,2020-05-11,5,11,2020,0,Monday,2021-05-03
207061,Africa,South Africa,,Capetown,61.3,2020-05-12,5,12,2020,1,Tuesday,2021-05-04


##### How many days (months, years, seconds) ago each record was dated
- Subtract the date column from today's date to get Timedelta values, then divide them by another Timedelta to express the gap in that unit.
- Timedelta objects have attributes and methods to access certain parts of the data or perform further operations.
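
A self-contained sketch of those two points, assuming pandas is installed. The two dates and the fixed "today" are stand-ins taken from the first and last rows of the table in these notes; `dates`, `today` and `age` are illustrative names.

```python
import pandas as pd

# Stand-ins for the 'date' column and for "today".
dates = pd.Series(pd.to_datetime(["1995-01-01", "2020-05-12"]))
today = pd.Timestamp("2023-11-01 03:08:57")

age = today - dates                      # Series of Timedelta values
years = age / pd.Timedelta(days=365)     # gap expressed in 365-day years

print(age.dt.days.tolist())              # [10531, 1268]
print(years.round(2).tolist())           # [28.85, 3.47]
print(age.iloc[0].components.hours)      # 3
```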

In [100]:
todays = datetime.today()
todays

datetime.datetime(2023, 11, 1, 3, 8, 57, 331239)

In [101]:
sa_temperatures['How_many_days_ago'] = todays - sa_temperatures['date']

In [103]:
sa_temperatures['How_many_years_ago'] = sa_temperatures.How_many_days_ago / pd.Timedelta(days=365)

In [104]:
sa_temperatures

Unnamed: 0,Region,Country,State,City,AvgTemperature,date,Month,Day,Year,day_of_week,days_of_the_week,date_plus_30_years,How_many_days_ago,How_many_years_ago
197797,Africa,South Africa,,Capetown,66.8,1995-01-01,1,1,1995,6,Sunday,1995-12-24,10531 days 03:08:57.331239,28.852414
197798,Africa,South Africa,,Capetown,67.8,1995-01-02,1,2,1995,0,Monday,1995-12-25,10530 days 03:08:57.331239,28.849675
197799,Africa,South Africa,,Capetown,66.9,1995-01-03,1,3,1995,1,Tuesday,1995-12-26,10529 days 03:08:57.331239,28.846935
197800,Africa,South Africa,,Capetown,69.5,1995-01-04,1,4,1995,2,Wednesday,1995-12-27,10528 days 03:08:57.331239,28.844195
197801,Africa,South Africa,,Capetown,70.6,1995-01-05,1,5,1995,3,Thursday,1995-12-28,10527 days 03:08:57.331239,28.841455
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
207058,Africa,South Africa,,Capetown,64.0,2020-05-09,5,9,2020,5,Saturday,2021-05-01,1271 days 03:08:57.331239,3.482551
207059,Africa,South Africa,,Capetown,59.1,2020-05-10,5,10,2020,6,Sunday,2021-05-02,1270 days 03:08:57.331239,3.479812
207060,Africa,South Africa,,Capetown,64.0,2020-05-11,5,11,2020,0,Monday,2021-05-03,1269 days 03:08:57.331239,3.477072
207061,Africa,South Africa,,Capetown,61.3,2020-05-12,5,12,2020,1,Tuesday,2021-05-04,1268 days 03:08:57.331239,3.474332


### DateOffsets
- DateOffsets are closely related to Timedeltas but work slightly differently.
- The different logic behind the two:
    - sometimes the difference between two consecutive midnights is not 24 hours: 
        - e.g. a daylight saving change can add or remove an hour. 
            - a 1 day Timedelta will always add 24 hours to the timestamp, 
            - while a 1 day DateOffset will move the timestamp to the same wall-clock time on the next day.
- Offset objects offer many ways to move timestamps forward or backward to a certain logical point in time instead of by a fixed period.

In [106]:
timeser = pd.Timestamp('2023-11-01 00:00:00', tz='Europe/Helsinki')
timeser

Timestamp('2023-11-01 00:00:00+0200', tz='Europe/Helsinki')

In [107]:
timeser + pd.Timedelta(days=1)

Timestamp('2023-11-02 00:00:00+0200', tz='Europe/Helsinki')

In [108]:
timeser + pd.DateOffset(days=1)

Timestamp('2023-11-02 00:00:00+0200', tz='Europe/Helsinki')
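
On 2023-11-01 there is no clock change in Helsinki, so both results above are identical. The sketch below repeats the comparison starting from 29 October 2023, the day Helsinki's clocks went back one hour; it assumes pandas with time zone data available, and the variable names are illustrative.

```python
import pandas as pd

# Midnight on the day the clocks go back in Helsinki.
before_change = pd.Timestamp("2023-10-29 00:00:00", tz="Europe/Helsinki")

fixed_24h = before_change + pd.Timedelta(days=1)   # exactly 24 elapsed hours
next_day = before_change + pd.DateOffset(days=1)   # same wall-clock time, next day

print(fixed_24h)  # 2023-10-29 23:00:00+02:00
print(next_day)   # 2023-10-30 00:00:00+02:00
```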

In [109]:
sa_temperatures['date_name'] = sa_temperatures['date'].dt.day_name()

In [110]:
sa_temperatures

Unnamed: 0,Region,Country,State,City,AvgTemperature,date,Month,Day,Year,day_of_week,days_of_the_week,date_plus_30_years,How_many_days_ago,How_many_years_ago,date_name
197797,Africa,South Africa,,Capetown,66.8,1995-01-01,1,1,1995,6,Sunday,1995-12-24,10531 days 03:08:57.331239,28.852414,Sunday
197798,Africa,South Africa,,Capetown,67.8,1995-01-02,1,2,1995,0,Monday,1995-12-25,10530 days 03:08:57.331239,28.849675,Monday
197799,Africa,South Africa,,Capetown,66.9,1995-01-03,1,3,1995,1,Tuesday,1995-12-26,10529 days 03:08:57.331239,28.846935,Tuesday
197800,Africa,South Africa,,Capetown,69.5,1995-01-04,1,4,1995,2,Wednesday,1995-12-27,10528 days 03:08:57.331239,28.844195,Wednesday
197801,Africa,South Africa,,Capetown,70.6,1995-01-05,1,5,1995,3,Thursday,1995-12-28,10527 days 03:08:57.331239,28.841455,Thursday
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
207058,Africa,South Africa,,Capetown,64.0,2020-05-09,5,9,2020,5,Saturday,2021-05-01,1271 days 03:08:57.331239,3.482551,Saturday
207059,Africa,South Africa,,Capetown,59.1,2020-05-10,5,10,2020,6,Sunday,2021-05-02,1270 days 03:08:57.331239,3.479812,Sunday
207060,Africa,South Africa,,Capetown,64.0,2020-05-11,5,11,2020,0,Monday,2021-05-03,1269 days 03:08:57.331239,3.477072,Monday
207061,Africa,South Africa,,Capetown,61.3,2020-05-12,5,12,2020,1,Tuesday,2021-05-04,1268 days 03:08:57.331239,3.474332,Tuesday


In [113]:
# Roll each date forward to the next business-hours opening,
# e.g. to tell customers when a transaction will be processed
offset = pd.offsets.BusinessHour(start='09:00')
sa_temperatures['Business_day'] = sa_temperatures['date'].apply(offset.rollforward)
sa_temperatures

Unnamed: 0,Region,Country,State,City,AvgTemperature,date,Month,Day,Year,day_of_week,days_of_the_week,date_plus_30_years,How_many_days_ago,How_many_years_ago,date_name,Business_day
197797,Africa,South Africa,,Capetown,66.8,1995-01-01,1,1,1995,6,Sunday,1995-12-24,10531 days 03:08:57.331239,28.852414,Sunday,1995-01-02 09:00:00
197798,Africa,South Africa,,Capetown,67.8,1995-01-02,1,2,1995,0,Monday,1995-12-25,10530 days 03:08:57.331239,28.849675,Monday,1995-01-02 09:00:00
197799,Africa,South Africa,,Capetown,66.9,1995-01-03,1,3,1995,1,Tuesday,1995-12-26,10529 days 03:08:57.331239,28.846935,Tuesday,1995-01-03 09:00:00
197800,Africa,South Africa,,Capetown,69.5,1995-01-04,1,4,1995,2,Wednesday,1995-12-27,10528 days 03:08:57.331239,28.844195,Wednesday,1995-01-04 09:00:00
197801,Africa,South Africa,,Capetown,70.6,1995-01-05,1,5,1995,3,Thursday,1995-12-28,10527 days 03:08:57.331239,28.841455,Thursday,1995-01-05 09:00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
207058,Africa,South Africa,,Capetown,64.0,2020-05-09,5,9,2020,5,Saturday,2021-05-01,1271 days 03:08:57.331239,3.482551,Saturday,2020-05-11 09:00:00
207059,Africa,South Africa,,Capetown,59.1,2020-05-10,5,10,2020,6,Sunday,2021-05-02,1270 days 03:08:57.331239,3.479812,Sunday,2020-05-11 09:00:00
207060,Africa,South Africa,,Capetown,64.0,2020-05-11,5,11,2020,0,Monday,2021-05-03,1269 days 03:08:57.331239,3.477072,Monday,2020-05-11 09:00:00
207061,Africa,South Africa,,Capetown,61.3,2020-05-12,5,12,2020,1,Tuesday,2021-05-04,1268 days 03:08:57.331239,3.474332,Tuesday,2020-05-12 09:00:00


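When the date falls on a Saturday or Sunday, the offset does not roll to the next calendar day; it rolls forward to 09:00 on the following Monday. Dates that fall on a weekday roll to 09:00 on the same day, because midnight is before opening time.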
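
The behaviour can be checked on single timestamps. This is a small sketch using the same `BusinessHour` offset; the two sample timestamps are chosen only for illustration.

```python
import pandas as pd

# Business hours starting at 09:00 (the end defaults to 17:00).
offset = pd.offsets.BusinessHour(start="09:00")

saturday = pd.Timestamp("2020-05-09")           # midnight on a Saturday
monday_noon = pd.Timestamp("2020-05-11 12:00")  # already inside business hours

# A timestamp outside business hours moves to the next opening time.
print(offset.rollforward(saturday))     # 2020-05-11 09:00:00

# A timestamp inside business hours is returned unchanged.
print(offset.rollforward(monday_noon))  # 2020-05-11 12:00:00
```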

In [116]:
sa_temperatures['Business_day_name'] = sa_temperatures['Business_day'].dt.day_name()

In [117]:
sa_temperatures

Unnamed: 0,Region,Country,State,City,AvgTemperature,date,Month,Day,Year,day_of_week,days_of_the_week,date_plus_30_years,How_many_days_ago,How_many_years_ago,date_name,Business_day,Business_day_name
197797,Africa,South Africa,,Capetown,66.8,1995-01-01,1,1,1995,6,Sunday,1995-12-24,10531 days 03:08:57.331239,28.852414,Sunday,1995-01-02 09:00:00,Monday
197798,Africa,South Africa,,Capetown,67.8,1995-01-02,1,2,1995,0,Monday,1995-12-25,10530 days 03:08:57.331239,28.849675,Monday,1995-01-02 09:00:00,Monday
197799,Africa,South Africa,,Capetown,66.9,1995-01-03,1,3,1995,1,Tuesday,1995-12-26,10529 days 03:08:57.331239,28.846935,Tuesday,1995-01-03 09:00:00,Tuesday
197800,Africa,South Africa,,Capetown,69.5,1995-01-04,1,4,1995,2,Wednesday,1995-12-27,10528 days 03:08:57.331239,28.844195,Wednesday,1995-01-04 09:00:00,Wednesday
197801,Africa,South Africa,,Capetown,70.6,1995-01-05,1,5,1995,3,Thursday,1995-12-28,10527 days 03:08:57.331239,28.841455,Thursday,1995-01-05 09:00:00,Thursday
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
207058,Africa,South Africa,,Capetown,64.0,2020-05-09,5,9,2020,5,Saturday,2021-05-01,1271 days 03:08:57.331239,3.482551,Saturday,2020-05-11 09:00:00,Monday
207059,Africa,South Africa,,Capetown,59.1,2020-05-10,5,10,2020,6,Sunday,2021-05-02,1270 days 03:08:57.331239,3.479812,Sunday,2020-05-11 09:00:00,Monday
207060,Africa,South Africa,,Capetown,64.0,2020-05-11,5,11,2020,0,Monday,2021-05-03,1269 days 03:08:57.331239,3.477072,Monday,2020-05-11 09:00:00,Monday
207061,Africa,South Africa,,Capetown,61.3,2020-05-12,5,12,2020,1,Tuesday,2021-05-04,1268 days 03:08:57.331239,3.474332,Tuesday,2020-05-12 09:00:00,Tuesday


### Periods
- A Period represents a span of time (an hour, day, week, month, or quarter) as a single object.

Main idea:
- a Timestamp represents a single moment in time, 
- a Period represents the timespan your data belongs to, and is used when the data recurs at a regular frequency.
    - a Period can still have a very fine resolution, down to a single second for instance

Example:
- sales data: 
    - every time you sell a donut in your shop you record the sale, and the time of the sale (a Timestamp) is attached to each transaction. 
        - 100 purchases a day produce up to 100 different Timestamps. 
        - when you sum up your revenue at the end of the day, you have the total revenue and the day (the Period) in which it was earned. 
        - the next day brings another 100 Timestamps, but the daily revenue gets only 1 Period; no specific Timestamp is needed. 
        - daily Periods can in turn be aggregated into monthly, quarterly, and yearly Periods.
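
The donut example can be sketched in a few lines. The sales figures and the column names `sold_at` and `revenue` are made up for illustration.

```python
import pandas as pd

# Hypothetical donut sales: one Timestamp per transaction.
sales = pd.DataFrame({
    "sold_at": pd.to_datetime([
        "2023-11-01 08:15:00", "2023-11-01 12:40:00",
        "2023-11-02 09:05:00", "2023-11-02 16:30:00",
    ]),
    "revenue": [2.5, 3.0, 2.5, 4.0],
})

# Collapse each Timestamp to the day (Period) it belongs to, then sum per day.
daily = sales.groupby(sales["sold_at"].dt.to_period("D"))["revenue"].sum()
print(daily)

# The same Timestamps can be collapsed to a coarser Period, e.g. the month.
monthly = sales.groupby(sales["sold_at"].dt.to_period("M"))["revenue"].sum()
print(monthly)
```

Four Timestamps end up as two daily Periods and a single monthly Period.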

In [118]:
trans = '2023-11-01 04:56:45'

pd.to_datetime(trans)

Timestamp('2023-11-01 04:56:45')

In [119]:
pd.Period(trans, freq='H')

Period('2023-11-01 04:00', 'H')

At first glance there is little difference between a Timestamp and a Period. 
- the difference shows up in the arithmetic and in the methods/attributes of Periods. 
    - the frequency parameter drives the logic: 
        - adding or subtracting an integer n shifts the Period by n multiples of its frequency

In [120]:
pd.Period(trans, freq='3H')

Period('2023-11-01 04:00', '3H')

In [121]:
pd.Period(trans,freq='3H') + 1

Period('2023-11-01 07:00', '3H')

##### Creating Period ranges is one of the basic use cases for these objects
- the whole point of these types is to represent a regular recurrence

In [125]:
pd.period_range(start='2023-11-01 08:00:00', end='2023-11-01 17:00:00', freq='H')

PeriodIndex(['2023-11-01 08:00', '2023-11-01 09:00', '2023-11-01 10:00',
             '2023-11-01 11:00', '2023-11-01 12:00', '2023-11-01 13:00',
             '2023-11-01 14:00', '2023-11-01 15:00', '2023-11-01 16:00',
             '2023-11-01 17:00'],
            dtype='period[H]', freq='H')

##### Creating a Period range with quarterly frequency and reading its start and end dates:
- A Period covers a time span, and the object can return the beginning and the end of that span. 

In [127]:
next_Q = pd.period_range(start='2023-04-01', end='2025-03-31', freq='Q')

In [128]:
next_Q.start_time

DatetimeIndex(['2023-04-01', '2023-07-01', '2023-10-01', '2024-01-01',
               '2024-04-01', '2024-07-01', '2024-10-01', '2025-01-01'],
              dtype='datetime64[ns]', freq='QS-OCT')

In [129]:
next_Q.end_time

DatetimeIndex(['2023-06-30 23:59:59.999999999',
               '2023-09-30 23:59:59.999999999',
               '2023-12-31 23:59:59.999999999',
               '2024-03-31 23:59:59.999999999',
               '2024-06-30 23:59:59.999999999',
               '2024-09-30 23:59:59.999999999',
               '2024-12-31 23:59:59.999999999',
               '2025-03-31 23:59:59.999999999'],
              dtype='datetime64[ns]', freq=None)

##### Convert Timestamps to Periods (and vice versa): Consolidate into Quarter for reporting

In [130]:
sa_temperatures['date_quater_start'] = sa_temperatures['date'].apply(lambda x: pd.Period(x , freq='Q').start_time)

In [132]:
sa_temperatures['date_quarter'] = sa_temperatures['date'].apply(lambda x: pd.Period(x, freq='Q'))

In [133]:
sa_temperatures

Unnamed: 0,Region,Country,State,City,AvgTemperature,date,Month,Day,Year,day_of_week,days_of_the_week,date_plus_30_years,How_many_days_ago,How_many_years_ago,date_name,Business_day,Business_day_name,date_quater_start,date_quarter
197797,Africa,South Africa,,Capetown,66.8,1995-01-01,1,1,1995,6,Sunday,1995-12-24,10531 days 03:08:57.331239,28.852414,Sunday,1995-01-02 09:00:00,Monday,1995-01-01,1995Q1
197798,Africa,South Africa,,Capetown,67.8,1995-01-02,1,2,1995,0,Monday,1995-12-25,10530 days 03:08:57.331239,28.849675,Monday,1995-01-02 09:00:00,Monday,1995-01-01,1995Q1
197799,Africa,South Africa,,Capetown,66.9,1995-01-03,1,3,1995,1,Tuesday,1995-12-26,10529 days 03:08:57.331239,28.846935,Tuesday,1995-01-03 09:00:00,Tuesday,1995-01-01,1995Q1
197800,Africa,South Africa,,Capetown,69.5,1995-01-04,1,4,1995,2,Wednesday,1995-12-27,10528 days 03:08:57.331239,28.844195,Wednesday,1995-01-04 09:00:00,Wednesday,1995-01-01,1995Q1
197801,Africa,South Africa,,Capetown,70.6,1995-01-05,1,5,1995,3,Thursday,1995-12-28,10527 days 03:08:57.331239,28.841455,Thursday,1995-01-05 09:00:00,Thursday,1995-01-01,1995Q1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
207058,Africa,South Africa,,Capetown,64.0,2020-05-09,5,9,2020,5,Saturday,2021-05-01,1271 days 03:08:57.331239,3.482551,Saturday,2020-05-11 09:00:00,Monday,2020-04-01,2020Q2
207059,Africa,South Africa,,Capetown,59.1,2020-05-10,5,10,2020,6,Sunday,2021-05-02,1270 days 03:08:57.331239,3.479812,Sunday,2020-05-11 09:00:00,Monday,2020-04-01,2020Q2
207060,Africa,South Africa,,Capetown,64.0,2020-05-11,5,11,2020,0,Monday,2021-05-03,1269 days 03:08:57.331239,3.477072,Monday,2020-05-11 09:00:00,Monday,2020-04-01,2020Q2
207061,Africa,South Africa,,Capetown,61.3,2020-05-12,5,12,2020,1,Tuesday,2021-05-04,1268 days 03:08:57.331239,3.474332,Tuesday,2020-05-12 09:00:00,Tuesday,2020-04-01,2020Q2
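
The same conversion can also be done without a Python-level lambda. The sketch below uses the `.dt` accessor on a small stand-in Series (not the real `sa_temperatures` frame), so it is an alternative to the cells above and not the method they use.

```python
import pandas as pd

# Small stand-in for the `date` column used above.
dates = pd.Series(pd.to_datetime(["1995-01-01", "2020-05-09", "2020-05-12"]))

# Vectorized conversion: Timestamp -> quarterly Period.
quarter = dates.dt.to_period("Q")

# And back again: the first instant of each quarter as a Timestamp.
quarter_start = quarter.dt.start_time

print(quarter.astype(str).tolist())  # ['1995Q1', '2020Q2', '2020Q2']
print(quarter_start.dt.strftime("%Y-%m-%d").tolist())  # ['1995-01-01', '2020-04-01', '2020-04-01']
```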


### Lambda Functions on Datetime objects

##### Find the first or last date of the current, previous, or next month with lambda functions

In [54]:
from datetime import datetime, timedelta

Current Month: The First day of the Month for a given Date

Use the `replace` method to set the day to the 1st of the month.

In [None]:
# Get the first day of the month for a given date
first_day = lambda date: date.replace(day = 1)

Current Month: The Last day of the Month for a given Date

- First, find the first day of the current month
- Move into the next month by adding 32 days (no month has more than 31 days, so this always lands in the next month)
- Find the first day of that next month
- Move back one day to get the last day of the current month.

In [None]:
# Last day of the month for a given date
last_day = lambda date: (date.replace(day=1) + timedelta(days=32)).replace(day=1) + timedelta(days=-1)

Alternatively:

In [None]:
# Shorter Version
last_day = lambda date: first_day(first_day(date) + timedelta(days=32)) + timedelta(days=-1)

Previous Month: The Last day of the Previous Month for a given Date

In [None]:
# Get Last day of the Previous Month for a given Date
last_day_of_prev_month = lambda date: date.replace(day = 1) + timedelta(days=-1)

Previous Month: The First day of the Previous Month for a given Date

In [None]:
# Get First day of the Previous Month for a given Date
first_day_of_prev_month = lambda date: (date.replace(day = 1) + timedelta(days=-1)).replace(day = 1)

In [None]:
# Shorter Version
first_day_of_prev_month = lambda date: first_day(last_day_of_prev_month(date))

Next Month: The First day of the Next Month for a given Date

In [None]:
# Get First day of the next month
first_day_of_next_month = lambda date: (date.replace(day=1) + timedelta(days=32)).replace(day=1)

Next Month: The Last day of the Next Month for a given Date

In [None]:
# Last Day of the next month
last_day_of_next_month = lambda date: (date.replace(day = 1) + timedelta(days = 63)).replace(day=1) + timedelta(days=-1)

In [None]:
# Shorter Version
last_day_of_next_month = lambda date: last_day(first_day_of_next_month(date))
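
As a quick check of the helpers above, here is a sketch that rewrites them as named functions (so it runs on its own) and applies them to a date in a leap-year February. The sample date is arbitrary.

```python
from datetime import datetime, timedelta

# The same logic as the lambdas above, written as named functions.
def first_day(date):
    return date.replace(day=1)

def last_day(date):
    # Day 1 plus 32 days always lands in the next month; step back from its 1st.
    return first_day(first_day(date) + timedelta(days=32)) - timedelta(days=1)

def last_day_of_prev_month(date):
    return first_day(date) - timedelta(days=1)

def first_day_of_next_month(date):
    return first_day(first_day(date) + timedelta(days=32))

def last_day_of_next_month(date):
    return last_day(first_day_of_next_month(date))

d = datetime(2024, 2, 15)  # a leap-year February
print(first_day(d).date())                # 2024-02-01
print(last_day(d).date())                 # 2024-02-29
print(last_day_of_prev_month(d).date())   # 2024-01-31
print(first_day_of_next_month(d).date())  # 2024-03-01
print(last_day_of_next_month(d).date())   # 2024-03-31
```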

# Pandas Module

### Working with datetime objects in pandas.

Similar to the datetime module, pandas also has 
- datetime (`Timestamp`) and 
- timedelta (`Timedelta`) objects with similar functionality to those of the datetime module.

Convert datetime and duration strings into pandas objects using the following functions:
- to_datetime(): converts dates and times in string format to pandas Timestamp objects (a subclass of Python's datetime).
- to_timedelta(): converts duration strings or numbers to pandas Timedelta objects.

These functions can usually detect the date’s format automatically, without requiring us to define it as we did for strptime().

Even for an informal string like the one below, pandas is able to parse it correctly and return a Timestamp.

In [1]:
# import pandas 
import pandas as pd
# create date object 
date2 = pd.to_datetime("4th of oct, 2020")
date2

Timestamp('2020-10-04 00:00:00')
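
`to_timedelta()` is not demonstrated above, so here is a brief sketch. The duration values are arbitrary examples.

```python
import pandas as pd

# A duration string in "D days HH:MM:SS" form becomes a Timedelta.
gap = pd.to_timedelta("1 days 06:00:00")
print(gap)  # 1 days 06:00:00

# Plain numbers need a unit to say what they count.
ninety_minutes = pd.to_timedelta(90, unit="min")
print(ninety_minutes)  # 0 days 01:30:00

# Timedeltas combine with Timestamps through ordinary arithmetic.
later = pd.to_datetime("2020-10-04") + gap
print(later)  # 2020-10-05 06:00:00
```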

### Extracting Individual Components in Pandas

To extract the month, hour, or minute from a date and store it as a new column in a pandas DataFrame, use the `.dt` accessor.

Use df['date'].dt.month to extract the month from a column that holds full dates and store it in a new column.

Use df['date'].dt.weekday to extract the number of the weekday, with 
- 0 representing Monday
- 1 representing Tuesday
- 2 representing Wednesday, and so on up to 6 for Sunday

To get the name of the weekday directly, use df['date'].dt.day_name() (the older .dt.weekday_name attribute was removed in pandas 1.0).

### Working with Dates in Pandas DataFrame

In [2]:
import pandas as pd

# Read city temperatures from a `csv` file
city_temperatures = pd.read_csv('city_temperature.csv')

# The dataset is large, so we will work with one country only
# (this step is optional); .copy() avoids SettingWithCopyWarning when columns are added later
sa_temperatures = city_temperatures[city_temperatures.Country == 'South Africa'].copy()


In [3]:
sa_temperatures

Unnamed: 0,Region,Country,State,City,Month,Day,Year,AvgTemperature
197797,Africa,South Africa,,Capetown,1,1,1995,66.8
197798,Africa,South Africa,,Capetown,1,2,1995,67.8
197799,Africa,South Africa,,Capetown,1,3,1995,66.9
197800,Africa,South Africa,,Capetown,1,4,1995,69.5
197801,Africa,South Africa,,Capetown,1,5,1995,70.6
...,...,...,...,...,...,...,...,...
207058,Africa,South Africa,,Capetown,5,9,2020,64.0
207059,Africa,South Africa,,Capetown,5,10,2020,59.1
207060,Africa,South Africa,,Capetown,5,11,2020,64.0
207061,Africa,South Africa,,Capetown,5,12,2020,61.3


In [4]:
sa_temperatures['City'].value_counts()

Capetown    9266
Name: City, dtype: int64

In [5]:
# Build a pandas Series called `date` from the Year, Month, and Day columns
date = pd.to_datetime(sa_temperatures[['Year', 'Month', 'Day']])

# Append it to the dataset as a new last column
sa_temperatures.insert(len(sa_temperatures.columns), 'date', date)

# See sample
sa_temperatures.sample(10)

Unnamed: 0,Region,Country,State,City,Month,Day,Year,AvgTemperature,date
200622,Africa,South Africa,,Capetown,9,26,2002,59.2,2002-09-26
205440,Africa,South Africa,,Capetown,12,5,2015,75.4,2015-12-05
203311,Africa,South Africa,,Capetown,2,5,2010,67.1,2010-02-05
199776,Africa,South Africa,,Capetown,6,2,2000,60.3,2000-06-02
204483,Africa,South Africa,,Capetown,4,22,2013,63.3,2013-04-22
203331,Africa,South Africa,,Capetown,2,25,2010,71.3,2010-02-25
206953,Africa,South Africa,,Capetown,1,25,2020,66.5,2020-01-25
199158,Africa,South Africa,,Capetown,9,23,1998,57.6,1998-09-23
202389,Africa,South Africa,,Capetown,7,29,2007,57.3,2007-07-29
203727,Africa,South Africa,,Capetown,3,28,2011,65.7,2011-03-28


In [6]:
sa_temperatures

Unnamed: 0,Region,Country,State,City,Month,Day,Year,AvgTemperature,date
197797,Africa,South Africa,,Capetown,1,1,1995,66.8,1995-01-01
197798,Africa,South Africa,,Capetown,1,2,1995,67.8,1995-01-02
197799,Africa,South Africa,,Capetown,1,3,1995,66.9,1995-01-03
197800,Africa,South Africa,,Capetown,1,4,1995,69.5,1995-01-04
197801,Africa,South Africa,,Capetown,1,5,1995,70.6,1995-01-05
...,...,...,...,...,...,...,...,...,...
207058,Africa,South Africa,,Capetown,5,9,2020,64.0,2020-05-09
207059,Africa,South Africa,,Capetown,5,10,2020,59.1,2020-05-10
207060,Africa,South Africa,,Capetown,5,11,2020,64.0,2020-05-11
207061,Africa,South Africa,,Capetown,5,12,2020,61.3,2020-05-12


In [7]:
sa_temperatures.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9266 entries, 197797 to 207062
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   Region          9266 non-null   object        
 1   Country         9266 non-null   object        
 2   State           0 non-null      object        
 3   City            9266 non-null   object        
 4   Month           9266 non-null   int64         
 5   Day             9266 non-null   int64         
 6   Year            9266 non-null   int64         
 7   AvgTemperature  9266 non-null   float64       
 8   date            9266 non-null   datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(3), object(4)
memory usage: 723.9+ KB


##### Reverse: From Date to Day, Month, Year

In [77]:
# Show how to extract Month, Day, and Year when we only have full date values.

# Dropping Month, Day, and Year column
sa_temperatures = sa_temperatures.drop(columns=['Month', 'Day', 'Year'])

In [78]:
# Add back the Month, Day, and Year columns using the pandas `.dt` accessor,
# which exposes many date components such as day, month, year, day_of_year, and day_of_week

sa_temperatures['Month'] = sa_temperatures['date'].dt.month
sa_temperatures.sample(10)

Unnamed: 0,Region,Country,State,City,AvgTemperature,date,Month
204141,Africa,South Africa,,Capetown,56.3,2012-05-15,5
202011,Africa,South Africa,,Capetown,55.3,2006-07-16,7
205359,Africa,South Africa,,Capetown,59.3,2015-09-15,9
200073,Africa,South Africa,,Capetown,64.0,2001-03-26,3
202484,Africa,South Africa,,Capetown,64.1,2007-11-01,11
198433,Africa,South Africa,,Capetown,57.7,1996-09-28,9
199729,Africa,South Africa,,Capetown,62.6,2000-04-16,4
204358,Africa,South Africa,,Capetown,73.5,2012-12-18,12
199597,Africa,South Africa,,Capetown,65.7,1999-12-06,12
198612,Africa,South Africa,,Capetown,71.6,1997-03-26,3


In [79]:
sa_temperatures['Day'] = sa_temperatures['date'].dt.day
sa_temperatures['Year'] = sa_temperatures['date'].dt.year
sa_temperatures['day_of_week'] = sa_temperatures['date'].dt.day_of_week
sa_temperatures.sample(10)

Unnamed: 0,Region,Country,State,City,AvgTemperature,date,Month,Day,Year,day_of_week
200212,Africa,South Africa,,Capetown,53.1,2001-08-12,8,12,2001,6
202507,Africa,South Africa,,Capetown,66.5,2007-11-24,11,24,2007,5
200977,Africa,South Africa,,Capetown,58.4,2003-09-16,9,16,2003,1
198025,Africa,South Africa,,Capetown,56.8,1995-08-17,8,17,1995,3
206490,Africa,South Africa,,Capetown,63.2,2018-10-19,10,19,2018,4
205584,Africa,South Africa,,Capetown,56.5,2016-04-26,4,26,2016,1
206718,Africa,South Africa,,Capetown,60.4,2019-06-04,6,4,2019,1
204451,Africa,South Africa,,Capetown,69.7,2013-03-21,3,21,2013,3
201893,Africa,South Africa,,Capetown,65.2,2006-03-20,3,20,2006,0
203047,Africa,South Africa,,Capetown,58.0,2009-05-17,5,17,2009,6


In [80]:
def days_of_the_week(x):
    if x == 0:
        return "Monday"
    elif x == 1:
        return "Tuesday"
    elif x == 2:
        return "Wednesday"
    elif x == 3:
        return "Thursday"
    elif x == 4:
        return "Friday"
    elif x == 5:
        return "Saturday"
    else:
        return "Sunday"

In [81]:
sa_temperatures['days_of_the_week'] = sa_temperatures['day_of_week'].apply(days_of_the_week)

In [82]:
sa_temperatures

Unnamed: 0,Region,Country,State,City,AvgTemperature,date,Month,Day,Year,day_of_week,days_of_the_week
197797,Africa,South Africa,,Capetown,66.8,1995-01-01,1,1,1995,6,Sunday
197798,Africa,South Africa,,Capetown,67.8,1995-01-02,1,2,1995,0,Monday
197799,Africa,South Africa,,Capetown,66.9,1995-01-03,1,3,1995,1,Tuesday
197800,Africa,South Africa,,Capetown,69.5,1995-01-04,1,4,1995,2,Wednesday
197801,Africa,South Africa,,Capetown,70.6,1995-01-05,1,5,1995,3,Thursday
...,...,...,...,...,...,...,...,...,...,...,...
207058,Africa,South Africa,,Capetown,64.0,2020-05-09,5,9,2020,5,Saturday
207059,Africa,South Africa,,Capetown,59.1,2020-05-10,5,10,2020,6,Sunday
207060,Africa,South Africa,,Capetown,64.0,2020-05-11,5,11,2020,0,Monday
207061,Africa,South Africa,,Capetown,61.3,2020-05-12,5,12,2020,1,Tuesday


In [197]:
sa_temperatures

Unnamed: 0,Region,Country,State,City,AvgTemperature,date,Month,Day,Year,day_of_week,days_of_the_week,date_plus_30_years,How_many_days_ago,How_many_years_ago,date_name,Business_day,Business_day_name,date_quater_start,date_quarter
197797,Africa,South Africa,,Capetown,66.8,1995-01-01,1,1,1995,6,Sunday,1995-12-24,10531 days 03:08:57.331239,28.852414,Sunday,1995-01-02 09:00:00,Monday,1995-01-01,1995Q1
197798,Africa,South Africa,,Capetown,67.8,1995-01-02,1,2,1995,0,Monday,1995-12-25,10530 days 03:08:57.331239,28.849675,Monday,1995-01-02 09:00:00,Monday,1995-01-01,1995Q1
197799,Africa,South Africa,,Capetown,66.9,1995-01-03,1,3,1995,1,Tuesday,1995-12-26,10529 days 03:08:57.331239,28.846935,Tuesday,1995-01-03 09:00:00,Tuesday,1995-01-01,1995Q1
197800,Africa,South Africa,,Capetown,69.5,1995-01-04,1,4,1995,2,Wednesday,1995-12-27,10528 days 03:08:57.331239,28.844195,Wednesday,1995-01-04 09:00:00,Wednesday,1995-01-01,1995Q1
197801,Africa,South Africa,,Capetown,70.6,1995-01-05,1,5,1995,3,Thursday,1995-12-28,10527 days 03:08:57.331239,28.841455,Thursday,1995-01-05 09:00:00,Thursday,1995-01-01,1995Q1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
207058,Africa,South Africa,,Capetown,64.0,2020-05-09,5,9,2020,5,Saturday,2021-05-01,1271 days 03:08:57.331239,3.482551,Saturday,2020-05-11 09:00:00,Monday,2020-04-01,2020Q2
207059,Africa,South Africa,,Capetown,59.1,2020-05-10,5,10,2020,6,Sunday,2021-05-02,1270 days 03:08:57.331239,3.479812,Sunday,2020-05-11 09:00:00,Monday,2020-04-01,2020Q2
207060,Africa,South Africa,,Capetown,64.0,2020-05-11,5,11,2020,0,Monday,2021-05-03,1269 days 03:08:57.331239,3.477072,Monday,2020-05-11 09:00:00,Monday,2020-04-01,2020Q2
207061,Africa,South Africa,,Capetown,61.3,2020-05-12,5,12,2020,1,Tuesday,2021-05-04,1268 days 03:08:57.331239,3.474332,Tuesday,2020-05-12 09:00:00,Tuesday,2020-04-01,2020Q2


In [196]:
sa_temperatures.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9266 entries, 197797 to 207062
Data columns (total 19 columns):
 #   Column              Non-Null Count  Dtype          
---  ------              --------------  -----          
 0   Region              9266 non-null   object         
 1   Country             9266 non-null   object         
 2   State               0 non-null      object         
 3   City                9266 non-null   object         
 4   AvgTemperature      9266 non-null   float64        
 5   date                9266 non-null   datetime64[ns] 
 6   Month               9266 non-null   int64          
 7   Day                 9266 non-null   int64          
 8   Year                9266 non-null   int64          
 9   day_of_week         9266 non-null   int64          
 10  days_of_the_week    9266 non-null   object         
 11  date_plus_30_years  9266 non-null   datetime64[ns] 
 12  How_many_days_ago   9266 non-null   timedelta64[ns]
 13  How_many_years_ago  9266 n

### Specify date columns as DateTime objects
- Tell pandas which columns to parse as dates while reading the file, using `parse_dates`.

This changes the dtype of those columns to datetime64.

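In [None]:
# Parse the date columns while reading the file
bike = pd.read_csv('capital-onebike.csv', parse_dates=['Start date', 'End date'])

In [None]:
# Or convert an already-loaded string column, giving the format explicitly
bike['Start date'] = pd.to_datetime(bike['Start date'], format='%Y-%m-%d %H:%M:%S')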

pandas also has pd.read_excel(), pd.read_json(), and even pd.read_clipboard() to read tabular data that you've copied from a document or website. Most of them have date parsing functionality.

##### Calculate Duration time in seconds

In [None]:
# Ride duration as a timedelta, then as a number of seconds
bike['duration_time'] = bike['End date'] - bike['Start date']
bike['duration_second'] = bike['duration_time'].dt.total_seconds()

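##### Calculate how long the bike was out of the station

In [None]:
# Time span covered by the data: last ride start minus first ride start
bike['Start date'].max()-bike['Start date'].min()

In [None]:
# Total ride time as a fraction of 91 days
bike.duration_time.sum()/timedelta(days=91)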

##### Group_by Member type and duration

In [None]:
bike.groupby('Member type')['duration_second'].mean()

Count the rows in each group with groupby('column').size(), or call groupby('column').first() to get the first row of each group.

##### Group by time
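Group by time with pandas resample. 
Note that resample() needs datetime-like values: either a datetime-like index or a datetime column passed with `on=`. 
- 'M' is for months, 
- 'Y' is years, 
- 'D' is days.

pandas 2.2 and later prefer 'ME' and 'YE' for month-end and year-end and warn when 'M' or 'Y' is passed to resample().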

In [None]:
bike.resample('M',on='Start date')['duration_second'].mean()
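
To see what the resample call produces without the bike file, here is a sketch on a tiny hand-made frame. The rows are invented, and it uses the 'MS' (month start) alias instead of 'M' so that it runs without deprecation warnings on both older and newer pandas versions.

```python
import pandas as pd

# Hypothetical rides; only the two columns needed for the sketch.
rides = pd.DataFrame({
    "Start date": pd.to_datetime(["2017-10-01 10:00", "2017-10-15 18:30", "2017-11-02 09:00"]),
    "duration_second": [600.0, 1200.0, 300.0],
})

# Bucket rows by calendar month of "Start date" and average the durations.
# 'MS' labels each bucket with the first day of its month.
monthly_mean = rides.resample("MS", on="Start date")["duration_second"].mean()
print(monthly_mean)
```

The October bucket averages its two rides (900 seconds) and the November bucket holds its single ride (300 seconds).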

##### Calculate how many joy rides there were

In [None]:
# Boolean mask: rides that start and end at the same station
joyrides = (bike['Start station'] == bike['End station'])
# Total number of joyrides
print("{} rides were joyrides".format(joyrides.sum()))
# Median of all rides
print("The median duration overall was {:.2f} seconds".format(bike['duration_second'].median()))
# Median of joyrides
print("The median duration for joyrides was {:.2f} seconds".format(bike[joyrides]['duration_second'].median()))

##### Do members and casual riders drop off at the same rate over October to December, or does one drop off faster than the other?

In [None]:
# Resample rides to be monthly on the basis of Start date
monthly_rides = bike.resample('M', on = 'Start date')['Member type']
# Take the ratio of the .value_counts() over the total number of rides
print(monthly_rides.value_counts() / monthly_rides.size())

##### Figure out if keeping Member rides only would be enough to stabilize the numbers of users throughout the fall

In [None]:
# Group rides by member type, and resample to the month
grouped = bike.groupby('Member type').resample('M', on = 'Start date')
# Print the median duration for each group
print(grouped['duration_second'].median())

##### How long per day?

In [None]:
# Add a column for the weekday of the start of the ride
bike['Ride start weekday'] = bike['Start date'].dt.strftime('%A')
# Print the median trip time per weekday
print(bike.groupby('Ride start weekday')['duration_second'].median())

##### How much time elapsed between rides?

In [None]:
# Shift the end dates by one row so each row holds the previous ride's end; subtract that from the start date
bike['Time since'] = bike['Start date'] - (bike['End date'].shift(1))
# Move from a timedelta to a number of seconds, which is easier to work with
bike['Time since'] = bike['Time since'].dt.total_seconds()
# Resample to the month
monthly = bike.resample('M', on = 'Start date')
# Print the average hours between rides each month
print(monthly['Time since'].mean()/(60*60))
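
The shift step is easiest to follow on a few rows. This is a sketch with three invented rides for a single bike, already sorted by start time.

```python
import pandas as pd

# Hypothetical rides for a single bike, sorted by start time.
rides = pd.DataFrame({
    "Start date": pd.to_datetime(["2017-10-01 10:00", "2017-10-01 11:30", "2017-10-01 15:00"]),
    "End date":   pd.to_datetime(["2017-10-01 10:20", "2017-10-01 12:00", "2017-10-01 15:45"]),
})

# shift(1) pushes each end date down one row: row i now sees the end of ride i-1.
previous_end = rides["End date"].shift(1)

# Idle time before each ride, in seconds; the first ride has no predecessor (NaN).
rides["Time since"] = (rides["Start date"] - previous_end).dt.total_seconds()
print(rides["Time since"].tolist())  # [nan, 4200.0, 10800.0]
```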

# Data Manipulation in Pandas

* Time series data contains data points attached to sequential timestamps.
    * Stock prices over time
    * Daily, weekly, monthly sales
    * Periodic measurements in a process
    * Power or gas consumption rates over time
* Pandas is a great library for working on dates and times. It provides numerous functions for the process of analyzing and manipulating time series data.

In [1]:
import pandas as pd

df_sales = pd.read_csv("C:/Users/Siphamandla Mandindi/Documents/Explore_Data_Science/Sprint_3_Python_for_DS/500_exercises_to_master_pandas-main/Data/sales.csv")

df_sales.head()

Unnamed: 0,purchase_date,sold_date,qty
0,2021-05-31 00:00:00,2021-07-15,316
1,2021-06-30 07:00:00,2022-01-12,495
2,2021-07-30 13:00:00,2021-09-14,312
3,2021-08-31 13:00:00,2022-02-05,416
4,2021-09-28 07:00:00,2021-12-30,349


In [2]:
df_sales.dtypes

purchase_date    object
sold_date        object
qty               int64
dtype: object

In [3]:
df_sales = df_sales.astype({"purchase_date": "datetime64[ns]",
                            "sold_date": "datetime64[ns]"})

df_sales.dtypes

purchase_date    datetime64[ns]
sold_date        datetime64[ns]
qty                       int64
dtype: object

## Exercise 1: Day

* Date manipulation functions are available via the `.dt` accessor.

In [4]:
df_sales["sold_date"].dt.day

0    15
1    12
2    14
3     5
4    30
Name: sold_date, dtype: int64

## Exercise 2: Month

In [6]:
# method 1

df_sales["sold_date"].dt.month

0     7
1     1
2     9
3     2
4    12
Name: sold_date, dtype: int64

## Exercise 3: Year

In [5]:
df_sales["sold_date"].dt.year

0    2021
1    2022
2    2021
3    2022
4    2021
Name: sold_date, dtype: int64

## Exercise 4: Converting to Periods datatype

- This data type represents fixed spans of time such as 
    - month, 
    - quarter, and 
    - year. 
- It differs from timedelta in that it names a specific span rather than a length of time.
    - a `period[M]` value can be 2022-01 (the month of January 2022) but cannot be "1 month" or "2 months".

In [8]:
df_sales["sold_date"].dt.to_period("M")

0    2021-07
1    2022-01
2    2021-09
3    2022-02
4    2021-12
Name: sold_date, dtype: period[M]

In [9]:
df_sales["sold_month"] = df_sales["sold_date"].dt.to_period("M")

df_sales.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month
0,2021-05-31 00:00:00,2021-07-15,316,2021-07
1,2021-06-30 07:00:00,2022-01-12,495,2022-01
2,2021-07-30 13:00:00,2021-09-14,312,2021-09
3,2021-08-31 13:00:00,2022-02-05,416,2022-02
4,2021-09-28 07:00:00,2021-12-30,349,2021-12


In [10]:
df_sales.dtypes

purchase_date    datetime64[ns]
sold_date        datetime64[ns]
qty                       int64
sold_month            period[M]
dtype: object

## Exercise 5: Month name using `.month_name()` method

In [12]:
df_sales["sold_month_name"] = df_sales["sold_date"].dt.month_name()

In [13]:
df_sales

Unnamed: 0,purchase_date,sold_date,qty,sold_month,sold_month_name
0,2021-05-31 00:00:00,2021-07-15,316,2021-07,July
1,2021-06-30 07:00:00,2022-01-12,495,2022-01,January
2,2021-07-30 13:00:00,2021-09-14,312,2021-09,September
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,February
4,2021-09-28 07:00:00,2021-12-30,349,2021-12,December


## Exercise 6: Day of the week using the `.dayofweek` attribute & `.day_name()` method

In [16]:
df_sales["sold_day_ofthe_week"] = df_sales["sold_date"].dt.dayofweek

df_sales

Unnamed: 0,purchase_date,sold_date,qty,sold_month,sold_month_name,sold_day_name,sold_day_ofthe_week
0,2021-05-31 00:00:00,2021-07-15,316,2021-07,July,3,3
1,2021-06-30 07:00:00,2022-01-12,495,2022-01,January,2,2
2,2021-07-30 13:00:00,2021-09-14,312,2021-09,September,1,1
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,February,5,5
4,2021-09-28 07:00:00,2021-12-30,349,2021-12,December,3,3


In [17]:
df_sales.drop('sold_day_name',axis= 1, inplace= True)

In [18]:
df_sales["sold_day_name"] = df_sales["sold_date"].dt.day_name()

In [19]:
df_sales

Unnamed: 0,purchase_date,sold_date,qty,sold_month,sold_month_name,sold_day_ofthe_week,sold_day_name
0,2021-05-31 00:00:00,2021-07-15,316,2021-07,July,3,Thursday
1,2021-06-30 07:00:00,2022-01-12,495,2022-01,January,2,Wednesday
2,2021-07-30 13:00:00,2021-09-14,312,2021-09,September,1,Tuesday
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,February,5,Saturday
4,2021-09-28 07:00:00,2021-12-30,349,2021-12,December,3,Thursday


## Exercise 7: isocalendar

- The isocalendar can be used for getting the 
    - ISO year, 
    - week number, and 
    - weekday from a date in a single step. 
- It returns a DataFrame that contains these pieces of information in separate columns.

In [22]:
df_iso =  df_sales["sold_date"].dt.isocalendar()

df_iso

Unnamed: 0,year,week,day
0,2021,28,4
1,2022,2,3
2,2021,37,2
3,2022,5,6
4,2021,52,4


* The first value of the sold date column is "2021-07-15", which is in the 
    - 28th week of 2021 and is 
    - the 4th day of the week (Thursday). 
        - ISO weekday numbers run from 1 to 7: 
            - Monday = 1
            - Tuesday = 2
            - Wednesday = 3
            - Thursday = 4
            - Friday = 5
            - Saturday = 6
            - Sunday = 7

* The dayofweek attribute counts from 0 instead (Monday = 0), so it returns 3 for the same Thursday.
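
The two numbering schemes can be compared side by side. This is a small sketch using two of the sold dates from the table above.

```python
import pandas as pd

# A Thursday and a Saturday taken from the sold_date column.
sold = pd.Series(pd.to_datetime(["2021-07-15", "2022-02-05"]))

iso = sold.dt.isocalendar()     # DataFrame with year, week, and day columns
zero_based = sold.dt.dayofweek  # Monday = 0 ... Sunday = 6

print(iso["day"].tolist())   # [4, 6]  (ISO: Monday = 1 ... Sunday = 7)
print(zero_based.tolist())   # [3, 5]
```

The ISO day is always one more than `dayofweek`.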

In [21]:
df_sales["sold_date"].dt.isocalendar().week

0    28
1     2
2    37
3     5
4    52
Name: week, dtype: UInt32

## Exercise 8: Hour

In [23]:
df_sales["purchase_date_hour"] = df_sales["purchase_date"].dt.hour

In [24]:
df_sales

Unnamed: 0,purchase_date,sold_date,qty,sold_month,sold_month_name,sold_day_ofthe_week,sold_day_name,purchase_date_hour
0,2021-05-31 00:00:00,2021-07-15,316,2021-07,July,3,Thursday,0
1,2021-06-30 07:00:00,2022-01-12,495,2022-01,January,2,Wednesday,7
2,2021-07-30 13:00:00,2021-09-14,312,2021-09,September,1,Tuesday,13
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,February,5,Saturday,13
4,2021-09-28 07:00:00,2021-12-30,349,2021-12,December,3,Thursday,7


## Exercise 9: Minutes

In [25]:
df_sales["purchase_date"].dt.minute

0    0
1    0
2    0
3    0
4    0
Name: purchase_date, dtype: int64

## Exercise 10:  is it Month end?

In [26]:
df_sales["purchase_date"].dt.is_month_end

0     True
1     True
2    False
3     True
4    False
Name: purchase_date, dtype: bool

## Exercise 11: is it Weekend?

In [27]:
df_sales["sold_day_name"]

0     Thursday
1    Wednesday
2      Tuesday
3     Saturday
4     Thursday
Name: sold_day_name, dtype: object

In [28]:
df_sales["sold_date"].dt.day_name().isin(["Saturday", "Sunday"])

0    False
1    False
2    False
3     True
4    False
Name: sold_date, dtype: bool

In [30]:
df_sales[df_sales["sold_date"].dt.day_name().isin(["Saturday", "Sunday"])]

Unnamed: 0,purchase_date,sold_date,qty,sold_month,sold_month_name,sold_day_ofthe_week,sold_day_name,purchase_date_hour
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,February,5,Saturday,13
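
An alternative to matching day names is to compare the numeric weekday, which avoids relying on the spelling of the names. This is a sketch on a hypothetical Series (`sold`), not the tutorial's `df_sales` frame.

```python
import pandas as pd

# Hypothetical sample: a Thursday, a Saturday and a Sunday.
sold = pd.Series(pd.to_datetime(["2021-07-15", "2022-02-05", "2022-02-06"]))

# dayofweek counts Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
is_weekend = sold.dt.dayofweek >= 5

print(is_weekend.tolist())  # [False, True, True]
```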


## Exercise 12: Time difference between dates

In [31]:
df_sales["sold_date"] - df_sales["purchase_date"]

0    45 days 00:00:00
1   195 days 17:00:00
2    45 days 11:00:00
3   157 days 11:00:00
4    92 days 17:00:00
dtype: timedelta64[ns]

In [32]:
df_sales["time_difference"] = df_sales["sold_date"] - df_sales["purchase_date"]

df_sales.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month,sold_month_name,sold_day_ofthe_week,sold_day_name,purchase_date_hour,time_difference
0,2021-05-31 00:00:00,2021-07-15,316,2021-07,July,3,Thursday,0,45 days 00:00:00
1,2021-06-30 07:00:00,2022-01-12,495,2022-01,January,2,Wednesday,7,195 days 17:00:00
2,2021-07-30 13:00:00,2021-09-14,312,2021-09,September,1,Tuesday,13,45 days 11:00:00
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,February,5,Saturday,13,157 days 11:00:00
4,2021-09-28 07:00:00,2021-12-30,349,2021-12,December,3,Thursday,7,92 days 17:00:00


In [33]:
df_sales.dtypes

purchase_date           datetime64[ns]
sold_date               datetime64[ns]
qty                              int64
sold_month                   period[M]
sold_month_name                 object
sold_day_ofthe_week              int64
sold_day_name                   object
purchase_date_hour               int64
time_difference        timedelta64[ns]
dtype: object

In [36]:
# Time difference in days

df_sales["time_difference"].dt.days

0     45
1    195
2     45
3    157
4     92
Name: time_difference, dtype: int64

## Exercise 13: Time difference with Pandas Timedelta

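* Pandas Timedelta does not support month or year units, because their lengths vary.
* Dividing a timedelta column by `pd.Timedelta(days=1)` or `pd.Timedelta(hours=1)` converts it to a fractional number of days or hours.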

In [37]:
df_sales["time_difference"] / pd.Timedelta(days=1)

0     45.000000
1    195.708333
2     45.458333
3    157.458333
4     92.708333
Name: time_difference, dtype: float64

In [38]:
df_sales["time_difference"] / pd.Timedelta(hours=1)

0    1080.0
1    4697.0
2    1091.0
3    3779.0
4    2225.0
Name: time_difference, dtype: float64
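
The difference between `.dt.days` and division by a `Timedelta` is easiest to see on a single value. The sketch below rebuilds the second row's difference from hypothetical one-element Series (`purchase`, `sold`).

```python
import pandas as pd

# Hypothetical one-row sample matching the second row of df_sales.
purchase = pd.Series(pd.to_datetime(["2021-06-30 07:00:00"]))
sold = pd.Series(pd.to_datetime(["2022-01-12"]))

diff = sold - purchase                          # 195 days 17:00:00

whole_days = diff.dt.days                       # drops the 17 hours
fractional_days = diff / pd.Timedelta(days=1)   # keeps them as a fraction
total_hours = diff.dt.total_seconds() / 3600    # same duration in hours

print(whole_days.iloc[0])                 # 195
print(round(fractional_days.iloc[0], 6))  # 195.708333
print(total_hours.iloc[0])                # 4697.0
```

`.dt.days` returns only the whole-day component, so use division (or `total_seconds()`) when the partial day matters.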

## Exercise 14: Time difference with NumPy timedelta

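* Pandas Timedelta does not support month and year units, but NumPy's `timedelta64` does.
* NumPy treats them as average lengths (a year as 365.2425 days and a month as one twelfth of that, 30.436875 days), so the results below are approximations rather than calendar months or years.
* Newer pandas releases may reject the 'M' and 'Y' units in this division.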

In [40]:
import numpy as np

df_sales["time_difference"] / np.timedelta64(1, 'M')

0    1.478470
1    6.429975
2    1.493528
3    5.173275
4    3.045922
Name: time_difference, dtype: float64

In [41]:
df_sales["time_difference"] / np.timedelta64(1, 'Y')

0    0.123206
1    0.535831
2    0.124461
3    0.431106
4    0.253827
Name: time_difference, dtype: float64

In [42]:
df_sales["time_difference"] / np.timedelta64(1, 'D')

0     45.000000
1    195.708333
2     45.458333
3    157.458333
4     92.708333
Name: time_difference, dtype: float64
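
The month and year figures above can be reproduced without the NumPy 'M' and 'Y' units by dividing by explicit average lengths. This is a sketch under the assumption that an average year is 365.2425 days, which matches the outputs shown; the constants and the sample Series `diff` are illustrative.

```python
import pandas as pd

# Assumed average lengths (they reproduce the NumPy-based outputs above).
DAYS_PER_YEAR = 365.2425
DAYS_PER_MONTH = DAYS_PER_YEAR / 12   # 30.436875

# Hypothetical sample: the first two time differences from df_sales.
diff = pd.Series(pd.to_timedelta(["45 days", "195 days 17:00:00"]))

months = diff / pd.Timedelta(days=DAYS_PER_MONTH)
years = diff / pd.Timedelta(days=DAYS_PER_YEAR)

print(months.round(6).tolist())  # [1.47847, 6.429975]
print(years.round(6).tolist())   # [0.123206, 0.535831]
```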

## Exercise 15: Add interval with DateOffset

In [46]:
df_sales["sold_date_limit"] = df_sales["sold_date"] + pd.DateOffset(days=2)

df_sales

Unnamed: 0,purchase_date,sold_date,qty,sold_month,sold_month_name,sold_day_ofthe_week,sold_day_name,purchase_date_hour,time_difference,sold_date_limit
0,2021-05-31 00:00:00,2021-07-15,316,2021-07,July,3,Thursday,0,45 days 00:00:00,2021-07-17
1,2021-06-30 07:00:00,2022-01-12,495,2022-01,January,2,Wednesday,7,195 days 17:00:00,2022-01-14
2,2021-07-30 13:00:00,2021-09-14,312,2021-09,September,1,Tuesday,13,45 days 11:00:00,2021-09-16
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,February,5,Saturday,13,157 days 11:00:00,2022-02-07
4,2021-09-28 07:00:00,2021-12-30,349,2021-12,December,3,Thursday,7,92 days 17:00:00,2022-01-01


In [47]:
df_sales["sold_date_timesend"] = df_sales["sold_date"] + pd.DateOffset(hours=4)

In [48]:
df_sales

Unnamed: 0,purchase_date,sold_date,qty,sold_month,sold_month_name,sold_day_ofthe_week,sold_day_name,purchase_date_hour,time_difference,sold_date_limit,sold_date_timesend
0,2021-05-31 00:00:00,2021-07-15,316,2021-07,July,3,Thursday,0,45 days 00:00:00,2021-07-17,2021-07-15 04:00:00
1,2021-06-30 07:00:00,2022-01-12,495,2022-01,January,2,Wednesday,7,195 days 17:00:00,2022-01-14,2022-01-12 04:00:00
2,2021-07-30 13:00:00,2021-09-14,312,2021-09,September,1,Tuesday,13,45 days 11:00:00,2021-09-16,2021-09-14 04:00:00
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,February,5,Saturday,13,157 days 11:00:00,2022-02-07,2022-02-05 04:00:00
4,2021-09-28 07:00:00,2021-12-30,349,2021-12,December,3,Thursday,7,92 days 17:00:00,2022-01-01,2021-12-30 04:00:00


In [49]:
df_sales["purchase_date"] = df_sales["purchase_date"] + pd.DateOffset(minutes=30)

df_sales

Unnamed: 0,purchase_date,sold_date,qty,sold_month,sold_month_name,sold_day_ofthe_week,sold_day_name,purchase_date_hour,time_difference,sold_date_limit,sold_date_timesend
0,2021-05-31 00:30:00,2021-07-15,316,2021-07,July,3,Thursday,0,45 days 00:00:00,2021-07-17,2021-07-15 04:00:00
1,2021-06-30 07:30:00,2022-01-12,495,2022-01,January,2,Wednesday,7,195 days 17:00:00,2022-01-14,2022-01-12 04:00:00
2,2021-07-30 13:30:00,2021-09-14,312,2021-09,September,1,Tuesday,13,45 days 11:00:00,2021-09-16,2021-09-14 04:00:00
3,2021-08-31 13:30:00,2022-02-05,416,2022-02,February,5,Saturday,13,157 days 11:00:00,2022-02-07,2022-02-05 04:00:00
4,2021-09-28 07:30:00,2021-12-30,349,2021-12,December,3,Thursday,7,92 days 17:00:00,2022-01-01,2021-12-30 04:00:00


## Exercise 16: Subtraction with DateOffset

* To subtract an interval instead of adding it, we can either 
    - use the subtraction operator or 
    - pass a negative value to DateOffset.


DateOffset accepts the following units as keyword arguments:

* years
* months
* weeks
* days
* hours
* minutes
* seconds
* microseconds
* nanoseconds

In [50]:
df_sales["purchase_date"] - pd.DateOffset(hours=5)

0   2021-05-30 19:30:00
1   2021-06-30 02:30:00
2   2021-07-30 08:30:00
3   2021-08-31 08:30:00
4   2021-09-28 02:30:00
Name: purchase_date, dtype: datetime64[ns]

In [51]:
df_sales["purchase_date"] + pd.DateOffset(hours=-5)

0   2021-05-30 19:30:00
1   2021-06-30 02:30:00
2   2021-07-30 08:30:00
3   2021-08-31 08:30:00
4   2021-09-28 02:30:00
Name: purchase_date, dtype: datetime64[ns]
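
Because DateOffset supports months and years, it follows the calendar rather than a fixed duration. The sketch below uses hypothetical variable names and a single end-of-month Timestamp to show the difference from a fixed 30-day Timedelta.

```python
import pandas as pd

end_of_january = pd.Timestamp("2021-01-31")

# Calendar-aware: lands on the last valid day of February.
plus_one_month = end_of_january + pd.DateOffset(months=1)

# Fixed duration: always exactly 30 days later.
plus_thirty_days = end_of_january + pd.Timedelta(days=30)

# Subtracting a month does not necessarily return to the starting date.
back_again = plus_one_month - pd.DateOffset(months=1)

print(plus_one_month.date())    # 2021-02-28
print(plus_thirty_days.date())  # 2021-03-02
print(back_again.date())        # 2021-01-28
```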

## Exercise 17: Add interval with Timedelta

The `unit` argument of `pd.Timedelta` accepts the following aliases:
* W and w represent a week
* D and d represent a day
* H and h represent an hour
* T and t represent a minute
* S and s represent a second
* L and l represent a millisecond
* U and u represent a microsecond
* N and n represent a nanosecond

In [53]:
df_sales["sold_date"] + pd.Timedelta(value=10, unit="D")

0   2021-07-25
1   2022-01-22
2   2021-09-24
3   2022-02-15
4   2022-01-09
Name: sold_date, dtype: datetime64[ns]

In [54]:
df_sales["purchase_date"] + pd.Timedelta(value=5, unit="h")

0   2021-05-31 05:30:00
1   2021-06-30 12:30:00
2   2021-07-30 18:30:00
3   2021-08-31 18:30:00
4   2021-09-28 12:30:00
Name: purchase_date, dtype: datetime64[ns]
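
A Timedelta can also be built from keyword arguments or a string, which sidesteps the single-letter aliases. Some newer pandas releases are understood to warn about several of the uppercase aliases, so the lowercase "h" is used here. The sketch uses a hypothetical one-element Series (`stamps`).

```python
import pandas as pd

# Three equivalent ways to express five hours.
five_hours_kw = pd.Timedelta(hours=5)
five_hours_unit = pd.Timedelta(5, unit="h")
five_hours_str = pd.Timedelta("5 hours")

# Hypothetical sample timestamp.
stamps = pd.Series(pd.to_datetime(["2021-05-31 00:30:00"]))
shifted = stamps + five_hours_kw

print(five_hours_kw == five_hours_unit == five_hours_str)  # True
print(shifted.iloc[0])  # 2021-05-31 05:30:00
```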

## Exercise 18: The `to_datetime()` function

* The to_datetime function can be used for converting a 
    - Scalar, 
    - Pandas Series, or 
    - Pandas DataFrame to a datetime object.

In [55]:
sample_df = pd.DataFrame({"year": [2021, 2022],
                          "month": [12, 4],
                          "day": [28, 19]})
sample_df

Unnamed: 0,year,month,day
0,2021,12,28
1,2022,4,19


In [56]:
# Convert Dataframe

sample_df["date"] = pd.to_datetime(sample_df)

In [57]:
sample_df

Unnamed: 0,year,month,day,date
0,2021,12,28,2021-12-28
1,2022,4,19,2022-04-19


In [58]:
# Convert string

pd.to_datetime("2022-11-28")

Timestamp('2022-11-28 00:00:00')

In [59]:
new_date = pd.to_datetime("2022-11-28")

sample_df["new_date"] = new_date

sample_df

Unnamed: 0,year,month,day,date,new_date
0,2021,12,28,2021-12-28,2022-11-28
1,2022,4,19,2022-04-19,2022-11-28
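
When the strings are not in ISO format, `to_datetime` can be given an explicit `format`, and `errors="coerce"` turns unparseable values into `NaT` instead of raising. The sketch below assumes a hypothetical Series (`raw`) of day/month/year strings.

```python
import pandas as pd

# Hypothetical raw input: day/month/year strings plus one bad value.
raw = pd.Series(["28/11/2022", "19/04/2022", "not a date"])

# format tells pandas how to read the strings; coerce maps failures to NaT.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(parsed.iloc[0])          # 2022-11-28 00:00:00
print(parsed.isna().tolist())  # [False, False, True]
```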


## Exercise 19: The `date_range()` function

In [60]:
pd.date_range(start="2021-12-28", periods=5, freq="D")

DatetimeIndex(['2021-12-28', '2021-12-29', '2021-12-30', '2021-12-31',
               '2022-01-01'],
              dtype='datetime64[ns]', freq='D')

In [61]:
df_dateR = pd.DataFrame({"Date": pd.date_range(start="2022-11-28", periods=5, freq="D"),
                         "Measurement": [1, 10, 25, 7, 12]})

df_dateR

Unnamed: 0,Date,Measurement
0,2022-11-28,1
1,2022-11-29,10
2,2022-11-30,25
3,2022-12-01,7
4,2022-12-02,12


In [62]:
pd.date_range(start="2022-11-28", end="2022-12-10", freq="D")

DatetimeIndex(['2022-11-28', '2022-11-29', '2022-11-30', '2022-12-01',
               '2022-12-02', '2022-12-03', '2022-12-04', '2022-12-05',
               '2022-12-06', '2022-12-07', '2022-12-08', '2022-12-09',
               '2022-12-10'],
              dtype='datetime64[ns]', freq='D')

In [63]:
pd.date_range(start="2022-11-28", end="2022-12-10", freq="2D")

DatetimeIndex(['2022-11-28', '2022-11-30', '2022-12-02', '2022-12-04',
               '2022-12-06', '2022-12-08', '2022-12-10'],
              dtype='datetime64[ns]', freq='2D')

# Filter Pandas DataFrame By Time

### How to filter by time across the different axes

This section examines how to filter a Pandas DataFrame by time using the `.between_time()` and `.at_time()` methods and the `.loc` indexer.

In [6]:
import pandas as pd
import numpy as np

In [4]:
ts = pd.date_range('2022-03-04', periods=10, freq='12h20min')

In [7]:
df_row = pd.DataFrame({'ts': ts, 'qty': [np.random.randint(10, 100) for i in range(10)]})
df_row

Unnamed: 0,ts,qty
0,2022-03-04 00:00:00,20
1,2022-03-04 12:20:00,18
2,2022-03-05 00:40:00,80
3,2022-03-05 13:00:00,51
4,2022-03-06 01:20:00,71
5,2022-03-06 13:40:00,43
6,2022-03-07 02:00:00,98
7,2022-03-07 14:20:00,20
8,2022-03-08 02:40:00,88
9,2022-03-08 15:00:00,47


In [8]:
df_col = pd.DataFrame(np.random.randint(0,100,size=(5, 10)), columns = ts)
df_col

Unnamed: 0,2022-03-04 00:00:00,2022-03-04 12:20:00,2022-03-05 00:40:00,2022-03-05 13:00:00,2022-03-06 01:20:00,2022-03-06 13:40:00,2022-03-07 02:00:00,2022-03-07 14:20:00,2022-03-08 02:40:00,2022-03-08 15:00:00
0,34,17,21,70,6,61,23,49,20,23
1,16,7,73,47,47,4,16,17,60,60
2,1,20,88,39,5,97,3,4,35,90
3,49,11,19,47,69,85,92,46,76,43
4,56,56,58,64,94,38,96,39,34,90


## Exercise 20: Between Time

`.between_time()` is a Pandas DataFrame method that filters for rows in a Pandas DataFrame between a start and end time.
- `.between_time()` filters on the time of day only, regardless of the date.

Parameters are:
- start_time: datetime.time or str
- end_time : datetime.time or str
    - The start_time and end_time arguments accept both 24-hour and 12-hour time formats. 
    - start_time and end_time are inclusive by default. 
        - We can set the left and right bound as open or closed by defining the inclusive parameter.
- inclusive : {“both”, “neither”, “left”, “right”}, default “both”. 
    - Include boundaries; whether to set each bound as closed or open.
- axis: {0 or ‘index’, 1 or ‘columns’}, default 0.

This method filters on a DatetimeIndex, so we must first set the ts column as the index using the `set_index()` method. 

### By Row

In [9]:
# 24 hour format
df_row.set_index('ts').between_time('14:20', '16:00').reset_index()

Unnamed: 0,ts,qty
0,2022-03-07 14:20:00,20
1,2022-03-08 15:00:00,47


In [10]:
# 12 hour format
df_row.set_index('ts').between_time('2:20PM', '4:00PM').reset_index()

Unnamed: 0,ts,qty
0,2022-03-07 14:20:00,20
1,2022-03-08 15:00:00,47


In [11]:
df_row.set_index('ts').between_time('2:20PM', '4:00PM', inclusive = 'right').reset_index()

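# With inclusive='right' the row at 14:20:00 is dropped, because the left bound is open.
# The inclusive parameter needs pandas 1.4 or later; older versions raise the TypeError
# below and use include_start=False instead.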

TypeError: between_time() got an unexpected keyword argument 'inclusive'
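
One way to make the open-left-bound filter work across pandas versions is to fall back to the older keyword when `inclusive` is rejected. This is a sketch, not the only option; the frame `df` is a hypothetical stand-in for `df_row` with deterministic `qty` values.

```python
import pandas as pd

# Hypothetical frame with the same timestamps as df_row, already indexed by time.
ts = pd.date_range("2022-03-04", periods=10, freq="12h20min")
df = pd.DataFrame({"qty": list(range(10))}, index=ts)

try:
    # pandas 1.4 and later
    right_closed = df.between_time("14:20", "16:00", inclusive="right")
except TypeError:
    # older pandas: exclude the start bound with the legacy keyword
    right_closed = df.between_time("14:20", "16:00", include_start=False)

print(right_closed.index.tolist())  # [Timestamp('2022-03-08 15:00:00')]
```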

To get all the times that are not between the start_time and end_time, set the start_time to be later than the end_time. Both bounds are still inclusive, so the row at 14:20 is returned while the row at 15:00 is dropped.

In [12]:
# swap the start_time and end_time
df_row.set_index('ts').between_time('16:00', '14:20').reset_index()

Unnamed: 0,ts,qty
0,2022-03-04 00:00:00,20
1,2022-03-04 12:20:00,18
2,2022-03-05 00:40:00,80
3,2022-03-05 13:00:00,51
4,2022-03-06 01:20:00,71
5,2022-03-06 13:40:00,43
6,2022-03-07 02:00:00,98
7,2022-03-07 14:20:00,20
8,2022-03-08 02:40:00,88


### By Column

`.between_time()` also allows us to filter a DataFrame by time across columns. We can do so by setting the axis parameter to 1.

In [13]:
df_col.between_time('14:20', '16:00', axis = 1)

Unnamed: 0,2022-03-07 14:20:00,2022-03-08 15:00:00
0,49,23
1,17,60
2,4,90
3,46,43
4,39,90


## Exercise 21: At Time

`.at_time()` is a Pandas DataFrame method that selects rows with the exact time instead of a range of time. 

Parameters are:

- time: datetime.time or str
- axis: {0 or ‘index’, 1 or ‘columns’}, default 0

This method filters on a DatetimeIndex, so we must first set the ts column as the index using the `set_index()` method. 

The time parameter accepts both 24-hour and 12-hour time formats. 

Note that `.at_time()` filters on the time of day only, regardless of the date.

### By row:

In [None]:
# 24 hour format
df_row.set_index('ts').at_time('02:40').reset_index()

In [None]:
# 12 hour format
df_row.set_index('ts').at_time('2:40AM').reset_index()

### By Columns:

`.at_time()` also allows us to filter a DataFrame by time across columns instead of rows. We can do so by setting the axis parameter to 1.

In [None]:
df_col.at_time('02:40', axis = 1)
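
Since the cells above are shown without output, here is a small self-contained sketch of what `.at_time()` selects. The frame `df` is a hypothetical stand-in for `df_row` with deterministic `qty` values.

```python
import pandas as pd

# Hypothetical frame with the same timestamps as df_row, indexed by time.
ts = pd.date_range("2022-03-04", periods=10, freq="12h20min")
df = pd.DataFrame({"qty": list(range(10))}, index=ts)

exact_24h = df.at_time("02:40")    # 24-hour format
exact_12h = df.at_time("2:40AM")   # 12-hour format, no space before AM

print(exact_24h.index.tolist())  # [Timestamp('2022-03-08 02:40:00')]
print(exact_24h.equals(exact_12h))  # True
```

Only the row whose time of day is exactly 02:40 is returned; rows at 02:00 or 01:20 are not matched.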

# Filter between date & time

This section shows how to filter between two timestamps, taking both date and time into account, e.g. between 2022-03-04 12:00 and 2022-03-06 15:00. 

Let's define the start and end of the range as `datetime.datetime` objects.

In [14]:
from datetime import datetime
start_datetime = datetime.strptime('2022-03-04 12:00:00', '%Y-%m-%d %H:%M:%S')
end_datetime = datetime.strptime('2022-03-06 15:00:00', '%Y-%m-%d %H:%M:%S')

## Exercise 1: To filter for rows

In [None]:
df_row.loc[(df_row['ts'] >= start_datetime) & (df_row['ts'] <= end_datetime)]

## Exercise 2: To filter for columns

In [None]:
df_col.loc[:, [col for col in df_col.columns if start_datetime <= col <= end_datetime]]
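
The same two filters can be written more compactly with `Series.between` and label slicing on a DatetimeIndex. This is an alternative sketch, not the approach used in the cells above; `df_rows`, `df_cols`, `start` and `end` are hypothetical names with deterministic data.

```python
import pandas as pd

ts = pd.date_range("2022-03-04", periods=10, freq="12h20min")
start = pd.Timestamp("2022-03-04 12:00:00")
end = pd.Timestamp("2022-03-06 15:00:00")

# Rows: timestamps stored in a regular column.
df_rows = pd.DataFrame({"ts": ts, "qty": list(range(10))})
rows_between = df_rows[df_rows["ts"].between(start, end)]  # both ends inclusive

# Rows: timestamps as the index, using an inclusive label slice.
rows_sliced = df_rows.set_index("ts").loc[start:end]

# Columns: timestamps as column labels.
df_cols = pd.DataFrame([list(range(10))], columns=ts)
cols_sliced = df_cols.loc[:, start:end]

print(rows_between["qty"].tolist())  # [1, 2, 3, 4, 5]
print(cols_sliced.shape)             # (1, 5)
```

Label slicing assumes the DatetimeIndex is sorted, which holds for anything produced by `pd.date_range`.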