## Handling Time Series Data

**Question: How do you handle time series data?**  
**Answer:** pandas has extensive support for time series, with tools for working with dates, times, and time-indexed data.


#### Remember
> Valid date strings can be converted to datetime objects using the `to_datetime` function or as part of the read functions.  
> `pandas.Timestamp` values in a datetime column support calculations, logical operations, and convenient date-related properties through the `dt` accessor, such as `year`, `month`, `day`, `day_of_week`, `day_of_year`, and `is_leap_year` (the week number is available through `dt.isocalendar().week`).  
> We can also call datetime methods through the `dt` accessor, such as `day_name()` and `month_name()`.  
> `pandas.Timedelta` represents a duration: the difference between two dates or times. Many timedelta properties can be accessed through `dt`, such as `components`, `days`, and `seconds`.  
> We can also call timedelta methods through the `dt` accessor, such as `total_seconds()`.
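
The points above can be tried out on a tiny frame. This is a minimal sketch: the `orders` frame and its column names are made up for illustration and are not part of the sales dataset used later.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
orders = pd.DataFrame({
    "ordered": ["01-03-2024", "15-03-2024"],
    "shipped": ["04-03-2024", "20-03-2024"],
})

# Convert day-first strings to datetime values with an explicit format.
for col in ["ordered", "shipped"]:
    orders[col] = pd.to_datetime(orders[col], format="%d-%m-%Y")

# Datetime properties and methods through the `dt` accessor.
orders["month"] = orders["ordered"].dt.month
orders["weekday"] = orders["ordered"].dt.day_name()

# Subtracting two datetime columns yields a timedelta column.
orders["wait"] = orders["shipped"] - orders["ordered"]
orders["wait_days"] = orders["wait"].dt.days
orders["wait_seconds"] = orders["wait"].dt.total_seconds()

print(orders)
```

The same `dt` accessor serves both column types: on a datetime column it exposes date parts, on a timedelta column it exposes durations.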

In [31]:
ord('0'),ord('1')

(48, 49)

In [21]:
print(ord('A'),ord('B'),ord('Z'))
print(ord('a'),ord('b'),ord('z'))

65 66 90
97 98 122


In [23]:
max("ramARabcnZz")

'z'

### Reading .csv File - Online Store Sales Data

In [33]:
import pandas as pd
sample=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['9-12-23','10-12-23','11-12-23','12-12-23','13-12-23']})
sample

Unnamed: 0,id,date
0,1,9-12-23
1,2,10-12-23
2,3,11-12-23
3,4,12-12-23
4,5,13-12-23


In [7]:
sample['date'].min()

'10-12-23'
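
The minimum above is `'10-12-23'` rather than `'9-12-23'` because the column still holds strings, and strings are compared character by character using their character codes (the `ord` values shown earlier). A small sketch of the difference, using the same five values:

```python
import pandas as pd

dates = pd.Series(['9-12-23', '10-12-23', '11-12-23', '12-12-23', '13-12-23'])

# As strings, '1' (code 49) sorts before '9' (code 57), so '10-12-23' is the minimum.
string_min = dates.min()

# As datetimes, values compare chronologically.
parsed = pd.to_datetime(dates, format="%d-%m-%y")
date_min = parsed.min()

print(string_min)
print(date_min)
```

This is the main reason to convert date columns before sorting, filtering, or aggregating them.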

In [3]:
sample.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   id      5 non-null      int64 
 1   date    5 non-null      object
dtypes: int64(1), object(1)
memory usage: 212.0+ bytes


In [None]:
pd.to_datetime  # the function used below to convert strings to datetime values

In [93]:
import pandas as pd
sample=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['9-12-23','10-12-23','11-12-23','12-12-23','13-12-23']})
sample

Unnamed: 0,id,date
0,1,9-12-23
1,2,10-12-23
2,3,11-12-23
3,4,12-12-23
4,5,13-12-23


In [95]:
sample['date'] = pd.to_datetime(sample['date'],format="%d-%m-%y")

In [97]:
sample

Unnamed: 0,id,date
0,1,2023-12-09
1,2,2023-12-10
2,3,2023-12-11
3,4,2023-12-12
4,5,2023-12-13


In [57]:
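# Assumes the column still holds the original strings. Without a format, each value is parsed
# on its own, month first where possible: '9-12-23' becomes 12 Sep, but '13-12-23' becomes 13 Dec.
sample['date']=sample['date'].apply(pd.to_datetime)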

In [59]:
sample

Unnamed: 0,id,date
0,1,2023-09-12
1,2,2023-10-12
2,3,2023-11-12
3,4,2023-12-12
4,5,2023-12-13


In [61]:
sample.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   id      5 non-null      int64         
 1   date    5 non-null      datetime64[ns]
dtypes: datetime64[ns](1), int64(1)
memory usage: 212.0 bytes


In [101]:
import pandas as pd
sample=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['9-12-2023','10-12-2023','11-12-2023','12-12-2023','13-12-2023']})
sample

Unnamed: 0,id,date
0,1,9-12-2023
1,2,10-12-2023
2,3,11-12-2023
3,4,12-12-2023
4,5,13-12-2023


In [105]:
sample['date'] = pd.to_datetime(sample['date'],format="%d-%m-%Y")
sample

Unnamed: 0,id,date
0,1,2023-12-09
1,2,2023-12-10
2,3,2023-12-11
3,4,2023-12-12
4,5,2023-12-13


In [120]:
import pandas as pd
sample=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['9-Dec-2023','10-Dec-2023','11-Dec-2023','12-Dec-2023','13-Dec-2023']})
sample

Unnamed: 0,id,date
0,1,9-Dec-2023
1,2,10-Dec-2023
2,3,11-Dec-2023
3,4,12-Dec-2023
4,5,13-Dec-2023


In [124]:
sample['date'] =pd.to_datetime(sample['date'],format='%d-%b-%Y')
sample

Unnamed: 0,id,date
0,1,2023-12-09
1,2,2023-12-10
2,3,2023-12-11
3,4,2023-12-12
4,5,2023-12-13


In [128]:
sample['date'].max()

Timestamp('2023-12-13 00:00:00')

In [130]:
import pandas as pd
sample=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['9-Dec-2023','10-Dec-2023','11-Dec-2023','12-Dec-2023','13-Dec-2023']})
sample

Unnamed: 0,id,date
0,1,9-Dec-2023
1,2,10-Dec-2023
2,3,11-Dec-2023
3,4,12-Dec-2023
4,5,13-Dec-2023


In [134]:
sample['date'] = pd.to_datetime(sample['date'],dayfirst=True)
sample

Unnamed: 0,id,date
0,1,2023-12-09
1,2,2023-12-10
2,3,2023-12-11
3,4,2023-12-12
4,5,2023-12-13


In [138]:
import pandas as pd
sample=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['9-12-12','10-12-13','11-12-14','12-12-15','13-12-16']})
sample

Unnamed: 0,id,date
0,1,9-12-12
1,2,10-12-13
2,3,11-12-14
3,4,12-12-15
4,5,13-12-16


In [140]:
sample['date']=pd.to_datetime(sample['date'],format='%d-%m-%y')
sample

Unnamed: 0,id,date
0,1,2012-12-09
1,2,2013-12-10
2,3,2014-12-11
3,4,2015-12-12
4,5,2016-12-13


In [146]:
import pandas as pd
sample=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['09-12-12','10-12-11','11-12-11','12-12-09','13-12-08']})
sample

Unnamed: 0,id,date
0,1,09-12-12
1,2,10-12-11
2,3,11-12-11
3,4,12-12-09
4,5,13-12-08


In [148]:
sample['date']=pd.to_datetime(sample['date'], format="%y-%d-%m")
sample

Unnamed: 0,id,date
0,1,2009-12-12
1,2,2010-11-12
2,3,2011-11-12
3,4,2012-09-12
4,5,2013-08-12


In [162]:
import pandas as pd
sample=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['09-08-2012','10-12-2011','11-12-2011','12-12-2009','13-12-2008']})
sample

Unnamed: 0,id,date
0,1,09-08-2012
1,2,10-12-2011
2,3,11-12-2011
3,4,12-12-2009
4,5,13-12-2008


In [164]:
sample['date']=pd.to_datetime(sample['date'], dayfirst=True)
sample

Unnamed: 0,id,date
0,1,2012-08-09
1,2,2011-12-10
2,3,2011-12-11
3,4,2009-12-12
4,5,2008-12-13


In [166]:
import pandas as pd
sample=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['09-08-12','10-12-11','11-12-11','12-12-09','13-12-08']})
sample

Unnamed: 0,id,date
0,1,09-08-12
1,2,10-12-11
2,3,11-12-11
3,4,12-12-09
4,5,13-12-08


In [168]:
sample['date']=pd.to_datetime(sample['date'], yearfirst=True)
sample


Unnamed: 0,id,date
0,1,2009-08-12
1,2,2010-12-11
2,3,2011-12-11
3,4,2012-12-09
4,5,2013-12-08


In [170]:
import pandas as pd
sample_3=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['9-12-23','10-12-23','11-12-23','12-12-23','13-12-23']})
sample_3

Unnamed: 0,id,date
0,1,9-12-23
1,2,10-12-23
2,3,11-12-23
3,4,12-12-23
4,5,13-12-23


In [176]:
sample_3['date']=pd.to_datetime(sample_3['date'], format="%d-%m-%y")  
sample_3

Unnamed: 0,id,date
0,1,2023-12-09
1,2,2023-12-10
2,3,2023-12-11
3,4,2023-12-12
4,5,2023-12-13


In [172]:
sample_3['date']=pd.to_datetime(sample_3['date'], format="%d/%m/%Y")  
sample_3

ValueError: time data "9-12-23" doesn't match format "%d/%m/%Y", at position 0. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

In [80]:
sample_3['date']=pd.to_datetime(sample_3['date'], format="%d/%m/%y") # fails: the data uses '-' as separator, not '/'
sample_3

ValueError: time data '9-12-23' does not match format '%d/%m/%y' (match)

In [15]:
sample_3['date']=pd.to_datetime(sample_3['date'], format="%m-%d-%y") # fails: %m is read first, and 13 is not a valid month
sample_3

ValueError: time data '13-12-23' does not match format '%m-%d-%y' (match)

In [178]:
sample_3['date']=pd.to_datetime(sample_3['date'], format="%y-%m-%d") # the column is already datetime here, so the values come back unchanged
sample_3

Unnamed: 0,id,date
0,1,2023-12-09
1,2,2023-12-10
2,3,2023-12-11
3,4,2023-12-12
4,5,2023-12-13


In [180]:
sample_3['date']=pd.to_datetime(sample_3['date'], yearfirst=True) # the column is already datetime here, so the values come back unchanged
sample_3

Unnamed: 0,id,date
0,1,2023-12-09
1,2,2023-12-10
2,3,2023-12-11
3,4,2023-12-12
4,5,2023-12-13


**The format string must describe the dates exactly the way the dataset stores them.**
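
To see why the format has to follow the data, here is a small sketch that parses one ambiguous string with three different formats. The string `"09-12-12"` is taken from the earlier examples; the variable names are only illustrative.

```python
import pandas as pd

raw = "09-12-12"  # ambiguous unless we know how the source wrote it

# The same text gives three different dates, depending on the format we claim it has.
as_day_month_year = pd.to_datetime(raw, format="%d-%m-%y")
as_year_month_day = pd.to_datetime(raw, format="%y-%m-%d")
as_month_day_year = pd.to_datetime(raw, format="%m-%d-%y")

print(as_day_month_year.date())
print(as_year_month_day.date())
print(as_month_day_year.date())
```

None of the three calls raises an error, so a wrong format can go unnoticed unless the result is checked against what the data is known to contain.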

In [29]:
import pandas as pd
sample_4=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['09-12-23','10-12-23','11-12-23','12-12-23','13-12-23']})
sample_4

Unnamed: 0,id,date
0,1,09-12-23
1,2,10-12-23
2,3,11-12-23
3,4,12-12-23
4,5,13-12-23


In [31]:
sample_4['date']=pd.to_datetime(sample_4['date'], format="%d-%m-%y") 
sample_4

Unnamed: 0,id,date
0,1,2023-12-09
1,2,2023-12-10
2,3,2023-12-11
3,4,2023-12-12
4,5,2023-12-13


In [195]:
import pandas as pd
sample_5=pd.DataFrame({'id':[1,2,3],
                      'date':['01-January-23','01-February-23','01-March-23']})
sample_5

Unnamed: 0,id,date
0,1,01-January-23
1,2,01-February-23
2,3,01-March-23


In [197]:
sample_5['date']=pd.to_datetime(sample_5['date'], format="%d-%B-%y") 
sample_5

Unnamed: 0,id,date
0,1,2023-01-01
1,2,2023-02-01
2,3,2023-03-01


In [207]:
import pandas as pd
sample_4=pd.DataFrame({'id':[1,2,3,4,5],
                      'date':['09/12/2023','10/12/2023','11/12/2023','12/12/2023','13/12/2023']})
sample_4

Unnamed: 0,id,date
0,1,09/12/2023
1,2,10/12/2023
2,3,11/12/2023
3,4,12/12/2023
4,5,13/12/2023


In [209]:
sample_4['date']=pd.to_datetime(sample_4['date'], format="%d/%m/%Y") 
sample_4

Unnamed: 0,id,date
0,1,2023-12-09
1,2,2023-12-10
2,3,2023-12-11
3,4,2023-12-12
4,5,2023-12-13


In [100]:
# datetime columns are always displayed in yyyy-mm-dd format; use strftime to render them as strings in another format

In [221]:
sample_4

Unnamed: 0,id,date
0,1,2023-12-09
1,2,2023-12-10
2,3,2023-12-11
3,4,2023-12-12
4,5,2023-12-13


In [229]:
sample_4['date'].dt.strftime("%Y;%b;%d")

0    2023;Dec;09
1    2023;Dec;10
2    2023;Dec;11
3    2023;Dec;12
4    2023;Dec;13
Name: date, dtype: object
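
Note that `strftime` returns text, not dates. A minimal sketch, assuming a made-up `events` frame, that keeps the datetime column for calculations and adds a separate text column only for display:

```python
import pandas as pd

# Hypothetical frame for illustration.
events = pd.DataFrame({"when": pd.to_datetime(["2023-12-09", "2023-12-10"])})

# Store the formatted text in its own column; "when" stays a datetime column.
events["label"] = events["when"].dt.strftime("%d %b %Y")

print(events)
print(events.dtypes)
```

Overwriting the original column with the `strftime` result would turn it back into strings and bring back the string-comparison behaviour seen at the start.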

In [231]:
sample_5

Unnamed: 0,id,date
0,1,2023-01-01
1,2,2023-02-01
2,3,2023-03-01


In [266]:
import pandas as pd
sample_5=pd.DataFrame({'id':[1,2,3,4,5,6,7],
                      'date':['07/10/2024','08/10/2024','09/10/2024','10/10/2024','11/10/2024','12/10/2024','13/10/2024']})
sample_5

Unnamed: 0,id,date
0,1,07/10/2024
1,2,08/10/2024
2,3,09/10/2024
3,4,10/10/2024
4,5,11/10/2024
5,6,12/10/2024
6,7,13/10/2024


In [268]:
sample_5['date']=pd.to_datetime(sample_5['date'],format='%d/%m/%Y')

In [270]:
sample_5

Unnamed: 0,id,date
0,1,2024-10-07
1,2,2024-10-08
2,3,2024-10-09
3,4,2024-10-10
4,5,2024-10-11
5,6,2024-10-12
6,7,2024-10-13


In [272]:
sample_5.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   id      7 non-null      int64         
 1   date    7 non-null      datetime64[ns]
dtypes: datetime64[ns](1), int64(1)
memory usage: 244.0 bytes


In [274]:
sample_5['dayname']= sample_5['date'].dt.day_name()
sample_5

Unnamed: 0,id,date,dayname
0,1,2024-10-07,Monday
1,2,2024-10-08,Tuesday
2,3,2024-10-09,Wednesday
3,4,2024-10-10,Thursday
4,5,2024-10-11,Friday
5,6,2024-10-12,Saturday
6,7,2024-10-13,Sunday


In [57]:
sample_5['day']=sample_5['date'].dt.day_name()

In [249]:
sample_5['month']=sample_5['date'].dt.month_name()

In [276]:
sample_5

Unnamed: 0,id,date,dayname
0,1,2024-10-07,Monday
1,2,2024-10-08,Tuesday
2,3,2024-10-09,Wednesday
3,4,2024-10-10,Thursday
4,5,2024-10-11,Friday
5,6,2024-10-12,Saturday
6,7,2024-10-13,Sunday


In [280]:
sample_5['date'].dt.weekday

0    0
1    1
2    2
3    3
4    4
5    5
6    6
Name: date, dtype: int32

In [290]:
# dt.weekday numbers the days from Monday=0 to Sunday=6
d={
    0: 'Monday', 
    1: 'Tuesday', 
    2: 'Wednesday', 
    3: 'Thursday', 
    4: 'Friday',
    5: 'Saturday', 
    6: 'Sunday'
} 
sample_5['new']= sample_5['date'].dt.weekday.map(d)
sample_5

Unnamed: 0,id,date,dayname,new
0,1,2024-10-07,Monday,Monday
1,2,2024-10-08,Tuesday,Tuesday
2,3,2024-10-09,Wednesday,Wednesday
3,4,2024-10-10,Thursday,Thursday
4,5,2024-10-11,Friday,Friday
5,6,2024-10-12,Saturday,Saturday
6,7,2024-10-13,Sunday,Sunday
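
`dt.weekday` (and its alias `dt.dayofweek`) numbers the days from Monday=0 to Sunday=6, so any hand-written mapping has to follow that numbering. A short sketch on the same week of dates, which also derives a weekend flag (the `days` frame and the `is_weekend` column are illustrative names):

```python
import pandas as pd

# The same week as above: 2024-10-07 is a Monday.
days = pd.DataFrame({"date": pd.date_range("2024-10-07", periods=7, freq="D")})

days["weekday"] = days["date"].dt.weekday      # Monday=0 ... Sunday=6
days["day_name"] = days["date"].dt.day_name()  # no manual mapping needed
days["is_weekend"] = days["weekday"] >= 5      # Saturday=5, Sunday=6

print(days)
```

For plain day names, `dt.day_name()` avoids the mapping altogether; the numeric value is mostly useful for comparisons like the weekend check.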


In [294]:
sample_5['date'].dt.dayofweek

0    0
1    1
2    2
3    3
4    4
5    5
6    6
Name: date, dtype: int32

In [296]:
sample_5['date']

0   2024-10-07
1   2024-10-08
2   2024-10-09
3   2024-10-10
4   2024-10-11
5   2024-10-12
6   2024-10-13
Name: date, dtype: datetime64[ns]

In [300]:
sample_5['date'].dt.is_leap_year

0    True
1    True
2    True
3    True
4    True
5    True
6    True
Name: date, dtype: bool

In [58]:
sample_5

Unnamed: 0,id,date,day
0,1,2023-01-01,Sunday
1,2,2023-02-01,Wednesday
2,3,2023-03-01,Wednesday


| Directive | Meaning | Example |
|---|---|---|
| `%a` | Abbreviated weekday name. | Sun, Mon, ... |
| `%A` | Full weekday name. | Sunday, Monday, ... |
| `%w` | Weekday as a decimal number. | 0, 1, ..., 6 |
| `%d` | Day of the month as a zero-padded decimal. | 01, 02, ..., 31 |
| `%b` | Abbreviated month name. | Jan, Feb, ..., Dec |
| `%B` | Full month name. | January, February, ... |
| `%m` | Month as a zero-padded decimal number. | 01, 02, ..., 12 |
| `%y` | Year without century as a zero-padded decimal number. | 00, 01, ..., 99 |
| `%Y` | Year with century as a decimal number. | 2013, 2019 etc. |
| `%H` | Hour (24-hour clock) as a zero-padded decimal number. | 00, 01, ..., 23 |
| `%-H` | Hour (24-hour clock) as a decimal number. | 0, 1, ..., 23 |
| `%I` | Hour (12-hour clock) as a zero-padded decimal number. | 01, 02, ..., 12 |
| `%-I` | Hour (12-hour clock) as a decimal number. | 1, 2, ... 12 |
| `%p` | Locale’s AM or PM. | AM, PM |
| `%M` | Minute as a zero-padded decimal number. | 00, 01, ..., 59 |
| `%-M` | Minute as a decimal number. | 0, 1, ..., 59 |
| `%S` | Second as a zero-padded decimal number. | 00, 01, ..., 59 |
| `%-S` | Second as a decimal number. | 0, 1, ..., 59 |
| `%f` | Microsecond as a decimal number, zero-padded on the left. | 000000 - 999999 |
| `%z` | UTC offset in the form +HHMM or -HHMM. | |
| `%Z` | Time zone name. | |
| `%j` | Day of the year as a zero-padded decimal number. | 001, 002, ..., 366 |
| `%-j` | Day of the year as a decimal number. | 1, 2, ..., 366 |
| `%U` | Week number of the year (Sunday as the first day of the week). All days in a new year preceding the first Sunday are considered to be in week 0. | 00, 01, ..., 53 |
| `%W` | Week number of the year (Monday as the first day of the week). All days in a new year preceding the first Monday are considered to be in week 0. | 00, 01, ..., 53 |
| `%c` | Locale’s appropriate date and time representation. | Mon Sep 30 07:06:05 2013 |
| `%x` | Locale’s appropriate date representation. | 09/30/13 |
| `%X` | Locale’s appropriate time representation. | 07:06:05 |
| `%%` | A literal '%' character. | % |

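
A short sketch of the directives in use, parsing a timestamp with `strptime` and writing it back out with `strftime`. The input string is made up, based on the example date in the table. The `%-H` style variants are left out here because they depend on the platform's C library and are not available everywhere (for example on Windows).

```python
from datetime import datetime

# Parse: day, abbreviated month, 4-digit year, 12-hour clock with AM/PM.
stamp = datetime.strptime("30-Sep-2013 07:06:05 PM", "%d-%b-%Y %I:%M:%S %p")

# Format the same moment in different ways.
long_date = stamp.strftime("%A, %d %B %Y")
clock_24h = stamp.strftime("%H:%M")
day_of_year = stamp.strftime("%j")

print(stamp)
print(long_date)
print(clock_24h)
print(day_of_year)
```

The same directives work in `pd.to_datetime(..., format=...)` and in `Series.dt.strftime(...)`.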


In [320]:
from datetime import datetime
datetime.today()

datetime.datetime(2024, 10, 16, 10, 35, 25, 126799)

In [322]:
from datetime import datetime
datetime.today().date()

datetime.date(2024, 10, 16)

In [330]:
datetime.today().day

16

In [326]:
datetime.today().month

10

In [328]:
datetime.today().year

2024

In [332]:
df= pd.DataFrame({'name':['x1','x2','x3'],
                 'DOB':['1-10-2000','1-10-1999','1-10-1997']})

In [334]:
df

Unnamed: 0,name,DOB
0,x1,1-10-2000
1,x2,1-10-1999
2,x3,1-10-1997


In [362]:
df

Unnamed: 0,name,DOB,age
0,x1,2000-10-01,24
1,x2,1999-10-01,25
2,x3,1997-10-01,27


In [364]:

df['year'] = df['DOB'].dt.year

In [366]:
df

Unnamed: 0,name,DOB,age,year
0,x1,2000-10-01,24,2000
1,x2,1999-10-01,25,1999
2,x3,1997-10-01,27,1997


In [370]:
# way-1: filter the people who were born in 1999
df[df['year']==1999]

Unnamed: 0,name,DOB,age,year
1,x2,1999-10-01,25,1999


In [400]:
# way-2: filter the people who were born in 1999 (partial string indexing on a datetime index)
df= df.set_index(df['DOB'])
df.loc['1999']

Unnamed: 0_level_0,name,DOB,age,year
DOB,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1999-10-01,x2,1999-10-01,25,1999
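
Way 2 relies on partial string indexing: when the index holds datetimes, `.loc` accepts a year (or year-month) string and returns every row inside that period. A minimal sketch with the same three birth dates; the frame is rebuilt here so the block runs on its own, and the index is sorted because range slices need a sorted datetime index.

```python
import pandas as pd

people = pd.DataFrame(
    {"name": ["x1", "x2", "x3"]},
    index=pd.to_datetime(["2000-10-01", "1999-10-01", "1997-10-01"]),
).sort_index()

# All rows whose date falls in 1999.
born_1999 = people.loc["1999"]

# A range of periods; both ends are inclusive.
born_1999_to_2000 = people.loc["1999":"2000"]

print(born_1999)
print(born_1999_to_2000)
```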


In [402]:
df

Unnamed: 0_level_0,name,DOB,age,year
DOB,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2000-10-01,x1,2000-10-01,24,2000
1999-10-01,x2,1999-10-01,25,1999
1997-10-01,x3,1997-10-01,27,1997


In [410]:
df.reset_index(drop=True,inplace=True)
df

Unnamed: 0,name,DOB,age,year
0,x1,2000-10-01,24,2000
1,x2,1999-10-01,25,1999
2,x3,1997-10-01,27,1997


In [342]:
df['DOB']= pd.to_datetime(df['DOB'],format="%d-%m-%Y")

In [344]:
datetime.today().year

2024

In [346]:
df['DOB'].dt.year

0    2000
1    1999
2    1997
Name: DOB, dtype: int32

In [352]:
df['age']= datetime.today().year-df['DOB'].dt.year
df

Unnamed: 0,name,DOB,age
0,x1,2000-10-01,24
1,x2,1999-10-01,25
2,x3,1997-10-01,27
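
Subtracting the years gives the age only approximately, because it ignores whether this year's birthday has already passed. A sketch of one way to account for that, using a fixed reference date (an assumption, chosen so the result is reproducible) instead of `datetime.today()`:

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["x1", "x2", "x3"],
    "DOB": pd.to_datetime(["1-10-2000", "1-10-1999", "1-10-1997"], format="%d-%m-%Y"),
})

# Fixed reference date (assumption) one day before the 1 October birthdays.
today = pd.Timestamp("2024-09-30")

# True where the birthday has already happened in the reference year.
had_birthday = (
    (people["DOB"].dt.month < today.month)
    | ((people["DOB"].dt.month == today.month) & (people["DOB"].dt.day <= today.day))
)

# Year difference, minus one if the birthday is still to come.
people["age"] = today.year - people["DOB"].dt.year - (~had_birthday).astype(int)

print(people)
```

With the reference date set to 30 September, every age is one less than the plain year difference.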


In [356]:
df.loc[df['age']==df['age'].max(),'name']

2    x3
Name: name, dtype: object

In [358]:
import numpy as np
import pandas as pd

In [360]:
df = pd.read_csv(r'C:\Users\LENOVO\Desktop\DA_NOTES\Machine_Learning_and_Deep_Learning-master\Module 2 - Python for Data Analysis\04. Pandas Reporting - Going Beyond Basics\data\online_store_sales.csv')

df.head(5)

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\LENOVO\\Desktop\\DA_NOTES\\Machine_Learning_and_Deep_Learning-master\\Module 2 - Python for Data Analysis\\04. Pandas Reporting - Going Beyond Basics\\data\\online_store_sales.csv'

In [74]:
df['Sales'].max()

22638.48

In [79]:
df.loc[df['Sales']==df['Sales'].max()]

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,State,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales
2697,2698,CA-2015-145317,18/03/2015,23/03/2015,Standard Class,SM-20320,Sean Miller,Home Office,United States,Jacksonville,Florida,32216.0,South,TEC-MA-10002412,Technology,Machines,Cisco TelePresence System EX90 Videoconferenci...,22638.48


In [73]:
df['Country'].unique()

array(['United States'], dtype=object)

In [66]:
df['Segment'].unique()

array(['Consumer', 'Corporate', 'Home Office'], dtype=object)

In [71]:
df.shape[0]

9800

In [68]:
df.count()

Row ID           9800
Order ID         9800
Order Date       9800
Ship Date        9800
Ship Mode        9800
Customer ID      9800
Customer Name    9800
Segment          9800
Country          9800
City             9800
State            9800
Postal Code      9789
Region           9800
Product ID       9800
Category         9800
Sub-Category     9800
Product Name     9800
Sales            9800
dtype: int64

In [152]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9800 entries, 0 to 9799
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Row ID         9800 non-null   int64  
 1   Order ID       9800 non-null   object 
 2   Order Date     9800 non-null   object 
 3   Ship Date      9800 non-null   object 
 4   Ship Mode      9800 non-null   object 
 5   Customer ID    9800 non-null   object 
 6   Customer Name  9800 non-null   object 
 7   Segment        9800 non-null   object 
 8   Country        9800 non-null   object 
 9   City           9800 non-null   object 
 10  State          9800 non-null   object 
 11  Postal Code    9789 non-null   float64
 12  Region         9800 non-null   object 
 13  Product ID     9800 non-null   object 
 14  Category       9800 non-null   object 
 15  Sub-Category   9800 non-null   object 
 16  Product Name   9800 non-null   object 
 17  Sales          9800 non-null   float64
dtypes: float

**What comes to my mind immediately after looking at the dataset?**

> 1. What are the different customer segments?  
> 2. How many sales records do we have in the dataset?  
> 3. Which region recorded maximum sales count?  
> 4. What are the different product categories?  
> 5. What is the minimum order amount and maximum order amount?  
> 6. What is the revenue generated in the year 2017?  
> 7. Which customer contributed to the maximum revenue in 2017 and how much?  
> 8. Which product category is doing best? (revenue and count)  
> 9. Are there more orders placed on weekends?  
> 10. How many days, on average, does it take for the products to get shipped? 

As data analysts, we first need to be able to ask the right questions. Each of these questions can be answered with the help of pandas, and we will learn how to do that later. For now, let's understand how to create new columns derived from existing columns.

In [141]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9800 entries, 0 to 9799
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Row ID         9800 non-null   int64  
 1   Order ID       9800 non-null   object 
 2   Order Date     9800 non-null   object 
 3   Ship Date      9800 non-null   object 
 4   Ship Mode      9800 non-null   object 
 5   Customer ID    9800 non-null   object 
 6   Customer Name  9800 non-null   object 
 7   Segment        9800 non-null   object 
 8   Country        9800 non-null   object 
 9   City           9800 non-null   object 
 10  State          9800 non-null   object 
 11  Postal Code    9789 non-null   float64
 12  Region         9800 non-null   object 
 13  Product ID     9800 non-null   object 
 14  Category       9800 non-null   object 
 15  Sub-Category   9800 non-null   object 
 16  Product Name   9800 non-null   object 
 17  Sales          9800 non-null   float64
dtypes: float

### pd.to_datetime()

In [153]:
df[['Order Date']].apply(pd.to_datetime)

  df[['Order Date']].apply(pd.to_datetime)


Unnamed: 0,Order Date
0,2017-08-11
1,2017-08-11
2,2017-12-06
3,2016-11-10
4,2016-11-10
...,...
9795,2017-05-21
9796,2016-12-01
9797,2016-12-01
9798,2016-12-01


**So many warnings! Let's learn how to handle them. 🥵**  
These warnings are generated for a reason: dates can be written in various formats, e.g. DD/MM/YYYY, YYYY/MM/DD, or MM/DD/YYYY. Without guidance pandas has to guess, and the guess can be wrong: in the output above, the first order date is read month first, as 2017-08-11.  

Here pandas warns you to **specify a format (how dates are stored in the datetime column)** so that you can prevent parsing errors in the future.  

There are two ways to get rid of these warnings:  
**Way 1** Add the parameter `dayfirst=True`   
**Way 2** Add the parameter `format="%d/%m/%Y"`
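
A small sketch of what is at stake, using two day-first strings of the kind stored in the sales file. Both are also valid month-first dates, so the default parse raises no error and simply swaps day and month.

```python
import pandas as pd

raw = pd.Series(["08/11/2017", "12/06/2017"])  # written as DD/MM/YYYY

month_first = pd.to_datetime(raw)                  # default: first field read as the month
day_first = pd.to_datetime(raw, dayfirst=True)     # Way 1
explicit = pd.to_datetime(raw, format="%d/%m/%Y")  # Way 2

print(month_first.tolist())
print(day_first.tolist())
print(explicit.tolist())
```

Way 2 is the stricter choice: with an explicit `format`, a value that does not match raises an error instead of being guessed.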

In [33]:
pd.to_datetime(df['Ship Date'], dayfirst=True)

0      2017-11-11
1      2017-11-11
2      2017-06-16
3      2016-10-18
4      2016-10-18
          ...    
9795   2017-05-28
9796   2016-01-17
9797   2016-01-17
9798   2016-01-17
9799   2016-01-17
Name: Ship Date, Length: 9800, dtype: datetime64[ns]

In [34]:
pd.to_datetime(df['Ship Date'], format="%d/%m/%Y")

0      2017-11-11
1      2017-11-11
2      2017-06-16
3      2016-10-18
4      2016-10-18
          ...    
9795   2017-05-28
9796   2016-01-17
9797   2016-01-17
9798   2016-01-17
9799   2016-01-17
Name: Ship Date, Length: 9800, dtype: datetime64[ns]

In [155]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9800 entries, 0 to 9799
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Row ID         9800 non-null   int64  
 1   Order ID       9800 non-null   object 
 2   Order Date     9800 non-null   object 
 3   Ship Date      9800 non-null   object 
 4   Ship Mode      9800 non-null   object 
 5   Customer ID    9800 non-null   object 
 6   Customer Name  9800 non-null   object 
 7   Segment        9800 non-null   object 
 8   Country        9800 non-null   object 
 9   City           9800 non-null   object 
 10  State          9800 non-null   object 
 11  Postal Code    9789 non-null   float64
 12  Region         9800 non-null   object 
 13  Product ID     9800 non-null   object 
 14  Category       9800 non-null   object 
 15  Sub-Category   9800 non-null   object 
 16  Product Name   9800 non-null   object 
 17  Sales          9800 non-null   float64
dtypes: float

In [35]:
#changing the type of the original columns
df['Ship Date'] = pd.to_datetime(df['Ship Date'], format="%d/%m/%Y")
df['Order Date'] = pd.to_datetime(df['Order Date'], format="%d/%m/%Y")

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9800 entries, 0 to 9799
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Row ID         9800 non-null   int64         
 1   Order ID       9800 non-null   object        
 2   Order Date     9800 non-null   datetime64[ns]
 3   Ship Date      9800 non-null   datetime64[ns]
 4   Ship Mode      9800 non-null   object        
 5   Customer ID    9800 non-null   object        
 6   Customer Name  9800 non-null   object        
 7   Segment        9800 non-null   object        
 8   Country        9800 non-null   object        
 9   City           9800 non-null   object        
 10  State          9800 non-null   object        
 11  Postal Code    9789 non-null   float64       
 12  Region         9800 non-null   object        
 13  Product ID     9800 non-null   object        
 14  Category       9800 non-null   object        
 15  Sub-Category   9800 n

Initially, the values in `Order Date` and `Ship Date` were character strings, which do not support any datetime operations (e.g. extracting the year or the day of the week). By applying the `to_datetime` function, pandas interprets the strings and converts them to datetime values (i.e. `datetime64[ns]`, as `df.info()` shows).  

**Important Note**  
Since many data sets contain datetime information in one of their columns, pandas input functions such as `pandas.read_csv()` can convert columns to dates while reading the data, using the `parse_dates` parameter with a list of the columns to read as Timestamp:  
<code>pd.read_csv(PATH, parse_dates=["cols"])</code>

Remember the warnings while parsing dates?  
You can fix those warnings by passing either of two parameters: `dayfirst=True` or `date_format` (the latter is available in pandas 2.0 and later).
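
A self-contained sketch of parsing dates at read time. It reads from an in-memory string instead of a file so that it runs anywhere; the two rows and the `Order ID` values are made up, and only the column names follow the sales file.

```python
import io
import pandas as pd

# Hypothetical CSV content with day-first dates.
csv_text = """Order ID,Order Date,Ship Date,Sales
A-1,08/11/2017,11/11/2017,261.96
A-2,12/06/2017,16/06/2017,14.62
"""

orders = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Order Date", "Ship Date"],  # columns to convert while reading
    dayfirst=True,                            # dates are written as DD/MM/YYYY
)

print(orders.dtypes)
print(orders)
```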

In [36]:
df = pd.read_csv(r'C:\Users\LENOVO\Desktop\DA_NOTES\Machine_Learning_and_Deep_Learning-master\Module 2 - Python for Data Analysis\04. Pandas Reporting - Going Beyond Basics\data\online_store_sales.csv', parse_dates=["Order Date", "Ship Date"], dayfirst=True)

df.head()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,State,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368


In [182]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9800 entries, 0 to 9799
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Row ID         9800 non-null   int64         
 1   Order ID       9800 non-null   object        
 2   Order Date     9800 non-null   datetime64[ns]
 3   Ship Date      9800 non-null   datetime64[ns]
 4   Ship Mode      9800 non-null   object        
 5   Customer ID    9800 non-null   object        
 6   Customer Name  9800 non-null   object        
 7   Segment        9800 non-null   object        
 8   Country        9800 non-null   object        
 9   City           9800 non-null   object        
 10  State          9800 non-null   object        
 11  Postal Code    9789 non-null   float64       
 12  Region         9800 non-null   object        
 13  Product ID     9800 non-null   object        
 14  Category       9800 non-null   object        
 15  Sub-Category   9800 n

In [37]:
# column names contain spaces and hyphens, so normalise them to lowercase snake_case
col_names = [ col.strip().lower().replace(' ', '_').replace('-', '_') for col in df.columns ]

print(col_names)

df.columns=col_names

['row_id', 'order_id', 'order_date', 'ship_date', 'ship_mode', 'customer_id', 'customer_name', 'segment', 'country', 'city', 'state', 'postal_code', 'region', 'product_id', 'category', 'sub_category', 'product_name', 'sales']
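
The same clean-up can also be written with the vectorised string methods on the column index. This is an alternative sketch, not a replacement for the list comprehension above; the empty `frame` is only there to carry a few example column names.

```python
import pandas as pd

# Hypothetical frame with messy column names.
frame = pd.DataFrame(columns=["Row ID", "Order Date", "Sub-Category", " Sales "])

frame.columns = (
    frame.columns.str.strip()                # drop leading/trailing spaces
    .str.lower()                             # lowercase
    .str.replace(" ", "_", regex=False)      # spaces to underscores
    .str.replace("-", "_", regex=False)      # hyphens to underscores
)

print(list(frame.columns))
```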


In [38]:
df['order_date'].min()

Timestamp('2015-01-03 00:00:00')

In [43]:
print("Orders starting from", df['order_date'].min(), "till", df['order_date'].max())

Orders starting from 2015-01-03 00:00:00 till 2018-12-30 00:00:00


In [42]:
print("Orders starting from", df['order_date'].min().strftime("%d-%b-%Y"), "till", df['order_date'].max().strftime("%d-%b-%Y"))

Orders starting from 03-Jan-2015 till 30-Dec-2018


In [188]:
df['order_date'].max() - df['order_date'].min()

Timedelta('1457 days 00:00:00')

In [44]:
df

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.9600
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.9400
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.6200
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.3680
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9795,9796,CA-2017-125920,2017-05-21,2017-05-28,Standard Class,SH-19975,Sally Hughsby,Corporate,United States,Chicago,Illinois,60610.0,Central,OFF-BI-10003429,Office Supplies,Binders,"Cardinal HOLDit! Binder Insert Strips,Extra St...",3.7980
9796,9797,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,OFF-AR-10001374,Office Supplies,Art,"BIC Brite Liner Highlighters, Chisel Tip",10.3680
9797,9798,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10004977,Technology,Phones,GE 30524EE4,235.1880
9798,9799,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10000912,Technology,Phones,Anker 24W Portable Micro USB Car Charger,26.3760


In [46]:
df['diff_del_days']=df['ship_date']-df['order_date']
df

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales,diff_del_days
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.9600,3 days
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.9400,3 days
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.6200,4 days
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,7 days
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.3680,7 days
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9795,9796,CA-2017-125920,2017-05-21,2017-05-28,Standard Class,SH-19975,Sally Hughsby,Corporate,United States,Chicago,Illinois,60610.0,Central,OFF-BI-10003429,Office Supplies,Binders,"Cardinal HOLDit! Binder Insert Strips,Extra St...",3.7980,7 days
9796,9797,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,OFF-AR-10001374,Office Supplies,Art,"BIC Brite Liner Highlighters, Chisel Tip",10.3680,5 days
9797,9798,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10004977,Technology,Phones,GE 30524EE4,235.1880,5 days
9798,9799,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10000912,Technology,Phones,Anker 24W Portable Micro USB Car Charger,26.3760,5 days


In [52]:
df['order_date'].dt.month_name()

0       November
1       November
2           June
3        October
4        October
          ...   
9795         May
9796     January
9797     January
9798     January
9799     January
Name: order_date, Length: 9800, dtype: object

### Creating a Column containing only the Order Month
Once dates are stored as `Timestamp` objects, pandas exposes many date-related properties through the `dt` accessor, such as `year`, `quarter`, `month`, `day`, `day_of_week`, `day_of_year`, `is_leap_year`, etc. For the week number, use `dt.isocalendar().week`; the older `dt.week` property was removed in pandas 2.0. The `dt` accessor also provides methods like `day_name()` and `month_name()`.
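As a minimal sketch of the `dt` accessor, the snippet below builds a small, made-up Series of dates (the name `dates` and its values are assumptions for illustration, not part of the sales file) and collects a few properties and methods into one table.

```python
import pandas as pd

# Hypothetical sample dates, used only for illustration
dates = pd.Series(pd.to_datetime(['2017-11-08', '2017-06-12', '2016-10-11']))

parts = pd.DataFrame({
    'year': dates.dt.year,
    'quarter': dates.dt.quarter,
    'month': dates.dt.month,
    'month_name': dates.dt.month_name(),   # a method, so it needs parentheses
    'day_name': dates.dt.day_name(),       # a method as well
    'is_leap_year': dates.dt.is_leap_year,
})
print(parts)
```

Properties such as `year` are read without parentheses, while `month_name()` and `day_name()` are method calls.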

In [53]:
df['order_month_number'] = df['order_date'].dt.month

df.head()

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,3 days,11
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3 days,11
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,4 days,6
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,7 days,10
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,7 days,10


### Calculating Delivery Time from Order Date and Ship Date

In [54]:
df['delivery_time'] = df['ship_date'] - df['order_date']

df.head()

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,...,postal_code,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number,delivery_time
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,3 days,11,3 days
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3 days,11,3 days
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,4 days,6,4 days
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,7 days,10,7 days
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,7 days,10,7 days


In [55]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9800 entries, 0 to 9799
Data columns (total 21 columns):
 #   Column              Non-Null Count  Dtype          
---  ------              --------------  -----          
 0   row_id              9800 non-null   int64          
 1   order_id            9800 non-null   object         
 2   order_date          9800 non-null   datetime64[ns] 
 3   ship_date           9800 non-null   datetime64[ns] 
 4   ship_mode           9800 non-null   object         
 5   customer_id         9800 non-null   object         
 6   customer_name       9800 non-null   object         
 7   segment             9800 non-null   object         
 8   country             9800 non-null   object         
 9   city                9800 non-null   object         
 10  state               9800 non-null   object         
 11  postal_code         9789 non-null   float64        
 12  region              9800 non-null   object         
 13  product_id          9800 non-null

### pandas.Timedelta 😱

Observe the data type of the `delivery_time` column: `timedelta64[ns]`. Each value is the difference between two dates or times.

### Creating a Column containing Delivery Time in Number of Days

`pandas.Timedelta` represents a duration, the difference between two dates or times. Many properties of timedelta can be accessed using `dt` like `components`, `days`, `seconds`, etc. We can also access `timedelta` methods using `dt` accessor like `total_seconds()`.
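The difference between `days`, `seconds`, and `total_seconds()` only shows up when the timestamps include a time of day. The sketch below uses made-up order and ship timestamps (the names `order`, `ship`, and `delta` are assumptions for illustration) to show that `dt.seconds` holds only the leftover seconds within the last partial day, not the whole duration.

```python
import pandas as pd

# Hypothetical order and ship timestamps, used only for illustration
order = pd.Series(pd.to_datetime(['2017-11-08 09:00', '2016-10-11 18:30']))
ship = pd.Series(pd.to_datetime(['2017-11-11 15:00', '2016-10-18 06:30']))

delta = ship - order                       # Series of Timedelta values

whole_days = delta.dt.days                 # day component only
leftover_seconds = delta.dt.seconds        # seconds within the last partial day
total_seconds = delta.dt.total_seconds()   # full duration expressed in seconds
fractional_days = total_seconds / 86400    # full duration in days, with fractions

print(pd.DataFrame({
    'delta': delta,
    'days': whole_days,
    'seconds': leftover_seconds,
    'total_seconds': total_seconds,
    'fractional_days': fractional_days,
}))
```

In the sales data every delivery time is a whole number of days, so `dt.days` is sufficient there; with intraday timestamps, `total_seconds()` is usually the safer starting point.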

In [56]:
df.head()

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,...,postal_code,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number,delivery_time
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,3 days,11,3 days
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3 days,11,3 days
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,4 days,6,4 days
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,7 days,10,7 days
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,7 days,10,7 days


In [57]:
df['delivery_time_days'] = df['delivery_time'].dt.days

df.head()

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,...,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number,delivery_time,delivery_time_days
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,3 days,11,3 days,3
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3 days,11,3 days,3
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,4 days,6,4 days,4
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,7 days,10,7 days,7
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,7 days,10,7 days,7


In [25]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9800 entries, 0 to 9799
Data columns (total 21 columns):
 #   Column              Non-Null Count  Dtype          
---  ------              --------------  -----          
 0   row_id              9800 non-null   int64          
 1   order_id            9800 non-null   object         
 2   order_date          9800 non-null   datetime64[ns] 
 3   ship_date           9800 non-null   datetime64[ns] 
 4   ship_mode           9800 non-null   object         
 5   customer_id         9800 non-null   object         
 6   customer_name       9800 non-null   object         
 7   segment             9800 non-null   object         
 8   country             9800 non-null   object         
 9   city                9800 non-null   object         
 10  state               9800 non-null   object         
 11  postal_code         9789 non-null   float64        
 12  region              9800 non-null   object         
 13  product_id          9800 non-null

In [60]:
df['delivery_time'].dt.components

Unnamed: 0,days,hours,minutes,seconds,milliseconds,microseconds,nanoseconds
0,3,0,0,0,0,0,0
1,3,0,0,0,0,0,0
2,4,0,0,0,0,0,0
3,7,0,0,0,0,0,0
4,7,0,0,0,0,0,0
...,...,...,...,...,...,...,...
9795,7,0,0,0,0,0,0
9796,5,0,0,0,0,0,0
9797,5,0,0,0,0,0,0
9798,5,0,0,0,0,0,0


In [59]:
df['delivery_time'].dt.total_seconds()

0       259200.0
1       259200.0
2       345600.0
3       604800.0
4       604800.0
          ...   
9795    604800.0
9796    432000.0
9797    432000.0
9798    432000.0
9799    432000.0
Name: delivery_time, Length: 9800, dtype: float64

### Improve Performance by Setting Date Column as the Index
```python
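# Option 1: return a new DataFrame with the date column as the index
df = df.set_index(['date'])

# Option 2 (alternative to option 1): modify the index in place
df.set_index(['date'], inplace = True)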
```

#### Select data with a specific year and perform aggregation
```python
# select data with a specific year
df.loc['2018']
# select data with a specific day
df.loc['2018-5-1']
# select data using slicing operation
df.loc['2018-5-1':'2018-5-5']
# Applying aggregation within a date slicing
df.loc['2018-5-1':'2018-5-5', ['sales']].mean()
```
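The selections above can be tried on a small, self-contained frame. This is a sketch under stated assumptions: the `toy` DataFrame and its values are made up, and the index is sorted first because label slicing on a `DatetimeIndex` is only well defined (and fast) when the index is in order.

```python
import pandas as pd

# Hypothetical sales records, used only for illustration
toy = pd.DataFrame({
    'date': pd.to_datetime(['2018-05-03', '2017-12-31', '2018-05-01', '2018-06-10']),
    'sales': [30.0, 5.0, 10.0, 50.0],
})

# Set the date column as the index, then sort it chronologically
toy = toy.set_index('date').sort_index()

year_2018 = toy.loc['2018']                       # all rows from 2018
first_week = toy.loc['2018-05-01':'2018-05-05']   # both endpoints are included
mean_sales = toy.loc['2018-05-01':'2018-05-05', 'sales'].mean()

print(year_2018)
print(first_week)
print(mean_sales)
```

Unlike positional slicing with `iloc`, a label slice with `loc` includes the end label.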



In [89]:
import datetime
current = datetime.datetime.now().year
current

2024

In [61]:
df.head()

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,...,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number,delivery_time,delivery_time_days
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,3 days,11,3 days,3
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3 days,11,3 days,3
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,4 days,6,4 days,4
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,7 days,10,7 days,7
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,7 days,10,7 days,7


In [62]:
df = df.set_index(['order_date'])

df.head()

Unnamed: 0_level_0,row_id,order_id,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,...,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number,delivery_time,delivery_time_days
order_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2017-11-08,1,CA-2017-152156,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,...,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,3 days,11,3 days,3
2017-11-08,2,CA-2017-152156,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,...,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3 days,11,3 days,3
2017-06-12,3,CA-2017-138688,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,...,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,4 days,6,4 days,4
2016-10-11,4,US-2016-108966,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,...,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,7 days,10,7 days,7
2016-10-11,5,US-2016-108966,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,...,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,7 days,10,7 days,7


In [63]:
df.loc['2017-11-08']

Unnamed: 0_level_0,row_id,order_id,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,...,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number,delivery_time,delivery_time_days
order_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2017-11-08,1,CA-2017-152156,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,...,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,3 days,11,3 days,3
2017-11-08,2,CA-2017-152156,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,...,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3 days,11,3 days,3


In [206]:
# Filter Rows based on a year

df.loc['2016']

Unnamed: 0_level_0,row_id,order_id,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,...,region,product_id,category,sub_category,product_name,sales,order_month,order_month_number,delivery_time,delivery_time_days
order_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2016-10-11,4,US-2016-108966,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,...,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,10,10,7 days,7
2016-10-11,5,US-2016-108966,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,...,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.3680,10,10,7 days,7
2016-11-22,15,US-2016-118983,2016-11-26,Standard Class,HP-14815,Harold Pawlan,Home Office,United States,Fort Worth,Texas,...,Central,OFF-AP-10002311,Office Supplies,Appliances,Holmes Replacement Filter for HEPA Air Cleaner...,68.8100,11,11,4 days,4
2016-11-22,16,US-2016-118983,2016-11-26,Standard Class,HP-14815,Harold Pawlan,Home Office,United States,Fort Worth,Texas,...,Central,OFF-BI-10000756,Office Supplies,Binders,Storex DuraTech Recycled Plastic Frosted Binders,2.5440,11,11,4 days,4
2016-09-25,25,CA-2016-106320,2016-09-30,Standard Class,EB-13870,Emily Burns,Consumer,United States,Orem,Utah,...,West,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,1044.6300,9,9,5 days,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2016-05-09,9786,CA-2016-155635,2016-05-13,Standard Class,ME-17725,Max Engle,Consumer,United States,Louisville,Kentucky,...,South,OFF-BI-10000962,Office Supplies,Binders,Acco Flexible ACCOHIDE Square Ring Data Binder...,48.8100,5,5,4 days,4
2016-01-12,9797,CA-2016-128608,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,...,East,OFF-AR-10001374,Office Supplies,Art,"BIC Brite Liner Highlighters, Chisel Tip",10.3680,1,1,5 days,5
2016-01-12,9798,CA-2016-128608,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,...,East,TEC-PH-10004977,Technology,Phones,GE 30524EE4,235.1880,1,1,5 days,5
2016-01-12,9799,CA-2016-128608,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,...,East,TEC-PH-10000912,Technology,Phones,Anker 24W Portable Micro USB Car Charger,26.3760,1,1,5 days,5


In [31]:
# Filter a specific date

df.loc['2016-09-25']

Unnamed: 0_level_0,row_id,order_id,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales,order_month,delivery_time,delivery_time_days
order_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
2016-09-25,25,CA-2016-106320,2016-09-30,Standard Class,EB-13870,Emily Burns,Consumer,United States,Orem,Utah,84057.0,West,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,1044.63,9,5 days,5
2016-09-25,1695,CA-2016-156335,2016-09-28,Second Class,PO-19195,Phillina Ober,Home Office,United States,Bayonne,New Jersey,7002.0,East,TEC-AC-10002006,Technology,Accessories,Memorex Micro Travel Drive 16 GB,63.96,9,3 days,3
2016-09-25,1696,CA-2016-156335,2016-09-28,Second Class,PO-19195,Phillina Ober,Home Office,United States,Bayonne,New Jersey,7002.0,East,OFF-BI-10003314,Office Supplies,Binders,Tuff Stuff Recycled Round Ring Binders,14.46,9,3 days,3
2016-09-25,1697,CA-2016-156335,2016-09-28,Second Class,PO-19195,Phillina Ober,Home Office,United States,Bayonne,New Jersey,7002.0,East,TEC-PH-10002726,Technology,Phones,netTALK DUO VoIP Telephone Service,104.98,9,3 days,3
2016-09-25,2447,CA-2016-100573,2016-10-01,Standard Class,AM-10705,Anne McFarland,Consumer,United States,Los Angeles,California,90004.0,West,OFF-EN-10000461,Office Supplies,Envelopes,"#10- 4 1/8"" x 9 1/2"" Recycled Envelopes",17.48,9,6 days,6
2016-09-25,5597,CA-2016-159779,2016-09-29,Standard Class,SB-20185,Sarah Brown,Consumer,United States,Concord,New Hampshire,3301.0,East,OFF-BI-10002735,Office Supplies,Binders,GBC Prestige Therm-A-Bind Covers,68.62,9,4 days,4
2016-09-25,5726,CA-2016-103933,2016-09-27,First Class,DR-12880,Dan Reichenbach,Corporate,United States,New York City,New York,10011.0,East,TEC-AC-10004171,Technology,Accessories,Razer Kraken 7.1 Surround Sound Over Ear USB G...,899.91,9,2 days,2
2016-09-25,6450,CA-2016-156510,2016-09-29,Standard Class,EH-13990,Erica Hackney,Consumer,United States,Meriden,Connecticut,6450.0,East,OFF-BI-10000822,Office Supplies,Binders,"Acco PRESSTEX Data Binder with Storage Hooks, ...",10.76,9,4 days,4
2016-09-25,6451,CA-2016-156510,2016-09-29,Standard Class,EH-13990,Erica Hackney,Consumer,United States,Meriden,Connecticut,6450.0,East,OFF-PA-10002222,Office Supplies,Paper,"Xerox Color Copier Paper, 11"" x 17"", Ream",45.68,9,4 days,4
2016-09-25,6452,CA-2016-156510,2016-09-29,Standard Class,EH-13990,Erica Hackney,Consumer,United States,Meriden,Connecticut,6450.0,East,OFF-AR-10004930,Office Supplies,Art,Turquoise Lead Holder with Pocket Clip,6.7,9,4 days,4


In [64]:
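# Filter rows based on a date range

# When order_date is still a regular column, use a boolean mask.
# Each comparison needs its own parentheses because & binds tighter than >= and <=.
# df.loc[(df['order_date'] >= '2016-09-25') & (df['order_date'] <= '2016-09-26')]

# When order_date is the index of the dataset, slice by date strings (both ends included)
df.loc['2016-09-25':'2016-09-26']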

Unnamed: 0_level_0,row_id,order_id,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,...,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number,delivery_time,delivery_time_days
order_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2016-09-25,25,CA-2016-106320,2016-09-30,Standard Class,EB-13870,Emily Burns,Consumer,United States,Orem,Utah,...,West,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,1044.63,5 days,9,5 days,5
2016-09-26,281,US-2016-161991,2016-09-28,Second Class,SC-20725,Steven Cartwright,Consumer,United States,Houston,Texas,...,Central,OFF-BI-10004967,Office Supplies,Binders,Round Ring Binders,2.08,2 days,9,2 days,2
2016-09-26,282,US-2016-161991,2016-09-28,Second Class,SC-20725,Steven Cartwright,Consumer,United States,Houston,Texas,...,Central,TEC-PH-10001760,Technology,Phones,Bose SoundLink Bluetooth Speaker,1114.4,2 days,9,2 days,2
2016-09-26,284,CA-2016-130883,2016-10-02,Standard Class,TB-21520,Tracy Blumstein,Consumer,United States,Portland,Oregon,...,West,OFF-PA-10000474,Office Supplies,Paper,Easy-staple paper,141.76,6 days,9,6 days,6
2016-09-26,285,CA-2016-130883,2016-10-02,Standard Class,TB-21520,Tracy Blumstein,Consumer,United States,Portland,Oregon,...,West,TEC-AC-10001956,Technology,Accessories,Microsoft Arc Touch Mouse,239.8,6 days,9,6 days,6
2016-09-26,286,CA-2016-130883,2016-10-02,Standard Class,TB-21520,Tracy Blumstein,Consumer,United States,Portland,Oregon,...,West,OFF-PA-10004100,Office Supplies,Paper,Xerox 216,31.104,6 days,9,6 days,6
2016-09-26,1420,CA-2016-124800,2016-09-30,Standard Class,RW-19540,Rick Wilson,Corporate,United States,Mesa,Arizona,...,West,OFF-PA-10000501,Office Supplies,Paper,Petty Cash Envelope,86.272,4 days,9,4 days,4
2016-09-26,1421,CA-2016-124800,2016-09-30,Standard Class,RW-19540,Rick Wilson,Corporate,United States,Mesa,Arizona,...,West,OFF-BI-10000778,Office Supplies,Binders,GBC VeloBinder Electric Binding Machine,72.588,4 days,9,4 days,4
2016-09-26,1422,CA-2016-124800,2016-09-30,Standard Class,RW-19540,Rick Wilson,Corporate,United States,Mesa,Arizona,...,West,OFF-AP-10004980,Office Supplies,Appliances,3M Replacement Filter for Office Air Cleaner f...,60.672,4 days,9,4 days,4
2016-09-26,1423,CA-2016-124800,2016-09-30,Standard Class,RW-19540,Rick Wilson,Corporate,United States,Mesa,Arizona,...,West,OFF-BI-10003984,Office Supplies,Binders,Lock-Up Easel 'Spel-Binder',77.031,4 days,9,4 days,4
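
The column-based and index-based filters select the same rows. The sketch below checks this on a made-up `orders` frame (the name and values are assumptions for illustration); `Series.between` is shown as a shorter way to write the inclusive range.

```python
import pandas as pd

# Hypothetical orders, used only for illustration
orders = pd.DataFrame({
    'order_date': pd.to_datetime(['2016-09-24', '2016-09-25', '2016-09-26', '2016-09-27']),
    'sales': [12.0, 1044.63, 2.08, 7.5],
})

start, end = pd.Timestamp('2016-09-25'), pd.Timestamp('2016-09-26')

# Column-based filtering: wrap each comparison in parentheses before combining with &
by_mask = orders[(orders['order_date'] >= start) & (orders['order_date'] <= end)]

# between() expresses the same inclusive range in a single call
by_between = orders[orders['order_date'].between(start, end)]

# Index-based filtering: the same range as a label slice on a sorted DatetimeIndex
by_slice = orders.set_index('order_date').sort_index().loc['2016-09-25':'2016-09-26']

print(by_mask)
print(by_slice)
```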


In [65]:
df.loc['2016-09-25':'2016-09-26', ['sales']]

Unnamed: 0_level_0,sales
order_date,Unnamed: 1_level_1
2016-09-25,1044.63
2016-09-26,2.08
2016-09-26,1114.4
2016-09-26,141.76
2016-09-26,239.8
2016-09-26,31.104
2016-09-26,86.272
2016-09-26,72.588
2016-09-26,60.672
2016-09-26,77.031


In [66]:
# Applying aggregation within a date slicing

print("Min Sales Amount:\n", df.loc['2016-09-25':'2016-09-26', ['sales']].min())
print("Max Sales Amount:\n", df.loc['2016-09-25':'2016-09-26', ['sales']].max())
print("Mean Sales Amount:\n", df.loc['2016-09-25':'2016-09-26', ['sales']].mean())
print("Spread Sales Amount:\n", df.loc['2016-09-25':'2016-09-26', ['sales']].std())

Min Sales Amount:
 sales    2.08
dtype: float64
Max Sales Amount:
 sales    1114.4
dtype: float64
Mean Sales Amount:
 sales    164.78122
dtype: float64
Spread Sales Amount:
 sales    272.046464
dtype: float64


### Sorting Data Based on Index vs Values and Resetting Index
```python
df.sort_index(ascending = False)
df.sort_values(by = 'sales')
df.reset_index()
```
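
A minimal sketch of the three calls on a made-up frame (the `toy` DataFrame and its values are assumptions for illustration). Each call returns a new DataFrame and leaves the original untouched unless the result is assigned back.

```python
import pandas as pd

# Hypothetical data, used only for illustration
toy = pd.DataFrame(
    {'sales': [30.0, 5.0, 10.0]},
    index=pd.to_datetime(['2018-05-03', '2017-12-31', '2018-05-01']),
)
toy.index.name = 'order_date'

newest_first = toy.sort_index(ascending=False)   # order rows by the index labels
cheapest_first = toy.sort_values(by='sales')     # order rows by a column's values
flat = toy.reset_index()                         # index becomes a regular column again

print(newest_first)
print(cheapest_first)
print(flat)
```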

In [67]:
df.head()

Unnamed: 0_level_0,row_id,order_id,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,...,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number,delivery_time,delivery_time_days
order_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2017-11-08,1,CA-2017-152156,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,...,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,3 days,11,3 days,3
2017-11-08,2,CA-2017-152156,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,...,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3 days,11,3 days,3
2017-06-12,3,CA-2017-138688,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,...,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,4 days,6,4 days,4
2016-10-11,4,US-2016-108966,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,...,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,7 days,10,7 days,7
2016-10-11,5,US-2016-108966,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,...,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,7 days,10,7 days,7


In [68]:
df.sort_index(ascending = False).head()

Unnamed: 0_level_0,row_id,order_id,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,...,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number,delivery_time,delivery_time_days
order_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2018-12-30,646,CA-2018-126221,2019-01-05,Standard Class,CC-12430,Chuck Clark,Home Office,United States,Columbus,Indiana,...,Central,OFF-AP-10002457,Office Supplies,Appliances,Eureka The Boss Plus 12-Amp Hard Box Upright V...,209.3,6 days,12,6 days,6
2018-12-30,5092,CA-2018-156720,2019-01-03,Standard Class,JM-15580,Jill Matthias,Consumer,United States,Loveland,Colorado,...,West,OFF-FA-10003472,Office Supplies,Fasteners,Bagged Rubber Bands,3.024,4 days,12,4 days,4
2018-12-30,909,CA-2018-143259,2019-01-03,Standard Class,PO-18865,Patrick O'Donnell,Consumer,United States,New York City,New York,...,East,OFF-BI-10003684,Office Supplies,Binders,Wilson Jones Legal Size Ring Binders,52.776,4 days,12,4 days,4
2018-12-30,908,CA-2018-143259,2019-01-03,Standard Class,PO-18865,Patrick O'Donnell,Consumer,United States,New York City,New York,...,East,TEC-PH-10004774,Technology,Phones,Gear Head AU3700S Headset,90.93,4 days,12,4 days,4
2018-12-30,907,CA-2018-143259,2019-01-03,Standard Class,PO-18865,Patrick O'Donnell,Consumer,United States,New York City,New York,...,East,FUR-BO-10003441,Furniture,Bookcases,"Bush Westfield Collection Bookcases, Fully Ass...",323.136,4 days,12,4 days,4


In [69]:
df.sort_values(by = 'sales').head()

Unnamed: 0_level_0,row_id,order_id,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,...,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number,delivery_time,delivery_time_days
order_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2018-06-19,4102,US-2018-102288,2018-06-23,Standard Class,ZC-21910,Zuschuss Carroll,Consumer,United States,Houston,Texas,...,Central,OFF-AP-10002906,Office Supplies,Appliances,Hoover Replacement Belt for Commercial Guardsm...,0.444,4 days,6,4 days,4
2018-03-02,9293,CA-2018-124114,2018-03-02,Same Day,RS-19765,Roland Schwarz,Corporate,United States,Waco,Texas,...,Central,OFF-BI-10004022,Office Supplies,Binders,Acco Suede Grain Vinyl Round Ring Binder,0.556,0 days,3,0 days,0
2017-06-21,8659,CA-2017-168361,2017-06-25,Standard Class,KB-16600,Ken Brennan,Corporate,United States,Chicago,Illinois,...,Central,OFF-BI-10003727,Office Supplies,Binders,Avery Durable Slant Ring Binders With Label Ho...,0.836,4 days,6,4 days,4
2015-03-31,4712,CA-2015-112403,2015-03-31,Same Day,JO-15280,Jas O'Carroll,Consumer,United States,Philadelphia,Pennsylvania,...,East,OFF-BI-10003529,Office Supplies,Binders,Avery Round Ring Poly Binders,0.852,0 days,3,0 days,0
2015-09-26,2107,US-2015-152723,2015-09-26,Same Day,HG-14965,Henry Goldwyn,Corporate,United States,Mesquite,Texas,...,Central,OFF-BI-10003460,Office Supplies,Binders,Acco 3-Hole Punch,0.876,0 days,9,0 days,0


In [70]:
# set_index replaced the default 0, 1, 2, ... index with the order_date column as row labels.
# It was already applied in an earlier cell, so it is shown here only as a reminder:
# df = df.set_index(['order_date'])

# reset_index moves the row labels back into a column and restores the default 0, 1, 2, ... index
df.reset_index()

Unnamed: 0,order_date,row_id,order_id,ship_date,ship_mode,customer_id,customer_name,segment,country,city,...,region,product_id,category,sub_category,product_name,sales,diff_del_days,order_month_number,delivery_time,delivery_time_days
0,2017-11-08,1,CA-2017-152156,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,3 days,11,3 days,3
1,2017-11-08,2,CA-2017-152156,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3 days,11,3 days,3
2,2017-06-12,3,CA-2017-138688,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,4 days,6,4 days,4
3,2016-10-11,4,US-2016-108966,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,7 days,10,7 days,7
4,2016-10-11,5,US-2016-108966,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,7 days,10,7 days,7
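
The round trip above can be sketched on a tiny frame. The three rows below are a made-up stand-in for the sales data (an assumption for illustration, not the real file). Both `set_index` and `reset_index` return new objects, so a result that is not assigned is only displayed, not kept.

```python
import pandas as pd

# Hypothetical toy data standing in for the store sales frame
toy = pd.DataFrame({
    'order_date': pd.to_datetime(['2017-11-08', '2017-06-12', '2016-10-11']),
    'sales': [261.96, 14.62, 957.5775],
})

indexed = toy.set_index('order_date')   # order_date becomes the row labels
restored = indexed.reset_index()        # order_date is a regular column again

print(indexed.index.name)               # order_date
print(list(restored.columns))           # ['order_date', 'sales']
```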


In [215]:
df['ship_date']

order_date
2017-11-08   2017-11-11
2017-11-08   2017-11-11
2017-06-12   2017-06-16
2016-10-11   2016-10-18
2016-10-11   2016-10-18
                ...    
2017-05-21   2017-05-28
2016-01-12   2016-01-17
2016-01-12   2016-01-17
2016-01-12   2016-01-17
2016-01-12   2016-01-17
Name: ship_date, Length: 9800, dtype: datetime64[ns]

In [71]:
df['ship_date'].dt.dayofweek

order_date
2017-11-08    5
2017-11-08    5
2017-06-12    4
2016-10-11    1
2016-10-11    1
             ..
2017-05-21    6
2016-01-12    6
2016-01-12    6
2016-01-12    6
2016-01-12    6
Name: ship_date, Length: 9800, dtype: int64

In [73]:
dw_mapping={
    0: 'Monday', 
    1: 'Tuesday', 
    2: 'Wednesday', 
    3: 'Thursday', 
    4: 'Friday',
    5: 'Saturday', 
    6: 'Sunday'
} 
df['ship_date'].dt.weekday.map(dw_mapping)

order_date
2017-11-08    Saturday
2017-11-08    Saturday
2017-06-12      Friday
2016-10-11     Tuesday
2016-10-11     Tuesday
                ...   
2017-05-21      Sunday
2016-01-12      Sunday
2016-01-12      Sunday
2016-01-12      Sunday
2016-01-12      Sunday
Name: ship_date, Length: 9800, dtype: object
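
A hand-written dictionary is not strictly needed here: the `dt` accessor also exposes `day_name()`, as noted in the Remember box. The sketch below uses four ship dates copied from the outputs above as a small stand-in Series (the variable names are assumptions for illustration).

```python
import pandas as pd

# Ship dates taken from the sample output above, used as a small stand-in
ship = pd.Series(pd.to_datetime(['2017-11-11', '2017-06-16', '2016-10-18', '2017-05-28']))

day_numbers = ship.dt.dayofweek   # Monday=0 ... Sunday=6
day_names = ship.dt.day_name()    # full English weekday names

print(day_numbers.tolist())  # [5, 4, 1, 6]
print(day_names.tolist())    # ['Saturday', 'Friday', 'Tuesday', 'Sunday']
```

Using `day_name()` avoids typos in the mapping and keeps the numbering convention (Monday is 0) in one place.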

### Working with DateTime in Pandas

#### Get year, month, and day

```python
df['year'] = df['DoB'].dt.year
df['month'] = df['DoB'].dt.month
df['day'] = df['DoB'].dt.day
```
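
The snippet above assumes a frame with a datetime `DoB` column. A minimal self-contained sketch, with made-up names and dates, might look like this. Note that the column has to be converted with `pd.to_datetime` first, because the `dt` accessor is not available on plain string columns.

```python
import pandas as pd

# Hypothetical data: the names and dates are made up for illustration
people = pd.DataFrame({
    'name': ['Ann', 'Bob'],
    'DoB': ['1990-02-28', '2000-12-31'],
})
people['DoB'] = pd.to_datetime(people['DoB'])  # .dt needs a datetime64 column

people['year'] = people['DoB'].dt.year
people['month'] = people['DoB'].dt.month
people['day'] = people['DoB'].dt.day

print(people[['year', 'month', 'day']].values.tolist())
# [[1990, 2, 28], [2000, 12, 31]]
```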

#### Get the week of year, the day of week and leap year
```python
df['week_of_year'] = df['DoB'].dt.isocalendar().week  # .dt.week was removed in pandas 2.0
df['day_of_week'] = df['DoB'].dt.dayofweek
df['is_leap_year'] = df['DoB'].dt.is_leap_year

dw_mapping={
    0: 'Monday', 
    1: 'Tuesday', 
    2: 'Wednesday', 
    3: 'Thursday', 
    4: 'Friday',
    5: 'Saturday', 
    6: 'Sunday'
} 
df['day_of_week_name'] = df['DoB'].dt.weekday.map(dw_mapping)