In [None]:
#importing libraries
import pandas as pd
import numpy as np

In [37]:
## Loading the TimeSeries dataset
data = pd.read_csv('Data/Superstore.csv')  # forward slash avoids the invalid '\S' escape and works on every OS
data.head(5)

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,State,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales
0,1,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96
1,2,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,3,CA-2017-138688,12/06/2017,16/06/2017,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62
3,4,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,5,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368


In [5]:
#checking the data types of the columns
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9800 entries, 0 to 9799
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Row ID         9800 non-null   int64  
 1   Order ID       9800 non-null   object 
 2   Order Date     9800 non-null   object 
 3   Ship Date      9800 non-null   object 
 4   Ship Mode      9800 non-null   object 
 5   Customer ID    9800 non-null   object 
 6   Customer Name  9800 non-null   object 
 7   Segment        9800 non-null   object 
 8   Country        9800 non-null   object 
 9   City           9800 non-null   object 
 10  State          9800 non-null   object 
 11  Postal Code    9789 non-null   float64
 12  Region         9800 non-null   object 
 13  Product ID     9800 non-null   object 
 14  Category       9800 non-null   object 
 15  Sub-Category   9800 non-null   object 
 16  Product Name   9800 non-null   object 
 17  Sales          9800 non-null   float64
dtypes: float

In [6]:
# summary statistics for the numeric columns (the count row also hints at missing values)
data.describe()

Unnamed: 0,Row ID,Postal Code,Sales
count,9800.0,9789.0,9800.0
mean,4900.5,55273.322403,230.769059
std,2829.160653,32041.223413,626.651875
min,1.0,1040.0,0.444
25%,2450.75,23223.0,17.248
50%,4900.5,58103.0,54.49
75%,7350.25,90008.0,210.605
max,9800.0,99301.0,22638.48
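
`describe()` only hints at missing data through its `count` row (9789 versus 9800 for `Postal Code`). A more direct null check could look like the sketch below. The small `sample` frame is a hypothetical stand-in for the Superstore data, with made-up values, so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Superstore frame; values are illustrative only.
sample = pd.DataFrame({
    "Postal Code": [42420.0, np.nan, 90036.0],
    "Sales": [261.96, 731.94, 14.62],
})

# Count missing values per column.
null_counts = sample.isna().sum()
print(null_counts)
```

On the real dataset, the same call would be `data.isna().sum()`.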


## **Introduction to Time Series in Python**

Time series data refers to a sequence of data points indexed in time order. These observations are recorded at regular or irregular intervals over time. The defining characteristic of time series data is that **time is an essential and meaningful index** for the dataset.

In the context of business and analytics, time series analysis allows us to:

- **Monitor performance** over time (e.g., daily sales)
- **Forecast future outcomes** using historical data (e.g., sales forecasting)
- **Detect trends and seasonality**, which are critical for business planning
- **Identify anomalies** (e.g., sudden drops in revenue)

Time series analysis **requires proper date/time handling**, and in Python, this is achieved through native libraries like `datetime`, as well as powerful libraries such as **NumPy** and **Pandas**.

## Native Python Dates and Times — `datetime` module

Python’s built-in `datetime` module offers fundamental classes for date and time manipulation:
- `datetime.date`: Represents a calendar date (year, month, day)
- `datetime.time`: Represents a clock time (hour, minute, second)
- `datetime.datetime`: Combines date and time
- `datetime.timedelta`: Represents the difference between two datetime objects

This module is critical for **parsing raw date strings**, performing **date arithmetic**, and creating **time-aware operations** that later integrate well with Pandas.


In [28]:
from datetime import date, time, datetime, timedelta

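# Creating a date object (named sample_date so the imported `date` class is not shadowed)
sample_date = date(2025, 5, 5)
print("Date:", sample_date)

# Creating a time object (named current_time so the imported `time` class is not shadowed)
current_time = datetime.now().time()
print("Time:", current_time)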

# Creating a full datetime object
dt = datetime.now()
print("Datetime:", dt)

# Adding days using timedelta
delta = timedelta(days=10)
print("10 days after:", dt + delta)
print("10 days before:", dt - delta)

# Difference between two dates
start = datetime(2025, 4, 20)
end = datetime(2025, 5, 5)
diff = end - start
print("Difference in days:", diff.days)

# Reuse dt for both values: two separate datetime.now() calls would differ by a few microseconds
new_date = dt
old_date = dt - delta
print("Difference in days:", new_date - old_date)

Date: 2025-05-05
Time: 11:07:01.258593
Datetime: 2025-05-05 11:07:01.258593
10 days after: 2025-05-15 11:07:01.258593
10 days before: 2025-04-25 11:07:01.258593
Difference in days: 15
Difference in days: 10 days, 0:00:00


### **Key Takeaways**
- `datetime` is essential for precise date manipulations at the granular level.
- Not vectorized — best used for single or small-scale operations.
- Ideal for date cleaning, parsing, and feeding into Pandas for large-scale analysis.
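
As a minimal sketch of the parsing use case, the snippet below converts one raw date string into a `datetime` object and formats it back. The string is assumed to follow the day/month/year layout seen in the `Order Date` column.

```python
from datetime import datetime

# Raw string in day/month/year order, as in the Superstore 'Order Date' column.
raw = "08/11/2017"

# strptime parses a string according to an explicit format.
parsed = datetime.strptime(raw, "%d/%m/%Y")
print(parsed)  # 2017-11-08 00:00:00

# strftime formats a datetime back into a string, here in ISO order.
iso_text = parsed.strftime("%Y-%m-%d")
print(iso_text)  # 2017-11-08
```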

## **NumPy's `datetime64` and `timedelta64` — Typed Arrays of Times**

Unlike Python’s standard `datetime`, NumPy’s `datetime64` allows us to perform **efficient, vectorized date operations**. This is critical when working with large datasets such as time-indexed sales data.

`datetime64` is a NumPy data type for representing dates and times. Each value stores a single point in time at a fixed unit of precision, which can range from whole years down to sub-second units such as nanoseconds.

### Why Use `datetime64`?
- Operates on entire arrays (not just individual dates)
- Enables fast arithmetic and comparisons
- Supports many granularities (Y, M, D, h, m, s, ms, ns)

For example, we can subtract one entire column of dates from another in a single vectorized operation, without writing a Python loop.

In the `Superstore` dataset, we can explore this by converting `Order Date` and `Ship Date` to `datetime64` and then computing the shipping time.
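
Before applying this to the full dataset, here is a small self-contained sketch of vectorized date arithmetic. The three date pairs are copied from the dataset preview above. The array names are illustrative.

```python
import numpy as np

# Order and ship dates for three orders, stored with day precision.
order = np.array(["2017-11-08", "2017-06-12", "2016-10-11"], dtype="datetime64[D]")
ship = np.array(["2017-11-11", "2017-06-16", "2016-10-18"], dtype="datetime64[D]")

# One subtraction handles the whole array and yields timedelta64[D] values.
durations = ship - order
print(durations.dtype)        # timedelta64[D]
print(durations.astype(int))  # [3 4 7]

# Comparisons are vectorized as well.
slow = durations > np.timedelta64(5, "D")
print(slow)                   # [False False  True]
```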


In [8]:
# viewing the Order Date and Ship Date columns in the dataset
data[['Order Date', 'Ship Date']].head(10)

Unnamed: 0,Order Date,Ship Date
0,08/11/2017,11/11/2017
1,08/11/2017,11/11/2017
2,12/06/2017,16/06/2017
3,11/10/2016,18/10/2016
4,11/10/2016,18/10/2016
5,09/06/2015,14/06/2015
6,09/06/2015,14/06/2015
7,09/06/2015,14/06/2015
8,09/06/2015,14/06/2015
9,09/06/2015,14/06/2015


In [31]:
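# Note: this cell was executed after the conversion cell below; before conversion both columns have dtype object
data[['Order Date', 'Ship Date']].dtypes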

Order Date    datetime64[ns]
Ship Date     datetime64[ns]
dtype: object

In [34]:
# Convert date columns to datetime64.
# dayfirst=True tells pandas to read ambiguous strings as day/month/year, which matches this dataset.
# Alternatively, pass the exact format: pd.to_datetime(data['Order Date'], format='%d/%m/%Y')
data['Order Date'] = pd.to_datetime(data['Order Date'], dayfirst=True)
data['Ship Date'] = pd.to_datetime(data['Ship Date'], dayfirst=True)

# Extract as NumPy arrays of datetime64
# .values returns the underlying data as a NumPy array.
# .astype('datetime64[D]') converts the values to day precision, keeping only the date part.
# datetime64[ns] (nanoseconds) is finer than day-based calculations need, e.g. '2017-11-08T00:00:00.000000000'
# datetime64[D] stores the same value with day precision, e.g. '2017-11-08'
order_np = data['Order Date'].values.astype('datetime64[D]')
ship_np = data['Ship Date'].values.astype('datetime64[D]')

# Compute shipping time (in days); subtracting two datetime64[D] arrays yields timedelta64[D]
shipping_time = ship_np - order_np
print("Shipping time (first 5 entries):", shipping_time[:5])

Shipping time (first 5 entries): [3 3 4 7 7]


In [11]:
# Add the shipping time to the DataFrame
data['Shipping Duration (Days)'] = shipping_time.astype('timedelta64[D]').astype(int)
data['Shipping Duration (Days)']

0       3
1       3
2       4
3       7
4       7
       ..
9795    7
9796    5
9797    5
9798    5
9799    5
Name: Shipping Duration (Days), Length: 9800, dtype: int32
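
The same duration can also be computed without leaving Pandas. The sketch below is an alternative to the NumPy route above, not a replacement for it. It uses a hypothetical three-row `orders` frame built from the dates in the preview and relies on the `.dt.days` accessor of a timedelta column.

```python
import pandas as pd

# Hypothetical miniature of the two date columns (day/month/year strings).
orders = pd.DataFrame({
    "Order Date": ["08/11/2017", "12/06/2017", "11/10/2016"],
    "Ship Date": ["11/11/2017", "16/06/2017", "18/10/2016"],
})

# Parse both columns with an explicit format.
for col in ["Order Date", "Ship Date"]:
    orders[col] = pd.to_datetime(orders[col], format="%d/%m/%Y")

# Subtracting two datetime columns gives a timedelta column; .dt.days extracts whole days.
orders["Shipping Duration (Days)"] = (orders["Ship Date"] - orders["Order Date"]).dt.days
print(orders["Shipping Duration (Days)"].tolist())  # [3, 4, 7]
```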

In [35]:
data.head(5)

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,State,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368


### **`[D]` → Day Precision (Other Options Available)**
NumPy's `datetime64` supports different time units:

| Unit   | Meaning      | Example                      |
|--------|--------------|------------------------------|
| `[Y]`  | Year         | `2024`                       |
| `[M]`  | Month        | `2024-05`                    |
| `[D]`  | **Day**      | `2024-05-20`                 |
| `[h]`  | Hour         | `2024-05-20T14`              |
| `[m]`  | Minute       | `2024-05-20T14:30`           |
| `[s]`  | Second       | `2024-05-20T14:30:15`        |
| `[ms]` | Millisecond  | `2024-05-20T14:30:15.500`    |
| `[ns]` | Nanosecond   | `2024-05-20T14:30:15.500000000` |
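
To see the units in action, the following sketch truncates one timestamp to several of the precisions listed in the table. The timestamp is the table's own example. The dictionary name `truncated` is illustrative.

```python
import numpy as np

# A timestamp with millisecond precision (the unit is inferred from the string).
ts = np.datetime64("2024-05-20T14:30:15.500")

# Casting to a coarser unit drops the finer components.
truncated = {unit: str(ts.astype(f"datetime64[{unit}]")) for unit in ["Y", "M", "D", "h", "m", "s"]}

for unit, text in truncated.items():
    print(f"[{unit}] -> {text}")
```

Note that casting to a coarser unit truncates rather than rounds, so the `[D]` result keeps the date and drops the time of day.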

**In a nutshell**

- Python’s `datetime` module gives us fine-grained control for parsing, formatting, and basic time arithmetic.
- NumPy’s `datetime64` and `timedelta64` provide **performance-oriented**, **vectorized** operations across large time arrays.
- In the Superstore dataset, these tools allow us to **analyze delivery efficiency**, calculate durations, and prepare time-indexed features — a critical step before moving to full **Pandas time series analysis**.

Below, we will explore how Pandas builds on top of these foundational tools to offer rich time series structures and capabilities.


## Dates and Times in Pandas

Pandas extends the functionality of Python’s `datetime` and NumPy’s `datetime64` by providing:

- High-level, intuitive tools to work with time series data  
- Built-in support for indexing, slicing, and aligning time-indexed data  
- Features like resampling, rolling windows, frequency conversion, and time-aware grouping

Pandas handles dates using its own data structures that wrap around `datetime64` and provide enhanced time-aware functionality.

In this section, we explore four key tools provided by Pandas:
1. `pd.to_datetime()` — for robust date parsing and conversion
2. `pd.DatetimeIndex` — for time-aware indexing
3. `pd.Timedelta` — for handling durations
4. `pd.date_range()` — for generating fixed-frequency time series
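
As a quick preview of the last two tools, here is a minimal sketch of `pd.Timedelta` and `pd.date_range()`. The dates and variable names are chosen for illustration only.

```python
import pandas as pd

# pd.Timedelta represents a duration and can be added to a Timestamp.
order_date = pd.Timestamp("2017-11-08")
transit = pd.Timedelta(days=3)
ship_date = order_date + transit
print(ship_date)  # 2017-11-11 00:00:00

# pd.date_range builds a fixed-frequency DatetimeIndex; here, five consecutive days.
days = pd.date_range(start="2017-01-01", periods=5, freq="D")
print(days)
```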


### 1. `pd.to_datetime()` — Parse and Convert Strings to Timestamps

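`pd.to_datetime()` converts many date representations (strings, numbers, `datetime` objects) into Pandas `datetime64[ns]` values. It can infer common formats automatically, and its `errors` parameter controls what happens to values that cannot be parsed (for example, `errors='coerce'` turns them into `NaT`). Passing an explicit `format` removes any ambiguity between day-first and month-first strings.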

In the Superstore dataset, the columns `"Order Date"` and `"Ship Date"` are originally of type `object`. To enable time-series operations, we convert them.


In [38]:
# Convert 'Order Date' and 'Ship Date' columns to datetime
data['Order Date'] = pd.to_datetime(data['Order Date'], format='%d/%m/%Y')
data['Ship Date'] = pd.to_datetime(data['Ship Date'], format='%d/%m/%Y')

# Confirm the change
print(data[['Order Date', 'Ship Date']].dtypes)

Order Date    datetime64[ns]
Ship Date     datetime64[ns]
dtype: object
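
The sketch below illustrates two behaviours worth knowing when parsing dates: how unparseable values can be coerced to `NaT`, and how the same string is read differently with and without `dayfirst=True`. The `raw` series is hypothetical and includes one deliberately invalid entry.

```python
import pandas as pd

# Hypothetical input: two valid day/month/year strings and one invalid value.
raw = pd.Series(["08/11/2017", "12/06/2017", "not a date"])

# errors='coerce' replaces values that do not match the format with NaT instead of raising.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
print(parsed)

# The same ambiguous string read month-first (default) and day-first.
month_first = pd.to_datetime("08/11/2017")
day_first = pd.to_datetime("08/11/2017", dayfirst=True)
print(month_first.date(), day_first.date())  # 2017-08-11 2017-11-08
```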


### 2. `pd.DatetimeIndex` — Time-Aware Indexing

Pandas uses `DatetimeIndex` to represent and manipulate datetime-based indices efficiently.

Once a `DatetimeIndex` is set, we unlock time-specific operations such as:
- Time slicing with partial date strings via `.loc`: `data_indexed.loc['2017']`, `data_indexed.loc['2017-03']`
- Resampling and rolling windows
- Date/time component extraction (e.g., month, weekday)
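
The operations listed above can be sketched on a tiny frame. The `sales` data here is hypothetical (four made-up rows), and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical sales records indexed by order date.
idx = pd.to_datetime(["2017-01-03", "2017-01-03", "2017-01-20", "2017-02-14"])
sales = pd.DataFrame({"Sales": [100.0, 50.0, 25.0, 10.0]}, index=idx)

# Time slicing: a partial date string selects every row in that month.
january = sales.loc["2017-01"]

# Resampling: total sales per calendar month ("MS" labels each bin by its month start).
monthly_total = sales["Sales"].resample("MS").sum()

# Component extraction directly from the DatetimeIndex.
months = sales.index.month
weekdays = sales.index.day_name()

print(january)
print(monthly_total)
print(list(weekdays))
```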

Let’s make "Order Date" the index and demonstrate slicing and indexing.

In [13]:
# Set 'Order Date' as the index
data_indexed = data.set_index('Order Date').sort_index()

# Confirm the index type
print(type(data_indexed.index))

data_indexed.head(10)

<class 'pandas.core.indexes.datetimes.DatetimeIndex'>


Order Date,Row ID,Order ID,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,State,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Shipping Duration (Days)
2015-01-03,7981,CA-2015-103800,2015-01-07,Standard Class,DP-13000,Darren Powers,Consumer,United States,Houston,Texas,77095.0,Central,OFF-PA-10000174,Office Supplies,Paper,"Message Book, Wirebound, Four 5 1/2"" X 4"" Form...",16.448,4
2015-01-04,742,CA-2015-112326,2015-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,Illinois,60540.0,Central,OFF-BI-10004094,Office Supplies,Binders,GBC Standard Plastic Binding Systems Combs,3.54,4
2015-01-04,741,CA-2015-112326,2015-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,Illinois,60540.0,Central,OFF-ST-10002743,Office Supplies,Storage,SAFCO Boltless Steel Shelving,272.736,4
2015-01-04,740,CA-2015-112326,2015-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,Illinois,60540.0,Central,OFF-LA-10003223,Office Supplies,Labels,Avery 508,11.784,4
2015-01-05,1760,CA-2015-141817,2015-01-12,Standard Class,MB-18085,Mick Brown,Consumer,United States,Philadelphia,Pennsylvania,19143.0,East,OFF-AR-10003478,Office Supplies,Art,Avery Hi-Liter EverBold Pen Style Fluorescent ...,19.536,7
2015-01-06,7479,CA-2015-167199,2015-01-10,Standard Class,ME-17320,Maria Etezadi,Home Office,United States,Henderson,Kentucky,42420.0,South,TEC-PH-10004539,Technology,Phones,Wireless Extenders zBoost YX545 SOHO Signal Bo...,755.96,4
2015-01-06,7478,CA-2015-167199,2015-01-10,Standard Class,ME-17320,Maria Etezadi,Home Office,United States,Henderson,Kentucky,42420.0,South,TEC-PH-10004977,Technology,Phones,GE 30524EE4,391.98,4
2015-01-06,5328,CA-2015-130813,2015-01-08,Second Class,LS-17230,Lycoris Saunders,Consumer,United States,Los Angeles,California,90049.0,West,OFF-PA-10002005,Office Supplies,Paper,Xerox 225,19.44,2
2015-01-06,7475,CA-2015-167199,2015-01-10,Standard Class,ME-17320,Maria Etezadi,Home Office,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10004063,Furniture,Chairs,Global Deluxe High-Back Manager's Chair,2573.82,4
2015-01-06,7481,CA-2015-167199,2015-01-10,Standard Class,ME-17320,Maria Etezadi,Home Office,United States,Henderson,Kentucky,42420.0,South,OFF-PA-10000955,Office Supplies,Paper,Southworth 25% Cotton Granite Paper & Envelopes,6.54,4


In [14]:
data["Order Date"]

0      2017-11-08
1      2017-11-08
2      2017-06-12
3      2016-10-11
4      2016-10-11
          ...    
9795   2017-05-21
9796   2016-01-12
9797   2016-01-12
9798   2016-01-12
9799   2016-01-12
Name: Order Date, Length: 9800, dtype: datetime64[ns]

In [None]:
# Slice data from January 2017 (partial-string indexing on the DatetimeIndex)
jan_2017 = data_indexed.loc['2017-01']

In [None]:
# Extract sales on a specific day
specific_day_sales = data_indexed.loc['2017-01-03']
print("Sales on 2017-01-03:\n", specific_day_sales[['Customer Name', 'Sales']])

Sales on 2017-01-03:
                 Customer Name    Sales
Order Date                            
2017-01-03      Bill Overfelt  1592.85
2017-01-03  Christine Abelman    30.08
2017-01-03       Lena Radford   114.46
2017-01-03  Christine Abelman   180.96
2017-01-03      Bill Overfelt    11.88
2017-01-03  Christine Abelman   165.60
