## Working on Time Series with Pandas


* Forecasting: Principles and Practice (3rd ed) [link](https://otexts.com/fpp3/)
* Pandas documentation for time series [link](https://pandas.pydata.org/docs/user_guide/timeseries.html)

In [None]:
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.dates as mdates
plt.style.use('dark_background')
%matplotlib inline

In [None]:
## Generate three timestamps starting from "2023-01-01" with frequency of "1 hr"

ts_index =
ts_index

Some of the commonly used `freq` tags

| Date Offset | Frequency String | Description        |
|-------------|------------------|--------------------|
| MonthEnd    | 'M'              | calendar month end |
| Day         | 'D'              | one absolute day   |
| Hour        | 'H'              | one hour           |
| Minute      | 'T' or 'min'     | one minute         |
| Second      | 'S'              | one second         |

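*For other supported `freq` tags, see the pandas documentation on [DateOffset objects](https://pandas.pydata.org/docs/user_guide/timeseries.html#dateoffset-objects). Note that since pandas 2.2 the strings 'M', 'H', 'T' and 'S' are deprecated in favor of 'ME', 'h', 'min' and 's'.*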
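As a small illustration of how the `freq` argument changes the spacing of generated timestamps, here is a minimal sketch using `pd.date_range`. The start date and the variable names are arbitrary choices for this example, not part of the exercises.

```python
import pandas as pd

# Three timestamps one hour apart ("h" is the hourly alias in recent pandas).
hourly = pd.date_range(start="2024-03-01", periods=3, freq="h")

# Same start, but one calendar day apart.
daily = pd.date_range(start="2024-03-01", periods=3, freq="D")

# A multiple can be placed in front of the alias: 15-minute steps.
quarter_hourly = pd.date_range(start="2024-03-01", periods=3, freq="15min")

print(hourly)
print(daily)
print(quarter_hourly)
```

Passing `periods` together with `start` fixes the number of timestamps; an `end` date could be used instead of `periods`.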



**Manipulating and converting date times with timezone information**

In [None]:
## Generate timestamps specific to timezones

# Time zone UTC
print(ts_index.tz_localize("UTC"))

# Time zone Asia/Kolkata
print(ts_index.tz_localize("Asia/Kolkata"))
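Localizing and converting are different operations, and the sketch below tries to show the difference. `tz_localize` attaches a time zone to naive timestamps without changing the wall-clock values, while `tz_convert` re-expresses the same instants in another zone. The index used here is a made-up example.

```python
import pandas as pd

# Naive (time-zone-unaware) hourly timestamps.
naive = pd.date_range("2023-01-01", periods=3, freq="h")

# Attach UTC: the clock readings stay the same, the index becomes tz-aware.
utc = naive.tz_localize("UTC")

# Convert to India Standard Time: same instants, shown as UTC+05:30.
kolkata = utc.tz_convert("Asia/Kolkata")

print(utc)
print(kolkata)
```

Calling `tz_convert` on a naive index raises an error, so data without time zone information has to be localized first.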

____
**Task - 1**
____

Your manager has given you a climate time series dataset with 1000 rows and asked you to analyse it, but the dataset **does not have a timestamp column**. You are informed that each row of this dataset contains the outputs of different sensors.

Your manager added that the observations were made starting from "*12th Jan 2020*" and that this is **daily data**, meaning one observation was recorded per day starting from `2020-01-12`.

* Create a datetime index in pandas starting from `12th Jan 2020` with `1000` observations.

* Add the timestamps as the index of the dataset.



In [None]:
# Read climate data set
climate_data = pd.read_csv('https://tinyurl.com/mpbudws')
print(climate_data.head(5))

# Generate timestamp values
start_date =
data_length =
timestamp =

# Add timestamp column to the climate_data
climate_data['timestamps'] =
climate_data =
print(climate_data.head(5))
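The general pattern of attaching generated timestamps to a table that has none can be sketched on a toy frame. The `readings` DataFrame and its column names below are hypothetical stand-ins, not the climate dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor table with no time column.
readings = pd.DataFrame(
    {"sensor_a": np.arange(5.0), "sensor_b": np.arange(5.0) * 2}
)

# One timestamp per row, one day apart.
stamps = pd.date_range(start="2021-06-01", periods=len(readings), freq="D")

# Store the timestamps as a column, then promote that column to the index.
readings["timestamps"] = stamps
readings = readings.set_index("timestamps")

print(readings)
```

Using `len(readings)` for `periods` keeps the index length in step with the data, whatever the number of rows.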

____

**Resampling a time series**

In [None]:
## Generate hourly data with random values for 10 periods
idx =
df =
df.head()

To perform resampling, the DataFrame index must be datetime-like (for example a pandas `DatetimeIndex`), and we need to specify an aggregation function such as `mean()`, `min()`, or `max()`. Refer to the official pandas documentation for more information: [pandas.DataFrame.resample](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html)
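A minimal downsampling sketch, assuming a toy hourly frame with known values so the result is easy to check by hand. The column name `value` is an arbitrary choice.

```python
import pandas as pd

# Six hourly observations with values 0..5.
idx = pd.date_range("2023-01-01", periods=6, freq="h")
toy = pd.DataFrame({"value": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# Group the rows into 2-hour bins and average each bin.
two_hourly = toy.resample("2h").mean()

print(two_hourly)
```

Each 2-hour bin holds two of the original rows, so the three means are 0.5, 2.5 and 4.5. Swapping `mean()` for `max()` or `min()` changes only the aggregation, not the binning.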

In [None]:
## Resample (in this case, downsample) the data to a frequency of "2 hours"
df.

____
**Task - 2**
____

Your initial analysis of the climate data shows only slight day-to-day variation. Keeping so many nearly identical values adds no information and increases computational overhead, so you decide to convert the daily data to weekly data. In other words, resample the climate time series from `1 day` to `1 week`, taking the maximum observation in each week as the resampled value.


In [None]:
## Resample the climate time series from 1 day to 1 week
## with values as maximum over the week
climate_data_resampled =
climate_data_resampled.head()
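Weekly resampling has one detail worth seeing on small data: with `freq="W"`, pandas closes each bin on Sunday and labels it with that Sunday. The sketch below uses a made-up column `temp` over 14 days starting on Sunday 2020-01-12.

```python
import pandas as pd

# Fourteen daily values 0..13, starting on a Sunday.
idx = pd.date_range("2020-01-12", periods=14, freq="D")
toy = pd.DataFrame({"temp": range(14)}, index=idx)

# Weekly bins ending on Sunday, keeping the largest value in each bin.
weekly_max = toy.resample("W").max()

print(weekly_max)
```

Because the first day is itself a Sunday, it forms a one-day bin on its own. The second bin covers Monday to Sunday, and the last bin is a partial week labelled with the following Sunday.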

**Performing date and time arithmetic with absolute or relative time increments**

In [None]:
date1 =

# Add 2 days, 5 hours and 10 minutes to date1
date2 =

date2
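To contrast absolute and relative increments, here is a short sketch with an arbitrary starting timestamp. `pd.Timedelta` adds a fixed duration, while `pd.DateOffset` follows the calendar.

```python
import pandas as pd

start = pd.Timestamp("2023-01-31 08:00")

# Absolute increment: exactly 2 days, 5 hours and 10 minutes later.
later = start + pd.Timedelta(days=2, hours=5, minutes=10)

# Relative increment: "one month later" depends on the calendar, so the
# result is clipped to the last valid day of February.
next_month = start + pd.DateOffset(months=1)

print(later)
print(next_month)
```

A `Timedelta` has no notion of months because month lengths vary, which is why calendar-based shifts go through `DateOffset`.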

____

**Task - 3**
____

Assume that the data was collected in India. You want to store it in an internal database, and your database engineer tells you that their team can only work with `timestamps without time zone` (24-hour clock, UTC). Achieve this by subtracting 5 hours and 30 minutes from every value in the timestamp index.




In [None]:
climate_data.head()

In [None]:
climate_data.index =
climate_data.head()
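Subtracting a fixed offset from a whole index can be sketched as follows, on a made-up index. As a cross-check, the same result is derived by localizing to `Asia/Kolkata`, converting to UTC and dropping the time zone again. This second route is an alternative shown for comparison, not something the task requires.

```python
import pandas as pd

# Naive timestamps that are understood to be in India Standard Time.
ist_naive = pd.date_range("2020-01-12", periods=3, freq="D")

# Route 1: subtract the fixed UTC+05:30 offset from every element.
shifted = ist_naive - pd.Timedelta(hours=5, minutes=30)

# Route 2: let pandas do the conversion, then drop the time zone.
via_tz = (
    ist_naive.tz_localize("Asia/Kolkata")
    .tz_convert("UTC")
    .tz_localize(None)
)

print(shifted)
print(via_tz)
```

The two routes agree here because India does not observe daylight saving time. For zones that do, a fixed subtraction would be wrong for part of the year.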

____

### **Time Series Data Wrangling and Visualization**



Visualizing a subset of the data using groupby and pivot. The dataset consists of a timestamp column `Date` (frequency is 1 day), `store` (store id), `product` (product id), and `number_sold` (number of units of a product sold by a store).

In [None]:
## Data source: https://www.kaggle.com/datasets/samuelcortinhas/time-series-practice-dataset
sales_data = pd.read_csv('https://tinyurl.com/mr2rv4yh')
sales_data.head()

____

**Task - 4**
____


Convert the `Date` column to the pandas datetime type and set it as the index of the DataFrame.

In [None]:
# Convert the 'Date' column to pandas datetime
sales_data['Date'] =

# Set Date as index of the dataframe


sales_data.head()
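The conversion from text dates to a datetime index can be sketched on a two-row toy frame. The values are invented, and the column names mirror the ones described for the sales data.

```python
import pandas as pd

# Dates arrive as plain strings when read from CSV.
toy = pd.DataFrame(
    {"Date": ["2019-01-01", "2019-01-02"], "number_sold": [10, 12]}
)

# Parse the strings into datetime64 values.
toy["Date"] = pd.to_datetime(toy["Date"])

# Use the parsed column as the index so time-based methods such as resample work.
toy = toy.set_index("Date")

print(toy.index)
```

An alternative with the same effect is to pass `parse_dates=["Date"]` and `index_col="Date"` to `pd.read_csv` when loading the file.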

In [None]:
## Resample and forward fill null values
grouping_cols = ['store', 'product']

def resample_fn(data):
  """Resample the time series to a weekly
  frequency using the mean of the values in
  each week, then forward-fill any NaN values.
  """
  return data.

## Group the data by 'store' and 'product', then resample each group
## separately by applying the function above
sales_data_processed =
sales_data_processed.head()

In [None]:
## Remove MultiIndex, only Date is required as Index
sales_data_processed = sales_data_processed.
sales_data_processed.head()
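One possible way to combine grouping, weekly resampling, forward filling and index flattening is sketched below on invented data. It selects the `number_sold` column before calling `apply`, which is a choice made for this sketch and may differ from the approach intended in the cells above. Store 1 deliberately has no rows for one week so that the forward fill has something to do.

```python
import pandas as pd

dates0 = pd.date_range("2019-01-01", periods=14, freq="D")
dates1 = dates0[:6].append(dates0[-1:])  # store 1 has no rows for 7-13 Jan

toy = pd.DataFrame(
    {
        "store": [0] * 14 + [1] * 7,
        "product": 0,
        "number_sold": [10.0] * 14 + [20.0] * 7,
    },
    index=dates0.append(dates1).rename("Date"),
)

def weekly_mean_ffill(series):
    """Weekly mean of one group's series, with empty weeks forward-filled."""
    return series.resample("W").mean().ffill()

# Result is a Series indexed by (store, product, Date).
weekly = toy.groupby(["store", "product"])["number_sold"].apply(weekly_mean_ffill)

# Move the group keys back into columns so that only Date remains in the index.
flat = weekly.rename("number_sold").reset_index(level=["store", "product"])

print(flat)
```

Without the `ffill()` step, the week ending 2019-01-13 for store 1 would be NaN, because no rows fall into that bin.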

**Plot the sales of all the products from store id 0**

In [None]:
## Filter the data for store id equal to 0
store_id_0 =
store_id_0

In [None]:
## Modify the data such that index represents date and
## columns represent the product id

store_0_pivot =
store_0_pivot.head()
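Reshaping from a long table (one row per date and product) to a wide table (one column per product) can be sketched with `DataFrame.pivot`. The rows below are invented. When `index` is not passed, `pivot` uses the existing index, here the dates.

```python
import pandas as pd

rows = [
    ("2019-01-06", 0, 0, 5.0),
    ("2019-01-06", 0, 1, 7.0),
    ("2019-01-13", 0, 0, 6.0),
    ("2019-01-13", 0, 1, 8.0),
    ("2019-01-06", 1, 0, 9.0),
]
toy = pd.DataFrame(rows, columns=["Date", "store", "product", "number_sold"])
toy["Date"] = pd.to_datetime(toy["Date"])
toy = toy.set_index("Date")

# Keep only the rows of store 0.
store_0 = toy[toy["store"] == 0]

# One row per date, one column per product id.
wide = store_0.pivot(columns="product", values="number_sold")

print(wide)
```

`pivot` raises an error if a date and product pair appears more than once. In that case `pivot_table` with an aggregation function is the usual fallback.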

In [None]:
store_0_pivot.plot(figsize=(15,10))

Analyze how the sales of `product 1` of `store 0` have changed over the years

In [None]:
# Filter product id equal to 1 from store_0
product_1_s0 =

In [None]:
# Create a column called year
product_1_s0['year'] =

# create a column called month
product_1_s0['month'] =

# create a DataFrame whose index represents months and columns represent years
result_df =

result_df
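A month-by-year comparison table can be sketched by extracting `year` and `month` from the datetime index and then pivoting. The data is invented. `pivot_table` with a mean aggregation is used because several weekly rows can fall in the same month, which is an assumption about how duplicates should be combined.

```python
import pandas as pd

idx = pd.to_datetime(
    ["2018-01-07", "2018-01-14", "2018-02-04", "2019-01-06", "2019-02-03"]
)
toy = pd.DataFrame({"number_sold": [10.0, 20.0, 30.0, 40.0, 50.0]}, index=idx)

# Calendar components come straight from the DatetimeIndex.
toy["year"] = toy.index.year
toy["month"] = toy.index.month

# Rows are months, columns are years, cells are the monthly mean.
by_month = toy.pivot_table(
    index="month", columns="year", values="number_sold", aggfunc="mean"
)

print(by_month)
```

Plotting such a table draws one line per year over the months, which makes year-over-year changes in the same season easy to compare.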

In [None]:
result_df.plot(figsize=(10,8))