
---
# datetime dtype demonstration

author: F.Feenstra

Many datasets involve time-based information. Time series analysis, trend identification, and understanding temporal relationships are crucial in many fields. Pandas' datetime functionality makes handling time-related data efficient.

## Introduction to Pandas Datetime Dtype

Pandas has the `datetime64` dtype, designed specifically for representing dates and times. Columns of this dtype are stored efficiently and expose many date-specific attributes and methods, through the `.dt` accessor on a Series or directly on a `DatetimeIndex`.
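To see the difference between plain strings and the datetime dtype, here is a small sketch on made-up dates. Depending on the pandas version, the raw column is reported as `object` or as a string dtype; either way it has no date semantics until it is parsed.

```python
import pandas as pd

# Raw strings: no date semantics yet
raw = pd.Series(["2016-04-12", "2016-04-13", "2016-04-16"])
print(raw.dtype)

# After parsing, the Series has a datetime64 dtype
dates = pd.to_datetime(raw)
print(dates.dtype)

# The .dt accessor exposes date components and helper methods
print(dates.dt.year.tolist())        # [2016, 2016, 2016]
print(dates.dt.day_name().tolist())  # ['Tuesday', 'Wednesday', 'Saturday']
```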


## Converting Object Columns to Datetime
Use the `pd.to_datetime()` function to convert a column to a datetime dtype. Passing an explicit `format` is optional, since pandas can often infer the layout, but it makes parsing strict and unambiguous, for instance:
`pd.to_datetime(df['timestamp'], format="%m/%d/%Y")`
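A minimal sketch of the conversion on a made-up `timestamp` column (the column name and values are illustrative only). The second part shows why an explicit format matters: a string such as `04/05/2016` means 5 April or 4 May depending on the order of `%m` and `%d`.

```python
import pandas as pd

# A small, made-up column of month/day/year strings
example = pd.DataFrame({"timestamp": ["04/12/2016", "04/13/2016", "05/12/2016"]})

# %m = month, %d = day of month, %Y = four-digit year
example["timestamp"] = pd.to_datetime(example["timestamp"], format="%m/%d/%Y")

print(example["timestamp"].dt.month.tolist())  # [4, 4, 5]
print(example["timestamp"].dt.day.tolist())    # [12, 13, 12]

# The same string gives different dates under different formats
ambiguous = "04/05/2016"
as_month_first = pd.to_datetime(ambiguous, format="%m/%d/%Y")  # 5 April 2016
as_day_first = pd.to_datetime(ambiguous, format="%d/%m/%Y")    # 4 May 2016
```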

## About the data
To demonstrate the datetime dtype we will use daily total-steps data from Fitbit devices. Each record has a date, and the data contains several subjects, identified by the `Id` column. We will visualize the difference between weekdays and weekend days per subject.

---

In [None]:
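import pandas as pd
import numpy as np
import holoviews as hv
import panel as pn
from IPython.display import display
from bokeh.io import output_notebook

# Use Bokeh as the HoloViews plotting backend and render Bokeh output inline
hv.extension('bokeh')
output_notebook()

# data source: https://www.kaggle.com/datasets/arashnic/fitbit
df = pd.read_csv('../data/dailyActivity_merged.csv')
# info() prints its summary itself and returns None, so it needs no display()
df.info()
display(df.head())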

---

`ActivityDate` has the `object` dtype, not a datetime dtype, so it needs to be converted. We can use `pd.to_datetime` with the format month, day, year separated by slashes (`"%m/%d/%Y"`).

---

In [None]:
# Parse the month/day/year strings into datetime64 values
df['ActivityDate'] = pd.to_datetime(df['ActivityDate'], format="%m/%d/%Y")
df.head()
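The conversion cell assumes every `ActivityDate` value matches the format. To see what happens when one does not, here is a sketch on made-up values, assuming strings like `4/12/2016` without leading zeros (which `%m` and `%d` accept). The default `errors="raise"` stops with a `ValueError`, while `errors="coerce"` turns non-matching values into `NaT` (missing).

```python
import pandas as pd

# Hypothetical values in the month/day/year layout; the last one is malformed
values = pd.Series(["4/12/2016", "5/1/2016", "not a date"])

# Strict parsing (the default, errors="raise") fails on the bad value
try:
    pd.to_datetime(values, format="%m/%d/%Y")
    strict_failed = False
except ValueError as err:
    strict_failed = True
    print("strict parsing failed:", err)

# errors="coerce" replaces non-matching values with NaT instead of raising
parsed = pd.to_datetime(values, format="%m/%d/%Y", errors="coerce")
print(parsed)
```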

---
Let's make `ActivityDate` the index, which turns it into a `DatetimeIndex`, and inspect the attributes and methods it offers.



In [None]:
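# Use the parsed dates as the index (a DatetimeIndex)
df.set_index('ActivityDate', inplace=True)
# Uncomment the next line in Jupyter to show the DatetimeIndex documentation
# df.index?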
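One practical benefit of a `DatetimeIndex` is partial string indexing: a date string such as `"2016-04"` selects a whole month, and date slices include both endpoints. A sketch on made-up step counts (not the Fitbit data):

```python
import pandas as pd

# Hypothetical daily step counts indexed by date
steps = pd.DataFrame(
    {"TotalSteps": [1000, 2000, 3000, 4000]},
    index=pd.to_datetime(["2016-04-29", "2016-04-30", "2016-05-01", "2016-05-02"]),
)
steps.index.name = "ActivityDate"

# A partial date string selects every row in that month
april = steps.loc["2016-04"]
print(april["TotalSteps"].sum())  # 3000

# Label-based date slices include both endpoints
window = steps.loc["2016-04-30":"2016-05-01"]
print(window)
```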

Attributes such as `.month` and `.year` extract the month and year from each timestamp in the index.

In [None]:
# unique() finds the distinct months; sorted() gives a stable, readable order
months = sorted(df.index.month.unique().tolist())
print(f'This dataset contains data from {len(months)} month(s): {months}')

In [None]:
years = sorted(df.index.year.unique().tolist())
print(f'This dataset contains data from {len(years)} year(s): {years}')
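A self-contained sketch of the same attributes on a small, hypothetical `DatetimeIndex` that crosses a month boundary, so the per-timestamp values and the distinct values differ:

```python
import pandas as pd

# A hypothetical DatetimeIndex spanning the end of April and start of May
idx = pd.date_range("2016-04-30", "2016-05-02", freq="D")

print(idx.month.tolist())  # [4, 5, 5]
print(idx.year.tolist())   # [2016, 2016, 2016]

# Distinct months, in ascending order
months = sorted(idx.month.unique().tolist())
print(f"{len(months)} months: {months}")  # 2 months: [4, 5]
```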

In [None]:
df.index.day_name()

In [None]:
# Create a new column 'day_of_week' with the name of the day
df['day_of_week'] = df.index.day_name()

# Group by the day_of_week and calculate the mean of 'TotalSteps'
result = df.groupby('day_of_week')['TotalSteps'].mean()
result
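By default `groupby` sorts its keys, so day names come out alphabetically (Friday, Monday, Saturday, ...). A sketch on one made-up week (Monday 2016-04-11 to Sunday 2016-04-17) showing how `reindex` restores calendar order:

```python
import pandas as pd

# Hypothetical step counts for one week, Monday to Sunday
steps = pd.DataFrame(
    {"TotalSteps": [8000, 9000, 7000, 8500, 10000, 4000, 3000]},
    index=pd.date_range("2016-04-11", periods=7, freq="D"),
)
steps["day_of_week"] = steps.index.day_name()

# groupby sorts the day names alphabetically
alphabetical = steps.groupby("day_of_week")["TotalSteps"].mean()

# reindex puts the rows in calendar order, Monday first
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
ordered = alphabetical.reindex(weekday_order)
print(ordered)
```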

---

## Create a date range
In pandas, `date_range` is a function that generates a fixed-frequency `DatetimeIndex`: a sequence of dates at regular intervals. It is useful whenever you need to build a time axis or any regular range of dates.

Here's a basic explanation of the `date_range` function:

### Syntax:

`pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, inclusive='both', *, unit=None, **kwargs)` (pandas 2.x; versions before 1.4 used `closed=None` instead of `inclusive`)

### Parameters:

- **start**: Left bound for generating dates.
- **end**: Right bound for generating dates.
- **periods**: Number of dates to generate (int). Because `freq` defaults to `'D'`, you normally give two of `start`, `end` and `periods`; if all three are given and `freq` is omitted, the dates are spaced evenly between `start` and `end`.
- **freq**: The step size between dates, specified as a frequency string (default `'D'`) or a DateOffset object.
- **tz**: Time zone name for a time zone-aware index.
- **normalize**: If True, normalize the start and end dates to midnight.
- **name**: The name of the resulting DatetimeIndex.
- **inclusive**: Which endpoints to include: `'both'` (default), `'neither'`, `'left'` or `'right'`. It replaces the older `closed` parameter, which was deprecated in pandas 1.4 and removed in 2.0.
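A short sketch of the main ways to combine these parameters, using arbitrary dates. The last line uses `inclusive`, which replaced `closed` in pandas 1.4, so it assumes pandas 1.4 or newer.

```python
import pandas as pd

# start + end: one entry per calendar day (freq defaults to "D")
a = pd.date_range(start="2016-04-12", end="2016-04-15")

# start + periods + freq: a fixed number of steps, with a named index
b = pd.date_range(start="2016-04-12", periods=3, freq="D", name="ActivityDate")

# start + end + periods without freq: evenly spaced timestamps
c = pd.date_range(start="2016-04-12", end="2016-04-13", periods=3)

# inclusive="left" keeps the start date but drops the end date
d = pd.date_range(start="2016-04-12", end="2016-04-15", inclusive="left")

print(a, b, c, d, sep="\n")
```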

The `freq` parameter sets the step size between consecutive dates in the resulting `DatetimeIndex`. It accepts a frequency string or a DateOffset object.

Here are some common frequency strings that you can use with the `freq` parameter:

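- **'D'**: Calendar day frequency
- **'B'**: Business day frequency (Monday to Friday)
- **'W'**: Weekly frequency (anchored on Sundays by default)
- **'M'**: Month-end frequency (spelled `'ME'` from pandas 2.2)
- **'Q'**: Quarter-end frequency (spelled `'QE'` from pandas 2.2)
- **'A'**: Year-end frequency (spelled `'YE'` from pandas 2.2)
- **'H'**: Hourly frequency (spelled `'h'` from pandas 2.2)
- **'T'** or **'min'**: Minutely frequency (`'min'` from pandas 2.2)
- **'S'**: Secondly frequency (spelled `'s'` from pandas 2.2)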
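A sketch comparing three aliases whose spelling did not change across pandas versions, starting on Friday 2016-04-15. `'B'` skips the weekend, and `'W'` rolls forward to the first Sunday on or after the start date.

```python
import pandas as pd

start = "2016-04-15"  # a Friday

days = pd.date_range(start, periods=4, freq="D")   # every calendar day
bdays = pd.date_range(start, periods=4, freq="B")  # weekdays only
weeks = pd.date_range(start, periods=3, freq="W")  # Sundays

print(days.day_name().tolist())   # ['Friday', 'Saturday', 'Sunday', 'Monday']
print(bdays.day_name().tolist())  # ['Friday', 'Monday', 'Tuesday', 'Wednesday']
print(weeks.strftime("%Y-%m-%d").tolist())  # ['2016-04-17', '2016-04-24', '2016-05-01']
```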

You can also prefix these aliases with a number to specify multiples: `'2D'` is a 2-day frequency, `'3H'` a 3-hourly frequency, and so on. Aliases can be combined as well, so `pd.date_range(start='2022-01-01', periods=5, freq='2D3H')` advances by 2 days and 3 hours (51 hours) per step.
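The combined step `'2D3H'` can also be written as a `pd.Timedelta`, which `date_range` accepts as `freq` and which avoids depending on the unit aliases whose spelling changed in pandas 2.2. A sketch:

```python
import pandas as pd

# 2 days + 3 hours = one 51-hour step, the same step as '2D3H'
step = pd.Timedelta(days=2, hours=3)
combined = pd.date_range(start="2022-01-01", periods=5, freq=step)
print(combined)
```

Each timestamp is 51 hours after the previous one, so the range ends at 2022-01-09 12:00.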


We will use `freq='B'` to create a range of business days. Next, we use the `np.where` function to label each timestamp in the dataset as a business day or a weekend day.

In [None]:
# Generate every business day (Monday to Friday) between the first and last date in the data
business_days = pd.date_range(df.index.min(), df.index.max(), freq='B')
print(business_days[0:4])
# Label each row: dates found in business_days are weekdays, all others are weekend days
df['day_type'] = np.where(df.index.isin(business_days), 'business day', 'weekend')
# Group by 'day_type' and calculate the mean of 'TotalSteps'
result = df.groupby('day_type')['TotalSteps'].mean()
result
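An alternative that needs no helper range is the `dayofweek` attribute (0 for Monday up to 6 for Sunday). A sketch on a hypothetical daily index for one month in 2016, checking that both labelings agree; note that neither approach knows about public holidays.

```python
import numpy as np
import pandas as pd

# Hypothetical daily index for one month
idx = pd.date_range("2016-04-12", "2016-05-12", freq="D")

# Approach from the notebook: membership in a business-day range
business_days = pd.date_range(idx.min(), idx.max(), freq="B")
via_range = np.where(idx.isin(business_days), "business day", "weekend")

# Alternative: Monday..Friday have dayofweek 0..4
via_dayofweek = np.where(idx.dayofweek < 5, "business day", "weekend")

print((via_range == via_dayofweek).all())  # True
```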

Next, we use Panel's `pn.interact` to select a subject `Id` and display a boxplot of business days versus weekend days.

---

In [None]:
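def create_boxplot(id_value):
    # Keep only the rows for the selected subject
    selected_df = df[df['Id'] == id_value]
    # One box per day type, showing the distribution of daily TotalSteps
    boxplot = hv.BoxWhisker(selected_df, kdims='day_type', vdims='TotalSteps')
    return boxplot.opts(box_fill_color='day_type', cmap='Category10', box_line_color='black', width=400, height=300)

# Sorted list of subject ids for the selection widget
unique_ids = sorted(df['Id'].unique().tolist())

# pn.interact builds a dropdown from the list and re-runs the function on each selection
@pn.interact(ID=unique_ids)
def interactive_boxplot(ID):
    boxplot = create_boxplot(ID)
    return boxplot

# Start a server and open the app in a browser tab;
# in a notebook with pn.extension() loaded, displaying interactive_boxplot also works
pn.serve(interactive_boxplot)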
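As a non-interactive cross-check of what the boxplots show, the per-subject medians can be tabulated with `groupby` and `unstack`. The sketch below uses a small, made-up frame with the same columns as `df` (a `DatetimeIndex` plus `Id`, `TotalSteps` and `day_type`); on the real data you would apply the last line to `df` itself.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2016-04-15 is a Friday, 04-16 Saturday, 04-17 Sunday, 04-18 Monday
dates = pd.to_datetime(["2016-04-15", "2016-04-16", "2016-04-18", "2016-04-17"])
df_demo = pd.DataFrame(
    {"Id": [1, 1, 2, 2], "TotalSteps": [9000, 4000, 12000, 6000]},
    index=dates,
)
df_demo["day_type"] = np.where(df_demo.index.dayofweek < 5, "business day", "weekend")

# One row per Id, one column per day type: the medians drawn as box centre lines
summary = df_demo.groupby(["Id", "day_type"])["TotalSteps"].median().unstack()
print(summary)
```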