---
title: "Pandas: Time-series on multiple columns (grouper)"
author: "Damien Martin"
date: "2024-11-04"
categories: [pandas, dates]
---

# Problem

In a [previous article](./grouper.ipynb) we looked at time series with a datetime index, and at putting them onto a regularly spaced interval. In particular, we made sure the result kept "missing dates" (e.g. when counting sales, a week with no entries shows up as 0 sales rather than being skipped).

This article looks at how to take a regular datetime column and "standardize" it, for instance by replacing each date with the Monday at the beginning of its week.

For example, let's say that we want to add to this product catalog the week in which each product was introduced:

In [1]:
#| echo: False
import pandas as pd 

example = pd.DataFrame([
    {'product': 'Great Gatsby Beans', 'introduced': '2024-01-05'},
    {'product': 'Chocoloate Frogs', 'introduced': '1896-07-05'},
    {'product': 'Whizz-bangs!', 'introduced': '1905-02-04'}
])
example['introduced'] = pd.to_datetime(example['introduced'])
example

|   | product | introduced |
|---|---|---|
| 0 | Great Gatsby Beans | 2024-01-05 |
| 1 | Chocoloate Frogs | 1896-07-05 |
| 2 | Whizz-bangs! | 1905-02-04 |


That is, we want the most recent Monday on or before the date in question, like so:

In [5]:
#| echo: False
example['week_start'] = example['introduced'].dt.to_period('W-SUN').dt.start_time
example

|   | product | introduced | week_start |
|---|---|---|---|
| 0 | Great Gatsby Beans | 2024-01-05 | 2024-01-01 |
| 1 | Chocoloate Frogs | 1896-07-05 | 1896-06-29 |
| 2 | Whizz-bangs! | 1905-02-04 | 1905-01-30 |


# Solution

One of the simplest solutions (modified from [this Stack Overflow post](https://stackoverflow.com/questions/27989120/get-week-start-date-monday-from-a-date-column-in-python-pandas)) is to convert the column to a weekly period, and then take the start time of that period:
```
df['time_field'].dt.to_period('W-SUN').dt.start_time
```

**Note:**

The period syntax is a little confusing: it denotes when the week will _end_. So using a period of `W-SUN` means a week going from Monday to Sunday. Details in the [pandas documentation](https://pandas.pydata.org/docs/user_guide/timeseries.html#anchored-offsets).
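
To make the anchoring concrete, here is a small sketch comparing two anchors on a single date. The variable names (`dates`, `monday_start`, `sunday_start`) are just for illustration.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2024-01-05']))  # a Friday

# 'W-SUN': weeks END on Sunday, so each week STARTS on Monday
monday_start = dates.dt.to_period('W-SUN').dt.start_time

# 'W-SAT': weeks END on Saturday, so each week STARTS on Sunday
sunday_start = dates.dt.to_period('W-SAT').dt.start_time

print(monday_start.dt.day_name().iloc[0], monday_start.iloc[0].date())
# Monday 2024-01-01
print(sunday_start.dt.day_name().iloc[0], sunday_start.iloc[0].date())
# Sunday 2023-12-31
```

So if you want weeks that begin on Sunday instead, swap the anchor to `W-SAT`.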

The original Stack Overflow post used an `apply`, and offered a different method to vectorize it; the solution here stays readable while avoiding the slow, row-by-row `apply`.
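
For comparison, here is a sketch of one other common vectorized approach: subtracting each date's day-of-week offset. This is not necessarily the exact method from that post, and the names `via_period` and `via_offset` are made up for this example.

```python
import pandas as pd

introduced = pd.Series(
    pd.to_datetime(['2024-01-05', '1896-07-05', '1905-02-04'])
)

# Approach from this article: weekly period, then its start time
via_period = introduced.dt.to_period('W-SUN').dt.start_time

# Alternative: dayofweek is 0 for Monday ... 6 for Sunday, so
# subtracting that many days lands on the Monday of the same week.
# normalize() drops any time-of-day so the result is midnight.
via_offset = (
    introduced.dt.normalize()
    - pd.to_timedelta(introduced.dt.dayofweek, unit='D')
)

# Both approaches agree on these dates
assert (via_period == via_offset).all()
```

The offset version works, but you have to remember the day numbering and the `normalize()` call, which is why the period version is arguably easier to read.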

# Example

Taking the catalog from earlier:

In [3]:
import pandas as pd

catalog = pd.DataFrame([
    {'product': 'Great Gatsby Beans', 'introduced': '2024-01-05'},
    {'product': 'Chocoloate Frogs', 'introduced': '1896-07-05'},
    {'product': 'Whizz-bangs!', 'introduced': '1905-02-04'}
])
catalog['introduced'] = pd.to_datetime(catalog['introduced'])
catalog

|   | product | introduced |
|---|---|---|
| 0 | Great Gatsby Beans | 2024-01-05 |
| 1 | Chocoloate Frogs | 1896-07-05 |
| 2 | Whizz-bangs! | 1905-02-04 |


We can make our new field:

In [6]:
catalog['week_start'] = catalog['introduced'].dt.to_period('W-SUN').dt.start_time
catalog

|   | product | introduced | week_start |
|---|---|---|---|
| 0 | Great Gatsby Beans | 2024-01-05 | 2024-01-01 |
| 1 | Chocoloate Frogs | 1896-07-05 | 1896-06-29 |
| 2 | Whizz-bangs! | 1905-02-04 | 1905-01-30 |
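
It is worth checking the boundaries too. The sketch below uses a made-up `edge` frame (not part of the catalog) with a Monday at midnight, a Sunday late in the evening, and a Monday with a time of day attached.

```python
import pandas as pd

edge = pd.DataFrame({
    'day': pd.to_datetime([
        '2024-01-01 00:00',  # a Monday, exactly at midnight
        '2024-01-07 23:59',  # the Sunday that closes that week
        '2024-01-08 15:30',  # the following Monday, mid-afternoon
    ])
})

# Same recipe as above: weekly period ending Sunday, then its start
edge['week_start'] = edge['day'].dt.to_period('W-SUN').dt.start_time
print(edge)
```

The first two rows both map to 2024-01-01, and the third maps to 2024-01-08 at midnight: a Monday maps to itself, a Sunday maps back to the Monday six days earlier, and any time of day is dropped.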
