<h1 align="center">00 — Date Column Inspection</h1>


## Purpose
This notebook checks that the date column parses correctly and that records are evenly spaced in time before any data processing begins.

## Objectives
1. Load the raw dataset from the `data/raw` directory.  
2. Parse the `datum` column as a datetime variable.  
3. Verify that consecutive records are spaced exactly 7 days apart (weekly frequency).  
4. Confirm that the date range is complete and covers the expected time frame (2014–2019).
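
One way to approach objective 4 is to compare the observed dates against a full weekly calendar built from the first and last date. The sketch below is an illustration only: it runs on a small synthetic series with one week removed rather than on the real file, and `find_missing_weeks` is a hypothetical helper name that does not appear in the notebook.

```python
import pandas as pd

def find_missing_weeks(dates: pd.Series) -> pd.DatetimeIndex:
    """Return the weekly dates between the first and last observation that are absent."""
    observed = pd.DatetimeIndex(dates.dropna().unique())
    # Full calendar stepping 7 days from the first to the last observed date
    expected = pd.date_range(observed.min(), observed.max(), freq="7D")
    return expected.difference(observed)

# Synthetic example: six weekly dates starting 2014-01-05, with the third one removed
full = pd.date_range("2014-01-05", periods=6, freq="7D")
sample = pd.Series(full.delete(2))

missing = find_missing_weeks(sample)
print("Missing weeks:", list(missing))
```

Applied to `df["datum"]`, an empty result would suggest there are no gaps between the first and last record. It does not by itself confirm that the first and last dates match the expected time frame.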

## Expected Outcome
- The `datum` column should be correctly parsed as datetime.  
- The only time difference between consecutive records should be 7 days.  
- If both conditions hold, the dataset can be treated as a **weekly time series** in later steps.
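
The expected outcome above could also be expressed as a single pass/fail check. This is a minimal sketch under assumptions: `is_weekly` is a hypothetical helper, and the two series are synthetic stand-ins for the `datum` column.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

def is_weekly(dates: pd.Series) -> bool:
    """True if the series is datetime-typed, has no missing values, and every step is 7 days."""
    if not is_datetime64_any_dtype(dates):
        return False
    if dates.isna().any():
        return False
    # The first difference is always NaT, so drop it before comparing
    steps = dates.diff().dropna()
    return bool((steps == pd.Timedelta(days=7)).all())

# Synthetic stand-ins: one regular weekly series, one with an 8-day step
weekly = pd.Series(pd.date_range("2014-01-05", periods=4, freq="7D"))
irregular = pd.Series(pd.to_datetime(["2014-01-05", "2014-01-12", "2014-01-20"]))

print("weekly:", is_weekly(weekly))
print("irregular:", is_weekly(irregular))
```

Returning a boolean rather than printing the differences makes the check easy to reuse as an assertion at the top of later notebooks.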


In [None]:
# 00_check_date_column.ipynb
# Purpose: Check if the date column is parsed correctly and verify the frequency of the records

import pandas as pd

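# Path to the raw CSV, relative to the notebook's directory
path = "../data/raw/pharma_sales.csv"

# Load the raw data
df = pd.read_csv(path)

# Parse the date column as datetime; errors="coerce" turns unparseable values into NaT
df["datum"] = pd.to_datetime(df["datum"], errors="coerce")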

# print first and last few dates
print("First 5 dates:", df["datum"].head().tolist())
print("Last 5 dates:", df["datum"].tail().tolist())

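# Unique differences between consecutive dates; a single value of 7 days means regular weekly spacing
# (dropna() removes the NaT that diff() produces for the first row)
diffs = df["datum"].diff().dropna().unique()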
print("Unique date differences (in days):", diffs)

First 5 dates: [Timestamp('2014-01-05 00:00:00'), Timestamp('2014-01-12 00:00:00'), Timestamp('2014-01-19 00:00:00'), Timestamp('2014-01-26 00:00:00'), Timestamp('2014-02-02 00:00:00')]
Last 5 dates: [Timestamp('2019-09-15 00:00:00'), Timestamp('2019-09-22 00:00:00'), Timestamp('2019-09-29 00:00:00'), Timestamp('2019-10-06 00:00:00'), Timestamp('2019-10-13 00:00:00')]
Unique date differences (in days): <TimedeltaArray>
['7 days']
Length: 1, dtype: timedelta64[ns]
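
Because the cell parses with `errors="coerce"` and then calls `dropna()` on the differences, an unparseable date would become `NaT` and drop out of the printed result. The sketch below illustrates that effect and a possible extra guard. It uses made-up strings rather than the real file, and the variable names are illustrative.

```python
import pandas as pd

# Made-up raw values; the third entry is deliberately not a date
raw = pd.Series(["2014-01-05", "2014-01-12", "not a date", "2014-01-26"])
parsed = pd.to_datetime(raw, errors="coerce")

# Values that failed to parse become NaT, so counting NaT exposes them
n_unparsed = int(parsed.isna().sum())
print("Unparseable dates:", n_unparsed)

# diff() yields NaT on both sides of the bad row, so dropna() hides the gap around it
diffs = parsed.diff().dropna().unique()
print("Differences after dropna():", diffs)
```

In this example the only surviving difference is 7 days even though one record is unusable. Checking that `df["datum"].isna().sum()` is 0 before reading the differences would rule this case out.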
