# Daily availability data analysis

This notebook shows how many beds were released for a given hut and booking date. It is based on availability data collected once a day (before 2024-10-25) and once an hour (from 2024-10-25 onwards). The moment at which the availability data was collected is referred to as the _fetch date_.

The final output of this notebook is a CSV file that shows the total number of beds released for each hut and booking date.

In [1]:
from pathlib import Path
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

In [2]:
YEAR = "2025" # booking season
DATA_DIR = Path(YEAR) # availability data

In [3]:
def concat_all_dailies():
    dfs = []
    
    for loc in DATA_DIR.glob("*.csv"):
        df = pd.read_csv(loc)
        df = df.rename(columns={df.columns[0]: "name"})

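        # Daily files are named YYYY-MM-DD, hourly files YYYY-MM-DD-HH
        if len(loc.stem) == 10:
            fmt = "%Y-%m-%d"
        else:
            fmt = "%Y-%m-%d-%H"

        # `fmt` instead of `format` avoids shadowing the built-in
        date = pd.to_datetime(loc.stem, format=fmt)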
        df.insert(1, "fetch_date", date)  # Insert fetch date as the second column
        dfs.append(df)

    return pd.concat(dfs, ignore_index=True)

In [4]:
df = concat_all_dailies()
df.head()

Unnamed: 0,name,fetch_date,2025-06-01,2025-06-02,2025-06-03,2025-06-04,2025-06-05,2025-06-06,2025-06-07,2025-06-08,...,2025-09-19,2025-09-20,2025-09-21,2025-09-22,2025-09-23,2025-09-24,2025-09-25,2025-09-26,2025-09-27,2025-09-28
0,Chalet Les Méandres (ex Tupilak),2024-11-12 07:00:00,30,30,30,30,30,30,30,30,...,33,35,35,35,35,35,35,35,35,35
1,Gîte Michel Fagot,2024-11-12 07:00:00,0,0,0,0,0,0,23,25,...,27,25,27,27,27,27,27,27,0,0
2,Refuge du Fioux,2024-11-12 07:00:00,22,22,22,22,20,20,21,22,...,22,22,0,0,0,0,0,0,0,0
3,Auberge du Truc,2024-11-12 07:00:00,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,La Ferme à Piron,2024-11-12 07:00:00,0,0,0,0,0,0,0,0,...,14,8,8,10,10,10,10,0,0,0


## How many bookable beds have there been?
TMB huts are notoriously hard to book. 
On several stages of the TMB, there are not enough huts for the number of people who walk the tour.
Two examples are Les Chapieux (Auberge de la Nova, Les Chambres du Soleil, Refuge des Mottets) and Trient (Hôtel du Col de la Forclaz, Hôtel La Grande Ourse, Refuge Le Peuty, Auberge Mont-Blanc).
Can we use our daily availability data to show how hard it really is? Specifically, how many beds could be booked in a given date range for a given hut and booking date?

Let's define this data problem.
- $H$: Set of huts.
- $B$: Set of booking dates (i.e., the hiking season).
- $D$: Set of fetch dates.
- $n^{hb}_d$: Number of available beds for hut $h$ for booking date $b$ as fetched on date $d$.

Consider a hut $h$ and a booking date $b$.
Let $s$ denote the start date of interest and let $e$ denote the end date of interest with $s \le e \le b$.
Define $\Delta_{d}^{hb} = n_{d}^{hb} - n_{d-1}^{hb}$ as the change in the number of available beds between two consecutive fetch dates; a positive value means that beds have become available.
Then $$\sum_{d=s}^{e} \max \{\Delta_d^{hb}, 0\}$$ is the number of beds that became bookable for hut $h$ and booking date $b$ between $s$ and $e$.
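
As a rough illustration of the formula, the sketch below applies it to a made-up availability series for a single hut and booking date. The function name `bookable_beds` and the numbers are assumptions for this example only; the count before the first fetch is taken to be zero.

```python
import pandas as pd

def bookable_beds(availability: pd.Series, start, end) -> int:
    """Sum the positive changes whose fetch date lies in [start, end]."""
    ordered = availability.sort_index()
    deltas = ordered.diff()
    # diff() leaves the first entry as NaN; assume zero beds before the first fetch
    deltas.iloc[0] = ordered.iloc[0]
    window = deltas.loc[start:end]
    # max(delta, 0) for every fetch date in the window, then sum
    return int(window.clip(lower=0).sum())

# Hypothetical availability n_d for one hut h and one booking date b
n = pd.Series(
    [0, 10, 4, 4, 10, 7],
    index=pd.date_range("2024-10-14", periods=6, freq="D"),
)

total = bookable_beds(n, "2024-10-14", "2024-10-19")
print(total)
```

Only the two increases (0 to 10 and 4 to 10) contribute; the decreases are bookings and are ignored.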


## Data preparation
We have to prepare the data first.
Here we sort the availability data by hut name and fetch date.

In [5]:
on_name_date = df.sort_values(['name', 'fetch_date'])
on_name_date.head()

Unnamed: 0,name,fetch_date,2025-06-01,2025-06-02,2025-06-03,2025-06-04,2025-06-05,2025-06-06,2025-06-07,2025-06-08,...,2025-09-19,2025-09-20,2025-09-21,2025-09-22,2025-09-23,2025-09-24,2025-09-25,2025-09-26,2025-09-27,2025-09-28
13154,Auberge Gîte Bon Abri,2024-09-04,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12785,Auberge Gîte Bon Abri,2024-09-05,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10694,Auberge Gîte Bon Abri,2024-09-06,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10899,Auberge Gîte Bon Abri,2024-09-07,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
15286,Auberge Gîte Bon Abri,2024-09-08,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


It's much easier to work with rows, so we transform the booking date columns into rows.

In [6]:
booking_dates = [c for c in on_name_date.columns if c not in ["name", "fetch_date"]]
melted = pd.melt(on_name_date, id_vars=["name", "fetch_date"], value_vars=booking_dates, var_name="booking_date")
melted.head()

Unnamed: 0,name,fetch_date,booking_date,value
0,Auberge Gîte Bon Abri,2024-09-04,2025-06-01,0
1,Auberge Gîte Bon Abri,2024-09-05,2025-06-01,0
2,Auberge Gîte Bon Abri,2024-09-06,2025-06-01,0
3,Auberge Gîte Bon Abri,2024-09-07,2025-06-01,0
4,Auberge Gîte Bon Abri,2024-09-08,2025-06-01,0
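
To see the reshaping on a small scale, here is a minimal sketch with a made-up two-hut table. The hut names and values are placeholders, not real data.

```python
import pandas as pd

# Hypothetical wide table: one column per booking date
wide = pd.DataFrame({
    "name": ["Hut A", "Hut B"],
    "fetch_date": pd.to_datetime(["2024-09-04", "2024-09-04"]),
    "2025-06-01": [0, 3],
    "2025-06-02": [5, 0],
})

# Every booking date column becomes a (booking_date, value) row pair
long = pd.melt(wide, id_vars=["name", "fetch_date"], var_name="booking_date")
print(long)
```

Each of the two input rows yields one output row per booking date, so the long table has four rows.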


Grouping by name and booking date gives us, for each hut and booking date, the full availability history ordered by fetch date.

In [7]:
by_name_booking = melted.groupby(["name", "booking_date"])

name = 'Refuge des Mottets'
date = f'{YEAR}-07-05'
by_name_booking.get_group((name, date)).head()

Unnamed: 0,name,fetch_date,booking_date,value
983892,Refuge des Mottets,2024-09-04,2025-07-05,0
983893,Refuge des Mottets,2024-09-05,2025-07-05,0
983894,Refuge des Mottets,2024-09-06,2025-07-05,0
983895,Refuge des Mottets,2024-09-07,2025-07-05,0
983896,Refuge des Mottets,2024-09-08,2025-07-05,0


Now we perform the following calculations on each group:
- Compute the difference between each pair of consecutive rows. For the first row, we assume that the previous value was zero.
- Keep only the positive differences with a fetch date up to and including the booking date of the group; these are summed later on.
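
The zero assumption for the first row matters. The following sketch, using made-up bed counts, compares `np.diff` with and without `prepend=0`.

```python
import numpy as np

# Hypothetical bed counts for one (hut, booking date) group, ordered by fetch date
values = np.array([12, 12, 5, 9, 0, 3])

# prepend=0 treats the count before the first fetch as zero, so beds that
# are already available at the first fetch count as a release
changes = np.diff(values, prepend=0)
released = changes[changes > 0]
total_released = int(released.sum())

# Without prepend, the 12 beds seen at the first fetch are not counted
changes_no_prepend = np.diff(values)
total_no_prepend = int(changes_no_prepend[changes_no_prepend > 0].sum())

print(total_released, total_no_prepend)
```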

In [8]:
def compute_booking_changes(group):
    name, booking_date = group.name
    booking_date = datetime.strptime(booking_date, "%Y-%m-%d")

    sub = group[group['fetch_date'].dt.date <= booking_date.date()]
    changes = np.diff(sub['value'], prepend=0)
    changes = pd.Series(index=sub['fetch_date'], data=changes)
    
    return changes[~changes.isna() & (changes > 0)] # return only positive changes

In [9]:
new_bookable = by_name_booking.apply(compute_booking_changes).reset_index()
new_bookable = new_bookable.rename(columns={0: "value"}).astype({"value": int})
new_bookable.head()

Unnamed: 0,name,booking_date,fetch_date,value
0,Auberge Gîte Bon Abri,2025-06-01,2024-09-27,57
1,Auberge Gîte Bon Abri,2025-06-02,2024-09-27,46
2,Auberge Gîte Bon Abri,2025-06-03,2024-09-27,42
3,Auberge Gîte Bon Abri,2025-06-04,2024-09-27,57
4,Auberge Gîte Bon Abri,2025-06-05,2024-09-27,57


The `new_bookable` dataframe shows, for every combination of hut and booking date, the fetch dates at which additional beds became available and how many.

In [10]:
name = 'Refuge des Mottets'
date = f'{YEAR}-08-05'
groups = new_bookable.groupby(["name", "booking_date"])
groups.get_group((name, date))

Unnamed: 0,name,booking_date,fetch_date,value
4569,Refuge des Mottets,2025-08-05,2024-10-15 00:00:00,10
4570,Refuge des Mottets,2025-08-05,2024-11-10 09:00:00,6


This means that for Refuge des Mottets and booking date 2025-08-05, only 16 beds (10 + 6) were observed being released! The same lookup works for any hut and booking date and shows the moments at which new beds were released.
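
If you need this lookup often, it could be wrapped in a small helper. The sketch below is one possible shape. `release_moments` is a hypothetical name, and the toy dataframe only mimics the columns of `new_bookable`.

```python
import pandas as pd

def release_moments(new_bookable: pd.DataFrame, name: str, booking_date: str) -> pd.DataFrame:
    """Return the fetch dates and bed counts of observed releases for one hut and booking date."""
    mask = (new_bookable["name"] == name) & (new_bookable["booking_date"] == booking_date)
    return new_bookable.loc[mask, ["fetch_date", "value"]].sort_values("fetch_date")

# Toy stand-in with the same columns as new_bookable (values are made up)
toy = pd.DataFrame({
    "name": ["Hut A", "Hut A", "Hut B"],
    "booking_date": ["2025-07-01", "2025-07-01", "2025-07-01"],
    "fetch_date": pd.to_datetime(
        ["2024-11-02 09:00", "2024-10-15 00:00", "2024-10-20 00:00"]
    ),
    "value": [6, 10, 4],
})

moments = release_moments(toy, "Hut A", "2025-07-01")
print(moments)
```

Unlike `get_group`, which raises a `KeyError` for an unknown key, the boolean mask returns an empty dataframe when no release was observed.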

Let's now compute the _total_ number of beds released per hut and booking date.

In [11]:
total_bookable = groups["value"].sum()
total_bookable = total_bookable.reset_index()
df_wide = total_bookable.pivot(index='name', columns='booking_date', values='value')
df_wide.head(10)

name,2025-06-01,2025-06-02,2025-06-03,2025-06-04,2025-06-05,2025-06-06,2025-06-07,2025-06-08,2025-06-09,2025-06-10,...,2025-09-19,2025-09-20,2025-09-21,2025-09-22,2025-09-23,2025-09-24,2025-09-25,2025-09-26,2025-09-27,2025-09-28
Auberge Gîte Bon Abri,57.0,46.0,42.0,57.0,57.0,41.0,41.0,57.0,42.0,42.0,...,57.0,57.0,57.0,42.0,41.0,57.0,57.0,57.0,57.0,57.0
Auberge Mont-Blanc,22.0,22.0,22.0,22.0,22.0,22.0,22.0,14.0,10.0,22.0,...,26.0,22.0,18.0,22.0,22.0,22.0,22.0,22.0,,
Auberge des Glaciers,,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,...,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0,52.0
Auberge du Truc,,,,,,,,,,15.0,...,,,,,,,,,,
Auberge la Boërne,1.0,,33.0,33.0,33.0,33.0,33.0,33.0,33.0,33.0,...,33.0,33.0,33.0,33.0,21.0,33.0,19.0,33.0,33.0,33.0
Auberge-Refuge de la Nova,,,,,,35.0,61.0,61.0,35.0,33.0,...,33.0,49.0,49.0,48.0,47.0,61.0,61.0,31.0,,
Cabane du Combal,,,,,,,,,,,...,,,,,,,,,,
Chalet 'Le Dolent',,30.0,30.0,30.0,30.0,30.0,,,,,...,30.0,,,30.0,30.0,30.0,30.0,,,
Chalet La Grange,,,,,,,,,,,...,24.0,24.0,24.0,,,,,,,
Chalet Les Méandres (ex Tupilak),30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,...,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0


This table shows for every hut and booking date the number of beds that could have been booked (starting from the first fetch date). A NaN value means that no availability was observed at all. Note that this can also be an artifact of the collection frequency: we only collected once per day (or once per hour, from 2024-10-25 onwards), so beds that were released and booked between two fetches never show up. For example, the counts for Refuge des Mottets and Cabane du Combal are far lower than the number of beds actually released, because beds were released and sold between two fetch moments.
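
The effect of the collection frequency can be illustrated with a made-up series. In this sketch, eight beds are released and sold within a few hours; the dates and counts are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def positive_changes(values) -> int:
    """Sum of positive differences, assuming zero beds before the first sample."""
    changes = np.diff(np.asarray(values), prepend=0)
    return int(changes[changes > 0].sum())

# Hypothetical hourly availability over two days, all zero to start with
index = pd.Timestamp("2025-01-10") + pd.to_timedelta(np.arange(48), unit="h")
hourly = pd.Series(0, index=index)

# 8 beds appear at 10:00 and are fully booked again by 13:00
hourly.iloc[10:13] = [8, 5, 2]

# Daily sampling only sees the values at midnight
daily = hourly[hourly.index.hour == 0]

seen_hourly = positive_changes(hourly)
seen_daily = positive_changes(daily)
print(seen_hourly, seen_daily)
```

The hourly series registers the release, while the daily samples miss it entirely. Releases that sell out in under an hour would likewise be invisible to hourly collection.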

In [12]:
out_loc = Path("../tmp/bookable.csv")
out_loc.parent.mkdir(parents=True, exist_ok=True)  # make sure the output directory exists
df_wide.to_csv(out_loc, encoding='utf-8')