# Brioche example Jupyter notebook

This notebook shows how Brioche can be used in Jupyter. Similar code would work in Google Colaboratory, using the utility class methods that read data from a Google Docs spreadsheet.

This example reads two Excel files: one containing the PFT mappings and one containing a worksheet per site. It then performs the biomization analysis on each site and writes the results to an Excel file.

# Install brioche

In [1]:
# Install from source distribution. Hint: if editing the Brioche code, you must restart the kernel
# for the changes to take effect, but you do not have to rerun this step.
%pip install --editable ..

# Install from PyPI
# %pip install brioche

Obtaining file:///C:/Users/peter.liljenberg/code/brioche
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable: started
  Checking if build backend supports build_editable: finished with status 'done'
  Getting requirements to build editable: started
  Getting requirements to build editable: finished with status 'done'
  Preparing editable metadata (pyproject.toml): started
  Preparing editable metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: brioche
  Building editable for brioche (pyproject.toml): started
  Building editable for brioche (pyproject.toml): finished with status 'done'
  Created wheel for brioche: filename=brioche-1.0.0a0-py3-none-any.whl size=3804 sha256=ec38633887fd79ccab49892ebdd4a0df67460e1653d7eba007344408767245bb
  Stored in directory: C:\Users\peter.liljenberg\AppData\Local\Temp\pip-ephem-wheel-cache-plvl2os2\wheels\2c\9

# Setup

In [2]:
### Configuration

# Index column(s) of the spreadsheet that contains one or more worksheets of pollen counts

# If the document only contains one index column (either age or depth)
# index_col = 0

# If the document contains two index columns (depth and age, in that order)
index_col = [0, 1]

# Path to the spreadsheet and worksheets that contains the taxa/pft/biome mapping
mapping_path = 'githumbi-2025/mapping.xlsx'
taxa_pft_worksheet_name = 'Taxa=PFT'
biome_pft_worksheet_name = 'PFT=BIOME'

# Path to the pollen sample spreadsheet
sample_path = 'githumbi-2025/sites.xlsx'

#bin_years = 1000
#bin_years = 2000
bin_years = 5000

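# Grouping key used to bin the biome affinity scores: maps a (depth, age)
# index value to the nearest multiple of bin_years.
# Note: this name shadows Python's built-in bin().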
def bin(x):
  depth, age = x
  return ((int(age) + bin_years // 2) // bin_years) * bin_years

# Name of the result document
result_file_name = f'Biomization result ({bin_years} bins).xlsx'

# Name of the trace document containing partial steps of the calculation
trace_file_name = f'Biomization trace ({bin_years} bins).xlsx'

# The threshold that the biomization subtracts from the pollen percentages to remove insignificant taxa
default_threshold = 0.5
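
The rounding in `bin` can be checked in isolation. The sketch below reuses the same integer arithmetic on a plain age value. The name `age_bin` and the sample ages are illustrative and not part of Brioche.

```python
bin_years = 5000

def age_bin(age):
    """Round an age to the nearest multiple of bin_years (halfway values round up)."""
    return ((int(age) + bin_years // 2) // bin_years) * bin_years

# Print the bin label assigned to a few ages around the bin boundaries.
for age in (0, 2499, 2500, 7499, 7500):
    print(age, '->', age_bin(age))
```

Each label is the centre of a window that is `bin_years` wide, so with 5000-year bins the label 5000 covers ages 2500 to 7499.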


# Load data

In [3]:
from brioche import BiomePftList, TaxaPftList, PollenCounts, Biomization, PollenPercentages
import pandas as pd
import openpyxl

pollen_document = openpyxl.open(sample_path, read_only=True)
biome_document = openpyxl.open(mapping_path, read_only=True)

biome_pft_mapping = BiomePftList.read_excel_sheet(biome_document[biome_pft_worksheet_name])
taxa_pft_mapping = TaxaPftList.read_excel_sheet(biome_document[taxa_pft_worksheet_name])

# Load Excel file with one site per worksheet
site_dfs = pd.read_excel(sample_path, sheet_name=None, index_col=index_col)

# Convert into pollen counts. If the data are percentages, use PollenPercentages instead.
sites = [PollenCounts(site_dfs[name], name) for name in sorted(site_dfs.keys())]

print('Data loaded from', sample_path)

Data loaded from githumbi-2025/sites.xlsx
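
With `index_col = [0, 1]`, pandas builds a two-level index, so each row is identified by a tuple. The sketch below illustrates this with `pd.read_csv` on an in-memory string instead of `pd.read_excel`, purely to avoid needing a spreadsheet file; both functions accept `index_col`. The column names and counts are made up for illustration.

```python
import io
import pandas as pd

# Made-up data standing in for one worksheet: two index columns, then taxa counts.
csv_text = """depth,age,Poaceae,Podocarpus
10,1200,5,3
20,2400,7,1
"""

site_df = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])

print(site_df.index.nlevels)  # number of index levels
print(site_df.index[0])       # first index value, a (depth, age) tuple
```

This is why the `bin` function in the setup cell unpacks its argument into two values.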


# Perform biomization analysis

In [4]:
### Perform biome affinity analysis

biomization = Biomization(taxa_pft_mapping, biome_pft_mapping)

unmapped_taxas = biomization.get_unmapped_taxas(*sites)
if unmapped_taxas:
  print()
  print('WARNING: sample worksheets contain taxa that are not mapped to any biome')
  for t in unmapped_taxas:
    print(t)
  print()

percentages = [sample.get_percentages(decimals=2) for sample in sites]
stabilized = [perc.get_stabilized(default_threshold=default_threshold, decimals=2) for perc in percentages]
biomes = [biomization.get_biome_affinity(stab) for stab in stabilized]

binned_biomes = [biome.apply(lambda scores: scores.groupby(bin).mean()) for biome in biomes]

for biome in binned_biomes:
  print('Site:', biome.site)
  print(biome.biomes)
  print()


Nymphaea
Myriophyllum
Cyperaceae undiff.
Subularia monticola
Limosella
Cyperaceae
Alisma
Laurembergia
Typha
Laurembergia tetrandra
Hydrocotyle
Hydrocotyle ranunculoides
Nymphaeaceae
Callitriche
Ludwigia
Subularia

Site: MM002-Devadeva
0        Upper Montane Forest/cold temperate evergreen
5000     Upper Montane Forest/cold temperate evergreen
10000    Upper Montane Forest/cold temperate evergreen
15000    Upper Montane Forest/cold temperate evergreen
20000    Upper Montane Forest/cold temperate evergreen
25000    Upper Montane Forest/cold temperate evergreen
30000    Upper Montane Forest/cold temperate evergreen
35000    Upper Montane Forest/cold temperate evergreen
40000    Upper Montane Forest/cold temperate evergreen
45000    Upper Montane Forest/cold temperate evergreen
50000    Upper Montane Forest/cold temperate evergreen
55000    Upper Montane Forest/cold temperate evergreen
Name: Biome, dtype: object

Site: MM005-Kitumbako
0        Upper Montane Forest/cold temperate evergreen
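
The binning step passes a function to `groupby`, which pandas calls on each index value to obtain the group key. The sketch below shows that mechanism on a small, made-up score table with a (depth, age) index. The names `age_bin`, `forest` and `savanna` are illustrative, and the table is a plain DataFrame rather than a Brioche result object.

```python
import pandas as pd

bin_years = 5000

def age_bin(index_value):
    """Map a (depth, age) index tuple to the nearest multiple of bin_years."""
    depth, age = index_value
    return ((int(age) + bin_years // 2) // bin_years) * bin_years

# Made-up affinity scores for four samples.
index = pd.MultiIndex.from_tuples(
    [(10, 1200), (20, 2400), (30, 2600), (40, 7400)],
    names=['depth', 'age'],
)
scores = pd.DataFrame(
    {'forest': [4.0, 6.0, 2.0, 10.0], 'savanna': [1.0, 3.0, 5.0, 7.0]},
    index=index,
)

# Rows whose ages fall in the same bin are averaged column by column.
binned = scores.groupby(age_bin).mean()
print(binned)
```

The first two samples land in bin 0 and the last two in bin 5000, so the result has one row of mean scores per bin.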

# Store result

Saves an Excel spreadsheet with four sheets per site:
* Biome for each sample
* Affinity scores for each sample
* Binned biomes
* Binned affinity scores

In [5]:
import pandas as pd

with pd.ExcelWriter(result_file_name) as writer:
  for biome in biomes:
    biome.biomes.to_excel(writer, sheet_name=f'AB {biome.site}'[:31])
    biome.scores.to_excel(writer, sheet_name=f'AS {biome.site}'[:31])

  for biome in binned_biomes:
    biome.biomes.to_excel(writer, sheet_name=f'BB {biome.site}'[:31])
    biome.scores.to_excel(writer, sheet_name=f'BS {biome.site}'[:31])

print('Saved results to', result_file_name)

Saved results to Biomization result (5000 bins).xlsx
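
The `[:31]` slice in the cell above keeps each sheet name within Excel's 31-character limit. With long site names, two sites could be truncated to the same sheet name. The helper below is a hypothetical pre-check, not part of Brioche, that builds the names the same way and reports such collisions before anything is written.

```python
EXCEL_SHEET_NAME_LIMIT = 31

def truncated_sheet_names(prefix, site_names, limit=EXCEL_SHEET_NAME_LIMIT):
    """Return {sheet name: site name}; raise if truncation makes two names equal."""
    result = {}
    for site in site_names:
        sheet = f'{prefix} {site}'[:limit]
        if sheet in result:
            raise ValueError(
                f'{site!r} and {result[sheet]!r} both truncate to {sheet!r}'
            )
        result[sheet] = site
    return result

names = truncated_sheet_names('AB', ['MM002-Devadeva', 'MM005-Kitumbako'])
print(list(names))
```

Running it once per prefix (`AB`, `AS`, `BB`, `BS`) before opening the `ExcelWriter` would surface a clash early.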


# Store traces (optional)

This step saves the pollen percentages and stabilized values used to calculate the affinity scores. It is mainly intended for investigating any issues in those steps.

In [6]:
with pd.ExcelWriter(trace_file_name) as writer:
  for perc in percentages:
    perc.samples.to_excel(writer, sheet_name=f'% {perc.site}'[:31])

  for stab in stabilized:
    stab.samples.to_excel(writer, sheet_name=f'S {stab.site}'[:31])

print('Saved trace to', trace_file_name)


Saved trace to Biomization trace (5000 bins).xlsx
