# Absolute Volume Estimation of Biofilm Species

This notebook combines the overall biofilm volume data (Total Volume) with the per-species composition data (Species Distribution). From these it estimates each species' **absolute volume** over time and visualizes the result.

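## Purpose
To go beyond relative proportions (%) and understand how much each species actually increases or decreases (its contribution to biomass) once the growth of the biofilm as a whole is taken into account.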

## Data Sources
1. **Total Volume**: `biofilm_boxplot_data.csv` (median values are used)
2. **Species Ratio**: `species_distribution_data.csv` (median values are used)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os

# Set plot style with Seaborn
sns.set_theme(style="whitegrid")
%matplotlib inline

## 1. Loading the Data

In [None]:
# File paths
volume_file = 'biofilm_boxplot_data.csv'
species_file = 'species_distribution_data.csv'

# Load data
df_vol = pd.read_csv(volume_file)
df_sp = pd.read_csv(species_file)

print("Volume Data Head:")
print(df_vol.head())
print("\nSpecies Data Head:")
print(df_sp.head())
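The cells below assume specific column names in each CSV: `condition`, `cultivation`, `day`, `median`, plus `species` in the species file. Here is a minimal sketch of a check that fails early with a readable message if a column is missing. The helper `check_columns` and the two constant names are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Columns that the merge and plotting cells below rely on
REQUIRED_VOLUME_COLS = {'condition', 'cultivation', 'day', 'median'}
REQUIRED_SPECIES_COLS = {'condition', 'cultivation', 'species', 'day', 'median'}


def check_columns(df: pd.DataFrame, required: set, name: str) -> None:
    """Raise a KeyError listing any required column missing from df (hypothetical helper)."""
    missing = required - set(df.columns)
    if missing:
        raise KeyError(f"{name} is missing columns: {sorted(missing)}")


# Usage with the DataFrames loaded above (uncomment inside the notebook):
# check_columns(df_vol, REQUIRED_VOLUME_COLS, 'biofilm_boxplot_data.csv')
# check_columns(df_sp, REQUIRED_SPECIES_COLS, 'species_distribution_data.csv')
```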

## 2. Merging the Data and Computing Absolute Volume

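Formula (the species ratio is given in percent):
$$ \text{Absolute Volume}_{\text{species}} = \text{Total Volume} \times \frac{\text{Species Ratio}_{\text{species}}}{100} $$

Both inputs are medians, so the result is the product of the median total volume and the median species ratio. It approximates a species' typical absolute volume, but in general it is not identical to the median of per-sample products.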
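A small worked example of the formula, using made-up numbers chosen only for illustration. Suppose the total volume grows from 1.0 to 4.0 while a species' share drops from 50 % to 25 %. Its absolute volume still doubles, from 0.5 to 1.0, which is exactly the kind of change that relative percentages alone would hide.

```python
def absolute_volume(total_volume: float, species_ratio_pct: float) -> float:
    """Absolute volume of one species = total volume x (species ratio in % / 100)."""
    return total_volume * (species_ratio_pct / 100.0)


# Hypothetical early and late time points (illustrative numbers, not measured data)
early = absolute_volume(total_volume=1.0, species_ratio_pct=50.0)
late = absolute_volume(total_volume=4.0, species_ratio_pct=25.0)

print(early, late)  # -> 0.5 1.0
```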

In [None]:
# Rename columns for clarity before merging
df_vol_renamed = df_vol[['condition', 'cultivation', 'day', 'median']].rename(
    columns={'median': 'total_volume'}
)

df_sp_renamed = df_sp[['condition', 'cultivation', 'species', 'day', 'median']].rename(
    columns={'median': 'species_ratio'}
)

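# Merge datasets with a left join: every species row is kept and receives the total volume
# of the same condition/cultivation/day. validate='many_to_one' raises an error if the
# volume table contains duplicate rows for one key.
df_merged = pd.merge(df_sp_renamed, df_vol_renamed, on=['condition', 'cultivation', 'day'],
                     how='left', validate='many_to_one')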

# Calculate Absolute Volume
df_merged['absolute_volume'] = df_merged['total_volume'] * (df_merged['species_ratio'] / 100.0)

# Display merged data
df_merged.head()
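Because the formula works on medians, two properties of the merged table are worth checking. First, every species row should have found a matching `total_volume`, since the left merge leaves `NaN` otherwise. Second, the species medians within one condition/cultivation/day need not add up to exactly 100 %, because each species' median is taken separately. When they do not, the height of the stacked plot will differ from the median total volume. The sketch below assumes the column names used above; the function name `merge_diagnostics` and the toy values are hypothetical.

```python
import pandas as pd


def merge_diagnostics(df_merged: pd.DataFrame) -> pd.DataFrame:
    """Per condition/cultivation/day: sum of species ratios and whether a total volume was found."""
    keys = ['condition', 'cultivation', 'day']
    return (
        df_merged.groupby(keys)
        .agg(ratio_sum=('species_ratio', 'sum'),
             has_total=('total_volume', lambda s: s.notna().all()))
        .reset_index()
    )


# Toy input with made-up values, only to show the output shape
toy = pd.DataFrame({
    'condition': ['Commensal'] * 3,
    'cultivation': ['Static'] * 3,
    'day': [1, 1, 1],
    'species': ['Blue', 'Green', 'Red'],
    'species_ratio': [60.0, 30.0, 5.0],
    'total_volume': [2.0, 2.0, 2.0],
})
print(merge_diagnostics(toy))  # ratio_sum is 95.0 here, so the medians do not close to 100
```

In the notebook, `merge_diagnostics(df_merged)` can be inspected directly. Rows whose `ratio_sum` is far from 100, or whose `has_total` is `False`, deserve a second look before the stacked heights are interpreted.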

## 3. Visualization: Stacked Area Plot

For each combination of condition and cultivation (Commensal/Dysbiotic × Static/HOBIC), the plots show how the absolute volume of each species changes over time.
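In a stacked area plot each band is drawn between two cumulative sums. Only the bottom band (here Blue) therefore has a flat baseline, and the top edge equals the sum of all species' absolute volumes. The short NumPy sketch below shows what `ax.stackplot` draws, using made-up values for two species over three days.

```python
import numpy as np

# Rows = species in stacking order, columns = days (illustrative values only)
values = np.array([
    [1.0, 2.0, 3.0],   # bottom band: growing species
    [0.5, 0.5, 0.5],   # top band: constant absolute volume
])

# Upper edge of each band = cumulative sum over the species stacked so far
upper_edges = np.cumsum(values, axis=0)
lower_edges = upper_edges - values

print(upper_edges[-1])  # -> [1.5 2.5 3.5]  (total height of the stack)
```

The top band's thickness stays at 0.5, yet its upper edge rises from 1.5 to 3.5 because it sits on the growing bottom band. This is why individual species' trends are easier to read from the line plots in section 5.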

In [None]:
# Define parameters for plotting
conditions = ['Commensal', 'Dysbiotic']
cultivations = ['Static', 'HOBIC']

# Color mapping (consistent with previous plots)
color_map = {
    'Blue': '#3498db',    # S. oralis
    'Green': '#2ecc71',   # A. naeslundii
    'Yellow': '#f1c40f',  # V. dispar
    'Orange': '#e67e22',  # V. parvula
    'Purple': '#9b59b6',  # F. nucleatum
    'Red': '#e74c3c'      # P. gingivalis
}

# Order of stacking (optional, but good for consistency)
species_order = ['Blue', 'Green', 'Yellow', 'Orange', 'Purple', 'Red']

fig, axes = plt.subplots(2, 2, figsize=(15, 12), sharex=True, sharey=True)

for i, cond in enumerate(conditions):
    for j, cult in enumerate(cultivations):
        ax = axes[i, j]
        
        # Filter data for current condition/cultivation
        subset = df_merged[(df_merged['condition'] == cond) & (df_merged['cultivation'] == cult)]
        
        if subset.empty:
            ax.text(0.5, 0.5, 'No Data', ha='center', va='center')
            continue
            
        # Pivot data for stackplot: index=day, columns=species, values=absolute_volume
        pivot_df = subset.pivot(index='day', columns='species', values='absolute_volume').fillna(0)
        
        # Reorder columns if they exist in the data
        present_species = [sp for sp in species_order if sp in pivot_df.columns]
        pivot_df = pivot_df[present_species]
        
        # Prepare data for stackplot
        days = pivot_df.index
        values = pivot_df.T.values
        colors = [color_map.get(sp, 'gray') for sp in present_species]
        labels = present_species
        
        # Plot Stacked Area
        ax.stackplot(days, values, labels=labels, colors=colors, alpha=0.8)
        
        ax.set_title(f'{cond} - {cult}', fontsize=14)
        ax.set_xlabel('Day')
        ax.set_ylabel('Absolute Volume (Arbitrary Unit)')

        # Annotate the late Red (P. gingivalis) increase in Dysbiotic HOBIC.
        # Red is the top band, so the arrow points at the top of the stack on day 21
        # (assumes day 21 is the last sampling day, as the original annotation does).
        if cond == 'Dysbiotic' and cult == 'HOBIC' and 'Red' in present_species and 21 in pivot_df.index:
            stack_top = pivot_df.loc[21].sum()
            ax.annotate('Late Red Surge', xy=(21, stack_top), xytext=(15, stack_top * 1.2),
                        arrowprops=dict(facecolor='black', shrink=0.05), fontsize=10)

# Add global legend
# For stackplot, we can keep the manual legend construction or use the first ax handles
# Manual is safer for consistency across subplots
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor=color_map[sp], label=sp) for sp in species_order]
fig.legend(handles=legend_elements, loc='upper center', bbox_to_anchor=(0.5, 1.05), ncol=6, fontsize=12)

plt.tight_layout()
plt.show()

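## 4. Discussion

### Insights from the Absolute Volume Perspective

1. **Characteristics of Dysbiotic HOBIC**:
   - The total volume (stack height) is markedly larger than under the other conditions. All four panels share the y-axis (`sharey=True`), so the heights can be compared directly.
   - Orange (V. parvula) is abundant from the early time points onward, suggesting that it forms the foundation of the biofilm.
   - Red (P. gingivalis) shows a "late surge" in relative terms (%), and the absolute volumes confirm that it also contributes substantial biomass in the later phase (based on median values).

2. **Commensal vs Dysbiotic**:
   - Under Commensal conditions, the total volume increases slowly or levels off at a plateau.
   - Under Dysbiotic conditions, especially HOBIC, the biofilm as a whole becomes much larger, which suggests a synergistic effect of specific species (Orange, Red). These plots show the species growing together over time rather than a demonstrated interaction, so the synergy remains a hypothesis to be tested.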
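The statements above are read off the plots. The sketch below shows one way to back them with numbers from `df_merged`. It computes the peak stacked total per panel, which supports the claim about overall size. It also computes each species' fold change in absolute volume between the first and last sampled day, which supports the "late surge". The function name `summarize_panels` and the toy values are hypothetical. It assumes one row per condition/cultivation/day/species, the same assumption the `pivot` in section 3 already relies on.

```python
import pandas as pd


def summarize_panels(df_merged: pd.DataFrame):
    """Return (peak stacked total per panel, first-to-last-day fold change per species)."""
    panel = ['condition', 'cultivation']
    # Stacked total per day = sum of the species' absolute volumes
    totals = df_merged.groupby(panel + ['day'])['absolute_volume'].sum()
    peak_total = totals.groupby(level=panel).max()

    # Fold change between the earliest and latest sampled day, per species.
    # first()/last() skip NaN values; a zero first value yields inf.
    ordered = df_merged.sort_values('day')
    grouped = ordered.groupby(panel + ['species'])['absolute_volume']
    fold_change = grouped.last() / grouped.first()
    return peak_total, fold_change


# Toy input with made-up values, only to show the output shape
toy = pd.DataFrame({
    'condition': ['Dysbiotic'] * 4,
    'cultivation': ['HOBIC'] * 4,
    'day': [1, 1, 21, 21],
    'species': ['Orange', 'Red', 'Orange', 'Red'],
    'absolute_volume': [1.0, 0.5, 2.0, 2.0],
})
peak, fc = summarize_panels(toy)
print(peak)  # peak stacked total: 4.0 (day 21)
print(fc)    # Orange: 2.0, Red: 4.0
```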

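## 5. Visualization: Line Plot using Seaborn

The stacked area plot conveys the overall volume. However, every band except the bottom one sits on top of the bands below it, so a single species' own trend is hard to read. To follow each species' increase or decrease in detail, the absolute volumes are also plotted as simple line plots.
Seaborn makes it straightforward to compare the per-species trends side by side.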
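`sns.lineplot` aggregates whenever several rows share the same `x` and `hue` value, and then draws an estimate (the mean by default) with an error band. `df_merged` already holds one median per day and species. Each panel should therefore contain exactly one row per (`day`, `species`) pair, so that the lines pass through the medians themselves. The same uniqueness is required by the `pivot` call in section 3, which raises an error on duplicates. Here is a quick check, as a sketch assuming the column names above; `has_unique_points` is a hypothetical name.

```python
import pandas as pd


def has_unique_points(subset: pd.DataFrame) -> bool:
    """True if each (day, species) pair appears at most once in the panel subset."""
    return not subset.duplicated(subset=['day', 'species']).any()


# Illustrative subsets (made-up values): the second one repeats (day=7, species='Red')
ok = pd.DataFrame({'day': [1, 7], 'species': ['Red', 'Red'], 'absolute_volume': [0.1, 0.4]})
dup = pd.DataFrame({'day': [7, 7], 'species': ['Red', 'Red'], 'absolute_volume': [0.4, 0.5]})
print(has_unique_points(ok), has_unique_points(dup))  # -> True False
```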

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(15, 12), sharex=True, sharey=True)

for i, cond in enumerate(conditions):
    for j, cult in enumerate(cultivations):
        ax = axes[i, j]
        
        # Filter data for current condition/cultivation
        subset = df_merged[(df_merged['condition'] == cond) & (df_merged['cultivation'] == cult)]
        
        if subset.empty:
            ax.text(0.5, 0.5, 'No Data', ha='center', va='center')
            continue
            
        # One line per species (hue); colors come from color_map, and the
        # figure-level legend added below replaces the per-axes legends
        sns.lineplot(data=subset, x='day', y='absolute_volume', hue='species',
                     palette=color_map, marker='o', linewidth=2, ax=ax, legend=False)
        
        ax.set_title(f'{cond} - {cult}', fontsize=14)
        ax.set_xlabel('Day')
        ax.set_ylabel('Absolute Volume (Arbitrary Unit)')
        
        # Annotate the late Red (P. gingivalis) increase in Dysbiotic HOBIC
        # (assumes day 21 is the last sampling day, as in the stacked plot)
        if cond == 'Dysbiotic' and cult == 'HOBIC':
            last_point = subset[(subset['species'] == 'Red') & (subset['day'] == 21)]
            if not last_point.empty:
                y_val = last_point['absolute_volume'].iloc[0]
                # Place the label relative to the data value; a fixed +0.1 ignores the y-scale
                ax.annotate('Late Red Surge', xy=(21, y_val), xytext=(15, y_val * 1.2),
                            arrowprops=dict(facecolor='black', shrink=0.05), fontsize=10)

# Add global legend
# Create custom legend elements to match the manual color map, 
# ensuring the order matches 'species_order'
from matplotlib.lines import Line2D
legend_elements = [Line2D([0], [0], color=color_map[sp], lw=2, marker='o', label=sp) for sp in species_order]
fig.legend(handles=legend_elements, loc='upper center', bbox_to_anchor=(0.5, 1.05), ncol=6, fontsize=12)

plt.tight_layout()
plt.show()