# 📊 4.1 Data Distributions and Visualisation

This notebook introduces techniques for visualising the distributions of nutrient data, a core skill in nutrition research.

**Objectives**:
- Create histograms, boxplots, and violin plots.
- Interpret distribution characteristics (e.g., skewness, outliers).
- Apply visualizations to real-world nutrition data.

**Context**: Plotting a distribution before analysing it reveals features such as skew, outliers, and differences between groups in datasets like `vitamin_trial.csv`.

<details><summary>Fun Fact</summary>
A good plot is like a hippo’s portrait: revealing and full of character! 🦛
</details>

In [None]:
# Setup for Google Colab: Fetch datasets automatically or manually
%run ../../bootstrap.py    # installs requirements + editable package

import fns_toolkit as fns

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

print('Environment ready.')

## Data Preparation

Load `vitamin_trial.csv`, a simulated dataset of vitamin D levels from a clinical trial.

In [None]:
df = fns.get_dataset('vitamin_trial')
print(df.head(1))

   ID    Group  Vitamin_D  Time Outcome
0  P1  Control       10.5     0  Normal
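
Before plotting, a per-group numeric summary gives a first impression of centre and spread. The sketch below is only an illustration: `demo` is a hypothetical stand-in with invented values, using the `Group` and `Vitamin_D` column names from the output above. With the real data, `df` would take its place.

```python
import pandas as pd

# Hypothetical stand-in for vitamin_trial.csv (values invented for illustration).
demo = pd.DataFrame({
    'Group': ['Control', 'Control', 'Control',
              'Treatment', 'Treatment', 'Treatment'],
    'Vitamin_D': [10.5, 12.0, 11.1, 14.2, 15.8, 13.9],
})

# Count, mean, median, and standard deviation of vitamin D per trial group.
summary = demo.groupby('Group')['Vitamin_D'].agg(['count', 'mean', 'median', 'std'])
print(summary)
```

A mean that sits well above the median is an early hint of right skew, which the plots in the next section make visible.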


## Visualizing Distributions

Create a violin plot to compare vitamin D levels across trial groups.

In [3]:
plt.figure(figsize=(10, 5))
sns.violinplot(x='Group', y='Vitamin_D', data=df)
plt.title('Vitamin D Distribution by Treatment Group')
plt.xlabel('Trial Group')
plt.ylabel('Vitamin D (µg)')
plt.show()
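
A histogram shows the same distributions as raw counts, and sample skewness puts a number on any asymmetry. The following is a minimal sketch, not part of the original notebook: `demo` is simulated, right-skewed stand-in data (an assumption), and the histogram is drawn with plain Matplotlib rather than seaborn.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for vitamin_trial.csv: right-skewed values per group.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    'Group': np.repeat(['Control', 'Treatment'], 200),
    'Vitamin_D': np.concatenate([
        rng.lognormal(mean=2.3, sigma=0.5, size=200),
        rng.lognormal(mean=2.6, sigma=0.5, size=200),
    ]),
})

# Overlay one semi-transparent histogram per group (each group gets its own bins).
fig, ax = plt.subplots(figsize=(10, 5))
for group, values in demo.groupby('Group')['Vitamin_D']:
    ax.hist(values, bins=20, alpha=0.5, label=group)
ax.set_title('Vitamin D Histogram by Treatment Group (simulated data)')
ax.set_xlabel('Vitamin D (µg)')
ax.set_ylabel('Count')
ax.legend(title='Trial Group')
plt.show()

# Sample skewness per group: positive values indicate a longer right tail.
skew_by_group = demo.groupby('Group')['Vitamin_D'].skew()
print(skew_by_group)
```

Skewness near zero suggests a roughly symmetric distribution. Clearly positive values, as expected for this log-normal stand-in, indicate a tail towards high values.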

## Exercise 1: Create a Boxplot

Generate a boxplot for the same data and describe any outliers in a Markdown cell.

**Guidance**: Use `sns.boxplot()` and check for extreme values.
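
Extreme values can also be checked numerically. By default, boxplot whiskers extend to the most extreme points within 1.5 × IQR of the quartiles, and anything beyond is drawn as an individual point. The sketch below applies that rule by hand. The helper `iqr_outliers` and the values are hypothetical and are not taken from `vitamin_trial.csv`.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical vitamin D values with one obviously extreme measurement.
values = pd.Series(
    [9.8, 10.5, 11.2, 12.0, 12.4, 13.1, 13.9, 14.6, 15.2, 31.0],
    name='Vitamin_D',
)

outliers = iqr_outliers(values)
print(outliers)
```

On the trial data, the same helper could be applied per group, for example with `df.groupby('Group')['Vitamin_D'].apply(iqr_outliers)`.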

**Answer**:

The boxplot shows...

## Conclusion

You’ve learned to visualize nutrient distributions using violin plots and boxplots. Next, explore exploratory data analysis (EDA) in 4.2.

**Resources**:
- [Seaborn Documentation](https://seaborn.pydata.org/)
- [Matplotlib Documentation](https://matplotlib.org/)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)