# ⚙️ 5.2 Workflow Automation

This notebook introduces workflow automation to streamline nutrition data processing.

**Objectives**:
- Create reusable data processing functions.
- Automate workflows with scripts.
- Apply automation to `large_food_log.csv`.

**Context**: Automation saves time in nutrition research: once a processing step is written as a function, it can be rerun on every new diet log instead of being repeated by hand.

<details><summary>Fun Fact</summary>
Automation is like a hippo’s chef—preparing meals efficiently every day! 🦛
</details>

```python
# Setup for Google Colab: Fetch datasets automatically or manually
%run ../../bootstrap.py    # installs requirements + editable package

import fns_toolkit as fns

import pandas as pd
import numpy as np
```

## Data Preparation

Load `large_food_log.csv`, a dataset of nutrient amounts recorded for hippo meals.

```python
# Load the food log by dataset name via the course toolkit
df = fns.get_dataset('large_food_log')
print(f'Shape: {df.shape}')  # Display number of rows and columns
```

```
Shape: (500, 5)
```


## Automation Functions

Create a function that summarises nutrient amounts by meal, returning the mean `Amount` for each meal and nutrient.

```python
# Define a reusable summary function
def summarize_nutrients(df, group_by='Meal'):
    """Summarize mean nutrient amounts by a chosen column.

    Args:
        df (DataFrame): Input data with 'Nutrient' and 'Amount' columns
        group_by (str): Column to group by (default 'Meal')
    Returns:
        DataFrame: Mean 'Amount' for each (group_by, Nutrient) pair
    """
    summary = df.groupby([group_by, 'Nutrient'])[['Amount']].mean()
    return summary

# Apply the function to summarise by meal
summary = summarize_nutrients(df, 'Meal')
print(summary.head(4))  # Display the first four rows
```

```
               Amount
Meal     Nutrient    
Breakfast Calcium    300.0
          Iron        2.5
          Protein    25.0
          Vitamin_D  11.0
```
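The same grouping logic works along any column, and `.unstack()` pivots the `Nutrient` level into columns when a wide table is easier to read. A minimal sketch on a small invented frame; the column names `Date`, `Meal`, `Nutrient` and `Amount` are assumed to match `large_food_log.csv`, and the values are made up for illustration only.

```python
import pandas as pd

# Toy food log with invented values; column names are assumed to match large_food_log.csv
toy = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'Meal': ['Breakfast', 'Lunch', 'Breakfast', 'Lunch'],
    'Nutrient': ['Protein', 'Protein', 'Iron', 'Iron'],
    'Amount': [20.0, 30.0, 2.0, 3.0],
})

# Long format grouped by Date instead of Meal: one row per (Date, Nutrient) pair
by_date = toy.groupby(['Date', 'Nutrient'])[['Amount']].mean()

# Wide format: unstack() moves the Nutrient index level into columns
wide = toy.groupby(['Meal', 'Nutrient'])['Amount'].mean().unstack()

print(by_date)
print(wide)
```

The long form keeps one value per row, which suits further filtering; the wide form puts each nutrient in its own column, which is easier to scan by eye.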


## Exercise 1: Automate Workflow

Create a function to filter `df` for a specific `Nutrient` and summarise by `Date`. Apply it to Protein data. Document your code.

**Guidance**: Define a function with `df[df['Nutrient'] == nutrient]` and `groupby('Date')`.
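One possible shape for such a function is sketched below. The name `summarize_by_date` is hypothetical, the mean is assumed as the summary statistic, and the check uses a tiny invented frame with Iron, so applying it to Protein in the real `df` is still left to you.

```python
import pandas as pd

def summarize_by_date(df, nutrient):
    """Mean Amount per Date for a single nutrient (hypothetical helper).

    Args:
        df (DataFrame): log with 'Date', 'Nutrient' and 'Amount' columns
        nutrient (str): value to keep in the 'Nutrient' column, e.g. 'Iron'
    Returns:
        Series: mean Amount indexed by Date
    """
    subset = df[df['Nutrient'] == nutrient]         # keep one nutrient only
    return subset.groupby('Date')['Amount'].mean()  # average per day

# Tiny invented frame to check the helper
toy = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02'],
    'Nutrient': ['Iron', 'Protein', 'Iron'],
    'Amount': [2.0, 25.0, 3.0],
})
iron_by_date = summarize_by_date(toy, 'Iron')
print(iron_by_date)
```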

**Answer**:

My automation code is...

## Conclusion

You’ve learned to automate nutrition data workflows with reusable functions.
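Reusable functions pay off most when they are chained into one workflow that can be rerun on each new log. A minimal sketch using pandas `DataFrame.pipe`; the step functions (`drop_missing`, `keep_nutrients`, `mean_by`, `run_workflow`) are hypothetical names, the column names are assumed to match `large_food_log.csv`, and the data are invented.

```python
import pandas as pd

def drop_missing(df):
    """Remove rows with no recorded Amount."""
    return df.dropna(subset=['Amount'])

def keep_nutrients(df, nutrients):
    """Keep only rows whose Nutrient is in the given list."""
    return df[df['Nutrient'].isin(nutrients)]

def mean_by(df, group_by='Meal'):
    """Mean Amount for each (group_by, Nutrient) pair."""
    return df.groupby([group_by, 'Nutrient'])[['Amount']].mean()

def run_workflow(df, nutrients, group_by='Meal'):
    """Chain the steps so the whole analysis is a single call."""
    return (
        df.pipe(drop_missing)
          .pipe(keep_nutrients, nutrients)
          .pipe(mean_by, group_by=group_by)
    )

# Invented log with one missing Amount to show the cleaning step
toy = pd.DataFrame({
    'Meal': ['Breakfast', 'Breakfast', 'Lunch', 'Lunch'],
    'Nutrient': ['Protein', 'Iron', 'Protein', 'Protein'],
    'Amount': [20.0, 2.0, 30.0, None],
})
result = run_workflow(toy, ['Protein'])
print(result)
```

Each step takes and returns a DataFrame, so steps can be added, removed or reordered without rewriting the others.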

**Next Steps**: Explore advanced Bayesian methods in notebook 5.3.

**Resources**:
- [Pandas GroupBy](https://pandas.pydata.org/docs/user_guide/groupby.html)
- [Automation Guide](https://www.datacamp.com/community/tutorials/python-data-science-automation)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)