# 🌟 Using LLM APIs for Nutrition & Food Science: A Tasty AI Adventure! 🥑

Welcome to this scrumptious Jupyter Notebook on harnessing **Large Language Model (LLM) APIs** for nutrition and food science research! Whether you're nibbling at home 🍎 or cooking up ideas in a classroom, this guide will whisk you through using APIs from LLMs like **Grok**, **ChatGPT**, **Manus**, and more to tackle tasks like **parsing food diaries**, **reviewing literature**, **analyzing supply chains**, and **sensory analysis**! 🍴

We’ll use Python with a **free API** (Hugging Face) and explore other LLM APIs, comparing their strengths and weaknesses. Expect code, exercises, and hidden treats (expand the "💡 Hint" and "💡 Solution" toggles to reveal them)! Let’s dive into the AI kitchen! 🚀

## 1. Introduction to LLM APIs in Nutrition & Food Science 📊

Nutrition and food science are like a buffet of data 🍽️—food diaries, research papers, supply chain logs, and sensory descriptions. LLM APIs let us tap into powerful language models to:

- **Parse food diaries**: Extract nutrients or dietary patterns.
- **Review literature**: Summarize nutrition studies.
- **Analyze supply chains**: Optimize logistics or detect issues.
- **Perform sensory analysis**: Interpret taste and texture descriptions.

We’ll use Python with `requests`, `pandas`, and a **free Hugging Face API** (plus others if you have access). No master chef skills needed—just curiosity! 😄

**Exercise 1**: Why might LLM APIs be ideal for parsing unstructured nutrition data (e.g., food diaries)? Jot down your thoughts (no code needed).

<details>
<summary>💡 Hint</summary>
Consider LLMs’ ability to understand context, handle varied text formats, and extract structured information from messy data.
</details>

Let's load the required libraries first.

In [None]:

# Setup for Google Colab: Fetch datasets automatically or manually
%run ../../bootstrap.py    # installs requirements + editable package

import fns_toolkit as fns

# Install the Hugging Face libraries used in this notebook
%pip install huggingface_hub
%pip install transformers
%pip install torch

# Standard library
import json
import textwrap
from datetime import datetime

# Third-party libraries (each imported once)
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import huggingface_hub
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline


## 2. Overview of LLMs and Their APIs 🧠

Here’s a rundown of popular LLMs, their APIs, and their strengths and weaknesses for nutrition and food science tasks. *Note*: All LLMs may exhibit biases from training data (e.g., cultural or dietary biases), which can affect applications like food diary parsing or sensory analysis.

| **LLM** | **Provider** | **API Availability** | **Free Tier?** | **Strengths** | **Weaknesses** |
|---------|--------------|---------------------|----------------|--------------|---------------|
| **Grok** | xAI | Yes (Grok API) | Limited (beta for X Premium+ users) | Real-time X integration, witty responses, strong reasoning | Limited free access, proprietary, less conversational depth |
| **ChatGPT** | OpenAI | Yes (OpenAI API) | No (pay-as-you-go, billed per token) | Conversational fluency, broad NLP tasks (e.g., text summarization) | No free tier, lacks real-time data |
| **Manus** | Manus AI | Yes (limited beta) | No (proprietary, waitlist) | Enterprise automation, multi-model architecture, deep reasoning | Slow response times, less conversational, academic tone |
| **Llama** | Meta AI | No (open-source, not API-hosted) | Free (local use) | Open-source, medical/nutrition applications | High computational needs, no hosted API |
| **DeepSeek** | DeepSeek | Yes (hosted API; open-weight models) | Hosted API is pay-as-you-go; open weights are free to run locally | Efficient, open-source, structured problem-solving | Less conversational, limited multimodal support |
| **Hugging Face Models** | Hugging Face | Yes (Inference API) | Yes (free tier) | Free, diverse models (e.g., BART for summarization), customizable | Limited free quota, less advanced than GPT-4 |

**Why Hugging Face?** We’ll use Hugging Face’s free Inference API for its accessibility and robust NLP capabilities, ideal for tasks like summarization and text analysis in nutrition research.

**Exercise 2**: Which LLM might be best for real-time supply chain analysis? Why? (No code needed).

<details>
<summary>💡 Solution</summary>
Grok’s real-time X integration makes it ideal for supply chain tasks needing current data (e.g., logistics updates).
</details>

**Learn More**: Explore [Hugging Face](https://huggingface.co/docs/api-inference), [OpenAI](https://openai.com/api/), or [xAI’s API](https://x.ai/api) for details! 📚

## 3. Parsing Food Diaries with Hugging Face API 📝

Food diaries are unstructured text (e.g., “Ate oatmeal with berries and coffee”). LLMs can extract nutrients or dietary patterns. Let’s use Hugging Face’s free API to summarize a food diary entry.

### 3.1 Summarizing a Food Diary

We’ll use the `facebook/bart-large-cnn` model to summarize a diary entry. Get a free API key from [Hugging Face](https://huggingface.co/).

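**Note**: Store your Hugging Face access token in an environment variable named `HF_TOKEN` (for example in a local `.env` file). The code below reads the token from there.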

In [None]:
# Set up Hugging Face API
API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"

import os
from dotenv import load_dotenv

load_dotenv()  # Load variables from a local .env file, if one exists

HF_TOKEN = os.getenv("HF_TOKEN")  # Ensure HF_TOKEN is set in your environment
if not HF_TOKEN:
    raise ValueError("Hugging Face API token not found. Set HF_TOKEN environment variable.")
headers = {"Authorization": f"Bearer {HF_TOKEN}"}


# Sample food diary entry
diary_entry = """
Breakfast: Oatmeal with blueberries, almond milk, and a drizzle of honey. Coffee with a splash of cream.
Lunch: Grilled chicken salad with spinach, tomatoes, cucumber, and olive oil dressing. Sparkling water.
Dinner: Baked salmon, quinoa, steamed broccoli, and a glass of red wine.
Snack: Greek yogurt with a handful of almonds.
"""

# Function to summarize text
def summarize_text(text, max_length=100, min_length=30):
    payload = {"inputs": text, "parameters": {"max_length": max_length, "min_length": min_length}}
    response = requests.post(API_URL, headers=headers, json=payload)
    if response.status_code == 200:
        return response.json()[0]['summary_text']
    else:
        return f"Error: {response.status_code}"

# Summarize the diary
summary = summarize_text(diary_entry)

# Wrap summary to avoid long lines
wrapper = textwrap.TextWrapper(width=80, subsequent_indent="  ")  # Wrap at 80 characters
wrapped_summary = wrapper.fill(f"- {summary}")

# Format the output for readability
original_length = len(diary_entry.strip())
summary_length = len(summary.strip())

# Create structured output
output = f"""
{'=' * 36}
Food Diary Summary
{'=' * 36}
Model: facebook/bart-large-cnn
Original Length: {original_length} characters
Summary Length: {summary_length} characters

Summary:
{wrapped_summary}

🥣
{'=' * 36}
"""

# Print formatted output
print(output)

**Explanation**:
- **Hugging Face API**: Uses `facebook/bart-large-cnn` for summarization, ideal for condensing food diary text.
- **requests.post**: Sends the diary entry to the API and retrieves the summary.
- **payload**: Carries the input text plus `max_length` and `min_length`, which bound the summary length in tokens (not characters).
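
Hosted inference endpoints can answer with HTTP 503 while a model is still loading, so the first `requests.post` call may fail even when the token and payload are fine. The sketch below shows one way to retry in that case. The helper name `post_with_retry` and its parameters `retries`, `wait_seconds`, and `sleep` are illustrative assumptions, not part of the Hugging Face API; the `post` argument is expected to behave like `requests.post`.

```python
import time


def post_with_retry(post, url, headers, payload, retries=3, wait_seconds=5.0, sleep=time.sleep):
    """Call `post` (e.g. requests.post) and retry while the server answers 503."""
    last_status = None
    for attempt in range(retries):
        response = post(url, headers=headers, json=payload)
        last_status = response.status_code
        if last_status == 200:
            return response.json()
        if last_status != 503:
            break  # other errors are unlikely to be fixed by waiting
        if attempt < retries - 1:
            sleep(wait_seconds)  # give the model time to load before retrying
    raise RuntimeError(f"Request failed with status {last_status}")
```

With the variables defined above, a call would look like `post_with_retry(requests.post, API_URL, headers, payload)`, and the summary is then read from the returned JSON in the same way as in `summarize_text`.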

**Exercise 3**: Modify `max_length` to 50 in `summarize_text`. Is the summary more concise? Compare outputs.

<details>
<summary>💡 Solution</summary>
Change the function call to:
```python
summary = summarize_text(diary_entry, max_length=50)
```
A shorter `max_length` produces a more concise summary, potentially omitting details like specific foods.
</details>

**Learn More**: Try [Hugging Face’s NER models](https://huggingface.co/models) to extract specific foods from diaries, which can then be mapped to nutrients as in the next cell! 🚀

In [None]:
# Set up NER model for food entity extraction
ner_model = "Dizex/InstaFoodRoBERTa-NER"
tokenizer = AutoTokenizer.from_pretrained(ner_model)
model = AutoModelForTokenClassification.from_pretrained(ner_model)
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Sample food diary entry
diary_entry = """
Breakfast: Oatmeal with blueberries, almond milk, and a drizzle of honey. Coffee with a splash of cream.
Lunch: Grilled chicken salad with spinach, tomatoes, cucumber, and olive oil dressing. Sparkling water.
Dinner: Baked salmon, quinoa, steamed broccoli, and a glass of red wine.
Snack: Greek yogurt with a handful of almonds.
"""

# Function to summarize text using Hugging Face Inference API
def summarize_text(text, max_length=80, min_length=30):
    # Prepare payload with tightened parameters for shorter summary
    payload = {
        "inputs": text,
        "parameters": {
            "max_length": max_length,
            "min_length": min_length,
            "truncation": True
        }
    }
    # Send POST request to the Inference API
    response = requests.post(API_URL, headers=headers, json=payload)
    # Check response status
    if response.status_code == 200:
        return response.json()[0]['summary_text']
    else:
        return f"Error: {response.status_code} - {response.text}"

# Function to extract food entities using NER
def extract_foods(text):
    # Normalize text to avoid spacing issues
    text = " ".join(text.split())
    # Run NER pipeline and extract FOOD entities
    ner_results = ner_pipeline(text)
    # Filter valid food entities (exclude single letters or fragments)
    foods = [result['word'] for result in ner_results if result['entity_group'] == 'FOOD' and len(result['word']) > 1]
    # Post-process to merge common multi-word foods
    merged_foods = []
    i = 0
    while i < len(foods):
        # Check for known multi-word foods
        if i < len(foods) - 1 and foods[i].lower() + " " + foods[i+1].lower() in nutrient_map:
            merged_foods.append(foods[i] + " " + foods[i+1])
            i += 2
        else:
            merged_foods.append(foods[i])
            i += 1
    return merged_foods

# Extended dictionary to map foods to key nutrients (case-insensitive)
nutrient_map = {
    "blueberries": "Antioxidants, Vitamin C, Fiber",
    "almonds": "Vitamin E, Magnesium, Healthy Fats",
    "salmon": "Omega-3 Fatty Acids, Vitamin D, Protein",
    "spinach": "Iron, Vitamin K, Folate",
    "olive oil": "Monounsaturated Fats, Vitamin E, Antioxidants",
    "broccoli": "Vitamin C, Vitamin K, Fiber",
    "quinoa": "Protein, Magnesium, Fiber",
    "greek yogurt": "Protein, Probiotics, Calcium",
    "oatmeal": "Fiber, Iron, Magnesium",
    "chicken": "Protein, Vitamin B6, Niacin",
    "tomatoes": "Vitamin C, Lycopene, Potassium",
    "cucumber": "Hydration, Vitamin K, Antioxidants",
    "honey": "Antioxidants, Natural Sugars",
    "cream": "Calcium, Vitamin A, Saturated Fats",
    "red wine": "Resveratrol, Antioxidants",
    "almond milk": "Calcium, Vitamin E, Low Calories",
    "coffee": "Caffeine, Antioxidants",
    "sparkling water": "Hydration, No Calories",
    "chicken salad": "Protein, Vitamins A and C",
    "olive oil dressing": "Monounsaturated Fats, Vitamin E",
}

# Summarize the diary
summary = summarize_text(diary_entry)

# Extract food entities
foods = extract_foods(diary_entry)

# Map foods to nutrients (case-insensitive)
nutrients = {food: nutrient_map.get(food.lower(), "Unknown nutrients") for food in foods}

# Pre-compute food/nutrient list with wrapped lines
wrapper = textwrap.TextWrapper(width=80, subsequent_indent="  ")  # Wrap at 80 characters
food_nutrient_lines = [wrapper.fill(f"- {food}: {nutrients[food]}") for food in sorted(nutrients)]
food_nutrient_text = "\n".join(food_nutrient_lines)

# Format the output for readability
current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
original_length = len(diary_entry.strip())
summary_length = len(summary.strip())

# Wrap summary to avoid long lines
wrapped_summary = wrapper.fill(f"- {summary}")

output = f"""
{'=' * 40}
Food Diary Analysis
{'=' * 40}
Summarization Model: facebook/bart-large-cnn
NER Model: Dizex/InstaFoodRoBERTa-NER
Timestamp: {current_time}
Original Length: {original_length} characters
Summary Length: {summary_length} characters
Food Count: {len(foods)} items

Summary:
{wrapped_summary}

Extracted Foods and Nutrients:
{food_nutrient_text}

🥣
{'=' * 40}
"""

# Print formatted output
print(output)
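
The merging step inside `extract_foods` can be tried on its own, without downloading the NER model. The sketch below repeats that logic on a plain list of tokens; `merge_multiword` and `known_foods` are hypothetical names used only for this illustration, and the token list is hand-written rather than real model output.

```python
def merge_multiword(tokens, known_foods):
    """Join adjacent tokens when the pair is a known multi-word food."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2]).lower()
        if i < len(tokens) - 1 and pair in known_foods:
            merged.append(f"{tokens[i]} {tokens[i + 1]}")
            i += 2  # skip the token that was just merged
        else:
            merged.append(tokens[i])
            i += 1
    return merged


known = {"almond milk", "olive oil", "greek yogurt"}
print(merge_multiword(["almond", "milk", "honey", "Greek", "yogurt"], known))
# → ['almond milk', 'honey', 'Greek yogurt']
```

Because only adjacent pairs are checked, a three-word entry such as "olive oil dressing" would not be rebuilt from three separate tokens with this approach.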

## 4. Literature Review with Hugging Face API 📚

LLM APIs can summarize nutrition research papers for literature reviews. Let’s use the same Hugging Face API to summarize a study abstract.

### 4.1 Summarizing a Nutrition Study

We’ll summarize a sample abstract to extract key findings.

In [None]:
# Sample nutrition study abstract
abstract = """
The Mediterranean diet, characterized by high intake of fruits, vegetables, whole grains, and olive oil, has been associated with reduced cardiovascular risk and improved cognitive function. This study examined the impact of adherence to the Mediterranean diet on heart disease outcomes in 500 participants over five years. Results showed a 30% reduction in cardiovascular events among high-adherence groups compared to low-adherence groups. Challenges include cultural barriers and cost of fresh produce.
"""

# Summarize the abstract
lit_summary = summarize_text(abstract)
print(f'Literature Summary: {lit_summary} 📝')



**Explanation**:
- **summarize_text**: Reuses the Hugging Face API to condense the abstract into key points.
- **Output**: Highlights main findings (e.g., cardiovascular benefits) for quick review.
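
To judge how much the abstract was condensed, it can help to compare word counts before and after summarization. The following is a small sketch; `summary_stats` is a hypothetical helper, and word counts are only a rough proxy for the token-based `max_length` and `min_length` settings.

```python
def summary_stats(original, summary):
    """Return word counts and the summary-to-original length ratio."""
    original_words = len(original.split())
    summary_words = len(summary.split())
    ratio = summary_words / original_words if original_words else 0.0
    return {
        "original_words": original_words,
        "summary_words": summary_words,
        "compression_ratio": round(ratio, 2),
    }
```

After running the cell above, `summary_stats(abstract, lit_summary)` gives a quick check on how aggressively the model shortened the text.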

**Exercise 4**: Change `min_length` to 50 in the `summarize_text` call for the abstract. Does it include more details? Why?

<details>
<summary>💡 Solution</summary>
Update the call to:
```python
lit_summary = summarize_text(abstract, max_length=100, min_length=50)
```
A higher `min_length` forces the model to generate at least that many tokens, so the summary is longer and more likely to retain details such as study size or challenges.
</details>

**Learn More**: For advanced literature reviews, try [OpenAI’s API](https://openai.com/api/) for deeper contextual analysis (paid).

## 5. Supply Chain Analysis with a Mock LLM API 🚚

LLM APIs can analyze supply chain logs (e.g., delivery reports) to detect issues or optimize logistics. Since real APIs like OpenAI or xAI are paid, we’ll use a mock API to simulate extracting insights from a supply chain log.

### 5.1 Mock Supply Chain Analysis

We’ll simulate an LLM extracting key issues from a supply chain log.

In [None]:
import random

# Mock function to simulate an LLM API for supply chain analysis
def mock_supply_chain_analysis(log_text):
    # Placeholder only: log_text is not inspected, and the issue and confidence
    # are drawn at random (in reality, use the Grok or ChatGPT API)
    issues = ['Delay in delivery', 'Contamination risk', 'Inventory shortage', 'Transport cost overrun']
    confidence = random.uniform(0.7, 0.95)
    detected_issue = random.choice(issues)
    return {'issue': detected_issue, 'confidence': confidence}

# Sample supply chain log
supply_log = """
Delivery of fresh produce delayed by 2 days due to truck breakdown. Spinach showed signs of wilting upon arrival. Inventory levels for tomatoes are critically low. Fuel costs for transport increased by 15% this month.
"""

# Analyze the log
result = mock_supply_chain_analysis(supply_log)
print(f'Supply Chain Issue: {result["issue"]} (Confidence: {result["confidence"]:.2f}) 🚛')

# Save result to file
with open('supply_chain_result.json', 'w') as f:
    json.dump(result, f)

**Explanation**:
- **mock_supply_chain_analysis**: Stands in for an LLM call by returning a randomly chosen issue (e.g., delays, contamination) and confidence score; it does not actually read the log.
- **Real Alternative**: Grok’s API with X integration could provide real-time supply chain insights.
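
When the mock is swapped for a real chat-style LLM, two pieces are usually needed around the API call: a prompt that constrains the answer, and a parser that tolerates malformed replies. The sketch below covers only those two pieces and makes no assumption about any provider's client library. `build_prompt`, `parse_reply`, and the JSON reply format are hypothetical choices made for this illustration.

```python
import json

ISSUES = ["Delay in delivery", "Contamination risk", "Inventory shortage", "Transport cost overrun"]


def build_prompt(log_text, issues=ISSUES):
    """Build a prompt that asks the model to pick from a fixed label list."""
    options = "; ".join(issues)
    return (
        "You are reviewing a food supply chain log.\n"
        f"Allowed issue labels: {options}.\n"
        'Reply with JSON only, e.g. {"issues": ["Delay in delivery"]}.\n'
        f"Log:\n{log_text.strip()}"
    )


def parse_reply(reply_text, issues=ISSUES):
    """Keep only allowed labels; return [] if the reply is not usable JSON."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    found = data.get("issues", [])
    if not isinstance(found, list):
        return []
    return [label for label in found if label in issues]
```

Restricting the reply to a fixed label list keeps the output comparable with the mock's `issues` list and discards any label the model invents.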

**Exercise 5**: Add a new issue (e.g., ‘Supplier miscommunication’) to the `issues` list in `mock_supply_chain_analysis`. Test it and check the output.

<details>
<summary>💡 Solution</summary>
Update the issues list:
```python
issues = ['Delay in delivery', 'Contamination risk', 'Inventory shortage', 'Transport cost overrun', 'Supplier miscommunication']
```
The function now includes the new issue in its random selection.
</details>

**Learn More**: For real supply chain analysis, explore [xAI’s API](https://x.ai/api) for Grok’s real-time capabilities (paid).

## 6. Sensory Analysis with Hugging Face API 🍫

Sensory analysis involves interpreting taste, texture, or aroma descriptions (e.g., “creamy, nutty chocolate”). LLMs can classify or summarize sensory data. Let’s use Hugging Face to summarize a sensory description.

### 6.1 Summarizing Sensory Data

We’ll summarize a sensory description of a food product.

In [None]:
# Sample sensory description
sensory_text = """
The dark chocolate bar has a rich, velvety texture with a deep cocoa flavor. It offers subtle hints of roasted nuts and a slight fruity tang. The finish is smooth with a mild bitterness that lingers pleasantly.
"""

# Summarize the sensory description
sensory_summary = summarize_text(sensory_text)
print(f'Sensory Summary: {sensory_summary} 🍫')

# Save summary to file
with open('sensory_summary.txt', 'w') as f:
    f.write(sensory_summary)

**Explanation**:
- **summarize_text**: Uses Hugging Face to condense sensory text into key descriptors (e.g., “rich, nutty”).
- **Application**: Useful for product development or consumer feedback analysis.
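
One simple way to check whether a summary kept the main sensory descriptors is to compare it against a short descriptor list. This is a keyword baseline, not an LLM technique; `descriptor_coverage` and the descriptor list are hypothetical, and the example summary is hand-written for illustration rather than actual model output.

```python
import re


def descriptor_coverage(original, summary, descriptors):
    """Return descriptors from `original` that the summary kept and dropped."""
    def present(text):
        words = set(re.findall(r"[a-z]+", text.lower()))
        return {d for d in descriptors if d in words}

    in_original = present(original)
    in_summary = present(summary)
    kept = sorted(in_original & in_summary)
    dropped = sorted(in_original - in_summary)
    return kept, dropped


descriptors = ["rich", "velvety", "fruity", "smooth", "bitterness"]
kept, dropped = descriptor_coverage(
    "Rich, velvety texture with a fruity tang and a smooth finish.",
    "A rich chocolate with a smooth finish.",
    descriptors,
)
print(kept, dropped)
# → ['rich', 'smooth'] ['fruity', 'velvety']
```

The same call works with `sensory_text` and `sensory_summary` from the cell above, given a descriptor list suited to the product.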

**Exercise 6**: Try a different sensory description (e.g., for a cheese or beverage). Does the summary capture the main flavors? Why or why not?

<details>
<summary>💡 Solution</summary>
Create a new `sensory_text` (e.g., “The cheddar is sharp, creamy, with a nutty aftertaste”). The summary should capture key descriptors if the input is clear, but vague inputs may lead to less precise summaries.
</details>

**Learn More**: For advanced sensory analysis, try [ChatGPT’s API](https://openai.com/api/) for sentiment or emotion detection (paid).

## 7. Combining Applications: A Nutrition & Food Science Pipeline 🛠️

Let’s combine our LLM API calls into a pipeline to process a food diary, summarize a study, analyze a supply chain, and evaluate sensory data.

### 7.1 Building the Pipeline

We’ll create a function to run all tasks and save results.

In [None]:
# Pipeline function
def nutrition_food_science_pipeline(diary, abstract, supply_log, sensory_text):
    results = {}
    
    # Step 1: Summarize food diary
    results['diary_summary'] = summarize_text(diary)
    
    # Step 2: Summarize literature
    results['literature_summary'] = summarize_text(abstract)
    
    # Step 3: Analyze supply chain
    results['supply_chain_issue'] = mock_supply_chain_analysis(supply_log)
    
    # Step 4: Summarize sensory data
    results['sensory_summary'] = summarize_text(sensory_text)
    
    return results

# Run pipeline
pipeline_results = nutrition_food_science_pipeline(diary_entry, abstract, supply_log, sensory_text)

# Print results
print('Pipeline Results:')
print(f'Food Diary Summary: {pipeline_results["diary_summary"]} 🥣')
print(f'Literature Summary: {pipeline_results["literature_summary"]} 📝')
print(f'Supply Chain Issue: {pipeline_results["supply_chain_issue"]} 🚛')
print(f'Sensory Summary: {pipeline_results["sensory_summary"]} 🍫')

# Save pipeline results
with open('pipeline_results.json', 'w') as f:
    json.dump(pipeline_results, f)

**Explanation**:
- **nutrition_food_science_pipeline**: Combines diary parsing, literature summarization, supply chain analysis, and sensory summarization.
- **results**: A dictionary storing outputs from each step.
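
A pipeline like this is easier to test when the summarizer is passed in as an argument, so an offline stand-in can replace the API call. The sketch below assumes that design; `run_pipeline` and `first_sentence` are hypothetical names, and the error handling only applies to summarizers that raise exceptions (the notebook's `summarize_text` returns an error string instead).

```python
def run_pipeline(inputs, summarizer):
    """Apply `summarizer` to each named text; record failures instead of stopping."""
    results = {}
    for name, text in inputs.items():
        try:
            results[name] = {"ok": True, "summary": summarizer(text)}
        except Exception as exc:  # keep going so one failure does not lose the rest
            results[name] = {"ok": False, "error": str(exc)}
    return results


def first_sentence(text):
    """Offline stand-in for an API call: return the first sentence."""
    return text.strip().split(".")[0] + "."


demo = run_pipeline({"sensory": "The cheddar is sharp. It is creamy."}, first_sentence)
print(demo["sensory"]["summary"])
# → The cheddar is sharp.
```

Passing `summarize_text` instead of `first_sentence` would run the same loop against the Hugging Face API.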

**Exercise 7**: Add a step to the pipeline to count the number of meals in the `diary_summary` (hint: split by punctuation). Print the count in the results.

<details>
<summary>💡 Solution</summary>
Add to the pipeline:
```python
results['meal_count'] = len(results['diary_summary'].split('.')) - 1  # Subtract 1 for trailing period
```
Then print:
```python
print(f'Meal Count: {pipeline_results["meal_count"]}')
```
This counts sentences in the summary, approximating meal mentions.
</details>
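
Counting sentences in the summary only approximates the number of meals. As an alternative sketch, the meal labels can be counted directly in the original diary text. This assumes each meal starts a line with a label such as `Breakfast:`; `count_meals` and `MEAL_LABELS` are hypothetical names for this illustration.

```python
import re

MEAL_LABELS = ("Breakfast", "Lunch", "Dinner", "Snack")


def count_meals(diary_text, labels=MEAL_LABELS):
    """Count lines that begin with one of the meal labels followed by a colon."""
    alternatives = "|".join(re.escape(label) for label in labels)
    pattern = rf"^[ \t]*(?:{alternatives})[ \t]*:"
    return len(re.findall(pattern, diary_text, flags=re.IGNORECASE | re.MULTILINE))


sample_diary = """
Breakfast: Oatmeal with blueberries.
Lunch: Grilled chicken salad.
Dinner: Baked salmon and quinoa.
Snack: Greek yogurt with almonds.
"""
print(count_meals(sample_diary))
# → 4
```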

## 8. Summary: Your LLM API Toolkit for Nutrition & Food Science 🧰

Here’s what you’ve mastered:

- **Food Diary Parsing** 🥣: Summarize dietary logs with Hugging Face.
- **Literature Reviews** 📚: Condense research papers for quick insights.
- **Supply Chain Analysis** 🚚: Detect issues in logistics (mock API).
- **Sensory Analysis** 🍫: Summarize taste and texture descriptions.
- **Pipelines** 🛠️: Combine LLM tasks for efficient workflows.

**Final Exercise**: Find a real nutrition dataset or paper (e.g., from [USDA FoodData Central](https://fdc.nal.usda.gov/)) and apply one of these LLM API approaches. Share your findings in a short paragraph!

**What’s Next?** Experiment with paid APIs like [Grok](https://x.ai/api) or [ChatGPT](https://openai.com/api/) for advanced tasks, or explore open-source models like DeepSeek for cost-free options. Keep tasting the future of AI in food science! 😄