# **MI-Agent: A Google Colab Tutorial**
**A Fully Autonomous Agent for Materials Informatics**
### **Purpose of This Tutorial**
This tutorial demonstrates how to use [**MI-Agent**](https://github.com/hasan-sayeed/mi_agent), an autonomous agentic system designed to accelerate a wide range of materials informatics tasks. With just a problem description and relevant CSV files, MI-Agent can:

- Interpret the scientific problem and understand the data context
- Load and (if needed) merge structured datasets
- Automatically select the appropriate target and feature columns
- Perform exploratory data analysis (EDA) and generate visualizations
- Run multiple machine learning models, select the top 5 performers, tune their hyperparameters, and identify the best final model
- Save all generated code, plots, and intermediate outputs
- Produce a detailed technical report of the full process
- Log every decision and execution step to LangSmith for traceability

By following this tutorial, you'll learn how to set up MI-Agent in Google Colab, describe your materials problem, and execute a complete, automated analysis pipeline with minimal effort.


> This kind of automation offers a powerful **starting point for materials informatics engineers**, helping them explore ideas more effectively and potentially increase productivity by an order of magnitude.






## **Step 1. Install Required Packages**

This cell installs the `materials-informatics-agent` package and `wkhtmltopdf` (used for PDF report generation). Output is suppressed to keep the notebook clean, which also hides any installation errors.

In [1]:
!pip install materials-informatics-agent -q > /dev/null 2>&1
!apt-get update -qq > /dev/null 2>&1 \
 && apt-get install -y -qq wkhtmltopdf > /dev/null 2>&1
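
Because the install commands discard all output, a failed install would go unnoticed. A minimal sketch of a sanity check follows. It assumes the pip distribution name is `materials-informatics-agent` (as in the command above) and that `wkhtmltopdf` ends up on the `PATH`; the helper name `check_install` is hypothetical and not part of MI-Agent.

```python
import shutil
from importlib import metadata


def check_install(package="materials-informatics-agent", binary="wkhtmltopdf"):
    """Report the installed package version and the binary location (None if missing)."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {"package_version": version, "binary_path": shutil.which(binary)}


status = check_install()
for name, value in status.items():
    print(f"{name}: {value if value is not None else 'NOT FOUND'}")
```

If either entry prints `NOT FOUND`, rerunning the install cell without the `> /dev/null 2>&1` redirects should show the underlying error.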

## **Step 2. Mount Google Drive**
Mounting Google Drive lets the notebook read your problem file and data, and write results back to your Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## **Step 3. Set Your API Keys**
MI-Agent requires two API keys:

- `OPENAI_API_KEY` – for accessing the language model (e.g., `gpt-4.1-mini`, the model used in Step 6)
- `LANGCHAIN_API_KEY` – for logging execution traces to LangSmith

Choose one of the following methods:
### **Option A: Set Environment Variables Directly**
Replace the placeholders with your actual keys. Keys set this way are saved in the notebook itself, so clear this cell before sharing the notebook.

In [None]:
%env OPENAI_API_KEY=sk-...
%env LANGCHAIN_API_KEY=lsv2_...

### **Option B: Use the Colab Secrets Sidebar**
1. In the Secrets tab of the left sidebar, add two secrets with these exact names and enable notebook access for each:
   - `OPENAI_API_KEY`
   - `LANGCHAIN_API_KEY`
2. Then run the following to inject them into your environment:

In [4]:
from google.colab import userdata
import os

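# Copy each secret added in the Secrets tab into the process environment.
# userdata.get raises an error if a secret is missing or notebook access is not granted.
for key in ("OPENAI_API_KEY", "LANGCHAIN_API_KEY"):
    val = userdata.get(key)       # look up the secret by name
    if val is not None:
        os.environ[key] = val     # inject into the process env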


This makes your API keys available to MI-Agent without hardcoding them into the notebook.
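
Whichever option you choose, it can help to confirm that both variables are actually set before starting a long run. The following is a small sketch of such a check; the helper name `missing_keys` is hypothetical, and it deliberately never prints the key values.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "LANGCHAIN_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("Both API keys are set.")
```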

## **Step 4. Set File Paths**
Update these paths to match the location of your problem description and desired output folder.

In [5]:
PROBLEM_FILE = "/content/drive/MyDrive/mi_agent/sample_problem.txt"
OUTPUT_DIR   = "/content/drive/MyDrive/mi_agent/output"
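
A typo in either path is easy to miss until the run fails. As a hedged sketch, the helper below (hypothetical name `prepare_paths`) checks that the problem file exists and creates the output folder ahead of time. Whether MI-Agent creates the output folder on its own is not documented here, so creating it up front is simply a precaution.

```python
from pathlib import Path


def prepare_paths(problem_file, output_dir):
    """Check that the problem file exists and create the output folder if needed."""
    problem_path = Path(problem_file)
    output_path = Path(output_dir)
    if not problem_path.is_file():
        raise FileNotFoundError(f"Problem file not found: {problem_path}")
    output_path.mkdir(parents=True, exist_ok=True)
    return problem_path, output_path


# In the notebook, call it with the variables defined above:
# prepare_paths(PROBLEM_FILE, OUTPUT_DIR)
```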

## **Step 5. Write a Problem File**
Create a `.txt` file describing the **materials science problem** you want to analyze. Clearly explain the context and list the full paths to the CSV files that contain your data.

Example:

> A company that manufactures metal components for marine environments wants to speed up the process of developing corrosion-resistant alloys. They often test different metal alloys by immersing them in saltwater and measuring how much they corrode over time.
>
> You are provided with two CSV files:
>
> 1. /content/drive/MyDrive/mi_agent/data/alloy_composition.csv
> 2. /content/drive/MyDrive/mi_agent/data/corrosion_test_results.csv

⚠️ Always provide the full path to your data files as seen from within Colab (starting with `/content/drive/...`).
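
Since the agent relies on the paths written in the problem file, a quick check that each listed CSV exists can save a failed run. The sketch below is an illustration, not part of MI-Agent: the helper names are hypothetical, and the regular expression assumes absolute paths that contain no spaces.

```python
import re
from pathlib import Path


def find_csv_paths(problem_text):
    """Extract absolute paths ending in .csv (assumes paths contain no spaces)."""
    return re.findall(r"/\S+\.csv", problem_text)


def missing_csv_files(problem_text):
    """Return the listed CSV paths that do not exist on disk."""
    return [p for p in find_csv_paths(problem_text) if not Path(p).is_file()]


example = """You are provided with two CSV files:
1. /content/drive/MyDrive/mi_agent/data/alloy_composition.csv
2. /content/drive/MyDrive/mi_agent/data/corrosion_test_results.csv
"""
print(find_csv_paths(example))
```

In the notebook, `missing_csv_files(Path(PROBLEM_FILE).read_text())` would list any CSV paths from your own problem file that cannot be found.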

## **Step 6. Run MI-Agent**
Once started, **MI-Agent** will:

- Parse your problem and load the listed CSV file(s)

- Merge datasets if needed

- Identify the appropriate target and features automatically

- Generate and run EDA code, including plots and summaries

- Save all EDA code (\*.py) and images (\*.png) to the output directory

- Train a variety of machine learning models, select the top 5 based on performance, perform hyperparameter tuning, and pick the final best model

- Write and save a detailed technical summary of the full analysis (in PDF format) in the output directory

- Log all reasoning steps and intermediate outputs to LangSmith
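
To make the "merge datasets if needed" step more concrete, here is a conceptual sketch of joining two tables on a shared column using only the standard library. It is not MI-Agent's actual implementation: the sample rows are made up, the column names (`alloy_id`, `cr_wt_pct`, `corrosion_rate`) are hypothetical, and the join assumes each key appears at most once in the second table.

```python
import csv
import io

# Made-up sample rows; the column names are hypothetical placeholders.
composition_csv = "alloy_id,cr_wt_pct\nA1,12.0\nA2,18.5\nA3,22.0\n"
results_csv = "alloy_id,corrosion_rate\nA1,0.42\nA2,0.17\n"


def inner_join(left_rows, right_rows, key):
    """Join two lists of dict rows on a shared key, keeping only matching rows."""
    right_by_key = {row[key]: row for row in right_rows}
    return [
        {**left, **right_by_key[left[key]]}
        for left in left_rows
        if left[key] in right_by_key
    ]


composition = list(csv.DictReader(io.StringIO(composition_csv)))
results = list(csv.DictReader(io.StringIO(results_csv)))
merged = inner_join(composition, results, key="alloy_id")
print(merged)
```

Rows without a match in both tables (here `A3`) are dropped, which is why a shared identifier column in your CSV files matters.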

Run the following to start the pipeline:

In [None]:
!mi_agent \
  --problem-file "$PROBLEM_FILE" \
  --output-dir   "$OUTPUT_DIR" \
  --model        "gpt-4.1-mini"
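
The `!` cell above relies on IPython substituting the Python variables `PROBLEM_FILE` and `OUTPUT_DIR` into the shell command. If you prefer to launch the run from plain Python, the following sketch builds the same command with `subprocess`. It uses only the three flags shown above; the helper name `build_command` is hypothetical, and the paths are the example values from Step 4.

```python
import shlex
import shutil
import subprocess


def build_command(problem_file, output_dir, model="gpt-4.1-mini"):
    """Build the argument list for the mi_agent CLI using the flags shown above."""
    return [
        "mi_agent",
        "--problem-file", str(problem_file),
        "--output-dir", str(output_dir),
        "--model", model,
    ]


cmd = build_command(
    "/content/drive/MyDrive/mi_agent/sample_problem.txt",
    "/content/drive/MyDrive/mi_agent/output",
)
print(shlex.join(cmd))

# Only launch the run when the CLI is actually installed.
if shutil.which("mi_agent"):
    completed = subprocess.run(cmd)
    print("Exit code:", completed.returncode)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.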

## **Step 7. Explore the Output**
Navigate to your `OUTPUT_DIR` in Google Drive. You’ll find:

- Python scripts for each EDA step performed
- Visualizations generated during the analysis
- A 5-page PDF technical summary of the entire workflow

You can also trace each step and the agent's reasoning in your LangSmith dashboard.

From here, you can review, share, or build upon the automated analysis **MI-Agent** has performed!
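
To get a quick inventory without leaving the notebook, a sketch like the one below groups the files in the output folder by type. The extensions match the outputs described above (`.py`, `.png`, `.pdf`); the helper name `summarize_outputs` is hypothetical, and the search includes subfolders because the exact folder layout is an assumption.

```python
from collections import defaultdict
from pathlib import Path


def summarize_outputs(output_dir, extensions=(".py", ".png", ".pdf")):
    """Group files under output_dir by extension, searching subfolders too."""
    grouped = defaultdict(list)
    for path in sorted(Path(output_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            grouped[path.suffix.lower()].append(path)
    return dict(grouped)


# In the notebook:
# for ext, files in summarize_outputs(OUTPUT_DIR).items():
#     print(ext, len(files), [f.name for f in files])
```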

# **Beyond This Tutorial**
This tutorial demonstrates a **proof of concept** of what's possible with an agentic system for materials informatics. While **MI-Agent** is designed to work out-of-the-box for many common scenarios, your real-world problems might be more complex, involve additional constraints, or require custom features.

Have something bigger in mind? Want **MI-Agent** to handle simulation data, text inputs, custom featurization, or integrate with your own models?

> **We'd love to hear from you.**
>
> Submit a feature request, share feedback, or get in touch to help shape the future of agentic systems in materials science.
- Open a [GitHub issue](https://github.com/hasan-sayeed/mi_agent)
- Or reach out directly at hasan.sayeed@utah.edu

