🧠 MongoDB Training Data Logger — CIFAR‑10 Agent 🚀

Train models. Capture everything. Query your ML history like a database engineer.

This repo demonstrates how to train deep learning models while persistently logging runs, metrics, configs, and predictions to MongoDB Atlas — turning ephemeral experiments into durable, queryable intelligence.

Perfect for:

🤖 Agentic ML workflows
📊 Experiment tracking without heavyweight tools
🧬 Reproducible research
☁️ Cloud‑native training telemetry
🔍 Mining past runs for insights
🐋 Building LLM/RAG training corpora from real model activity

✨ Features

🔥 End‑to‑End CIFAR‑10 Training Agent
📦 Automatic dataset handling
⚡ Mixed‑precision + distributed training support
🗃️ MongoDB Atlas logging for EVERYTHING
🔁 Reload configs from past runs
🏆 Query best runs by metric
🔮 Log predictions + evaluations
🧠 CLI‑driven workflow
🛠️ Zero external experiment tracker required

🏗️ Architecture Overview

flowchart TD
    A["🖼️ CIFAR-10 Dataset"]
    B["⚙️ Training Pipeline (TensorFlow / Keras)"]
    C["📊 Metrics + Config + History"]
    D["🗄️ MongoDB Atlas"]
    E["🧠 Queryable Training Intelligence"]

    A --> B --> C --> D --> E

MongoDB becomes your:

📚 Experiment database
🧾 Model registry (lightweight)
🔍 Analytics backend
🧠 Knowledge store for agents

📂 What Gets Stored in MongoDB

Each run produces a rich document containing:

🧬 Model architecture
⚙️ Hyperparameters
📅 Timestamps
📉 Training history
🏆 Best metrics
🧪 Test results
🔮 Predictions
📦 Dataset metadata
✅ Completion status

Example document shape:

{
  "type": "training",
  "model_name": "cifar10_basic_20260224_120101",
  "architecture": "basic",
  "epochs": 50,
  "learning_rate": 0.001,
  "best_metrics": {
    "val_accuracy": 0.84
  },
  "test_metrics": {
    "test_accuracy": 0.82,
    "test_loss": 0.65
  },
  "history": { "...": "..." },
  "timestamp": "2026-02-24T12:01:01Z",
  "status": "completed"
}
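
As a rough illustration of how a document in this shape might be assembled, here is a minimal sketch. The helper names `to_plain_floats` and `build_run_document` are hypothetical and are not taken from the repo's scripts; the field names mirror the example above. Converting history values to built-in floats is included because NumPy scalar types are not BSON-encodable by default.

```python
from datetime import datetime, timezone


def to_plain_floats(history):
    """Convert each metric series to built-in floats so it can be BSON-encoded."""
    return {name: [float(v) for v in values] for name, values in history.items()}


def build_run_document(architecture, epochs, learning_rate, history, test_metrics, now=None):
    """Assemble a training-run document in the shape of the example (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    history = to_plain_floats(history)
    return {
        "type": "training",
        "model_name": f"cifar10_{architecture}_{now:%Y%m%d_%H%M%S}",
        "architecture": architecture,
        "epochs": epochs,
        "learning_rate": learning_rate,
        # Assumption: "best" means the highest validation accuracy seen in any epoch.
        "best_metrics": {"val_accuracy": max(history["val_accuracy"])},
        "test_metrics": {k: float(v) for k, v in test_metrics.items()},
        "history": history,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "status": "completed",
    }


doc = build_run_document(
    architecture="basic",
    epochs=50,
    learning_rate=0.001,
    history={"val_accuracy": [0.61, 0.84, 0.80], "val_loss": [1.1, 0.6, 0.7]},
    test_metrics={"test_accuracy": 0.82, "test_loss": 0.65},
    now=datetime(2026, 2, 24, 12, 1, 1, tzinfo=timezone.utc),
)
print(doc["model_name"])  # cifar10_basic_20260224_120101
```

A document built this way could then be passed to pymongo's `collection.insert_one(doc)`.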

🚀 Quick Start

1️⃣ Install Dependencies

pip install tensorflow pymongo matplotlib numpy

2️⃣ Configure MongoDB Atlas

Set these constants in the script:

MONGODB_URI = "your_connection_string_here"
DB_NAME = "agentic_demo"
COLLECTION_NAME = "training_data"

⚠️ Logging is automatically disabled if no URI is set.
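
The "disabled if no URI is set" behaviour could look roughly like the sketch below. This is an assumption about the pattern, not the repo's actual code: `get_collection` and `log_run` are hypothetical names, and reading the URI from a `MONGODB_URI` environment variable is an illustrative alternative to editing the constant in the script.

```python
import os


def get_collection(uri=None, db_name="agentic_demo", collection_name="training_data"):
    """Return a pymongo collection, or None when no URI is configured."""
    # Assumption: fall back to an environment variable when no URI is passed in.
    uri = uri or os.environ.get("MONGODB_URI")
    if not uri:
        return None  # logging is disabled
    from pymongo import MongoClient  # imported lazily so offline runs need no driver

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    return client[db_name][collection_name]


def log_run(collection, document):
    """Insert the document if logging is enabled; return the new id or None."""
    # Compare with None explicitly: pymongo collections do not support truth testing.
    if collection is None:
        return None
    return collection.insert_one(document).inserted_id
```

With this shape, the training code can call `log_run` unconditionally and the no-URI case becomes a silent no-op.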

3️⃣ Train a Model

python cifar10_agent.py train --architecture basic --epochs 50

Advanced model + mixed precision:

python cifar10_agent.py train \
  --architecture advanced \
  --epochs 100 \
  --mixed-precision

📊 Evaluate a Trained Model

python cifar10_agent.py evaluate \
  --model-path checkpoints/best_model.h5

🔮 Predict a Single Image

python cifar10_agent.py predict \
  --image-path test.jpg \
  --model-path checkpoints/best_model.h5

This logs the prediction confidence and the top-3 classes to MongoDB.
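
To make the top-3 logging concrete, here is a small sketch of ranking a 10-way probability vector and packaging it as a document. The class names are the standard CIFAR-10 labels; the helper names and the document fields (`predicted_class`, `top_3`, and a `"prediction"` type) are assumptions for illustration and may not match what the script stores.

```python
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def top_k_predictions(probabilities, k=3, class_names=CIFAR10_CLASSES):
    """Return the k most likely classes, highest confidence first."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [{"class": class_names[i], "confidence": float(p)} for i, p in ranked[:k]]


def build_prediction_document(image_path, probabilities):
    """Package a single prediction for logging (field names are hypothetical)."""
    top3 = top_k_predictions(probabilities, k=3)
    return {
        "type": "prediction",
        "image_path": image_path,
        "predicted_class": top3[0]["class"],
        "confidence": top3[0]["confidence"],
        "top_3": top3,
    }


# Example softmax output for one image (made-up numbers that sum to 1).
probs = [0.01, 0.02, 0.05, 0.60, 0.02, 0.25, 0.01, 0.02, 0.01, 0.01]
pred = build_prediction_document("test.jpg", probs)
print(pred["predicted_class"], [p["class"] for p in pred["top_3"]])
```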

📋 Inspect Dataset Info

python cifar10_agent.py info \
  --show-classes \
  --show-stats \
  --show-sample

🗄️ Query Past Training Runs

Recent runs

python cifar10_agent.py list-runs --limit 10

Best runs by metric

python cifar10_agent.py list-runs \
  --best \
  --metric test_accuracy \
  --limit 5
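
Under the hood, a "best runs" listing likely comes down to a filtered, sorted `find`. The sketch below shows one way to express it with pymongo. The helper names are hypothetical, and the mapping from metric name to document path assumes the layout of the example document (`test_*` metrics under `test_metrics`, everything else under `best_metrics`).

```python
def metric_field(metric):
    """Map a metric name to its dotted path in the run document (assumed layout)."""
    parent = "test_metrics" if metric.startswith("test_") else "best_metrics"
    return f"{parent}.{metric}"


def best_runs_query(metric="test_accuracy", limit=5):
    """Describe the query as plain data so it can be inspected without a database."""
    field = metric_field(metric)
    # Loss-like metrics are better when smaller; everything else when larger.
    direction = 1 if metric.endswith("loss") else -1
    return {
        "filter": {"type": "training", "status": "completed", field: {"$exists": True}},
        "sort": [(field, direction)],
        "limit": limit,
    }


def list_best_runs(collection, metric="test_accuracy", limit=5):
    """Run the query against a pymongo collection and return the matching documents."""
    q = best_runs_query(metric, limit)
    return list(collection.find(q["filter"]).sort(q["sort"]).limit(q["limit"]))
```

Filtering on `status` keeps interrupted runs out of the leaderboard, which is a design choice rather than something the CLI is documented to do.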

🔁 Reproduce a Previous Run

Load config from MongoDB:

python cifar10_agent.py load-config \
  --run-id <DOCUMENT_ID>

Retrain automatically:

python cifar10_agent.py load-config \
  --run-id <DOCUMENT_ID> \
  --retrain
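
Loading a config by run ID can be sketched as a lookup on `_id` followed by picking out the hyperparameters. This is an illustrative sketch only: `extract_config` and `load_config` are hypothetical names, and the set of keys treated as "config" is an assumption based on the example document. Note that the example document shows no random seed, so a retrain from these fields reproduces the setup rather than identical weights.

```python
CONFIG_KEYS = ("architecture", "epochs", "learning_rate")


def extract_config(run_document, keys=CONFIG_KEYS):
    """Keep only the hyperparameters needed to retrain (assumed key set)."""
    return {key: run_document[key] for key in keys if key in run_document}


def load_config(collection, run_id):
    """Fetch a stored run by its document id and return its config."""
    from bson import ObjectId  # ships with pymongo; imported lazily

    run = collection.find_one({"_id": ObjectId(run_id)})
    if run is None:
        raise KeyError(f"no run with id {run_id}")
    return extract_config(run)


# Example with an in-memory document standing in for a stored run.
stored_run = {
    "type": "training",
    "architecture": "basic",
    "epochs": 50,
    "learning_rate": 0.001,
    "test_metrics": {"test_accuracy": 0.82},
    "status": "completed",
}
config = extract_config(stored_run)
print(config)
```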

Reproducibility achieved 🧪✨

🧠 Why MongoDB for Training Data?

Unlike traditional experiment trackers, MongoDB gives you:

✅ Flexible schema for evolving experiments
✅ Powerful ad‑hoc queries
✅ Aggregation pipelines for analytics
✅ Vector search compatibility
✅ Native cloud scaling
✅ Easy integration with agents + apps

You can ask questions like:

🏆 “What architecture performs best at LR=0.001?”
📉 “Show runs where validation diverged.”
⚡ “Which configs train fastest?”
🔍 “Find models similar to this performance profile.”
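
The first question above maps naturally onto an aggregation pipeline. Here is a minimal sketch, assuming runs are stored in the shape of the example document; the function name `best_architecture_pipeline` and the output field names are hypothetical.

```python
def best_architecture_pipeline(learning_rate=0.001, metric="test_metrics.test_accuracy"):
    """Build a pipeline that ranks architectures at a fixed learning rate."""
    return [
        # Keep only finished training runs at the requested learning rate.
        {"$match": {"type": "training", "status": "completed", "learning_rate": learning_rate}},
        # Summarise each architecture: run count, mean and best value of the metric.
        {"$group": {
            "_id": "$architecture",
            "runs": {"$sum": 1},
            "avg_metric": {"$avg": f"${metric}"},
            "best_metric": {"$max": f"${metric}"},
        }},
        # Best-performing architecture first.
        {"$sort": {"best_metric": -1}},
    ]


pipeline = best_architecture_pipeline(learning_rate=0.001)
```

With a live pymongo collection, `collection.aggregate(pipeline)` would yield one summary document per architecture.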

🤖 Agentic ML Potential

This repository is designed as a building block for:

Autonomous ML agents
Self‑optimizing training systems
Feedback‑driven pipelines
Experiment mining
RAG over training history
LLM‑guided hyperparameter search

MongoDB becomes the agent’s memory.

🛑 Disable MongoDB Logging

All commands support:

--no-mongodb

Useful for offline or privacy‑sensitive runs.
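
One common way to give every subcommand the same switch is an `argparse` parent parser. The sketch below is an assumption about how such a CLI could be wired, not the script's actual parser; it only uses subcommands and flags that appear in this README, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a CLI where every subcommand accepts --no-mongodb."""
    parser = argparse.ArgumentParser(prog="cifar10_agent.py")

    # Shared options live on a parent parser so each subcommand inherits them.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--no-mongodb", action="store_true",
                        help="skip all MongoDB logging for this run")

    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", parents=[common])
    train.add_argument("--architecture", choices=["basic", "advanced"])
    train.add_argument("--epochs", type=int)
    train.add_argument("--mixed-precision", action="store_true")

    evaluate = sub.add_parser("evaluate", parents=[common])
    evaluate.add_argument("--model-path")

    return parser


args = build_parser().parse_args(
    ["train", "--architecture", "basic", "--epochs", "50", "--no-mongodb"]
)
print(args.command, args.no_mongodb)
```

`argparse` exposes the flag as `args.no_mongodb`, so the logging layer can check a single boolean regardless of which subcommand ran.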

🧩 Extending This Repo

Ideas for next steps:

📈 Add dashboards (Streamlit / Gradio)
🧠 Integrate vector embeddings of runs
🔁 Auto‑tuning agents
🗂️ Model artifact storage references
🔔 Real‑time training notifications
📊 Aggregation‑based leaderboards

🐳 Built For Builders

If you like:

MongoDB
ML engineering
Agentic systems
Cloud‑native tooling
Reproducible experiments

…this repo is for you.

⭐ Contribute

PRs welcome! Add new architectures, datasets, logging features, or agent capabilities.

📜 License

MIT — use it, fork it, ship it 🚀

💬 Final Thought

Stop losing your experiments in terminal scrollback.
Give your models a memory.

🧠➡️🗄️➡️🚀

mongodb-developer/Training_Models_MongoDB
