mongodb-developer/Training_Models_MongoDB

🧠 MongoDB Training Data Logger — CIFAR‑10 Agent 🚀

Train models. Capture everything. Query your ML history like a database engineer.

This repo demonstrates how to train deep learning models while persistently logging runs, metrics, configs, and predictions to MongoDB Atlas — turning ephemeral experiments into durable, queryable intelligence.

Perfect for:

  • 🤖 Agentic ML workflows
  • 📊 Experiment tracking without heavyweight tools
  • 🧬 Reproducible research
  • ☁️ Cloud‑native training telemetry
  • 🔍 Mining past runs for insights
  • 🐋 Building LLM/RAG training corpora from real model activity

✨ Features

🔥 End‑to‑End CIFAR‑10 Training Agent
📦 Automatic dataset handling
⚡ Mixed‑precision + distributed training support
🗃️ MongoDB Atlas logging for EVERYTHING
🔁 Reload configs from past runs
🏆 Query best runs by metric
🔮 Log predictions + evaluations
🧠 CLI‑driven workflow
🛠️ Zero external experiment tracker required


🏗️ Architecture Overview

flowchart TD
    A["🖼️ CIFAR-10 Dataset"]
    B["⚙️ Training Pipeline (TensorFlow / Keras)"]
    C["📊 Metrics + Config + History"]
    D["🗄️ MongoDB Atlas"]
    E["🧠 Queryable Training Intelligence"]

    A --> B --> C --> D --> E

MongoDB becomes your:

  • 📚 Experiment database
  • 🧾 Model registry (lightweight)
  • 🔍 Analytics backend
  • 🧠 Knowledge store for agents

📂 What Gets Stored in MongoDB

Each run produces a rich document containing:

  • 🧬 Model architecture
  • ⚙️ Hyperparameters
  • 📅 Timestamps
  • 📉 Training history
  • 🏆 Best metrics
  • 🧪 Test results
  • 🔮 Predictions
  • 📦 Dataset metadata
  • ✅ Completion status

Example document shape:

{
  "type": "training",
  "model_name": "cifar10_basic_20260224_120101",
  "architecture": "basic",
  "epochs": 50,
  "learning_rate": 0.001,
  "best_metrics": {
    "val_accuracy": 0.84
  },
  "test_metrics": {
    "test_accuracy": 0.82,
    "test_loss": 0.65
  },
  "history": { "...": "..." },
  "timestamp": "2026-02-24T12:01:01Z",
  "status": "completed"
}
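
As a rough illustration, a document of this shape could be assembled and inserted as sketched below. `build_run_document` and `log_run` are hypothetical helpers, not necessarily what the script does, and the `model_name` pattern is inferred from the example above.

```python
from datetime import datetime, timezone


def build_run_document(architecture, epochs, learning_rate,
                       best_metrics, test_metrics, history, now=None):
    """Assemble a run document in the example shape (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    return {
        "type": "training",
        # Assumed naming pattern: cifar10_<architecture>_<YYYYmmdd_HHMMSS>
        "model_name": f"cifar10_{architecture}_{now:%Y%m%d_%H%M%S}",
        "architecture": architecture,
        "epochs": epochs,
        "learning_rate": learning_rate,
        "best_metrics": best_metrics,
        "test_metrics": test_metrics,
        "history": history,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "status": "completed",
    }


def log_run(collection, document):
    """Insert the document; `collection` is expected to be a pymongo Collection."""
    return collection.insert_one(document).inserted_id
```

Storing the timestamp as a native `datetime` instead of a string would make date-range queries easier; the string form here only mirrors the example.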

🚀 Quick Start

1️⃣ Install Dependencies

pip install tensorflow pymongo matplotlib numpy

2️⃣ Configure MongoDB Atlas

Edit these constants in the script (cifar10_agent.py):

MONGODB_URI = "your_connection_string_here"
DB_NAME = "agentic_demo"
COLLECTION_NAME = "training_data"

⚠️ Logging is automatically disabled if no URI is set.
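
One way the "disabled if no URI is set" behavior could look is sketched below. `get_collection` and `log_document` are hypothetical names, and treating the placeholder string as "not set" is an assumption, not confirmed behavior of the script.

```python
def get_collection(uri, db_name="agentic_demo", collection_name="training_data"):
    """Return a pymongo collection, or None when logging should be disabled."""
    # Assumption: an empty value or the README placeholder means "no URI set".
    if not uri or uri == "your_connection_string_here":
        return None
    # Imported lazily so runs without logging do not need the driver installed.
    from pymongo import MongoClient
    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    return client[db_name][collection_name]


def log_document(collection, document):
    """Insert the document if logging is enabled; otherwise do nothing."""
    if collection is None:
        return None
    return collection.insert_one(document).inserted_id
```

Callers can then log unconditionally: `log_document(get_collection(MONGODB_URI), doc)` is a no-op when no URI is configured.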


3️⃣ Train a Model

python cifar10_agent.py train --architecture basic --epochs 50

Advanced model + mixed precision:

python cifar10_agent.py train \
  --architecture advanced \
  --epochs 100 \
  --mixed-precision

📊 Evaluate a Trained Model

python cifar10_agent.py evaluate \
  --model-path checkpoints/best_model.h5

🔮 Predict a Single Image

python cifar10_agent.py predict \
  --image-path test.jpg \
  --model-path checkpoints/best_model.h5

Logs prediction confidence + top‑3 classes to MongoDB.
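
A minimal sketch of how a prediction document with the top‑3 classes might be built from a model's output probabilities. The field names (`predicted_class`, `top_classes`, and so on) are assumptions for illustration; the class list is the standard CIFAR‑10 label order.

```python
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def build_prediction_document(image_path, probabilities, k=3):
    """Rank class probabilities and keep the k most likely (hypothetical helper)."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)[:k]
    top = [{"class": CIFAR10_CLASSES[i], "confidence": float(probabilities[i])}
           for i in ranked]
    return {
        "type": "prediction",          # assumed counterpart to "type": "training"
        "image_path": image_path,
        "predicted_class": top[0]["class"],
        "confidence": top[0]["confidence"],
        "top_classes": top,
    }
```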


📋 Inspect Dataset Info

python cifar10_agent.py info \
  --show-classes \
  --show-stats \
  --show-sample

🗄️ Query Past Training Runs

Recent runs

python cifar10_agent.py list-runs --limit 10

Best runs by metric

python cifar10_agent.py list-runs \
  --best \
  --metric test_accuracy \
  --limit 5
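
Under the hood, a "best runs" listing could map to a sorted `find` like the sketch below. It assumes the metric lives under `test_metrics` as in the example document; `best_runs_query` and `find_best_runs` are hypothetical helpers.

```python
def best_runs_query(metric="test_accuracy", limit=5):
    """Build the filter and sort spec for the highest-scoring training runs."""
    field = f"test_metrics.{metric}"   # assumed location of the metric
    query_filter = {"type": "training", field: {"$exists": True}}
    sort_spec = [(field, -1)]          # -1 sorts descending, best first
    return query_filter, sort_spec, limit


def find_best_runs(collection, metric="test_accuracy", limit=5):
    """Run the query against a pymongo Collection and return the documents."""
    query_filter, sort_spec, limit = best_runs_query(metric, limit)
    return list(collection.find(query_filter).sort(sort_spec).limit(limit))
```

Sorting on `timestamp` instead of the metric field would give the "recent runs" view.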

🔁 Reproduce a Previous Run

Load config from MongoDB:

python cifar10_agent.py load-config \
  --run-id <DOCUMENT_ID>

Retrain automatically:

python cifar10_agent.py load-config \
  --run-id <DOCUMENT_ID> \
  --retrain

Reproducibility achieved 🧪✨
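
Loading a config from a stored run could look roughly like this. It assumes `<DOCUMENT_ID>` is the string form of the document's `_id` ObjectId and that the hyperparameters are the top-level fields from the example document; `extract_config` and `load_config` are hypothetical helpers.

```python
# Assumed hyperparameter fields, taken from the example document shape.
CONFIG_FIELDS = ("architecture", "epochs", "learning_rate")


def extract_config(run_document):
    """Keep only the fields needed to rerun training."""
    return {key: run_document[key] for key in CONFIG_FIELDS if key in run_document}


def load_config(collection, run_id):
    """Fetch a run by its _id string and return its training config."""
    from bson import ObjectId  # the bson package ships with pymongo
    doc = collection.find_one({"_id": ObjectId(run_id)})
    if doc is None:
        raise LookupError(f"No run found with _id {run_id}")
    return extract_config(doc)
```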


🧠 Why MongoDB for Training Data?

Unlike traditional experiment trackers, MongoDB gives you:

✅ Flexible schema for evolving experiments
✅ Powerful ad‑hoc queries
✅ Aggregation pipelines for analytics
✅ Vector search compatibility
✅ Native cloud scaling
✅ Easy integration with agents + apps

You can ask questions like:

  • 🏆 “What architecture performs best at LR=0.001?”
  • 📉 “Show runs where validation diverged.”
  • ⚡ “Which configs train fastest?”
  • 🔍 “Find models similar to this performance profile.”
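
The first question, for instance, could be answered with an aggregation pipeline along these lines. This is a sketch that assumes the field names from the example document; `best_architecture_pipeline` is a hypothetical helper.

```python
def best_architecture_pipeline(learning_rate=0.001):
    """Rank architectures by mean test accuracy at a given learning rate."""
    return [
        # Only completed training runs at the requested learning rate.
        {"$match": {"type": "training", "status": "completed",
                    "learning_rate": learning_rate}},
        # One group per architecture, averaging test accuracy across its runs.
        {"$group": {
            "_id": "$architecture",
            "avg_test_accuracy": {"$avg": "$test_metrics.test_accuracy"},
            "runs": {"$sum": 1},
        }},
        # Best average first.
        {"$sort": {"avg_test_accuracy": -1}},
    ]
```

With a pymongo collection, `list(collection.aggregate(best_architecture_pipeline(0.001)))` would return one summary row per architecture.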

🤖 Agentic ML Potential

This repository is designed as a building block for:

  • Autonomous ML agents
  • Self‑optimizing training systems
  • Feedback‑driven pipelines
  • Experiment mining
  • RAG over training history
  • LLM‑guided hyperparameter search

MongoDB becomes the agent’s memory.


🛑 Disable MongoDB Logging

All commands support:

--no-mongodb

Useful for offline or privacy‑sensitive runs.
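
A flag shared by every subcommand is commonly wired up with an argparse parent parser. The sketch below is an assumption about how that could be done, not the script's actual parser; the subcommand names are the ones used in this README, and their other options are omitted.

```python
import argparse


def build_parser():
    """Build a CLI where every subcommand accepts --no-mongodb."""
    # Options defined on the parent are inherited by each subcommand.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--no-mongodb", action="store_true",
                        help="Disable MongoDB logging for this run")

    parser = argparse.ArgumentParser(prog="cifar10_agent.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ("train", "evaluate", "predict", "info", "list-runs", "load-config"):
        subparsers.add_parser(name, parents=[common])
    return parser
```

After parsing, `args.no_mongodb` is `True` only when the flag was passed, so logging code can check it before opening a connection.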


🧩 Extending This Repo

Ideas for next steps:

  • 📈 Add dashboards (Streamlit / Gradio)
  • 🧠 Integrate vector embeddings of runs
  • 🔁 Auto‑tuning agents
  • 🗂️ Model artifact storage references
  • 🔔 Real‑time training notifications
  • 📊 Aggregation‑based leaderboards

🐳 Built For Builders

If you like:

  • MongoDB
  • ML engineering
  • Agentic systems
  • Cloud‑native tooling
  • Reproducible experiments

…this repo is for you.


⭐ Contribute

PRs welcome! Add new architectures, datasets, logging features, or agent capabilities.


📜 License

MIT — use it, fork it, ship it 🚀


💬 Final Thought

Stop losing your experiments in terminal scrollback.
Give your models a memory.

🧠➡️🗄️➡️🚀
