Skip to content

rasali535/hans

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

106 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

title ForgeSight
emoji 🏗️
colorFrom red
colorTo gray
sdk docker
pinned true
license mit
short_description Multimodal Civil QC Copilot on AMD MI300X + ROCm
tags
amd
rocm
mi300x
qwen
vllm
civil-engineering
quality-control
agents

🏗️ ForgeSight — Multimodal QC Copilot on AMD Instinct™ MI300X

ForgeSight is a production-grade Agentic Quality Control (QC) Pipeline designed for civil engineering, construction, and infrastructure projects. Built exclusively for the AMD + lablab.ai Developer Hackathon, it leverages the massive 192GB VRAM of the AMD Instinct MI300X to run a state-of-the-art multimodal multi-agent workflow.

🎯 Hackathon Alignment

ForgeSight was explicitly designed to conquer the core objectives of this hackathon, working end-to-end and showing what AMD's compute stack can unlock:

  • 🤖 Track 1: AI Agents & Agentic Workflows: We moved far beyond simple RAG. ForgeSight implements a sophisticated, coordinated 4-agent workflow (Inspector, Diagnostician, Action, Reporter) that automates the complex task of infrastructure quality control, reasoning sequentially to deliver concrete work orders.
  • 🎨 Track 3: Vision & Multimodal AI: We process and understand complex high-resolution visual data using the massive memory bandwidth of AMD GPUs. ForgeSight is a true high-throughput industrial inspection application using Qwen2-VL-7B optimized for ROCm™.
  • 🚢 Extra Challenge: Ship It + Build in Public: Not only did we build in public, but we also built an agent for it. The pipeline features a 5th silent agent (the Social Agent) that automatically generates punchy, hashtag-ready X and LinkedIn posts for every inspection, tagging @lablab and @AIatAMD.

🏗️ Architecture Overview

ForgeSight is built on a distributed "Console-Agent-Compute" architecture, optimized for production stability:

  1. ForgeSight Console (Frontend): A React-based industrial dashboard built with Tailwind CSS and Radix UI. Hosted on Vercel for high availability and global edge performance.
  2. Agentic Backend (Orchestration): A hybrid FastAPI service.
    • Vercel Serverless: Handles metadata, telemetry, and reporting.
    • Hugging Face Spaces: Orchestrates the heavy 4-agent pipeline with longer timeouts.
  3. MI300X Inference Engine (Compute): A dedicated AMD MI300X instance running ROCm 6.2 and vLLM. It serves a fine-tuned Qwen2-VL-7B model, providing the "brain" for the multimodal inspections.

🛡️ Production-Grade Robustness

During development, we prioritized "Failure-Proof" engineering to handle real-world data inconsistencies:

  • Defensive API Layer: Implemented global exception handling and "Graceful Fallbacks" (e.g., in-memory persistence) to ensure the UI stays up even if DB connections are interrupted.
  • Robust Schema Mapping: Designed the pipeline to handle malformed agent outputs or missing visual data without crashing, ensuring site managers always get a response.
  • SSL/TLS Hardening: Custom connection handlers for MongoDB Atlas to ensure secure, stable connectivity from serverless environments.

🚀 How We Built It: A Walkthrough

1. High-Throughput Serving with vLLM & ROCm

To make the agents responsive, we deployed the model using vLLM on the ROCm 6.2 stack.

  • We utilized PagedAttention to handle the high VRAM requirements of the model.
  • The massive 192GB VRAM of the MI300X allowed us to serve the full model without sharding, maximizing throughput for our concurrent agent calls.
  • ROCm Tuning: Optimized the engine by enforcing eager execution and disabling chunked prefill to ensure flawless stability on the MI300X.

2. Designing the Multi-Agent Pipeline

We implemented a 4-stage sequential pipeline in Python to ensure industrial-grade auditability:

  • Inspector Agent: Performs the initial multimodal analysis of the image.
  • Diagnostician Agent: Determines the root cause (e.g., thermal expansion).
  • Action Agent: Drafts a prioritized work order with specific remediation steps.
  • Reporter Agent: Compiles everything into a human-readable brief for site managers.

3. Developing the ForgeSight Console

  • Live Telemetry: Real-time visualization of GPU utilization and power consumption directly from the MI300X node.
  • Agentic Transcripts: A dynamic UI that displays the "thought process" and JSON hand-offs of each agent.
  • Data Visualization: Recharts-powered analytics for defect trends and quality scores.

🛠️ Tech Stack

  • Hardware: AMD Instinct MI300X (192GB HBM3).
  • Software Stack: ROCm 6.2, PyTorch, vLLM.
  • Backend: FastAPI, Gradio, Python.
  • Frontend: React, Tailwind CSS, Radix UI, Recharts.
  • Persistence: MongoDB Atlas (via Motor).

🛠️ Installation & Setup

  1. Clone the Repo: git clone https://github.com/rasali535/hans.git
  2. Install Frontend: cd frontend && npm install
  3. Install Backend: pip install -r backend/requirements.txt
  4. Configure Environment: Add the following to your .env or Vercel settings:
    • AMD_INFERENCE_URL
    • AMD_INFERENCE_TOKEN
    • MONGO_URL
  5. Launch Local Dev:
    • Backend: python backend/server.py
    • Frontend: npm start

📊 Performance on AMD

The MI300X's 5.3 TB/s bandwidth allows ForgeSight to maintain >2500 tokens/sec throughput, enabling real-time visual inspection of massive infrastructure projects without the latency typical of cloud-based VLM APIs.


Built by Ras Ali Labs for the AMD Developer Hackathon.

About

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors