Autonomous Data Cleaning & Feature Engineering Agent
ProcessumAir is a next-generation data engineering agent that democratizes data preparation. Powered by Google Gemini 2.5 Flash, it runs an autonomous loop that ingests raw datasets (CSV/Excel), identifies quality issues, formulates a rigorous cleaning strategy, and executes it to produce machine-learning-ready data. From raw CSV to ML-ready code in seconds.

- Autonomous Reasoning: Analyzes dataset schema and user goals (e.g., "Predict Churn") to formulate a tailored cleaning plan.
- Universal Import: Drag-and-drop support for CSV and Excel (.xlsx) files.
- Smart Profiling: Automatic detection of data types, missing values, and outliers.
- Trace Visibility: A visual "Decision Trace Graph" showing the agent's internal thought process (Input → Reasoning → Code).
- Client-Side Processing: Data parsing and heuristic cleaning happen locally in the browser for speed and privacy.
- Deliverables:
  - Cleaned Dataset: Download the processed file as `.xlsx`.
  - Python Script: Get a reproducible Pandas/Scikit-Learn script.
  - Executive Audit Report: A professional PDF certificate of data health.
- Frontend: React, TypeScript, Tailwind CSS
- AI Engine: Google Gemini API (`@google/genai`)
- Data Engine: SheetJS (`xlsx`)
- Visualization: Recharts
- Reporting: jsPDF
1. Clone the repository

   ```bash
   git clone https://github.com/yourusername/processum-air.git
   cd processum-air
   ```

2. Install dependencies

   ```bash
   npm install
   ```

3. Configure API Key

   ProcessumAir requires a Google Gemini API Key. Create a `.env` file in the root directory:

   ```bash
   # Note: In a Vite setup, you might need to configure 'define' in vite.config.ts
   # to support 'process.env.API_KEY' or switch to 'import.meta.env'.
   API_KEY=your_google_gemini_api_key_here
   ```

4. Run the development server

   ```bash
   npm run dev
   ```
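If you take the `define` route mentioned in the note above, a minimal `vite.config.ts` could look like the following. This is a sketch of one possible setup, not the project's committed configuration:

```typescript
// vite.config.ts -- one possible way to expose API_KEY to client code
// as process.env.API_KEY (an assumption; the project may use import.meta.env instead).
import { defineConfig, loadEnv } from 'vite';

export default defineConfig(({ mode }) => {
  // Load all variables from .env (empty prefix loads unprefixed vars like API_KEY).
  const env = loadEnv(mode, process.cwd(), '');
  return {
    define: {
      // Statically replace process.env.API_KEY in the bundle at build time.
      'process.env.API_KEY': JSON.stringify(env.API_KEY),
    },
  };
});
```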
```
processum-air/
├── src/
│   ├── components/   # UI Components (Dashboard, Charts, Modals)
│   ├── services/     # Gemini AI Service & Prompt Engineering
│   ├── utils/        # CSV/Excel Parsers & Cleaning Logic
│   ├── types.ts      # TypeScript Interfaces
│   └── App.tsx       # Main Application State & Routing
├── public/
└── package.json
```
1. Profiling Phase: The app uses SheetJS to parse the file locally and generates a statistical metadata summary (row count, null %, types) without sending raw rows to the cloud.
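The profiling idea can be sketched in a few lines. This is an illustrative stand-alone Python version (the app itself does this in the browser with SheetJS); the function name and null markers are assumptions:

```python
import csv
import io

# Sketch of the profiling step: summarize row count, null percentage,
# and a rough column type without ever exposing the raw rows themselves.
def profile_csv(text: str) -> dict:
    rows = list(csv.DictReader(io.StringIO(text)))
    summary = {"row_count": len(rows), "columns": {}}
    for col in (rows[0].keys() if rows else []):
        values = [r[col] for r in rows]
        nulls = sum(1 for v in values if v in ("", None, "NA", "null"))
        non_null = [v for v in values if v not in ("", None, "NA", "null")]

        def is_num(v):
            try:
                float(v)
                return True
            except ValueError:
                return False

        # Crude inference: numeric only if every non-null value parses as a float.
        inferred = "numeric" if non_null and all(is_num(v) for v in non_null) else "text"
        summary["columns"][col] = {
            "null_pct": round(100 * nulls / len(rows), 1) if rows else 0.0,
            "type": inferred,
        }
    return summary

sample = "age,city\n34,Lisbon\n,Porto\n51,\n"
print(profile_csv(sample))
```

Only this compact summary (counts, percentages, inferred types) would leave the machine, which is what makes the privacy claim above work.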
2. Planning Phase: The metadata and the user's goal are sent to Gemini 2.5 Flash. The LLM returns a JSON cleaning plan: a list of steps, each with reasoning and Python code.
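A plan of this shape might look like the following. The field names and validation below are illustrative, not the app's actual schema; in the real flow the JSON comes back from Gemini, so a hard-coded example is validated here instead:

```python
import json

# Hypothetical JSON cleaning plan, mimicking what the LLM might return:
# each step carries a target column, an action, reasoning, and pandas code.
plan_json = """
{
  "steps": [
    {"column": "age", "action": "impute_median",
     "reasoning": "Numeric column with nulls; median is robust to outliers.",
     "code": "df['age'] = df['age'].fillna(df['age'].median())"},
    {"column": "user_id", "action": "drop_column",
     "reasoning": "High-cardinality identifier; no predictive value for churn.",
     "code": "df = df.drop(columns=['user_id'])"}
  ]
}
"""

def parse_plan(raw: str) -> list:
    plan = json.loads(raw)
    for step in plan["steps"]:
        # Reject steps missing any of the required fields.
        assert {"column", "action", "reasoning", "code"} <= step.keys()
    return plan["steps"]

steps = parse_plan(plan_json)
print([s["action"] for s in steps])
```

Validating the structure before execution is what lets the agent loop trust the LLM's output enough to act on it.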
3. Execution Phase: The dashboard simulates the execution: the `cleanAndExportData` utility replicates the planned logic (imputation, dropping columns) on the full dataset in browser memory.
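The in-memory execution idea can be sketched as follows. This is a minimal Python analogue of the approach, not the actual `cleanAndExportData` implementation; rows are held as dicts and each plan step is applied in turn:

```python
from statistics import median

# Sketch of plan execution: apply imputation and column drops to
# rows held entirely in memory, mirroring the browser-side approach.
def execute_plan(rows, steps):
    rows = [dict(r) for r in rows]  # copy so the original data is untouched
    for step in steps:
        col = step["column"]
        if step["action"] == "impute_median":
            observed = [r[col] for r in rows if r[col] is not None]
            fill = median(observed)
            for r in rows:
                if r[col] is None:
                    r[col] = fill
        elif step["action"] == "drop_column":
            for r in rows:
                r.pop(col, None)
    return rows

data = [
    {"age": 34, "user_id": 1},
    {"age": None, "user_id": 2},
    {"age": 50, "user_id": 3},
]
plan = [
    {"column": "age", "action": "impute_median"},
    {"column": "user_id", "action": "drop_column"},
]
cleaned = execute_plan(data, plan)
print(cleaned)  # the missing age is filled with the median of observed values
```

Because the same plan also ships as a pandas script (see Deliverables), the browser result and the reproducible script stay in sync by construction.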
This project is licensed under the MIT License - see the LICENSE file for details.