roshanlam/FoundryLite

InfraTool - Visual ETL Pipeline Builder

A modern, drag-and-drop ETL (Extract, Transform, Load) pipeline builder with real-time execution and monitoring.

🚀 Features

  • Visual Pipeline Builder: Drag and drop nodes to create data processing pipelines
  • File Upload: Upload CSV datasets directly through the web interface
  • Dataset Management: Browse, preview, and manage your uploaded datasets
  • Real-time Execution: Watch your pipeline execute with live logging
  • Python Transforms: Write custom Python code to transform your data
  • Node Configuration: Double-click nodes to configure them with an intuitive UI

🛠️ Getting Started

Prerequisites

  • Docker and Docker Compose
  • A modern web browser

Quick Start

  1. Start the application:

    docker compose up --build
  2. Open your browser and navigate to the app's URL.

  3. Upload a dataset:

    • Click on the "Datasets" tab in the sidebar
    • Upload a CSV file (try the included sample_data.csv)
    • View the dataset metadata and preview
  4. Build a pipeline:

    • Click on the "Nodes" tab
    • Add a "CSV Source" node and double-click to configure it
    • Add a "Python Transform" node and write your transformation code
    • Add a "Console Sink" node to see the results
    • Connect the nodes by dragging between their connection points
  5. Run your pipeline:

    • Click the "▶️ Run Pipeline" button
    • Watch the real-time logs as your pipeline executes

📋 Available Node Types

📄 CSV Source

Loads data from uploaded CSV files.

  • Configuration: Select from your uploaded datasets

🐍 Python Transform

Transforms data using custom Python code.

  • Configuration: Write a transform(df) function that takes a pandas DataFrame and returns a modified DataFrame
  • Example:
    def transform(df):
        df['total_compensation'] = df['salary'] * 1.2
        df['age_group'] = df['age'].apply(lambda x: 'Young' if x < 30 else 'Senior')
        return df[df['salary'] > 70000]  # Filter high earners

📺 Console Sink

Displays the processed data in the logs panel.

  • Configuration: Optional label for identification

💡 Example Workflows

Basic Data Filtering

  1. Upload employee data CSV
  2. CSV Source → Python Transform → Console Sink
  3. Transform code filters employees by criteria
  4. View filtered results in console
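The filtering workflow above boils down to a short transform. A minimal sketch, assuming the uploaded CSV has `department` and `salary` columns (the column names and data are illustrative, not the contents of `sample_data.csv`):

```python
import pandas as pd

def transform(df):
    # Keep only Engineering employees earning above 70,000
    mask = (df['department'] == 'Engineering') & (df['salary'] > 70000)
    return df[mask]

# Hypothetical data standing in for an uploaded employee CSV
df = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cara'],
    'department': ['Engineering', 'Sales', 'Engineering'],
    'salary': [95000, 80000, 65000],
})
result = transform(df)
```

Pasting only the `transform` function into a Python Transform node would apply the same filter to whatever dataset the CSV Source feeds it.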

Data Enrichment

  1. Upload sales data CSV
  2. CSV Source → Python Transform → Console Sink
  3. Transform code adds calculated fields (tax, commission, etc.)
  4. View enriched data with new columns

Data Aggregation

  1. Upload transaction data CSV
  2. CSV Source → Python Transform → Console Sink
  3. Transform code groups and summarizes data
  4. View aggregated results
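The aggregation step can be sketched with a pandas `groupby`; the `category` and `amount` column names are assumptions about the uploaded transaction CSV, not a documented schema:

```python
import pandas as pd

def transform(df):
    # Group transactions by category and summarize amounts
    return df.groupby('category', as_index=False).agg(
        total=('amount', 'sum'),
        transactions=('amount', 'size'),
    )

# Hypothetical transaction data
df = pd.DataFrame({
    'category': ['food', 'travel', 'food'],
    'amount': [10.0, 5.0, 20.0],
})
summary = transform(df)
```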

🔧 Architecture

  • Backend: FastAPI with WebSocket support for real-time logging
  • Frontend: React + TypeScript with React Flow for visual pipeline building
  • Styling: Tailwind CSS for modern, responsive UI
  • Execution: DAG-based pipeline execution with topological sorting
  • Storage: File-based dataset storage with metadata management
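A DAG executor of this kind typically derives a run order with Kahn's topological sort. A minimal sketch of the idea (the function and argument names are illustrative, not the project's actual internals):

```python
from collections import deque

def topological_order(nodes, edges):
    """Return node ids in dependency order using Kahn's algorithm."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    # Start from nodes with no incoming edges (the sources)
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for child in children[n]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("pipeline graph contains a cycle")
    return order
```

Running each node in this order guarantees every node sees its upstream outputs before it executes.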

📝 API Endpoints

  • POST /upload - Upload CSV datasets
  • GET /datasets - List uploaded datasets
  • GET /datasets/{id} - Get dataset details and preview
  • DELETE /datasets/{id} - Delete a dataset
  • POST /run - Execute a pipeline
  • WebSocket /ws/{run_id} - Real-time pipeline logs
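A pipeline run is presumably submitted as JSON to POST /run. The payload below is a guess at the shape; the node type strings, field names, and port are assumptions, not a documented schema:

```python
import json

# Hypothetical /run payload: a three-node pipeline with two edges
pipeline = {
    "nodes": [
        {"id": "src", "type": "csv_source", "config": {"dataset_id": "employees"}},
        {"id": "xf", "type": "python_transform",
         "config": {"code": "def transform(df):\n    return df"}},
        {"id": "out", "type": "console_sink", "config": {}},
    ],
    "edges": [
        {"source": "src", "target": "xf"},
        {"source": "xf", "target": "out"},
    ],
}
body = json.dumps(pipeline)
# Submit with e.g.:
#   requests.post("http://localhost:8000/run", data=body,
#                 headers={"Content-Type": "application/json"})
# then subscribe to /ws/{run_id} for live logs.
```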

🎯 Use Cases

  • Data Analysis: Quickly explore and transform datasets
  • ETL Prototyping: Build and test data pipelines visually
  • Data Science: Prepare data for analysis with custom transformations
  • Learning: Understand data processing workflows interactively
  • Reporting: Transform raw data into report-ready formats

🔒 Security Notes

  • Python transforms run in a restricted execution environment
  • File uploads are validated and sanitized
  • Data is stored locally within Docker volumes
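The README does not document how the restricted execution environment works. One common approach is to `exec` the user's code with a stripped-down global namespace; the sketch below illustrates that general idea only. `run_transform` and `ALLOWED_BUILTINS` are hypothetical names, and exec-based restriction alone is not a real security boundary:

```python
# Illustrative only: a restricted namespace for user transform code.
# This is a sketch of the concept, not the project's actual sandbox.
ALLOWED_BUILTINS = {"len": len, "range": range, "min": min, "max": max, "sum": sum}

def run_transform(code, df):
    namespace = {"__builtins__": ALLOWED_BUILTINS}
    exec(code, namespace)  # defines transform() in the restricted namespace
    transform = namespace.get("transform")
    if not callable(transform):
        raise ValueError("code must define transform(df)")
    return transform(df)
```

Calls like `open()` raise `NameError` inside such a namespace, since they are absent from the allowed builtins.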

🛠️ Development

To extend InfraTool:

  1. Add new node types: Extend the backend executor and frontend node library
  2. Add data sources: Support databases, APIs, or other file formats
  3. Enhanced transforms: Add support for SQL, R, or other languages
  4. Output options: Add database sinks, file exports, or API calls

📄 License

MIT License - feel free to use and modify for your needs.


Happy Data Processing! 🎉
