A modern, drag-and-drop ETL (Extract, Transform, Load) pipeline builder with real-time execution and monitoring.
- Visual Pipeline Builder: Drag and drop nodes to create data processing pipelines
- File Upload: Upload CSV datasets directly through the web interface
- Dataset Management: Browse, preview, and manage your uploaded datasets
- Real-time Execution: Watch your pipeline execute with live logging
- Python Transforms: Write custom Python code to transform your data
- Node Configuration: Double-click nodes to configure them with an intuitive UI
- Docker and Docker Compose
- A modern web browser
1. Start the application:
   - Run `docker compose up --build`
2. Open your browser and navigate to:
   - Web App: http://localhost:3000
   - API Docs: http://localhost:8000/docs
3. Upload a dataset:
   - Click on the "Datasets" tab in the sidebar
   - Upload a CSV file (try the included sample_data.csv)
   - View the dataset metadata and preview
4. Build a pipeline:
   - Click on the "Nodes" tab
   - Add a "CSV Source" node and double-click to configure it
   - Add a "Python Transform" node and write your transformation code
   - Add a "Console Sink" node to see the results
   - Connect the nodes by dragging between their connection points
5. Run your pipeline:
   - Click the "▶️ Run Pipeline" button
   - Watch the real-time logs as your pipeline executes
- CSV Source: Loads data from uploaded CSV files.
  - Configuration: Select from your uploaded datasets
- Python Transform: Transforms data using custom Python code.
  - Configuration: Write a `transform(df)` function that takes a pandas DataFrame and returns a modified DataFrame
  - Example:

    ```python
    def transform(df):
        df['total_compensation'] = df['salary'] * 1.2
        df['age_group'] = df['age'].apply(lambda x: 'Young' if x < 30 else 'Senior')
        return df[df['salary'] > 70000]  # Filter high earners
    ```

- Console Sink: Displays the processed data in the logs panel.
  - Configuration: Optional label for identification
- Upload employee data CSV
- CSV Source → Python Transform → Console Sink
- Transform code filters employees by criteria
- View filtered results in console
- Upload sales data CSV
- CSV Source → Python Transform → Console Sink
- Transform code adds calculated fields (tax, commission, etc.)
- View enriched data with new columns
- Upload transaction data CSV
- CSV Source → Python Transform → Console Sink
- Transform code groups and summarizes data
- View aggregated results
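An aggregation transform like the one above could be sketched as follows (the column names `category` and `amount` are hypothetical; adapt them to your dataset):

```python
import pandas as pd

def transform(df):
    # Group transactions by category and summarize the amounts
    # (column names are illustrative, not part of the tool)
    return (
        df.groupby('category')
          .agg(total=('amount', 'sum'),
               average=('amount', 'mean'),
               count=('amount', 'size'))
          .reset_index()
    )
```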
- Backend: FastAPI with WebSocket support for real-time logging
- Frontend: React + TypeScript with React Flow for visual pipeline building
- Styling: Tailwind CSS for modern, responsive UI
- Execution: DAG-based pipeline execution with topological sorting
- Storage: File-based dataset storage with metadata management
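The DAG-based execution model can be illustrated with a small Kahn's-algorithm sketch: nodes run only after all of their upstream nodes have produced output. This is an illustration of the technique, not the tool's actual executor code:

```python
from collections import deque

def topo_order(nodes, edges):
    # Kahn's algorithm: `edges` is a list of (src, dst) node ids.
    indegree = {n: 0 for n in nodes}
    downstream = {n: [] for n in nodes}
    for src, dst in edges:
        downstream[src].append(dst)
        indegree[dst] += 1
    # Start with nodes that have no upstream dependencies (e.g. sources)
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in downstream[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("Pipeline graph contains a cycle")
    return order
```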
- `POST /upload` - Upload CSV datasets
- `GET /datasets` - List uploaded datasets
- `GET /datasets/{id}` - Get dataset details and preview
- `DELETE /datasets/{id}` - Delete a dataset
- `POST /run` - Execute a pipeline
- `WebSocket /ws/{run_id}` - Real-time pipeline logs
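The REST endpoints can also be driven from a script. Below is a minimal client sketch using only the standard library; the exact request/response payload shapes are assumptions, so check the interactive API docs at `/docs` for the authoritative schemas:

```python
import json
from urllib import request

BASE = "http://localhost:8000"

def list_datasets():
    # GET /datasets: fetch the list of uploaded datasets as JSON
    with request.urlopen(f"{BASE}/datasets") as resp:
        return json.load(resp)

def run_pipeline(pipeline):
    # POST /run: submit a JSON pipeline definition for execution
    # (the payload shape is a hypothetical example, not the tool's schema)
    req = request.Request(
        f"{BASE}/run",
        data=json.dumps(pipeline).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```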
- Data Analysis: Quickly explore and transform datasets
- ETL Prototyping: Build and test data pipelines visually
- Data Science: Prepare data for analysis with custom transformations
- Learning: Understand data processing workflows interactively
- Reporting: Transform raw data into report-ready formats
- Python transforms run in a restricted execution environment
- File uploads are validated and sanitized
- Data is stored locally within Docker volumes
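One common way to approximate a restricted execution environment is to `exec` the user's transform code against a trimmed-down set of builtins. The sketch below shows the idea only; it is not the tool's actual sandbox, and builtin-stripping alone is not a security boundary against hostile code:

```python
# A small whitelist of builtins exposed to user transform code
SAFE_BUILTINS = {"len": len, "range": range, "min": min,
                 "max": max, "sum": sum, "abs": abs}

def run_user_transform(code, df):
    # Compile and execute the user's code with restricted builtins,
    # then call the transform(df) function it must define.
    namespace = {"__builtins__": SAFE_BUILTINS}
    exec(compile(code, "<transform>", "exec"), namespace)
    transform = namespace.get("transform")
    if not callable(transform):
        raise ValueError("user code must define transform(df)")
    return transform(df)
```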
To extend InfraTool:
- Add new node types: Extend the backend executor and frontend node library
- Add data sources: Support databases, APIs, or other file formats
- Enhanced transforms: Add support for SQL, R, or other languages
- Output options: Add database sinks, file exports, or API calls
MIT License - feel free to use and modify for your needs.
Happy Data Processing! 🎉