roshanlam/FoundryLite

InfraTool - Visual ETL Pipeline Builder

A modern, drag-and-drop ETL (Extract, Transform, Load) pipeline builder with real-time execution and monitoring.

🚀 Features

  • Visual Pipeline Builder: Drag and drop nodes to create data processing pipelines
  • File Upload: Upload CSV datasets directly through the web interface
  • Dataset Management: Browse, preview, and manage your uploaded datasets
  • Real-time Execution: Watch your pipeline execute with live logging
  • Python Transforms: Write custom Python code to transform your data
  • Node Configuration: Double-click nodes to configure them with an intuitive UI

🛠️ Getting Started

Prerequisites

  • Docker and Docker Compose
  • A modern web browser

Quick Start

  1. Start the application:

    docker compose up --build
  2. Open your browser and navigate to the app's URL.

  3. Upload a dataset:

    • Click on the "Datasets" tab in the sidebar
    • Upload a CSV file (try the included sample_data.csv)
    • View the dataset metadata and preview
  4. Build a pipeline:

    • Click on the "Nodes" tab
    • Add a "CSV Source" node and double-click to configure it
    • Add a "Python Transform" node and write your transformation code
    • Add a "Console Sink" node to see the results
    • Connect the nodes by dragging between their connection points
  5. Run your pipeline:

    • Click the "▶️ Run Pipeline" button
    • Watch the real-time logs as your pipeline executes

📋 Available Node Types

📄 CSV Source

Loads data from uploaded CSV files.

  • Configuration: Select from your uploaded datasets

🐍 Python Transform

Transforms data using custom Python code.

  • Configuration: Write a transform(df) function that takes a pandas DataFrame and returns a modified DataFrame
  • Example:
    def transform(df):
        df['total_compensation'] = df['salary'] * 1.2
        df['age_group'] = df['age'].apply(lambda x: 'Young' if x < 30 else 'Senior')
        return df[df['salary'] > 70000]  # Filter high earners

📺 Console Sink

Displays the processed data in the logs panel.

  • Configuration: Optional label for identification

💡 Example Workflows

Basic Data Filtering

  1. Upload employee data CSV
  2. CSV Source → Python Transform → Console Sink
  3. Transform code filters employees by criteria
  4. View filtered results in console
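The filtering workflow above boils down to a short transform. A minimal sketch, assuming the uploaded CSV has `department` and `salary` columns (the column names and data are illustrative, not the contents of `sample_data.csv`):

```python
import pandas as pd

def transform(df):
    # Keep only Engineering employees earning above 70,000
    mask = (df['department'] == 'Engineering') & (df['salary'] > 70000)
    return df[mask]

# Hypothetical data standing in for an uploaded employee CSV
df = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cara'],
    'department': ['Engineering', 'Sales', 'Engineering'],
    'salary': [95000, 80000, 65000],
})
result = transform(df)
```

Pasting only the `transform` function into a Python Transform node would apply the same filter to whatever dataset the CSV Source feeds it.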

Data Enrichment

  1. Upload sales data CSV
  2. CSV Source → Python Transform → Console Sink
  3. Transform code adds calculated fields (tax, commission, etc.)
  4. View enriched data with new columns

Data Aggregation

  1. Upload transaction data CSV
  2. CSV Source → Python Transform → Console Sink
  3. Transform code groups and summarizes data
  4. View aggregated results
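The aggregation step can be sketched with a pandas `groupby`; the `category` and `amount` column names are assumptions about the uploaded transaction CSV, not a documented schema:

```python
import pandas as pd

def transform(df):
    # Group transactions by category and summarize amounts
    return df.groupby('category', as_index=False).agg(
        total=('amount', 'sum'),
        transactions=('amount', 'size'),
    )

# Hypothetical transaction data
df = pd.DataFrame({
    'category': ['food', 'travel', 'food'],
    'amount': [10.0, 5.0, 20.0],
})
summary = transform(df)
```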

🔧 Architecture

  • Backend: FastAPI with WebSocket support for real-time logging
  • Frontend: React + TypeScript with React Flow for visual pipeline building
  • Styling: Tailwind CSS for modern, responsive UI
  • Execution: DAG-based pipeline execution with topological sorting
  • Storage: File-based dataset storage with metadata management
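A DAG executor of this kind typically derives a run order with Kahn's topological sort. A minimal sketch of the idea (the function and argument names are illustrative, not the project's actual internals):

```python
from collections import deque

def topological_order(nodes, edges):
    """Return node ids in dependency order using Kahn's algorithm."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    # Start from nodes with no incoming edges (the sources)
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for child in children[n]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("pipeline graph contains a cycle")
    return order
```

Running each node in this order guarantees every node sees its upstream outputs before it executes.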

📝 API Endpoints

  • POST /upload - Upload CSV datasets
  • GET /datasets - List uploaded datasets
  • GET /datasets/{id} - Get dataset details and preview
  • DELETE /datasets/{id} - Delete a dataset
  • POST /run - Execute a pipeline
  • WebSocket /ws/{run_id} - Real-time pipeline logs
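A pipeline run is presumably submitted as JSON to POST /run. The payload below is a guess at the shape; the node type strings, field names, and port are assumptions, not a documented schema:

```python
import json

# Hypothetical /run payload: a three-node pipeline with two edges
pipeline = {
    "nodes": [
        {"id": "src", "type": "csv_source", "config": {"dataset_id": "employees"}},
        {"id": "xf", "type": "python_transform",
         "config": {"code": "def transform(df):\n    return df"}},
        {"id": "out", "type": "console_sink", "config": {}},
    ],
    "edges": [
        {"source": "src", "target": "xf"},
        {"source": "xf", "target": "out"},
    ],
}
body = json.dumps(pipeline)
# Submit with e.g.:
#   requests.post("http://localhost:8000/run", data=body,
#                 headers={"Content-Type": "application/json"})
# then subscribe to /ws/{run_id} for live logs.
```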

🎯 Use Cases

  • Data Analysis: Quickly explore and transform datasets
  • ETL Prototyping: Build and test data pipelines visually
  • Data Science: Prepare data for analysis with custom transformations
  • Learning: Understand data processing workflows interactively
  • Reporting: Transform raw data into report-ready formats

🔒 Security Notes

  • Python transforms run in a restricted execution environment
  • File uploads are validated and sanitized
  • Data is stored locally within Docker volumes
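The README does not document how the restricted execution environment works. One common approach is to `exec` the user's code with a stripped-down global namespace; the sketch below illustrates that general idea only. `run_transform` and `ALLOWED_BUILTINS` are hypothetical names, and exec-based restriction alone is not a real security boundary:

```python
# Illustrative only: a restricted namespace for user transform code.
# This is a sketch of the concept, not the project's actual sandbox.
ALLOWED_BUILTINS = {"len": len, "range": range, "min": min, "max": max, "sum": sum}

def run_transform(code, df):
    namespace = {"__builtins__": ALLOWED_BUILTINS}
    exec(code, namespace)  # defines transform() in the restricted namespace
    transform = namespace.get("transform")
    if not callable(transform):
        raise ValueError("code must define transform(df)")
    return transform(df)
```

Calls like `open()` raise `NameError` inside such a namespace, since they are absent from the allowed builtins.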

🛠️ Development

To extend InfraTool:

  1. Add new node types: Extend the backend executor and frontend node library
  2. Add data sources: Support databases, APIs, or other file formats
  3. Enhanced transforms: Add support for SQL, R, or other languages
  4. Output options: Add database sinks, file exports, or API calls

📄 License

MIT License - feel free to use and modify for your needs.


Happy Data Processing! 🎉
