RetailFlow is a comprehensive ELT (Extract, Load, Transform) project designed to simulate the flow of retail sales data for an e-commerce platform. The infrastructure is provisioned and managed on AWS, with each service optimized for its specific role in the pipeline.
The data simulation is handled by a Python script executing within an AWS Lambda function. The generated data is then pushed to a PostgreSQL database instance deployed on AWS EC2.
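For illustration, here is a minimal sketch of what such a Lambda handler could look like, assuming the Faker and psycopg2 libraries and an illustrative `orders` table; the table, columns, and environment variable names are assumptions, not taken from the actual generate_fake_data.py:

```python
# Hypothetical sketch of the data-generating Lambda; table, columns, and
# environment variable names are illustrative only.
import os
import psycopg2
from faker import Faker

fake = Faker()

def handler(event, context):
    conn = psycopg2.connect(
        host=os.environ["POSTGRES_HOST"],
        dbname=os.environ["POSTGRES_DB"],
        user=os.environ["POSTGRES_USER"],
        password=os.environ["POSTGRES_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        # Insert a small batch of fake orders on every invocation.
        for _ in range(100):
            cur.execute(
                "INSERT INTO orders (customer_name, product, amount, ordered_at) "
                "VALUES (%s, %s, %s, %s)",
                (
                    fake.name(),
                    fake.word(),
                    fake.pydecimal(left_digits=3, right_digits=2, positive=True),
                    fake.date_time_this_year(),
                ),
            )
    conn.close()
    return {"status": "ok", "rows_inserted": 100}
```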
Data is ingested from the Postgres database into Snowflake, the data warehouse, using Airbyte. Airbyte operates on its own EC2 instance, ensuring dedicated resources for the critical task of data synchronization.
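Source and destination setup happens in the Airbyte UI (see the setup steps below), but once a connection exists a sync can be kicked off against Airbyte's local API. A rough sketch, assuming the API is reachable at `http://localhost:8000/api/v1` (adjust to how you expose Airbyte, e.g. via `make port-forwarding-airbyte`) and using a placeholder connection ID:

```python
# Rough sketch: trigger a manual sync of an existing Airbyte connection.
# The base URL and connection ID below are assumptions; fill them in to
# match your own Airbyte deployment and configured connection.
import requests

AIRBYTE_API = "http://localhost:8000/api/v1"            # assumed base URL
CONNECTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

resp = requests.post(
    f"{AIRBYTE_API}/connections/sync",
    json={"connectionId": CONNECTION_ID},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # job metadata for the triggered sync
```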
For the transformation phase, we use a combination of Dagster and dbt, two cutting-edge tools in the data engineering ecosystem: Dagster orchestrates the dbt models that transform the raw data inside Snowflake. Both tools are deployed on an EC2 instance, allowing for a flexible and powerful transformation process.
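As a minimal sketch of this orchestration, the Dagster asset below simply shells out to the dbt CLI; the real project may use the dagster-dbt integration instead, and the dbt project path is an assumption:

```python
# Minimal sketch: a Dagster asset that runs the dbt project via the CLI.
# The project directory path is an assumption, not the repo's actual layout.
import subprocess
from dagster import Definitions, asset

@asset
def dbt_models() -> None:
    """Run all dbt models against the Snowflake warehouse."""
    subprocess.run(
        ["dbt", "run", "--project-dir", "/opt/dbt/retailflow"],
        check=True,
    )

defs = Definitions(assets=[dbt_models])
```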
The final piece of the pipeline is data visualization, which is handled by Metabase. Running on a dedicated EC2 instance, Metabase provides intuitive and insightful data analytics, allowing stakeholders to extract meaningful conclusions from the data.
The entire infrastructure is provisioned and managed with Terraform, an Infrastructure as Code (IaC) tool that simplifies and standardizes infrastructure deployment. At the application level, we use Docker for containerization, ensuring consistency across all stages of development and production.
```mermaid
graph LR
subgraph L["AWS Lambda"]
style L fill:#e8fce8
LA["generate_fake_data.py"]
end
subgraph EC2_1["EC2 Instance"]
subgraph D1["Docker"]
style D1 fill:#d4ebf2
P["Postgres DB"]
end
end
subgraph EC2_2["EC2 Instance"]
A["Airbyte"]
end
subgraph EC2_5["Hosted on AWS"]
S["Snowflake"]
end
subgraph EC2_3["EC2 Instance"]
subgraph D3["Docker"]
style D3 fill:#d4ebf2
D["dbt + Dagster"]
end
end
subgraph EC2_4["EC2 Instance"]
subgraph D4["Docker"]
style D4 fill:#d4ebf2
M["Metabase"]
end
end
L -- "Generates Fake Data" --> P
P -- "Data Ingestion" --> A
A -- "Data Loading" --> S
S -- "Data Transformation" --> D
D -- "Data Transformation" --> S
S -- "Data Visualization" --> M
linkStyle 0 stroke:#2ecd71,stroke-width:2px;
linkStyle 1 stroke:#2ecd71,stroke-width:2px;
linkStyle 2 stroke:#2ecd71,stroke-width:2px;
linkStyle 3 stroke:#2ecd71,stroke-width:2px;
linkStyle 4 stroke:#2ecd71,stroke-width:2px;
linkStyle 5 stroke:#2ecd71,stroke-width:2px;
```
- AWS Account
- AWS CLI (installed and configured)
- Create a Snowflake account and note down the account_id, username and password (a quick connectivity check is sketched below)
- Docker
- docker-compose
- Terraform
You can install these requirements using the following command: `brew install docker docker-compose awscli terraform`
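To confirm the Snowflake credentials noted in the prerequisites, a small connectivity check with the snowflake-connector-python package can help; the environment variable names below are placeholders, not project conventions:

```python
# Quick connectivity check for the Snowflake account_id, username and
# password noted above; the environment variable names are placeholders.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],   # the account_id noted earlier
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
)
cur = conn.cursor()
cur.execute("SELECT CURRENT_ACCOUNT(), CURRENT_USER()")
print(cur.fetchone())
cur.close()
conn.close()
```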
To see a full list of commands, run `make help`.
- Run `make venv-setup` to create your virtual environment
- Run `make initial-config` to set up everything related to containers, container orchestration, permissions, etc.
- Run `make infra-up` to deploy the pipeline to AWS and wait until you see the 'All Ready' message
- Run `make port-forwarding-airbyte` and configure the PostgresDB source and the Snowflake destination (this cannot be done programmatically)
- Run `make port-forwarding-metabase` and configure the Snowflake source (this cannot be done programmatically)
- Explore the remainder of the project by running `make port-forwarding-dbt`, `make ssh-postgres`, `make port-forwarding-dagster`, `make open-snowflake`, or `make print-lambda` to interact with the EC2 instances (port-forwarding, SSH, information, etc.)
- Once you are finished, run `make infra-down` to tear down the infrastructure