Skip to content

paingzinhtun/SQL-for-Data-Engineering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL for Data Engineering: Beginner to Advanced

SQL Case Study Level Project Style

A connected, production-style SQL learning repository built around one end-to-end business case:

Mingalar Mart, a fictional Myanmar retail chain with stores, delivery operations, social commerce, customer activity data, and branch inventory tracking.

This repo is designed to feel like a real Data Engineering project:

  • one connected story,
  • one shared schema,
  • one growing set of business questions,
  • and one clear path from SQL basics to pipeline thinking.

Why This Repo Feels Different

Most SQL repos teach syntax in isolation. This one teaches SQL the way data teams use it:

  • understand the source system,
  • inspect business grain,
  • join messy real-world tables,
  • build correct KPIs,
  • transform raw data into marts,
  • validate logic,
  • and explain tradeoffs clearly.

The whole course stays inside the same Myanmar retail world, so each module builds on the last instead of starting over.

The Business Story

You are the Data Engineer supporting Mingalar Mart's analytics stack.

The business runs across multiple cities in Myanmar and needs help answering questions like:

  • Which branches are strongest by completed sales?
  • Which channels perform best: in_store, delivery_app, facebook_live, or call_center?
  • Which product categories drive revenue in Yangon vs Mandalay?
  • Which customers are becoming repeat buyers?
  • Which products are at stock risk?
  • Which app campaigns generate interest but weak order conversion?

What You Learn

Foundations

  • read schemas and understand grain
  • filter, sort, and inspect retail transactions
  • work safely with keys and relationships

Core Analytics

  • join orders, customers, products, stores, and payments
  • aggregate branch, category, and channel KPIs
  • express business rules with conditional logic

Advanced SQL

  • build staged logic with CTEs
  • use windows for recency, ranking, and running metrics
  • clean raw retail data for reporting

Data Engineering Thinking

  • design staging and mart layers
  • write validation SQL
  • reason about performance and modeling
  • solve real-world retail analytics patterns

Dataset Snapshot

The repository includes a synthetic but realistic Myanmar retail dataset with:

Entity Rows
Stores 12
Suppliers 6
Products 48
Customers 140
Orders 650
Order Items 1,594
Payments 650
App Events 1,311
Delivery Events 1,636
Inventory Snapshots 576

Architecture

flowchart LR
    A[Raw Retail Sources] --> B[Staging SQL]
    B --> C[Intermediate Models]
    C --> D[Retail Marts]
    D --> E[Finance Reporting]
    D --> F[Operations Reporting]
    D --> G[Growth Analytics]
Loading

Repository Tour

README.md                 -> public landing page
ROADMAP.md                -> full curriculum roadmap
docs/storyline.md         -> business context and metric definitions
datasets/schemas/         -> schema and seed SQL
datasets/raw/             -> source-style CSV data
modules/                  -> module lessons from beginner to advanced
exercises/                -> practice prompts and answer keys
queries/                  -> practice SQL, solutions, reusable patterns
projects/                 -> mini-projects and capstones

Learning Journey

Stage Modules Outcome
1 00-02 Understand the business, schema, keys, and raw tables
2 03-05 Build joins, KPIs, and business logic correctly
3 06-08 Use advanced SQL to transform and clean retail data
4 09-12 Think like a Data Engineer: marts, validation, modeling, performance
5 13-16 Build projects, capstones, and interview-ready portfolio work

Quick Start

  1. Read docs/storyline.md
  2. Load course_schema.sql
  3. Load course_seed.sql
  4. Start with modules/00_orientation/README.md
  5. Practice with queries/practice
  6. Solve exercises before checking exercises/solutions

Example Query Style

WITH order_sales AS (
    SELECT
        oi.order_id,
        SUM(oi.line_total_mmk) AS net_item_sales_mmk
    FROM order_items AS oi
    GROUP BY oi.order_id
)
SELECT
    s.city,
    o.sales_channel,
    COUNT(DISTINCT o.order_id) AS completed_orders,
    SUM(os.net_item_sales_mmk + o.delivery_fee_mmk) AS net_order_sales_mmk
FROM orders AS o
INNER JOIN order_sales AS os
    ON o.order_id = os.order_id
INNER JOIN stores AS s
    ON o.store_id = s.store_id
WHERE o.order_status = 'completed'
GROUP BY s.city, o.sales_channel
ORDER BY net_order_sales_mmk DESC;

Best Entry Points

Portfolio Value

By the end of this repo, you should be able to talk through:

  • retail source-to-mart SQL design,
  • branch and channel KPI logic,
  • grain and duplication pitfalls,
  • validation and idempotent transformation patterns,
  • star schema basics,
  • and practical SQL reasoning in interviews.

Suggested Next Moves

  • rebuild one mart in dbt
  • port the models to BigQuery or Snowflake
  • add tests and freshness checks
  • orchestrate refreshes with Airflow

Contributing

See CONTRIBUTING.md. Contributions should preserve the connected Mingalar Mart storyline rather than introducing unrelated toy examples.

About

Production-style SQL for Data Engineering with a connected Myanmar retail case study, realistic datasets, exercises, and projects.

Resources

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors