clean up sqlmesh
matsonj committed Apr 6, 2024
1 parent 6f2c5a9 commit 31fab25
Showing 12 changed files with 108 additions and 84 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -1,5 +1,4 @@
/venv
-/.meltano
.env
/transform/dbt_packages
/transform/.tmp
@@ -15,4 +14,4 @@
/logs
/sqlmesh/logs
/sqlmesh/.cache
/sqlmesh/config.yaml
*.pyc
11 changes: 5 additions & 6 deletions Makefile
@@ -1,19 +1,19 @@
build:
pip install -r requirements.txt
pipx ensurepath
-meltano invoke dbt-duckdb deps
-meltano invoke evidence npm install
+cd transform && dbt deps
+cd evidence && npm install
mkdir -p data/data_catalog/raw
mkdir -p data/data_catalog/prep
mkdir -p data/data_catalog/simulator
mkdir -p data/data_catalog/analysis

run:
-meltano invoke dbt-duckdb build
-meltano invoke evidence npm run sources
+cd transform && dbt build
+cd evidence && npm run sources

dev:
-meltano invoke evidence dev
+cd evidence && npm run dev -- --host 0.0.0.0

serve:
rm -rf evidence/build
@@ -30,7 +30,6 @@ docker-build:
docker-run-evidence:
docker run \
--publish 3000:3000 \
---env MELTANO_CLI_LOG_LEVEL=WARNING \
--env MDS_SCENARIOS=10000 \
--env MDS_INCLUDE_ACTUALS=true \
--env MDS_LATEST_RATINGS=true \
7 changes: 3 additions & 4 deletions README.md
@@ -2,7 +2,7 @@
The latest version of the project is available at [mdsinabox.com](http://www.mdsinabox.com). The website embraces the notion of "Serverless BI" - the pages are built asynchronously with open source software on commodity hardware and then pushed to a static site. The github action that automatically deploys the site upon PR can be [found here](https://github.com/matsonj/nba-monte-carlo/blob/master/.github/workflows/deploy_on_netlify.yml).

# MDS-in-a-box
-This project serves as end to end example of running the "Modern Data Stack" on a single node. The components are designed to be "hot swappable", using Meltano to create clearly defined interfaces between discrete components in the stack. It runs in many enviroments with many visualization options. In addition, the data transformation documentation is [self hosted on github pages](https://matsonj.github.io/nba-monte-carlo/#!/overview).
+This project serves as end to end example of running the "Modern Data Stack" on a single node. The components are designed to be "hot swappable", using makefile to create clearly defined interfaces between discrete components in the stack. It runs in many enviroments with many visualization options. In addition, the data transformation documentation is [self hosted on github pages](https://matsonj.github.io/nba-monte-carlo/#!/overview).
## Many Environments
It runs practically anywhere, and has been tested in the environments below.

@@ -61,8 +61,8 @@ sudo apt-get install python3.9 python3-pip python3.9-venv
```
4. clone the this repo.
```
-mkdir meltano-projects
-cd meltano-projects
+mkdir my_projects
+cd my_projects
git clone https://github.com/matsonj/nba-monte-carlo.git
# Go one folder level down into the folder that git just created
cd nba-monte-carlo
@@ -95,7 +95,6 @@ docker-build:
docker-run-evidence:
docker run \
--publish 8088:8088 \
---env MELTANO_CLI_LOG_LEVEL=WARNING \
--env MDS_SCENARIOS=10000 \
--env MDS_INCLUDE_ACTUALS=true \
--env MDS_LATEST_RATINGS=true \
19 changes: 18 additions & 1 deletion data/nba/nba_results.csv
@@ -1138,4 +1138,21 @@ Wed Apr 3 2024,7:30p,Indiana Pacers,111,Brooklyn Nets,115,Box Score,,17732,Barcl
Wed Apr 3 2024,8:00p,Memphis Grizzlies,111,Milwaukee Bucks,101,Box Score,,17420,Fiserv Forum,
Wed Apr 3 2024,8:00p,Toronto Raptors,85,Minnesota Timberwolves,133,Box Score,,18024,Target Center,
Wed Apr 3 2024,8:00p,Orlando Magic,117,New Orleans Pelicans,108,Box Score,,16427,Smoothie King Center,
-Wed Apr 3 2024,10:00p,Cleveland Cavaliers,101,Phoenix Suns,122,Box Score,,17071,Footprint Center,
+Wed Apr 3 2024,10:00p,Cleveland Cavaliers,101,Phoenix Suns,122,Box Score,,17071,Footprint Center,
+Thu Apr 4 2024,7:30p,Atlanta Hawks,95,Dallas Mavericks,109,Box Score,,20211,American Airlines Center,
+Thu Apr 4 2024,7:30p,Philadelphia 76ers,109,Miami Heat,105,Box Score,,19719,Kaseya Center,
+Thu Apr 4 2024,7:30p,Sacramento Kings,109,New York Knicks,120,Box Score,,19812,Madison Square Garden (IV),
+Thu Apr 4 2024,8:00p,Golden State Warriors,133,Houston Rockets,110,Box Score,,18055,Toyota Center,
+Thu Apr 4 2024,10:00p,Denver Nuggets,100,Los Angeles Clippers,102,Box Score,,19370,Crypto.com Arena,
+Fri Apr 5 2024,7:00p,Orlando Magic,115,Charlotte Hornets,124,Box Score,,16374,Spectrum Center,
+Fri Apr 5 2024,7:00p,Oklahoma City Thunder,112,Indiana Pacers,126,Box Score,,17274,Gainbridge Fieldhouse,
+Fri Apr 5 2024,7:00p,Portland Trail Blazers,108,Washington Wizards,102,Box Score,,18079,Capital One Arena,
+Fri Apr 5 2024,7:30p,Sacramento Kings,100,Boston Celtics,101,Box Score,,,TD Garden,
+Fri Apr 5 2024,8:00p,New York Knicks,100,Chicago Bulls,108,Box Score,,21599,United Center,
+Fri Apr 5 2024,8:00p,Miami Heat,119,Houston Rockets,104,Box Score,,18055,Toyota Center,
+Fri Apr 5 2024,8:00p,Detroit Pistons,90,Memphis Grizzlies,108,Box Score,,16745,FedEx Forum,
+Fri Apr 5 2024,8:00p,Toronto Raptors,117,Milwaukee Bucks,111,Box Score,,17750,Fiserv Forum,
+Fri Apr 5 2024,8:00p,San Antonio Spurs,111,New Orleans Pelicans,109,Box Score,,17422,Smoothie King Center,
+Fri Apr 5 2024,8:30p,Golden State Warriors,106,Dallas Mavericks,108,Box Score,,20425,American Airlines Center,
+Fri Apr 5 2024,10:00p,Minnesota Timberwolves,87,Phoenix Suns,97,Box Score,,17071,Footprint Center,
+Fri Apr 5 2024,10:30p,Utah Jazz,102,Los Angeles Clippers,131,Box Score,,19370,Crypto.com Arena,
2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

106 changes: 53 additions & 53 deletions evidence/package-lock.json

Some generated files are not rendered by default.

8 changes: 4 additions & 4 deletions evidence/package.json
@@ -15,11 +15,11 @@
"type": "module",
"dependencies": {
"@evidence-dev/bigquery": "^2.0.1",
-"@evidence-dev/core-components": "^3.7.0",
-"@evidence-dev/csv": "^1.0.2",
+"@evidence-dev/core-components": "^3.7.2",
+"@evidence-dev/csv": "^1.0.7",
"@evidence-dev/databricks": "^1.0.1",
-"@evidence-dev/duckdb": "^1.0.2",
-"@evidence-dev/evidence": "^31.0.0",
+"@evidence-dev/duckdb": "^1.0.7",
+"@evidence-dev/evidence": "^31.0.2",
"@evidence-dev/mssql": "^1.0.1",
"@evidence-dev/mysql": "^1.0.1",
"@evidence-dev/postgres": "^1.0.1",
14 changes: 4 additions & 10 deletions evidence/pages/about/how_it_works.md
@@ -2,30 +2,24 @@

## Env. config
- devcontainer (python + node)
-- meltano then handles all the python environment stuff

## Extract
-- singer taps / meltano extractors
-- using spreadsheets anywhere tap because it can grab data from basically anywhere
-- leveraging meltano mappers to enhance data with timestamps (for later)
-- invoked with ```meltano run tap-spreadsheets-anywhere mapper-timestamps target-parquet```
+- tbd - extraction is via "copy + paste" today - although looking to implement dlthub "soon"

## Load
-- using singer target / meltano loaders
-- using parquet target as for openness & portability
-- considered target-duckdb but ran into a few issues
+- using seeds in dbt - but related to extract above, have the same considerations

## Transform
- using dbt-duckdb + external tables
- data can be consumed post transformation from either duckdb file or from the output parquet files
- in all other ways, it is a normal dbt-core project
-- invoked with ```meltano invoke dbt-duckdb build```
+- invoked with ```make run```

## Analyze
- using evidence.dev
- can handle some final transforms as well, queries are staged and pages are built out in markdown
- because evidence doesn't support pathing, have to copy files into the evidence directory
-- invoked with ```npm run dev``` and soon from meltano as well
+- invoked with ```make dev```

## Other
- take a look at the [makefile](https://github.com/matsonj/nba-monte-carlo/blob/master/Makefile) and the [deploy github action](https://github.com/matsonj/nba-monte-carlo/blob/master/.github/workflows/deploy_on_netlify.yml) to see how the pieces fit together in prod.
2 changes: 1 addition & 1 deletion evidence/pages/index.md
@@ -5,7 +5,7 @@

Welcome to the [NBA monte carlo simulator](https://github.com/matsonj/nba-monte-carlo) project. Evidence is used as the as data visualization & analysis part of [MDS in a box](https://www.dataduel.co/modern-data-stack-in-a-box-with-duckdb/).

-This project leverages duckdb, meltano, dbt, and evidence and builds and runs about once per day in a github action. You can learn more about this on [this page](/about).
+This project leverages duckdb, make, dbt, and evidence and builds and runs about once per day in a github action. You can learn more about this on [this page](/about).

## [NBA Model](/nba)
