Sample project that uses Dagster, dbt, DuckDB, and Dash to visualize the Spanish car and motorcycle market

coches-net-dashboard

Coches.net dashboard is an application made by @franloza and @Jorjatorz to visualize the Spanish car and motorcycle market using data obtained from the websites coches.net and motos.coches.net.

This project consists of a data pipeline run with Dagster, which extracts data from coches.net, applies some transformations using dbt, and stores the result in a DuckDB database. The data can then be analyzed with a dashboard built with Dash.

This application is not suitable for production; it is intended only for local use for analytical purposes.

This project can also serve as an example of how to structure an end-to-end data application that is ready to be deployed with a single command using Docker.

Quickstart

To run the application, run:

docker-compose up -d

This builds the Docker images (Dagster and Dash) and starts both services. The Dagster dashboard is then available at http://localhost:3000 and the Dash dashboard at http://localhost:80.
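Because the first build can take a few minutes, it may help to poll both URLs before opening them in a browser. The following is a minimal sketch using only the Python standard library; the function names `is_up` and `wait_for` are hypothetical and not part of this repository, and the URLs are the ones given above.

```python
import time
import urllib.error
import urllib.request

# URLs taken from this README; adjust them if you remap ports in docker-compose.
SERVICES = {
    "dagster": "http://localhost:3000",
    "dash": "http://localhost:80",
}

# The targets are local, so bypass any proxy configured in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, whatever the status code."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is listening.
        return True
    except OSError:
        # Connection refused, timeout, etc. (URLError is a subclass of OSError).
        return False


def wait_for(url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll `url` until it responds or the attempts run out."""
    for _ in range(attempts):
        if is_up(url):
            return True
        time.sleep(delay)
    return False
```

For example, `all(wait_for(url) for url in SERVICES.values())` returns `True` once both dashboards respond.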

To download the data, go to the Dagster dashboard, open the "Launchpad" tab, and click "Launch Run". Once all processes have completed, go to the Dash dashboard, add some filters, and click "Search" to visualize the data.

The process to download the data can take a while to finish. If you want to limit the amount of data that is downloaded, you can add the following configuration in the Launchpad:

ops:
  download_coches:
    config:
      max_items: <maximum number of cars to download>
  download_motos:
    config:
      max_items: <maximum number of motorcycles to download>
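If you generate this configuration from a script rather than typing it into the Launchpad, the same structure can be built as a Python dictionary. This is a sketch only: `build_run_config` is a hypothetical helper, not part of this repository, and rejecting values below 1 is an assumption of the sketch rather than a documented rule. The op names and the `max_items` key are the ones shown above.

```python
from typing import Any, Dict, Optional


def build_run_config(max_cars: Optional[int] = None,
                     max_motos: Optional[int] = None) -> Dict[str, Any]:
    """Build a run config dict mirroring the YAML above.

    Ops without a limit are left out, so they keep their default behavior.
    """
    limits = {"download_coches": max_cars, "download_motos": max_motos}
    ops: Dict[str, Any] = {}
    for op_name, limit in limits.items():
        if limit is None:
            continue
        # Assumption of this sketch: a limit below 1 is treated as a mistake.
        if limit < 1:
            raise ValueError(f"max_items for {op_name} must be >= 1, got {limit}")
        ops[op_name] = {"config": {"max_items": limit}}
    return {"ops": ops} if ops else {}
```

For example, `build_run_config(max_cars=100)` limits only the car download and leaves the motorcycle download unrestricted. A dictionary of this shape is what Dagster accepts as `run_config` when a job is launched from Python.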

Screenshots

[Screenshots: DAG; Dagster Launchpad; Tasks successful; Dashboard]

Local Development (Without using Docker)

Prerequisites

You need to have pyenv installed. Installation instructions are available in the pyenv documentation.

Orchestration and transformation

  1. Create a new Python environment and activate it.
# You can use a different Python version
export PYTHON_VERSION=3.8.14
pyenv install -s $PYTHON_VERSION
pyenv local $PYTHON_VERSION
python -m venv venv
source venv/bin/activate
  2. Once you have activated your Python environment, install the orchestration repository as a Python package. With the --editable flag, pip installs the package in "editable mode", so local code changes take effect without reinstalling.
pip install -r requirements.txt --editable orchestration
  3. Set the DAGSTER_HOME environment variable. Dagster will store run history in this directory.
mkdir -p ~/dagster_home
export DAGSTER_HOME=~/dagster_home
cat <<EOT >> $DAGSTER_HOME/dagster.yaml
telemetry:
  enabled: false
EOT
  4. Start the Dagit process. This starts a Dagit web server that, by default, is served at http://localhost:3000.
dagit -w orchestration/workspace.yaml
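Note that the DAGSTER_HOME step appends to dagster.yaml every time it is run, so repeating it duplicates the telemetry block. An idempotent variant can be sketched in Python as follows; `init_dagster_home` is a hypothetical helper, not part of this repository, and the environment variable it sets is visible only to the current process and its children.

```python
import os
from pathlib import Path

# Same content the shell snippet writes to dagster.yaml.
TELEMETRY_BLOCK = "telemetry:\n  enabled: false\n"


def init_dagster_home(home: Path) -> Path:
    """Create `home` and make sure its dagster.yaml disables telemetry."""
    home.mkdir(parents=True, exist_ok=True)
    config = home / "dagster.yaml"
    existing = config.read_text() if config.exists() else ""
    # Append only when no telemetry section exists, so repeated calls
    # do not duplicate the key.
    if "telemetry:" not in existing:
        with config.open("a") as fh:
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(TELEMETRY_BLOCK)
    # Affects this process and its children only, like `export` in a shell.
    os.environ["DAGSTER_HOME"] = str(home)
    return config
```

Calling `init_dagster_home(Path.home() / "dagster_home")` mirrors the shell commands above.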

To debug the ingestion jobs, you can use the helper module main.py, which provides an entry point that can be run from IDEs such as VS Code or PyCharm.

Visualization

  1. Create a new Python environment and activate it.
cd visualization
deactivate >/dev/null 2>&1 || true
# You can use a different Python version
export PYTHON_VERSION=3.8.14
pyenv install -s $PYTHON_VERSION
pyenv local $PYTHON_VERSION
python -m venv venv
source venv/bin/activate
  2. Once you have activated your Python environment, install the dependencies using the requirements.txt file.
pip install -r requirements.txt

  3. Start the Gunicorn process. This starts the Dash web application, which is served at http://localhost:80.
gunicorn -b 0.0.0.0:80 app:server