
End-to-end data pipeline test

Code for the post Setting up end-to-end tests for cloud data pipelines

Architecture

This is what our data pipeline architecture looks like.

[Architecture diagram]

For our local setup, we will use

  1. An open-source SFTP server
  2. Moto server to mock S3 and Lambda
  3. Postgres as a stand-in for AWS Redshift
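These three services can be wired up with docker-compose. The sketch below is illustrative only; the image names, ports, and credentials are assumptions, and the repo's own compose file (used by `make up`) may differ:

```yaml
version: "3"
services:
  sftp:
    image: atmoz/sftp        # open-source SFTP server
    ports:
      - "2222:22"
  moto:
    image: motoserver/moto   # mocks S3 and Lambda locally
    ports:
      - "5000:5000"
  warehouse:
    image: postgres          # stand-in for AWS Redshift
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: example
```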

[Local architecture diagram]

Prerequisites & Setup

To run, you will need

  1. Docker
  2. Python 3.6 or above

Clone the repository, create a virtual environment, set the Python path, and spin up the containers as shown below.

git clone https://github.com/josephmachado/e2e_datapipeline_test.git
cd e2e_datapipeline_test
python -m venv ./env
source env/bin/activate # use the virtual environment
pip install -r requirements.txt
make up # spins up the SFTP, Motoserver, and Warehouse docker containers
export PYTHONPATH=${PYTHONPATH}:./src # enable imports from ./src

Run tests

We can run our tests using pytest.

pytest # runs all tests under the ./test folder
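An end-to-end test in this style typically pushes input through every stage (source, transform, warehouse) and asserts on the final state. The sketch below is only illustrative of that shape; all function names are hypothetical and the stages are stubbed inline rather than taken from this repo:

```python
# Illustrative end-to-end test shape (hypothetical names, not from this repo).
# Stubs stand in for the SFTP source, the Lambda transform, and the warehouse.

def extract(sftp_files):
    # Pretend to pull CSV rows from the SFTP server.
    return [line.split(",") for line in sftp_files]

def transform(rows):
    # Pretend Lambda step: uppercase the name column.
    return [[r[0], r[1].upper()] for r in rows]

def load(rows, warehouse):
    # Pretend warehouse load (Postgres stand-in for Redshift).
    warehouse.extend(rows)

def test_pipeline_end_to_end():
    warehouse = []
    load(transform(extract(["1,alice", "2,bob"])), warehouse)
    assert warehouse == [["1", "ALICE"], ["2", "BOB"]]
```

pytest discovers any function named `test_*` under the test folder and reports an assertion failure with the differing values.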

Clean up

make ci

Tear down

make down # stops and removes the docker containers
deactivate # stop using the virtual environment