A personal data warehouse for trying out different APIs and tools, such as:
- Kafka
- Luigi
- Docker
- MongoDB and PostgreSQL
- Perform ETL tasks with Luigi
- Kafka Streams for the Twitter Stream API
- MongoDB as an archive
- PostgreSQL for the transformed data
- Generate the Docker environment dynamically
- Kafka
- MongoDB
- PostgreSQL
- Python
- Jinja2
- Kafka wrapper
- Luigi
- PyMongo
- SQLAlchemy
- Tweepy
To run this project, add the following environment variables to your .env file:
POSTGRES_USER
POSTGRES_PASSWORD
POSTGRES_HOST
POSTGRES_DB
TWITTER_CONSUMER_KEY
TWITTER_CONSUMER_KEY_SECRET
TWITTER_ACCESS_TOKEN
TWITTER_ACCESS_TOKEN_SECRET
MONGO_USER
MONGO_PASSWORD
MONGO_DB
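A quick way to catch a missing variable before starting anything is a small check like the one below. This is only a sketch: the variable names come from the list above, but the helper `missing_env_vars` is hypothetical and not part of the project, and it assumes the values from .env have already been exported into the process environment (how the project loads the file is not described here).

```python
import os

# Names copied from the list of required variables above.
REQUIRED_VARS = [
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_HOST",
    "POSTGRES_DB",
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_KEY_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
    "MONGO_USER",
    "MONGO_PASSWORD",
    "MONGO_DB",
]


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper; pass a dict to check something other than os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments checks the current environment and returns an empty list when everything is set.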
To install the project, use either pip or poetry:
pip install -r requirements.txt
or
poetry install
Clone the project
git clone https://github.com/stejul/smikic-dwh
Generate the Docker environment
python dwh/utils/create_docker_environment.py
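The internals of that script are not shown in this README. As a rough illustration of what generating a compose file from a template can look like, here is a sketch under stated assumptions: the compose fragment and the `render_compose` helper are hypothetical, and although Jinja2 is listed among the project's libraries, this example uses the standard library's `string.Template` as a stand-in so it runs without extra dependencies.

```python
from string import Template

# Hypothetical compose fragment; the project's real templates are not shown here.
COMPOSE_TEMPLATE = Template(
    "services:\n"
    "  postgres:\n"
    "    image: postgres\n"
    "    environment:\n"
    "      POSTGRES_USER: ${POSTGRES_USER}\n"
    "      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}\n"
    "      POSTGRES_DB: ${POSTGRES_DB}\n"
)


def render_compose(values):
    """Fill the placeholders from a mapping.

    Raises KeyError if a placeholder has no value, so an incomplete
    configuration fails early instead of producing a broken file.
    """
    return COMPOSE_TEMPLATE.substitute(values)
```

Passing a dict with the three `POSTGRES_*` values returns the rendered YAML text, which could then be written to a compose file.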
Change the MongoDB credentials for the MongoDB Kafka connector in:
kafka_docker/connector/MongoSinkConnector.properties
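The contents of that file are not reproduced here. The MongoDB Kafka sink connector reads its connection string from the `connection.uri` property, so, assuming this file sets that property, the line to edit can be built as in the sketch below. The helper names and the default host `mongo:27017` are hypothetical; use the host and port of your own MongoDB container. Credentials are percent-encoded so that characters such as `@` or `:` in a password do not break the URI.

```python
import os
from urllib.parse import quote_plus


def mongo_connection_uri(user, password, host="mongo:27017"):
    """Build a MongoDB URI with percent-encoded credentials.

    The default host is an assumption, not taken from the project.
    """
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}"


def connector_line(environ=None):
    """Return a connection.uri line from MONGO_USER and MONGO_PASSWORD."""
    env = os.environ if environ is None else environ
    uri = mongo_connection_uri(env["MONGO_USER"], env["MONGO_PASSWORD"])
    return f"connection.uri={uri}"
```

Keeping the credentials in sync with `MONGO_USER` and `MONGO_PASSWORD` from the .env file avoids authentication errors when the connector starts.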
Start the Docker environment
docker-compose up