Assignments for the Large Scale Data Processing course at WUST (Wroclaw University of Science and Technology).
The main goal of the laboratories was to implement a distributed tool, built to a given architecture, for analyzing data collected from Reddit. The labs covered the following topics and tools:

- Linux - bash, ssh, scp, tmux, htop, kill, killall, pipe operator, ls, sed, vim, cat
- Docker - Dockerfile, docker-compose, containers in general
- Python - pip, virtualenv, requirements, tox
- Parallelize computation in Python
- Celery
- Task queue (RabbitMQ)
- System monitoring (Prometheus / InfluxDB)
- Reddit API
- Text embedding (magnitude)
- Data persistence (MongoDB)
- Data analysis (Redash)
- PySpark
  - Linear regression
  - Binary classification
  - Multi-class classification
- Kubernetes
  - K3s
  - Helm
- Docker
- Application deployment (AWS EC2)
- Serving
  - API (Flask)
  - SPA (Streamlit)
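
As an illustration of the "Parallelize computation in Python" topic listed above, here is a minimal sketch using only the standard library. It is not taken from the course assignments: the function names `count_words` and `parallel_word_count` and the sample post titles are hypothetical, chosen to resemble counting words in Reddit posts.

```python
from collections import Counter
from concurrent.futures import Executor, ThreadPoolExecutor


def count_words(text: str) -> Counter:
    """Count lowercase, whitespace-separated tokens in one text."""
    return Counter(text.lower().split())


def parallel_word_count(texts: list[str], executor: Executor) -> Counter:
    """Run count_words over all texts on the given executor and merge the results."""
    total = Counter()
    # executor.map schedules one task per text and yields results in input order.
    for partial in executor.map(count_words, texts):
        total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical sample data standing in for Reddit post titles.
    posts = ["Docker is great", "docker compose up", "Celery and RabbitMQ"]
    # Threads suit I/O-bound work; for CPU-bound work a
    # concurrent.futures.ProcessPoolExecutor can be passed in instead.
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(parallel_word_count(posts, pool).most_common(3))
```

Passing the executor in as a parameter keeps the counting logic independent of whether the work runs on threads or on separate processes.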