large-scale-data-processing

Assignments for the Large Scale Data Processing course at WUST (Wroclaw University of Science and Technology).

The main goal of the laboratories was to implement a distributed tool for analyzing data collected from Reddit, following a given architecture (see the Architecture diagram in the repository).

Laboratory scopes

L1

  • Linux - bash, ssh, scp, tmux, htop, kill, killall, pipe operator, ls, sed, vim, cat
  • Docker - Dockerfile, docker-compose, containers in general
  • Python - pip, virtualenv, requirements, tox
  • Parallelize computation in Python
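The L1 topic "parallelize computation in Python" usually means distributing CPU-bound work across processes. A minimal stdlib sketch (not taken from the assignments; the `square` function is a hypothetical stand-in for real per-item work):

```python
from multiprocessing import Pool


def square(n: int) -> int:
    """CPU-bound work; stands in for a real per-item computation."""
    return n * n


if __name__ == "__main__":
    # Pool.map splits the iterable across 4 worker processes,
    # sidestepping the GIL for CPU-bound tasks.
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)
```

`multiprocessing` is preferred over threads here because CPython threads share one GIL, so only processes give true CPU parallelism.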

L2

  • Docker - Dockerfile, docker-compose, containers in general
  • Python - pip, requirements
  • Celery
  • Task queue (RabbitMQ)
  • System monitoring (Prometheus / InfluxDB)
  • Reddit API
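Celery with RabbitMQ implements the producer/worker task-queue pattern across machines. An in-process stdlib analogue of that pattern (a sketch of the concept only, not Celery's API; the doubling "work" is a hypothetical placeholder):

```python
import queue
import threading

task_queue: "queue.Queue" = queue.Queue()
results: list = []


def worker() -> None:
    """Consume tasks until a None sentinel arrives (a Celery worker's role)."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        results.append(item * 2)  # stand-in for real work, e.g. fetching Reddit posts


# Two workers, like two Celery worker processes consuming one broker queue.
threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for n in range(5):
    task_queue.put(n)      # the producer enqueues tasks (Celery's .delay())
for _ in threads:
    task_queue.put(None)   # one shutdown sentinel per worker
for t in threads:
    t.join()
print(sorted(results))     # sorted: workers finish in nondeterministic order
```

In the real setup RabbitMQ replaces `queue.Queue` as a durable broker, so producers and workers can live in separate containers or hosts.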

L3

  • Text embedding (magnitude)
  • Data persistency (MongoDB)
  • Data analysis (Redash)
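Embeddings (such as the pre-trained word vectors served by magnitude) are usually compared with cosine similarity. A pure-Python sketch with toy 3-dimensional vectors (the vector values are invented for illustration; real embeddings have hundreds of dimensions):

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings": semantically close words should score higher.
cat = [0.9, 0.1, 0.3]
dog = [0.8, 0.2, 0.4]
car = [0.1, 0.9, 0.2]
print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))  # → True
```

Cosine similarity ignores vector magnitude and measures only direction, which is why it is the default choice for comparing word embeddings.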

L4

  • pySpark
  • Linear regression
  • Binary classification
  • Multi-class classification
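PySpark's `LinearRegression` fits the same ordinary-least-squares model at cluster scale; the closed form for one feature can be sketched in plain Python (a conceptual sketch, not the Spark API, with made-up example data):

```python
def fit_simple_linear_regression(xs: list, ys: list):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points lie exactly on y = 2x + 1, so OLS recovers slope 2 and intercept 1.
slope, intercept = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # → 2.0 1.0
```

Spark distributes exactly these sums and means across partitions, which is what makes the fit scale to data that does not fit on one machine.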

L5

  • Kubernetes
  • K3s
  • Helm
  • Docker
  • Application deployment (AWS EC2)

L6

  • Serving
  • API (Flask)
  • SPA (Streamlit)
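Serving in L6 means exposing results over an HTTP API; Flask handles the routing in the course, but the underlying idea can be sketched with only the stdlib (the `/` endpoint and its payload are hypothetical, not the assignment's API):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ApiHandler(BaseHTTPRequestHandler):
    """Tiny JSON endpoint; Flask offers the same idea with routing sugar."""

    def do_GET(self) -> None:
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        pass  # silence per-request logging


# Port 0 asks the OS for any free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), ApiHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    payload = json.load(resp)
server.shutdown()
print(payload)  # → {'status': 'ok'}
```

A Streamlit SPA would then sit in front of such an API, calling it and rendering the JSON responses interactively.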
