large-scale-data-processing

Assignments for the Large Scale Data Processing course at WUST (Wroclaw University of Science and Technology).

The main goal of the laboratories was to implement a distributed tool for analyzing data collected from Reddit, following a given architecture (see the Architecture diagram in the repository).

Laboratory scopes

L1

  • Linux - bash, ssh, scp, tmux, htop, kill, killall, pipe operator, ls, sed, vim, cat
  • Docker - Dockerfile, docker-compose, containers in general
  • Python - pip, virtualenv, requirements, tox
  • Parallelize computation in Python
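The L1 topic "parallelize computation in Python" usually means distributing CPU-bound work across processes. A minimal stdlib sketch (not taken from the assignments; the `square` function is a hypothetical stand-in for real per-item work):

```python
from multiprocessing import Pool


def square(n: int) -> int:
    """CPU-bound work; stands in for a real per-item computation."""
    return n * n


if __name__ == "__main__":
    # Pool.map splits the iterable across 4 worker processes,
    # sidestepping the GIL for CPU-bound tasks.
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)
```

`multiprocessing` is preferred over threads here because CPython threads share one GIL, so only processes give true CPU parallelism.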

L2

  • Docker - Dockerfile, docker-compose, containers in general
  • Python - pip, requirements
  • Celery
  • Task queue (RabbitMQ)
  • System monitoring (Prometheus / InfluxDB)
  • Reddit API
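Celery with RabbitMQ implements the producer/worker task-queue pattern across machines. An in-process stdlib analogue of that pattern (a sketch of the concept only, not Celery's API; the doubling "work" is a hypothetical placeholder):

```python
import queue
import threading

task_queue: "queue.Queue" = queue.Queue()
results: list = []


def worker() -> None:
    """Consume tasks until a None sentinel arrives (a Celery worker's role)."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        results.append(item * 2)  # stand-in for real work, e.g. fetching Reddit posts


# Two workers, like two Celery worker processes consuming one broker queue.
threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for n in range(5):
    task_queue.put(n)      # the producer enqueues tasks (Celery's .delay())
for _ in threads:
    task_queue.put(None)   # one shutdown sentinel per worker
for t in threads:
    t.join()
print(sorted(results))     # sorted: workers finish in nondeterministic order
```

In the real setup RabbitMQ replaces `queue.Queue` as a durable broker, so producers and workers can live in separate containers or hosts.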

L3

  • Text embedding (magnitude)
  • Data persistency (MongoDB)
  • Data analysis (Redash)
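Embeddings (such as the pre-trained word vectors served by magnitude) are usually compared with cosine similarity. A pure-Python sketch with toy 3-dimensional vectors (the vector values are invented for illustration; real embeddings have hundreds of dimensions):

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings": semantically close words should score higher.
cat = [0.9, 0.1, 0.3]
dog = [0.8, 0.2, 0.4]
car = [0.1, 0.9, 0.2]
print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))  # → True
```

Cosine similarity ignores vector magnitude and measures only direction, which is why it is the default choice for comparing word embeddings.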

L4

  • pySpark
  • Linear regression
  • Binary classification
  • Multi-class classification
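PySpark's `LinearRegression` fits the same ordinary-least-squares model at cluster scale; the closed form for one feature can be sketched in plain Python (a conceptual sketch, not the Spark API, with made-up example data):

```python
def fit_simple_linear_regression(xs: list, ys: list):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points lie exactly on y = 2x + 1, so OLS recovers slope 2 and intercept 1.
slope, intercept = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # → 2.0 1.0
```

Spark distributes exactly these sums and means across partitions, which is what makes the fit scale to data that does not fit on one machine.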

L5

  • Kubernetes
  • K3s
  • Helm
  • Docker
  • Application deployment (AWS EC2)

L6

  • Serving
  • API (Flask)
  • SPA (Streamlit)
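Serving in L6 means exposing results over an HTTP API; Flask handles the routing in the course, but the underlying idea can be sketched with only the stdlib (the `/` endpoint and its payload are hypothetical, not the assignment's API):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ApiHandler(BaseHTTPRequestHandler):
    """Tiny JSON endpoint; Flask offers the same idea with routing sugar."""

    def do_GET(self) -> None:
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        pass  # silence per-request logging


# Port 0 asks the OS for any free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), ApiHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    payload = json.load(resp)
server.shutdown()
print(payload)  # → {'status': 'ok'}
```

A Streamlit SPA would then sit in front of such an API, calling it and rendering the JSON responses interactively.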
