Extract data from JSON file, transform it and load it to PostgreSQL
This repository contains two solutions for extracting, transforming and loading data into PostgreSQL: one written in NodeJS and one in Golang.
The focus is on reading the data in a performant way (streaming the input where possible) and, after transforming it, saving it to PostgreSQL using bulk inserts.
In the NodeJS solution I used multi-row inserts; I couldn't get a solution based on `COPY` (via `pg-copy-streams`) to work.
In the Golang solution I was able to use the `COPY` command that PostgreSQL supports.
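As a rough illustration only (not the repository's actual code), a bulk load using pgx's `CopyFrom` could look like the sketch below. The table and column names follow the schema shown further down; the connection string, the sample row and its `kda` value are assumptions.

```go
// Hypothetical sketch: bulk-loading already-transformed rows via PostgreSQL COPY with pgx.
package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v4"
)

func copyMatchups(ctx context.Context, conn *pgx.Conn, rows [][]interface{}) error {
	// CopyFrom issues a COPY ... FROM STDIN under the hood, which is much faster
	// than individual INSERT statements for large batches.
	_, err := conn.CopyFrom(
		ctx,
		pgx.Identifier{"matchups"},
		[]string{"gold_earned", "minions_killed", "kda", "champion", "vision_score",
			"summoner_name", "win", "game_version", "damage_dealt_to_champions", "lane", "region"},
		pgx.CopyFromRows(rows),
	)
	return err
}

func main() {
	ctx := context.Background()
	// Credentials are illustrative; ETL_user/ETL_db match the docker command further down.
	conn, err := pgx.Connect(ctx, "postgres://ETL_user:password@localhost:5432/ETL_db")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	// One transformed record, loosely based on the sample JSON shown below.
	rows := [][]interface{}{
		{14425, 194, 2, "Kassadin", 17, "Borke", false, "11.14.384.6677", 33426, "top", "TR1"},
	}
	if err := copyMatchups(ctx, conn, rows); err != nil {
		log.Fatal(err)
	}
}
```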
The file used for running the ETL pipeline was taken from here. If it's not available, you can also find it in this Google Drive. The default path that the file is loaded from is `./data-to-load`, but you can specify another path by setting the env variable `FILE_PATH`.
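For illustration, resolving the input path could be as simple as the hedged sketch below (the actual code may differ):

```go
// Hypothetical sketch: read FILE_PATH, falling back to the default path.
package main

import (
	"fmt"
	"os"
)

func filePath() string {
	if p := os.Getenv("FILE_PATH"); p != "" {
		return p
	}
	return "./data-to-load"
}

func main() {
	fmt.Println("loading data from:", filePath())
}
```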
Number of records inside `matchups.json`: 1,312,252
This is the original format of a record:
```json
{
  "p_match_id": "TR1_1201957752_top",
  "goldearned": 14425,
  "totalminionskilled": 194,
  "win": "false",
  "kills": 14,
  "assists": 5,
  "deaths": 7,
  "champion": "Kassadin",
  "visionscore": 17,
  "puuid": "phduyQLB8gBjUerFwiVOtyLLHE9jxw7Jq7dwab_CtRddAvzJ7L1uo5kWzLTKSqStAzml_3yGHiNPFA",
  "totaldamagedealttochampions": 33426,
  "summonername": "Borke",
  "gameversion": "11.14.384.6677"
}
```
and we load it into PostgreSQL with the following schema:
```sql
-- uuid_generate_v4() requires the uuid-ossp extension
CREATE TABLE matchups (
    id UUID NOT NULL DEFAULT uuid_generate_v4() PRIMARY KEY,
    gold_earned INTEGER,
    minions_killed INTEGER,
    kda INTEGER,
    champion VARCHAR,
    vision_score INTEGER,
    summoner_name VARCHAR,
    win BOOLEAN,
    game_version VARCHAR,
    damage_dealt_to_champions INTEGER,
    lane VARCHAR,
    region VARCHAR
);
```
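The mapping from the source JSON to these columns could look roughly like the sketch below. Note that the `kda` formula and the parsing of `lane`/`region` out of `p_match_id` are assumptions inferred from the sample record, not necessarily what the repository does.

```go
// Hypothetical sketch of the JSON -> table mapping; the kda formula and the
// lane/region parsing are assumptions.
package etl

import "strings"

type sourceRecord struct {
	PMatchID                    string `json:"p_match_id"`
	GoldEarned                  int    `json:"goldearned"`
	TotalMinionsKilled          int    `json:"totalminionskilled"`
	Win                         string `json:"win"`
	Kills                       int    `json:"kills"`
	Assists                     int    `json:"assists"`
	Deaths                      int    `json:"deaths"`
	Champion                    string `json:"champion"`
	VisionScore                 int    `json:"visionscore"`
	TotalDamageDealtToChampions int    `json:"totaldamagedealttochampions"`
	SummonerName                string `json:"summonername"`
	GameVersion                 string `json:"gameversion"`
}

// transform converts one source record into the column values of the matchups
// table, in the same order used by the COPY sketch above.
func transform(r sourceRecord) []interface{} {
	// "TR1_1201957752_top" -> region "TR1", lane "top" (assumed from the sample id).
	parts := strings.Split(r.PMatchID, "_")
	region, lane := parts[0], parts[len(parts)-1]

	// Integer KDA, guarding against division by zero (assumed formula).
	deaths := r.Deaths
	if deaths == 0 {
		deaths = 1
	}
	kda := (r.Kills + r.Assists) / deaths

	return []interface{}{
		r.GoldEarned, r.TotalMinionsKilled, kda, r.Champion, r.VisionScore,
		r.SummonerName, r.Win == "true", r.GameVersion, r.TotalDamageDealtToChampions,
		lane, region,
	}
}
```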
Some queries you can run after loading the data:

Get the gold_earned for each lane:

```sql
SELECT sum(gold_earned), lane
FROM matchups
GROUP BY lane
ORDER BY sum DESC;
```
Get the champions and number of games with the average kda:

```sql
SELECT count(*) AS games, champion, round(AVG(kda), 0) AS avg_kda
FROM matchups
GROUP BY champion
ORDER BY avg_kda DESC;
```
Get the number of unique records in the table:

```sql
SELECT COUNT(*)
FROM (
    SELECT DISTINCT *
    FROM matchups
) AS unique_rows;
```
Insert data into Postgres from CSV:

```sql
\copy matchups(champion, damage_dealt_to_champions, game_version, gold_earned, win, minions_killed, kda, lane, region, summoner_name, vision_score) from '/usr/log.csv' (FORMAT csv, DELIMITER ',');
```
Connect to the PostgreSQL container:

```sh
docker exec -it postgres psql -U ETL_user -d ETL_db
```
To run the ETL in either version you need:

- the `.env` file set up, or the correct environment variables provided (a hedged example is shown after this list)
- a running Postgres instance; a `docker-compose.yaml` is provided in the repo for running Postgres, and you can start it with `make docker-start`
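As a rough example only, a `.env` could look like the sketch below. `FILE_PATH` and the `ETL_user`/`ETL_db` values come from this README; the remaining variable names are assumptions, so check the repository for the actual ones.

```sh
# Hypothetical .env sketch; variable names other than FILE_PATH are assumptions.
FILE_PATH=./data-to-load
POSTGRES_USER=ETL_user
POSTGRES_PASSWORD=changeme
POSTGRES_DB=ETL_db
```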
Node

- Build code: `make node-build`
- Run code (after building): `make node-run`
Golang

- Build code: `make go-build`
- Run code (after building): `make go-run`
- Run linter: `make go-lint`
- Data Imports wiki doc for the NodeJS Postgres Driver
- Performance Boost wiki doc for the NodeJS Postgres Driver
- Multi-row insert with pg-promise
- PostgreSQL - Populating a database
- Golang PGX
- Golang Profiling