This is a demo project that uses Citus on top of Postgres to ingest GitHub events data and run analytical queries. I first came across Citus on Hacker News, in an article about implementing seen-by functionality in Postgres. From there I went to the Citus Data documentation to try it out. This project combines the tutorials in the docs with some additional analysis of my own.
Versions used:
- PostgreSQL version 14.4
- Docker version 20.10.11
- Citus version 11.0
The project's data_sets folder contains event data about public activity on GitHub, plus a users file with data about the GitHub users.
The scripts folder contains the SQL commands needed to create the tables, indexes, and distributed tables.
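
As a rough illustration of what those SQL commands look like, here is a minimal sketch. It assumes the table and column definitions from the Citus GitHub events tutorial, a container named citus, and the postgres user; the files in the scripts folder are the real definitions and may differ.

```shell
#!/usr/bin/env bash
# Sketch only: CONTAINER and DB_USER are assumptions, not values from this project.
CONTAINER="${CONTAINER:-citus}"   # hypothetical container name
DB_USER="${DB_USER:-postgres}"    # assumed database user

create_distributed_tables() {
  # Pipe the SQL into psql running inside the container.
  # The schema is modeled on the Citus tutorial and is an assumption here.
  docker exec -i "$CONTAINER" psql -U "$DB_USER" -v ON_ERROR_STOP=1 <<'SQL'
CREATE TABLE github_users (
    user_id bigint,
    url text,
    login text,
    avatar_url text,
    gravatar_id text,
    display_login text,
    site_admin boolean
);

CREATE TABLE github_events (
    event_id bigint,
    event_type text,
    event_public boolean,
    repo_id bigint,
    payload jsonb,
    repo jsonb,
    user_id bigint,
    org jsonb,
    created_at timestamp
);

-- Index the event type for filtering.
CREATE INDEX event_type_index ON github_events (event_type);

-- Shard both tables on user_id so rows for the same user are co-located.
SELECT create_distributed_table('github_users', 'user_id');
SELECT create_distributed_table('github_events', 'user_id');
SQL
}
```

Calling `create_distributed_tables` runs the statements once. Distributing both tables on the same column lets Citus join events to users on user_id without moving data between nodes.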
NOTE: Because I am running Citus in a Docker container, I had to make some adjustments to copy the .csv data from the data_sets folder into the target tables in Postgres.
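
One possible way to handle this is to copy each file into the container first and then load it with psql's \copy. This is a sketch, not necessarily the exact adjustment used in this project: the container name (citus), the postgres user, and the file and table names in the usage line are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: CONTAINER and DB_USER are assumptions, not values from this project.
CONTAINER="${CONTAINER:-citus}"   # hypothetical container name
DB_USER="${DB_USER:-postgres}"    # assumed database user

load_csv() {
  local host_file="$1" table="$2"
  local container_file="/tmp/$(basename "$host_file")"

  # Copy the CSV from the host into the container's filesystem.
  docker cp "$host_file" "$CONTAINER:$container_file" || return 1

  # \copy is executed by psql itself, which here runs inside the container,
  # so it can read the file that was just copied in.
  docker exec -i "$CONTAINER" psql -U "$DB_USER" \
    -c "\\copy $table FROM '$container_file' WITH (FORMAT csv)"
}
```

For example, `load_csv data_sets/users.csv github_users` would load a hypothetical users.csv into a github_users table. The extra docker cp step is needed because psql inside the container cannot see files on the host.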