This repository contains the setup for the data modeling modules in Weeks 1 and 2.
🔧 Tech Stack
- Git
- Postgres
- PSQL CLI
- Database management environment (DataGrip, DBeaver, VS Code with extensions, etc.)
- Docker, Docker Compose, and Docker Desktop
📝 TL;DR
- Clone the repository.
- Start Postgres instance.
- Connect to Postgres using your preferred database management tool.
For detailed instructions and more information, please refer to the step-by-step instructions below.
-
Clone the repo using the SSH link. This will create a new folder in the current directory on your local machine.
git clone git@github.com:DataExpert-io/data-engineer-handbook.git
ℹ️ To securely interact with GitHub repositories, it is recommended to use SSH keys. Follow the instructions provided here to set up SSH keys on GitHub.
-
Navigate into the cloned repo using the command line:
cd data-engineer-handbook/bootcamp/materials/1-dimensional-data-modeling
There are two methods to get Postgres running locally.
-
Install Postgres
-
Run this command after replacing
<computer-username>
with your computer's username:psql -U <computer-username> postgres < data.dump
-
Set up DataGrip, DBeaver, or your VS Code extension to point at your locally running Postgres instance.
-
Have fun querying!
-
Install Docker Desktop from here.
-
Copy
example.env
to.env
:cp example.env .env
-
Start the Docker Compose container:
-
If you're on Mac:
make up
-
If you're on Windows:
docker compose up -d
-
-
A folder named
postgres-data
will be created in the root of the repo. The data backing your Postgres instance will be saved here. -
You can check that your Docker Compose stack is running by either:
- Going into Docker Desktop: you should see an entry there with a drop-down for each of the containers running in your Docker Compose stack.
- Running
docker ps -a
and looking for the containers with the namepostgres
.
-
When you're finished with your Postgres instance, you can stop the Docker Compose containers with:
make down
Or if you're on Windows:
docker compose down -v
Refer to the instructions below to resolve the issue when the data dump fails to load tables, displaying the message
PostgreSQL Database directory appears to contain a database; Skipping initialization.
-
Some options for interacting with your Postgres instance:
- DataGrip - JetBrains; 30-day free trial or paid version.
- VSCode built-in extension (there are a few of these).
- PGAdmin.
- Postbird.
-
Using your client of choice, follow the instructions to establish a new PostgreSQL connection.
- The default username is
postgres
and corresponds to$POSTGRES_USER
in your.env
. - The default password is
postgres
and corresponds to$POSTGRES_PASSWORD
in your.env
. - The default database is
postgres
and corresponds to$POSTGRES_DB
in your.env
. - The default host is
localhost
or0.0.0.0
. This is the IP address of the Docker container running the PostgreSQL instance. - The default port for Postgres is
5432
. This corresponds to the$CONTAINER_PORT
variable in the.env
file.
→ 💡 You can edit these values by modifying the corresponding values in
.env
. - The default username is
-
If the test connection is successful, click "Finish" or "Save" to save the connection. You should now be able to use the database client to manage your PostgreSQL database locally.
- If you are on Windows and used
docker compose up
, table creation and data load will not take place with container creation. Once you have docker container up and verified that you are able to connect to empty postgres database with your own choice of client, follow the following steps:
- On Docker desktop, connect to my-postgres-container terminal.
- Run:
psql \ -v ON_ERROR_STOP=1 \ --username $POSTGRES_USER \ --dbname $POSTGRES_DB \ < /docker-entrypoint-initdb.d/data.dump>
- → This will run the file
data.dump
from inside your docker container.
- → This will run the file
- If the tables don't come with the loaded data, follow these steps with manual installation of postgres:
-
Find where your
psql
client is installed (Something likeC:\\Program Files\\PostgreSQL\\13\\runpsql.bat
) -
Make sure you're in the root of the repo, and launch
psql
by running that.bat
script -
Enter your credentials for postgres (described in the connect to postgres section)
- → If the above worked, you should now be inside a psql REPL (It looks like
postgres=#
)
- → If the above worked, you should now be inside a psql REPL (It looks like
-
Run:
postgres=# \\i data.dump
- → This will run the file
data.dump
from inside your psql REPL.
- → This will run the file
- If you did the setup using Option 2, and the tables are not in the database, another solution is to:
- Find the container id by running
docker ps
- under CONTAINER ID - Go inside the container by executing
docker exec -it <container_name_or_id> bash
- Run
pg_restore -U $POSTGRES_USER -d $POSTGRES_DB /docker-entrypoint-initdb.d/data.dump
- To restart the Postgres instance, you can run
make restart
. - To see logs from the Postgres container, run
make logs
. - To inspect the Postgres container, run
make inspect
. - To find the port Postgres is running on, run
make ip
.