This project involved designing tables to hold data from CSV files, importing the CSV files into a SQL database, and then answering questions about the data. In essence, these three steps correspond to data modeling, data engineering, and data analysis, respectively.
To model the data, I first inspected the CSV files and then sketched an Entity Relationship Diagram (ERD) of the tables using QuickDBD, a free online diagramming tool.
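For illustration, QuickDBD renders a plain-text notation into a diagram. The snippet below is a minimal, hypothetical sketch of that notation; the table and column names are placeholders, not the project's actual schema:

```
# Hypothetical QuickDBD sketch: "PK" marks a primary key,
# and "FK >-" draws a many-to-one relationship to the referenced column.
departments
-
dept_no varchar PK
dept_name varchar

employees
-
emp_no int PK
first_name varchar
last_name varchar

dept_emp
-
emp_no int PK FK >- employees.emp_no
dept_no varchar PK FK >- departments.dept_no
```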
To perform the data engineering, I used the information from the learning module, together with the CSV files themselves, to create a table schema for each of the six CSV files. Creating each schema involved specifying a data type for every column and defining the table's primary and foreign keys. After creating the tables, I imported the data from each CSV file into its corresponding table.
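As a sketch of this step (assuming PostgreSQL, which the pgAdmin reference below suggests, and reusing the hypothetical table names from the ERD snippet above), a schema definition and CSV import might look like this:

```sql
-- Hypothetical schema: data types, primary keys, and foreign keys
-- are illustrative; the real schema follows the project's CSV files.
CREATE TABLE departments (
    dept_no   VARCHAR(4)  PRIMARY KEY,
    dept_name VARCHAR(40) NOT NULL
);

CREATE TABLE employees (
    emp_no     INT         PRIMARY KEY,
    birth_date DATE        NOT NULL,
    first_name VARCHAR(30) NOT NULL,
    last_name  VARCHAR(30) NOT NULL,
    hire_date  DATE        NOT NULL
);

-- A junction table with foreign keys back to each parent table.
CREATE TABLE dept_emp (
    emp_no  INT        NOT NULL REFERENCES employees (emp_no),
    dept_no VARCHAR(4) NOT NULL REFERENCES departments (dept_no),
    PRIMARY KEY (emp_no, dept_no)
);

-- One way to load a CSV in PostgreSQL (the pgAdmin import dialog
-- wraps this same COPY mechanism); the file path is a placeholder.
COPY employees FROM '/path/to/employees.csv' WITH (FORMAT csv, HEADER true);
```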
To perform the data analysis, I executed a series of queries against the database to collect the desired information from each table.
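A representative query, again written against the hypothetical tables sketched above rather than the project's actual ones, might join two tables through the junction table:

```sql
-- List each employee alongside their department name by joining
-- the junction table (dept_emp) to both of its parent tables.
SELECT e.emp_no,
       e.last_name,
       e.first_name,
       d.dept_name
FROM employees AS e
JOIN dept_emp AS de ON de.emp_no = e.emp_no
JOIN departments AS d ON d.dept_no = de.dept_no
ORDER BY e.emp_no;
```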
This exercise deepened my understanding of SQL, specifically how it can be used to engineer and analyze data.
References I consulted for the coding work:
- How to Import or Export CSVs to PostgreSQL Using pgAdmin: https://www.commandprompt.com/education/how-to-import-or-export-csvs-to-postgresql-using-pgadmin/
- How to format dates: https://www.simplilearn.com/tutorials/sql-tutorial/sql-date-format
- Best practices for setting VARCHAR lengths: https://dev.mysql.com/doc/refman/8.0/en/char.html
Some code I looked at for inspiration:
- https://github.com/nickjlupu/Pewlett-Hackard-Analysis/blob/master/Queries/queries.sql
- https://github.com/ermiasgelaye/sql-challenge
I wish to thank my teaching staff:
- Hunter Hollis
- Sam Espe
- Randy Sendek