The purpose of this project was to create a SQL database from 6 CSV files containing various employee data from a fictitious company and analyze the data. The first step was to examine the CSV files and determine all the relationships between the fields and create an Entity Relationship Diagram (see below), paying attention to data types, primary and foreign keys, and other possible constraints. The next step was to create the necessary tables in the database (see "table_setups.sql") and load in the CSV files. Once this was complete, several queries were run (see "employee_queries.sql") to analyze the data. As a final step, a jupyter notebook was written (see "SalaryAnalysis.ipynb") to explore the data in the database even further.
- "SQL Queries" (folder)
- "table_setups.sql" (schemata for creating the tables in postgresql)
- "employee_queries.sql" (several examples of postgresql queries)
- "table_setups.sql" (schemata for creating the tables in postgresql)
- "Data Files" (folder)
- "departments.csv" (input data file if needed)
- "dept_emp.csv" (input data file if needed)
- "dept_manager.csv" (input data file if needed)
- "employees.csv" (input data file if needed)
- "salaries.csv" (input data file if needed)
- "titles.csv" (input data file if needed)
- "departments.csv" (input data file if needed)
- "Analysis Code" (folder)
- "SalaryAnalysis.ipynb" (jupyter notebook file containing code for further salary analysis)
- "SalaryAnalysis.ipynb" (jupyter notebook file containing code for further salary analysis)
- "Images" (folder)
- "TableRelationshipDiagram.png" (table relationship diagram)
- "TableRelationshipDiagram.png" (table relationship diagram)
The csv files in subfolder "data" are only needed if re-building the tables from scratch.
The Jupyter notebook requires you to either hardcode your postgresql password into the cell where indicated
or put your postgresql password into a config.py file before running.