Skip to content

This repo is the final project of a SQL course.

Notifications You must be signed in to change notification settings

imPBH/ProjetSQL-B2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Projet SQL B2

This repo is the final project of a SQL course.

Summary

Subject

The given subject was:

You’ve been contacted by TeamLikwid to migrate their old database to their new schema. They want you to:

  • Create a .sql script that’ll create the new database schema
  • Create a second script that’ll transfer data from the old DB to the new one

In addition, to make sure they still can access their data, they want you to show the queries + results for the followings:

  • List every tournament for a given game name
  • Given a game name, retrieve the average wage of the players
  • List all tournament by place
  • Get the number of players by gender

Here is the old database schema:

Old schema

Here is the new schema

New schema

Why did it have to be changed

The old database schema wasn't well structured.
For example, the tournament table contains information about the place where it is happening, but this information isn't related to a tournament, it relates to a place, that's why we created a new place table in the new schema.

Another example is the staff, player and coach table. They look pretty much the same, and all those people are employees. So we created a new table employee_data where we store all the common data instead of putting them in the staff, player and coach tables. We kept those tables, but now, they only contain a primary key for their ID, a foreign key for their employee ID, and specific data if needed, like ranking or game for players.

This is also safer, because we can't add anyone into the staff table for if they are not in the employee_data table because of the use of foreign keys.
Another use of foreign keys which was not possible with the old schema, in the new tournament table we can only add a tournament with a place which is in the place table because we are using the place primary key as a foreign key in the tournament table. So inserting a new tournament with an IdPlace that does not exist will result in an error and the tournament will not be inserted because it does not meet the foreign key constraint

How to run

Migration

You need to have sqlite3 installed.

In the same directory of the old database, in a terminal run this command to create an empty database
sqlite3 newdb.db "VACUUM;"

Run this command to initialize the new database with the new schema
sqlite3 newdb.db < db_init.sql

And then run this command to migrate data from the old database to the new one
sqlite3 tpb2.db < migrate.sql

Python script

You need to have Python installed.

Run this command to launch the script
python main.py
or
python3 main.py

Explanations

Initialization script

In this sqlite script, we initialize the new database with the new schema.
We create all the tables, all the columns and their type.

Here is an example of a simple table with a primary key

CREATE TABLE IF NOT EXISTS "employee_data"
(
    IdEmployee INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    Lastname VARCHAR2(30),
    Firstname VARCHAR2(30),
    Gender VARCHAR2(30),
    Age INTEGER,
    Wage INTEGER
);

The first column is IdEmployee, this is the primary key of the table, we want it to be autoincremented so we dont have to specify the ID each time we add a new employee, and this column can't be null.
For Lastname, Firstname and Gender, they are strings with a limit of 30 chars.
The Age and Wage columns are integer numbers.

Here is an example of a table which uses a foreign key

CREATE TABLE IF NOT EXISTS "staff"
(
    IdStaff INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    IdEmployeeData INTEGER NOT NULL,
    FOREIGN KEY(IdEmployeeData) REFERENCES employee_data(IdEmployee)
);

The first column is Idstaff, this is the primary key of the table, we want it to be autoincremented so we don't have to specify the ID each time we add a new staff, and this column can't be null.
We define IdEmployeeData as an integer that can't be null because this will be our foreign key to the employee_data table.
And then we say that IdEmployeeData is a foreign key, and it is a reference of IdEmployee in the employee_data table.

We can use multiple foreign keys in a table

CREATE TABLE IF NOT EXISTS "tournament"
(
    IdTournament INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    IdPlace INTEGER NOT NULL,
    IdGame INTEGER NOT NULL,
    Date VARCHAR2(30) NOT NULL,
    Duration INTEGER,
    FOREIGN KEY(IdPlace) REFERENCES place(IdPlace),
    FOREIGN KEY(IdGame) REFERENCES game(IdGame)
);

Migration script

In this sqlite script, we transfer data from the old database to the new one while respecting the new schema.

We start the script indicating that we are starting a new transaction with the BEGIN command, we declare which tables we use and how we call them.

BEGIN;

ATTACH DATABASE 'tpb2.db' AS old;
ATTACH DATABASE 'newdb.db' AS new;

Now we can start the migration. We should keep in mind that the order is very important here! That's why we migrate data into tables that don't need foreign key in a first place.
So we start by inserting data into the employee_data table.

INSERT INTO new.employee_data (Lastname, Firstname, Gender, Age, Wage) 
SELECT 
  lastname, 
  firstname, 
  gender, 
  age, 
  wage 
FROM 
  old.staff;

This one is pretty simple because all the data we need is available in the staff table in the old database. So we just retrieve all the data we need and insert them into our employee_data table in the new database. We also make the same request for the player and coach tables in the old database because the employee_data need to contain all the data about all the employees, so we put everybody in this table.

Another table that doesn't use foreign keys is the game table. This one is actually the same as in the old database, it only contains an ID as a primary key and the name of the game. So, we just need to retrieve the name into the old database and insert it into the new one, we don't need to retrieve the ID because it will be autoincremented.

INSERT INTO new.game (Name) 
SELECT 
  name 
FROM 
  old.game;

The place table is the last one that doesn't use foreign keys, but this one is a little bit more tricky because we need to retrieve the data about a place into the tournament table from the old database because the old one doesn't have a place table.
We also use a GROUP BY clause because we want to be sure that we don't insert duplicate rows into our new table.

INSERT INTO new.place (Name, Address, City) 
SELECT 
  placeName, 
  address, 
  city 
FROM 
  old.tournament 
GROUP BY 
  placeName, 
  address, 
  city;

Once that is done, we have done all the simplest part! Now it's getting more tricky.
We have to migrate data into a table that uses foreign keys, and so insert data from multiple tables into one new table.

A simple one is on the new staff table.
We only need to insert the IdEmployee from the new employee_data table into it. But we need to make sure the ID we are inserting is part of the staff people ! So we need to compare the information that we already had from the old staff table between the new employee_datatable.
To do that we will compare all the information that we have about one person, so we make sure we are inserting the right person into the staff table.
We can imagine that it is possible that there is more than one person with the same name and last name, even if the odds are low, it is still possible, that's why we compare with all the data we have.

INSERT INTO new.staff (IdEmployeeData) 
SELECT 
  IdEmployee 
FROM 
  new.employee_data, 
  old.staff 
WHERE 
  new.employee_data.Lastname = old.staff.lastname 
  AND new.employee_data.Firstname = old.staff.firstname
  AND new.employee_data.Gender = old.staff.gender
  AND new.employee_data.Age = old.staff.age
  AND new.employee_data.Wage = old.staff.wage;

It will be the exact same process for the player and coach table as they look alike, the only difference is between some columns. For example in the player table, we need to retrieve the IdGame that was already in the old database, we kept the same order for the game table so we don't need to retrieve the ID into our new game table because the ID is already in the old player table.

So for the last one, this one is a little bit different but the logic is the same.
We need to insert the IdPlace, IdGame, date and duration. Our problem is that we can retrieve all this data from the old tournament table, except the IdPlace!
That's why we also retrieve data from the new place table, and to make sure that we are inserting the right value, we compare every value we have about the place. In our case, we compare the name of the place, the address and the city. This is necessary because it can be possible that 2 different places have the same name, but not the same address for example.

INSERT INTO new.tournament (IdPlace, IdGame, Date, Duration) 
SELECT 
  IdPlace, 
  IdGame, 
  date, 
  duration 
FROM 
  old.tournament, 
  new.place 
WHERE 
  old.tournament.placeName = new.place.Name
  AND old.tournament.address = new.place.Address
  AND old.tournament.city = new.place.City;

And at the end of the script, we close the transaction with the COMMIT command.

Python script

This Python script uses the sqlite3 library.
We use it to demonstrate that the new database is fully functional.

In the main function, we just create a connection to our database and create a context by creating a cursor on the database. This context will be provided to functions that need to execute a request in the database.

def main():
    connection = sqlite3.connect("newdb.db")
    context = connection.cursor()

List every tournament for a given game name

To list every tournament for a given game name, we run the print_all_tournaments_for_game function. This function will call the choose_game function which will call the get_games function.
The get_games will return the list of all games in the database, the request used is

SELECT
  Name 
FROM
  game

Once the user has chosen a game, we will execute this request

query = """
    SELECT 
      Date, 
      place.Name, 
      place.Address, 
      place.City, 
      Duration 
    FROM 
      tournament 
      INNER JOIN game on game.IdGame = tournament.IdGame 
      INNER JOIN place on place.IdPlace = tournament.IdPlace 
    WHERE 
      game.Name = :game_name"""

par = {"game_name": game_name}
res = context.execute(query, par)

Here is the outputs for each game one by one
list smash tournaments
list tekken tournaments
list streetfighter tournaments
list lol tournaments
list csgo tournaments
list apex tournaments

Given a game name, retrieve the average wage of the players

To retrieve the average wage of the players by a given game name, we run the print_average_wage_for_game function.
This function will call the choose_game function like the print_all_tournaments_for_game.
The request used is

query = """
    SELECT 
      AVG(Wage) 
    FROM 
      player 
      INNER JOIN employee_data ON employee_data.IdEmployee = player.IdEmployeeData 
      INNER JOIN game ON game.IdGame = player.IdGame 
    WHERE 
      game.Name = :game_name"""

par = {"game_name": game_name}
res = context.execute(query, par)

Here is the outputs for each game one by one
average wage smash
average wage tekken
average wage streetfighter
average wage lol
average wage csgo
average wage apex

List all tournaments by place

To list all tournaments by place, we run the print_tournaments_by_places function.
This function will call the get_places function that will return the list of all places in the database.
The request used to retrieve places is

SELECT 
  * 
FROM 
  place

Once we have all places, we will loop on them and run the get_tournaments_by_place that will return the list of tournaments given a place.
The request used for this is

query = """
    SELECT 
      * 
    FROM 
      tournament 
    WHERE 
      IdPlace = :idPlace"""

par = {"idPlace": place[0]}
res = context.execute(query, par)

Here is the output
list all tournaments by place output

We also could have used this request to do this in only one request

SELECT
  game.Name,
  tournament.Date,
  tournament.Duration,
  place.Name,
  place.Address,
  place.City
FROM
  tournament
INNER JOIN game ON game.IdGame = tournament.IdGame
INNER JOIN place on place.IdPlace = tournament.IdPlace
ORDER BY place.IdPlace

Here is the output
list all tournaments by place alt output
We didn't use this one because it was not possible to print the output like we did with the loop on tournaments.

Get the number of players by gender

To get the number of players by gender, we run the print_number_of_players_by_gender function.
The request used for this is

query = """
    SELECT 
      gender, 
      COUNT(gender) 
    FROM 
      player 
      INNER JOIN employee_data ON employee_data.IdEmployee = player.IdEmployeeData 
    GROUP BY 
      Gender"""

res = context.execute(query)
genders = res.fetchall()

Here is the output
number of players by gender

About

This repo is the final project of a SQL course.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published