# Creating and Modifying Tables

In this notebook we present the SQL commands used to create and modify tables in a database. We use Python as our main programming language, together with the pandas and psycopg2 libraries, to run the commands and display the results throughout the notebook.

## Use Case

To present the SQL commands in a context where their application makes sense, let's suppose the following scenario: we want to create a database to keep a record of the Premier League clubs and their players. First of all, we must create two tables in the database, one for the clubs and one for the players.

To start coding, we import the pandas and psycopg2 libraries and create a connection to communicate with the database.

In [12]:
import pandas as pd
import psycopg2 as pg2

connection = pg2.connect(database = 'premier_league', user = 'postgres', password = 'password')

Notice that we've already created a new database called _premier_league_.

Just like in the previous notebook, we'll define the get_data function to avoid memory problems and keep our code cleaner.

In [13]:
def get_data(query, rows = 10):

    with connection.cursor() as cursor:
        cursor.execute(query)

        if rows == 'all':
            raw_data = cursor.fetchall()
        else:
            raw_data = cursor.fetchmany(rows) 

        col_names = [col_desc[0] for col_desc in cursor.description]
        data = pd.DataFrame(raw_data, columns = col_names)

    return data
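
To see how the `rows` argument behaves without a PostgreSQL server, the following sketch mirrors the fetching logic of get_data against an in-memory SQLite database from Python's standard library. The names `fetch_rows`, `demo_connection` and the `demo` table are hypothetical, and the use of `sqlite3` is an assumption made only so the example runs anywhere. In the notebook itself the column names and rows are passed on to `pd.DataFrame`.

```python
import sqlite3
from contextlib import closing

# Stand-in for the PostgreSQL connection: an in-memory SQLite database
# holding a hypothetical table with 25 rows.
demo_connection = sqlite3.connect(":memory:")
demo_connection.execute("CREATE TABLE demo (n INTEGER)")
demo_connection.executemany(
    "INSERT INTO demo (n) VALUES (?)", [(i,) for i in range(25)]
)


def fetch_rows(query, rows=10):
    """Mirror get_data's fetching logic, returning (column names, raw rows)."""
    # sqlite3 cursors are not context managers, so closing() plays the role
    # that `with connection.cursor()` plays with psycopg2.
    with closing(demo_connection.cursor()) as cursor:
        cursor.execute(query)
        if rows == "all":
            raw_data = cursor.fetchall()
        else:
            raw_data = cursor.fetchmany(rows)
        col_names = [col_desc[0] for col_desc in cursor.description]
    return col_names, raw_data


col_names, first_ten = fetch_rows("SELECT n FROM demo")
_, everything = fetch_rows("SELECT n FROM demo", rows="all")
print(col_names, len(first_ten), len(everything))
```

Limiting the default to ten rows keeps large result sets from being loaded into memory unless `rows = 'all'` is requested explicitly.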

Due to the complexity of the SQL commands that we'll be implementing, we need to instantiate a _cursor_ object to use throughout the notebook, unlike in previous notebooks, where we only used the _cursor_ object inside the get_data function.

In [14]:
cursor = connection.cursor()

_We'd like to be able to run this notebook multiple times. Since trying to create tables that already exist would lead to an error, the following code deletes the tables from the database before we try to create them._

In [15]:
delete_tables = 'DROP TABLE IF EXISTS clubs, players'
cursor.execute(delete_tables)

## CREATE TABLE

Now that we've ensured that the _premier_league_ database does not already contain these tables, we can explain how to create them. As mentioned above, we need two tables, one for the clubs and another for the players. Let's focus on the first one.

## Ex. 1

Suppose we want the clubs table to have the following fields: _club_id_, _name_, _stadium_name_, _location_ and _times_champion_. Apart from knowing what columns we want the table to have, we need to determine the data type and the constraints for each column. For example, the name column must consist only of string values (data type) and, given that every club must have a name and that all of those names must be different, those values must be unique and not NULL (constraints). Similar considerations apply to the other columns; the reader can easily infer them by looking at the code.

In [16]:
create_clubs_table = '''
                     CREATE TABLE clubs(
                         club_id SERIAL PRIMARY KEY,
                         name VARCHAR(100) UNIQUE NOT NULL,
                         stadium_name VARCHAR(100), 
                         location VARCHAR(100),
                         times_champion INTEGER CHECK(times_champion >= 0)
                     )
                     '''

cursor.execute(create_clubs_table)

We must point out something here. Look at the line where we define the club_id column: we set the data type to SERIAL and impose a constraint called PRIMARY KEY on that column. It is good practice for every SQL table to have a field that uniquely identifies each row; we define such a column by imposing the PRIMARY KEY constraint on it, which prevents us from assigning the same club_id to two rows. Setting the data type of the club_id column to SERIAL means that, when we add new rows to the table, PostgreSQL will automatically assign a proper club_id for us.
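
The effect of these constraints can be checked by attempting inserts that violate them. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in so that it runs without a server, `INTEGER PRIMARY KEY` replaces SERIAL (which SQLite does not have), and the helper `try_insert` and the club "Example FC" are hypothetical. With psycopg2, the corresponding failures surface as `psycopg2.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE clubs(
        club_id INTEGER PRIMARY KEY,          -- SQLite stand-in for SERIAL
        name VARCHAR(100) UNIQUE NOT NULL,
        stadium_name VARCHAR(100),
        location VARCHAR(100),
        times_champion INTEGER CHECK(times_champion >= 0)
    )
""")

INSERT = "INSERT INTO clubs(name, times_champion) VALUES (?, ?)"


def try_insert(name, times_champion):
    """Return 'ok' if the row is accepted, otherwise the constraint error text."""
    try:
        conn.execute(INSERT, (name, times_champion))
        return "ok"
    except sqlite3.IntegrityError as error:
        return str(error)


results = [
    try_insert("Liverpool FC", 19),   # valid row
    try_insert("Liverpool FC", 19),   # violates UNIQUE on name
    try_insert(None, 0),              # violates NOT NULL on name
    try_insert("Example FC", -1),     # violates CHECK on times_champion
]
for result in results:
    print(result)
```

Only the first insert is accepted; each of the other three is rejected by the constraint named in its comment.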

## Ex. 2

Now it is time to define the players table. The columns of this table are the following: _player_id_, _first_name_, _last_name_, _club_id_ and _nationality_. Let's create the table.

In [17]:
create_players_table = '''
                       CREATE TABLE players(
                           player_id SERIAL PRIMARY KEY,
                           first_name VARCHAR(100) NOT NULL,
                           last_name VARCHAR(100) NOT NULL, 
                           club_id INTEGER REFERENCES clubs(club_id),
                           nationality VARCHAR(100) NOT NULL
                       )
                       '''

cursor.execute(create_players_table)

Notice that we've imposed the constraint REFERENCES clubs(club_id) on the club_id column. This constraint means that the values of that column refer to the values of a column in another table; using it prevents us from assigning a player to a nonexistent club.
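
A minimal sketch of this behaviour follows, again assuming an in-memory SQLite database as a stand-in for PostgreSQL. SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` is issued, whereas PostgreSQL always enforces them. The player rows are placeholder data invented purely for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite needs this pragma to enforce REFERENCES; PostgreSQL does not.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE clubs(club_id INTEGER PRIMARY KEY, name VARCHAR(100) UNIQUE NOT NULL)"
)
conn.execute("""
    CREATE TABLE players(
        player_id INTEGER PRIMARY KEY,
        first_name VARCHAR(100) NOT NULL,
        last_name VARCHAR(100) NOT NULL,
        club_id INTEGER REFERENCES clubs(club_id),
        nationality VARCHAR(100) NOT NULL
    )
""")
conn.execute("INSERT INTO clubs(name) VALUES ('Liverpool FC')")  # receives club_id 1

INSERT_PLAYER = """
    INSERT INTO players(first_name, last_name, club_id, nationality)
    VALUES (?, ?, ?, ?)
"""

# Placeholder player referencing the existing club: accepted.
conn.execute(INSERT_PLAYER, ("Test", "Player", 1, "Testland"))

# Placeholder player referencing club 99, which does not exist: rejected.
try:
    conn.execute(INSERT_PLAYER, ("Other", "Player", 99, "Testland"))
    rejected = False
except sqlite3.IntegrityError as error:
    rejected = True
    print("Rejected:", error)

player_count = conn.execute("SELECT COUNT(*) FROM players").fetchone()[0]
print(player_count)
```

Because club_id in players carries no NOT NULL constraint, a player without a club (club_id set to NULL) would still be accepted.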

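Let's make sure that both tables were created. Since we haven't inserted any rows yet, each query should return an empty table that shows only the column names.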

In [18]:
ask_for_clubs = 'SELECT * FROM clubs'
clubs = get_data(ask_for_clubs)
clubs

| | club_id | name | stadium_name | location | times_champion |
|---|---|---|---|---|---|


In [19]:
ask_for_players = 'SELECT * FROM players'
players = get_data(ask_for_players)
players

| | player_id | first_name | last_name | club_id | nationality |
|---|---|---|---|---|---|


## INSERT Statement



Now that we've created the tables, it's time to insert some rows into them.

## Ex. 3

We can insert one row into the clubs table.

In [20]:
insert_club = '''
             INSERT INTO clubs(name, stadium_name, location, times_champion)
             VALUES
             ('Liverpool FC', 'Anfield', 'Liverpool', 19)
             '''

cursor.execute(insert_club)

## Ex. 4

We can also add several rows with a single statement.

In [21]:
insert_clubs = '''
               INSERT INTO clubs(name, stadium_name, location, times_champion)
               VALUES
               ('Manchester United FC', 'Old Trafford', 'Manchester', 20),
               ('Manchester City FC', 'Etihad Stadium', 'Manchester', 6)
               '''

cursor.execute(insert_clubs)

## Ex. 5
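Another way to add several rows is to write a generic INSERT statement with %s placeholders and execute it once per row, passing the values of each row as a tuple.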

In [23]:
clubs = [
        ('Chelsea FC', 'Stamford Bridge', 'London', 6),
        ('Tottenham Hotspur FC', 'Tottenham Hotspur Stadium', 'London', 2)
        ]

insert_generic_club = '''
                      INSERT INTO clubs(name, stadium_name, location, times_champion)
                      VALUES
                      (%s, %s, %s, %s)
                      '''

for club in clubs:
    cursor.execute(insert_generic_club, club)
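
The loop above can also be expressed with `executemany`, a method of the Python DB-API cursor interface that psycopg2 implements. The sketch below assumes an in-memory SQLite database as a stand-in so that it runs without a server; note that `sqlite3` uses `?` placeholders where psycopg2 uses `%s`. The variable names `new_clubs` and `inserted_names` are introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE clubs(
        club_id INTEGER PRIMARY KEY,
        name VARCHAR(100) UNIQUE NOT NULL,
        stadium_name VARCHAR(100),
        location VARCHAR(100),
        times_champion INTEGER CHECK(times_champion >= 0)
    )
""")

new_clubs = [
    ("Chelsea FC", "Stamford Bridge", "London", 6),
    ("Tottenham Hotspur FC", "Tottenham Hotspur Stadium", "London", 2),
]

# sqlite3 uses "?" placeholders; with psycopg2 these would be "%s".
insert_generic_club = """
    INSERT INTO clubs(name, stadium_name, location, times_champion)
    VALUES (?, ?, ?, ?)
"""

# One call runs the statement once per tuple in the list.
cursor.executemany(insert_generic_club, new_clubs)

inserted_names = [
    row[0] for row in cursor.execute("SELECT name FROM clubs ORDER BY club_id")
]
print(inserted_names)
```

In both variants the values are passed separately from the SQL text, so the driver takes care of quoting them instead of relying on manual string formatting.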

Let's display the clubs table

In [24]:
ask_for_clubs = 'SELECT * FROM clubs'
clubs = get_data(ask_for_clubs)
clubs

| | club_id | name | stadium_name | location | times_champion |
|---|---|---|---|---|---|
| 0 | 1 | Liverpool FC | Anfield | Liverpool | 19 |
| 1 | 2 | Manchester United FC | Old Trafford | Manchester | 20 |
| 2 | 3 | Manchester City FC | Etihad Stadium | Manchester | 6 |
| 3 | 4 | Chelsea FC | Stamford Bridge | London | 6 |
| 4 | 5 | Tottenham Hotspur FC | Tottenham Hotspur Stadium | London | 2 |


Notice that, although we didn't define the club_id value for any of the rows, PostgreSQL has automatically filled in this field.
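
The automatic numbering can be observed on a small scale. This sketch assumes an in-memory SQLite database as a stand-in, where `INTEGER PRIMARY KEY` plays the role of SERIAL; the two mechanisms differ in detail (SERIAL draws its values from a sequence), so it illustrates the idea only. The list `generated_ids` is a name introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE clubs(club_id INTEGER PRIMARY KEY, name VARCHAR(100) UNIQUE NOT NULL)"
)

generated_ids = []
for name in ["Liverpool FC", "Manchester United FC", "Manchester City FC"]:
    # No club_id is supplied; the database chooses it.
    cursor.execute("INSERT INTO clubs(name) VALUES (?)", (name,))
    # lastrowid exposes the identifier that was just generated.
    generated_ids.append(cursor.lastrowid)

print(generated_ids)
```

In PostgreSQL, a comparable way to obtain the generated value is to append `RETURNING club_id` to the INSERT statement and fetch the result from the cursor.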

## Ex. 6

Let's see what happens if we try to 


