<a href="https://colab.research.google.com/github/shubham-75/DSA/blob/main/SQLite_DB_Exercise_GH.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**STEP 1: Create the SQLite database:**


We need to import the sqlite3 module and create the database and tables.  You'll see this follows the syntax we have used in previous weeks.

Note that we have created the student table with a primary key that is not an INTEGER.

Is this good practice?  
What are the issues and benefits of doing this?
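One way to explore these questions is a small experiment on a throwaway in-memory database. The sketch below uses a connection named demo (a hypothetical name, kept separate from the notebook's student_grades.db) and shows two behaviours of a TEXT primary key in an ordinary SQLite table: duplicate values are rejected, but a NULL value is accepted unless the column is also declared NOT NULL.

```python
import sqlite3

# Throwaway in-memory database; "demo" is an illustrative name only.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE student (ID TEXT PRIMARY KEY, First TEXT NOT NULL, Last TEXT NOT NULL)"
)
demo.execute("INSERT INTO student VALUES ('S1001', 'Alice', 'Smith')")

# A second row with the same ID violates the primary key.
try:
    demo.execute("INSERT INTO student VALUES ('S1001', 'Bob', 'Brown')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A NULL ID is accepted because the column was not declared NOT NULL.
demo.execute("INSERT INTO student VALUES (NULL, 'No', 'Id')")
null_id_rows = demo.execute(
    "SELECT COUNT(*) FROM student WHERE ID IS NULL"
).fetchone()[0]

print(duplicate_rejected, null_id_rows)
```

This matches the NotNull: 0 value reported for the ID column by PRAGMA table_info in STEP 2. Declaring the column as ID TEXT PRIMARY KEY NOT NULL would close that gap.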

In [1]:
import sqlite3

#This statement creates a connection called conn.  The same connection is reused throughout the notebook so that every query runs against the same database.
conn = sqlite3.connect('student_grades.db')
cursor = conn.cursor()

#create the student table - we've set the ID to be a primary key.  Is it good practice to make the primary key a TEXT string?
cursor.execute('''
CREATE TABLE IF NOT EXISTS student (
  ID TEXT PRIMARY KEY,
  First TEXT NOT NULL,
  Last TEXT NOT NULL
)
''')

#create the grade table - no primary key provided, because a student can appear many times in the table, as can a course.
cursor.execute('''
CREATE TABLE IF NOT EXISTS grade (
  ID TEXT,
  Code TEXT NOT NULL,
  Mark INTEGER NOT NULL
)
''')

#create the course table - primary key provided, again set as TEXT.
cursor.execute('''
CREATE TABLE IF NOT EXISTS course (
  Code TEXT PRIMARY KEY,
  Title TEXT NOT NULL
)
''')

#commit() saves any pending changes to the database file.  Changes made inside an open transaction (for example INSERT statements) are not saved permanently until they are committed.
conn.commit()
#conn.close()
print("Database and tables created successfully!")


Database and tables created successfully!
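The effect of commit() is easiest to see next to rollback(). The following is a minimal sketch on a throwaway in-memory database (demo is a hypothetical connection name), assuming the sqlite3 module's default transaction handling: an INSERT that is rolled back disappears, while an INSERT that has been committed survives a later rollback.

```python
import sqlite3

# Throwaway in-memory database; "demo" is an illustrative name only.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE course (Code TEXT PRIMARY KEY, Title TEXT NOT NULL)")
demo.commit()

# Insert a row, then undo it before it has been committed.
demo.execute("INSERT INTO course VALUES ('C101', 'Mathematics 101')")
demo.rollback()
rows_after_rollback = demo.execute("SELECT COUNT(*) FROM course").fetchone()[0]

# Insert the row again and commit it; a later rollback no longer removes it.
demo.execute("INSERT INTO course VALUES ('C101', 'Mathematics 101')")
demo.commit()
demo.rollback()
rows_after_commit = demo.execute("SELECT COUNT(*) FROM course").fetchone()[0]

print(rows_after_rollback, rows_after_commit)
```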


**STEP 2: Check Tables Created:**

Run the command to show the database tables created and their structure.

In [2]:
# prompt: show the table structures

#import sqlite3

#conn = sqlite3.connect('student_grades.db')
#cursor = conn.cursor()

cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = cursor.fetchall()

for table_name in tables:
    print(f"Table: {table_name[0]}")
    cursor.execute(f"PRAGMA table_info({table_name[0]});")
    columns = cursor.fetchall()
    for col in columns:
        print(f"  Column: {col[1]}, Type: {col[2]}, NotNull: {col[3]}, DefaultVal: {col[4]}, PrimaryKey: {col[5]}")
    print("-" * 20)

#conn.close()


Table: student
  Column: ID, Type: TEXT, NotNull: 0, DefaultVal: None, PrimaryKey: 1
  Column: First, Type: TEXT, NotNull: 1, DefaultVal: None, PrimaryKey: 0
  Column: Last, Type: TEXT, NotNull: 1, DefaultVal: None, PrimaryKey: 0
--------------------
Table: grade
  Column: ID, Type: TEXT, NotNull: 0, DefaultVal: None, PrimaryKey: 0
  Column: Code, Type: TEXT, NotNull: 1, DefaultVal: None, PrimaryKey: 0
  Column: Mark, Type: INTEGER, NotNull: 1, DefaultVal: None, PrimaryKey: 0
--------------------
Table: course
  Column: Code, Type: TEXT, NotNull: 0, DefaultVal: None, PrimaryKey: 1
  Column: Title, Type: TEXT, NotNull: 1, DefaultVal: None, PrimaryKey: 0
--------------------


**STEP 3: Upload Files:**

Run this box three times to upload the relevant CSV files:

Course_Table.csv, Student_Table.csv & Grade_Table.csv

In [4]:


from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))


**STEP 4: Load CSV files into the database tables:**

This will populate the database tables with the data from the CSV files.  No need to write INSERT statements by hand.

You need to make sure each file is loaded into its corresponding table.

In [8]:
import csv
def import_csv_to_table(csv_file, table_name):
    #opens the file as read-only ('r'), so the original CSV cannot be changed.
    with open(csv_file, 'r', encoding='utf-8') as file:
        csv_reader = csv.reader(file)
        next(csv_reader)  # Skip the header row (assumes every CSV file has one)
        for row in csv_reader:
            #creates a '?' placeholder for each column in the CSV row, e.g. ['?','?','?'] - join turns the list into the string '?, ?, ?' for the INSERT statement.
            # passing the values through '?' placeholders reduces the risk of SQL injection
            placeholders = ', '.join(['?' for _ in row])
            #Assumes that the CSV and table have the same structure (this could be an issue).  You would have to specify column names if they differ.
            sql = f"INSERT INTO {table_name} VALUES ({placeholders})"
            cursor.execute(sql, row)

# Import data from the CSV files into the relevant tables - Student_Table.csv goes into the student table.  import_csv_to_table is the function; the file name and table name are passed to it.
try:
    import_csv_to_table('Student_Table.csv', 'student')
    import_csv_to_table('Course_Table.csv', 'course')
    import_csv_to_table('Grade_Table.csv', 'grade')
    conn.commit()
    print("Data imported successfully!")
except Exception as e:
    print(f"An error occurred: {e}")
    conn.rollback()  # Rollback changes if an error occurred



Data imported successfully!
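The import function above relies on the CSV columns being in the same order as the table columns. A possible variant, sketched below under stated assumptions, names the columns explicitly and inserts all rows with executemany. The function import_rows, the connection demo and the inline CSV text are hypothetical stand-ins so the example runs without uploading a file.

```python
import csv
import io
import sqlite3

# Inline CSV text stands in for an uploaded file (illustrative data only).
csv_text = "Title,Code\nMathematics 101,C101\nFinance Management,C115\n"

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE course (Code TEXT PRIMARY KEY, Title TEXT NOT NULL)")

def import_rows(conn, file_obj, table, columns):
    """Insert CSV rows by column name, so the CSV column order does not matter."""
    reader = csv.DictReader(file_obj)  # uses the header row as dictionary keys
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    rows = [tuple(record[name] for name in columns) for record in reader]
    conn.executemany(sql, rows)
    return len(rows)

inserted = import_rows(demo, io.StringIO(csv_text), "course", ["Code", "Title"])
demo.commit()

stored = demo.execute("SELECT Code, Title FROM course ORDER BY Code").fetchall()
print(inserted, stored)
```

The table and column names are still placed in the SQL text with an f-string, so they should only ever come from your own code, never from user input. Only the values go through the ? placeholders.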


**STEP 5: Check Data has loaded**

Query each database table, load the data into a DataFrame and display the first 5 rows.

In [9]:
import pandas as pd
# Query all three tables and load into pandas DataFrames
student_df = pd.read_sql_query("SELECT * FROM student", conn)
grade_df = pd.read_sql_query("SELECT * FROM grade", conn)
course_df = pd.read_sql_query("SELECT * FROM course", conn)

# Show the first 5 lines of each DataFrame
print("Student Table:")
print(student_df.head(5))
print("\nGrade Table:")
print(grade_df.head(5))
print("\nCourse Table:")
print(course_df.head(5))




Student Table:
      ID    First     Last
0  S1001    Alice    Smith
1  S1002      Bob    Brown
2  S1003  Charlie  Johnson
3  S1004    David   Taylor
4  S1005     Emma    Brown

Grade Table:
      ID  Code  Mark
0  S1005  C123    55
1  S1005  C124    76
2  S1011  C124    47
3  S1014  C117    50
4  S1032  C116    54

Course Table:
   Code                Title
0  C101      Mathematics 101
1  C102  Physics Inroduction
2  C103  Chemistry Practical
3  C104     Computer Science
4  C105  Economics & Finance


**ONLY RUN IF YOU NEED TO DELETE THE DATA IN THE TABLES**

If you run this, go back to **STEP 4** and re-run from there.

In [None]:
# only run if you need to reset the tables without deleting the database and starting again - then re-run from the import box in STEP 4.
# Delete all data from the tables
cursor.execute("DELETE FROM student")
cursor.execute("DELETE FROM grade")
cursor.execute("DELETE FROM course")

conn.commit()
print("All data deleted from the tables successfully!")



All data deleted from the tables successfully!


**STEP 6: SQL Select statements**

Run the following statements.  Ask yourself what each one will return before running it.




In [10]:
# Select all columns from the student table
student_df = pd.read_sql_query("SELECT * FROM student", conn)

#What should the output be?
print(student_df)



       ID    First       Last
0   S1001    Alice      Smith
1   S1002      Bob      Brown
2   S1003  Charlie    Johnson
3   S1004    David     Taylor
4   S1005     Emma      Brown
5   S1006    Fiona     Thomas
6   S1007   George      White
7   S1008   Hannah     Harris
8   S1009    Isaac      Smith
9   S1010    Julia   Thompson
10  S1011      Bob      Lewis
11  S1012    Laura      Smith
12  S1013  Michael      Allen
13  S1014    Nancy      Young
14  S1015   Oliver       King
15  S1016    Paula      Smith
16  S1017   Robert      Scott
17  S1018   Robert      Adams
18  S1019   Sophia      Baker
19  S1020  Timothy      Brown
20  S1021  Timothy       Hall
21  S1022   Victor     Carter
22  S1023    Wendy     Foster
23  S1024  Yasmine      Price
24  S1025  Yasmine    Roberts
25  S1026  Zachary      Brown
26  S1027    Aaron      Evans
27  S1028     Beth   Campbell
28  S1029  Yasmine     Parker
29  S1030    Diana     Murphy
30  S1031    Ethan      Price
31  S1032  Felicia     Foster
32  S1033 

In [11]:
# Select Last from the student table
studentLast_df = pd.read_sql_query("SELECT ALL Last FROM student", conn)

#What should the output be?
print(studentLast_df)



         Last
0       Smith
1       Brown
2     Johnson
3      Taylor
4       Brown
5      Thomas
6       White
7      Harris
8       Smith
9    Thompson
10      Lewis
11      Smith
12      Allen
13      Young
14       King
15      Smith
16      Scott
17      Adams
18      Baker
19      Brown
20       Hall
21     Carter
22     Foster
23      Price
24    Roberts
25      Brown
26      Evans
27   Campbell
28     Parker
29     Murphy
30      Price
31     Foster
32     Bryant
33  Alexander
34    Russell
35     Foster
36     Foster
37      Hayes
38   Sullivan
39      Price


In [12]:
# Select DISTINCT last names from the student table
studentLastUnique_df = pd.read_sql_query("SELECT DISTINCT Last FROM student", conn)

#What should the output be?  What does this tell you when compared with the previous output?
print(studentLastUnique_df)



         Last
0       Smith
1       Brown
2     Johnson
3      Taylor
4      Thomas
5       White
6      Harris
7    Thompson
8       Lewis
9       Allen
10      Young
11       King
12      Scott
13      Adams
14      Baker
15       Hall
16     Carter
17     Foster
18      Price
19    Roberts
20      Evans
21   Campbell
22     Parker
23     Murphy
24     Bryant
25  Alexander
26    Russell
27      Hayes
28   Sullivan
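The gap between the 40 rows returned by SELECT ALL and the 29 rows returned by SELECT DISTINCT shows that some last names are shared. As a rough sketch, the same comparison can be made in a single query with COUNT(*) and COUNT(DISTINCT ...). The example uses a throwaway in-memory table (demo is a hypothetical connection name) holding four of the students listed above.

```python
import sqlite3

# Throwaway in-memory database with a small sample of the student rows.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE student (ID TEXT PRIMARY KEY, First TEXT NOT NULL, Last TEXT NOT NULL)"
)
demo.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        ("S1001", "Alice", "Smith"),
        ("S1002", "Bob", "Brown"),
        ("S1005", "Emma", "Brown"),
        ("S1009", "Isaac", "Smith"),
    ],
)

# COUNT(*) counts every row; COUNT(DISTINCT Last) counts each last name once.
total_rows, distinct_last_names = demo.execute(
    "SELECT COUNT(*), COUNT(DISTINCT Last) FROM student"
).fetchone()

print(total_rows, distinct_last_names)
```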


In [15]:
# Select DISTINCT First from the student table - modify the query
studentFirstUnique_df = pd.read_sql_query("SELECT DISTINCT First FROM student", conn)

#What should the output be?  What does this tell you from the previous outputs?
print(studentFirstUnique_df)

      First
0     Alice
1       Bob
2   Charlie
3     David
4      Emma
5     Fiona
6    George
7    Hannah
8     Isaac
9     Julia
10    Laura
11  Michael
12    Nancy
13   Oliver
14    Paula
15   Robert
16   Sophia
17  Timothy
18   Victor
19    Wendy
20  Yasmine
21  Zachary
22    Aaron
23     Beth
24    Diana
25    Ethan
26  Felicia
27   Gordon
28      Ian
29  Jasmine
30     Kyle
31     Matt
32     Nina


**STEP 7: SELECT with WHERE**

In [16]:
# Select all columns from the grade table where the mark is over 60
studentWhere_df = pd.read_sql_query("SELECT * FROM grade WHERE Mark > 60", conn)

#What should the output be?
print(studentWhere_df)

        ID  Code  Mark
0    S1005  C124    76
1    S1034  C114    75
2    S1019  C103    72
3    S1028  C110    61
4    S1035  C120    72
..     ...   ...   ...
397  S1039  C114    65
398  S1004  C115    75
399  S1010  C106    71
400  S1028  C119    67
401  S1012  C114    74

[402 rows x 3 columns]
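The threshold in the WHERE clause does not have to be typed into the SQL string. A minimal sketch, assuming a throwaway in-memory table (demo and threshold are hypothetical names), passes the value through a ? placeholder using the params argument of pd.read_sql_query, in the same spirit as the placeholders used for the CSV import.

```python
import sqlite3
import pandas as pd

# Throwaway in-memory database with a few sample grade rows.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE grade (ID TEXT, Code TEXT NOT NULL, Mark INTEGER NOT NULL)")
demo.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [
        ("S1005", "C123", 55),
        ("S1005", "C124", 76),
        ("S1011", "C124", 47),
        ("S1028", "C110", 61),
    ],
)

# The value of threshold is bound to the ? placeholder at execution time.
threshold = 60
over_df = pd.read_sql_query(
    "SELECT * FROM grade WHERE Mark > ? ORDER BY Mark DESC",
    demo,
    params=(threshold,),
)
print(over_df)
```

Changing threshold and re-running the cell is then enough to try a different cut-off.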


In [17]:
#export the DataFrame to CSV for further analysis
#index=True writes the DataFrame's row numbers as the first column of the file.
#change index to False and compare the outputs.  You'll need to download the file from the Files window.

studentWhere_df.to_csv('Grades_Over_60.csv', index=True)
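The effect of the index argument can also be checked without downloading anything. As a small sketch, calling to_csv without a file name returns the CSV text as a string, so both settings can be compared side by side. The DataFrame sample_df is a hypothetical two-row sample.

```python
import pandas as pd

# Small illustrative DataFrame; the values are sample rows only.
sample_df = pd.DataFrame(
    {"ID": ["S1005", "S1034"], "Code": ["C124", "C114"], "Mark": [76, 75]}
)

# With no file name, to_csv returns the CSV content as a string.
with_index = sample_df.to_csv(index=True)
without_index = sample_df.to_csv(index=False)

print(with_index)
print(without_index)
```

With index=True the header starts with an empty field and each row starts with its row number. With index=False only the three data columns are written.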


**TASK**

Create a statement to select all students that have passed the Finance Management course and export the result to CSV.

Use only standard SELECT statements.

How would you tackle the problem?

In [18]:
#review the structure of the grades table and the course table
print("\nGrade Table:")
print(grade_df.head(5))
print("\nCourse Table:")
print(course_df.head(5))


Grade Table:
      ID  Code  Mark
0  S1005  C123    55
1  S1005  C124    76
2  S1011  C124    47
3  S1014  C117    50
4  S1032  C116    54

Course Table:
   Code                Title
0  C101      Mathematics 101
1  C102  Physics Inroduction
2  C103  Chemistry Practical
3  C104     Computer Science
4  C105  Economics & Finance


In [21]:
#select the course code from the course table for the Finance Management Course
studentCourse_df = pd.read_sql_query("SELECT Code FROM course WHERE Title = 'Finance Management'", conn)

print(studentCourse_df)

   Code
0  C115


In [28]:
#select student ID and Mark for the Finance Management Course
studentCourseMark_df = pd.read_sql_query("SELECT ID, Mark FROM grade WHERE Code = 'C115'", conn)

print(studentCourseMark_df)

       ID  Mark
0   S1034    57
1   S1012    68
2   S1017    51
3   S1031    60
4   S1036    58
5   S1030    48
6   S1028    61
7   S1005    67
8   S1015    46
9   S1009    59
10  S1001    46
11  S1008    59
12  S1038    75
13  S1023    57
14  S1040    65
15  S1019    58
16  S1026    53
17  S1013    63
18  S1033    72
19  S1002    57
20  S1006    62
21  S1022    74
22  S1014    72
23  S1016    54
24  S1035    43
25  S1007    47
26  S1018    77
27  S1020    64
28  S1029    57
29  S1011    54
30  S1003    68
31  S1004    75
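The two lookups above (find the course code, then find the marks for that code) can also be combined into one statement by nesting the first SELECT inside the second. The sketch below is one possible approach on a throwaway in-memory database. The connection demo and the variable pass_mark are hypothetical, and the pass mark of 50 is an assumption, since the exercise does not state one.

```python
import sqlite3

# Throwaway in-memory database with a few sample rows.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE course (Code TEXT PRIMARY KEY, Title TEXT NOT NULL)")
demo.execute("CREATE TABLE grade (ID TEXT, Code TEXT NOT NULL, Mark INTEGER NOT NULL)")
demo.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("C114", "Another Course"), ("C115", "Finance Management")],
)
demo.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [
        ("S1034", "C115", 57),
        ("S1012", "C115", 68),
        ("S1030", "C115", 48),
        ("S1012", "C114", 74),
    ],
)

pass_mark = 50  # assumed pass mark, not given in the exercise

# The inner SELECT finds the course code; the outer SELECT uses it as a filter.
passed = demo.execute(
    """
    SELECT ID, Mark FROM grade
    WHERE Code = (SELECT Code FROM course WHERE Title = 'Finance Management')
      AND Mark >= ?
    ORDER BY ID
    """,
    (pass_mark,),
).fetchall()

print(passed)
```

Run through pd.read_sql_query instead, the same statement would give a DataFrame that can be exported with to_csv.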


**STEP 8: Select from multiple tables in one statement**

This can cause issues when different tables contain columns with the same name.  

To resolve this, qualify the column with its table name using the following syntax:
TableName.Column
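To see why the qualified form is needed, the following sketch selects the ID column from both tables on a throwaway in-memory database (demo is a hypothetical connection name). Both student and grade have an ID column, so the unqualified name is rejected as ambiguous, while student.ID works.

```python
import sqlite3

# Throwaway in-memory database with one student and one grade row.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE student (ID TEXT PRIMARY KEY, First TEXT NOT NULL, Last TEXT NOT NULL)"
)
demo.execute("CREATE TABLE grade (ID TEXT, Code TEXT NOT NULL, Mark INTEGER NOT NULL)")
demo.execute("INSERT INTO student VALUES ('S1005', 'Emma', 'Brown')")
demo.execute("INSERT INTO grade VALUES ('S1005', 'C123', 55)")

# Unqualified ID: SQLite cannot tell which table's ID column is meant.
try:
    demo.execute("SELECT ID, Mark FROM student, grade WHERE student.ID = grade.ID")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualified student.ID: the query runs.
rows = demo.execute(
    "SELECT student.ID, Mark FROM student, grade WHERE student.ID = grade.ID"
).fetchall()

print(error_message)
print(rows)
```

Columns that exist in only one of the tables, such as First, Last and Mark, can still be written without a table name.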

In [29]:
#select the following columns:  First, Last & Mark from the Student and Grade tables
studentMarks_df = pd.read_sql_query("SELECT First, Last, Mark FROM student, grade WHERE (student.ID = grade.ID)", conn)

print(studentMarks_df)


       First      Last  Mark
0       Emma     Brown    55
1       Emma     Brown    76
2        Bob     Lewis    47
3      Nancy     Young    50
4    Felicia    Foster    54
..       ...       ...   ...
795     Kyle    Foster    54
796     Matt  Sullivan    57
797    Laura     Smith    74
798    Diana    Murphy    53
799     Kyle    Foster    54

[800 rows x 3 columns]


**TASK**

Modify the statement to get all the students that achieved a mark over 50

In [31]:
#select the following columns:  First, Last & Mark from the Student and Grade tables
studentMarks_df = pd.read_sql_query("SELECT First, Last, Mark FROM student, grade WHERE (Student.ID = Grade.ID)", conn)

print(studentMarks_df)

       First      Last  Mark
0       Emma     Brown    55
1       Emma     Brown    76
2        Bob     Lewis    47
3      Nancy     Young    50
4    Felicia    Foster    54
..       ...       ...   ...
795     Kyle    Foster    54
796     Matt  Sullivan    57
797    Laura     Smith    74
798    Diana    Murphy    53
799     Kyle    Foster    54

[800 rows x 3 columns]


In [32]:
#select First, Last & Mark from the Student and Grade tables where the mark is 50 or above (use Mark > 50 for strictly over 50)
studentMarks_df = pd.read_sql_query("SELECT First, Last, Mark FROM student, grade WHERE (Student.ID = Grade.ID) AND Mark >= 50", conn)

print(studentMarks_df)

       First       Last  Mark
0       Emma      Brown    55
1       Emma      Brown    76
2      Nancy      Young    50
3    Felicia     Foster    54
4       Beth  Alexander    75
..       ...        ...   ...
633     Kyle     Foster    54
634     Matt   Sullivan    57
635    Laura      Smith    74
636    Diana     Murphy    53
637     Kyle     Foster    54

[638 rows x 3 columns]


In [40]:
#modify the query to return all students on the Finance Management course who obtained a mark between 55 and 65
studentMarks_df = pd.read_sql_query("SELECT First, Last, Mark FROM student, grade, course WHERE (Student.ID = Grade.ID) AND (grade.Code = course.Code) AND course.code = 'C115' AND Mark BETWEEN 55 AND 65", conn)

print(studentMarks_df)

      First       Last  Mark
0      Beth  Alexander    57
1     Ethan      Price    60
2   Jasmine     Foster    58
3      Beth   Campbell    61
4     Isaac      Smith    59
5    Hannah     Harris    59
6     Wendy     Foster    57
7      Nina      Price    65
8    Sophia      Baker    58
9   Michael      Allen    63
10      Bob      Brown    57
11    Fiona     Thomas    62
12  Timothy      Brown    64
13  Yasmine     Parker    57


**STEP 9: ORDER BY statements**

Results can be sorted in ascending (ASC, the default) or descending (DESC) order by adding an ORDER BY clause to the end of the SELECT statement.

In [41]:
studentMarks_df = pd.read_sql_query("SELECT * FROM grade ORDER BY Mark DESC", conn)

print (studentMarks_df)

        ID  Code  Mark
0    S1023  C109    82
1    S1029  C105    82
2    S1001  C117    82
3    S1037  C124    81
4    S1015  C123    81
..     ...   ...   ...
795  S1033  C116    37
796  S1027  C107    35
797  S1014  C103    35
798  S1032  C118    35
799  S1036  C108    35

[800 rows x 3 columns]


**TASK**

Modify the statement to order by Course Code and then Mark

In [48]:
studentMarks_df = pd.read_sql_query("SELECT * FROM grade ORDER BY code, Mark DESC", conn)
print (studentMarks_df)

        ID  Code  Mark
0    S1011  C101    78
1    S1039  C101    76
2    S1012  C101    76
3    S1034  C101    76
4    S1031  C101    75
..     ...   ...   ...
795  S1035  C124    47
796  S1001  C124    47
797  S1020  C124    44
798  S1022  C124    42
799  S1017  C124    40

[800 rows x 3 columns]
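In the output above the course codes run upwards while the marks within each course run downwards. That is because ASC or DESC applies to each ORDER BY column separately, and a column with no direction given defaults to ASC. A minimal sketch on a throwaway in-memory table (demo is a hypothetical connection name, and the rows are illustrative):

```python
import sqlite3

# Throwaway in-memory database with four illustrative grade rows.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE grade (ID TEXT, Code TEXT NOT NULL, Mark INTEGER NOT NULL)")
demo.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [("S1", "C102", 60), ("S2", "C101", 70), ("S3", "C101", 78), ("S4", "C102", 65)],
)

# Code has no direction, so it is ascending; DESC applies to Mark only.
code_up_mark_down = demo.execute(
    "SELECT Code, Mark FROM grade ORDER BY Code, Mark DESC"
).fetchall()

# To reverse both columns, DESC has to be written on both.
both_down = demo.execute(
    "SELECT Code, Mark FROM grade ORDER BY Code DESC, Mark DESC"
).fetchall()

print(code_up_mark_down)
print(both_down)
```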


Modify the statement to order by Code and Mark and to include the student's name.  Show only the name, the Mark and the course Code.

In [50]:
studentMarks_df = pd.read_sql_query("SELECT First, Last, Mark, Code FROM student, grade WHERE (student.ID = grade.ID) ORDER BY code , Mark DESC", conn)
print (studentMarks_df)

       First       Last  Mark  Code
0        Bob      Lewis    78  C101
1       Matt   Sullivan    76  C101
2      Laura      Smith    76  C101
3       Beth  Alexander    76  C101
4      Ethan      Price    75  C101
..       ...        ...   ...   ...
795      Ian    Russell    47  C124
796    Alice      Smith    47  C124
797  Timothy      Brown    44  C124
798   Victor     Carter    42  C124
799   Robert      Scott    40  C124

[800 rows x 4 columns]


Modify the statement to select a specific course (Statistics For Python), display the course code and title, and order by Mark ASC.

In [52]:
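#Note: this query joins student to grade but has no (grade.Code = course.Code) condition,
#so every grade row is paired with the Statistics For Python course row and all 800 marks are returned.
#Add AND (grade.Code = course.Code) to the WHERE clause to return only the marks for that course.
studentMarks_df = pd.read_sql_query("SELECT First, Last, Mark, Course.Code, Title FROM grade, student, course WHERE (student.id = grade.id) AND (course.title = 'Statistics For Python') ORDER BY Mark ASC", conn)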
print (studentMarks_df)

       First    Last  Mark  Code                  Title
0      Aaron   Evans    35  C113  Statistics For Python
1      Nancy   Young    35  C113  Statistics For Python
2    Felicia  Foster    35  C113  Statistics For Python
3    Jasmine  Foster    35  C113  Statistics For Python
4     Robert   Adams    37  C113  Statistics For Python
..       ...     ...   ...   ...                    ...
795   Sophia   Baker    81  C113  Statistics For Python
796  Timothy   Brown    81  C113  Statistics For Python
797    Wendy  Foster    82  C113  Statistics For Python
798  Yasmine  Parker    82  C113  Statistics For Python
799    Alice   Smith    82  C113  Statistics For Python

[800 rows x 5 columns]
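The output above has 800 rows, one for every row in the grade table, and all of them are labelled C113. This happens when the grade and course tables are not linked by their Code columns. The sketch below reproduces the effect on a throwaway in-memory database (demo is a hypothetical connection name, and the rows are illustrative) and then adds the missing condition.

```python
import sqlite3

# Throwaway in-memory database: two students, two courses, three grades.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE student (ID TEXT PRIMARY KEY, First TEXT NOT NULL, Last TEXT NOT NULL)"
)
demo.execute("CREATE TABLE course (Code TEXT PRIMARY KEY, Title TEXT NOT NULL)")
demo.execute("CREATE TABLE grade (ID TEXT, Code TEXT NOT NULL, Mark INTEGER NOT NULL)")
demo.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S1001", "Alice", "Smith"), ("S1002", "Bob", "Brown")],
)
demo.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("C113", "Statistics For Python"), ("C115", "Finance Management")],
)
demo.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [("S1001", "C113", 82), ("S1002", "C113", 35), ("S1001", "C115", 46)],
)

# Without grade.Code = course.Code, every grade row is paired with the C113 course row.
unlinked = demo.execute(
    "SELECT First, Last, Mark, course.Code FROM grade, student, course "
    "WHERE (student.ID = grade.ID) AND (course.Title = 'Statistics For Python') "
    "ORDER BY Mark ASC"
).fetchall()

# With the condition, only marks that belong to C113 are returned.
linked = demo.execute(
    "SELECT First, Last, Mark, course.Code FROM grade, student, course "
    "WHERE (student.ID = grade.ID) AND (grade.Code = course.Code) "
    "AND (course.Title = 'Statistics For Python') "
    "ORDER BY Mark ASC"
).fetchall()

print(len(unlinked), len(linked))
print(linked)
```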


**STEP 10: Arithmetic and Aggregate functions**

Simple arithmetic on the columns.  

We use AS to give each calculated column a more readable name.

In [53]:
#simple arithmetic: dividing, multiplying and adding values.  Mark is an INTEGER, so Mark/2 is integer division and drops any remainder.
studentCount_df = pd.read_sql_query("SELECT Mark, Mark/2 AS DIVIDED, Mark*2 AS DOUBLED, Mark+10 AS MODERATED FROM grade", conn)
print (studentCount_df)

     Mark  DIVIDED  DOUBLED  MODERATED
0      55       27      110         65
1      76       38      152         86
2      47       23       94         57
3      50       25      100         60
4      54       27      108         64
..    ...      ...      ...        ...
795    54       27      108         64
796    57       28      114         67
797    74       37      148         84
798    53       26      106         63
799    54       27      108         64

[800 rows x 4 columns]
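Notice that the DIVIDED column shows 27 for a mark of 55, not 27.5. When both operands are integers, SQLite performs integer division. A minimal sketch on a throwaway in-memory table (demo is a hypothetical connection name) compares Mark/2 with Mark/2.0, where the decimal literal forces a floating-point result.

```python
import sqlite3

# Throwaway in-memory database with three marks taken from the output above.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE grade (ID TEXT, Code TEXT NOT NULL, Mark INTEGER NOT NULL)")
demo.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [("S1005", "C123", 55), ("S1005", "C124", 76), ("S1011", "C124", 47)],
)

# Mark/2 divides two integers; Mark/2.0 divides by a real number.
halves = demo.execute(
    "SELECT Mark, Mark/2 AS INT_HALF, Mark/2.0 AS REAL_HALF FROM grade ORDER BY Mark"
).fetchall()

print(halves)
```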


In [54]:
#simple Count:  Count up the rows:
studentCount_df = pd.read_sql_query("SELECT COUNT(*) AS COUNT FROM grade", conn)
print (studentCount_df)

   COUNT
0    800


In [55]:
#simple SUM for the column values
studentCount_df = pd.read_sql_query("SELECT SUM(mark) AS TOTAL FROM grade", conn)
print (studentCount_df)

   TOTAL
0  48275


In [56]:
#simple MAX for the column values
studentCount_df = pd.read_sql_query("SELECT MAX(mark) AS BEST FROM grade", conn)
print (studentCount_df)

#modify to print the Lowest Mark

   BEST
0    82


**TASK**

How would you calculate the RANGE of values in the Mark column?

In [58]:
studentCount_df = pd.read_sql_query("SELECT MAX(mark) - MIN(mark) AS Range FROM grade", conn)
print (studentCount_df)

   Range
0     47


How would you find a specific student's marks for all modules taken and the average mark?

The student is Laura Smith



In [61]:
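#SUM(mark)/COUNT(*) is integer division, so the average is truncated to a whole number; AVG(mark) would keep the decimals.
studentMarks_df = pd.read_sql_query("SELECT First, Last, COUNT(*) AS Modules, SUM(mark) AS TOTAL, SUM(mark)/COUNT(*) AS AVERAGE FROM student, grade WHERE (student.ID = Grade.ID) AND First = 'Laura' AND Last = 'Smith'", conn)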
print (studentMarks_df)

   First   Last  Modules  TOTAL  AVERAGE
0  Laura  Smith       20   1238       61


**STEP 11: Group BY Statements**

GROUP BY pulls rows that share a value together so that an aggregate function can be calculated for each group.

In [62]:
studentGroup_df = pd.read_sql_query("SELECT AVG(Mark) AS AVERAGE, Code FROM grade GROUP BY Code", conn)
print (studentGroup_df)

      AVERAGE  Code
0   59.823529  C101
1   62.366667  C102
2   57.911765  C103
3   61.228571  C104
4   60.515152  C105
5   57.909091  C106
6   57.900000  C107
7   58.540541  C108
8   59.638889  C109
9   59.906250  C110
10  61.424242  C111
11  60.028571  C112
12  60.781250  C113
13  64.093750  C114
14  60.218750  C115
15  58.542857  C116
16  61.393939  C117
17  60.542857  C118
18  62.000000  C119
19  60.638889  C120
20  58.361111  C121
21  63.709677  C122
22  61.600000  C123
23  60.093750  C124
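Several aggregate functions can be calculated for each group in the same query. The sketch below, on a throwaway in-memory table (demo is a hypothetical connection name, and the rows are illustrative), returns the number of marks together with the lowest, highest and average mark for each course code.

```python
import sqlite3

# Throwaway in-memory database with three illustrative grade rows.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE grade (ID TEXT, Code TEXT NOT NULL, Mark INTEGER NOT NULL)")
demo.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [("S1011", "C101", 78), ("S1039", "C101", 76), ("S1002", "C102", 60)],
)

# One output row per course code, with four aggregates calculated per group.
per_course = demo.execute(
    "SELECT Code, COUNT(*) AS ENTRIES, MIN(Mark) AS LOWEST, "
    "MAX(Mark) AS HIGHEST, AVG(Mark) AS AVERAGE "
    "FROM grade GROUP BY Code ORDER BY Code"
).fetchall()

print(per_course)
```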


**TASK**

Create a query to show the First and Last name of each student and their average grade across all modules.

In [82]:
studentGroup_df = pd.read_sql_query("SELECT First, Last, AVG(Mark) AS AVERAGE FROM student, grade WHERE (student.id = grade.id) GROUP BY Last, First ORDER BY AVERAGE DESC", conn)
print (studentGroup_df)

#modify the query to order by AVERAGE grade DESC

      First       Last  AVERAGE
0   Yasmine     Parker    65.35
1    Robert      Adams    64.20
2     Isaac      Smith    63.95
3      Beth      Hayes    63.55
4     David     Taylor    63.50
5      Beth  Alexander    63.15
6    Sophia      Baker    62.90
7   Michael      Allen    62.70
8   Timothy       Hall    62.55
9     Wendy     Foster    62.40
10    Nancy      Young    62.30
11    Aaron      Evans    62.05
12    Laura      Smith    61.90
13   Oliver       King    61.70
14      Bob      Brown    61.60
15    Ethan      Price    61.55
16  Yasmine      Price    61.05
17   Victor     Carter    61.00
18  Felicia     Foster    60.95
19  Charlie    Johnson    60.85
20  Timothy      Brown    60.70
21     Emma      Brown    60.65
22    Diana     Murphy    60.25
23    Fiona     Thomas    60.00
24     Beth   Campbell    59.65
25     Kyle     Foster    59.40
26    Alice      Smith    58.90
27    Paula      Smith    58.70
28   Hannah     Harris    58.60
29   Gordon     Bryant    58.35
30   Geo

We can use HAVING in place of WHERE when a condition applies to grouped (aggregated) values: WHERE filters rows before they are grouped, HAVING filters the groups afterwards.
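The difference is easiest to see when the same threshold is applied both ways. The following is a minimal sketch on a throwaway in-memory database (demo is a hypothetical connection name, and the marks are made up for illustration): HAVING keeps only students whose average is over 65, while WHERE discards individual marks of 65 or less before any average is calculated.

```python
import sqlite3

# Throwaway in-memory database: two students with two marks each.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE student (ID TEXT PRIMARY KEY, First TEXT NOT NULL, Last TEXT NOT NULL)"
)
demo.execute("CREATE TABLE grade (ID TEXT, Code TEXT NOT NULL, Mark INTEGER NOT NULL)")
demo.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S1", "Alice", "Smith"), ("S2", "Bob", "Brown")],
)
demo.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [("S1", "C101", 70), ("S1", "C102", 40), ("S2", "C101", 66), ("S2", "C102", 68)],
)

# HAVING: average all marks per student, then keep groups with an average over 65.
having_rows = demo.execute(
    "SELECT First, Last, AVG(Mark) AS AVERAGE FROM student, grade "
    "WHERE (student.ID = grade.ID) "
    "GROUP BY student.ID HAVING AVG(Mark) > 65 ORDER BY Last"
).fetchall()

# WHERE: drop marks of 65 or less first, then average whatever is left.
where_rows = demo.execute(
    "SELECT First, Last, AVG(Mark) AS AVERAGE FROM student, grade "
    "WHERE (student.ID = grade.ID) AND Mark > 65 "
    "GROUP BY student.ID ORDER BY Last"
).fetchall()

print(having_rows)
print(where_rows)
```

Alice's true average is 55, so she is excluded by HAVING. With WHERE her mark of 40 is removed first, leaving a misleading average of 70.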

In [88]:
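#using your query above, identify students that have an AVG between 55 and 65
#Note: SELECT * with GROUP BY returns one arbitrary row per group in SQLite, so the ID, Code and Mark shown come from a single grade row, not from the average.
studentGroup_df = pd.read_sql_query("SELECT * FROM student, grade WHERE (student.id = grade.id) GROUP BY Last, First HAVING (AVG(Mark) BETWEEN 55 AND 65)", conn)
print (studentGroup_df)

#how would you change this so it's between 52 AND 58?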

       ID    First       Last     ID  Code  Mark
0   S1018   Robert      Adams  S1018  C106    62
1   S1034     Beth  Alexander  S1034  C114    75
2   S1013  Michael      Allen  S1013  C107    52
3   S1019   Sophia      Baker  S1019  C103    72
4   S1002      Bob      Brown  S1002  C123    47
5   S1005     Emma      Brown  S1005  C123    55
6   S1020  Timothy      Brown  S1020  C104    74
7   S1026  Zachary      Brown  S1026  C121    70
8   S1033   Gordon     Bryant  S1033  C121    56
9   S1028     Beth   Campbell  S1028  C110    61
10  S1022   Victor     Carter  S1022  C102    74
11  S1027    Aaron      Evans  S1027  C122    78
12  S1032  Felicia     Foster  S1032  C116    54
13  S1036  Jasmine     Foster  S1036  C104    74
14  S1037     Kyle     Foster  S1037  C124    81
15  S1023    Wendy     Foster  S1023  C112    66
16  S1021  Timothy       Hall  S1021  C111    70
17  S1008   Hannah     Harris  S1008  C110    46
18  S1038     Beth      Hayes  S1038  C111    66
19  S1003  Charlie  

**STEP 12: Using JOINS**

Cross JOIN A & B - returns all pairs of rows from A and B

Natural JOIN A & B - returns pairs of rows with common values in identically named columns, without duplicating those columns

Inner JOIN A & B - returns pairs of rows satisfying a condition
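The three definitions can be compared by counting the rows and columns each join returns. The sketch below uses a throwaway in-memory database (demo is a hypothetical connection name) with two students and three grades, so the numbers are small enough to check by hand.

```python
import sqlite3

# Throwaway in-memory database: 2 students and 3 grades.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE student (ID TEXT PRIMARY KEY, First TEXT NOT NULL, Last TEXT NOT NULL)"
)
demo.execute("CREATE TABLE grade (ID TEXT, Code TEXT NOT NULL, Mark INTEGER NOT NULL)")
demo.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S1001", "Alice", "Smith"), ("S1002", "Bob", "Brown")],
)
demo.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [("S1001", "C117", 82), ("S1001", "C124", 47), ("S1002", "C123", 47)],
)

def shape(sql):
    """Return (row count, column count) for a query."""
    result = demo.execute(sql)
    column_count = len(result.description)
    return len(result.fetchall()), column_count

# CROSS JOIN: every student paired with every grade (2 x 3 rows), ID appears twice.
cross_shape = shape("SELECT * FROM student CROSS JOIN grade")
# NATURAL JOIN: rows matched on the shared ID column, which appears once.
natural_shape = shape("SELECT * FROM student NATURAL JOIN grade")
# INNER JOIN with an explicit condition: same rows, but both ID columns are kept.
inner_shape = shape("SELECT * FROM student INNER JOIN grade ON student.ID = grade.ID")

print(cross_shape, natural_shape, inner_shape)
```

With the full tables the same arithmetic explains the outputs in this step: 40 students x 800 grades gives the 32000 rows of the CROSS JOIN.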

In [89]:
#CROSS JOIN
studentJoin_df = pd.read_sql_query("SELECT * FROM student CROSS JOIN grade", conn)
print (studentJoin_df)

          ID  First   Last     ID  Code  Mark
0      S1001  Alice  Smith  S1005  C123    55
1      S1001  Alice  Smith  S1005  C124    76
2      S1001  Alice  Smith  S1011  C124    47
3      S1001  Alice  Smith  S1014  C117    50
4      S1001  Alice  Smith  S1032  C116    54
...      ...    ...    ...    ...   ...   ...
31995  S1040   Nina  Price  S1037  C117    54
31996  S1040   Nina  Price  S1039  C103    57
31997  S1040   Nina  Price  S1012  C114    74
31998  S1040   Nina  Price  S1030  C104    53
31999  S1040   Nina  Price  S1037  C102    54

[32000 rows x 6 columns]


In [90]:
#NATURAL JOIN
studentJoin_df = pd.read_sql_query("SELECT * FROM student NATURAL JOIN grade", conn)
print (studentJoin_df)

        ID    First      Last  Code  Mark
0    S1005     Emma     Brown  C123    55
1    S1005     Emma     Brown  C124    76
2    S1011      Bob     Lewis  C124    47
3    S1014    Nancy     Young  C117    50
4    S1032  Felicia    Foster  C116    54
..     ...      ...       ...   ...   ...
795  S1037     Kyle    Foster  C117    54
796  S1039     Matt  Sullivan  C103    57
797  S1012    Laura     Smith  C114    74
798  S1030    Diana    Murphy  C104    53
799  S1037     Kyle    Foster  C102    54

[800 rows x 5 columns]


In [91]:
#INNER JOIN
studentJoin_df = pd.read_sql_query("SELECT * FROM student INNER JOIN grade USING (ID)", conn)
print (studentJoin_df)

        ID    First      Last  Code  Mark
0    S1005     Emma     Brown  C123    55
1    S1005     Emma     Brown  C124    76
2    S1011      Bob     Lewis  C124    47
3    S1014    Nancy     Young  C117    50
4    S1032  Felicia    Foster  C116    54
..     ...      ...       ...   ...   ...
795  S1037     Kyle    Foster  C117    54
796  S1039     Matt  Sullivan  C103    57
797  S1012    Laura     Smith  C114    74
798  S1030    Diana    Murphy  C104    53
799  S1037     Kyle    Foster  C102    54

[800 rows x 5 columns]
