**NOTE: Distributing or uploading this course material to a public repository (e.g., GitHub) is strictly prohibited.**

## **Database Creation**

We want to implement an OpenCourseWare service using SQLite3. For this purpose, we will create the following four tables:

-  `"kmooc_instructor"` - instructor information
-  `"kmooc_course"` - course information
-  `"kmooc_learningpath"` - learning path (or curriculum) information
-  `"kmooc_learningpath_courses"` - mapping between courses and learning paths

The provided 'create_database_proj1.sql' file contains the schema definition and a sample set of tuples. You can download it from [here](https://drive.google.com/file/d/1IOSBZ2a2WWvgkTKfR5lcel0PIC_uo0wq/view?usp=share_link).

See the file for the detailed schema.
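
Once a database has been populated, its schema can also be inspected from Python. The following is a minimal sketch, not part of the assignment: it uses an in-memory database and a hypothetical one-table stand-in for the real schema (the column list is an assumption based on the columns the modules below refer to).

```python
import sqlite3

# In-memory database with a hypothetical stand-in table, for illustration only.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE kmooc_instructor "
    "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)

# sqlite_master has one row per schema object; 'sql' holds the CREATE statement.
tables = demo.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
for name, sql in tables:
    print(name)
    print(sql)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in demo.execute("PRAGMA table_info(kmooc_instructor)")]
print(columns)

demo.close()
```

Pointing `sqlite3.connect` at `'kmooc.sqlite3'` instead of `":memory:"` (and dropping the `CREATE TABLE`) would list the four real tables.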

We will implement several core modules of the service.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import sqlite3

conn = sqlite3.connect('kmooc.sqlite3')
cur = conn.cursor()

# Enter your solution here
# Execute the SQL statements in 'create_database_proj1.sql'.
# First, upload the file on your Google Drive folder.
# If you placed this file in the 'MyDrive' folder, its path is '/content/drive/MyDrive/create_database_proj1.sql'.

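# Read the whole script; the 'with' block closes the file automatically.
with open("/content/drive/MyDrive/create_database_proj1.sql", 'r') as f:
  sql_script = f.read()

# executescript() runs every statement in the script. Splitting on ';' by hand
# would break if a string literal in the data contained a semicolon.
cur.executescript(sql_script)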

conn.commit()
conn.close()

## **Module 1**

We need to extract the 10 highest-rated courses from this database.

Create a query that joins the `"kmooc_course"` and `"kmooc_instructor"` tables (LEFT JOIN) and extracts the 10 courses with the highest *rating* (a column of the `"kmooc_course"` table).

Display the following columns in the output table:

- *title* (`"kmooc_course"` table)
- *rating* (`"kmooc_course"` table)
- *instructor* - concatenation of the *first_name* and *last_name* columns with a space character (`"kmooc_instructor"` table)

Sort the output in descending order of the *rating* column.

Note that the third column is made by concatenating the *first_name* and *last_name* columns.
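
As a small illustration of how `LEFT JOIN` and the `||` concatenation operator interact, here is a sketch on an in-memory database. The two tables are reduced, hypothetical versions of the real ones and the rows are invented; only the column names follow the ones used in this module.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
  CREATE TABLE kmooc_instructor (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
  CREATE TABLE kmooc_course (id INTEGER PRIMARY KEY, title TEXT, rating REAL, instructor_id INTEGER);
  INSERT INTO kmooc_instructor VALUES (1, 'Ada', 'Lovelace');
  -- 'Course B' deliberately has no instructor.
  INSERT INTO kmooc_course VALUES (1, 'Course A', 4.5, 1), (2, 'Course B', 4.9, NULL);
""")

rows = demo.execute("""
  SELECT c.title, c.rating, i.first_name || ' ' || i.last_name AS instructor
  FROM kmooc_course AS c
  LEFT JOIN kmooc_instructor AS i ON c.instructor_id = i.id
  ORDER BY c.rating DESC
""").fetchall()

for row in rows:
    print(row)
# ('Course B', 4.9, None)
# ('Course A', 4.5, 'Ada Lovelace')

demo.close()
```

The `LEFT JOIN` keeps the course that has no instructor, and because `NULL || ' ' || NULL` is `NULL`, its third column comes back as `None`. An inner `JOIN` would have dropped that row.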

**Format**:
>```
('course_title', course_rating, 'first_name last_name')
...
```

Print the result using the code cell below.

In [None]:
import sqlite3

conn = sqlite3.connect('kmooc.sqlite3')
cur = conn.cursor()

# Enter your solution here
cur.execute("""
  SELECT c.title "course_title",
    c.rating "course_rating",
    i.first_name || ' ' || i.last_name "instructor"
  FROM kmooc_course AS c
  LEFT JOIN kmooc_instructor AS i
  ON c.instructor_id = i.id
  ORDER BY course_rating DESC
  LIMIT 10;
""")
for row in cur.fetchall():
  print(row)


conn.commit()
conn.close()

('SQL Bootcamp - SQLite Databases - Part IV', 5.0, 'James Smith')
('100+ Exercises - Advanced Python Programming', 5.0, 'James Smith')
('150+ Exercises - Data Structures in Python - Hands-On', 5.0, 'James Smith')
('150+ Exercises - Object Oriented Programming in C++ - OOP', 5.0, 'Mary Brown')
('150+ Exercises - Object Oriented Programming in Python - OOP', 4.92, 'James Smith')
('Machine Learning Bootcamp in Python Part II - from A to Z', 4.89, 'James Smith')
('250+ Exercises - Data Science Bootcamp in Python', 4.89, 'James Smith')
('Artificial Intelligence - Computer Vision in Python', 4.87, 'James Smith')
('Machine Learning Bootcamp in Python Part III - Exercises', 4.87, 'James Smith')
('Machine Learning - Decision Trees and Random Forests - Python', 4.86, 'James Smith')


## **Module 2**

We need to list every learning path together with the courses it contains and the subcategory of each of those courses (see below).

Display the following columns in the output table:

- *title* column from `"kmooc_learningpath"` table, aliased as `"path_title"`
- *title* column from `"kmooc_course"` table, aliased as `"course_title"`
- *subcategory* column from the `"kmooc_course"` table

Sort the output in ascending order of the *path_title* and *course_title* columns. Limit the result to the first 10 records.
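
Courses and learning paths are in a many-to-many relationship, resolved through the `"kmooc_learningpath_courses"` mapping table. A minimal sketch of that join pattern, on an in-memory database with reduced, hypothetical tables and invented rows (only the column names `id`, `title`, `subcategory`, `learningpath_id`, and `course_id` are taken from this notebook):

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
  CREATE TABLE kmooc_course (id INTEGER PRIMARY KEY, title TEXT, subcategory TEXT);
  CREATE TABLE kmooc_learningpath (id INTEGER PRIMARY KEY, title TEXT);
  CREATE TABLE kmooc_learningpath_courses (learningpath_id INTEGER, course_id INTEGER);
  INSERT INTO kmooc_course VALUES (1, 'Intro to SQL', 'databases'), (2, 'Intro to C', 'languages');
  INSERT INTO kmooc_learningpath VALUES (1, 'Path X'), (2, 'Path Y');
  -- 'Intro to C' belongs to both paths.
  INSERT INTO kmooc_learningpath_courses VALUES (1, 1), (1, 2), (2, 2);
""")

# Start from the mapping table and attach each side with its own ON clause.
rows = demo.execute("""
  SELECT lp.title AS path_title, c.title AS course_title, c.subcategory
  FROM kmooc_learningpath_courses AS lpc
  JOIN kmooc_learningpath AS lp ON lp.id = lpc.learningpath_id
  JOIN kmooc_course AS c ON c.id = lpc.course_id
  ORDER BY path_title, course_title
""").fetchall()

for row in rows:
    print(row)
# ('Path X', 'Intro to C', 'languages')
# ('Path X', 'Intro to SQL', 'databases')
# ('Path Y', 'Intro to C', 'languages')

demo.close()
```

Each row of the mapping table yields exactly one output row, so a course that belongs to several paths appears once per path.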

**Format**:
>```
('path_title', 'course_title', 'course_subcategory')
...
```

Print the result using the code cell below.

In [None]:
import sqlite3

conn = sqlite3.connect('kmooc.sqlite3')
cur = conn.cursor()

# Enter your solution here
cur.execute("""
  SELECT lp.title "path_title",
    c.title "course_title",
    c.subcategory
  FROM kmooc_learningpath_courses AS lpc
  JOIN kmooc_learningpath AS lp
  ON lp.id = lpc.learningpath_id
  JOIN kmooc_course AS c
  ON c.id = lpc.course_id
  ORDER BY path_title ASC, course_title ASC
  LIMIT 10;
""")
for row in cur.fetchall():
  print(row)

conn.commit()
conn.close()

('Path All-in-One', '100+ Exercises - Advanced Python Programming', 'programming languages')
('Path All-in-One', '120+ Exercises in Python - Data Science - NumPy', 'data science')
('Path All-in-One', '130+ Exercises in Python - Data Science - Pandas', 'data science')
('Path All-in-One', '150+ Exercises - Object Oriented Programming in C++ - OOP', 'programming languages')
('Path All-in-One', '150+ Exercises - Object Oriented Programming in Python - OOP', 'programming languages')
('Path All-in-One', '150+ Exercises - Programming in C language - from A to Z', 'programming languages')
('Path All-in-One', '150+ Exercises - Programming in C++ - from A to Z', 'programming languages')
('Path All-in-One', '200+ Exercises - Programming in Python - from A to Z', 'programming languages')
('Path All-in-One', '210+ Exercises - Python - Embedded Modules - A to Z', 'programming languages')
('Path All-in-One', '250+ Exercises - Data Science Bootcamp in Python', 'programming languages')


## **Module 3**

We need to list every learning path together with the number of courses it contains (see below).

Display the following columns in the output table:

- *title* column from `"kmooc_learningpath"` table, aliased as `"path_title"`
- *num_courses* - the number of courses for a given path

Sort the output in descending order of the *num_courses* column.
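
Counting courses per path is a `GROUP BY` over the mapping table. The sketch below uses an in-memory database with reduced, hypothetical tables and invented rows; it also shows one consequence of using an inner join, which is an observation about the technique rather than about the course data.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
  CREATE TABLE kmooc_learningpath (id INTEGER PRIMARY KEY, title TEXT);
  CREATE TABLE kmooc_learningpath_courses (learningpath_id INTEGER, course_id INTEGER);
  -- 'Path Z' has no courses mapped to it.
  INSERT INTO kmooc_learningpath VALUES (1, 'Path X'), (2, 'Path Y'), (3, 'Path Z');
  INSERT INTO kmooc_learningpath_courses VALUES (1, 10), (1, 11), (1, 12), (2, 11);
""")

# One group per learning path; COUNT(*) counts the mapping rows in each group.
rows = demo.execute("""
  SELECT lp.title AS path_title, COUNT(*) AS num_courses
  FROM kmooc_learningpath AS lp
  JOIN kmooc_learningpath_courses AS lpc ON lp.id = lpc.learningpath_id
  GROUP BY lp.id, lp.title
  ORDER BY num_courses DESC
""").fetchall()

for row in rows:
    print(row)
# ('Path X', 3)
# ('Path Y', 1)

demo.close()
```

With an inner `JOIN`, a path without any courses ('Path Z' here) does not appear at all. Reporting it with a count of 0 would need a `LEFT JOIN` combined with `COUNT(lpc.course_id)`.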

**Format**:
>```
('path_title', num_courses)
...
```

Print the result using the code cell below.

In [None]:
import sqlite3

conn = sqlite3.connect('kmooc.sqlite3')
cur = conn.cursor()

# Enter your solution here
cur.execute("""
  SELECT lp.title "path_title",
    count(lp.title) "num_courses"
  FROM kmooc_learningpath AS lp
  JOIN kmooc_learningpath_courses AS lpc
  ON lp.id = lpc.learningpath_id
  GROUP BY lp.title
  ORDER BY "num_courses" DESC;
""")
for row in cur.fetchall():
  print(row)

conn.commit()
conn.close()

('Path All-in-One', 30)
('Path Data Scientist / Deep Learning Engineer', 21)
('Path Data Scientist / Machine Learning Engineer', 19)
('Path BI Analyst / Data Analyst', 15)
('Path Big Data Analyst', 13)
('Path Python Developer', 8)
('Path C++ Developer', 3)
('Path C Developer', 2)


## **Module 4**

We need to list every learning path together with the courses it contains and the name of each course's instructor (see below).

Display the following columns in the output table:

- *title* column from `"kmooc_learningpath"` table, aliased as `"path_title"`
- *title* column from `"kmooc_course"` table, aliased as `"course_title"`
- *instructor* - concatenation of the *first_name* and *last_name* columns with a space character (`"kmooc_instructor"` table)

Sort the output in ascending order of the *path_title* and *course_title* columns. Limit the result to the first 10 records.

**Format**:
>```
('path_title', 'course_title', 'first_name last_name')
...
```

Print the result using the code cell below.

In [None]:
import sqlite3

conn = sqlite3.connect('kmooc.sqlite3')
cur = conn.cursor()

# Enter your solution here
cur.execute("""
  SELECT lp.title "path_title",
    c.title "course_title",
    i.first_name || ' ' || i.last_name "instructor"
  FROM kmooc_learningpath_courses AS lpc
  JOIN kmooc_learningpath AS lp
  ON lp.id = lpc.learningpath_id
  JOIN kmooc_course AS c
  ON c.id = lpc.course_id
  JOIN kmooc_instructor AS i
  ON c.instructor_id = i.id
  ORDER BY path_title ASC, course_title ASC
  LIMIT 10;
""")
for row in cur.fetchall():
  print(row)

conn.commit()
conn.close()

('Path All-in-One', '100+ Exercises - Advanced Python Programming', 'James Smith')
('Path All-in-One', '120+ Exercises in Python - Data Science - NumPy', 'James Smith')
('Path All-in-One', '130+ Exercises in Python - Data Science - Pandas', 'James Smith')
('Path All-in-One', '150+ Exercises - Object Oriented Programming in C++ - OOP', 'Mary Brown')
('Path All-in-One', '150+ Exercises - Object Oriented Programming in Python - OOP', 'James Smith')
('Path All-in-One', '150+ Exercises - Programming in C language - from A to Z', 'Mary Brown')
('Path All-in-One', '150+ Exercises - Programming in C++ - from A to Z', 'Mary Brown')
('Path All-in-One', '200+ Exercises - Programming in Python - from A to Z', 'James Smith')
('Path All-in-One', '210+ Exercises - Python - Embedded Modules - A to Z', 'James Smith')
('Path All-in-One', '250+ Exercises - Data Science Bootcamp in Python', 'James Smith')


## **Module 5**

We need to extract the number of courses each instructor teaches within each learning path.

Display the following columns in the output table:

 - *title* column from `"kmooc_learningpath"` table, aliased as `"path_title"`
 - *instructor* - concatenation of the *first_name* and *last_name* columns with a space character (`"kmooc_instructor"` table)
 - *num_courses* - number of courses per instructor in the given learning path

Sort the output in ascending order of the *path_title* and *instructor* columns.
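
Counting per (path, instructor) pair means grouping on two keys at once. A minimal sketch on an in-memory database follows; the tables are reduced, hypothetical versions of the real ones and the rows are invented. Grouping on the two `id` columns rather than on the displayed strings is a choice made for this sketch, so that two instructors who happen to share a name would still be counted separately.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
  CREATE TABLE kmooc_instructor (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
  CREATE TABLE kmooc_course (id INTEGER PRIMARY KEY, title TEXT, instructor_id INTEGER);
  CREATE TABLE kmooc_learningpath (id INTEGER PRIMARY KEY, title TEXT);
  CREATE TABLE kmooc_learningpath_courses (learningpath_id INTEGER, course_id INTEGER);
  INSERT INTO kmooc_instructor VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
  INSERT INTO kmooc_course VALUES (1, 'Course A', 1), (2, 'Course B', 1), (3, 'Course C', 2);
  INSERT INTO kmooc_learningpath VALUES (1, 'Path X'), (2, 'Path Y');
  INSERT INTO kmooc_learningpath_courses VALUES (1, 1), (1, 2), (1, 3), (2, 3);
""")

# One group per (learning path, instructor) pair.
rows = demo.execute("""
  SELECT lp.title AS path_title,
         i.first_name || ' ' || i.last_name AS instructor,
         COUNT(*) AS num_courses
  FROM kmooc_learningpath_courses AS lpc
  JOIN kmooc_learningpath AS lp ON lp.id = lpc.learningpath_id
  JOIN kmooc_course AS c ON c.id = lpc.course_id
  JOIN kmooc_instructor AS i ON i.id = c.instructor_id
  GROUP BY lp.id, i.id
  ORDER BY path_title, instructor
""").fetchall()

for row in rows:
    print(row)
# ('Path X', 'Ada Lovelace', 2)
# ('Path X', 'Alan Turing', 1)
# ('Path Y', 'Alan Turing', 1)

demo.close()
```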


**Format**:
>```
('path_title', 'first_name last_name', num_courses)
...
```

Print the result using the code cell below.

In [None]:
import sqlite3

conn = sqlite3.connect('kmooc.sqlite3')
cur = conn.cursor()

# Enter your solution here
cur.execute("""
  SELECT lp.title "path_title",
    i.first_name || ' ' || i.last_name "instructor",
    -- COUNT(*) counts the rows in each (path, instructor) group
    COUNT(*) "num_courses"
  FROM kmooc_learningpath_courses AS lpc
  JOIN kmooc_learningpath AS lp
  ON lp.id = lpc.learningpath_id
  JOIN kmooc_course AS c
  ON c.id = lpc.course_id
  JOIN kmooc_instructor AS i
  ON c.instructor_id = i.id
  GROUP BY "path_title", "instructor"
  ORDER BY "path_title" ASC, "instructor" ASC;
""")
for row in cur.fetchall():
  print(row)

conn.commit()
conn.close()

('Path All-in-One', 'James Smith', 25)
('Path All-in-One', 'Mary Brown', 5)
('Path BI Analyst / Data Analyst', 'James Smith', 15)
('Path Big Data Analyst', 'James Smith', 13)
('Path C Developer', 'Mary Brown', 2)
('Path C++ Developer', 'Mary Brown', 3)
('Path Data Scientist / Deep Learning Engineer', 'James Smith', 21)
('Path Data Scientist / Machine Learning Engineer', 'James Smith', 19)
('Path Python Developer', 'James Smith', 8)


## **Module 6**

We need to extract the number of courses at the category, sub-category, and instructor levels (see below).

Display the following columns in the output table:

 - *category* column from the `"kmooc_course"` table
 - *subcategory* column from the `"kmooc_course"` table
 - *instructor* - concatenation of the *first_name* and *last_name* columns with a space character (`"kmooc_instructor"` table)
 - *num_courses* - number of courses per *category*, *subcategory* and *instructor*

Sort the output in descending order of the *num_courses* column.

**Format**:
>```
('category', 'subcategory', 'first_name last_name', num_courses)
...
```

Print the result using the code cell below.

In [None]:
import sqlite3

conn = sqlite3.connect('kmooc.sqlite3')
cur = conn.cursor()

# Enter your solution here
cur.execute("""
  SELECT c.category,
    c.subcategory,
    i.first_name || ' ' || i.last_name "instructor",
    COUNT(*) "num_courses"
  FROM kmooc_course AS c
  JOIN kmooc_instructor AS i
  ON c.instructor_id = i.id
  GROUP BY c.category, c.subcategory, "instructor"
  ORDER BY "num_courses" DESC;
""")
for row in cur.fetchall():
  print(row)

conn.commit()
conn.close()

('development', 'programming languages', 'James Smith', 18)
('development', 'data science', 'James Smith', 14)
('development', 'database design & development', 'James Smith', 7)
('development', 'programming languages', 'Mary Brown', 5)
('development', 'web development', 'James Smith', 1)
