<h2>
<center>
LEARNING PLATFORM DATABASE
</center>
</h2>

<br><h3>
Introduction
</h3>
This Jupyter notebook illustrates how to create the contents of a database underlying an online learning platform. The database structure is based on my learning experience on platforms such as Coursera and edX. In this first notebook I introduce a conceptual schema of the database I want to implement. <br> A first version of the schema contained one more entity (<i>Unit</i>) and one more relationship linking units and learning material (<i>Use resources</i>). I restructured the schema and omitted these two constructs from the final version for ease of implementation. <i>Unit</i> is now an attribute of <i>Learning material</i> that identifies the unit in which the learning material appears. <br> The conceptual schema flows naturally into a relational schema, and below I present each construct as a list of attributes, mimicking the way relational tables are usually presented. Its main purpose, however, is to give a glimpse of the elements of the database and of the way they interact; the actual implementation will take advantage of both the RDBMS and NoSQL worlds. <br> I will use these notebooks to build a table for each construct and then save these tables as csv files. Next, I will import these csv files into 5 different DBMSs: <a href = 'https://hub.docker.com/_/mysql'>MySQL</a>, <a href = 'https://hub.docker.com/_/mongo'>MongoDB</a>, <a href = 'https://hub.docker.com/_/cassandra'>Cassandra</a>, <a href = 'https://hub.docker.com/_/redis'>Redis</a> and <a href = 'https://hub.docker.com/_/neo4j'>Neo4j</a>. Then I will study the performance of each DBMS in running queries on this schema and assess which one performs best.
<br> The database I'm going to generate is not a real one, although some of its elements are taken from reality, such as the list of courses, which is a list of the most popular Coursera courses according to <a href = 'https://www.classcentral.com/report/coursera-top-courses/'>Class Central</a>. It is generated as <i>Python</i> dictionaries of lists (the tables). The simulated data is created both with the <i>Python</i> <a href = 'https://faker.readthedocs.io/en/master/'><b>Faker</b></a> module and programmatically (with heavy use of the <i>Python</i> <b>random</b> module).

<br><h3>
Import required modules
</h3>
Here is a list of the modules I use to simulate the data. The main ones are the <b>Faker</b> module and the <b>random</b> module, while the <b>csv</b> module is also crucial for reading and writing csv files, which will be the final output of these notebooks. The <b>pyplot</b> module from <b>matplotlib</b> is not strictly necessary: I used it to plot the conceptual schema, but the image can be embedded in Markdown instead, which seems simpler and renders better. Finally, the <b>datetime</b> module is useful for manipulating dates.

In [1]:
from matplotlib import pyplot as plt
from faker import Faker
from faker.providers import DynamicProvider
from random import randint, sample
import datetime
import csv
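
As a minimal sketch of how the <b>csv</b> module can turn a dictionary of lists into csv text, the block below writes a tiny table to an in-memory buffer. The function name <i>write_table</i> and the two-row <i>demo_table</i> are hypothetical and only serve the illustration; they are not the code used later in these notebooks.

```python
import csv
import io

def write_table(table, handle):
    """Write a dictionary of lists (one list per column) as csv rows."""
    writer = csv.writer(handle)
    writer.writerow(table.keys())           # header row: the attribute names
    writer.writerows(zip(*table.values()))  # transpose the columns into rows

# Hypothetical two-row table, used only for illustration
demo_table = {'courseID': [1, 2], 'courseName': ['Algebra', 'Databases']}

# An in-memory buffer stands in for a real file opened with open(..., newline='')
buffer = io.StringIO()
write_table(demo_table, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Storing one list per attribute keeps the tables column-oriented, so <i>zip</i> is what converts them to the row-oriented layout that a csv file expects.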

<br><h3>
Schema definition
</h3>
<b><u>E-R model</u></b><br>
Here I present the entity-relationship model illustrating the database concept that I want to implement.<br> There are 5 entities and 4 relationships. Each entity represents a distinct element of the database, the main constituents and actors of the application. Relationships describe the way in which the elements interact with each other.
<img src = 'sources/e-r_final.png'>
Entities are characterised by attributes, which describe qualities that they possess and that are useful for uniquely identifying their instances. When entities interact with each other through relationships, the interaction itself can carry attributes that characterise it.

<br><h3>
Preliminary operations
</h3>
In the following code block I define the local path that hosts the project. I use two strings, one for a Windows system and one for a Mac system, because I alternate between the two machines. The block also contains a few lines that plot the conceptual schema with the <b>pyplot</b> module, but they are commented out because I have already shown the schema in the Markdown block above.

In [2]:
# Main path where the notebooks are located (Windows path and Mac path)
winPath = 'C:/Users/mzlarosa/OneDrive - unime.it/Learning/CdL Informatica/Anno II - Database/Module B/project/'
macPath = '/Users/mau/OneDrive - unime.it/Learning/CdL Informatica/Anno II - Database/Module B/project/'

# The code that follows is commented out because I have already shown the conceptual schema above.
#plt.figure(figsize = (9, 6))
#figure = plt.imread(winPath + 'sources/e-r_final.png')
#figure = plt.imread(macPath + 'sources/e-r_final.png')
#plt.imshow(figure)
#plt.axis('off')
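
Instead of switching between the two strings by hand, the path could be chosen from the operating system at run time. The block below is a minimal sketch under that assumption: the helper <i>pick_project_path</i> and the shortened placeholder paths are hypothetical, and any system other than Windows falls back to the Mac path.

```python
import platform
from pathlib import Path

def pick_project_path(win_path, mac_path, system=None):
    """Return the project folder that matches the current operating system."""
    # platform.system() returns 'Windows', 'Darwin' (macOS) or 'Linux'
    system = system or platform.system()
    return Path(win_path if system == 'Windows' else mac_path)

# Placeholder paths, assumed for illustration only
project_dir = pick_project_path('C:/project/', '/Users/me/project/')
image_path = project_dir / 'sources' / 'e-r_final.png'
print(image_path)
```

Building the file name with <i>pathlib</i> also avoids concatenating strings with a hard-coded separator.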

<h4>
<u>Python schema implementation</u><br>
</h4>    
The following code represents the entities and relationships of the database application. Each statement creates a list of attributes. Each list name corresponds to an entity or relationship name, while the elements of the list (strings) represent its attributes. Taken together, the lines represent the database schema in a relational fashion, where each line is a table schema. By iterating over the elements of each list I will create the keys of the Python dictionaries of lists that will store the table instances with the actual (fake) data.

In [3]:
# tables' schemas
studentSchema = ['studentID', 'firstName', 'lastName', 'dateOfBirth', 'genre', 'country', 'town', 'email']
instructorSchema = ['instructorID', 'role', 'firstName', 'lastName', 'genre', 'email']
courseSchema = ['courseID', 'discipline', 'courseName', 'courseYear', 'syllabus']
materialSchema = ['materialID', 'unit', 'materialType', 'name', 'dimension']
assessmentSchema = ['assessmentID', 'assessmentType', 'name', 'thresholdScore', 'maxScore']
enrollingSchema = ['course', 'student', 'enrollDate']
teachingSchema = ['course', 'instructor']
learningSchema = ['course', 'student', 'learning_material', 'accessDate']
gradingSchema = ['course', 'student', 'assessment', 'score', 'attemptDate']
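
To make the step from a schema to a table instance concrete, here is a minimal sketch that turns one schema list into an empty dictionary of lists and appends a few random rows. The helpers <i>empty_table</i> and <i>add_row</i>, the ID ranges and the year 2022 are all assumptions made for illustration; the later notebooks may populate the tables differently.

```python
import datetime
from random import Random

def empty_table(schema):
    """Create one empty column (list) for each attribute of the schema."""
    return {attribute: [] for attribute in schema}

def add_row(table, values):
    """Append one value to every column, in schema order."""
    if len(values) != len(table):
        raise ValueError('the row must have one value per attribute')
    for column, value in zip(table.values(), values):
        column.append(value)

# Same attributes as enrollingSchema, repeated so the sketch is self-contained
demo_schema = ['course', 'student', 'enrollDate']

rng = Random(0)  # fixed seed so the fake data is reproducible
enrolling = empty_table(demo_schema)
for _ in range(3):
    # Assumed ranges: 10 courses, 100 students, enrolment during 2022
    offset = datetime.timedelta(days=rng.randint(0, 364))
    enroll_date = datetime.date(2022, 1, 1) + offset
    add_row(enrolling, [rng.randint(1, 10), rng.randint(1, 100), enroll_date])

print(enrolling)
```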

In [4]:
print('\nShow the tables in the database with their attributes:\n')
# Pair each label with its schema so that a single loop prints every table
schemas = [
    ('studentSchema:', studentSchema),
    ('instructorSchema:', instructorSchema),
    ('courseSchema:', courseSchema),
    ('assessmentSchema:', assessmentSchema),
    ('materialSchema:', materialSchema),
    ('enrollingSchema:', enrollingSchema),
    ('teachingSchema:', teachingSchema),
    ('gradingSchema:', gradingSchema),
    ('learningSchema:', learningSchema),
]
for label, schema in schemas:
    # Pad the label to 19 characters; print adds one more space before the list
    print(label.ljust(19), schema)


Show the tables in the database with their attributes:

studentSchema:      ['studentID', 'firstName', 'lastName', 'dateOfBirth', 'genre', 'country', 'town', 'email']
instructorSchema:   ['instructorID', 'role', 'firstName', 'lastName', 'genre', 'email']
courseSchema:       ['courseID', 'discipline', 'courseName', 'courseYear', 'syllabus']
assessmentSchema:   ['assessmentID', 'assessmentType', 'name', 'thresholdScore', 'maxScore']
materialSchema:     ['materialID', 'unit', 'materialType', 'name', 'dimension']
enrollingSchema:    ['course', 'student', 'enrollDate']
teachingSchema:     ['course', 'instructor']
gradingSchema:      ['course', 'student', 'assessment', 'score', 'attemptDate']
learningSchema:     ['course', 'student', 'learning_material', 'accessDate']
