# Project #4

## Kristjan Lõhmus, Rimmo Rõõm

Description: \
We want to build a system to manage the internship locations for the Professional training center. With
this purpose in mind, we held a discussion with the management committee to gather their
specifications. The outcomes of this meeting are:
* A city is located in a specific region (or state, for federal countries like the USA), country
and continent. A city can therefore be identified by the triplet of region, country and continent (e.g. Tartu linn, Estonia, Europe).
According to the committee, a city is associated with a single region, a single country and a
single continent. These three pieces of data are mandatory for each city in the database.
* An organization (for example a large company, a university or a research center) is structured into
services (a service may be called a department, division, laboratory, etc.). A
service is characterized by an address, which is the city in which the service is located.
* The student's supervisors or service employees are characterized by their names, contact
details and a list of keywords which define their sector of activity, and are linked to the service in
which they work.

## Task 1. Modelling

### 1. CDM model
Here's a high-level CDM that I would present to C-level management: \
![High-level CDM model](./img/high_level_cdm_model.jpg)

And here's the low-level one that I would present to more technical people: \
![Low-level CDM model](./img/low_level_cdm_model.jpg)


### 2. RDM

The RDM is pretty much the same as the low-level CDM with foreign keys added. In addition, Service gets an extra Address field, which is a foreign key to City.

### 3. Document Data model

#TODO
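
Until this part is filled in, here is one possible document shape, offered as a sketch rather than the final design. It assumes one document per organization, with services embedded and each service carrying its city with region, country and continent denormalized into it. The field names and the choice to embed rather than reference are assumptions; the sample values come from the PostgreSQL data in this report.

```python
import json

# Hypothetical document shape: one document per organization,
# with its services (and their city and supervisors) embedded.
organization_doc = {
    "name": "Harvard University",
    "services": [
        {
            "name": "Department of Computer Science",
            "city": {
                "name": "Paris",
                "region": "Paris",
                "country": "France",
                "continent": "Europe",
            },
            "supervisors": [
                {
                    "name": "John Doe",
                    "contact_details": "john@example.com",
                    "keywords": ["Software Development"],
                }
            ],
        }
    ],
}

# Documents are plain JSON-compatible structures.
print(json.dumps(organization_doc, indent=2))
```

Embedding keeps everything needed for a per-organization query in a single document, at the cost of repeating the city data in every service.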

## Task 2. Implementation

### 1. Implement each structure on a native data engine

#### PostgreSQL

In [20]:
import psycopg2
import pandas.io.sql as sqlio
import pandas as pd
import warnings
import time
# Silence all warnings, e.g. the pandas one about using a raw DBAPI connection
warnings.filterwarnings('ignore')

In [3]:
conn = psycopg2.connect(
    host='localhost',
    user='postgres',
    password='postgres',
    port=5432,
)
conn.autocommit = True  # every statement is committed immediately
cursor = conn.cursor()
# IF NOT EXISTS lets this cell be re-run without failing on the schema
cursor.execute('CREATE SCHEMA IF NOT EXISTS training_centre;')
cursor.execute('SET search_path = "training_centre";')

In [4]:
cursor.execute('''
CREATE TABLE Continent (
    ContinentID SERIAL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL
);
''')

In [5]:
cursor.execute('''
CREATE TABLE Country (
    CountryID SERIAL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL,
    ContinentID INT NOT NULL,
    FOREIGN KEY (ContinentID) REFERENCES Continent(ContinentID)
);
''')

In [6]:
cursor.execute('''
CREATE TABLE Region (
    RegionID SERIAL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL,
    CountryID INT NOT NULL,
    FOREIGN KEY (CountryID) REFERENCES Country(CountryID)
);
''')

In [7]:
cursor.execute('''
CREATE TABLE City (
    CityID SERIAL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL,
    RegionID INT NOT NULL,
    FOREIGN KEY (RegionID) REFERENCES Region(RegionID)
);
''')

In [8]:
cursor.execute('''
CREATE TABLE Organization (
    OrganizationID SERIAL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL
);
''')

In [9]:
cursor.execute('''
CREATE TABLE Service (
    ServiceID SERIAL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL,
    Address INT NOT NULL,
    OrganizationID INT NOT NULL,
    FOREIGN KEY (Address) REFERENCES City(CityID),
    FOREIGN KEY (OrganizationID) REFERENCES Organization(OrganizationID)
);
''')

In [10]:
cursor.execute('''
CREATE TABLE Supervisor (
    SupervisorID SERIAL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL,
    ContactDetails TEXT NOT NULL,
    Keywords TEXT NOT NULL,
    ServiceID INT NOT NULL,
    FOREIGN KEY (ServiceID) REFERENCES Service(ServiceID)
);
''')

#### MongoDB

In [None]:
#TODO

### 2. Implement the uniqueness constraint on the fields continent name and organization name on both systems.

#### PostgreSQL

In [11]:
cursor.execute('''
ALTER TABLE Continent
ADD CONSTRAINT unique_continent_name UNIQUE (Name);
''')

In [12]:
cursor.execute('''
ALTER TABLE Organization
ADD CONSTRAINT unique_organization_name UNIQUE (Name);
''')
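
To see what such a constraint does in practice, here is a small stand-in sketch that uses Python's built-in `sqlite3` instead of PostgreSQL, so it runs without a server. The table mirrors `Continent`, and `try_insert` is a hypothetical helper. With psycopg2 the same kind of failure would surface as a `psycopg2.IntegrityError`.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for PostgreSQL.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE Continent ("
    "ContinentID INTEGER PRIMARY KEY, "
    "Name TEXT NOT NULL UNIQUE)"
)
demo.execute("INSERT INTO Continent (Name) VALUES ('Europe')")


def try_insert(connection, name):
    """Return True if the insert succeeded, False if UNIQUE rejected it."""
    try:
        connection.execute("INSERT INTO Continent (Name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(demo, "Asia"))    # a new name is accepted
print(try_insert(demo, "Europe"))  # a duplicate name is rejected
```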

#### MongoDB

In [None]:
# TODO
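
As a starting point for this TODO, MongoDB enforces uniqueness through unique indexes. The sketch below assumes `db` is a pymongo `Database` and that continents and organizations live in collections named `continents` and `organizations` with a `name` field. Those names are hypothetical, and `add_unique_indexes` is an illustrative helper, not part of the project yet.

```python
def add_unique_indexes(db):
    """Create unique indexes on the name fields.

    `db` is assumed to be a pymongo Database; the collection names
    "continents" and "organizations" are hypothetical.
    """
    created = []
    for collection_name in ("continents", "organizations"):
        # unique=True makes MongoDB reject documents with a duplicate name
        index_name = db[collection_name].create_index("name", unique=True)
        created.append(index_name)
    return created


# Possible usage, assuming a local MongoDB server and pymongo installed:
# from pymongo import MongoClient
# add_unique_indexes(MongoClient("localhost", 27017)["training_centre"])
```

If continents end up only as a denormalized string inside other documents, a unique index cannot be applied to them directly, which is one argument for keeping a small separate `continents` collection.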

### 3. Populate your database with at least the following cardinalities: 10 organizations, 5 services per organization (randomly assigned to different continents)

#### PostgreSQL

In [13]:
cursor.execute('''
INSERT INTO Continent (Name) VALUES ('Europe'), ('Asia'), ('Africa'), ('North America'), ('South America');
''')

In [14]:
cursor.execute('''
INSERT INTO Country (Name, ContinentID) VALUES 
('France', 1), 
('Germany', 1),
('China', 2), 
('India', 2), 
('Kenya', 3), 
('South Africa', 3), 
('USA', 4), 
('Canada', 4), 
('Brazil', 5), 
('Argentina', 5);
''')

In [15]:
cursor.execute('''
INSERT INTO Region (Name, CountryID) VALUES 
('Paris', 1), 
('Bavaria', 2), 
('Guangdong', 3), 
('Maharashtra', 4), 
('Nairobi', 5), 
('Western Cape', 6), 
('California', 7), 
('Ontario', 8), 
('São Paulo', 9), 
('Buenos Aires', 10);
''')

In [16]:
cursor.execute('''
INSERT INTO City (Name, RegionID) VALUES 
('Paris', 1), 
('Munich', 2), 
('Guangzhou', 3), 
('Mumbai', 4), 
('Nairobi', 5), 
('Cape Town', 6), 
('San Francisco', 7), 
('Toronto', 8), 
('São Paulo', 9), 
('Buenos Aires', 10);
''')

In [17]:
cursor.execute('''
INSERT INTO Organization (Name) VALUES 
('Harvard University'), 
('MIT'), 
('Stanford University'), 
('Oxford University'), 
('Cambridge University'), 
('Tsinghua University'), 
('Peking University'), 
('ETH Zurich'), 
('University of Tokyo'), 
('Max Planck Institute');
''')

In [18]:
cursor.execute('''
INSERT INTO Service (Name, Address, OrganizationID) VALUES 
('Department of Computer Science', 1, 1), 
('Department of Mathematics', 2, 1), 
('Department of Physics', 3, 1), 
('Department of Biology', 4, 1), 
('Department of Chemistry', 5, 1), 
('Division of Engineering', 6, 2), 
('Division of Humanities', 7, 2), 
('Division of Social Sciences', 8, 2), 
('Division of Natural Sciences', 9, 2), 
('Division of Arts', 10, 2), 
('Institute of Technology', 1, 3), 
('Institute of Medicine', 2, 3), 
('Institute of Law', 3, 3), 
('Institute of Business', 4, 3), 
('Institute of Education', 5, 3), 
('Faculty of Science', 6, 4), 
('Faculty of Engineering', 7, 4), 
('Faculty of Arts', 8, 4), 
('Faculty of Law', 9, 4), 
('Faculty of Medicine', 10, 4), 
('School of Engineering', 1, 5), 
('School of Business', 2, 5), 
('School of Arts', 3, 5), 
('School of Education', 4, 5), 
('School of Law', 5, 5), 
('Research Lab A', 6, 6), 
('Research Lab B', 7, 6), 
('Research Lab C', 8, 6), 
('Research Lab D', 9, 6), 
('Research Lab E', 10, 6), 
('Center for Advanced Studies', 1, 7), 
('Center for Basic Sciences', 2, 7), 
('Center for Applied Sciences', 3, 7), 
('Center for Theoretical Physics', 4, 7), 
('Center for Molecular Biology', 5, 7), 
('Institute of Advanced Research', 6, 8), 
('Institute of Fundamental Research', 7, 8), 
('Institute of Applied Research', 8, 8), 
('Institute of Social Research', 9, 8), 
('Institute of Economic Research', 10, 8), 
('Laboratory of Physics', 1, 9), 
('Laboratory of Chemistry', 2, 9), 
('Laboratory of Biology', 3, 9), 
('Laboratory of Computer Science', 4, 9), 
('Laboratory of Environmental Science', 5, 9), 
('School of Humanities', 6, 10), 
('School of Social Sciences', 7, 10), 
('School of Natural Sciences', 8, 10), 
('School of Engineering', 9, 10), 
('School of Health Sciences', 10, 10);
''')
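
The rows above assign cities in a fixed round-robin pattern. If the assignment should be random, as the task wording suggests, the `Service` rows could be generated along these lines. This is a sketch: the helper name `random_service_rows`, the seed and the generated service names are illustrative assumptions, while the ID ranges (10 organizations, 10 cities) match the data inserted earlier.

```python
import random


def random_service_rows(n_orgs=10, services_per_org=5, n_cities=10, seed=42):
    """Build (name, city_id, organization_id) tuples with random cities."""
    rng = random.Random(seed)  # fixed seed keeps the data reproducible
    rows = []
    for org_id in range(1, n_orgs + 1):
        # sample() picks distinct cities for the services of one organization
        city_ids = rng.sample(range(1, n_cities + 1), services_per_org)
        for k, city_id in enumerate(city_ids, start=1):
            name = f"Service {k} of organization {org_id}"
            rows.append((name, city_id, org_id))
    return rows


rows = random_service_rows()
print(len(rows))  # 10 organizations x 5 services = 50 rows

# With the psycopg2 cursor from this notebook, the rows could be inserted as:
# cursor.executemany(
#     "INSERT INTO Service (Name, Address, OrganizationID) VALUES (%s, %s, %s)",
#     rows,
# )
```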

In [19]:
cursor.execute('''
INSERT INTO Supervisor (Name, ContactDetails, Keywords, ServiceID) VALUES 
('John Doe', 'john@example.com', 'Software Development', 1), 
('Jane Smith', 'jane@example.com', 'Data Science', 2), 
('Jim Brown', 'jim@example.com', 'Networking', 3), 
('Jill White', 'jill@example.com', 'AI Research', 4), 
('Jack Black', 'jack@example.com', 'Cybersecurity', 5);
''')

#### MongoDB

In [None]:
# TODO

## Task 3. Querying

### 1. Display the name of each organization together with the number of its services located on the European continent.

#### PostgreSQL

In [21]:
start_time = time.time()
result = sqlio.read_sql_query("""
SELECT 
    O.Name AS OrganizationName, 
    COUNT(S.ServiceID) AS NumberOfServices
FROM 
    Organization O
JOIN 
    Service S ON O.OrganizationID = S.OrganizationID
JOIN 
    City C ON S.Address = C.CityID
JOIN 
    Region R ON C.RegionID = R.RegionID
JOIN 
    Country CO ON R.CountryID = CO.CountryID
JOIN 
    Continent CON ON CO.ContinentID = CON.ContinentID
WHERE 
    CON.Name = 'Europe'
GROUP BY 
    O.OrganizationID, O.Name;

""",conn)
end_time = time.time()
result.head()

| | organizationname | numberofservices |
|---|---|---|
| 0 | Harvard University | 2 |
| 1 | Stanford University | 2 |
| 2 | Cambridge University | 2 |
| 3 | Peking University | 2 |
| 4 | University of Tokyo | 2 |


#### MongoDB

In [None]:
# TODO
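
As a sketch of what the MongoDB side could look like, assume an `organizations` collection where each document embeds a `services` array and each service carries a `city.continent` string. This shape is an assumption, not a finished design. The pipeline unwinds the services, keeps the European ones and counts them per organization. `count_services_in_python` is a hypothetical helper that mirrors the same logic on plain dicts, so the idea can be checked without a server.

```python
from collections import Counter

# Aggregation pipeline for the assumed document shape.
europe_pipeline = [
    {"$unwind": "$services"},
    {"$match": {"services.city.continent": "Europe"}},
    {"$group": {"_id": "$name", "numberOfServices": {"$sum": 1}}},
]
# With pymongo it could be run as: db.organizations.aggregate(europe_pipeline)


def count_services_in_python(docs, continent="Europe"):
    """Pure-Python equivalent of the pipeline, for illustration only."""
    counts = Counter()
    for doc in docs:
        for service in doc["services"]:
            if service["city"]["continent"] == continent:
                counts[doc["name"]] += 1
    return dict(counts)


sample_docs = [
    {
        "name": "Harvard University",
        "services": [
            {"name": "Department of Computer Science",
             "city": {"continent": "Europe"}},
            {"name": "Department of Physics",
             "city": {"continent": "Asia"}},
        ],
    },
]
print(count_services_in_python(sample_docs))  # {'Harvard University': 1}
```

Like the inner joins in the SQL version, this only lists organizations that have at least one service in Europe.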

### 2. Analyze and compare the execution performance on the two systems.

#### PostgreSQL

In [22]:
print(f'PostgreSQL executed in {round(end_time - start_time, 2)} seconds')

PostgreSQL executed in 0.22 seconds
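
A single wall-clock measurement like this also includes client-side overhead (pandas building the DataFrame) and can vary from run to run. A more repeatable approach, sketched here under assumptions, is to run the query several times with `time.perf_counter` and report the median. `time_callable` is a hypothetical helper, and the lambda is only a stand-in for the actual query call.

```python
import statistics
import time


def time_callable(fn, repeats=5):
    """Run fn() `repeats` times and return the median duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Stand-in workload; in the notebook this would wrap the query, e.g.
# time_callable(lambda: sqlio.read_sql_query(query, conn))
median_seconds = time_callable(lambda: sum(range(100_000)))
print(f"median: {median_seconds:.4f} seconds")
```

Using the same helper for both systems would make the comparison fairer. On the PostgreSQL side, prefixing the query with `EXPLAIN ANALYZE` additionally reports the server-side planning and execution time.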


#### MongoDB

In [None]:
# TODO