## CS 210 Spring 2024 - Apr 8
### Relational Databases

---

### <font color="brown">Nobel Prize Winners Database</font>

#### We are going to build a database for the Nobel Prize winners data that we saw previously in our discussion on JSON (Week 7)

---

In [1]:
import json, requests

nobel_url = 'http://api.nobelprize.org/v1/prize.json'
resp = requests.get(nobel_url)
nobels = json.loads(resp.text)

In [2]:
nobels['prizes'][0]

{'year': '2023',
 'category': 'chemistry',
 'laureates': [{'id': '1029',
   'firstname': 'Moungi',
   'surname': 'Bawendi',
   'motivation': '"for the discovery and synthesis of quantum dots"',
   'share': '3'},
  {'id': '1030',
   'firstname': 'Louis',
   'surname': 'Brus',
   'motivation': '"for the discovery and synthesis of quantum dots"',
   'share': '3'},
  {'id': '1031',
   'firstname': 'Aleksey',
   'surname': 'Yekimov',
   'motivation': '"for the discovery and synthesis of quantum dots"',
   'share': '3'}]}

**3 people shared the prize for Chemistry in 2023. share=3 means each laureate received one third of the prize, so here they all got an equal share**
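
To make the meaning of `share` concrete, here is a small sketch that treats it as the denominator of each laureate's fraction of the prize. That reading is an assumption based on the data shown above, and the helper name `prize_fraction` is ours, not part of the API.

```python
from fractions import Fraction

# Laureates of the 2023 chemistry prize, copied from the output above
# (only the fields needed here).
laureates = [
    {'surname': 'Bawendi', 'share': '3'},
    {'surname': 'Brus', 'share': '3'},
    {'surname': 'Yekimov', 'share': '3'},
]

def prize_fraction(laureate):
    """Assumed meaning: share='3' stands for one third of the prize."""
    return Fraction(1, int(laureate['share']))

parts = [prize_fraction(w) for w in laureates]
for w, part in zip(laureates, parts):
    print(w['surname'], part)

# If the assumption holds, the fractions for one prize add up to 1.
total = sum(parts)
print(total)
```

Note that `share` arrives as a string, so it has to be converted with `int()` before any arithmetic.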

In [3]:
nobels['prizes'][1]

{'year': '2023',
 'category': 'economics',
 'laureates': [{'id': '1034',
   'firstname': 'Claudia',
   'surname': 'Goldin',
   'motivation': '"for having advanced our understanding of women’s labour market outcomes"',
   'share': '1'}]}

**Each entry in the list of values for nobels['prizes'] is a dictionary of year+category+list of laureates**

##### <font color="brown">The data we want is year, category, first name, surname, motivation, and share for each Nobel prize winner (laureate)</font>

In [4]:
# Find the first prize entry that has no 'laureates' key
for i, prize in enumerate(nobels['prizes']):
    if 'laureates' not in prize:
        break
nobels['prizes'][i]

{'year': '1972',
 'category': 'peace',
 'overallMotivation': '"No Nobel Prize was awarded this year. The prize money for 1972 was allocated to the Main Fund."'}

In [5]:
# Find the first laureate that has no 'surname' key
for prize in nobels['prizes']:
    if 'laureates' not in prize:
        continue
    done = False
    for winner in prize['laureates']:
        if 'surname' not in winner:
            print(prize['year'], prize['category'])
            print(winner)
            done = True
            break
    if done:
        break

2022 peace
{'id': '1019', 'motivation': '"The Peace Prize laureates represent civil society in their home countries. They have for many years promoted the right to criticise power and protect the fundamental rights of citizens. They have made an outstanding effort to document war crimes, human right abuses and the abuse of power. Together they demonstrate the significance of civil society for peace and democracy."', 'share': '3', 'firstname': 'Memorial'}


**There is at least one year in which a Nobel Prize was not awarded, and at least one prize with a winner who has no surname. For the latter, the firstname appears to be the name of an organization.**
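
Putting these two observations together, the nested JSON could be flattened into one row per laureate along the following lines. This is only a sketch: the function name `flatten_prizes` is ours, the sample entries are cut-down copies of the ones shown above (long motivation strings are shortened), and using `None` for a missing surname is one possible choice.

```python
def flatten_prizes(prizes):
    """Yield one (year, category, firstname, surname, motivation, share)
    tuple per laureate."""
    for prize in prizes:
        # Prizes that were not awarded have no 'laureates' key: skip them.
        for winner in prize.get('laureates', []):
            yield (
                int(prize['year']),
                prize['category'],
                winner.get('firstname', ''),
                winner.get('surname'),  # None when there is no surname
                winner.get('motivation', '').strip('"'),
                int(winner['share']),
            )

# Cut-down sample in the same shape as nobels['prizes'].
sample = [
    {'year': '2023', 'category': 'economics',
     'laureates': [{'id': '1034', 'firstname': 'Claudia', 'surname': 'Goldin',
                    'motivation': '"for having advanced our understanding of '
                                  'women’s labour market outcomes"',
                    'share': '1'}]},
    {'year': '2022', 'category': 'peace',
     'laureates': [{'id': '1019', 'firstname': 'Memorial',
                    'motivation': '"(shortened for this example)"',
                    'share': '3'}]},
    {'year': '1972', 'category': 'peace',
     'overallMotivation': '"No Nobel Prize was awarded this year."'},
]

rows = list(flatten_prizes(sample))
for row in rows:
    print(row)
```

With the real data, `flatten_prizes(nobels['prizes'])` would produce the rows that end up in the database table.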

---

#### <font color="brown">Preparing to store in database</font>

##### To set aside an appropriate amount of storage space in the database, we need to know the maximum lengths of the category, motivation, first name, and last name strings.

In [6]:
# max lengths of category, motivation, first name, and last name
category_maxlen=0
motiv_maxlen=0
fname_maxlen=0
lname_maxlen=0
no_laureates = []
for prize in nobels['prizes']:
    # prizes that were not awarded have no 'laureates' key
    if 'laureates' not in prize:
        no_laureates.append(prize['year'])
        continue
    cat = prize['category']
    category_maxlen = max(category_maxlen, len(cat))
    for winner in prize['laureates']:
        # default to '' so a missing key does not break strip() or len()
        motiv = winner.get('motivation', '').strip('"')
        motiv_maxlen = max(motiv_maxlen, len(motiv))
        fname = winner.get('firstname', '')
        fname_maxlen = max(fname_maxlen, len(fname))
        lname = winner.get('surname','')
        lname_maxlen = max(lname_maxlen, len(lname))

print(f'No laureates in the years: {no_laureates}\n')
print('Max lengths')
print(f'Category: {category_maxlen}')
print(f'Motivation: {motiv_maxlen}')
print(f'First name: {fname_maxlen}')
print(f'Surname: {lname_maxlen}')

No laureates in the years: ['1972', '1967', '1966', '1956', '1955', '1948', '1943', '1943', '1942', '1942', '1942', '1942', '1942', '1941', '1941', '1941', '1941', '1941', '1940', '1940', '1940', '1940', '1940', '1939', '1935', '1934', '1933', '1932', '1931', '1928', '1925', '1924', '1924', '1923', '1921', '1919', '1918', '1918', '1918', '1917', '1917', '1916', '1916', '1916', '1916', '1915', '1915', '1914', '1914']

Max lengths
Category: 10
Motivation: 374
First name: 59
Surname: 26


---

#### <font color="brown">Creating a nobels database</font>

##### <font color="brown">1. Create a database named nobels</font>

##### Execute the following commands in the Terminal window to create the 'nobels' database. 

##### Note that every database you create must be prefixed with your netid_, so my databases are venugopa_\<dbname>

- Log into mysql
    
    (MariaDB is compatible with MySQL)
    
    <pre>
    venugopa@data8:~/cs210_s24/lectures$ mysql
    Welcome to the MariaDB monitor.  Commands end with ; or \g.
    Your MariaDB connection id is 53
    Server version: 10.6.16-MariaDB-0ubuntu0.22.04.1 Ubuntu 22.04

    Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    MariaDB [(none)]>
    </pre>
    
    Logging in will put you in a MySQL client session where you can execute MySQL commands.

- Create the database with the <tt>create database</tt> statement<br>
https://dev.mysql.com/doc/refman/8.0/en/create-database.html<br>
Remember, you need to prefix the database name with **netid_**

    <pre>
    MariaDB [(none)]> create database venugopa_nobels;
    Query OK, 1 row affected (0.004 sec)
    </pre>
    
- You can verify that the database has been created by using the show databases command
    
    <pre>
    MariaDB [(none)]> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | venugopa_nobels    |
    +--------------------+
    2 rows in set (0.002 sec)
    
    MariaDB [(none)]>
    </pre>

---

##### <font color="brown">2. Create a winners table in the database</font>

##### **i) Define a couple of columns to start with - execute the following in the MySQL client window:**

- Choose the nobels database

<pre>
    MariaDB [(none)]> use venugopa_nobels
    Database changed
    MariaDB [venugopa_nobels]>
    
</pre>

- Now that you are using the <tt>nobels</tt> database, you can see what tables there are if any:

<pre>
    MariaDB [venugopa_nobels]> show tables;
    Empty set (0.002 sec)   
</pre>

- There are no tables, since we haven't made any yet. Go ahead and create a table called <tt>winners</tt><br>
    https://dev.mysql.com/doc/refman/8.0/en/create-table.html
    
<pre>
   MariaDB [venugopa_nobels]> create table winners (year year not null, category char(10) not null);
   Query OK, 0 rows affected (0.013 sec)
</pre>

- Column <tt>year</tt> has datatype **<tt>year</tt>**<br>
https://dev.mysql.com/doc/refman/8.0/en/year.html

- Column <tt>category</tt> has datatype **<tt>char(10)</tt>**, meaning a fixed width of 10 characters.<br> 
The value stored may have fewer than 10 characters, but space for 10 characters is always set aside.<br>
https://dev.mysql.com/doc/refman/8.0/en/char.html

- When you set a column to be **not null**, every row you add *must* have a value for that column; otherwise the database system will reject the insert.<br>

- You can verify that the table has been created by using the **<tt>show tables</tt>** statement

<pre>
    MariaDB [venugopa_nobels]> show tables;
    +---------------------------+
    | Tables_in_venugopa_nobels |
    +---------------------------+
    | winners                   |
    +---------------------------+
    1 row in set (0.002 sec)
    
</pre>

- You can see the table schema (structure) with the **<tt>desc</tt>** statement:

<pre>
MariaDB [venugopa_nobels]> desc winners;
+----------+----------+------+-----+---------+-------+
| Field    | Type     | Null | Key | Default | Extra |
+----------+----------+------+-----+---------+-------+
| year     | year(4)  | NO   |     | NULL    |       |
| category | char(10) | NO   |     | NULL    |       |
+----------+----------+------+-----+---------+-------+
2 rows in set (0.005 sec)
</pre>

- Alternatively, you can use the **<tt>show columns from</tt>** statement<br>
   https://dev.mysql.com/doc/refman/8.0/en/show-columns.html
    
<pre>
MariaDB [venugopa_nobels]> show columns from winners;
+----------+----------+------+-----+---------+-------+
| Field    | Type     | Null | Key | Default | Extra |
+----------+----------+------+-----+---------+-------+
| year     | year(4)  | NO   |     | NULL    |       |
| category | char(10) | NO   |     | NULL    |       |
+----------+----------+------+-----+---------+-------+
2 rows in set (0.002 sec)
</pre>
    
- The NULL shown under Default can be ignored for these columns, since the Null column says NO: you must supply a value. <br>
However, if you had not specified *not null*, a null (no value) would be allowed. In that case you may specify a default, which is the value assumed if none is supplied - see the **lname** column below.
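
The effect of **not null** and of a default value can be tried out without touching the course database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for MariaDB. That substitution is an assumption made for convenience: MariaDB's exact behaviour depends on its SQL mode, and the table `demo` and its `note` column are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
conn.execute(
    "create table demo ("
    " year integer not null,"
    " category char(10) not null,"
    " note varchar(20) default 'n/a')"  # hypothetical column with a default
)

# Supplying every not-null column works; the omitted column gets its default.
conn.execute("insert into demo (year, category) values (2023, 'Chemistry')")
row = conn.execute("select year, category, note from demo").fetchone()
print(row)

# Leaving out a not-null column is rejected by the database.
try:
    conn.execute("insert into demo (year) values (2022)")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print('rejected:', err)

n_rows = conn.execute("select count(*) from demo").fetchone()[0]
print(n_rows)
```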

##### **ii) Add in the rest of the columns**

- Recall the max lengths we found earlier for various attributes:

<pre>
Category: 10
Motivation: 374
First name: 59
Surname: 26
</pre>

- We are going to be somewhat conservative, and use extra space toward motivation, first name, and surname, in case future additions are longer.

<pre>
MariaDB [venugopa_nobels]> alter table winners add column fname varchar(80) not null;
MariaDB [venugopa_nobels]> alter table winners add column lname varchar(40);
MariaDB [venugopa_nobels]> alter table winners add column motivation varchar(500) not null;
MariaDB [venugopa_nobels]> alter table winners add column share tinyint not null;
</pre>

- The **<tt>alter table</tt>** statement can be used to add or modify columns <br>
https://dev.mysql.com/doc/refman/8.0/en/alter-table.html

- You don't *have* to do it this way (create with a few columns, then add in all the rest), but it is sometimes easier to break it up like this than have a super long <tt>create table</tt> statement

- The **<tt>fname</tt>**, **<tt>lname</tt>**, and **<tt>motivation</tt>** columns are all of type **<tt>varchar</tt>**, short for variable-length characters. The length specified is the maximum allowed, but unlike **<tt>char</tt>**, the storage used is not padded out to that maximum. Instead, it is the actual number of characters stored, plus a small length prefix (one or two bytes) that records how many characters there are:<br>
https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html#data-types-storage-reqs-strings

- The **<tt>share</tt>** column is of type **<tt>tinyint</tt>**, which is the smallest integer type (one byte) you can set aside for a column<br>
https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html#data-types-storage-reqs-numeric
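
The difference between <tt>char</tt> and <tt>varchar</tt> storage can be sketched with a little arithmetic. The helpers below follow the rule given on the storage requirements page linked above (actual length plus one byte when values need at most 255 bytes, plus two bytes otherwise), under the simplifying assumption that every character takes one byte. The function names are ours, and real storage engines add their own overhead, so treat the numbers as rough estimates.

```python
def char_bytes(declared_len, bytes_per_char=1):
    """char(M) always reserves room for M characters."""
    return declared_len * bytes_per_char

def varchar_bytes(value, declared_len, bytes_per_char=1):
    """varchar(M) stores the actual characters plus a length prefix."""
    if len(value) > declared_len:
        raise ValueError('value longer than the declared maximum')
    # 1-byte prefix if the column can never need more than 255 bytes, else 2.
    prefix = 1 if declared_len * bytes_per_char <= 255 else 2
    return len(value) * bytes_per_char + prefix

print(char_bytes(10))             # category char(10), whatever the value
print(varchar_bytes('Brus', 40))  # lname varchar(40)
print(varchar_bytes('for the discovery and synthesis of quantum dots', 500))
```

A short surname therefore costs only a few bytes in a <tt>varchar(40)</tt> column, whereas a <tt>char(40)</tt> column would reserve the full width every time.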

<pre>
MariaDB [venugopa_nobels]> desc winners;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| year       | year(4)      | NO   |     | NULL    |       |
| category   | char(10)     | NO   |     | NULL    |       |
| fname      | varchar(80)  | NO   |     | NULL    |       |
| lname      | varchar(40)  | YES  |     | NULL    |       |
| motivation | varchar(500) | NO   |     | NULL    |       |
| share      | tinyint(4)   | NO   |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+
6 rows in set (0.004 sec)
</pre>

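**Key** can be any column or combination of columns that has unique values.<br> 
In this table, no column has unique values.<br>
The combination of year and category is not unique either, since a shared prize puts several laureates under the same year and category (for example, the three Chemistry laureates of 2023). A wider combination, such as year, category, first name, and last name, could serve as a key, but we will leave it be for now.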
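
Whether a combination of columns is unique can be checked directly on the data. Here is a minimal sketch in plain Python, using the four 2023 laureates seen earlier in the JSON output; the helper name `is_unique` and the `COLUMNS` tuple are ours.

```python
from collections import Counter

COLUMNS = ('year', 'category', 'fname', 'lname')

# The four 2023 laureates shown in the JSON output earlier.
rows = [
    (2023, 'chemistry', 'Moungi', 'Bawendi'),
    (2023, 'chemistry', 'Louis', 'Brus'),
    (2023, 'chemistry', 'Aleksey', 'Yekimov'),
    (2023, 'economics', 'Claudia', 'Goldin'),
]

def is_unique(rows, column_names):
    """True if no two rows agree on all of the given columns."""
    idx = [COLUMNS.index(name) for name in column_names]
    counts = Counter(tuple(row[i] for i in idx) for row in rows)
    return all(n == 1 for n in counts.values())

print(is_unique(rows, ('year',)))
print(is_unique(rows, ('year', 'category')))
print(is_unique(rows, ('year', 'category', 'fname', 'lname')))
```

A check like this on sample rows can only rule a candidate key out; passing it does not prove that the combination will stay unique for all future data.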

---

#### <font color="brown">Loading data into winners table</font>

- The JSON data can be added to the **winners** table via Python, but we will get to Python-based database access later. For now, we will load data from a prepared database file, **nobelsv1.sql**

- In Terminal, exit the mysql session: 

<pre>
    MariaDB [venugopa_nobels]> exit
    Bye
    venugopa@data8:~/cs210_s24/lectures$
</pre>

- Then, issue the following command:

<pre>
    venugopa@data8:~/cs210_s24/lectures$ mysql venugopa_nobels < nobelsv1.sql
</pre>

**Note: the nobelsv1.sql file must be in the folder where you are executing the command.**

---

#### <font color="brown">Querying the winners Table</font>

##### **Queries are done using the <tt>select</tt> statement**
https://dev.mysql.com/doc/refman/8.0/en/select.html

##### **The result of any query is a table**

---

- Log into MySQL again, directly into the nobels database:

<pre>
venugopa@data8:~/cs210_s24/lectures$ mysql venugopa_nobels
...
MariaDB [venugopa_nobels]>
</pre>

---

##### **1. How many entries are in the table?**

<pre>
MariaDB [venugopa_nobels]> select count(*) from winners;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
</pre>

##### **2. Show the first 5 rows in the table**

<pre>
MariaDB [venugopa_nobels]> select * from winners limit 5;
+------+------------+---------+---------+---------------------------------------------------------------------------+-------+
| year | category   | fname   | lname   | motivation                                                                | share |
+------+------------+---------+---------+---------------------------------------------------------------------------+-------+
| 2023 | Chemistry  | Moungi  | Bawendi | for the discovery and synthesis of quantum dots                           |     3 |
| 2023 | Chemistry  | Louis   | Brus    | for the discovery and synthesis of quantum dots                           |     3 |
| 2023 | Chemistry  | Aleksey | Yekimov | for the discovery and synthesis of quantum dots                           |     3 |
| 2023 | Economics  | Claudia | Goldin  | for having advanced our understanding of women’s labour market outcomes   |     1 |
| 2023 | Literature | Jon     | Fosse   | for his innovative plays and prose which give voice to the unsayable      |     1 |
+------+------------+---------+---------+---------------------------------------------------------------------------+-------+
5 rows in set (0.001 sec)
</pre>

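**<tt>select \*</tt>** selects all columns.

The output can be hard to read if any of the columns is extra long. <br>
An alternative is to end the query with '\G' instead of ';', which prints each row vertically. The sessions below type a ';' after the '\G' anyway; the client treats that extra ';' as an empty statement, hence the "ERROR: No query specified" at the end of the output: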
    
<pre>
MariaDB [venugopa_nobels]> select * from winners limit 5\G;
*************************** 1. row ***************************
      year: 2023
  category: Chemistry
     fname: Moungi
     lname: Bawendi
motivation: for the discovery and synthesis of quantum dots
     share: 3
*************************** 2. row ***************************
      year: 2023
  category: Chemistry
     fname: Louis
     lname: Brus
motivation: for the discovery and synthesis of quantum dots
     share: 3
*************************** 3. row ***************************
      year: 2023
  category: Chemistry
     fname: Aleksey
     lname: Yekimov
motivation: for the discovery and synthesis of quantum dots
     share: 3
*************************** 4. row ***************************
      year: 2023
  category: Economics
     fname: Claudia
     lname: Goldin
motivation: for having advanced our understanding of women’s labour market outcomes
     share: 1
*************************** 5. row ***************************
      year: 2023
  category: Literature
     fname: Jon
     lname: Fosse
motivation: for his innovative plays and prose which give voice to the unsayable
     share: 1
5 rows in set (0.001 sec)

ERROR: No query specified
</pre>


**To show the last five rows, use limit with offset: the table has 1000 rows, so skip the first 995**

<pre>
MariaDB [venugopa_nobels]> select * from winners limit 5 offset 995\G;
*************************** 1. row ***************************
      year: 1901
  category: Literature
     fname: Sully
     lname: Prudhomme
motivation: in special recognition of his poetic composition, which gives evidence of lofty idealism, artistic perfection and a rare combination of the qualities of both heart and intellect
     share: 1
*************************** 2. row ***************************
      year: 1901
  category: Peace
     fname: Henry
     lname: Dunant
motivation: for his humanitarian efforts to help wounded soldiers and create international understanding
     share: 2
*************************** 3. row ***************************
      year: 1901
  category: Peace
     fname: Frédéric
     lname: Passy
motivation: for his lifelong work for international peace conferences, diplomacy and arbitration
     share: 2
*************************** 4. row ***************************
      year: 1901
  category: Physics
     fname: Wilhelm Conrad
     lname: Röntgen
motivation: in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him
     share: 1
*************************** 5. row ***************************
      year: 1901
  category: Medicine
     fname: Emil
     lname: von Behring
motivation: for his work on serum therapy, especially its application against diphtheria, by which he has opened a new road in the domain of medical science and thereby placed in the hands of the physician a victorious weapon against illness and deaths
     share: 1
5 rows in set (0.002 sec)

ERROR: No query specified
</pre>
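
The offset arithmetic can be sketched in Python. As an assumption for the sake of a self-contained example, this uses the built-in `sqlite3` module with an in-memory table of ten made-up placeholder rows rather than the MariaDB server. It also adds an <tt>order by</tt> clause, because without one SQL does not guarantee the order in which rows come back.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("create table winners (year integer, fname text)")
# Ten made-up placeholder rows, just to have something to page through.
conn.executemany(
    "insert into winners values (?, ?)",
    [(2000 + i, f'person{i}') for i in range(10)],
)

total = conn.execute("select count(*) from winners").fetchone()[0]

# First five rows in a well-defined order.
first5 = conn.execute(
    "select year from winners order by year desc limit 5"
).fetchall()

# Last five rows: skip everything except the final five.
last5 = conn.execute(
    "select year from winners order by year desc limit 5 offset ?",
    (total - 5,),
).fetchall()

print(first5)
print(last5)
```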

##### **3. What are all the years for which Nobel winners are listed?**

<pre>
MariaDB [venugopa_nobels]> select distinct(year) from winners;
+------+
| year |
+------+
| 2023 |
| 2022 |
| 2021 |
| 2020 |
| 2019 |
| 2018 |
  ...
  ...
| 1903 |
| 1902 |
| 1901 |
+------+
</pre>

##### **4. For how many years are winners listed?**

<pre>
MariaDB [venugopa_nobels]> select count(distinct(year)) from winners;
+-----------------------+
| count(distinct(year)) |
+-----------------------+
|                   120 |
+-----------------------+
1 row in set (0.001 sec)
</pre>
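
The difference between counting rows and counting distinct values can be seen on a handful of rows. The sketch below again assumes Python's built-in `sqlite3` module as a stand-in for MariaDB, loaded with seven of the rows shown above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("create table winners (year integer, category text, lname text)")
conn.executemany("insert into winners values (?, ?, ?)", [
    (2023, 'Chemistry', 'Bawendi'),
    (2023, 'Chemistry', 'Brus'),
    (2023, 'Chemistry', 'Yekimov'),
    (2023, 'Economics', 'Goldin'),
    (1901, 'Literature', 'Prudhomme'),
    (1901, 'Peace', 'Dunant'),
    (1901, 'Peace', 'Passy'),
])

# distinct collapses repeated values into one row each.
years = [r[0] for r in conn.execute(
    "select distinct year from winners order by year desc")]

n_rows = conn.execute("select count(*) from winners").fetchone()[0]
n_years = conn.execute(
    "select count(distinct year) from winners").fetchone()[0]

print(years, n_rows, n_years)
```

Here <tt>count(*)</tt> counts every row, while <tt>count(distinct year)</tt> counts each year only once, however many laureates it has.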

##### **5. Who are all the winners in 2020, and in which category?**

<pre>
MariaDB [venugopa_nobels]> select category, fname, lname from winners where year=2020;
+------------+----------------------+-------------+
| category   | fname                | lname       |
+------------+----------------------+-------------+
| Chemistry  | Emmanuelle           | Charpentier |
| Chemistry  | Jennifer A.          | Doudna      |
| Economics  | Paul                 | Milgrom     |
| Economics  | Robert               | Wilson      |
| Literature | Louise               | Glück       |
| Peace      | World Food Programme |             |
| Physics    | Roger                | Penrose     |
| Physics    | Reinhard             | Genzel      |
| Physics    | Andrea               | Ghez        |
| Medicine   | Harvey               | Alter       |
| Medicine   | Michael              | Houghton    |
| Medicine   | Charles              | Rice        |
+------------+----------------------+-------------+
12 rows in set (0.00 sec)
</pre>
- Any number of columns can be specified in the **select** statement, separated by commas
- The **where** clause sets up a condition
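
The same column selection and condition can be issued from Python. The sketch below assumes the built-in `sqlite3` module and a five-row in-memory copy of some of the rows above, not the MariaDB server; the `?` placeholder is sqlite3's syntax, and other database drivers may use a different one.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    "create table winners (year integer, category text, fname text, lname text)"
)
conn.executemany("insert into winners values (?, ?, ?, ?)", [
    (2023, 'Economics', 'Claudia', 'Goldin'),
    (2020, 'Chemistry', 'Emmanuelle', 'Charpentier'),
    (2020, 'Chemistry', 'Jennifer A.', 'Doudna'),
    (2020, 'Literature', 'Louise', 'Glück'),
    (2020, 'Peace', 'World Food Programme', ''),
])

# Only the listed columns are returned, and only rows matching the condition.
result = conn.execute(
    "select category, fname, lname from winners"
    " where year = ? order by category, lname",
    (2020,),
).fetchall()

for row in result:
    print(row)
```

Passing the year as a parameter rather than pasting it into the query string keeps the query text fixed and avoids quoting problems.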

##### **6. Who were the winners of the Literature prize in the years 2010 thru 2020?**

<pre>
MariaDB [venugopa_nobels]> select fname, lname, year from winners where category='Literature' and year between 2010 and 2020;
+----------+--------------+------+
| fname    | lname        | year |
+----------+--------------+------+
| Louise   | Glück        | 2020 |
| Peter    | Handke       | 2019 |
| Olga     | Tokarczuk    | 2018 |
| Kazuo    | Ishiguro     | 2017 |
| Bob      | Dylan        | 2016 |
| Svetlana | Alexievich   | 2015 |
| Patrick  | Modiano      | 2014 |
| Alice    | Munro        | 2013 |
| Mo       | Yan          | 2012 |
| Tomas    | Tranströmer  | 2011 |
| Mario    | Vargas Llosa | 2010 |
+----------+--------------+------+
11 rows in set (0.002 sec)
</pre>
- For numeric types, you can use the **between** keyword to select a range; both endpoints are included
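
That **between** includes both ends of the range can be confirmed with a quick experiment. As before, the sketch assumes Python's built-in `sqlite3` module standing in for MariaDB, with a few of the rows shown above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("create table winners (year integer, category text, lname text)")
conn.executemany("insert into winners values (?, ?, ?)", [
    (2023, 'Literature', 'Fosse'),
    (2020, 'Literature', 'Glück'),
    (2020, 'Chemistry', 'Doudna'),
    (2015, 'Literature', 'Alexievich'),
    (2010, 'Literature', 'Vargas Llosa'),
])

with_between = conn.execute(
    "select lname, year from winners"
    " where category = 'Literature' and year between 2010 and 2020"
    " order by year desc"
).fetchall()

# The same range written out with comparison operators.
with_compare = conn.execute(
    "select lname, year from winners"
    " where category = 'Literature' and year >= 2010 and year <= 2020"
    " order by year desc"
).fetchall()

print(with_between)
print(with_between == with_compare)
```

Both 2010 and 2020 appear in the result, so <tt>between 2010 and 2020</tt> behaves like <tt>year >= 2010 and year <= 2020</tt>.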

##### **7. List all details of all the prizes that Marie Curie won**

<pre>
MariaDB [venugopa_nobels]> select * from winners where lname='Curie' and fname='Marie'\G;
*************************** 1. row ***************************
      year: 1911
  category: Chemistry
     fname: Marie
     lname: Curie
motivation: in recognition of her services to the advancement of chemistry by the discovery of the elements radium and polonium, by the isolation of radium and the study of the nature and compounds of this remarkable element
     share: 1
*************************** 2. row ***************************
      year: 1903
  category: Physics
     fname: Marie
     lname: Curie
motivation: in recognition of the extraordinary services they have rendered by their joint researches on the radiation phenomena discovered by Professor Henri Becquerel
     share: 4
2 rows in set (0.001 sec)
</pre>

##### **8. List all details of all the prizes Bardeen won**

<pre>
MariaDB [venugopa_nobels]> select * from winners where lname='Bardeen'\G;
*************************** 1. row ***************************
      year: 1972
  category: Physics
     fname: John
     lname: Bardeen
motivation: for their jointly developed theory of superconductivity, usually called the BCS-theory
     share: 3
*************************** 2. row ***************************
      year: 1956
  category: Physics
     fname: John
     lname: Bardeen
motivation: for their researches on semiconductors and their discovery of the transistor effect
     share: 3
2 rows in set (0.001 sec)
</pre>