# COMSW4111_003_2024_3: Selected Final Exam Answers

## Introduction

It is easy to find the correct answers to many of the questions on the W4111 - Introduction to Databases, Section 003/V03, Fall 2024 final exam. For many questions, the lecture slides or the slides associated with the recommended textbook directly provide answers. ChatGPT is extremely good at generating correct answers, although it tends to be too verbose. In fact, it borders on bloviation.

The correct answers for some questions are not as easily determined. Moreover, the rubric or what we were specifically looking for might be unclear. This notebook contains answers and explanations for selected questions from the final exam.

## Initialize

In [1]:
%load_ext sql

In [2]:
%sql mysql+pymysql://root:dbuserdbuser@localhost

## Answers to Specific Questions

### 8. Explain cascading actions in referential integrity constraints.

Slide 3.58 in the slides for the recommended textbook and the lecture slides provide an explanation.

Key concepts that we were specifically looking for in the answer were:
- Applies to foreign-key, referential integrity constraints.
- Applies to DELETE and UPDATE on the referenced table.
- UPDATE is only an issue if the update _changes the referenced column(s)._
- Automatically modifies the referencing table/rows based on the change.
- A change that violates a constraint is rejected if there is no supporting CASCADE definition.
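A minimal sketch of these points, assuming Python's built-in `sqlite3` module with an in-memory SQLite database instead of the MySQL server used in this notebook (SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`). The tables `dept`, `emp`, and `badge` are hypothetical: `emp` declares `ON DELETE CASCADE ON UPDATE CASCADE`, while `badge` declares no referential action, so a change that would orphan its rows is rejected.

```python
import sqlite3


def demo_cascades():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on (per connection).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE dept (dept_id TEXT PRIMARY KEY, name TEXT);
        -- Hypothetical referencing table WITH cascading actions.
        CREATE TABLE emp (
            emp_id  TEXT PRIMARY KEY,
            dept_id TEXT REFERENCES dept(dept_id)
                    ON DELETE CASCADE ON UPDATE CASCADE
        );
        -- Hypothetical referencing table WITHOUT an action: violating changes are rejected.
        CREATE TABLE badge (
            badge_id TEXT PRIMARY KEY,
            dept_id  TEXT REFERENCES dept(dept_id)
        );
        INSERT INTO dept VALUES ('CS', 'Computer Science'), ('EE', 'Electrical Eng.');
        INSERT INTO emp VALUES ('e1', 'CS'), ('e2', 'EE');
        INSERT INTO badge VALUES ('b1', 'EE');
    """)

    # Changing a referenced key column propagates to emp (ON UPDATE CASCADE).
    conn.execute("UPDATE dept SET dept_id = 'COMS' WHERE dept_id = 'CS'")
    e1_dept = conn.execute("SELECT dept_id FROM emp WHERE emp_id = 'e1'").fetchone()[0]

    # Updating a non-key column is fine even though badge has no CASCADE.
    conn.execute("UPDATE dept SET name = 'Electrical Engineering' WHERE dept_id = 'EE'")

    # Deleting 'EE' is rejected: badge still references it and has no action defined.
    try:
        conn.execute("DELETE FROM dept WHERE dept_id = 'EE'")
        delete_rejected = False
    except sqlite3.IntegrityError:
        delete_rejected = True

    # Once the non-cascading reference is gone, the delete cascades to emp.
    conn.execute("DELETE FROM badge WHERE badge_id = 'b1'")
    conn.execute("DELETE FROM dept WHERE dept_id = 'EE'")
    emps = [row[0] for row in conn.execute("SELECT emp_id FROM emp ORDER BY emp_id")]
    conn.close()
    return e1_dept, delete_rejected, emps


print(demo_cascades())  # ('COMS', True, ['e1'])
```

Changing `dept_id` cascades to `emp`, renaming a department touches neither referencing table, and the first delete fails because `badge` has no cascading action.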

### 9. <div style="margin-left: 70px;">Write SQL DDL statements that implement the following Crow’s Foot diagram.<br> You can assume that all data types are text. <br>We are focusing on your understanding of concepts. <br>We are not focusing on memorization of SQL and perfectly following the syntax. <br>Place your DDL on the next page.</div>


<img src="./ER-Diagram.jpg" width="600px;">


Some specific things to consider, which were a focus of grading:
- Although the question states that all data types can be text, we expected you to know that key columns need a type other than TEXT (MySQL cannot index a TEXT column without a prefix length). You should at least have added a statement/comment to that effect.
- Specifying columns as NULL or NOT NULL is necessary for a correct implementation. For example,
    - The ```Comment -> Customer``` relationship is mandatory/exactly one (double line) $\Rightarrow$ ```NOT NULL```.
    - The ```Comment -> SalesRep``` relationship is optional/0 or 1 (circle and line) $\Rightarrow$ ```NULL```.
- The ```CustomerSalesRep``` primary key is composite, i.e., ```(customer_ID, sales_rep_ID)```.
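To see how the NULL/NOT NULL and composite-key choices behave, here is a hedged sketch against an in-memory SQLite database via Python's `sqlite3` (an assumption; the notebook itself uses MySQL). Table and column names follow the reference answer below with simplified columns; the IDs and dates are made up.

```python
import sqlite3


def check_constraints():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer  (ID VARCHAR(32) PRIMARY KEY);
        CREATE TABLE sales_rep (ID VARCHAR(32) PRIMARY KEY);
        -- SQLite (unlike MySQL) allows NULL in non-integer primary keys,
        -- so NOT NULL is spelled out here.
        CREATE TABLE customer_sales_rep (
            customer_ID  VARCHAR(32) NOT NULL REFERENCES customer(ID),
            sales_rep_ID VARCHAR(32) NOT NULL REFERENCES sales_rep(ID),
            start_date   TEXT NOT NULL,
            PRIMARY KEY (customer_ID, sales_rep_ID)
        );
        CREATE TABLE comment (
            ID           VARCHAR(32) PRIMARY KEY,
            customer_ID  VARCHAR(32) NOT NULL REFERENCES customer(ID),  -- exactly one
            sales_rep_ID VARCHAR(32) REFERENCES sales_rep(ID)           -- zero or one
        );
        INSERT INTO customer VALUES ('c1');
        INSERT INTO sales_rep VALUES ('s1');
        INSERT INTO customer_sales_rep VALUES ('c1', 's1', '2024-12-01');
    """)

    def accepted(sql):
        # True if the INSERT succeeds, False if a constraint rejects it.
        try:
            conn.execute(sql)
            return True
        except sqlite3.IntegrityError:
            return False

    results = {
        "comment, no sales rep": accepted("INSERT INTO comment VALUES ('m1', 'c1', NULL)"),
        "comment, no customer": accepted("INSERT INTO comment VALUES ('m2', NULL, 's1')"),
        "duplicate customer/rep pair": accepted(
            "INSERT INTO customer_sales_rep VALUES ('c1', 's1', '2024-12-15')"),
    }
    conn.close()
    return results


print(check_constraints())
# {'comment, no sales rep': True, 'comment, no customer': False, 'duplicate customer/rep pair': False}
```

The optional foreign key accepts NULL, the mandatory one rejects it, and the composite primary key prevents recording the same customer/sales rep pair twice.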

For completeness and reference, my answer is below. _We did not expect yours to be this complete._ In a real implementation, I would make several other changes/improvements.

In [13]:
%%sql

drop schema if exists w4111_f24_final_answers;
create schema w4111_f24_final_answers;
use w4111_f24_final_answers;


/*
    Note that you need to drop the tables that reference other tables first,
    in dependency order. Otherwise, you get foreign key errors. That is not an issue
    in this case because I drop the schema above.
*/
drop table if exists comment;
drop table if exists customer_sales_rep;
drop table if exists customer;
drop table if exists sales_rep;


create table customer
(
   /*
    Despite stating that the columns can be text, I expect the students to know that
    they need to change the type for keys.
    */
   ID varchar(32) primary key,
   last_name text,
   first_name text
);


create table sales_rep
(
   ID varchar(32) primary key,
   last_name text,
   first_name text
);


create table customer_sales_rep
(
    customer_ID varchar(32),
    sales_rep_ID varchar(32),
    start_date text not null,
    end_date text,
    
    /*
        In this type of associative entity, it is critical that at least customer_ID and sales_rep_ID
        are part of the primary key. If I had stated, or you had assumed, that we track the relationship
        over time, you would need to add something else, like the start date or a sequence number.
    */
    primary key (customer_ID, sales_rep_ID),
    constraint customer_fk foreign key (customer_ID) references customer(ID),
    constraint sales_rep_fk foreign key (sales_rep_ID) references sales_rep(ID)
);


create table comment
(
   ID varchar(32) primary key,


   /*
    There is a little "gotcha" here. The Crow's Foot diagram indicates that one
    of the foreign keys cannot be NULL and the other may be NULL. NULL is how you
    implement 0-1. NOT NULL is how you implement exactly 1.
    */
   customer_ID varchar(32) not null,
   sales_rep_ID varchar(32) null,
   comment_value text,
   constraint customer_fk_2 foreign key (customer_ID) references customer(ID),
   constraint sales_rep_fk_2 foreign key (sales_rep_ID) references sales_rep(ID)
);


 * mysql+pymysql://root:***@localhost
4 rows affected.
1 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.


[]

### 2.3 SQL, 16 points each

__11.	Consider the following subset of the IMDB schema shown in the ER diagram and DDL.__

<img src="imdb.jpg" width="700px;">

```
create table if not exists w4111_f24_final.name_basics
(
    nconst         text         null,
    primaryName    text         null,
    knownForTitles text         null
);

create table if not exists w4111_f24_final.title_basics
(
    tconst        text null,
    primary_title text null
);


create table if not exists w4111_f24_final.title_principals
(
    nconst text null,
    tconst text null
);
```

You can assume that the following tables contain representative data. That is, the values indicate the size, type, and content of the columns. 

The fields are the following:
- nconst is a string that is a primary key identifying a row in name_basics.
- primaryName is a string of the form “first_name  last_name.” You can assume that the strings always contain a first name, ‘ ‘, and last name.
- tconst is a string that is a primary key identifying a row in title_basics.
- primary_title is a string representing the primary title of the film, episode, etc.
- The table title_principals “connects” name_basics and title_basics entries.
- knownForTitles is a comma delimited string containing 0, 1, 2, 3 or 4 tconst values for the titles for which the person is best known.

<img src="imdb_data.jpg" width="700px;">

__Please list below what changes you would make to the schema to make it better and why? (4 points)__ 

There are a lot of possible changes/improvements. My major changes would be:
- Modifying the schema to have primary and foreign keys.
- ```name_basics.primaryName``` is clearly non-atomic and is a composite attribute. I would change this to two atomic domains, ```first_name, last_name.```
- The really tricky one is ```knownForTitles```. This attribute is clearly multi-valued. More subtly, it is actually an attribute of the many-to-many relationship in ```title_principals```. That is, a ```name_basics``` entry is related to many ```title_basics``` entries and is "known for" some of them.
- I would also add indexes on columns used for lookups and joins, e.g., ```last_name``` and ```title_principals.tconst```.
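The two normalizations above can be sketched as a small Python transformation. `normalize_person` is a hypothetical helper, not part of the exam or any library; it assumes, as the question states, that `primaryName` is always a first name, whitespace, and a last name, and that `knownForTitles` is a comma-delimited list of `tconst` values. The person's titles (`principal_tconsts`) are assumed to come from `title_principals`.

```python
def normalize_person(nconst, primary_name, known_for_titles, principal_tconsts):
    """Hypothetical helper: split one raw name_basics row into normalized rows.

    Returns (name_row, principal_rows), where name_row is
    (nconst, first_name, last_name) and each principal row is
    (nconst, tconst, is_known_for).
    """
    # Composite attribute -> two atomic attributes. Assumes "first last";
    # everything after the first run of whitespace becomes the last name.
    first_name, last_name = primary_name.split(maxsplit=1)

    # Multi-valued attribute -> a set of tconst values (NULL/empty means none).
    known_for = {t.strip() for t in (known_for_titles or "").split(",") if t.strip()}

    # "Known for" becomes a flag on each (nconst, tconst) relationship row.
    principal_rows = [(nconst, t, t in known_for) for t in principal_tconsts]
    return (nconst, first_name, last_name), principal_rows


# Made-up sample values, shaped like the IMDB columns.
name_row, principal_rows = normalize_person(
    "nm0000001", "Jane Doe", "tt0000001,tt0000002",
    ["tt0000001", "tt0000002", "tt0000003"],
)
print(name_row)        # ('nm0000001', 'Jane', 'Doe')
print(principal_rows)  # [('nm0000001', 'tt0000001', True), ('nm0000001', 'tt0000002', True), ('nm0000001', 'tt0000003', False)]
```

Because the flag lives on `title_principals` rows, a known-for title with no matching `(nconst, tconst)` row is dropped by this sketch.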

__Write the new DDL statements for the schema based on your changes. (12 points)__

In [14]:
%%sql

drop schema if exists w4111_f24_final_answers_imdb;
create schema w4111_f24_final_answers_imdb;
use w4111_f24_final_answers_imdb;

create table if not exists name_basics
(
    nconst     varchar(16)  not null
        primary key,

    /*
        I would make last_name not null. Yes, this might offend Zendaya or Bono, but in fact these
        people do have real names. First name could also be not null.

        The key concept is normalizing the primaryName.
    */
    last_name  varchar(128) not null,
    first_name varchar(128) null,

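    /*
     This is totally cool but not necessary. CONCAT_WS skips NULLs, so a NULL
     first_name does not make the whole primary_name NULL (CONCAT would).
     */
    primary_name VARCHAR(512) GENERATED ALWAYS AS (CONCAT_WS(' ', first_name, last_name)) STORED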
);

/*
    Note that nconst should also have an index, but this happens automatically because of the
    primary key declaration.
*/
create index name_basics_last_name_idx
    on name_basics (last_name);

/*
    As long as you have one index, we would accept the answer.
*/
create index name_basics_first_name_idx
    on name_basics (first_name);


create table if not exists title_basics
(
   tconst        varchar(16) primary key,


   /* This should not be a TEXT column and probably should not be NULL. */
   primary_title varchar(512) not null
);

/*
    This is totally cool and not expected, but I do want you to know that most databases
    have some form of text search and indexing. Sophisticated full-text search, however,
    is almost always done with a separate text search engine.
*/
create fulltext index title_basics_primary_title_idx on title_basics(primary_title);

/*
    This one was tricky. But I repeatedly stressed the concept of attributes on associative entities.
    I also EXPLICITLY said that I would do it this way in a lecture.
*/
create table if not exists title_principals
(
    nconst      varchar(16) not null,
    tconst      varchar(16) not null,

    /* This was the tricky bit. */
    isKnownFor  boolean not null default FALSE,

    /*
        This is also important. The primary key is composite.
    */
    primary key (nconst, tconst),

    foreign key to_name_basics (nconst) references name_basics(nconst),
    foreign key to_title_basics (tconst) references title_basics(tconst)
);

/*
    This is important because the primary key on (nconst, tconst) IS NOT also an index on just tconst.
    I stressed this several times in class and examples.
*/
create index title_principals_tconst_idx
    on title_principals (tconst);



 * mysql+pymysql://root:***@localhost
3 rows affected.
1 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.


[]
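The claim in the last comment, that the composite primary key on `(nconst, tconst)` does not help lookups on `tconst` alone, can be checked with a sketch that uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` instead of MySQL's `EXPLAIN` (an assumption; plan wording varies across engines and versions, so the sketch only looks for `SEARCH`, `SCAN`, and the index name). The sample key values are made up.

```python
import sqlite3


def plan(conn, sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


def compare_plans():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE title_principals (
            nconst     VARCHAR(16) NOT NULL,
            tconst     VARCHAR(16) NOT NULL,
            isKnownFor BOOLEAN NOT NULL DEFAULT 0,
            PRIMARY KEY (nconst, tconst)
        )""")
    by_nconst = "SELECT tconst FROM title_principals WHERE nconst = 'nm0000001'"
    by_tconst = "SELECT nconst FROM title_principals WHERE tconst = 'tt0000001'"

    # Only the composite primary-key index exists; nconst is its leading column.
    before = {"nconst": plan(conn, by_nconst), "tconst": plan(conn, by_tconst)}

    # The secondary index from the answer above, on tconst alone.
    conn.execute("CREATE INDEX title_principals_tconst_idx ON title_principals (tconst)")
    after = plan(conn, by_tconst)
    conn.close()
    return before, after


before, after = compare_plans()
print(before["nconst"])  # a SEARCH using the primary-key index
print(before["tconst"])  # a SCAN: no index has tconst as its leading column
print(after)             # a SEARCH using title_principals_tconst_idx
```

A `SEARCH` means the engine seeks into an index; a `SCAN` means it walks every row (or every index entry). Only the query on the leading key column, `nconst`, gets a `SEARCH` until `title_principals_tconst_idx` exists.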