# Case Study: Books Database on AWS RDS

## MySQL bookdb
### Author: Dr. Esma Yildirim

In this case study, we create a `mysql` database instance with the `RDS` service and a second instance, created with the `EC2` service, that connects to it.

Additional materials: 
- authors.csv
- titles.csv
- author_ISBN.csv

### Step1: Create two security groups.

<b>Instructions for Sandbox</b>
- Create two security groups under Work VPC.

- rds-security-group: will be assigned to the RDS instance. This security group must have an inbound rule allowing MySQL/Aurora connections on port 3306. Set the source of this rule to the security group of the EC2 instance (rds-ec2-security-group), so that only instances in that group can reach the database.

- rds-ec2-security-group: will be assigned to the EC2 instance. This security group must have two inbound rules. The first one must allow SSH connections on port 22 from MyIP only. The second rule must allow MySQL/Aurora connections on port 3306 with rds-security-group as the only source.

This way, we can connect to the EC2 instance from outside, while the RDS instance accepts connections from the EC2 instance only.
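For readers who prefer a script to the console, the rules above could be sketched with boto3 roughly as follows. This is an illustration under assumptions, not the procedure of this case study: the helper names `mysql_from_group`, `ssh_from_ip` and `create_groups` are made up for the example, the VPC id and IP address must be supplied by you, and the sandbox may not allow these API calls at all.

```python
def mysql_from_group(source_group_id):
    """Inbound rule: MySQL/Aurora (TCP 3306) from members of another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 3306,
        "ToPort": 3306,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


def ssh_from_ip(ip_address):
    """Inbound rule: SSH (TCP 22) from one IPv4 address (the console's MyIP option)."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": f"{ip_address}/32"}],
    }


def create_groups(vpc_id, my_ip):
    """Create both groups in the given VPC and attach the inbound rules of Step1."""
    import boto3  # imported here so the rule builders work without boto3 installed

    ec2 = boto3.client("ec2")
    rds_sg = ec2.create_security_group(
        GroupName="rds-security-group",
        Description="Attached to the RDS instance",
        VpcId=vpc_id,
    )["GroupId"]
    ec2_sg = ec2.create_security_group(
        GroupName="rds-ec2-security-group",
        Description="Attached to the EC2 instance",
        VpcId=vpc_id,
    )["GroupId"]
    # Both groups must exist before the rules are added, because each rule
    # names the other group as its source.
    ec2.authorize_security_group_ingress(
        GroupId=rds_sg, IpPermissions=[mysql_from_group(ec2_sg)])
    ec2.authorize_security_group_ingress(
        GroupId=ec2_sg,
        IpPermissions=[ssh_from_ip(my_ip), mysql_from_group(rds_sg)])
    return rds_sg, ec2_sg
```

A call would look like `create_groups("<your Work VPC id>", "<your public IP>")`; it needs AWS credentials with permission to manage security groups.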

### Step2: Create another subnet 

- Under Work VPC, create another subnet with CIDR 10.0.1.0/24 in availability zone us-east-1b.
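Before creating the subnet, it can help to check that the new CIDR block fits inside the VPC and does not collide with the subnet that already exists. The sketch below uses Python's standard `ipaddress` module. The VPC range `10.0.0.0/16` and the existing subnet `10.0.0.0/24` are assumptions, since the text does not state them; replace them with the values shown in your VPC console.

```python
import ipaddress

# Assumed values: the Work VPC range and its existing subnet are not given
# in the text, so these two CIDR blocks are placeholders.
vpc = ipaddress.ip_network("10.0.0.0/16")
existing_subnet = ipaddress.ip_network("10.0.0.0/24")

# The subnet created in this step.
new_subnet = ipaddress.ip_network("10.0.1.0/24")

inside_vpc = new_subnet.subnet_of(vpc)
overlaps = new_subnet.overlaps(existing_subnet)

print(f"inside VPC: {inside_vpc}, overlaps existing subnet: {overlaps}")
print(f"addresses in the new subnet: {new_subnet.num_addresses}")
```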

### Step3: Create RDS database. 

- Go to Subnet Groups under RDS. Create a subnet group with all the subnets of Work VPC. 
- Create a database instance with an instance identifier, a username and a password. RDS also creates an initial database inside the instance if you give it a database name.
- The VPC, subnet and availability zone must be the same as those of the EC2 instance.
- The database name: bookdb
- Assign rds-security-group as the security group of the instance. 

### Step4: Create EC2 instance.
- Use the default parameters to create an EC2 instance. 
- Assign rds-ec2-security-group as the security group of the instance. 

### Step5. Connect to the EC2 instance using SSH with your pem file. Then use the following command to connect to the RDS instance from your EC2 instance:

```
mysql -u admin -p -h mydbhost.cy86rwnhmv8p.us-east-1.rds.amazonaws.com
```
Here `admin` is the username; the password is entered at the prompt after the command is executed. The hostname (the endpoint of the instance) is available on the RDS console after the database is created.
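If the `mysql` command hangs or times out, the cause is often a security group rule rather than the credentials. A small TCP check, run on the EC2 instance, can separate the two cases. The function name `can_reach` is made up for this sketch; it only tests whether the port accepts connections and does not log in to MySQL.

```python
import socket


def can_reach(host, port=3306, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


# Example call, using the endpoint from the command above:
# can_reach("mydbhost.cy86rwnhmv8p.us-east-1.rds.amazonaws.com")
```

A result of `False` points at networking (security groups, subnet, wrong endpoint); a result of `True` followed by a failed login points at the username or password.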

### Step6. Create the database tables using the following SQL commands.

In this `bookdb`, we are going to store authors and book titles. We need to define the relationship model between them.

![BookDB](er.jpeg)

An author can have multiple books, while a book can have multiple authors. Therefore the relationship between the `authors` and `titles` tables is `n-to-n` (many-to-many). Such a relationship needs a third table that holds the primary keys of both tables as foreign keys. Together they form the primary key of the third table, in this case the `author_ISBN` table.
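The idea can be tried out locally before touching RDS. The following sketch uses an in-memory SQLite database from Python's standard library instead of MySQL, and the tables are cut down to a key plus one name column, so it illustrates the model rather than the schema of this case study. The rows are taken from the data used later in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, last TEXT NOT NULL);
    CREATE TABLE titles (isbn TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE author_ISBN (
        id INTEGER NOT NULL REFERENCES authors(id),
        isbn TEXT NOT NULL REFERENCES titles(isbn),
        PRIMARY KEY (id, isbn)
    );
    INSERT INTO authors VALUES (1, 'Deitel'), (4, 'Quirk');
    INSERT INTO titles VALUES ('0136151574', 'Visual C++ How to Program'),
                              ('0134448235', 'C++ How to Program');
    INSERT INTO author_ISBN VALUES (1, '0136151574'), (4, '0136151574'),
                                   (1, '0134448235');
""")

# One book with two authors, and one author with two books.
authors_of_book = conn.execute(
    "SELECT COUNT(*) FROM author_ISBN WHERE isbn = '0136151574'").fetchone()[0]
books_of_author = conn.execute(
    "SELECT COUNT(*) FROM author_ISBN WHERE id = 1").fetchone()[0]
print(authors_of_book, books_of_author)

# The composite primary key rejects a repeated (author, book) pair.
try:
    conn.execute("INSERT INTO author_ISBN VALUES (1, '0136151574')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```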

Use the following commands to select the `bookdb` database and create the authors table. The authors table has `id`, `first` and `last` as fields. `id` is the primary key and is generated with the `AUTO_INCREMENT` feature. All fields are `NOT NULL`, so none of them can be left empty. The `DROP TABLE` statements remove earlier copies of the tables, child table first, so that the commands can be rerun.
```
USE bookdb;
DROP TABLE IF EXISTS author_ISBN;
DROP TABLE IF EXISTS titles;
DROP TABLE IF EXISTS authors;

CREATE TABLE authors (
    id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
    first TEXT NOT NULL,
    last TEXT NOT NULL
);

```


#### Exercise: 

Create the titles table. This table has isbn, title, edition and copyright as fields. isbn is the primary key and has VARCHAR(255) as its data type. title is TEXT, edition is INTEGER and copyright is TEXT. Because isbn is the primary key, it cannot be TEXT: MySQL cannot index a TEXT column without an explicit key length, so VARCHAR must be used. All fields are NOT NULL.


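The following SQL code creates the author_ISBN table. Run it against the MySQL database after the titles table exists, because its foreign keys refer to both authors and titles.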

```
CREATE TABLE author_ISBN (
    id INTEGER NOT NULL,
    isbn VARCHAR(255) NOT NULL,
    PRIMARY KEY (id, isbn),
    FOREIGN KEY (id) REFERENCES authors(id) ON DELETE CASCADE, 
    FOREIGN KEY (isbn) REFERENCES titles (isbn) ON DELETE CASCADE
);

```
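The `ON DELETE CASCADE` clauses mean that deleting an author or a title also deletes the matching rows of `author_ISBN`. The sketch below shows that behavior with an in-memory SQLite database, chosen only because it runs without a server; the tables are reduced to the columns needed for the demonstration, and MySQL on RDS is expected to behave the same way for these statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, last TEXT NOT NULL);
    CREATE TABLE titles (isbn TEXT PRIMARY KEY);
    CREATE TABLE author_ISBN (
        id INTEGER NOT NULL,
        isbn TEXT NOT NULL,
        PRIMARY KEY (id, isbn),
        FOREIGN KEY (id) REFERENCES authors(id) ON DELETE CASCADE,
        FOREIGN KEY (isbn) REFERENCES titles(isbn) ON DELETE CASCADE
    );
    INSERT INTO authors VALUES (1, 'Deitel'), (4, 'Quirk');
    INSERT INTO titles VALUES ('0136151574');
    INSERT INTO author_ISBN VALUES (1, '0136151574'), (4, '0136151574');
""")

# Deleting author 4 also removes that author's row in author_ISBN.
conn.execute("DELETE FROM authors WHERE id = 4")
remaining = conn.execute("SELECT id, isbn FROM author_ISBN").fetchall()
print(remaining)

# A link to an author that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO author_ISBN VALUES (99, '0136151574')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```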

### Step6 (continued). There are two ways to upload data. The first way is to execute the following SQL commands:

```
INSERT INTO authors (first, last)
VALUES 
    ('Paul','Deitel'), 
    ('Harvey','Deitel'),
    ('Abbey','Deitel'), 
    ('Dan','Quirk'),
    ('Alexander', 'Wald');

INSERT INTO titles (isbn,title,edition,copyright)
VALUES
    ('0135404673','Intro to Python for CS and DS',1,'2020'),
    ('0132151006','Internet & WWW How to Program',5,'2012'),
    ('0134743350','Java How to Program',11,'2018'),
    ('0133976890','C How to Program',8,'2016'), 
    ('0133406954','Visual Basic 2012 How to Program',6,'2014'),
    ('0134601548','Visual C# How to Program',6,'2017'),
    ('0136151574','Visual C++ How to Program',2,'2008'),
    ('0134448235','C++ How to Program',10,'2017'),
    ('0134444302','Android How to Program',3,'2017'),
    ('0134289366','Android 6 for Programmers',3,'2016');

INSERT INTO author_ISBN (id,isbn)
VALUES
    (1,'0134289366'),
    (2,'0134289366'),
    (5,'0134289366'),
    (1,'0135404673'),
    (2,'0135404673'),
    (1,'0132151006'),
    (2,'0132151006'),
    (3,'0132151006'),
    (1,'0134743350'),
    (2,'0134743350'),
    (1,'0133976890'),
    (2,'0133976890'),
    (1,'0133406954'),
    (2,'0133406954'),
    (3,'0133406954'),
    (1,'0134601548'),
    (2,'0134601548'),
    (1,'0136151574'),
    (2,'0136151574'),
    (4,'0136151574'),
    (1,'0134448235'),
    (2,'0134448235'),
    (1,'0134444302'),
    (2,'0134444302');

```

#### Exercise: Pick three books of your own. Look up the necessary information on the internet and insert it into the tables.



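The second way is to upload the data from the .csv files. Use the following SQL commands, one per file, from the directory that contains the files. If MySQL rejects the commands, local file loading is probably disabled; starting the client with `mysql --local-infile=1` enables it on the client side.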

```
load data local infile 'authors.csv' into table authors fields terminated by ',' ignore 1 lines;
load data local infile 'titles.csv' into table titles fields terminated by ',' ignore 1 lines;
load data local infile 'author_ISBN.csv' into table author_ISBN fields terminated by ',' ignore 1 lines;
```
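The same upload can also be done from Python, which avoids the local file setting altogether. The sketch below is one possible approach, not the method used in this case study. It assumes that `authors.csv` has the columns `id`, `first`, `last` in that order with one header line, and that `connection` is an open pymysql connection to `bookdb`. The function names are made up for the example.

```python
import csv


def read_rows(path):
    """Return the data rows of a CSV file as tuples, skipping the header line."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # same effect as "ignore 1 lines"
        return [tuple(row) for row in reader]


def load_authors(connection, path="authors.csv"):
    """Insert every CSV row into authors; pymysql uses %s as the placeholder."""
    rows = read_rows(path)
    with connection.cursor() as cursor:
        cursor.executemany(
            "INSERT INTO authors (id, first, last) VALUES (%s, %s, %s)", rows)
    connection.commit()  # make the inserted rows permanent
    return len(rows)
```

The other two files would need their own `INSERT` statements with matching column lists, loaded in the order authors, titles, author_ISBN so that the foreign keys are satisfied.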

The .csv files were extracted from a SQLite database with the following commands (shown here for titles.csv):
```
>sqlite3 books.db
sqlite> .headers on
sqlite> .mode csv
sqlite> .output titles.csv
sqlite> SELECT isbn,
   ...>        title,
   ...>        edition,
   ...>        copyright
   ...>   FROM titles;
sqlite> .quit

```
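For comparison, the same export could be written in Python with the standard `sqlite3` and `csv` modules. This is a sketch, and `export_table` is a name invented for it; the header row comes from the column names of the query, which corresponds to `.headers on` in the session above.

```python
import csv
import sqlite3


def export_table(conn, query, path):
    """Write the result of a query to a CSV file with a header row; return the row count."""
    cursor = conn.execute(query)
    header = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Example call, assuming books.db is in the current directory:
# export_table(sqlite3.connect("books.db"),
#              "SELECT isbn, title, edition, copyright FROM titles",
#              "titles.csv")
```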


### Step7. Run the following Python code on the EC2 instance to execute some SQL queries. Make sure you install the pymysql and pandas modules first:

```
pip install pandas
pip install pymysql
```
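If a later `import` fails with `ModuleNotFoundError`, pip may have installed the packages for a different Python interpreter than the one running the code. A quick check from inside the interpreter you actually use could look like this; `missing_modules` is a helper name made up for the sketch.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(["pymysql", "pandas"]))
```

An empty list means both modules are available. Otherwise, running `python3 -m pip install pymysql pandas` with the same `python3` that runs the code is usually enough.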



```
# Connecting to the database in Python
import pymysql
import pandas as pd

connection = pymysql.connect(host='mydbhost.cy86rwnhmv8p.us-east-1.rds.amazonaws.com',
                             user='admin', password='', database='bookdb')  # fill in the password value

# Viewing the authors table content
pd.options.display.max_columns = 10

df = pd.read_sql('SELECT * FROM authors', connection,
                 index_col=['id'])
print(df)
```

#### Exercise: Select all fields from the titles and author_ISBN tables.


```
# SQL keywords

# SELECT queries: choose specific columns
df = pd.read_sql('SELECT first, last FROM authors', connection)
print()
print(df)
```

#### Exercise: Select title and copyright from the titles table and list them.


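```
# WHERE clause
# copyright is stored as TEXT, so this is a string comparison; it works
# here because every year has four digits.
df = pd.read_sql("""SELECT title, edition, copyright
                    FROM titles
                    WHERE copyright > '2016'""", connection)
print()
print(df)
```

#### Exercise: Select the author whose first name is 'Paul'.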

```
# Pattern matching: % matches zero or more characters
df = pd.read_sql("""SELECT id, first, last
                    FROM authors
                    WHERE last LIKE 'D%'""",
                 connection, index_col=['id'])
print()
print(df)

# Pattern matching: _ matches exactly one character
df = pd.read_sql("""SELECT id, first, last
                    FROM authors
                    WHERE first LIKE '_b%'""",
                 connection, index_col=['id'])
print()
print(df)
```

#### Exercise: Select the titles that start with "V" and end with "m".


```
# ORDER BY clause
df = pd.read_sql('SELECT title FROM titles ORDER BY title ASC',
                 connection)
print()
print(df)

# Sorting by multiple columns
df = pd.read_sql("""SELECT id, first, last
                    FROM authors
                    ORDER BY last, first""",
                 connection, index_col=['id'])
print()
print(df)

# Each column can have its own sort direction
df = pd.read_sql("""SELECT id, first, last
                    FROM authors
                    ORDER BY last DESC, first ASC""",
                 connection, index_col=['id'])
print()
print(df)
```

#### Exercise: Select title, edition and copyright, ordered alphabetically by title and then by descending edition.


```
# Combining the WHERE and ORDER BY clauses
df = pd.read_sql("""SELECT isbn, title, edition, copyright
                    FROM titles
                    WHERE title LIKE '%How to Program'
                    ORDER BY title""", connection)
print()
print(df)

# Merging data from multiple tables: INNER JOIN
# .head() keeps only the first five rows of the result
df = pd.read_sql("""SELECT first, last, isbn
                    FROM authors
                    INNER JOIN author_ISBN
                        ON authors.id = author_ISBN.id
                    ORDER BY last, first""", connection).head()
print()
print(df)
```

#### Exercise: Select the ids of the authors who have a book with copyright year 2017.


```
# INSERT INTO statement
cursor = connection.cursor()

cursor.execute("""INSERT INTO authors (first, last)
                  VALUES ('Sue', 'Red')""")

df = pd.read_sql('SELECT id, first, last FROM authors',
                 connection, index_col=['id'])
print()
print(df)

# Note regarding strings that contain single quotes: inside an SQL string
# literal, write the quote twice, e.g. 'O''Malley'.
```
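For strings that contain single quotes, a common alternative to escaping by hand is to pass the values separately from the SQL text and let the driver do the quoting. The sketch below demonstrates this with an in-memory SQLite database so that it runs without a server; the name O'Malley and the author Ann are invented for the example. With pymysql the call has the same shape, `cursor.execute(sql, values)`, but the placeholder is `%s` instead of `?`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "first TEXT NOT NULL, last TEXT NOT NULL)")

# Parameterized form: the driver handles the apostrophe in the value.
conn.execute("INSERT INTO authors (first, last) VALUES (?, ?)",
             ("Sue", "O'Malley"))

# Literal form: the quote inside the string has to be doubled.
conn.execute("INSERT INTO authors (first, last) VALUES ('Ann', 'O''Malley')")

rows = conn.execute("SELECT first, last FROM authors ORDER BY id").fetchall()
print(rows)
```

Passing values as parameters also protects against SQL injection when the values come from user input.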

```
# UPDATE statement
# The cursor created for the INSERT above is reused here.
cursor.execute("""UPDATE authors SET last='Black'
                  WHERE last='Red' AND first='Sue'""")

print()
print(cursor.rowcount)  # number of rows changed by the UPDATE

df = pd.read_sql('SELECT id, first, last FROM authors',
                 connection, index_col=['id'])
print()
print(df)
```

#### Exercise: Update the first name of the above author to SueAllen.

```
# DELETE FROM statement
# The same cursor is reused again.
cursor.execute("""DELETE FROM authors WHERE first='Sue'""")

print(cursor.rowcount)  # number of rows deleted

df = pd.read_sql('SELECT id, first, last FROM authors',
                 connection, index_col=['id'])
print()
print(df)
```

#### Exercise: Delete the authors whose last name is Deitel.

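```
# Closing the database connection
# pymysql does not autocommit by default: call connection.commit() before
# closing to keep the changes made above, otherwise they are rolled back.
connection.close()
```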
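Because pymysql does not autocommit by default, the INSERT, UPDATE and DELETE statements above are visible inside the session but are lost when the connection closes without `connection.commit()`. The effect can be reproduced locally with SQLite, whose Python driver also discards uncommitted changes on close. The sketch below is a stand-in for the MySQL session, not a copy of it, and uses a temporary database file.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE authors (first TEXT, last TEXT)")
conn.commit()

# Insert a row, then close without committing: the row is discarded.
conn.execute("INSERT INTO authors VALUES ('Sue', 'Red')")
conn.close()

conn = sqlite3.connect(path)
count_without_commit = conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]

# Insert again, commit this time, and reopen the database.
conn.execute("INSERT INTO authors VALUES ('Sue', 'Red')")
conn.commit()
conn.close()

conn = sqlite3.connect(path)
count_with_commit = conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]
conn.close()

print(count_without_commit, count_with_commit)
```

Leaving out the commit can be convenient while practicing, because the tables return to their loaded state after each run.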




*The books dataset is from the repository: https://github.com/pdeitel/IntroToPython
