# Introduction to Databases

In this chapter, we won't work hands-on with SQL yet. Instead, we talk about databases in general and take a first look at SQL.

To experiment with the data used for the examples, I downloaded and installed MySQL and loaded the Sakila sample database.

## Definition of Database

A database is a set of related information. In a database, we would like to
* find information quickly
* find information easily
* be able to update it easily


### Types of Database Systems

- Nonrelational Database Systems: An example of a nonrelational database is a hierarchical database, which is structured like a tree. A bank's data could be organised as name $\to$ accounts $\to$ transactions $\to$ etc. Each node has either zero or one parent and zero or more children. Another common approach is the network database system, which consists of nodes and sets of links that define relationships between the nodes. Because a node in such a system can have multiple parents, you can reach it from multiple places.

- The Relational Model: In the relational model, data is represented as sets of tables. Instead of using pointers to navigate, redundant data is used to link records in different tables. The structure is as follows:
    1. Primary Keys: Each table includes information that uniquely identifies a row in that table (*primary key*), for example a user ID. One could also use a combination of two or more columns as the primary key (*compound key*). Customer names chosen as primary keys would be referred to as *natural keys*, whereas customer IDs are *surrogate keys*. Which one to choose depends on the case, but it has to be a value that doesn't change in the future.
    2. Foreign Keys: The "redundant data" mentioned above is additional information in a table that is used to navigate to another table. For example, an account table could include the customer ID to navigate back to the customer, or a product abbreviation to navigate to the product that describes the account. These columns are called *foreign keys*. Redundant data that refers to other places should also be permanent, because you don't want to change a foreign key in every place it appears.
    
Note: A single column shouldn't contain multiple pieces of information. For example, an address should be stored as separate street, city, state, and zip code columns, and a name as separate first-name and last-name columns.
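
To make the key concepts above more concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module rather than MySQL so that it runs without a server, and the `customer` and `account` tables, their columns, and the sample row are assumptions made up for illustration. It shows a surrogate primary key, a foreign key that links an account back to its customer, and names split into separate columns.

```python
import sqlite3


def build_demo():
    """Create two linked tables in an in-memory SQLite database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (
            cust_id INTEGER PRIMARY KEY,   -- surrogate primary key
            fname   TEXT NOT NULL,         -- first and last name in separate columns
            lname   TEXT NOT NULL
        );
        CREATE TABLE account (
            account_id INTEGER PRIMARY KEY,
            cust_id    INTEGER NOT NULL REFERENCES customer (cust_id),  -- foreign key
            balance    REAL NOT NULL
        );
        INSERT INTO customer VALUES (1, 'George', 'Blake');
        INSERT INTO account  VALUES (103, 1, 75.00);
    """)
    return conn


def owner_of(conn, account_id):
    """Follow the foreign key from an account back to its customer."""
    return conn.execute(
        "SELECT c.fname, c.lname "
        "FROM account a INNER JOIN customer c ON c.cust_id = a.cust_id "
        "WHERE a.account_id = ?",
        (account_id,),
    ).fetchone()


def orphan_account_rejected(conn):
    """Return True if an account pointing at a nonexistent customer is refused."""
    try:
        conn.execute("INSERT INTO account VALUES (104, 999, 10.00)")
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    demo = build_demo()
    print(owner_of(demo, 103))
    print(orphan_account_rejected(demo))
```

The foreign key only stores the customer ID, so renaming a customer touches one row in `customer` and nothing in `account`.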


### Terminology

| Term        | Definition                       |
|-------------|----------------------------------|
| Entity      | Something of interest to the database user community. <br>Example: customers, geographic locations, etc.             |
| Column      | An individual piece of data stored in a table.             |
| Row         | A set of columns that together completely describe an entity or some action on an entity. <br> Also called a record.             |
| Table       | A set of rows, held either in memory (nonpersistent) <br> or on permanent storage (persistent).            |
| Result set  | Another name for a nonpersistent table, <br> generally the result of an SQL query.             |
| Primary key | One or more columns that can be used as a unique identifier for each row in a table.|
| Foreign key | One or more columns that can be used together to identify a single row in another table.             |

### SQL Statement Classes

* **SQL schema statements**: used to define the data structures stored in a database, for example
``` 
CREATE TABLE corporation
    (corp_id SMALLINT,
    name VARCHAR(30),
    CONSTRAINT pk_corporation PRIMARY KEY (corp_id)
    );
```
creates a table with two columns, `corp_id` and `name`, with the `corp_id` column identified as the primary key for the table. All database elements created via SQL schema statements are stored in *data dictionaries*.
* **SQL data statements**: used to manipulate the data structures previously defined using a SQL schema statement. For example
``` 
INSERT INTO corporation (corp_id, name)
VALUES (27, 'Acme Paper Corporation');
```
adds a row to the corporation table with a value of 27 for the `corp_id` column and a value of `Acme Paper Corporation` for the `name` column. Another example is the `select` statement to retrieve the data that was just created:
```
    mysql> SELECT name
        -> FROM corporation
        -> WHERE corp_id = 27;
    +----------------------------+
    | name                       |
    +----------------------------+
    | Acme Paper Corporation     |
    +----------------------------+
```
This repository (and the book it is based on) concentrates mostly on the SQL data statements.
* **SQL transaction statements**: used to begin, end, and roll back transactions


#### Summary:

| **Category**              | **Description**                                         | **Examples**                              |
|--------------------------|---------------------------------------------------------|--------------------------------------------|
| **Schema Statement**     | Defines/modifies database structure (DDL)               | `CREATE TABLE`, `ALTER TABLE`, `DROP`      |
| **Data Statement**       | Manipulates or queries data (DML)                       | `SELECT`, `INSERT`, `UPDATE`, `DELETE`     |
| **Transaction Statement**| Controls transaction boundaries (TCL)                   | `BEGIN`, `COMMIT`, `ROLLBACK`              |
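
As a rough illustration of the three statement classes, the following sketch runs one of each against an in-memory SQLite database through Python's `sqlite3` module. SQLite stands in for MySQL here, and the value `'Changed by mistake'` is a made-up example. Passing `isolation_level=None` is an assumption of this sketch: it puts the connection in autocommit mode so that the `BEGIN` and `ROLLBACK` statements are issued explicitly.

```python
import sqlite3


def run_statement_classes():
    """Run one schema, one data, and one pair of transaction statements."""
    # isolation_level=None: no implicit transactions, we issue BEGIN/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)

    # Schema statement: defines a data structure.
    conn.execute("""
        CREATE TABLE corporation
            (corp_id SMALLINT,
            name VARCHAR(30),
            CONSTRAINT pk_corporation PRIMARY KEY (corp_id)
            )
    """)

    # Data statement: adds a row to the structure defined above.
    conn.execute(
        "INSERT INTO corporation (corp_id, name) "
        "VALUES (27, 'Acme Paper Corporation')"
    )

    # Transaction statements: begin a transaction, change data, then undo it.
    conn.execute("BEGIN")
    conn.execute("UPDATE corporation SET name = 'Changed by mistake' WHERE corp_id = 27")
    conn.execute("ROLLBACK")

    # Data statement: the rollback restored the original name.
    name = conn.execute(
        "SELECT name FROM corporation WHERE corp_id = 27"
    ).fetchone()[0]
    conn.close()
    return name


if __name__ == "__main__":
    print(run_statement_classes())
```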

### SQL: A Nonprocedural Language

- In a **procedural language**, you are in complete control of what the program does. It defines both the desired results and the mechanism, or process, by which the results are generated. 
- In a **nonprocedural language**, the desired results are also defined, but the process by which the results are generated is left to an external agent. The manner in which a statement is executed is left to the *optimizer*. 
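
The contrast can be sketched in a few lines of Python. The first function is procedural: it spells out how to walk through the rows and pick the match. The second only states the desired result in SQL and leaves the "how" to the database engine (SQLite here, used through the built-in `sqlite3` module as a stand-in for MySQL). The second row in `CORPORATIONS` is a hypothetical example.

```python
import sqlite3

# Sample rows: (corp_id, name). The second row is made up for illustration.
CORPORATIONS = [
    (27, "Acme Paper Corporation"),
    (28, "Hypothetical Widgets Inc."),
]


def procedural_lookup(rows, corp_id):
    """Procedural: we control the mechanism (loop, compare, return)."""
    for row_id, name in rows:
        if row_id == corp_id:
            return name
    return None


def nonprocedural_lookup(rows, corp_id):
    """Nonprocedural: we describe the result; the engine decides how to find it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE corporation (corp_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO corporation (corp_id, name) VALUES (?, ?)", rows)
    row = conn.execute(
        "SELECT name FROM corporation WHERE corp_id = ?", (corp_id,)
    ).fetchone()
    conn.close()
    return row[0] if row else None


if __name__ == "__main__":
    print(procedural_lookup(CORPORATIONS, 27))
    print(nonprocedural_lookup(CORPORATIONS, 27))
```

Both functions return the same answer. The difference is that the SQL version never says whether to scan the table or use the primary key index.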

SQL is a nonprocedural language, so you cannot write complete applications with it alone. Instead, you integrate SQL with your favorite programming language. If you're using Python, you need a database driver that follows the Python DB-API (PEP 249) to execute SQL statements from your code.
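
A minimal sketch of what calling SQL from Python can look like is given below. It uses the standard-library `sqlite3` module, which follows the DB-API, as a stand-in for a MySQL driver; the helper names `demo_connection` and `find_corporation_name` are made up for this example. With a MySQL driver the overall pattern (connect, cursor, execute, fetch) should look similar, although the connection arguments differ and the parameter placeholder is typically `%s` instead of `?`.

```python
import sqlite3


def demo_connection():
    """Open an in-memory database and load one sample row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE corporation (corp_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "INSERT INTO corporation (corp_id, name) VALUES (?, ?)",
        (27, "Acme Paper Corporation"),
    )
    conn.commit()
    return conn


def find_corporation_name(conn, corp_id):
    """Run a parameterized SELECT and return the name, or None if no row matches."""
    cur = conn.cursor()
    # Passing corp_id as a parameter avoids building SQL by string concatenation.
    cur.execute("SELECT name FROM corporation WHERE corp_id = ?", (corp_id,))
    row = cur.fetchone()
    cur.close()
    return row[0] if row else None


if __name__ == "__main__":
    connection = demo_connection()
    print(find_corporation_name(connection, 27))
    connection.close()
```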

**Check Python setup**

The book uses the `mysql` command-line tool to run the examples and format the results.

### SQL Examples

An SQL statement that would return all transactions against George Blake's checking account would look like:

```
    SELECT t.txn_id, t.txn_type_cd, t.txn_date, t.amount
    FROM individual i
        INNER JOIN account a ON i.cust_id = a.cust_id
        INNER JOIN product p ON p.product_cd = a.product_cd
        INNER JOIN transaction t ON t.account_id = a.account_id
    WHERE i.fname = 'George' AND i.lname = 'Blake'
        AND p.name = 'checking account';
        
    +--------+--------------+---------------------+-------------+
    |txn_id  | txn_type_cd  | txn_date            | amount      |
    +--------+--------------+---------------------+-------------+
    | 11     | DBT          | 2008-01-05 00:00:00 | 100.00      |
    +--------+--------------+---------------------+-------------+
    1 row in set (0.00 sec)
```
The query identifies the row in the `individual` table for George Blake and the row in the `product` table for the "checking" product, finds the row in the `account` table for this individual/product combination, and returns four columns from the `transaction` table for all transactions posted to this account. 
Another way of doing it is
```
    SELECT t.txn_id, t.txn_type_cd, t.txn_date, t.amount
    FROM account a 
        INNER JOIN transaction t ON t.account_id = a.account_id
    WHERE a.cust_id = 8 AND a.product_cd = 'CHK';
```
if you know that George Blake's customer ID is 8, and that checking accounts are designated by the code `'CHK'`. In that case, you simply find George Blake's checking account in the `account` table based on the customer ID and use the account ID to find the appropriate transactions.
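
To check that the two queries above really describe the same result, the following sketch runs both against a tiny in-memory SQLite database. The table layouts, account IDs, and sample rows are assumptions made up for this example, not the book's actual data. The table name `transaction` is written in double quotes because it is a keyword in SQLite.

```python
import sqlite3

# Query 1: navigate from the customer's name and the product name.
BY_NAME = """
    SELECT t.txn_id, t.txn_type_cd, t.txn_date, t.amount
    FROM individual i
        INNER JOIN account a ON i.cust_id = a.cust_id
        INNER JOIN product p ON p.product_cd = a.product_cd
        INNER JOIN "transaction" t ON t.account_id = a.account_id
    WHERE i.fname = 'George' AND i.lname = 'Blake'
        AND p.name = 'checking account'
    ORDER BY t.txn_id
"""

# Query 2: use the known customer ID and product code directly.
BY_ID = """
    SELECT t.txn_id, t.txn_type_cd, t.txn_date, t.amount
    FROM account a
        INNER JOIN "transaction" t ON t.account_id = a.account_id
    WHERE a.cust_id = 8 AND a.product_cd = 'CHK'
    ORDER BY t.txn_id
"""


def build_bank():
    """Create a hypothetical miniature bank schema with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE individual (cust_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
        CREATE TABLE product (product_cd TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE account (account_id INTEGER PRIMARY KEY,
                              cust_id INTEGER, product_cd TEXT);
        CREATE TABLE "transaction" (txn_id INTEGER PRIMARY KEY, txn_type_cd TEXT,
                                    txn_date TEXT, amount REAL, account_id INTEGER);

        INSERT INTO individual VALUES (8, 'George', 'Blake'), (1, 'James', 'Hadley');
        INSERT INTO product VALUES ('CHK', 'checking account'), ('SAV', 'savings account');
        INSERT INTO account VALUES (17, 8, 'CHK'), (18, 8, 'SAV'), (1, 1, 'CHK');
        INSERT INTO "transaction" VALUES
            (11, 'DBT', '2008-01-05 00:00:00', 100.00, 17),
            (12, 'CDT', '2008-01-06 00:00:00', 50.00, 18),
            (1,  'DBT', '2008-01-05 00:00:00', 20.00, 1);
    """)
    return conn


def transactions_by_name(conn):
    return conn.execute(BY_NAME).fetchall()


def transactions_by_id(conn):
    return conn.execute(BY_ID).fetchall()


if __name__ == "__main__":
    bank = build_bank()
    print(transactions_by_name(bank))
    print(transactions_by_id(bank))
```

With this sample data, both queries pick out only the transaction on George Blake's checking account and skip his savings account and the other customer.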

The statements above share a general structure made up of three clauses:
```
    SELECT /* one or more things */ ...
    FROM /* one or more places */ ...
    WHERE /* one or more conditions apply */ ...
```
The general logic is:
1. Determine which table or tables will be needed -> add them to your `from` clause.
2. Add conditions to your `where` clause to filter out the data from these tables that you aren't interested in.
3. Decide which columns from the different tables need to be retrieved -> add them to your `select` clause. 

Example: Find all customers with the last name "Smith":
```
    SELECT cust_id, fname
    FROM individual
    WHERE lname = 'Smith';
```
This query searches the `individual` table for all rows whose `lname` column matches the string `'Smith'` and returns the `cust_id` and `fname` columns from those rows.
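
The three-step logic can be followed on a small example. This sketch builds the "Smith" query one clause at a time against an in-memory SQLite database (again via Python's `sqlite3`, standing in for MySQL). The four sample customers are made up for illustration.

```python
import sqlite3


def smith_query_steps():
    """Build the query clause by clause and return the rows seen at each step."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE individual (cust_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT)"
    )
    conn.executemany(
        "INSERT INTO individual VALUES (?, ?, ?)",
        [(1, "James", "Hadley"), (2, "Susan", "Smith"),
         (3, "Frank", "Tucker"), (4, "John", "Smith")],
    )

    # Step 1: pick the table (from clause); every row and column comes back.
    step1 = conn.execute("SELECT * FROM individual").fetchall()

    # Step 2: filter out the rows we don't want (where clause).
    step2 = conn.execute(
        "SELECT * FROM individual WHERE lname = 'Smith'"
    ).fetchall()

    # Step 3: keep only the columns we need (select clause).
    step3 = conn.execute(
        "SELECT cust_id, fname FROM individual WHERE lname = 'Smith' ORDER BY cust_id"
    ).fetchall()

    conn.close()
    return step1, step2, step3


if __name__ == "__main__":
    for rows in smith_query_steps():
        print(rows)
```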

If you would like to populate and modify your database, you would use statements like this
```
    INSERT INTO product (product_cd, name)
    VALUES ('CD', 'Certificate of Depysit');
```
to insert a new row into the `product` table, and
```
    UPDATE product
    SET name = 'Certificate of Deposit'
    WHERE product_cd = 'CD';
```
to correct the misspelled "Deposit". `where` clauses are very important, since you usually want to modify only certain rows. When executing an SQL data statement, you receive feedback about how many rows were
* Returned by your `select` statement
* Created by your `insert` statement
* Modified by your `update` statement
* Removed by your `delete` statement

The feedback is particularly important when using the `delete` statement!
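
When SQL is run from code rather than from the `mysql` tool, the same feedback is available programmatically. The sketch below assumes Python's `sqlite3` module as a stand-in for a MySQL driver: the cursor's `rowcount` attribute reports the rows affected by `insert`, `update`, and `delete`, while for a `select` the sketch simply counts the fetched rows. The product code `'XYZ'` is a made-up value that matches nothing.

```python
import sqlite3


def row_feedback():
    """Return how many rows each kind of data statement touched."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (product_cd TEXT PRIMARY KEY, name TEXT)")
    cur = conn.cursor()

    cur.execute(
        "INSERT INTO product (product_cd, name) "
        "VALUES ('CD', 'Certificate of Depysit')"
    )
    inserted = cur.rowcount

    cur.execute(
        "UPDATE product SET name = 'Certificate of Deposit' WHERE product_cd = 'CD'"
    )
    updated = cur.rowcount

    # For a select, count the rows that were actually returned.
    returned = len(cur.execute("SELECT product_cd, name FROM product").fetchall())

    # A where clause that matches nothing removes nothing, and the count shows it.
    cur.execute("DELETE FROM product WHERE product_cd = 'XYZ'")
    deleted = cur.rowcount

    conn.close()
    return {"inserted": inserted, "updated": updated,
            "returned": returned, "deleted": deleted}


if __name__ == "__main__":
    print(row_feedback())
```

Checking the count before committing a `delete` is a cheap way to notice a `where` clause that matched far more (or fewer) rows than intended.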

### About MySQL

MySQL is a free, open-source database server. It is used for all the examples that follow.