<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Session 3 - Managing Data

## Part 2 - Databases

- learn about relational databases
- learn about non-relational databases and when they're more appropriate
- practise getting data out of relational databases using SQL

## What is a database?

What can a database do?

- **store** data

- **retrieve** data

- describe the structure of our data

- manage user permissions

## Relational vs. non-relational

### Relational databases

Describe **entities** in our data,

which are organised into **tables**,

where **rows** are individual observations,

and properties of our entities are stored as **columns**

The description of a database is its **schema**.

Example:

Imagine we made a dataset of everyone in this room and wanted to store:

- name
- contact details
- favourite books

To store name and contact details, what would be the:

- rows?

- columns?
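
One possible answer, sketched with Python's built-in `sqlite3` module: each person is a row, and each property (name, email, phone) is a column. The table name, column names and values below are illustrative assumptions, not part of the original exercise.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# The schema: one table for the "person" entity, one column per property.
# Table and column names are assumptions made for this example.
conn.execute(
    """
    CREATE TABLE people (
        name  TEXT NOT NULL,
        email TEXT,
        phone TEXT
    )
    """
)

# Each row is one observation (one person). The values are made up.
conn.executemany(
    "INSERT INTO people (name, email, phone) VALUES (?, ?, ?)",
    [
        ("Sally", "sally@example.com", "555-0100"),
        ("Tom", "tom@example.com", None),
    ],
)

# Ask the database to describe its own structure, then fetch the rows.
columns = [info[1] for info in conn.execute("PRAGMA table_info(people)")]
rows = conn.execute(
    "SELECT name, email, phone FROM people ORDER BY rowid"
).fetchall()

print(columns)
print(rows)
```

Favourite books are left out on purpose: a person can have several, so they do not fit neatly into a single column.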

Examples of **R**elational **D**ata**b**ase **M**anagement **S**ystems (RDBMS):

- MySQL (free, open source)
- SQLite (free, open source)
- Microsoft SQL Server (proprietary)
- Oracle Database (proprietary)
- IBM Db2 (proprietary)

### Key-value stores

An alternative to having "entities" is to store key-value entries.

Similar to a massive Python dictionary.
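
To make the dictionary analogy concrete, here is a minimal sketch that uses a plain Python `dict` as a stand-in for a key-value store. The key naming scheme (`user_1:name`) is an assumption for illustration only; real stores such as Redis have their own client APIs.

```python
# A plain dict standing in for a key-value store (keys and values are made up).
store = {}

# "Write": associate a value with a key.
store["user_1:name"] = "Sally"
store["user_1:favourite_book"] = "Moby Dick"
store["user_2:name"] = "Tom"
store["user_2:favourite_book"] = "Alice in Wonderland"

# "Read": look a value up directly by its key.
name = store["user_1:name"]

# There is no schema, so a question like "who likes Moby Dick?" means
# scanning every entry and interpreting the keys in application code.
fans = [
    key.split(":")[0]
    for key, value in store.items()
    if key.endswith(":favourite_book") and value == "Moby Dick"
]

print(name, fans)
```

Looking up a single key is direct, while anything that depends on structure has to be handled by the code around the store.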

**Advantages**

- super fast to read entries
- massively scalable
- used for caching things that are accessed frequently

**Disadvantages**

- no structure (must come from business logic)
- slow to write to

Examples:

- Oracle NoSQL Database
- Redis
- Memcached
- "Project Voldemort"

### Document stores (NoSQL)

- entities stored as "documents" (typically in JSON)
- relationships are described **within** entities

```json
{
    "user_1":
    {
        "name": "Sally",
        "favourite_books": ["Moby Dick", "Alice in Wonderland"]
    }
}
```
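
As a rough sketch, the document above can be read in Python with the standard `json` module. The variable names are assumptions; a real document database would normally be accessed through its own driver rather than a string.

```python
import json

# The same document as above, held in a string for this example.
document = """
{
    "user_1":
    {
        "name": "Sally",
        "favourite_books": ["Moby Dick", "Alice in Wonderland"]
    }
}
"""

users = json.loads(document)
sally = users["user_1"]

# The person-to-books relationship lives inside the document itself,
# so no second table or join is needed to list Sally's books.
favourite_books = sally["favourite_books"]

print(sally["name"], favourite_books)
```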

**Advantages**:

- fast to read
- self-contained information
    - no need to join multiple entity lists together

**Disadvantages**:

- each document/entity could have a different schema!
- less mature than relational databases

Examples:

- MongoDB
- CouchDB

Relational databases also describe **relationships** between entities (e.g. people and books)

## Relationships

### One-to-one

- each record is related to exactly one other record

- each person has exactly **one** favourite book

- **and** two people can't have the same favourite book

- this is often enforced by business rules, **not** the database

### One-to-many

- on one side, each record is linked to exactly one record on the other side

- each person has exactly **one** favourite book

- **but** favourite books can be shared (think "**one** book to **many** people")

### Many-to-many

- there can be multiple links between entities

- people can have many favourite books

- favourite books can be shared

What if we had a dataset of people and we wanted to store who their mother was?

- if we have a **person** table
- and a **mothers** table
- what is the relationship between these tables?

**One-to-many** (one mother to many possible children)

Relationships are governed by columns that are designated as **keys**

### Primary keys

- these are columns that **uniquely identify** a record

- usually called something like "ID" and **numeric** (but they don't have to be)

- can be made up of multiple columns

### Foreign keys

- a column that identifies a record **in another table**

- a foreign key usually refers to the other table's **primary key**
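
A minimal sketch of both kinds of key, using the **person** and **mothers** tables from the earlier question and Python's built-in `sqlite3` module. Column names and values are assumptions for illustration. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been run on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign key constraints unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# mothers.id is the primary key: it uniquely identifies each mother.
conn.execute("CREATE TABLE mothers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# person.mother_id is a foreign key: it identifies a record in mothers.
conn.execute(
    """
    CREATE TABLE person (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        mother_id INTEGER REFERENCES mothers (id)
    )
    """
)

conn.execute("INSERT INTO mothers (id, name) VALUES (1, 'Ann')")
conn.execute("INSERT INTO person (name, mother_id) VALUES ('Sally', 1)")
conn.execute("INSERT INTO person (name, mother_id) VALUES ('Tom', 1)")

# A foreign key that points at a non-existent mother is rejected.
try:
    conn.execute("INSERT INTO person (name, mother_id) VALUES ('Eve', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

children = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM person WHERE mother_id = 1 ORDER BY id"
    )
]

print(rejected, children)
```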

![](assets/db/example_tables.png)

## One-to-many

![](assets/db/example_tables_onetomany.png)
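
A hedged sketch of a one-to-many layout in SQLite: each person stores a single `favourite_book_id`, and several people may point at the same book. The table contents are made up and are not taken from the diagram above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Each person has exactly one favourite book (the "many" side).
    CREATE TABLE people (
        id                INTEGER PRIMARY KEY,
        name              TEXT NOT NULL,
        favourite_book_id INTEGER REFERENCES books (id)
    );

    INSERT INTO books (id, title) VALUES
        (1, 'Moby Dick'), (2, 'Alice in Wonderland');
    INSERT INTO people (name, favourite_book_id) VALUES
        ('Sally', 1), ('Tom', 1), ('Priya', 2);
    """
)

# Join the two tables on the key columns and count people per book.
fans_per_book = conn.execute(
    """
    SELECT books.title, COUNT(people.id)
    FROM books
    JOIN people ON people.favourite_book_id = books.id
    GROUP BY books.id
    ORDER BY books.id
    """
).fetchall()

print(fans_per_book)
```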

## Many-to-many

![](assets/db/example_tables_manytomany.png)
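
Many-to-many relationships are commonly modelled with a third "link" table holding one row per person-book pair. The sketch below assumes a link table called `favourite_books` with a primary key made up of two columns; the names and data are illustrative and not taken from the diagram above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Link table: one row per (person, book) pair.
    -- Its primary key is made up of both columns.
    CREATE TABLE favourite_books (
        person_id INTEGER REFERENCES people (id),
        book_id   INTEGER REFERENCES books (id),
        PRIMARY KEY (person_id, book_id)
    );

    INSERT INTO people (id, name) VALUES (1, 'Sally'), (2, 'Tom');
    INSERT INTO books (id, title) VALUES
        (1, 'Moby Dick'), (2, 'Alice in Wonderland');
    INSERT INTO favourite_books (person_id, book_id) VALUES
        (1, 1), (1, 2), (2, 1);
    """
)

# Two joins walk from the link table out to each entity table.
pairs = conn.execute(
    """
    SELECT people.name, books.title
    FROM favourite_books
    JOIN people ON people.id = favourite_books.person_id
    JOIN books ON books.id = favourite_books.book_id
    ORDER BY people.id, books.id
    """
).fetchall()

print(pairs)
```

Here Sally has two favourite books, and "Moby Dick" is shared by two people, which neither of the simpler layouts can express.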

# SQL for Data Science

Remember the Data Science workflow:

![](../01_welcome_to_data_science/assets/Data-Framework-White-BG.png)

Relational databases are often where business data is kept

SQL helps us in the **Preparation** stage to actually **acquire** data
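
As a small illustration of acquiring data with SQL from Python, the sketch below builds a hypothetical `orders` table in memory and pulls an aggregated result into a list of dictionaries. The table, its columns and its values are all assumptions; in practice the connection would point at an existing database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sqlite3.Row lets us access columns by name as well as by position.
conn.row_factory = sqlite3.Row

# Hypothetical business data, created here so the example is self-contained.
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('Sally', 12.5), ('Tom', 40.0), ('Sally', 7.5), ('Tom', 2.0);
    """
)

# The "?" placeholder keeps the threshold out of the SQL string itself.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > ?
    GROUP BY customer
    ORDER BY customer
"""
records = [dict(row) for row in conn.execute(query, (5,))]

print(records)
```

The database does the filtering and aggregation, so only the prepared result has to be loaded into Python.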

# Additional content

### Learn SQL

[W3Schools](https://www.w3schools.com/sq)

### SQLite clients

To browse SQLite files (and test SQL queries):

- https://plot.ly/free-sql-client-download/
- https://github.com/sqlitebrowser/sqlitebrowser

### SQLite and Python

http://sebastianraschka.com/Articles/2014_sqlite_in_python_tutorial.html