# Intro to Relational Databases

### Introduction

When we say relational databases, we mean database systems like PostgreSQL, MySQL, or SQLite, which we work with using a language called SQL.  Relational databases are used to store and retrieve data, and to do so quickly.  So far, we have simply used CSV files to store our data, or perhaps in a past life we gained some experience with Microsoft Excel.  These are good tools, but as the size of our data increases and the questions we have about the data become more complex, we need to move to a relational database.

### SQL vs Excel

Now one way to learn about SQL is to compare it to some software that we have already used in our non-coding life: namely a spreadsheet like Microsoft Excel or a Google spreadsheet.  

> If you're not familiar with spreadsheets, that's ok -- we'll explore the concepts that we'll need to know.

A spreadsheet is good for organizing, storing, and asking questions about our data.  Let's get started by using a Google spreadsheet to organize some information.  Imagine that we run a barber shop, and we want to use a Google spreadsheet to help us keep track of our customers and employees.  To do so, we created the following spreadsheet.

<img src="./barbershop.png" width='80%'/>

<img src="table-names.png" width="40%">

At the very bottom of the Google spreadsheet, you can see that the first sheet in the file is for storing information about Employees and the second stores information about Customers.

Now a lot of the components that we see in the Google spreadsheet we'll also see in SQL.

* Table
    * The `Employees` spreadsheet is similar to a table in a database.  A table stores information about just a single entity.  So for example, we have separate tables for `Customers` and `Employees`.  We'll discuss how to know when to separate data into multiple tables in future lessons.
    
* Columns
    * The table above has columns of `Name`, `Phone Number`, and `Email`.  In a database, each table will also have columns used to store different attributes about our data.
    
* Rows
    * We see each individual `employee` is stored in a separate row.  It will be the same in SQL.  For each individual *member* of a table, we will have a separate row, and each attribute of that row is in a column. 
    
* Document Name
    * Finally, notice that our Google document has a name of `Barbershop` at the top.  This document holds separate spreadsheets about employees and customers.  Similarly, we will create a SQL database named `Barbershop` that will hold our tables of `employees` and `customers`.

### Get started with SQL

There are various relational database systems that we can use, such as Postgres, MySQL, or SQLite.  They all work similarly, so we'll get started with SQLite3, as it's lightweight and easy to set up.

If we have a Mac, we can install SQLite3 with the following:

`brew install sqlite3`

So now that we have installed the SQLite3 software, the next step is to create a database.

![](./create-database.png)

`sqlite3 barbershop.db < create_employees.sql`

Now a database only makes sense if we have at least one table.  Let's create a table.

```sql
CREATE TABLE employees (name TEXT, phone_number TEXT, email TEXT);
```

Creating a table involves a few different components:

* CREATE TABLE table_name
    * Let's look at the first part of the statement "CREATE TABLE employees". "CREATE TABLE" is the SQL command used to create a new table in the database. The term that follows will be the name of the table, in this case "employees".

* Each column name followed by the data type
    * The second part of the statement, the terms enclosed in the parentheses, concerns the columns we want to have in our new table. Included is both the name of the column and the type of data (explained below) we want to be stored in that column. In our example, the first column is called "name" and in that column we want to store data of the type "TEXT". We include in the parentheses all of the columns we want included in the table, separated by commas.
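
To see both parts of the statement at work, here is a minimal sketch that runs it from Python's built-in `sqlite3` module rather than the SQLite3 command line. Using Python and an in-memory database is an assumption made for illustration; the SQL itself is the statement from above.

```python
import sqlite3

# An in-memory database keeps this sketch from touching any files on disk.
conn = sqlite3.connect(":memory:")

# The same statement as above: the table name first, then each column with its type.
conn.execute("CREATE TABLE employees (name TEXT, phone_number TEXT, email TEXT)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk). We keep only name and type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employees)")]
print(columns)
# [('name', 'TEXT'), ('phone_number', 'TEXT'), ('email', 'TEXT')]

conn.close()
```

Each pair in the output corresponds to one `column_name TYPE` entry inside the parentheses.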

#### Data Types
Two of the most common data types are `INTEGER` and `TEXT`. You can view a list of other common data types [here](https://www.w3schools.com/sql/sql_datatypes.asp). When determining what data type to use for a column, we need to think not just about what the data represents (e.g. a number), but also about how we may want to manipulate and organize that data. Our name column really only has one data type that would work, `TEXT`. But it isn't always this simple. For example, our column phone_number is being stored as `TEXT`, but it could also be stored as an integer. Storing data as an integer is helpful if we want to compare values by size, or if we want to perform mathematical calculations on the data (e.g. the mean). Since we don't use phone numbers in that way (we probably aren't interested in the average phone number or the sum of phone numbers), in this case it makes more sense to store them as text.
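
A small sketch can make the trade-off concrete. It uses Python's built-in `sqlite3` module, and the `phone_demo` table and the phone number are made up for illustration. It stores the same number once as text and once as an integer, and shows one more reason to prefer text here: an integer has no way to remember a leading zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for illustration: the same phone number stored both ways.
conn.execute("CREATE TABLE phone_demo (as_text TEXT, as_integer INTEGER)")

phone = "0205550123"  # made-up number with a leading zero
conn.execute("INSERT INTO phone_demo VALUES (?, ?)", (phone, int(phone)))

as_text, as_integer = conn.execute(
    "SELECT as_text, as_integer FROM phone_demo"
).fetchone()
print(as_text)     # 0205550123
print(as_integer)  # 205550123

conn.close()
```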

#### Primary Keys
What happens when two rows in a table have exactly the same values in every column? How do we differentiate between the duplicates? One answer is to give every row its own identifier, as the `id` column does in the statement below.


```sql
CREATE TABLE employees (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, phone_number TEXT, email TEXT);
```


You'll notice the above CREATE TABLE statement includes one new column at the beginning:


`id INTEGER PRIMARY KEY AUTOINCREMENT, `

A primary key is a field that uniquely identifies each row (or record) in a database table, so primary keys must contain unique values. When we include `id INTEGER PRIMARY KEY AUTOINCREMENT`, we are including a column that will act as the primary key for each row in the table. AUTOINCREMENT means that SQLite will automatically assign each new row the next available id, and will never reuse an id that belonged to a deleted row. This will become clearer later on, when we begin to add data to our tables.
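
As a preview, here is a minimal sketch using Python's built-in `sqlite3` module. The employee details are made up, and the `INSERT` statement is something we only cover properly in the next lesson. The same employee is inserted twice without ever supplying an `id`, and SQLite still gives each row its own key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, phone_number TEXT, email TEXT)"
)

# A made-up employee, inserted twice. We never provide an id ourselves.
employee = ("Sam", "555-0100", "sam@example.com")
for _ in range(2):
    conn.execute(
        "INSERT INTO employees (name, phone_number, email) VALUES (?, ?, ?)",
        employee,
    )

rows = conn.execute("SELECT id, name FROM employees ORDER BY id").fetchall()
print(rows)  # [(1, 'Sam'), (2, 'Sam')]

conn.close()
```

The two rows are identical in every other column, but the `id` values 1 and 2 let us tell them apart.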


### Creating The Employees Table

`sqlite3 barbershop.db < create_employees.sql`

When we run the above on the command line, we are telling SQLite3 to run the SQL statements in the create_employees.sql file against our barbershop database. In this case, that means creating our employees table:
```sql
CREATE TABLE employees (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, phone_number TEXT, email TEXT);
```
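
The same step can be sketched from Python with the built-in `sqlite3` module, assuming we prefer a script to the command line. To stay self-contained, the sketch writes its own copy of `create_employees.sql` into a temporary directory (an assumption for illustration; in the lesson the file already exists next to the database).

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sql_path = Path(tmp) / "create_employees.sql"
    db_path = Path(tmp) / "barbershop.db"

    # Stand-in for the lesson's create_employees.sql file.
    sql_path.write_text(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "name TEXT, phone_number TEXT, email TEXT);"
    )

    # Connecting creates barbershop.db if it does not exist yet.
    conn = sqlite3.connect(db_path)
    # executescript runs every statement in the file, much like the "<" redirect.
    conn.executescript(sql_path.read_text())
    conn.close()

    db_created = db_path.exists()

print(db_created)  # True
```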

Our table is now stored in the barbershop.db file. We can check on this by running the following in the command line:

`sqlite3 barbershop.db`

`.tables`

If successful, you should receive a response with the table names.

Running `.schema` in place of `.tables` should return the schema of the tables in the database.
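
`.tables` and `.schema` are commands of the SQLite3 command-line shell, not SQL. If we are working from code instead, a rough equivalent is to query SQLite's built-in `sqlite_master` catalog. The sketch below assumes Python's `sqlite3` module and an in-memory database so that it runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, phone_number TEXT, email TEXT)"
)

# sqlite_master has one row per table, holding its name and its CREATE statement.
# Names starting with "sqlite_" are internal bookkeeping tables, so we skip them.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
).fetchall()

for name, sql in catalog:
    print(name)  # roughly what .tables shows
    print(sql)   # roughly what .schema shows

conn.close()
```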

As our database can store multiple tables, let's add a table for customers as well.
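
Here is a minimal sketch of what that could look like, again using Python's built-in `sqlite3` module. The lesson does not spell out the columns of the customers table, so the ones below simply mirror `employees` and should be read as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, phone_number TEXT, email TEXT)"
)

# Assumed columns: the real customers table may track different attributes.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, phone_number TEXT, email TEXT)"
)

# List our own tables, skipping SQLite's internal "sqlite_" tables.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
]
print(tables)  # ['customers', 'employees']

conn.close()
```

Both tables now live in the same database, just as both sheets live in the same Google spreadsheet.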

### Conclusion

At the beginning of the lesson we talked about some of the key concepts of SQL databases. Databases are made up of **tables**, each of which stores information about a single entity. These tables are made up of **columns** that store different attributes about our data. **Rows** in the table represent individual members of that entity: each member gets its own row, and each attribute of that member is stored in a column.
In our example, the barbershop database has a table called `employees`. The columns of our `employees` table are attributes of the employees, like name and phone number. 
Each row of our `employees` table will represent one member of this entity. In our next lesson, we will learn how to insert the data, or rows, into the table.