# How to create and modify databases

We'll be working with the SQLite shell, a command-line interface for interacting with SQLite databases.
To launch the SQLite shell, run the `sqlite3` command followed by the name of the database file as an argument:

`sqlite3 chinook.db`

When you launch the SQLite shell on our `chinook.db` file, you will be shown the SQLite prompt, `sqlite>`.

If you write a query, the response is displayed in your console. If you press enter before the query is finished, the SQLite prompt changes to `...>` and you can continue writing your query on multiple lines. As a result, unlike in the places you've written SQL queries so far, ending each query with a semicolon (`;`) is necessary in the SQLite shell. Without it, the shell can't tell whether you have finished writing your query. Let's write a query to look at one of our tables from the Chinook database.


```
%%capture
%load_ext sql
%sql sqlite:///chinook.db
```

```
%%sql
SELECT track_id, name, album_id
FROM track
WHERE album_id = 3;
```

| track_id | name | album_id |
| --- | --- | --- |
| 3 | Fast As a Shark | 3 |
| 4 | Restless and Wild | 3 |
| 5 | Princess of the Dawn | 3 |


In the SQLite shell, the first thing you may notice is that query output doesn't include the column names by default. SQLite has a number of dot commands that help you work with databases. When you use a dot command, you don't need a semicolon. One that you'll want to use often is `.headers on`, which switches column headers on. Let's see what the output of our query looks like after we turn column headers on:

```
sqlite> .headers on
sqlite> SELECT
   ...>   track_id,
   ...>   name,
   ...>   album_id
   ...> FROM track
   ...> WHERE album_id = 3;
track_id|name|album_id
3|Fast As a Shark|3
4|Restless and Wild|3
5|Princess of the Dawn|3
```

The next thing to notice is that it's hard to read down the columns, since they don't line up. Another dot command, `.mode`, helps here by letting us select from a few different display modes. We'll use `.mode column` to get output that is easier to read. Here's the output of our query after switching to column mode:

```
sqlite> .mode column
sqlite> SELECT
   ...>   track_id,
   ...>   name,
   ...>   album_id
   ...> FROM track
   ...> WHERE album_id = 3;
track_id  name                  album_id
--------  --------------------  --------
3         Fast As a Shark       3
4         Restless and Wild     3
5         Princess of the Dawn  3
```

There are several other dot commands you'll use often:

- `.help` - Displays help text showing all dot commands and their function.
- `.tables` - Displays a list of all tables and views in the current database.
- `.shell [command]` - Runs a command like `ls` or `clear` in the system shell.
- `.quit` - Quits the SQLite shell.

```
$ sqlite3 chinook.db
sqlite> .tables
sqlite> .headers on
sqlite> .mode column
sqlite> SELECT
   ...>   track_id,
   ...>   name,
   ...>   album_id
   ...> FROM track
   ...> WHERE album_id = 3;
```
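
The dot commands above belong to the SQLite shell only. As a rough sketch of what they do, the Python snippet below approximates `.tables`, `.headers on` and `.mode column` using the standard-library `sqlite3` module. It assumes a tiny in-memory stand-in for the `track` table rather than the real `chinook.db` file, and the helper name `format_row` is made up for this illustration.

```python
import sqlite3

# Stand-in for chinook.db: an in-memory table holding the three rows shown above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT, album_id INTEGER)"
)
conn.executemany(
    "INSERT INTO track VALUES (?, ?, ?)",
    [(3, "Fast As a Shark", 3), (4, "Restless and Wild", 3), (5, "Princess of the Dawn", 3)],
)

# Rough equivalent of .tables: table and view names are listed in sqlite_master.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"
    )
]

# Rough equivalent of .headers on: column names come from cursor.description.
cur = conn.execute("SELECT track_id, name, album_id FROM track WHERE album_id = 3")
headers = [col[0] for col in cur.description]
rows = cur.fetchall()

# Rough equivalent of .mode column: pad every value to the widest entry in its column.
widths = [
    max(len(str(value)) for value in [header] + [row[i] for row in rows])
    for i, header in enumerate(headers)
]


def format_row(values):
    """Left-align each value in its column (hypothetical helper)."""
    return "  ".join(str(value).ljust(width) for value, width in zip(values, widths))


print(tables)
print(format_row(headers))
for row in rows:
    print(format_row(row))

conn.close()
```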

## Creating Tables

To create a new table, we use the `CREATE TABLE` statement:

```
CREATE TABLE [table_name] (
    [column1_name] [column1_type],
    [column2_name] [column2_type],
    [column3_name] [column3_type],
    [...]
);
```

Each column should be given a type. While some database systems have as many as 50 distinct data types, SQLite maps every declared type to one of only 5 behind the scenes:

- `TEXT`
- `INTEGER`
- `REAL`
- `NUMERIC`
- `BLOB`

If you have any experience with other database systems, you might be familiar with other types such as `VARCHAR`, `FLOAT`, and `DATETIME`. SQLite accepts most common types in a `CREATE TABLE` statement, but behind the scenes will convert them to one of its 5 base types.
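
To see this conversion in action, here is a small sketch using Python's standard-library `sqlite3` module. The `demo` table and its values are hypothetical. The SQL function `typeof()` reports how each value ended up being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table declared with type names borrowed from other systems.
conn.execute("CREATE TABLE demo (a VARCHAR(20), b FLOAT, c DATETIME, d BOOLEAN)")

# The integer 123 goes into a text-like column, and the string "1.5" into a float column.
conn.execute("INSERT INTO demo VALUES (?, ?, ?, ?)", (123, "1.5", "2020-01-01", 1))

# typeof() shows the storage used for each value after SQLite applied its conversion.
stored = conn.execute(
    "SELECT typeof(a), typeof(b), typeof(c), typeof(d) FROM demo"
).fetchone()
print(stored)

conn.close()
```

In this sketch `VARCHAR(20)` behaves like `TEXT`, `FLOAT` like `REAL`, and `DATETIME` and `BOOLEAN` like `NUMERIC`, which keeps a date string as text and stores `1` as an integer.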

The table below shows each of the types, along with examples of data commonly stored in the type, and some 'equivalent' types from other database systems. If you're not familiar with these other types, don't be concerned - we'll cover them in some more detail in a later course.

[Untitled](https://www.notion.so/0d3b4332e06a493a893b53015e285174)

Just like with views, if you try to create a table that already exists you will get an error. If you make a mistake when you create a table, you can use the `DROP` statement to remove the table so you can create it again:

`DROP TABLE [table_name];`

You can also use the SQLite dot command `.schema [table_name]` to view the schema for a table you have just created to check where you might have gone wrong.

To practice, we'll create a new table in a new database file. If you launch the SQLite shell with the argument of a filename that doesn't exist, SQLite will create an empty database with that filename.
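
The same create, inspect and drop cycle can be sketched from Python with the standard-library `sqlite3` module. The filename `new_database.db`, the temporary directory and the columns of the `user` table are assumptions made for this illustration. Reading the `sql` column of `sqlite_master` is a rough stand-in for the shell's `.schema` command.

```python
import os
import sqlite3
import tempfile

# Hypothetical filename in a fresh temporary directory, so the file cannot already exist.
path = os.path.join(tempfile.mkdtemp(), "new_database.db")

# Connecting to a filename that doesn't exist yet gives us a new, empty database.
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE user (user_id INTEGER, first_name TEXT, last_name TEXT)")

# Creating a table that already exists raises an error.
try:
    conn.execute("CREATE TABLE user (user_id INTEGER)")
    duplicate_error = None
except sqlite3.OperationalError as exc:
    duplicate_error = str(exc)

# Rough equivalent of `.schema user`: the original CREATE statement is kept in sqlite_master.
schema = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'user'").fetchone()[0]

# DROP TABLE removes the table so it can be created again.
conn.execute("DROP TABLE user")
remaining = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]

print(duplicate_error)
print(schema)
print(remaining)
conn.close()
```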


## Primary and Foreign Keys

We have been using schema diagrams to identify the relationships between tables. Below is an excerpt of the schema diagram for the Chinook database which shows the relationship between the `invoice` and `invoice_line` tables:

![https://s3.amazonaws.com/dq-content/192/chinook_pk_fk.svg](https://s3.amazonaws.com/dq-content/192/chinook_pk_fk.svg)

We previously learned that each table has one or more columns shaded in yellow, which indicates they are the **primary key**. A primary key is a unique identifier for each row - you cannot have two rows in a table with the same value for the primary key column(s).

When two tables are related, one table has a column that refers to the primary key of the other. For example, the `invoice_id` column in the `invoice_line` table holds values of the primary key of the `invoice` table. This is known as a **foreign key**. By defining a foreign key, our database engine will prevent us from adding rows where the foreign key value doesn't exist in the other table, which helps to prevent errors in our data (note that by default SQLite doesn't enforce foreign key constraints; however, we have [changed the default](https://stackoverflow.com/a/44857286/4691920) for this mission).

Usually, a primary key is specified as part of a `CREATE TABLE` statement. Once the primary key is defined, the database engine will prevent any new rows from being added to the table if they have the same primary key as an existing row. If we wanted to re-create the table from the previous exercise with a primary key, we would use this syntax:

`CREATE TABLE user ( user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);`

Let's say we wanted to create a new table `purchase` which tracks basic information about a purchase made by one of our users. Our create statement might look like this:

`CREATE TABLE purchase ( purchase_id INTEGER PRIMARY KEY, user_id INTEGER, purchase_date TEXT, total NUMERIC, FOREIGN KEY (user_id) REFERENCES user(user_id));`

By adding a `FOREIGN KEY` clause, we can define one of our columns as a foreign key and specify the table and column that it references. We're going to use what we've learned about creating tables, primary keys and foreign keys to add new tables to our Chinook database that allow customers to create "wishlists" of tracks they would like to buy.
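
As a minimal sketch of how these two constraints behave, the Python snippet below (standard-library `sqlite3`) creates the `user` and `purchase` tables in an in-memory database and tries three inserts. The sample rows and the helper `try_insert` are hypothetical. Because SQLite does not enforce foreign keys by default, the sketch switches enforcement on with `PRAGMA foreign_keys = ON` first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Foreign key enforcement is off by default in SQLite, so turn it on for this connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE user (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute(
    """
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        user_id INTEGER,
        purchase_date TEXT,
        total NUMERIC,
        FOREIGN KEY (user_id) REFERENCES user(user_id)
    )
    """
)
conn.execute("INSERT INTO user VALUES (1, 'Ada', 'Lovelace')")  # made-up sample row


def try_insert(sql):
    """Run one INSERT and report 'ok' or the constraint error (hypothetical helper)."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = {
    # user_id 1 exists, so this row is accepted.
    "valid purchase": try_insert("INSERT INTO purchase VALUES (1, 1, '2024-01-01', 9.99)"),
    # user_id 1 is already taken, so the primary key rejects the row.
    "duplicate primary key": try_insert("INSERT INTO user VALUES (1, 'Grace', 'Hopper')"),
    # user_id 99 doesn't exist in user, so the foreign key rejects the row.
    "unknown user_id": try_insert("INSERT INTO purchase VALUES (2, 99, '2024-01-02', 5)"),
}

for label, outcome in results.items():
    print(label, "->", outcome)

conn.close()
```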

We'll start by adding a table to store the name of the wishlist and the customer that created the wishlist. The schema is shown below:

![https://s3.amazonaws.com/dq-content/192/wishlist_1.svg](https://s3.amazonaws.com/dq-content/192/wishlist_1.svg)

```
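-- Run in the SQLite shell, started with: sqlite3 chinook.db
CREATE TABLE wishlist (
    wishlist_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    name TEXT,
    FOREIGN KEY (customer_id) REFERENCES customer(customer_id)
);

-- If you made a mistake, drop the table and create it again:
-- DROP TABLE wishlist;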
```

## Database Normalization

When we created our `wishlist` table, we didn't include a `track_id` column to store which tracks are in the user's wishlist. To understand why, let's take a look at what the table might look like if we stored all the data in a single table.

[Untitled](https://www.notion.so/960fb4c994ee4d369c1a1d0971b8fa6f)

There are some drawbacks to storing the data this way:

- **Data Duplication** - We are storing the name of each wishlist multiple times.
- **Data Modification** - If we want to change the name of one of the wishlists, we have to modify multiple rows.
- **Data Integrity** - There is nothing to stop a row being added with the wrong wishlist name, and if that happened we wouldn't know which was the correct name.

The process of optimizing the design of databases to minimize these issues is called **database normalization**. In database normalization theory, there are several different phases of normalization, known as [normal forms](http://www.bkent.net/Doc/simple5.htm). Knowing each normal form is not as important as understanding the goals of normalization and designing your databases to avoid data duplication and integrity issues. We'll learn more about database normalization in the next mission. For now, let's look at how we can design our wishlist tables with normalization in mind:

![https://s3.amazonaws.com/dq-content/192/wishlist_2.svg](https://s3.amazonaws.com/dq-content/192/wishlist_2.svg)

In addition to the `wishlist` table we made in the previous section, we have added a new `wishlist_track` table and shown its relationship to the existing `track` table. The `wishlist_track` table has two columns shaded yellow to indicate that together they form the primary key, since neither column uniquely identifies a row by itself. When two or more columns combine to form a primary key, it is called a **compound primary key**. To create a compound primary key, you use a separate `PRIMARY KEY` clause that lists the columns:

```
CREATE TABLE [table_name] (
    [column_one_name] [column_one_type],
    [column_two_name] [column_two_type],
    [column_three_name] [column_three_type],
    [column_four_name] [column_four_type],
    PRIMARY KEY ([column_one_name], [column_two_name])
);
```

Both columns in the `wishlist_track` table also have lines to indicate that they are foreign keys. To create a table with multiple foreign keys, you simply use multiple `FOREIGN KEY` clauses.

```
CREATE TABLE wishlist_track (
    wishlist_id INTEGER,
    track_id INTEGER,
    PRIMARY KEY (wishlist_id, track_id),
    FOREIGN KEY (wishlist_id) REFERENCES wishlist(wishlist_id),
    FOREIGN KEY (track_id) REFERENCES track(track_id)
);
```
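
To illustrate what a compound primary key allows and rejects, here is a small Python sketch using the standard-library `sqlite3` module. It assumes a simplified in-memory `wishlist_track` table with the foreign key clauses left out, so that it runs without the rest of the Chinook database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified version of wishlist_track: compound primary key only, no foreign keys.
conn.execute(
    """
    CREATE TABLE wishlist_track (
        wishlist_id INTEGER,
        track_id INTEGER,
        PRIMARY KEY (wishlist_id, track_id)
    )
    """
)

conn.execute("INSERT INTO wishlist_track VALUES (1, 1158)")
conn.execute("INSERT INTO wishlist_track VALUES (1, 2646)")  # same wishlist, other track
conn.execute("INSERT INTO wishlist_track VALUES (2, 1158)")  # same track, other wishlist

# Only an exact repeat of a (wishlist_id, track_id) pair is rejected.
try:
    conn.execute("INSERT INTO wishlist_track VALUES (1, 1158)")
    duplicate_pair = "accepted"
except sqlite3.IntegrityError as exc:
    duplicate_pair = str(exc)

row_count = conn.execute("SELECT COUNT(*) FROM wishlist_track").fetchone()[0]
print(duplicate_pair)
print(row_count)

conn.close()
```
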
## Inserting and Deleting Rows

Now that we've created the tables to hold our wishlist data, let's add some rows to those tables. To add rows to a SQL table, we'll use the `INSERT` statement:

`INSERT INTO [table_name] ( [column1_name], [column2_name], [column3_name]) VALUES ( [value1], [value2], [value3]);`

If you are inserting values into every column in a table, you don't need to list the column names:

`INSERT INTO [table_name] 
VALUES ([value1], [value2], [value3]);`

Additionally, you can insert multiple rows in a single statement:

`INSERT INTO [table_name]
VALUES ([value1], [value2], [value3]), ([value4], [value5], [value6]), [...];`
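
The three forms above can be sketched in Python with the standard-library `sqlite3` module. This assumes a simplified in-memory `wishlist` table without its foreign key. Wishlists 3 and 4 are made-up rows for illustration. The `?` placeholders let the driver handle quoting, which is handy for values containing an apostrophe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified wishlist table (foreign key to customer omitted for this sketch).
conn.execute(
    "CREATE TABLE wishlist (wishlist_id INTEGER PRIMARY KEY, customer_id INTEGER, name TEXT)"
)

# Form 1: name the columns explicitly.
conn.execute(
    "INSERT INTO wishlist (wishlist_id, customer_id, name) VALUES (?, ?, ?)",
    (1, 34, "Joao's awesome wishlist"),
)

# Form 2: omit the column names when supplying a value for every column.
conn.execute("INSERT INTO wishlist VALUES (?, ?, ?)", (2, 18, "Amy loves pop"))

# Form 3: several rows in a single statement (hypothetical extra wishlists).
conn.execute("INSERT INTO wishlist VALUES (3, 5, 'Road trip'), (4, 5, 'Workout')")
conn.commit()

names = [row[0] for row in conn.execute("SELECT name FROM wishlist ORDER BY wishlist_id")]
print(names)

conn.close()
```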

Because of our foreign key constraints, we'll need to start by adding rows to the `wishlist` table, and then add rows to the `wishlist_track` table. If we don't, our insert statement will fail. At the end, we want our `wishlist` table to contain this data:

[Untitled](https://www.notion.so/c485f1da445e4ab0af8aad44ad4dfce5)

And our `wishlist_track` table to contain this data:

[Untitled](https://www.notion.so/004879ad90204bdeab8c53c748320854)

If you make an error while inserting new rows, you can use the `DELETE` statement to remove all rows from a table:

`DELETE FROM [table_name];`

Or use it with a `WHERE` clause to remove selected rows:

`DELETE FROM [table_name] WHERE [expression];`
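
As a short sketch of the difference between the two forms, the Python snippet below (standard-library `sqlite3`) counts the rows in a simplified in-memory `wishlist_track` table before and after each kind of `DELETE`. The helper `count_rows` is hypothetical, and the table omits its key constraints for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified wishlist_track table without key constraints.
conn.execute("CREATE TABLE wishlist_track (wishlist_id INTEGER, track_id INTEGER)")
conn.executemany(
    "INSERT INTO wishlist_track VALUES (?, ?)",
    [(1, 1158), (1, 2646), (1, 1990), (2, 3272), (2, 3470)],
)


def count_rows():
    """Return the current number of rows in wishlist_track (hypothetical helper)."""
    return conn.execute("SELECT COUNT(*) FROM wishlist_track").fetchone()[0]


before = count_rows()

# With a WHERE clause, only the matching rows are removed.
conn.execute("DELETE FROM wishlist_track WHERE wishlist_id = ?", (2,))
after_selective = count_rows()

# Without a WHERE clause, every row in the table is removed.
conn.execute("DELETE FROM wishlist_track")
after_full = count_rows()

conn.commit()
print(before, after_selective, after_full)
conn.close()
```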

```
INSERT INTO wishlist
VALUES
    (1, 34, 'Joao''s awesome wishlist'),
    (2, 18, 'Amy loves pop');

INSERT INTO wishlist_track
VALUES
    (1, 1158),
    (1, 2646),
    (1, 1990),
    (2, 3272),
    (2, 3470);
```

## Adding Columns to a Table

We now have two tables to track our wishlist data, and have seeded some data. But what should we do when a customer wants to remove a track from their wishlist? One approach might be just to `DELETE` the row from `wishlist_track`. The downside of this approach is that we don't retain any historical data on which tracks were added to wishlists, which reduces our ability to analyze this in the future.

A better approach would be to add a column that has a boolean value to show whether the row is active or not, and just change that value if the user wants to delete a track. We can do a similar thing with the wishlists themselves, so users can delete (or technically, deactivate) wishlists they no longer want to use.

We'll need to add a column to each of our tables. We can use the `ALTER TABLE` statement to do this.

`ALTER TABLE [table_name]
ADD COLUMN [column_name] [column_type];`

As we learned earlier, SQLite supports only five basic types. The closest thing to a boolean type is `NUMERIC`, where the values `1` and `0` represent true and false respectively. Let's create `active` columns for both of our wishlist tables.

```
ALTER TABLE wishlist
ADD COLUMN active NUMERIC;
ALTER TABLE wishlist_track
ADD COLUMN active NUMERIC;
```
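
To check what `ADD COLUMN` does to rows that already exist, here is a minimal Python sketch using the standard-library `sqlite3` module on a simplified in-memory `wishlist` table (foreign key omitted). `PRAGMA table_info` lists the table's columns, with the column name in the second field of each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified wishlist table with the two rows inserted earlier.
conn.execute(
    "CREATE TABLE wishlist (wishlist_id INTEGER PRIMARY KEY, customer_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO wishlist VALUES (?, ?, ?)",
    [(1, 34, "Joao's awesome wishlist"), (2, 18, "Amy loves pop")],
)
conn.commit()

conn.execute("ALTER TABLE wishlist ADD COLUMN active NUMERIC")

# The new column is appended after the existing ones.
columns = [row[1] for row in conn.execute("PRAGMA table_info(wishlist)")]

# Existing rows get NULL (None in Python) in the new column until we update them.
values = [row[0] for row in conn.execute("SELECT active FROM wishlist ORDER BY wishlist_id")]

print(columns)
print(values)
conn.close()
```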

## Adding Values to Existing Rows

We've added our columns to both wishlist tables, but currently they don't have any data in them. To change values for existing rows, we use the `UPDATE` statement:

`UPDATE [table_name]
SET [column_name] = [expression]
WHERE [expression];`

The `WHERE` clause is optional, and can contain any expression that would be valid in a `SELECT` statement.

There are several variations we can use for our `SET` clause. First, we can use a **single value**:

`UPDATE customer
SET phone = '+55 (12) 3921-4464'
WHERE customer_id = 1;`

We can use a **subquery that returns a single value**:

`UPDATE track
SET unit_price = (
SELECT AVG(unit_price)
FROM track );`

We can use a **column, or function on an existing column**:

`UPDATE track
SET unit_price = unit_price * 1.1;`

Lastly, we can **set more than one column at once**:

`UPDATE wishlist_track
SET active = 1, purchased = 0;`
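
A few of these variations can be tried out with the sketch below, written in Python with the standard-library `sqlite3` module. It assumes a small in-memory stand-in for the `track` table with made-up prices and a hypothetical `active` column. The helper `snapshot` is also invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the track table, with made-up prices and a hypothetical active column.
conn.execute(
    "CREATE TABLE track (track_id INTEGER PRIMARY KEY, unit_price NUMERIC, active NUMERIC)"
)
conn.executemany("INSERT INTO track (track_id, unit_price) VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])


def snapshot():
    """Return all rows as (track_id, unit_price, active) tuples (hypothetical helper)."""
    return conn.execute(
        "SELECT track_id, unit_price, active FROM track ORDER BY track_id"
    ).fetchall()


# A single value, limited to one row by the WHERE clause.
conn.execute("UPDATE track SET unit_price = 5 WHERE track_id = 1")
after_single_value = snapshot()

# An expression on an existing column; with no WHERE clause it applies to every row.
conn.execute("UPDATE track SET unit_price = unit_price * 2")
after_expression = snapshot()

# More than one column at once.
conn.execute("UPDATE track SET active = 1, unit_price = 0 WHERE track_id = 3")
after_two_columns = snapshot()

conn.commit()
print(after_single_value)
print(after_expression)
print(after_two_columns)
conn.close()
```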

Because our `active` columns will store a `1` for true and `0` for false, we'll set the values to `1` for every row.

```
UPDATE wishlist
SET active = 1;
UPDATE wishlist_track
SET active = 1;
```

The same two steps, `ALTER TABLE` followed by `UPDATE`, work on existing Chinook tables too. Here we add `tax` and `subtotal` columns to `invoice` and fill them in:

```
ALTER TABLE invoice
ADD COLUMN tax NUMERIC;
ALTER TABLE invoice
ADD COLUMN subtotal NUMERIC;
UPDATE invoice
SET
    tax = 0,
    subtotal = total;
```

In this mission, we learned:

- How to work with the SQLite shell.
- How to create new tables and assign primary and foreign keys.
- Basic concepts of database normalization.
- How to insert new rows into tables.
- How to add new columns to existing tables.
- How to update existing data in tables.