# Introduction to the SQL Data Manipulation Language

In the previous notebook, *8.1 Data Definition Language in SQL*, we looked at SQL's data definition language, which allows us to define tables in a relational database, including data types for each of the columns, and a primary key.

In this notebook, we will look at SQL's *data manipulation language*. The data manipulation language allows us to insert data into tables, delete that data and update it. This notebook will cover:
* how to insert data into tables using the `INSERT` statement, 
* how to delete data from tables using the `DELETE` statement, and
* how to change data in tables using the `UPDATE` statement.

You should spend around an hour on this notebook.


Note that many of the concepts discussed in this notebook have also been touched on in the notebooks for Part 3. Although this notebook is intended to stand alone, you might find it useful to refer back to those notebooks to refresh your memory.

## Setting up

The next group of cells sets up your database connection and resets the database to a clean state. Check notebook *8.1 Data Definition Language in SQL* if you are unsure what these cells do.

You may need to change the given values of the variables `DB_USER` and `DB_PWD`, depending on which environment you are using.

In [None]:
# Make the connection

%run sql_init.ipynb
print("Connecting with connection string : {}".format(DB_CONNECTION))
%sql $DB_CONNECTION

In [None]:
%run reset_databases.ipynb

## Simple data insertion with `INSERT INTO`

Having defined the tables in the database, we will now generally want to use those tables to contain some data. Let's start by using the `patient` table as defined in notebook *8.1 Data Definition Language in SQL*. Let's redefine the table here, remembering that the line `DROP TABLE IF EXISTS patient` removes the table completely if it already exists, otherwise it does nothing (the table should have been removed if you ran the database cleanup script at the start of this notebook). This means that we will have an empty version of the `patient` table after executing the cell:

In [None]:
%%sql

DROP TABLE IF EXISTS patient;

CREATE TABLE patient (
    
    patient_id CHAR(4),
    patient_name VARCHAR(20),
    date_of_birth DATE,
    gender CHAR(6),
    height_cm DECIMAL(4,1),
    weight_kg DECIMAL(4,1),
    doctor_id CHAR(4),
    
    PRIMARY KEY (patient_id)
 );

Again, we can visualise the structure of the table using the `%schema` IPython line magic:

In [None]:
%load_ext schemadisplay_magic

In [None]:
%schema --connection_string $DB_CONNECTION -t patient

and to see that the table is empty, we can make a `SELECT` query to see the table's contents:

In [None]:
%%sql 

SELECT *
FROM patient;

You should receive a message saying `0 rows affected`, and an empty table showing that no rows were returned.

### Line by line `INSERT`ion

Suppose we have the following data to put into the `patient` table:

| patient_id | patient_name | date_of_birth | gender | height_cm | weight_kg| doctor_id |
| ------ | ------ | ------ | ------ | ------ | ------| ------|
| p001 | Thornton | 1980/01/22 | F | 162.3 | 71.6|d06|
| p007 | Tennent | 1980/04/01 | M | 176.8 | 70.9|d07|
| p008 | James | 1980/07/08 | M | 167.9 | 70.5|d07|
| p009 | Kay | 1980/09/25 | F | 164.7 | 53.2|d06|
| p015 | Harris | 1980/12/04 | M | 180.6 | 64.3|d06|
| p031 | Rubinstein | 1980/12/23 | F | -  |  -   |d07|
| p037 | Boswell | 1981/06/11 | F |   -  |  -   |d10|
| p038 | Ming | 1981/09/23 | M | 186.3 | 85.4|d11|
| p039 | Maher | 1981/10/09 | F | 161.9 | 73.0|d11|
| p068 | Monroe | 1981/02/21 | F | 165.0 | 62.6|d10|
| p071 | Harris | 1981/12/12 | M | 186.3 | 76.7|d10|
| p078 | Hunt | 1982/02/25 | M | 179.9 | 74.3|d10|
| p079 | Dixon | 1982/05/05 | F | 163.9 | 56.5|d06|
| p080 | Bell | 1982/06/11 | F | 171.3 | 49.2|d07|
| p087 | Reed | 1982/06/14 | F | 160.0 | 59.1|d07|
| p088 | Boswell | 1982/08/23 | M | 168.4 | 91.4|d06|
| p089 | Jarvis | 1982/11/09 | F | 172.9 | 53.4|d10|

where we do not have the height or weight details for the patients with identifiers p031 and p037.

Consider the first row of this table:

| patient_id | patient_name | date_of_birth | gender | height_cm | weight_kg|doctor_id|
| ------ | ------ | ------ | ------ | ------ | ------|------|
| p001 | Thornton | 1980/01/22 | F | 162.3 | 71.6|d06|

To insert this data into the patient table, we can use the `INSERT INTO` statement. The basic form of the statement is:

<code>INSERT INTO &#x2329;table name&#x232A;( &#x2329;column 1&#x232A;, &#x2329;column 2&#x232A;, ..., &#x2329;column n&#x232A; )
VALUES ( &#x2329;value 1&#x232A;, &#x2329;value 2&#x232A;, ..., &#x2329;value n&#x232A; );</code>

For example, to insert the first row of data into the `patient` table, we can use:

In [None]:
%%sql

INSERT INTO patient(patient_id, patient_name, date_of_birth, gender, height_cm, weight_kg, doctor_id)
VALUES ('p001', 'Thornton', '1980-01-22', 'F', 162.3, 71.6, 'd06');

And check that the values have been inserted:

In [None]:
%%sql

SELECT *
FROM patient;

The query should have returned a single row, for the patient named `Thornton` with `patient_id` value of `p001`.

An advantage of naming the columns is that the values can be presented in whatever order is most convenient to the programmer, provided that the column names are listed in the same order as the values. For example, to add the next row of the table, we can use:


In [None]:
%%sql

INSERT INTO patient(date_of_birth, weight_kg, patient_name, doctor_id, height_cm, patient_id, gender)
VALUES('1980/04/01', 70.9, 'Tennent', 'd07', 176.8, 'p007', 'M');

which adds the row with the data in the correct columns, even though the specified order was different from that in the table. Again, we can check that the data is in the correct columns with a `SELECT` query:

In [None]:
%%sql

SELECT *
FROM patient;

You should now see the two rows, with the data in appropriate columns.

It is also possible to add multiple rows at a time, by simply enumerating the data in the `VALUES` clause of the `INSERT` statement. For example, to add the next three rows of the table, we can execute the following:

In [None]:
%%sql

INSERT INTO patient(patient_id, patient_name, date_of_birth, gender, height_cm, weight_kg, doctor_id)
VALUES ('p008', 'James', '1980/07/08', 'M', 167.9, 70.5, 'd07'),
       ('p009', 'Kay', '1980/09/25', 'F', 164.7, 53.2, 'd06'),
       ('p015', 'Harris', '1980/12/04', 'M', 180.6, 64.3, 'd06');

As before, we can check that the rows have been added correctly with a `SELECT` query:

In [None]:
%%sql

SELECT *
FROM patient;
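
When the rows to be added are already held in Python, a multi-row insert can also be driven from a list of tuples. The following is a minimal sketch only: it assumes Python's built-in `sqlite3` module with an in-memory database standing in for the notebook's database connection, so details such as the `?` placeholder style may differ in your environment.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the notebook's database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patient (
        patient_id CHAR(4),
        patient_name VARCHAR(20),
        date_of_birth DATE,
        gender CHAR(6),
        height_cm DECIMAL(4,1),
        weight_kg DECIMAL(4,1),
        doctor_id CHAR(4),
        PRIMARY KEY (patient_id)
    )
""")

# One tuple per row, with the values in the same order as the column names below
rows = [
    ("p008", "James", "1980-07-08", "M", 167.9, 70.5, "d07"),
    ("p009", "Kay", "1980-09-25", "F", 164.7, 53.2, "d06"),
    ("p015", "Harris", "1980-12-04", "M", 180.6, 64.3, "d06"),
]

# executemany runs the same INSERT once for each tuple in the list
conn.executemany(
    """
    INSERT INTO patient(patient_id, patient_name, date_of_birth, gender,
                        height_cm, weight_kg, doctor_id)
    VALUES (?, ?, ?, ?, ?, ?, ?)
    """,
    rows,
)

inserted_ids = [row[0] for row in conn.execute(
    "SELECT patient_id FROM patient ORDER BY patient_id")]
print(inserted_ids)
```

Passing the values as parameters, rather than pasting them into the SQL text, leaves the quoting of strings to the database driver.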

### Activity 1

What do you think will happen if you try to execute the following statement (on the table as it is currently populated)? Why?

```sql
INSERT INTO patient(patient_id, patient_name, date_of_birth, gender, height_cm, weight_kg, doctor_id)
VALUES ('p008', 'Smith', '1981/03/13', 'M', 169.3, 81.7, 'd06');
```

Write your answer in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

The following cell executes the given statement:

In [None]:
%%sql

INSERT INTO patient(patient_id, patient_name, date_of_birth, gender, height_cm, weight_kg, doctor_id)
VALUES ('p008', 'Smith', '1981/03/13', 'M', 169.3, 81.7, 'd06');


If you try to execute the cell, you should receive an `IntegrityError` with the additional information that `duplicate key value violates unique constraint`. In this case, SQL has tried to enter the given values, creating a new row, but with the value `p008` in the `patient_id` column. `patient_id` has been specified as the primary key, and as you saw in [Activity 8.5](https://learn2.open.ac.uk/mod/oucontent/olinkremote.php?website=TM351&targetdoc=Part%208%20Introduction%20to%20relational%20databases&targetptr=4), the values in the primary key column of a table must be unique. In this case, attempting to add a row with a duplicated primary key has raised an error.
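
If you are inserting rows from a Python program, you may prefer to catch this kind of error rather than let it stop the program. The following is a sketch under assumptions: it uses Python's built-in `sqlite3` module with an in-memory database and a cut-down two-column `patient` table, not the notebook's own connection, so the wording of the error message will differ from the one quoted above.

```python
import sqlite3

# Assumption: an in-memory SQLite database and a cut-down patient table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patient (
        patient_id CHAR(4),
        patient_name VARCHAR(20),
        PRIMARY KEY (patient_id)
    )
""")
conn.execute(
    "INSERT INTO patient(patient_id, patient_name) VALUES ('p008', 'James')")

try:
    # A second row with the same primary key value as an existing row
    conn.execute(
        "INSERT INTO patient(patient_id, patient_name) VALUES ('p008', 'Smith')")
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("Insert rejected:", err)

# The original row is untouched by the failed insert
names = [row[0] for row in conn.execute("SELECT patient_name FROM patient")]
```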

#### End of Activity 1

-------------------------------------------

## Missing values

If you now look at the next two rows of data for the `patient` table, the patients with identifiers p031 and p037 do not have specified values for either their height or their weight:

| patient_id | patient_name | date_of_birth | gender | height_cm | weight_kg|doctor_id|
| ------ | ------ | ------ | ------ | ------ | ------|------|
| p031 | Rubinstein | 1980/12/23 | F |   -   |   -  |d07|
| p037 | Boswell | 1981/06/11 | F |    -  |  - |d10|


To include this (lack of) data in the table, we can use the column specification again, omitting the columns for which there is no specified value:

In [None]:
%%sql

INSERT INTO patient(patient_id, patient_name, date_of_birth, gender, doctor_id)
VALUES ('p031', 'Rubinstein', '1980/12/23', 'F', 'd07');

Alternatively, we can include the data making explicit that the missing values should be represented with `NULL`:

In [None]:
%%sql

INSERT INTO patient(patient_id, patient_name, date_of_birth, gender, height_cm, weight_kg, doctor_id)
VALUES ('p037', 'Boswell', '1981/06/11', 'F', NULL, NULL, 'd10');

If we now `SELECT` the values from the table, the values for `height_cm` and `weight_kg` in those rows should be missing:

In [None]:
%%sql

SELECT *
FROM patient;

Note that, because the sql magic returns the table as a pandas DataFrame, the missing values appear as `None` or `NaN` (Not a Number), which are the pandas representations of `NULL` values. When making queries using the sql magic, remember to express the query in SQL terms. For example, if we now wanted to see all the data about patients whose height is not known, we would use `NULL` (rather than `NaN` or `None`) in the `WHERE` clause of the `SELECT` query:

In [None]:
%%sql

SELECT *
FROM patient
WHERE height_cm IS NULL;

The query should have returned the two rows for which `height_cm` is missing.
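
Note that the test for a missing value is written `IS NULL`, not `= NULL`. The following sketch illustrates the difference. It is an illustration only, assuming Python's built-in `sqlite3` module with an in-memory database and a cut-down two-column `patient` table in place of the notebook's connection.

```python
import sqlite3

# Assumption: an in-memory SQLite database and a cut-down patient table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (patient_id CHAR(4) PRIMARY KEY, height_cm DECIMAL(4,1))")
conn.executemany(
    "INSERT INTO patient(patient_id, height_cm) VALUES (?, ?)",
    [("p009", 164.7), ("p031", None), ("p037", None)],  # Python None is stored as NULL
)

# "=" never matches NULL: the comparison is unknown rather than true
equals_null = conn.execute(
    "SELECT patient_id FROM patient WHERE height_cm = NULL").fetchall()

# IS NULL is the SQL test for a missing value
is_null = conn.execute(
    "SELECT patient_id FROM patient WHERE height_cm IS NULL ORDER BY patient_id"
).fetchall()

print("= NULL matched:", equals_null)
print("IS NULL matched:", is_null)
```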

### Activity 2

What do you think will happen if you try to execute the following statement? Why?

```sql
INSERT INTO patient(patient_name, date_of_birth, gender, height_cm, weight_kg, doctor_id)
VALUES ('Ming', '1980-09-23', 'M', 186.3, 85.4, 'd11');
```

Write your answer in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

The following cell executes the given statement:

In [None]:
%%sql

INSERT INTO patient(patient_name, date_of_birth, gender, height_cm, weight_kg, doctor_id)
VALUES ('Ming', '1980-09-23', 'M', 186.3, 85.4, 'd11');

If you try to execute the `INSERT` statement, you should receive an `IntegrityError`, with the additional information that `null value in column "patient_id" violates not-null constraint`. In this case, SQL has tried to enter the given values, and put `NULL` in the unspecified column. However, that column is `patient_id`, the primary key. As you saw in [Activity 8.5](https://learn2.open.ac.uk/mod/oucontent/olinkremote.php?website=TM351&targetdoc=Part%208%20Introduction%20to%20relational%20databases&targetptr=4), the values in the primary key column of a table may not be `NULL`, and so failing to specify the primary key value in this case has raised an error.

#### End of Activity 2

-------------------------------------------------------------

### Omitting the column name arguments in `INSERT` statements

When you write an `INSERT` statement, you can omit the list of column names, provided that the `VALUES` clause supplies one value for each column of the table and that the values are in the same order as the columns.

Let's try to add the next three rows of the `patient` table, which are:

| patient_id | patient_name | date_of_birth | gender | height_cm | weight_kg|doctor_id|
| ------ | ------ | ------ | ------ | ------ | ------|-----|
| p038 | Ming | 1981/09/23 | M | 186.3 | 85.4|d11|
| p039 | Maher | 1981/10/09 | F | 161.9 | 73.0|d11|
| p068 | Monroe | 1981/02/21 | F | 165.0 | 62.6|d10|


By providing the row data in the correct order, the column names in the `INSERT` clause can be omitted:

In [None]:
%%sql

INSERT INTO patient
VALUES ('p038', 'Ming', '1981/09/23', 'M', 186.3, 85.4, 'd11'),
       ('p039', 'Maher', '1981/10/09', 'F', 161.9, 73.0, 'd11'),
       ('p068', 'Monroe', '1981/02/21', 'F', 165.0, 62.6, 'd10');

If you execute the previous cell, the three rows specified should be correctly added to the `patient` table. We can check this with a `SELECT` query:

In [None]:
%%sql

SELECT *
FROM patient;

The table should now contain ten rows, including the data we just added.

However, bear in mind that omitting the column names in the `INSERT` clause is not always good practice.

For example, work through the next activity, which looks at what happens when some of the values have been omitted:

### Activity 3

What do you think will happen if you try to execute the following statement (in which values for the `gender` and `doctor_id` columns have been omitted)? Why?

```sql
INSERT INTO patient
VALUES('p071', 'Harris', '1981-12-12', 186.3, 76.7);
```

Write your answer in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

The following cell executes the given statement:

In [None]:
%%sql

INSERT INTO patient
VALUES('p071', 'Harris', '1981-12-12', 186.3, 76.7);

The `INSERT` statement seems to have executed OK (in that it has not raised an error). However, if we now look at the resulting row in the `patient` table:

In [None]:
%%sql

SELECT * 
FROM patient
WHERE patient_id='p071';

we see that the data that has been inserted is definitely not what we wanted! Rather than placing `NULL` in the `gender` and `doctor_id` columns, the fourth value sent in the `VALUES` clause has been placed in the `gender` column (and been cast into a `CHAR(6)`, the data type for that column). The intended value for `weight_kg` has been placed in the `height_cm` column, and the `NULL` value which we were expecting in the `gender` column has ended up in the `weight_kg` column.

We will see in notebook *8.3 Adding column constraints to tables* how check constraints can be used to guard against some of these problems, but in general, it is wiser to be explicit about which data items should go into which column.
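
One way of being explicit when inserting from Python is to tie each value to its column by name. The sketch below is an illustration under assumptions: it uses Python's built-in `sqlite3` module with an in-memory database in place of the notebook's connection, and the `:name` placeholder style belongs to that module, so other database drivers may use a different style.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the notebook's database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patient (
        patient_id CHAR(4),
        patient_name VARCHAR(20),
        date_of_birth DATE,
        gender CHAR(6),
        height_cm DECIMAL(4,1),
        weight_kg DECIMAL(4,1),
        doctor_id CHAR(4),
        PRIMARY KEY (patient_id)
    )
""")

# Each value is labelled with the column it belongs to;
# gender and doctor_id are deliberately left out
row = {
    "patient_id": "p071",
    "patient_name": "Harris",
    "date_of_birth": "1981-12-12",
    "height_cm": 186.3,
    "weight_kg": 76.7,
}
conn.execute(
    """
    INSERT INTO patient(patient_id, patient_name, date_of_birth, height_cm, weight_kg)
    VALUES (:patient_id, :patient_name, :date_of_birth, :height_cm, :weight_kg)
    """,
    row,
)

# The omitted columns hold NULL (None in Python); the others are where we intended
inserted = conn.execute(
    "SELECT gender, height_cm, weight_kg, doctor_id FROM patient "
    "WHERE patient_id = 'p071'"
).fetchone()
print(inserted)
```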

#### End of Activity 3

----------------------------------------------------------

## `UPDATE`ing and `DELETE`ing rows with `WHERE`

### `UPDATE`ing rows

After using `INSERT` to enter new data into a table, you may later find that it needs changing. This is achieved using SQL's `UPDATE` statement, in which the rows to be changed are specified with a `WHERE` clause (similar to the `WHERE` clause in a `SELECT` query).

Consider the row we added in the previous activity, which should have been:

| patient_id | patient_name | date_of_birth | gender | height_cm | weight_kg|doctor_id|
| ------ | ------ | ------ | ------ | ------ | ------|------|
| p071 | Harris | 1981/12/12 |  None   | 186.3 | 76.7|None |

but ended up as:

| patient_id | patient_name | date_of_birth | gender | height_cm | weight_kg|doctor_id|
| ------ | ------ | ------ | ------ | ------ | ------|-------|
| p071 | Harris | 1981/12/12 | 186.3 | 76.7| None | None|

(If you did not execute the code for the previous activity, you can execute it now.)

We would like to alter this row in the table so that it contains the intended data (that is, once the correct values for `gender` and `doctor_id` have been introduced):

| patient_id | patient_name | date_of_birth | gender | height_cm | weight_kg|doctor_id|
| ------ | ------ | ------ | ------ | ------ | ------|-------|
| p071 | Harris | 1981/12/12 | M | 186.3 | 76.7|d10|

In [None]:
%%sql

SELECT * 
FROM patient
WHERE patient_id='p071';

To do this, we can use the `UPDATE` statement. The general form is:

<code>UPDATE &#x2329;table_name&#x232A;
SET column1 = value1, column2 = value2, ...
    WHERE &#x2329;condition&#x232A;;</code>

where the `WHERE` condition takes the same form as those in a `SELECT` query. Beware! If you do not specify a `WHERE` condition, then the update will be applied to *all* the rows in the table. (Just as all the rows of a table are returned if your `SELECT` query does not contain a `WHERE` condition.)

So to update the row for the patient named Harris (with identifier p071), we can use the following:

In [None]:
%%sql

UPDATE patient
SET gender='M', height_cm=186.3, weight_kg=76.7, doctor_id='d10'
    WHERE patient_id='p071';

If we have another look at the table, we should see that the row with the value `p071` in the `patient_id` column now has the desired values:

In [None]:
%%sql

SELECT *
FROM patient
WHERE patient_id='p071';

In this case, because we used the condition

<code>WHERE patient_id='p071'</code>

the update was only applied to that specific row. We identified the row by the primary key, which we know is constrained to be unique and so can be used to identify that individual row.
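
When an `UPDATE` is issued from Python, the number of rows it changed can be checked afterwards, which is a simple guard against a missing or overly broad `WHERE` condition. The following is a sketch only, assuming Python's built-in `sqlite3` module with an in-memory database and a cut-down three-column `patient` table rather than the notebook's connection.

```python
import sqlite3

# Assumption: an in-memory SQLite database and a cut-down patient table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient ("
    "patient_id CHAR(4) PRIMARY KEY, gender CHAR(6), doctor_id CHAR(4))")
conn.executemany(
    "INSERT INTO patient(patient_id, gender, doctor_id) VALUES (?, ?, ?)",
    [("p068", "F", "d10"), ("p071", None, None), ("p078", "M", "d11")],
)

# A primary key condition: exactly one row is changed
cur = conn.execute(
    "UPDATE patient SET gender = 'M', doctor_id = 'd10' WHERE patient_id = 'p071'")
rows_changed_with_where = cur.rowcount

# No WHERE condition: every row in the table is changed
cur = conn.execute("UPDATE patient SET doctor_id = 'd10'")
rows_changed_without_where = cur.rowcount

print(rows_changed_with_where, rows_changed_without_where)
```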

### Activity 4

If you have been working through this notebook, the `patient` table should contain the row:

| patient_id | patient_name | date_of_birth | gender | height_cm | weight_kg|doctor_id|
| ------ | ------ | ------ | ------ | ------ | ------|-------|
| p001 | Thornton | 1980/01/22 | F | 162.3 | 71.6|d06|

Suppose you learn that this patient's name is not, in fact, *Thornton*, but *Thorson* (perhaps the result of a transcription error). Write an SQL statement to alter this row so that the patient's name is recorded as *Thorson* rather than *Thornton*.

In [None]:
# Write your code in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

A straightforward statement to make the change would be:

In [None]:
%%sql

UPDATE patient
SET patient_name='Thorson'
    WHERE patient_id='p001';

And check that the change has been made:

In [None]:
%%sql

SELECT *
FROM patient
WHERE patient_id='p001';

You should find that in the row with `patient_id` having a value of `p001`, the value in the `patient_name` column is `Thorson`, rather than `Thornton`.

Note that although the following might have worked in this case:

`UPDATE patient
SET patient_name='Thorson'
    WHERE patient_name='Thornton';`

the statement would have changed *all* entries for which the value of `patient_name` is `Thornton`. If there had been several patients called Thornton, then this would have changed all of them (for example, the complete table contains two patients named Boswell). However, by using the primary key to identify the rows that need changing, we can ensure that only those rows we want to change have been altered.

#### End of Activity 4

--------------------------------------------

### Activity 5

The `gender` column in the `patient` table currently contains the values `M` and `F`. You wish to alter the table so that these values are `Male` and `Female`. Write one or more `UPDATE` statements to change all occurrences of `F` in the `patient` table to `Female`, and all the occurrences of `M` to `Male`.

In [None]:
# Write your code in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

For this task, rather than use the primary key to choose which rows to update, the `WHERE` clause should be written to apply the desired changes to the appropriate value in the `gender` column. So one possible way to do the update would be:

In [None]:
%%sql

UPDATE patient
SET gender='Female'
WHERE gender='F';

which will set `gender` to `'Female'` for every row in which the value of `gender` is `'F'`:

In [None]:
%%sql

SELECT *
FROM patient;

And the statement to change the values `'M'` to `'Male'` takes the same form:

In [None]:
%%sql

UPDATE patient
SET gender='Male'
WHERE gender='M';

In [None]:
%%sql

SELECT *
FROM patient;

At this point, the order of the rows in the `patient` table might have changed from the original order in which they were added. If you have studied any theory of databases previously (for example, in TM254 or some early versions of M269), you may remember that relational theory assumes that rows in a table are not ordered. When developing software that uses a relational database, you should write your code in a way that does not assume that the rows will appear in any particular order.

If you do want the rows to appear in a particular order, you can use an `ORDER BY` clause:

In [None]:
%%sql

SELECT *
FROM patient
ORDER BY patient_id;
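
The two `UPDATE` statements in our solution could also be combined into a single statement by using a `CASE` expression, which chooses the new value row by row. This is an alternative to the solution above rather than a replacement for it. The sketch assumes Python's built-in `sqlite3` module with an in-memory database and a cut-down two-column `patient` table in place of the notebook's connection.

```python
import sqlite3

# Assumption: an in-memory SQLite database and a cut-down patient table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (patient_id CHAR(4) PRIMARY KEY, gender CHAR(6))")
conn.executemany(
    "INSERT INTO patient(patient_id, gender) VALUES (?, ?)",
    [("p001", "F"), ("p007", "M"), ("p008", "M"), ("p009", "F")],
)

# CASE picks the replacement for each row; the WHERE clause restricts
# the update to rows that still hold one of the short codes
conn.execute("""
    UPDATE patient
    SET gender = CASE gender
                     WHEN 'F' THEN 'Female'
                     WHEN 'M' THEN 'Male'
                     ELSE gender
                 END
    WHERE gender IN ('F', 'M')
""")

updated = conn.execute(
    "SELECT patient_id, gender FROM patient ORDER BY patient_id").fetchall()
print(updated)
```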

#### End of Activity 5

------------------------------------

### `DELETE`ing rows

Deleting rows from a table is straightforward. The general form of the statement is:

<code>DELETE FROM &#x2329;table_name&#x232A;
WHERE &#x2329;condition&#x232A;;</code>

So if we decided that we wanted to remove all patients from the database who were taller than 170 cm, we could use the following statement:

In [None]:
%%sql

DELETE FROM patient
WHERE height_cm > 170;

If you execute the previous cell, you should find that those rows for which the value of `height_cm` is greater than 170 have been removed:

In [None]:
%%sql

SELECT *
FROM patient;

Note that those rows which do not have a value for `height_cm` have not been removed (those where `patient_id` is `p031` or `p037`). Comparing `NULL` with any value gives a result that is unknown rather than true, and only those rows for which the `WHERE` condition is true are removed.
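
A consequence is that a row with a missing height matches neither a condition nor its negation. The sketch below illustrates this with `SELECT` queries. It is an illustration under assumptions: it uses Python's built-in `sqlite3` module with an in-memory database and a cut-down `patient` table, and the helper `ids` is a hypothetical convenience function written for this example.

```python
import sqlite3

# Assumption: an in-memory SQLite database and a cut-down patient table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (patient_id CHAR(4) PRIMARY KEY, height_cm DECIMAL(4,1))")
conn.executemany(
    "INSERT INTO patient(patient_id, height_cm) VALUES (?, ?)",
    [("p007", 176.8), ("p009", 164.7), ("p031", None)],
)


def ids(condition):
    """Hypothetical helper: patient_id values of the rows matching a WHERE condition."""
    query = ("SELECT patient_id FROM patient WHERE " + condition
             + " ORDER BY patient_id")
    return [row[0] for row in conn.execute(query)]


taller = ids("height_cm > 170")            # condition is true
not_taller = ids("NOT (height_cm > 170)")  # condition is false
unknown = ids("height_cm IS NULL")         # condition is neither true nor false

print(taller, not_taller, unknown)
```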

Similarly, to remove those rows which do not have an entry in the `height_cm` column (i.e. for which the value in `height_cm` is `NULL`), we can identify the rows with a suitable query:

In [None]:
%%sql

SELECT *
FROM patient
WHERE height_cm IS NULL;

and remove them with the same <code>WHERE</code> clause:

In [None]:
%%sql

DELETE FROM patient
WHERE height_cm IS NULL;

In [None]:
%%sql

SELECT *
FROM patient;

As with `UPDATE`, beware when using `DELETE`! The following code removes all the data from the `patient` table: this is a deceptively simple statement that can do a lot of damage.

In [None]:
%%sql

DELETE FROM patient;

In [None]:
%%sql

SELECT *
FROM patient;

Executing the two previous cells will show that the `DELETE` statement has deleted all the data from the `patient` table. Use with care!
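
Where the connection allows it, one safeguard is to run a risky `DELETE` inside a transaction, check the result, and roll back if it is not what you intended. The following is a sketch only, assuming Python's built-in `sqlite3` module with an in-memory database and a cut-down `patient` table. Whether a rollback is possible in the notebook depends on how the sql magic is configured: if each statement is committed automatically, the deletion is permanent.

```python
import sqlite3

# Assumption: an in-memory SQLite database and a cut-down patient table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (patient_id CHAR(4) PRIMARY KEY, patient_name VARCHAR(20))")
conn.executemany(
    "INSERT INTO patient(patient_id, patient_name) VALUES (?, ?)",
    [("p001", "Thornton"), ("p007", "Tennent"), ("p008", "James")],
)
conn.commit()  # the three rows are now stored


def count_rows():
    """Hypothetical helper: the number of rows currently in patient."""
    return conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]


conn.execute("DELETE FROM patient")  # no WHERE clause: removes every row
count_after_delete = count_rows()

conn.rollback()  # the deletion was never committed, so it is undone
count_after_rollback = count_rows()

print(count_after_delete, count_after_rollback)
```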

## `CREATE`ing from another table with `SELECT`

If you have been working through this notebook, at this point, your `patient` table will either be very different from how it started out, or completely empty. The next three cells will redefine the table, populate it with the data from the entity diagrams, and show the result. You should execute these cells before going any further.

In [None]:
%%sql

-- Redefine the patient table

DROP TABLE IF EXISTS patient;

CREATE TABLE patient (
    
    patient_id CHAR(4),
    patient_name VARCHAR(20),
    date_of_birth DATE,
    gender CHAR(6),
    height_cm DECIMAL(4,1),
    weight_kg DECIMAL(4,1),
    doctor_id CHAR(4),
    
    PRIMARY KEY (patient_id)
 );

In [None]:
%%sql

-- Add the data for the complete table

INSERT INTO patient(patient_id, patient_name, date_of_birth, gender, height_cm, weight_kg, doctor_id)
VALUES ('p001', 'Thornton', '1980-01-22', 'F', 162.3, 71.6, 'd06'),
       ('p007', 'Tennent', '1980-04-01', 'M', 176.8, 70.9, 'd07'),
       ('p008', 'James', '1980-07-08', 'M', 167.9, 70.5, 'd07'),
       ('p009', 'Kay', '1980-09-25', 'F', 164.7, 53.2, 'd06'),
       ('p015', 'Harris', '1980-12-04', 'M', 180.6, 64.3, 'd06'),
       ('p038', 'Ming', '1981-09-23', 'M', 186.3, 85.4, 'd11'),
       ('p039', 'Maher', '1981-10-09', 'F', 161.9, 73.0, 'd11'),
       ('p068', 'Monroe', '1981-02-21', 'F', 165.0, 62.6, 'd10'),
       ('p071', 'Harris', '1981-12-12', 'M', 186.3, 76.7, 'd10'),
       ('p078', 'Hunt', '1982-02-25', 'M', 179.9, 74.3, 'd10'),
       ('p079', 'Dixon', '1982-05-05', 'F', 163.9, 56.5, 'd06'),
       ('p080', 'Bell', '1982-06-11', 'F', 171.3, 49.2, 'd07'),
       ('p087', 'Reed', '1982-06-14', 'F', 160.0, 59.1, 'd07'),
       ('p088', 'Boswell', '1982-08-23', 'M', 168.4, 91.4, 'd06'),
       ('p089', 'Jarvis', '1982-11-09', 'F', 172.9, 53.4, 'd10');

INSERT INTO patient(patient_id, patient_name, date_of_birth, gender, doctor_id)
VALUES ('p031', 'Rubinstein', '1980-12-23', 'F', 'd07'),
       ('p037', 'Boswell', '1981-06-11', 'F', 'd10');

In [None]:
%%sql

SELECT *
FROM patient
ORDER BY patient_id;

Sometimes, you will want to be able to create a new table by using data from another table (although in many cases you may wish to use a view instead: we will consider views in [Part 11](https://learn2.open.ac.uk/mod/oucontent/olinkremote.php?website=TM351&targetdoc=Part%2011%20Subqueries%20and%20views&targetptr=3)). 

However, suppose that you wanted to define a table containing a subset of the data in the `patient` table. For example, perhaps you want a table containing the name, patient identifier and date of birth of all the female patients.

We can construct a query which will return a table containing the data that we are interested in:

In [None]:
%%sql

SELECT patient_id, patient_name, date_of_birth
FROM patient
WHERE gender='F';

The query should have returned a table containing the identifier, name and date of birth of each of the female patients.

Now, suppose that you wanted to define a table which contained only this data, let's call it `female_patient`. We can create this table using the general form:

<code>CREATE TABLE &#x2329;table name&#x232A; AS
    &#x2329;query&#x232A;; </code>

where the new table <code>&#x2329;table name&#x232A;</code> is the table returned by <code>&#x2329;query&#x232A;</code>. So we can now create the table `female_patient` using the result of the previous query:

In [None]:
%%sql

DROP TABLE IF EXISTS female_patient;

CREATE TABLE female_patient AS
    SELECT patient_id, patient_name, date_of_birth
    FROM patient
    WHERE gender='F';

And we can now query the table in its own right:

In [None]:
%%sql

SELECT *
FROM female_patient;

If we now look at the schema displays for the `patient` and `female_patient` tables, we will see that as well as copying the selected data from the `patient` table, the `female_patient` table has also inherited the type of each of the selected columns:

In [None]:
%schema --connection_string $DB_CONNECTION -t patient

In [None]:
%schema --connection_string $DB_CONNECTION -t female_patient

But beware! Notice that `female_patient` does not have a primary key defined on it (there is no `(PK)` after `patient_id` in the `female_patient` table): although the data and column types are the same as for `patient`, we still need to define a primary key on `female_patient`:

In [None]:
%%sql

ALTER TABLE female_patient
ADD CONSTRAINT female_patient_pk
    PRIMARY KEY (patient_id);

(We will take a more detailed look at adding or removing constraints from tables in notebook *8.3 Adding column constraints to tables*.)
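
To see why the missing primary key matters, consider what a table created with `CREATE TABLE ... AS` will accept before a key is added. The sketch below is an illustration under assumptions: it uses Python's built-in `sqlite3` module with an in-memory database and a cut-down `patient` table, and the row named `Duplicate` is made up for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database and a cut-down patient table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patient (
        patient_id CHAR(4),
        patient_name VARCHAR(20),
        gender CHAR(6),
        PRIMARY KEY (patient_id)
    )
""")
conn.executemany(
    "INSERT INTO patient(patient_id, patient_name, gender) VALUES (?, ?, ?)",
    [("p001", "Thornton", "F"), ("p007", "Tennent", "M"), ("p009", "Kay", "F")],
)
conn.execute("""
    CREATE TABLE female_patient AS
        SELECT patient_id, patient_name
        FROM patient
        WHERE gender = 'F'
""")

# The copy accepts a second row with patient_id 'p001': no key protects it
conn.execute(
    "INSERT INTO female_patient(patient_id, patient_name) VALUES ('p001', 'Duplicate')")
copies_of_p001 = conn.execute(
    "SELECT COUNT(*) FROM female_patient WHERE patient_id = 'p001'").fetchone()[0]

# The same insert into the original table is rejected by its primary key
try:
    conn.execute(
        "INSERT INTO patient(patient_id, patient_name, gender) "
        "VALUES ('p001', 'Duplicate', 'F')")
    original_rejects_duplicate = False
except sqlite3.IntegrityError:
    original_rejects_duplicate = True

print(copies_of_p001, original_rejects_duplicate)
```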

As you will see in Part 11, it is more common to use a view than to copy part of the table. Copying the table runs considerable risks of creating integrity errors: the whole point of data modelling is to avoid having multiple copies of the same piece of data. 

However, this kind of construction can be important for restructuring a database, as we shall see when we normalise database tables in Part 10.
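
The risk of holding a second copy of the data can be illustrated by correcting a value in the original table and then looking at the copy. As before, this is a sketch under assumptions: it uses Python's built-in `sqlite3` module with an in-memory database and a cut-down `patient` table rather than the notebook's connection.

```python
import sqlite3

# Assumption: an in-memory SQLite database and a cut-down patient table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient ("
    "patient_id CHAR(4) PRIMARY KEY, patient_name VARCHAR(20), gender CHAR(6))")
conn.execute(
    "INSERT INTO patient(patient_id, patient_name, gender) "
    "VALUES ('p001', 'Thornton', 'F')")
conn.execute("""
    CREATE TABLE female_patient AS
        SELECT patient_id, patient_name
        FROM patient
        WHERE gender = 'F'
""")

# Correct the name in the original table only
conn.execute("UPDATE patient SET patient_name = 'Thorson' WHERE patient_id = 'p001'")

name_in_patient = conn.execute(
    "SELECT patient_name FROM patient WHERE patient_id = 'p001'").fetchone()[0]
name_in_copy = conn.execute(
    "SELECT patient_name FROM female_patient WHERE patient_id = 'p001'").fetchone()[0]

# The copy was taken before the correction, so the two tables now disagree
print(name_in_patient, name_in_copy)
```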

### Activity 6

Write one or more SQL statements to create a table `young_patient`, which has columns `patient_id`, `patient_name` and `gender`, and populate it with the data from `patient` for which the date of birth falls on or after 1 January 1982.

In [None]:
# Write your code in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

As before, we can create the table from an appropriate query. The query we need is:

In [None]:
%%sql

SELECT patient_id, patient_name, gender
FROM patient
WHERE date_of_birth >= '1982-01-01';

and then we can use the appropriate form of `CREATE TABLE` statement to create a table with the results of the query:

In [None]:
%%sql

DROP TABLE IF EXISTS young_patient;

CREATE TABLE young_patient AS
    SELECT patient_id, patient_name, gender
    FROM patient
    WHERE date_of_birth >= '1982-01-01';

We also need to define a primary key on the table:

In [None]:
%%sql

ALTER TABLE young_patient
ADD CONSTRAINT young_patient_pk
    PRIMARY KEY (patient_id);

And look at the result:

In [None]:
%%sql

SELECT *
FROM young_patient;

#### End of Activity 6

--------------------------------------

## What next?

You have now seen how to create tables and populate them with data, either row by row or by selecting rows from an existing table. You have also seen how to update data in a table, and how to delete data from a table.

You can now move on to the final notebook of this part, *8.3 Adding column constraints to tables*. We will look at how to define some constraints on the values which may be entered into tables, and how to exchange data between SQL tables and pandas dataframes.