# Transaction anomalies

In this notebook, we'll change the transaction isolation levels to show the problems which can occur with insufficiently isolated transactions.

First, let's run the database setup:

## Setting up

The next group of cells sets up your database connection and resets the database to a clean state. Check notebook *08.1 Data Definition Language in SQL* if you are unsure what the next cells do.

You may need to change the given values of the variables `DB_USER` and `DB_PWD`, depending on which environment you are using.

In [None]:
# Set up the PostgreSQL environment

%run sql_init.ipynb
print("Connecting with connection string : {}".format(DB_CONNECTION))
%sql $DB_CONNECTION

In [None]:
%run reset_databases.ipynb

## When things go wrong

As with notebook `12.1 Concurrent Transactions`, before working through this notebook it is important that no other notebooks are running which might be making their own calls on the database. You may well still have some open database connections from other notebooks. Check the [Jupyter Running](/tree#running) tab to see if any week 11 or week 12 notebooks other than this one are running, and if so, shut them down.

At some point, when you're working through this notebook, you will probably end up in a real muddle. 

Don't despair.

If you need to reset the database, first close all the existing connections:
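As a minimal sketch of that clean-up step, the helper below closes whichever connections are still open. The name `close_connections` is our own (hypothetical); it relies only on the `closed` attribute and `close()` method that `psycopg2` connections provide, and it assumes the three doctor connections (`gibson`, `paxton` and `tamblin`) have already been created further down this notebook.

```python
def close_connections(*connections):
    """Close every connection that is still open; return how many were closed."""
    closed_count = 0
    for conn in connections:
        # psycopg2 connections report closed == 0 while they are open
        if conn is not None and not conn.closed:
            conn.close()
            closed_count += 1
    return closed_count

# Hypothetical usage, once the three doctor connections exist:
# close_connections(gibson, paxton, tamblin)
```

Skipping connections that are already closed means the cell can be rerun safely if you are not sure what state things are in.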

At this point, you should be able to simply rerun the cells in the section **Setting up**, which will call the reset script. However, in a worst case scenario, you can consider resetting your server, as described in the software guide.

# Concurrent transactions


We'll create three connections to the database, to show how transactions work in a concurrent multiuser environment. As with notebook `12.1 Concurrent Transactions`, we will make the connections explicitly using the functions in `psycopg2`, so let's start by importing that:

In [None]:
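import psycopg2 as pg
import psycopg2.extensions as pge
# pandas is used below for pd.read_sql_query; importing it again is harmless
# if the setup notebook has already done so
import pandas as pd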

And as in notebook 12.1, we can define a short function, `transaction_status`, which returns a string describing the transaction status of the given connection:

In [None]:
def transaction_status(conn):
    '''
    Return a string showing the transaction status of the
    given connection, conn.
    '''
    transaction_status_dict={
        pg.extensions.TRANSACTION_STATUS_IDLE:"The session is idle and there is no current transaction.",
        pg.extensions.TRANSACTION_STATUS_ACTIVE:"A command is currently in progress.",
        pg.extensions.TRANSACTION_STATUS_INTRANS:"The session is idle in a valid transaction block.",
        pg.extensions.TRANSACTION_STATUS_INERROR:"The session is idle in a failed transaction block.",
        pg.extensions.TRANSACTION_STATUS_UNKNOWN:"Reported if the connection with the server is bad."
    }
    return transaction_status_dict[conn.get_transaction_status()]
    



Because it's very easy to get confused when dealing with several different users (i.e. connections), we'll give each concurrent connection a doctor's name and picture.

Meet the doctors:

### Dr Gibson ![Gibson](images/gibson.png) 

In [None]:
gibson = pg.connect(dbname=DB_USER,     # the name of the database
                    host='localhost',   # the host on which the database engine is running
                    user=DB_USER,       # id of the user who is logging in
                    password=DB_PWD,    # the user's password
                    port=5432,          # the port on which the database engine is listening
                    options="-c search_path=hospital")  # the schema to use


gibson.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

gibson.autocommit = False

###  Dr Paxton ![Paxton](images/paxton.png)

In [None]:
paxton = pg.connect(dbname=DB_USER,     # the name of the database
                    host='localhost',   # the host on which the database engine is running
                    user=DB_USER,       # id of the user who is logging in
                    password=DB_PWD,    # the user's password
                    port=5432,          # the port on which the database engine is listening
                    options="-c search_path=hospital")  # the schema to use


paxton.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

paxton.autocommit = False 

### Dr Tamblin ![Tamblin](images/tamblin.png) 

In [None]:
tamblin = pg.connect(dbname=DB_USER,     # the name of the database
                     host='localhost',   # the host on which the database engine is running
                     user=DB_USER,       # id of the user who is logging in
                     password=DB_PWD,    # the user's password
                     port=5432,          # the port on which the database engine is listening
                     options="-c search_path=hospital")  # the schema to use


tamblin.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

tamblin.autocommit = False

As in notebook 12.1, we have set the `autocommit` flag to `False`. This means that we always need to explicitly tell the transaction to complete with a `commit` statement. We also set the isolation level using `pge.ISOLATION_LEVEL_SERIALIZABLE`: this means that all three connections start with an isolation level of serializable, as described in the opening section of the notebook.

## Transaction anomaly: Non-repeatable read


A non-repeatable read occurs when a transaction reads some data, which is then altered by another transaction. If the first transaction re-reads the first record, it will see a different value. This situation might occur if the first transaction is compiling two reports from the same data.
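Before working with the real database, here is a toy model of the idea in plain Python. It is only a sketch: the classes `ToyDatabase`, `ReadCommittedTxn` and `SnapshotTxn` are our own inventions, the starting weight is made up, and a real DBMS is far more sophisticated. One reader looks at the latest committed value on every read, while the other takes a private copy of the data on its first read.

```python
class ToyDatabase:
    """A dictionary of committed values, standing in for a real table."""
    def __init__(self, rows):
        self.committed = dict(rows)


class ReadCommittedTxn:
    """Every read sees whatever is committed at the moment of the read."""
    def __init__(self, db):
        self.db = db

    def read(self, key):
        return self.db.committed[key]


class SnapshotTxn:
    """All reads come from a copy taken at the transaction's first read."""
    def __init__(self, db):
        self.db = db
        self.snapshot = None

    def read(self, key):
        if self.snapshot is None:
            self.snapshot = dict(self.db.committed)
        return self.snapshot[key]


db = ToyDatabase({"p001": 68.0})            # hypothetical starting weight
low, high = ReadCommittedTxn(db), SnapshotTxn(db)

first = (low.read("p001"), high.read("p001"))
db.committed["p001"] = 75.0                 # another transaction commits an update
second = (low.read("p001"), high.read("p001"))

print(first, second)                        # (68.0, 68.0) (75.0, 68.0)
```

The first reader's second read differs from its first, which is the non-repeatable read; the snapshot reader sees the same value both times.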

In the example that follows, you'll see how different transaction isolation levels can give different results.

![Gibson](images/gibson.png) ![Tamblin](images/tamblin.png) Gibson and Tamblin want to run some reports on the patients. They're both generating the reports at the same time. They both want to see the patient weights, but ordered in two ways: by patient's identifier and by the patient's name. To ensure their reports are consistent, both decide to put both queries in one transaction. 

To keep things a bit more readable, let's create two strings which hold the queries that the two doctors will execute. This also means we can be sure that they are using exactly the same code:

In [None]:
# SELECT data on all patients, ordered by patient identifier
select_all_patients_by_pid = '''
                                SELECT patient_id, patient_name, weight_kg 
                                FROM patient
                                ORDER BY patient_id;
                              '''

# SELECT data on all patients, ordered by name
select_all_patients_by_name = '''
                                SELECT patient_id, patient_name, weight_kg 
                                FROM patient
                                ORDER BY patient_name;
                              '''

![Paxton](images/paxton.png)
Unbeknown to them, Paxton is updating some data at the same time.

Here's how different transaction isolation levels can keep things consistent, or not.

We will set the isolation level for Gibson as *read committed*. Section 3.1 of Part 12 tells us that this isolation level will allow a non-repeatable read.

In [None]:
gibson.isolation_level = pge.ISOLATION_LEVEL_READ_COMMITTED

We will set the isolation level for Tamblin as *serializable*. This is the strictest isolation level, and should *not* permit non-repeatable reads.

In [None]:
tamblin.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

![Gibson](images/gibson.png) 

**Step 1**: Gibson starts her transaction and runs the first query. Remember, if we make a select query directly on a connection using `pd.read_sql_query()`, a transaction is begun by default but not automatically committed (we set `gibson.autocommit = False`), so we need to explicitly commit it to close the transaction.

Gibson puts the results of her query into the dataframe `gibson_df_1`:

In [None]:
gibson_df_1=pd.read_sql_query(select_all_patients_by_pid, gibson)

We can preview the first few rows of the table returned by the query, the results from which have been stored in a *pandas* dataframe:

In [None]:
gibson_df_1.head()

![Tamblin](images/tamblin.png) **Step 2**: Tamblin starts her transaction and runs the first query.

She puts the results of her first query into the dataframe `tamblin_df_1`:

In [None]:
tamblin_df_1=pd.read_sql_query( select_all_patients_by_pid, tamblin)

We can see (the first few rows of) the table returned by the query, which has been stored in the dataframe:

In [None]:
tamblin_df_1.head()

Note that Gibson's and Tamblin's connections are still part of ongoing transactions:

In [None]:
print(f"{'Gibson'} : {transaction_status(gibson)}")
print(f"{'Paxton'} : {transaction_status(paxton)}")
print(f"{'Tamblin'} : {transaction_status(tamblin)}")

![Paxton](images/paxton.png) **Step 3**: Now Paxton weighs the patient Thornton with id `p001` and updates the record to 75kg. 

In [None]:
with paxton.cursor() as paxton_cursor:
    paxton_cursor.execute("BEGIN;")
    
    paxton_cursor.execute('''
        UPDATE patient
        SET weight_kg = 75.0
            WHERE patient_id = 'p001';
        ''')

Paxton can see the changed weight in the database:

In [None]:
pd.read_sql_query('''
                SELECT patient_id, patient_name, weight_kg
                FROM patient
                WHERE patient_id='p001';
                ''', paxton)

**Step 4**: Paxton commits his work, completing his transaction.

In [None]:
paxton.commit()

![Gibson](images/gibson.png) **Step 5**: Gibson now runs the second query for her report, storing the result in the dataframe `gibson_df_2`.

In [None]:
gibson_df_2=pd.read_sql_query(select_all_patients_by_name, gibson)

We can see (the first few rows of) the table returned by the query, which has been stored in the dataframe:

In [None]:
gibson_df_2.head()

![Tamblin](images/tamblin.png) **Step 6**: Tamblin also runs the second query for her report, storing the result in the dataframe `tamblin_df_2`.

In [None]:
tamblin_df_2=pd.read_sql_query( select_all_patients_by_name , tamblin)

We can see (the first few rows of) the table returned by the query, which has been stored in the dataframe:

In [None]:
tamblin_df_2.head()

**Step 7**: Finally, both Gibson and Tamblin commit their (read-only) work, completing their transactions:

In [None]:
gibson.commit()
tamblin.commit()

We now have the results of four queries: one for each of Gibson and Tamblin taken before Paxton's update, and one for each of Gibson and Tamblin taken after Paxton's update. Do they contain the same data? As a check, we will look at the average weight of the patients in each of the four returned tables:

In [None]:
gibson_df_1['weight_kg'].mean()

In [None]:
tamblin_df_1['weight_kg'].mean()

In [None]:
gibson_df_2['weight_kg'].mean()

In [None]:
tamblin_df_2['weight_kg'].mean()

We can see that something has gone wrong here. Tamblin's tables both contain the same average weight for the patients, but Gibson's second table (constructed after Paxton's update) contains a different average weight.

This is what happened.

![Nonrepeatable read](images/12.2.1.png)

![Tamblin](images/tamblin.png) Because Tamblin was using a **high** transaction isolation level, she saw a consistent view of the database for both queries in the same transaction: all her reads are from a version of the database when her transaction started. She doesn't see Paxton's update, but her view of the database is consistent with her report completing before Paxton's update started.

We can look at the values for patient `p001` in each of the two dataframes:

In [None]:
tamblin_df_1[tamblin_df_1['patient_id']=='p001']

In [None]:
tamblin_df_2[tamblin_df_2['patient_id']=='p001']

For Tamblin with the `SERIALIZABLE` isolation level, the two dataframes contain the same value.

![Gibson](images/gibson.png) Because Gibson was using a **low** transaction isolation level, her view of the database changed _during_ her transaction in response to Paxton's update. Her reads reflect the state of the database at the time of the read, not the time of the transaction starting.

In [None]:
gibson_df_1[gibson_df_1['patient_id']=='p001']

In [None]:
gibson_df_2[gibson_df_2['patient_id']=='p001']

For Gibson with the `READ COMMITTED` isolation level, the two dataframes contain different values from before and after Paxton's update.
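When a table has many rows, it can help to list exactly which records changed between two reads. The helper below is a small sketch for that purpose; the name `changed_weights` and the example values are hypothetical. It takes two sequences of `(patient_id, weight)` pairs and reports the identifiers whose weight differs.

```python
def changed_weights(before, after):
    """Return {patient_id: (old_weight, new_weight)} for weights that differ."""
    before_by_id = dict(before)
    return {
        pid: (before_by_id[pid], weight)
        for pid, weight in after
        if pid in before_by_id and before_by_id[pid] != weight
    }


# Hypothetical values, for illustration only
before = [("p001", 68.0), ("p002", 81.5)]
after = [("p002", 81.5), ("p001", 75.0)]
print(changed_weights(before, after))       # {'p001': (68.0, 75.0)}
```

With the dataframes from this notebook, the pairs could be built with `zip(gibson_df_1['patient_id'], gibson_df_1['weight_kg'])`, and similarly for `gibson_df_2`. The row order does not matter, which is useful here because the two queries are sorted differently.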

### Activity 1

Before carrying out this activity, complete any ongoing transactions with `commit`:

In [None]:
paxton.commit()
tamblin.commit()
gibson.commit()

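and restore the original weight for patient `p001` with Paxton's connection. The value is taken from `gibson_df_1`, which was read before Paxton's update:

In [None]:
# Look up the weight recorded for p001 before Paxton's update
original_weight = float(
    gibson_df_1.loc[gibson_df_1['patient_id'] == 'p001', 'weight_kg'].iloc[0])

with paxton.cursor() as paxton_cursor:
    paxton_cursor.execute("BEGIN;")
    
    # Pass the value as a query parameter rather than writing it into the SQL
    paxton_cursor.execute('''
        UPDATE patient
        SET weight_kg = %s
            WHERE patient_id = 'p001';
        ''', (original_weight,))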

paxton.commit()

According to section 3.1 of Part 12, the `REPEATABLE READ` isolation level should prevent this non-repeatable read anomaly. Does it?

Repeat the steps above, but with `gibson` at the `REPEATABLE READ` isolation level.

In [None]:
gibson.isolation_level = pge.ISOLATION_LEVEL_REPEATABLE_READ
tamblin.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

As with the exercise above, you will need several cells to move each of the three doctors through their transactions.

In [None]:
# Write your code in this cell (adding more if necessary)

#### Our solution

To reveal our solution, click on the triangle symbol on the left-hand end of this cell.

Most of this task can be completed by just cutting and pasting the appropriate earlier cells in the notebook:

![Gibson](images/gibson.png) 

**Step 1**: Gibson starts her transaction and runs the first query.

In [None]:
gibson_df_1=pd.read_sql_query(select_all_patients_by_pid, gibson)

We can see (the first few rows of) the table returned by the query, which has been stored in the dataframe:

In [None]:
gibson_df_1.head()

![Tamblin](images/tamblin.png) **Step 2**: Tamblin starts her transaction and runs the first query.

In [None]:
tamblin_df_1=pd.read_sql_query(select_all_patients_by_pid, tamblin)

We can see (the first few rows of) the table returned by the query, which has been stored in the dataframe:

In [None]:
tamblin_df_1.head()

Note that Gibson's and Tamblin's connections are still part of ongoing transactions:

In [None]:
print('Gibson:',transaction_status(gibson))
print('Paxton:',transaction_status(paxton))
print('Tamblin:',transaction_status(tamblin))

![Paxton](images/paxton.png) **Step 3**: Paxton weighs the patient with id `p001` and updates the record to 75kg. 

In [None]:
with paxton.cursor() as paxton_cursor:
    paxton_cursor.execute("BEGIN;")
    
    paxton_cursor.execute('''
        UPDATE patient
        SET weight_kg = 75.0
            WHERE patient_id = 'p001';
        ''')

Paxton can see the changed weight in the database:

In [None]:
pd.read_sql_query('''
                SELECT patient_id, patient_name, weight_kg
                FROM patient
                WHERE patient_id='p001';
                ''', paxton)

**Step 4**: Paxton commits his work, completing his transaction.

In [None]:
paxton.commit()

![Gibson](images/gibson.png) **Step 5**: The second query for Gibson's report now runs.

In [None]:
gibson_df_2=pd.read_sql_query( select_all_patients_by_name, gibson)

We can see (the first few rows of) the table returned by the query, which has been stored in the dataframe:

In [None]:
gibson_df_2.head()

![Tamblin](images/tamblin.png) **Step 6**: The second query for Tamblin's report also runs:

In [None]:
tamblin_df_2=pd.read_sql_query(select_all_patients_by_name, tamblin)

We can see (the first few rows of) the table returned by the query, which has been stored in the dataframe:

In [None]:
tamblin_df_2.head()

**Step 7**: Finally, both Gibson and Tamblin commit their work, completing their transactions:

In [None]:
gibson.commit()
tamblin.commit()

We now have the results of four queries: one for each of Gibson and Tamblin taken before Paxton's update, and one for each of Gibson and Tamblin taken after Paxton's update. Do they contain the same data? As a check, we will look at the average weight of the patients in each of the four returned tables:

In [None]:
gibson_df_1['weight_kg'].mean()

In [None]:
tamblin_df_1['weight_kg'].mean()

In [None]:
gibson_df_2['weight_kg'].mean()

In [None]:
tamblin_df_2['weight_kg'].mean()

In this case, all the tables should contain the same values for the average weight.

Again, we can look at the values for patient `p001` in each of the two dataframes for Tamblin's queries:

In [None]:
tamblin_df_1[tamblin_df_1['patient_id']=='p001']

In [None]:
tamblin_df_2[tamblin_df_2['patient_id']=='p001']

For Tamblin with the `SERIALIZABLE` isolation level, the two dataframes contain the same value.

And similarly for Gibson's dataframes:

In [None]:
gibson_df_1[gibson_df_1['patient_id']=='p001']

In [None]:
gibson_df_2[gibson_df_2['patient_id']=='p001']

For Gibson, the `REPEATABLE READ` isolation level has prevented the non-repeatable read anomaly.

#### End of Activity 1

-------------------------------------------------------

## Transaction anomaly: Phantom read

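A *phantom read* occurs when a transaction runs the same query twice and the second result contains rows that were not in the first (or is missing rows that were). This happens because, in the meantime, another transaction has added, removed or changed rows so that they now match (or no longer match) the query's condition.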

This time, Drs Gibson and Paxton are running the "weights of my patients" report, again generating two tables in the report (one ordered by ID, one by name). In this case, the reports are only required to contain the data for those patients in the care of the doctor generating the report.

Unfortunately, during the compilation of the reports, Dr Tamblin reallocates two of Dr Rampton's patients. The patient with id `p068` is moved to Gibson's (`d06`) care, and the patient with id `p071` moves to Paxton's (`d07`).

To see the problems that might occur, we will reset Gibson to a `READ COMMITTED` isolation level while Paxton has a `SERIALIZABLE` isolation level.

In [None]:
gibson.isolation_level = pge.ISOLATION_LEVEL_READ_COMMITTED
paxton.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

![Gibson](images/gibson.png) **Step 1**: Gibson (`d06`) starts her transaction and runs the first query, collecting the weights of patients who are in her own care. As before, she stores the results of the transaction in the dataframe `gibson_df_1`.

In [None]:
gibson_df_1=pd.read_sql_query('''
                SELECT patient_id, patient_name, weight_kg
                FROM patient
                WHERE doctor_id = 'd06';
                ''', gibson)

gibson_df_1.head()

![Paxton](images/paxton.png) **Step 2**: Paxton (`d07`) also starts his transaction and runs his first query, storing the result in the dataframe `paxton_df_1`.

In [None]:
paxton_df_1=pd.read_sql_query('''
                SELECT patient_id, patient_name, weight_kg
                FROM patient
                WHERE doctor_id = 'd07';
                ''', paxton)
paxton_df_1.head()

![Tamblin](images/tamblin.png) **Step 3**: Tamblin reallocates patient `p068` to Gibson (`d06`), and patient `p071` to Paxton (`d07`).

In [None]:
with tamblin.cursor() as tamblin_cursor:
    
    tamblin_cursor.execute("BEGIN;")
    
    tamblin_cursor.execute('''
                    UPDATE patient 
                    SET doctor_id = 'd06' 
                        WHERE patient_id = 'p068';
                    ''')
    
    tamblin_cursor.execute('''
                    UPDATE patient 
                    SET doctor_id = 'd07' 
                        WHERE patient_id = 'p071';
                    ''')


**Step 4**: Tamblin commits the change, completing that transaction:

In [None]:
tamblin.commit()

Note that Gibson and Paxton's connections are still involved in the ongoing transactions:

In [None]:
print(f"{'Gibson'} : {transaction_status(gibson)}")
print(f"{'Paxton'} : {transaction_status(paxton)}")
print(f"{'Tamblin'} : {transaction_status(tamblin)}")

![Gibson](images/gibson.png) **Step 5**: Gibson now runs the second query for her report, storing the result in the dataframe `gibson_df_2`.

In [None]:
gibson_df_2=pd.read_sql_query('''
                SELECT patient_id, patient_name, weight_kg 
                FROM patient
                WHERE doctor_id = 'd06'
                ORDER BY patient_name;
                ''', gibson)

gibson_df_2.head()

![Paxton](images/paxton.png) **Step 6**: Paxton also runs the second query for his report, storing the result in `paxton_df_2`.

In [None]:
paxton_df_2=pd.read_sql_query('''
                SELECT patient_id, patient_name, weight_kg
                FROM patient
                WHERE doctor_id = 'd07'
                ORDER BY patient_name;
                ''', paxton)

paxton_df_2.head()

**Step 7**: Gibson and Paxton commit their (read-only) work, completing their transactions.

In [None]:
gibson.commit()
paxton.commit()

How did they do? What are the average weights of patients in the results of all four queries?

In [None]:
gibson_df_1['weight_kg'].mean()

In [None]:
paxton_df_1['weight_kg'].mean()

In [None]:
gibson_df_2['weight_kg'].mean()

In [None]:
paxton_df_2['weight_kg'].mean()

Again, things aren't right. In this case, Gibson, with the `READ COMMITTED` isolation level, has ended up with different values in the tables.

![Phantom reads](images/12.2.2.png)

(In the diagram, the numbers refer to the count of patients for each doctor.)

![Paxton](images/paxton.png) Because Paxton was using a **high** transaction isolation level, he saw a consistent view of the database for both queries in the same transaction: all his reads are from the state of the database when his transaction started. He doesn't see Tamblin's update, but his view of the database is consistent with his report completing before Tamblin's update started.

![Gibson](images/gibson.png) Because Gibson was using a **low** transaction isolation level, her view of the database changed _during_ her transaction in response to Tamblin's update. As in the previous example, her reads reflect the state of the database at the time of the read, not the time her transaction started.
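To pinpoint the phantom rows themselves rather than infer them from a changed average, we can compare the identifiers returned by the two queries. This is a small sketch; the function name `phantom_rows` and the identifiers in the example are hypothetical.

```python
def phantom_rows(first_ids, second_ids):
    """Return (appeared, disappeared) between two results of the same query."""
    first, second = set(first_ids), set(second_ids)
    return sorted(second - first), sorted(first - second)


# Hypothetical identifiers, for illustration only
appeared, disappeared = phantom_rows(["p010", "p011"],
                                     ["p010", "p011", "p068"])
print(appeared, disappeared)                # ['p068'] []
```

In this notebook, calling `phantom_rows(gibson_df_1['patient_id'], gibson_df_2['patient_id'])` should list any patient who appeared in Gibson's second table, while the same call on Paxton's two dataframes should return two empty lists.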

# Serialisation anomaly


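A serialisation anomaly occurs when a group of concurrent transactions all commit successfully, but the combined result could not have arisen from running those transactions one at a time, in any order. For example, it can occur when a transaction reads two or more records, but one at a time. In the time between reading the first and second record, some other transaction alters the first. The connection between the records reported by the first transaction was then never true in the database.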

### Cleanup previous updates

Before starting the next example, we shall reset the database, and create fresh connections:

In [None]:
paxton.close()
gibson.close()
tamblin.close()

In [None]:
%run reset_databases.ipynb

In [None]:
paxton = pg.connect(dbname=DB_USER,     # the name of the database
                    host='localhost',   # the host on which the database engine is running
                    user=DB_USER,       # id of the user who is logging in
                    password=DB_PWD,    # the user's password
                    port=5432,          # the port on which the database engine is listening
                    options="-c search_path=hospital")  # the schema to use

paxton.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

paxton.autocommit = False

In [None]:
gibson = pg.connect(dbname=DB_USER,     # the name of the database
                    host='localhost',   # the host on which the database engine is running
                    user=DB_USER,       # id of the user who is logging in
                    password=DB_PWD,    # the user's password
                    port=5432,          # the port on which the database engine is listening
                    options="-c search_path=hospital")  # the schema to use

gibson.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

gibson.autocommit = False

In [None]:
tamblin = pg.connect(dbname=DB_USER,     # the name of the database
                     host='localhost',   # the host on which the database engine is running
                     user=DB_USER,       # id of the user who is logging in
                     password=DB_PWD,    # the user's password
                     port=5432,          # the port on which the database engine is listening
                     options="-c search_path=hospital")  # the schema to use

tamblin.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

tamblin.autocommit = False

## The task


Not every doctor has patients. We can see which doctors do not have patients with a query from Gibson's connection:

In [None]:
pd.read_sql_query('''
    SELECT doctor.doctor_id, doctor_name, COUNT(patient_id) AS number_of_patients
    FROM doctor LEFT OUTER JOIN patient 
         ON doctor.doctor_id = patient.doctor_id
    GROUP BY doctor.doctor_id, doctor_name
    ''', gibson)

(By using a `LEFT OUTER JOIN` here, we include the count for the doctor `Tamblin`, even though that doctor has no associated patients.)

In [None]:
# Commit to complete gibson's transaction 
gibson.commit()

An edict comes from on high saying that every doctor must have at least one patient. Drs Gibson (`d06`) and Paxton (`d07`), feeling hard done by, each decide to donate one of their patients to Dr Tamblin (`d09`). 

However, it's a race to see who can allocate a patient first: if both Gibson _and_ Paxton give a patient to Tamblin, Tamblin will have two patients and she'll be upset with the vast increase in caseload. 

Therefore, each of Gibson and Paxton will follow the same logic:

1. begin transaction
2. if Tamblin has no patients:
   1. find my patient with the lowest `patient_id`
   2. allocate that patient to Tamblin
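The logic in the list above can be sketched as a single function. This is an illustration only: the name `donate_patient` is hypothetical, the function assumes a `psycopg2` connection with `autocommit` set to `False` (so the first `execute` begins the transaction), and it leaves the `commit` to the caller. The `%s` placeholders are `psycopg2`'s way of passing query parameters.

```python
def donate_patient(conn, my_doctor_id, target_doctor_id):
    """Give my lowest-numbered patient to the target doctor if they have none.

    Returns True if a patient was reallocated, False otherwise.
    The caller is responsible for committing the transaction.
    """
    with conn.cursor() as cur:
        # Check whether the target doctor already has patients
        cur.execute("SELECT COUNT(*) FROM patient WHERE doctor_id = %s;",
                    (target_doctor_id,))
        (count,) = cur.fetchone()
        if count > 0:
            return False

        # Move my patient with the lowest patient_id to the target doctor
        cur.execute('''
            UPDATE patient
            SET doctor_id = %s
            WHERE patient_id = (SELECT MIN(patient_id)
                                FROM patient
                                WHERE doctor_id = %s);
            ''', (target_doctor_id, my_doctor_id))
    return True

# Hypothetical usage:
# donate_patient(gibson, 'd06', 'd09'); gibson.commit()
```

In the rest of this notebook we run the steps by hand instead, so that the two doctors' transactions can be interleaved.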

Gibson and Paxton start at about the same time, so their transactions overlap: both count how many patients Tamblin has before either reallocates patients. Note that at no point are Gibson or Paxton modifying the same patient in the database, or even the same set of patients in the database. 

What will happen?

## Repeatable read isolation
We start by putting both Gibson and Paxton at the `REPEATABLE READ` isolation level which (according to the documentation) should allow serialisation anomalies. In this case the anomaly is a _write skew_.

In [None]:
gibson.isolation_level = pge.ISOLATION_LEVEL_REPEATABLE_READ
paxton.isolation_level = pge.ISOLATION_LEVEL_REPEATABLE_READ

**Step 1**: Gibson starts a transaction and sees that Tamblin has no patients.

In [None]:
gibson_count_df=pd.read_sql_query('''
                        SELECT doctor.doctor_id, doctor_name, 
                               COUNT(patient_id) AS number_of_patients
                        FROM doctor LEFT OUTER JOIN patient 
                             ON doctor.doctor_id = patient.doctor_id
                        GROUP BY doctor.doctor_id, doctor_name;
                        ''', gibson)
gibson_count_df

**Step 2**: Paxton also starts a transaction and sees that Tamblin has no patients.

In [None]:
paxton_count_df=pd.read_sql_query('''
                        SELECT doctor.doctor_id, doctor_name, 
                               COUNT(patient_id) AS number_of_patients
                        FROM doctor LEFT OUTER JOIN patient 
                             ON doctor.doctor_id = patient.doctor_id
                        GROUP BY doctor.doctor_id, doctor_name;
                        ''', paxton)
paxton_count_df

**Step 3**: Gibson moves a patient to Tamblin's care. (Gibson tries to move the patient with the identifier of `MIN(patient_id)`, that is, her patient whose `patient_id` comes first alphabetically.)

In [None]:
with gibson.cursor() as gibson_cursor:
        gibson_cursor.execute('''
                UPDATE patient 
                SET doctor_id = 'd09'
                WHERE patient_id = (SELECT MIN(patient_id)
                                    FROM patient
                                    WHERE doctor_id='d06');
                ''')

Within her transaction, Gibson can see that doctor `d09` (Tamblin) is now responsible for patient `p001`.

In [None]:
pd.read_sql_query('''
                        SELECT *
                        FROM patient
                        WHERE doctor_id='d09';
                        ''', gibson)

**Step 4**: Paxton also moves a patient to Tamblin's care.

In [None]:
with paxton.cursor() as paxton_cursor:
        paxton_cursor.execute('''
                UPDATE patient 
                SET doctor_id = 'd09'
                WHERE patient_id = (SELECT MIN(patient_id)
                                    FROM patient
                                    WHERE doctor_id='d07');
                ''')

Within his transaction, Paxton can see that doctor `d09` (Tamblin) is now responsible for patient `p007`.

In [None]:
pd.read_sql_query('''
                        SELECT *
                        FROM patient
                        WHERE doctor_id='d09';
                        ''', paxton)

**Step 5**: Both doctors commit their transactions.

In [None]:
gibson.commit()
paxton.commit()

Now Gibson looks to see which doctors have which patients.

In [None]:
pd.read_sql_query('''
        SELECT doctor.doctor_id, doctor_name,
            COUNT(patient_id) AS number_of_patients
        FROM doctor LEFT OUTER JOIN patient
            ON doctor.doctor_id = patient.doctor_id
        GROUP BY doctor.doctor_id, doctor_name;
                        ''', gibson)

Tamblin now has two patients. It seems that Gibson's and Paxton's transactions weren't isolated. This is what happened.

![Write skew](images/12.2.3.png)
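The same write skew can be reproduced in a toy model in plain Python. This is a simplified sketch, not how PostgreSQL is implemented: the class `SnapshotTxn`, the function `donate_if_none` and the patient allocations are all hypothetical. Each transaction reads from a private snapshot and applies its own writes at commit; because the two transactions write different rows, neither gets in the other's way.

```python
class SnapshotTxn:
    """Toy transaction: reads a private snapshot, buffers writes until commit."""
    def __init__(self, committed):
        self.committed = committed
        self.snapshot = dict(committed)     # view of the data at the start
        self.writes = {}

    def patients_of(self, doctor_id):
        view = {**self.snapshot, **self.writes}
        return sorted(p for p, d in view.items() if d == doctor_id)

    def reassign(self, patient_id, doctor_id):
        self.writes[patient_id] = doctor_id

    def commit(self):
        # The two transactions below write different rows, so nothing conflicts
        self.committed.update(self.writes)


def donate_if_none(txn, my_id, target_id):
    """Give away my lowest-numbered patient if the target doctor has none."""
    if not txn.patients_of(target_id):
        txn.reassign(txn.patients_of(my_id)[0], target_id)


# Hypothetical allocation of patients to doctors
committed = {"p001": "d06", "p002": "d06", "p007": "d07", "p008": "d07"}

gibson_txn = SnapshotTxn(committed)
paxton_txn = SnapshotTxn(committed)
donate_if_none(gibson_txn, "d06", "d09")    # sees Tamblin with no patients
donate_if_none(paxton_txn, "d07", "d09")    # also sees Tamblin with no patients
gibson_txn.commit()
paxton_txn.commit()

tamblin_patients = sorted(p for p, d in committed.items() if d == "d09")
print(tamblin_patients)                     # ['p001', 'p007']
```

Running the two calls one after the other on the committed data, in either order, would leave Tamblin with exactly one patient, so the result above matches no serial ordering.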

### Activity 2


Repeat the above steps but with both Gibson and Paxton at the "serializable" isolation level. Do things work out differently?

Before you start, run the "cleanup" cells below to reset the database.

In [None]:
paxton.close()
gibson.close()
tamblin.close()

In [None]:
%run reset_databases.ipynb

In [None]:
paxton = pg.connect(dbname=DB_USER,     # the name of the database
                    host='localhost',   # the host on which the database engine is running
                    user=DB_USER,       # id of the user who is logging in
                    password=DB_PWD,    # the user's password
                    port=5432,          # the port on which the database engine is listening
                    options="-c search_path=hospital")  # the schema to use

paxton.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

paxton.autocommit = False

In [None]:
gibson = pg.connect(dbname=DB_USER,     # the name of the database
                    host='localhost',   # the host on which the database engine is running
                    user=DB_USER,       # id of the user who is logging in
                    password=DB_PWD,    # the user's password
                    port=5432,          # the port on which the database engine is listening
                    options="-c search_path=hospital")  # the schema to use

gibson.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

gibson.autocommit = False

In [None]:
tamblin = pg.connect(dbname=DB_USER,     # the name of the database
                     host='localhost',   # the host on which the database engine is running
                     user=DB_USER,       # id of the user who is logging in
                     password=DB_PWD,    # the user's password
                     port=5432,          # the port on which the database engine is listening
                     options="-c search_path=hospital")  # the schema to use

tamblin.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

tamblin.autocommit = False

Again, you will need multiple cells to step through the two transactions at the same time. Note: **At this isolation level, you should expect at least one of the transactions to fail.**
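When a transaction fails in this way, the usual response in an application is to roll it back and try again. The sketch below shows one way to do that. The name `run_with_retry` is hypothetical, and the exception type is passed in as a parameter; with `psycopg2` (version 2.8 or later) we would expect to pass `psycopg2.errors.SerializationFailure`, but treat that pairing as an assumption to check in your own environment.

```python
def run_with_retry(conn, work, retry_on, attempts=3):
    """Run work(conn) and commit; roll back and retry if retry_on is raised."""
    for attempt in range(1, attempts + 1):
        try:
            result = work(conn)
            conn.commit()                   # the failure may also surface here
            return result
        except retry_on:
            conn.rollback()                 # end the failed transaction first
            if attempt == attempts:
                raise

# Hypothetical usage, where `work` is a function that runs the queries:
# run_with_retry(paxton, work, psycopg2.errors.SerializationFailure)
```

For this activity you do not need the helper: stepping through by hand lets you see exactly which statement raises the error.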

In [None]:
# Write your code in this cell (adding more if necessary)

#### Our solution

To reveal our solution, click on the triangle symbol on the left-hand end of this cell.

As before, the bulk of the activity can be addressed by cutting and pasting the cells from the original example:

In [None]:
gibson.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE
paxton.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

**Step 1**: Gibson starts a transaction and sees that Tamblin has no patients.

In [None]:
pd.read_sql_query('''
            SELECT doctor.doctor_id, doctor_name, 
                   COUNT(patient_id) AS number_of_patients
            FROM doctor LEFT OUTER JOIN patient 
                 ON doctor.doctor_id = patient.doctor_id
            GROUP BY doctor.doctor_id, doctor_name;
            ''', gibson)

**Step 2**: Paxton also starts a transaction and sees that Tamblin has no patients.

In [None]:
pd.read_sql_query('''
            SELECT doctor.doctor_id, doctor_name, 
                   COUNT(patient_id) AS number_of_patients
            FROM doctor LEFT OUTER JOIN patient 
                 ON doctor.doctor_id = patient.doctor_id
            GROUP BY doctor.doctor_id, doctor_name;
            ''', paxton)


**Step 3**: Gibson moves a patient to Tamblin's care. (Gibson moves the patient identified by `MIN(patient_id)`, that is, the patient whose `patient_id` comes first in alphabetical order.)
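
To see why `MIN` picks out the alphabetically first identifier, note that `patient_id` values are strings, so they are compared character by character. The following is a minimal Python sketch of the same idea. The identifiers are made up for illustration and are not read from the `patient` table, and the exact ordering used by PostgreSQL depends on the collation of the database.

```python
# Hypothetical patient identifiers; not taken from the hospital database.
patient_ids = ['p031', 'p007', 'p001', 'p010']

# min() on strings compares character by character, so the identifier
# that sorts first alphabetically is returned.
first_patient = min(patient_ids)
print(first_patient)
```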

In [None]:
with gibson.cursor() as gibson_cursor:
        gibson_cursor.execute('''
                UPDATE patient 
                SET doctor_id = 'd09'
                WHERE patient_id = (SELECT MIN(patient_id)
                                    FROM patient
                                    WHERE doctor_id='d06');
                ''')

Within her transaction, Gibson can see that doctor `d09` (Tamblin) is now responsible for patient `p001`.

In [None]:
pd.read_sql_query('''
                        SELECT *
                        FROM patient
                        WHERE doctor_id='d09';
                        ''', gibson)

**Step 4**: Paxton also moves a patient to Tamblin's care.

In [None]:
with paxton.cursor() as paxton_cursor:
        paxton_cursor.execute('''
                UPDATE patient 
                SET doctor_id = 'd09'
                WHERE patient_id = (SELECT MIN(patient_id)
                                    FROM patient
                                    WHERE doctor_id='d07');
                ''')

Within his transaction, Paxton can see that doctor `d09` (Tamblin) is now responsible for patient `p007`.

In [None]:
pd.read_sql_query('''
                        SELECT *
                        FROM patient
                        WHERE doctor_id='d09';
                        ''', paxton)

**Step 5**: Both doctors commit their transactions.

In [None]:
gibson.commit()

In [None]:
paxton.commit()

Paxton's transaction failed.

Paxton has to roll back.

In [None]:
paxton.rollback()
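
The failure and the rollback can also be handled together in code. The sketch below assumes that the exception raised by `psycopg2` carries the SQLSTATE code in its `pgcode` attribute, and that PostgreSQL reports a serialization failure with SQLSTATE `40001`. The helper `commit_or_rollback` is a hypothetical name introduced here for illustration; it is not part of `psycopg2` or of this notebook's setup.

```python
SERIALIZATION_FAILURE = '40001'  # SQLSTATE for serialization_failure


def commit_or_rollback(connection):
    """Try to commit; roll back and report if serialization failed.

    Returns True if the commit succeeded, and False if it was rejected
    with a serialization failure. Any other error is re-raised after
    the rollback.
    """
    try:
        connection.commit()
        return True
    except Exception as error:
        # Roll back first, so the connection is usable again.
        connection.rollback()
        # psycopg2 errors expose the SQLSTATE as `pgcode`; other
        # exceptions have no such attribute, hence the default of None.
        if getattr(error, 'pgcode', None) == SERIALIZATION_FAILURE:
            return False
        raise
```

Under these assumptions, calling `commit_or_rollback(paxton)` in step 5 would return `False` rather than raising an exception, leaving Paxton's connection ready for a new transaction.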

Gibson and Paxton now see that Tamblin has just one patient.

In [None]:
pd.read_sql_query('''
            SELECT doctor.doctor_id, doctor_name, 
                   COUNT(patient_id) AS number_of_patients
            FROM doctor LEFT OUTER JOIN patient 
                 ON doctor.doctor_id = patient.doctor_id
            GROUP BY doctor.doctor_id, doctor_name;
            ''', gibson)

In [None]:
pd.read_sql_query('''
            SELECT doctor.doctor_id, doctor_name, 
                   COUNT(patient_id) AS number_of_patients
            FROM doctor LEFT OUTER JOIN patient 
                 ON doctor.doctor_id = patient.doctor_id
            GROUP BY doctor.doctor_id, doctor_name;
            ''', paxton)

![Write skew prevented](images/12.2.4.png)

In this case, the DBMS detects that the two transactions overlap and that their interleaved effects could not have been produced by running one transaction after the other; that is, the schedule is not serialisable. The DBMS therefore prevents Paxton's transaction from committing. If Paxton now restarts his transaction, it will run entirely after Gibson's has completed, so the two transactions will truly be serialised.
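
Restarting a failed transaction is often automated with a retry loop. The following is a minimal sketch of that pattern, not a definitive implementation. The function `run_with_retry` and its parameters are hypothetical names introduced for illustration: `work` is assumed to be a function that runs the transaction's statements on the given connection, and `retryable_errors` is the exception class (or tuple of classes) that should trigger a retry.

```python
def run_with_retry(connection, work, retryable_errors, max_attempts=3):
    """Run work(connection) and commit, retrying on retryable errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = work(connection)
            connection.commit()
            return result
        except retryable_errors:
            # The failed transaction must be rolled back before a new
            # one can start on the same connection.
            connection.rollback()
            if attempt == max_attempts:
                # Give up and let the caller see the last error.
                raise
```

In this notebook, `work` could be a function that executes Paxton's `UPDATE` statement, and `retryable_errors` could be `pge.TransactionRollbackError`, the `psycopg2` exception class that covers serialization failures. On the second attempt, Paxton's transaction would start after Gibson's has committed, so it would see Gibson's change.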

#### End of Activity 2

## Cleanup: close connections
Close all the connections so that the next notebook can reset the database.

--------------------------------------------------------

In [None]:
paxton.close()
gibson.close()
tamblin.close()
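
If you have been experimenting, some of these connections may already be closed. As a sketch, assuming each connection object has the `closed` attribute that `psycopg2` connections provide (0 while the connection is open, non-zero otherwise), a hypothetical helper `close_all` could skip the ones that are already closed:

```python
def close_all(*connections):
    """Close every connection that is still open.

    Returns the number of connections that were actually closed.
    """
    closed_now = 0
    for connection in connections:
        # `closed` is 0 for an open connection, non-zero otherwise.
        if not connection.closed:
            connection.close()
            closed_now += 1
    return closed_now
```

With this helper, the cell above could be written as `close_all(paxton, gibson, tamblin)`.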

# Summary
In this notebook, you have seen how PostgreSQL supports transaction isolation, and how different isolation levels allow different anomalies.
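
The relationship between isolation levels and anomalies can be summarised as a small lookup table. The sketch below follows the behaviour described in the PostgreSQL documentation, where `READ UNCOMMITTED` behaves like `READ COMMITTED` and `REPEATABLE READ` does not allow phantom reads. The anomaly names follow that documentation (write skew, as seen in this notebook, is one kind of serialization anomaly), and the names `POSSIBLE_ANOMALIES` and `is_possible` are hypothetical, introduced only for illustration.

```python
# Anomalies that can occur at each isolation level in PostgreSQL.
POSSIBLE_ANOMALIES = {
    'READ UNCOMMITTED': {'nonrepeatable read', 'phantom read',
                         'serialization anomaly'},
    'READ COMMITTED': {'nonrepeatable read', 'phantom read',
                       'serialization anomaly'},
    'REPEATABLE READ': {'serialization anomaly'},
    'SERIALIZABLE': set(),
}


def is_possible(level, anomaly):
    """Return True if the anomaly can occur at the given level."""
    return anomaly in POSSIBLE_ANOMALIES[level.upper()]


print(is_possible('repeatable read', 'serialization anomaly'))
print(is_possible('serializable', 'serialization anomaly'))
```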