# Concurrent Transactions

In PostgreSQL (and many other RDBMSs), _every_ SQL statement is implicitly treated as a transaction if a *transaction block* is not explicitly defined. A group of statements is identified as a transaction block by wrapping them in `BEGIN TRANSACTION` (or `BEGIN`) at the start and `COMMIT` or `ROLLBACK` at the end of the block. This means that while you may write:

```SQL
    SELECT patient_id, patient_name, weight_kg 
    FROM patient 
    WHERE patient_id = 'p001';
```
the RDBMS will, in effect, treat it as:
```SQL
    BEGIN;

    SELECT patient_id, patient_name, weight_kg 
    FROM patient 
    WHERE patient_id = 'p001';

    COMMIT;
```
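In addition, the Python wrappers around PostgreSQL connections add another layer of abstraction by controlling when transactions start and stop. For example, `psycopg2` (introduced below) automatically opens a transaction when the first command is sent on an idle connection, unless that connection's `autocommit` setting is switched on.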

According to the [PostgreSQL documentation](https://www.postgresql.org/docs/9.1/transaction-iso.html):

> The SQL standard defines four levels of transaction isolation. The most strict is Serializable, which is defined by the standard in a paragraph which says that any concurrent execution of a set of Serializable transactions is guaranteed to produce the same effect as running them one at a time in some order.

We will use this level of transaction isolation throughout this notebook. (We will look at the other isolation levels in the next notebook, 12.2 Transaction anomalies.)

The notebooks in this Part focus on exploring how transactions work at a detailed level of operation. As such, we will take a more controlling approach to database interaction than in previous Parts, in which we relied on the `%%sql` `ipython-sql` block magic or the `pandas.read_sql_query()` function.

Instead, we'll be using the same `psycopg2` Python package that the magic and *pandas* commands are built upon, working with it at a more basic level. This gives us the control we need over when transactions actually start and stop, and so helps us understand how they really work.

### The `psycopg2` Python Package


The fundamental concept in `psycopg2` is the **connection**. Commands to the database flow along the connection, and the results come back along the connection.

Transactions in `psycopg2` happen at the connection level:

- when a transaction starts, it starts on a particular connection;
- once a connection has started a transaction, subsequent commands on that connection are part of the same transaction.

The connection is responsible for finishing the transaction with either a `COMMIT` or a `ROLLBACK`. A connection can have at most one transaction open at any time.
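Because commit and rollback belong to the connection, a common Python pattern is to wrap one unit of work in a context manager that commits when the block finishes normally and rolls back if it raises. The sketch below is a hypothetical helper, `transaction()`, not part of `psycopg2`. So that it runs without a PostgreSQL server, the usage example borrows Python's built-in `sqlite3` module as a stand-in DB-API connection; the table and values are illustrative.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Hypothetical helper: commit on success, roll back on any exception."""
    try:
        yield conn
    except BaseException:
        conn.rollback()   # undo everything done on this connection
        raise
    else:
        conn.commit()     # make the work visible to other connections


# Usage with an in-memory SQLite database standing in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, weight_kg REAL)")
conn.commit()

with transaction(conn):
    conn.execute("INSERT INTO patient VALUES ('p001', 71.6)")

try:
    with transaction(conn):
        conn.execute("UPDATE patient SET weight_kg = 200 WHERE patient_id = 'p001'")
        raise RuntimeError("mistake noticed")  # forces a rollback
except RuntimeError:
    pass

print(conn.execute("SELECT weight_kg FROM patient").fetchone()[0])  # 71.6
```

`psycopg2` connections can also be used directly in a `with` block with much the same commit-or-rollback effect; leaving such a block does not close the connection.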



To further complicate things, a connection can have several **cursors**. Typically, a cursor exists to keep track of the current position in a set of results, so that the database client can use the cursor to read result rows back at a pace that suits the client. But in `psycopg2`, the cursor is the object which sends commands to the database.

Cursors always exist in the context of a particular connection. In `psycopg2`, cursors can be created and destroyed. But the creation and destruction of a cursor has no effect on the status of any _transaction_: the status of a transaction is purely a property of the _connection_.

What this means is that, whereas a piece of SQL might look like this:

```SQL
    BEGIN;

    SELECT patient_id, patient_name, weight_kg 
    FROM patient
        WHERE patient_id = 'p001';

    UPDATE patient
    SET weight_kg = 75.0 
        WHERE patient_id = 'p001';

    COMMIT;
```

using the `psycopg2` package also means that we need to wrap the SQL code in Python so that it looks like this:

```python
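    with my_connection.cursor() as c:

        # Mirrors the SQL above. With autocommit off, psycopg2 has already
        # opened a transaction itself, so PostgreSQL just warns that a
        # transaction is already in progress.
        c.execute('BEGIN;')

        # Both statements run inside the transaction owned by my_connection
        c.execute('''
                  SELECT patient_id, patient_name, weight_kg 
                  FROM patient 
                  WHERE patient_id = 'p001';
                  ''')

        c.execute('''
                  UPDATE patient 
                  SET weight_kg = 75.0 
                      WHERE patient_id = 'p001';
                  ''')

        # The transaction is ended by the connection, not by the cursor
        my_connection.commit()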
```
This allows us to make it explicit which connection is making the update to the database.

(Remember that triple quotes, <code>'''</code>, are used for multi-line strings.)
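The rule that a transaction belongs to the connection, not to the cursor, can be checked with any DB-API 2.0 driver. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in so that it runs without a server. SQLite's concurrency control differs from PostgreSQL's, and it uses `?` placeholders where `psycopg2` uses `%s`, but the connection/cursor relationship shown here is the same. The table and values are illustrative.

```python
import sqlite3


def weight_of(cur, patient_id):
    """Read one patient's weight through the given cursor."""
    cur.execute("SELECT weight_kg FROM patient WHERE patient_id = ?", (patient_id,))
    return cur.fetchall()[0][0]   # fetch everything so the statement completes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, weight_kg REAL)")
conn.execute("INSERT INTO patient VALUES ('p001', 71.6)")
conn.commit()

# Cursor 1 makes an uncommitted change and is then closed.
writer = conn.cursor()
writer.execute("UPDATE patient SET weight_kg = 75.0 WHERE patient_id = 'p001'")
writer.close()
still_open = conn.in_transaction        # the transaction outlives the cursor

# Cursor 2, on the same connection, sees that uncommitted change.
reader = conn.cursor()
seen_inside = weight_of(reader, "p001")

# Rolling back the connection undoes work done through the closed cursor.
conn.rollback()
seen_after = weight_of(reader, "p001")

print(still_open, seen_inside, seen_after)  # True 75.0 71.6
```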

## When things go wrong

In order to carry out the notebooks for Part 12, it is important that you do not have any other notebooks running which might be making their own calls on the database. You may well still have some open database connections from other notebooks. Check the [Jupyter Running](/tree#running) tab to see if any week 11 or week 12 notebooks other than this one are running, and if so, shut them down.

At some point, when you're working through this notebook, you will probably end up in a real muddle. 

Don't despair.

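If you need to reset the database, first close all the existing connections, for example by running the cell in the **Cleanup: close connections** section at the end of this notebook.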
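When you are in a muddle, some of the connection variables may not exist yet, or may already be closed. The following is a hypothetical, defensive sketch (not part of the notebook's own code) that closes whichever of the named connections exist and are still open. It relies only on the `close()` method and on `psycopg2` connections having a `closed` attribute that is `0` while the connection is open and non-zero once it is closed.

```python
def close_named_connections(namespace, names=("gibson", "paxton")):
    """Close any of the named connections that exist and are still open.

    `namespace` is a dict of variables, such as the notebook's globals().
    Returns the names that were actually closed.
    """
    closed = []
    for name in names:
        conn = namespace.get(name)
        if conn is None or getattr(conn, "closed", 0):
            continue              # never created, or already closed
        conn.close()              # uncommitted work on it is discarded
        closed.append(name)
    return closed


# In the notebook you might call it as:
# close_named_connections(globals())
```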

At this point, you should be able to simply rerun the cells in the section **Setting up**, which will call the reset script. However, in a worst case scenario, you can consider resetting your server, as described in the software guide.

## Setting up

The next group of cells sets up your database connection and resets the database to a clean state. Check notebook *08.1 Data Definition Language in SQL* if you are unsure what the next cells do.

As usual, you will need to set the values of the variables `DB_USER` and `DB_PWD`, depending on which environment you are using.

In [None]:
# Set up the PostgreSQL environment

%run sql_init.ipynb
print("Connecting with connection string : {}".format(DB_CONNECTION))
%sql $DB_CONNECTION

In [None]:
%run reset_databases.ipynb

## Connecting to the database: multiple connections

First, we need to import the `psycopg2` library so that we can define connections to the database:

In [None]:
import psycopg2 as pg
import psycopg2.extensions as pge
import pandas as pd   # used below to display query results

In this notebook, we will go back to using the hospital database that we looked at in Parts 8 to 11, and we will reuse the `hospital` schema that we used in Part 9. We can specify which schema to use with the option `options="-c search_path=hospital"` in the `psycopg2.connect` function calls.

In Parts 8 to 11, we created a single connection to the database, which we used to set up the SQL magic. In Part 12, we need multiple concurrent connections to the database, to show how transactions work in a concurrent multi-user environment. Because it's very easy to get confused when dealing with several different "users" (i.e. connections), we'll give the concurrent connections doctors' names and pictures.

### ![Gibson](images/gibson.png) Dr Gibson

In [None]:
gibson = pg.connect(dbname=DB_USER,     # the name of the database
                    host='localhost',   # the host on which the database engine is running
                    user=DB_USER,       # id of the user who is logging in
                    password=DB_PWD,    # the user's password
                    port=5432,          # the port on which the database engine is listening
                    options="-c search_path=hospital")  # the schema to use

gibson.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

gibson.autocommit = False

### ![Paxton](images/paxton.png) Dr Paxton

In [None]:
paxton = pg.connect(dbname=DB_USER,     # the name of the database
                    host='localhost',   # the host on which the database engine is running
                    user=DB_USER,       # id of the user who is logging in
                    password=DB_PWD,    # the user's password
                    port=5432,          # the port on which the database engine is listening
                    options="-c search_path=hospital")  # the schema to use


paxton.isolation_level = pge.ISOLATION_LEVEL_SERIALIZABLE

paxton.autocommit = False

Note that in both these cases, we have set the `autocommit` flag to `False`. This means that every transaction stays open until we explicitly end it by calling the connection's `commit()` (or `rollback()`) method.

Also, we set the isolation level using `pge.ISOLATION_LEVEL_SERIALIZABLE`: this means that both connections have an isolation level of serializable, as described in the opening section of the notebook.
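Since the `gibson` and `paxton` cells differ only in the variable name, the same set-up could be written once as a function. This is a hypothetical sketch rather than the notebook's code: the driver's `connect` function and isolation-level constant are passed in as arguments (in this notebook they would be `pg.connect` and `pge.ISOLATION_LEVEL_SERIALIZABLE`), which also lets the function be tried out with a stand-in.

```python
def make_doctor_connection(connect, db_user, db_pwd, isolation_level,
                           schema="hospital", host="localhost", port=5432):
    """Hypothetical helper: open and configure one 'doctor' connection.

    connect: the driver's connect function, e.g. pg.connect
    isolation_level: e.g. pge.ISOLATION_LEVEL_SERIALIZABLE
    """
    conn = connect(dbname=db_user,        # the name of the database
                   host=host,             # where the database engine runs
                   user=db_user,          # id of the user logging in
                   password=db_pwd,       # the user's password
                   port=port,             # the port the engine listens on
                   options=f"-c search_path={schema}")  # the schema to use
    conn.isolation_level = isolation_level
    conn.autocommit = False   # transactions end only on commit()/rollback()
    return conn


# In this notebook the two connections could then be created as:
# gibson = make_doctor_connection(pg.connect, DB_USER, DB_PWD,
#                                 pge.ISOLATION_LEVEL_SERIALIZABLE)
# paxton = make_doctor_connection(pg.connect, DB_USER, DB_PWD,
#                                 pge.ISOLATION_LEVEL_SERIALIZABLE)
```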

#### `transaction_status` helper function

Throughout this notebook, it will be useful to see the transaction status of a particular connection.

The connection objects have a method defined on them, `get_transaction_status()`, which returns an integer reflecting the transaction status of the current connection. So far, we have created connections called `gibson` and `paxton`, but we have not started any transactions using them. 

We can call the `get_transaction_status()` method on `gibson`:

In [None]:
gibson.get_transaction_status()

The value `0` in this case represents the connection `gibson` being in an idle state. We can see that the `paxton` connection is also in an idle state in the same way:

In [None]:
paxton.get_transaction_status()

The meaning of each of the return values for `.get_transaction_status()` can be found in [the relevant section of the psycopg2 documentation](http://initd.org/psycopg/docs/extensions.html#transaction-status-constants), and we have also given them here:

| value | meaning |
|-------|---------|
|psycopg2.extensions.TRANSACTION_STATUS_IDLE|The session is idle and there is no current transaction.|
|psycopg2.extensions.TRANSACTION_STATUS_ACTIVE|A command is currently in progress.|
|psycopg2.extensions.TRANSACTION_STATUS_INTRANS|The session is idle in a valid transaction block.|
|psycopg2.extensions.TRANSACTION_STATUS_INERROR|The session is idle in a failed transaction block.|
|psycopg2.extensions.TRANSACTION_STATUS_UNKNOWN|Reported if the connection with the server is bad.|

(Each of the expressions `psycopg2.extensions.TRANSACTION_STATUS_XXX` maps onto one of the integers returned by the `.get_transaction_status()` method).

To make it easier to check on the transaction status of a connection, we can define a short function, `transaction_status`, which returns the string describing the transaction status of the given connection:

In [None]:
def transaction_status(conn):
    '''
    Return a string showing the transaction status of the
    given connection, conn.
    '''
    transaction_status_dict={
        pg.extensions.TRANSACTION_STATUS_IDLE:"The session is idle and there is no current transaction.",
        pg.extensions.TRANSACTION_STATUS_ACTIVE:"A command is currently in progress.",
        pg.extensions.TRANSACTION_STATUS_INTRANS:"The session is idle in a valid transaction block.",
        pg.extensions.TRANSACTION_STATUS_INERROR:"The session is idle in a failed transaction block.",
        pg.extensions.TRANSACTION_STATUS_UNKNOWN:"Reported if the connection with the server is bad."
    }
    return transaction_status_dict[conn.get_transaction_status()]
    

We will keep calling the `transaction_status` function throughout this notebook, to make it explicit what's going on with the transaction states.

For example, we can see the transaction states of the connections `gibson` and `paxton`:

In [None]:
print(f"Gibson : {transaction_status(gibson)}")
print(f"Paxton : {transaction_status(paxton)}")
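When several connections are involved, it can be handy to report them all at once. The sketch below is a hypothetical variant of the `transaction_status` function above. It hard-codes the integer codes 0 to 4, which are the values psycopg2 assigns to the `TRANSACTION_STATUS_*` constants in the table above, and it returns a fallback message instead of raising `KeyError` for an unexpected code.

```python
# Integer codes used by psycopg2's TRANSACTION_STATUS_* constants.
STATUS_TEXT = {
    0: "The session is idle and there is no current transaction.",  # IDLE
    1: "A command is currently in progress.",                        # ACTIVE
    2: "The session is idle in a valid transaction block.",          # INTRANS
    3: "The session is idle in a failed transaction block.",         # INERROR
    4: "Reported if the connection with the server is bad.",         # UNKNOWN
}


def status_report(**connections):
    """Hypothetical helper: one 'Name : status' line per connection."""
    lines = []
    for name, conn in connections.items():
        code = conn.get_transaction_status()
        text = STATUS_TEXT.get(code, f"Unrecognised status code {code}.")
        lines.append(f"{name.capitalize()} : {text}")
    return "\n".join(lines)


# In the notebook: print(status_report(gibson=gibson, paxton=paxton))
```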

## Transaction processing: Isolated updates

In this example, you'll see how the effects of an update transaction are isolated from other users while that transaction is still uncommitted.

Imagine that Dr Gibson and Dr Paxton are both looking at the weight of the patient with identifier `p001`. The following steps will take place:

1. Gibson looks at `p001`'s weight. 

2. Paxton updates `p001`'s weight, but does not commit the change.

3. Before Paxton commits the change, Gibson looks at `p001`'s weight again. If Paxton's transaction is isolated, Gibson shouldn't see the change-in-progress.

4. Paxton commits the change.

5. Gibson looks at `p001`'s weight again. Now that the change has been committed, Gibson should now see the updated value.

![Isolated transactions](images/12.1.1.png)
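The expected behaviour can be made concrete with a deliberately simplified, in-memory toy model. This is not how PostgreSQL is implemented (PostgreSQL keeps multiple row versions internally); it captures only the two rules these steps rely on: a transaction sees its own uncommitted writes, and it reads everything else from a snapshot taken at its first statement. It ignores conflicting writes between sessions entirely, and all class, method and variable names are hypothetical.

```python
class ToyDatabase:
    """Holds the committed state that every session can eventually see."""

    def __init__(self, rows):
        self.committed = dict(rows)


class ToySession:
    """One connection, with at most one transaction open at a time."""

    def __init__(self, db):
        self.db = db
        self.snapshot = None   # None means no transaction is open
        self.writes = {}       # this transaction's uncommitted changes

    def _begin_if_needed(self):
        # As with psycopg2 and autocommit off, the first statement opens a
        # transaction; the toy takes its snapshot at that moment.
        if self.snapshot is None:
            self.snapshot = dict(self.db.committed)
            self.writes = {}

    def read(self, key):
        self._begin_if_needed()
        if key in self.writes:          # our own uncommitted change
            return self.writes[key]
        return self.snapshot[key]       # otherwise, the snapshot

    def write(self, key, value):
        self._begin_if_needed()
        self.writes[key] = value        # invisible to other sessions

    def commit(self):
        self.db.committed.update(self.writes)   # publish our changes
        self.snapshot, self.writes = None, {}

    def rollback(self):
        self.snapshot, self.writes = None, {}   # discard our changes


# Replaying steps 1 to 5 with the values used in this notebook.
toy_db = ToyDatabase({"p001": 71.6})
toy_gibson, toy_paxton = ToySession(toy_db), ToySession(toy_db)

step1 = toy_gibson.read("p001")             # 71.6
toy_gibson.commit()
toy_paxton.write("p001", 75.0)              # step 2: not yet committed
paxton_sees = toy_paxton.read("p001")       # 75.0: Paxton sees his own change
step3 = toy_gibson.read("p001")             # 71.6: opens a new snapshot for Gibson
toy_paxton.commit()                         # step 4
step5_same_txn = toy_gibson.read("p001")    # 71.6: same transaction, same snapshot
toy_gibson.commit()
step5_new_txn = toy_gibson.read("p001")     # 75.0: new transaction, new snapshot
```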

We will now work through these steps, making the calls from Gibson's and Paxton's connections as appropriate.

![Gibson](images/gibson.png) **Step 1:** Gibson looks at `p001`'s weight:

In [None]:
with gibson.cursor() as gibson_cursor:

    gibson_cursor.execute('BEGIN;')

    df = pd.read_sql_query('''
        SELECT patient_id, patient_name, weight_kg
        FROM patient
        WHERE patient_id = 'p001';
        ''', gibson)

df

Gibson sees the weight as 71.6kg.

Now, let's check Gibson's transaction status:

In [None]:
transaction_status(gibson)

This shows that Gibson has now entered a transaction block. Note that even just querying the database opens a transaction: a transaction does not have to alter the contents of a database to be considered a transaction. To close the transaction, we need to call `commit()` (even though Gibson made no changes to be committed):

In [None]:
gibson.commit()

And we can see that Gibson is no longer in a transaction:


In [None]:
transaction_status(gibson)
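Because even a plain `SELECT` leaves a `psycopg2` connection (with `autocommit` off) inside a transaction, code that only wants to look at the data usually ends the transaction straight after reading. Below is a minimal sketch of that pattern as a hypothetical helper, `read_and_release()`. It uses only standard DB-API calls (`cursor()`, `execute()`, `fetchall()`, `commit()`, `rollback()`), and the usage example again borrows `sqlite3` as a stand-in database.

```python
import sqlite3


def read_and_release(conn, sql, params=None):
    """Run a read-only query, then end the transaction it opened.

    Returns all result rows. On error the transaction is rolled back so the
    connection is not left in a failed transaction block.
    """
    cur = conn.cursor()
    try:
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        rows = cur.fetchall()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
    conn.commit()   # back to idle: the next query starts a fresh transaction
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, weight_kg REAL)")
conn.execute("INSERT INTO patient VALUES ('p001', 71.6)")
conn.commit()

rows = read_and_release(conn, "SELECT patient_id, weight_kg FROM patient")
print(rows)                  # [('p001', 71.6)]
print(conn.in_transaction)   # False
```

With `psycopg2`, a parameterised call would use `%s` placeholders, for example `read_and_release(gibson, "SELECT weight_kg FROM patient WHERE patient_id = %s", ("p001",))`.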


![Paxton](images/paxton.png) **Step 2:** Paxton updates `p001`'s weight, but does not commit the change.


Paxton weighs `p001` and finds that the record needs to be updated so that the patient's weight is 75kg. Paxton starts a new transaction with `BEGIN` and makes the change:

In [None]:
with paxton.cursor() as paxton_cursor:
    paxton_cursor.execute("BEGIN;")
    
    paxton_cursor.execute('''
        UPDATE patient 
        SET weight_kg = 75.0 
            WHERE patient_id = 'p001';
        ''')

Note that this `UPDATE` transaction has not yet been committed.

We can now look at the transaction status of the connection `paxton`:

In [None]:
transaction_status(paxton)

The message shows that although the session is idle (the `UPDATE` statement has finished), the transaction is still open.

Now, while within the transaction, Paxton can see the changed weight in the database. Let's see what value the `paxton` connection currently thinks is in the `weight_kg` column for the patient with id `p001`:

In [None]:
pd.read_sql_query('''
        SELECT patient_id, patient_name, weight_kg
        FROM patient
        WHERE patient_id = 'p001';
        ''', paxton)


`paxton` sees the value as 75kg.

What about the `gibson` connection?

![Gibson](images/gibson.png) **Step 3:** Gibson takes another look at `p001`'s weight in the database:

In [None]:
pd.read_sql_query('''
        SELECT patient_id, patient_name, weight_kg
        FROM patient
        WHERE patient_id = 'p001';
        ''', gibson)


So Gibson still sees the original 71.6kg. This is what we would hope, as Paxton's transaction (including the update) has not yet been committed.

This is a crucially important aspect of transaction processing! Gibson and Paxton received *different results* for the same query because of the transaction status of their different connections.

We can see that the connections `paxton` and `gibson` are still in ongoing transactions:

In [None]:
print(f"Gibson : {transaction_status(gibson)}")
print(f"Paxton : {transaction_status(paxton)}")

This is what we expect, as Paxton's transaction (including the update) hasn't completed yet, and Gibson opened a new transaction when she queried the database.

![Paxton](images/paxton.png) **Step 4:** Paxton now commits the change to `p001`'s record.

In [None]:
paxton.commit()

In [None]:
transaction_status(paxton)

Paxton's transaction has now ended, and his update is committed.

![Gibson](images/gibson.png) Now that Paxton's transaction has been committed, what does Gibson see? Remember that, having made a `SELECT` query, the connection `gibson` is now involved in a transaction:

In [None]:
transaction_status(gibson)

So if a query is made with the connection `gibson`, then `paxton`'s update is not yet visible:

In [None]:
pd.read_sql_query('''
        SELECT patient_id, patient_name, weight_kg
        FROM patient
        WHERE patient_id = 'p001';
        ''', gibson)

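Gibson should still see `p001`'s weight as 71.6kg. At the serializable isolation level, a transaction reads from a snapshot of the database taken at its first statement, so changes that other transactions commit after that point stay invisible until Gibson starts a new transaction.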

However, if we now complete `gibson`'s transaction with `commit`:

In [None]:
gibson.commit()

then we see that `gibson` is no longer in a transaction:

In [None]:
transaction_status(gibson)

If a query is made using the connection `gibson`, then the update should now be visible:

In [None]:
pd.read_sql_query('''
        SELECT patient_id, patient_name, weight_kg
        FROM patient
        WHERE patient_id = 'p001';
        ''', gibson)

To tidy up, we need to commit `gibson`'s transaction (which makes no changes to the database, as it only made queries):

In [None]:
gibson.commit()

and at this point, we can see that there are no open transactions:

In [None]:
print(f"Gibson : {transaction_status(gibson)}")
print(f"Paxton : {transaction_status(paxton)}")

### Activity 1


Carry out the above sequence of actions again, but this time have Paxton set `p001`'s weight to 200kg, and then, having realised that this was a mistake, roll back the transaction. What weight do doctors Gibson and Paxton see before, during, and after this transaction?

This is the summary of what should happen:

![Rolled back transaction](images/12.1.2.png)

As a series of steps:

1. Gibson looks at `p001`'s weight. 

2. Paxton updates `p001`'s weight to 200kg, but does not commit the change.

3. Before Paxton commits the change, Gibson looks at `p001`'s weight again. If Paxton's transaction is isolated, Gibson shouldn't see the change-in-progress.

4. Paxton **rolls back** the change.

5. Gibson looks at `p001`'s weight again.

What do Gibson and Paxton see at this point?

To carry out this activity, you should call `paxton.rollback()` rather than `paxton.commit()` to see the effect of rolling back the transaction.

In your solution, you will need several cells as you step through the stages of Paxton's update, while checking what Gibson can see at each stage. Remember to use `transaction_status()` to see what state the transactions are in.

In [None]:
# Write your code in this cell (adding more if necessary)

#### Our solution

To reveal our solution, click on the triangle symbol on the left-hand end of this cell.

![Gibson](images/gibson.png) **Step 1:** Gibson looks at `p001`'s weight:

In [None]:
with gibson.cursor() as gibson_cursor:

    gibson_cursor.execute('BEGIN;')
    
    df = pd.read_sql_query('''
        SELECT patient_id, patient_name, weight_kg
        FROM patient
        WHERE patient_id = 'p001';
        ''', gibson)

    gibson.commit()

df

Gibson should see the weight as 75.0kg.

We have closed Gibson's transaction with `commit`, so let's check Gibson's transaction status:

In [None]:
transaction_status(gibson)

This should show that Gibson is not in a transaction block.


![Paxton](images/paxton.png) **Step 2:** Paxton updates `p001`'s weight in the database to 200kg. Paxton can see the changed weight in the database. Note that this update transaction isn't yet committed. 

In [None]:
with paxton.cursor() as paxton_cursor:
    paxton_cursor.execute("BEGIN;")
    
    paxton_cursor.execute('''
        UPDATE patient 
        SET weight_kg = 200
            WHERE patient_id = 'p001';
        ''')

We can now look at the transaction status of the connection `paxton`:

In [None]:
transaction_status(paxton)

The message shows that although the session is idle (the `UPDATE` statement has finished), the transaction is still open.

We can see what value the `paxton` connection currently thinks is in the `weight_kg` column for the patient with id `p001`:

In [None]:
pd.read_sql_query('''
        SELECT patient_id, patient_name, weight_kg
        FROM patient
        WHERE patient_id = 'p001';
        ''', paxton)


Paxton sees the value as 200kg.

Now, what about Gibson?

![Gibson](images/gibson.png) **Step 3:** Gibson takes another look at `p001`'s weight in the database:

In [None]:
pd.read_sql_query('''
        SELECT patient_id, patient_name, weight_kg
        FROM patient
        WHERE patient_id = 'p001';
        ''', gibson)


Gibson still sees the original 75kg, as we would hope: Paxton's transaction (including the update) has not yet been committed.

In [None]:
print(f"Gibson : {transaction_status(gibson)}")
print(f"Paxton : {transaction_status(paxton)}")

Paxton is still inside his transaction, and Gibson's query has opened a new transaction for her.

![Paxton](images/paxton.png) **Step 4:** Paxton now realises that the update to 200kg was incorrect. Therefore, rather than calling `commit()` to commit the change, he calls `rollback()` to return the database to the state it was in before the transaction started:

In [None]:
paxton.rollback()

Having rolled back the transaction, we should now find that Paxton is no longer in a transaction:

In [None]:
transaction_status(paxton)

Paxton is no longer in a transaction, and his update has been discarded.

![Gibson](images/gibson.png) Now that Paxton's transaction has been rolled back, what does Gibson see? Remember that, having made a `SELECT` query, the connection `gibson` is now involved in a transaction:

In [None]:
transaction_status(gibson)

If we now complete `gibson`'s transaction with `commit`:

In [None]:
gibson.commit()

Then we see that `gibson` is no longer in a transaction:

In [None]:
transaction_status(gibson)

and if a query is made using the connection `gibson`, the rolled-back update should not be visible, so Gibson should still see 75kg:

In [None]:
pd.read_sql_query('''
        SELECT patient_id, patient_name, weight_kg
        FROM patient
        WHERE patient_id = 'p001';
        ''', gibson)

To tidy up, we will commit Gibson's (read-only) transaction:

In [None]:
gibson.commit()

and at this point, we can see that there are no open transactions:

In [None]:
print(f"Gibson : {transaction_status(gibson)}")
print(f"Paxton : {transaction_status(paxton)}")

So in this case, the database handled the update so that the value of 200kg was never visible to Gibson.

#### End of Activity 1

--------------------------------------------------------

## Transaction processing: Preventing lost updates

Suppose that a message has gone around the hospital that the scales were all off, reading 10kg too high. That means `p001`'s weight should have been recorded as 65kg, not 75kg. 

Both Paxton and Gibson notice that `p001`'s weight needs to be adjusted, so they both decide to be good citizens and make the change.

However, due to a miscommunication, Paxton misunderstands the problem, and thinks that he needs to adjust `p001`'s weight to be 10kg *more* than what he previously recorded.

In both of these cases, because the weight is being modified, the read of the current weight and the writing of the modified weight will be done in the same transaction.



The following steps take place:

1. Gibson looks at `p001`'s weight. 

2. Paxton looks at `p001`'s weight. 

3. Gibson updates `p001`'s weight to 65kg, and commits the change.

4. Paxton updates `p001`'s weight to 85kg.

What happens when Paxton attempts to update the database in conflict with Gibson's committed change?

Here is an illustration of the process:

![Preventing lost updates](images/12.1.3.png)

Note that in this case, we _want_ Paxton's update to fail. We don't want Gibson's update to be overwritten and lost before Paxton has a chance to see the new weight recorded in the database.
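A toy, in-memory model can make this rule concrete. This is a deliberately simplified sketch, not PostgreSQL's implementation: each row carries a version number that is increased on every committed change, and an update is refused if the row's committed version is newer than the version in the updating transaction's snapshot (a "first updater wins" rule). Unlike PostgreSQL, it does not make a second updater wait for a first one that has not yet committed. All class, method and variable names are hypothetical.

```python
class SerializationFailure(Exception):
    """Toy stand-in for PostgreSQL's serialization failure."""


class ToyDatabase:
    def __init__(self, rows):
        # key -> (committed value, version number of that value)
        self.committed = {key: (value, 0) for key, value in rows.items()}


class ToySession:
    """One connection with at most one open transaction."""

    def __init__(self, db):
        self.db = db
        self.snapshot = None   # None: no transaction open
        self.writes = {}
        self.failed = False

    def _begin_if_needed(self):
        if self.failed:
            raise SerializationFailure("current transaction is aborted")
        if self.snapshot is None:        # first statement opens a transaction
            self.snapshot = dict(self.db.committed)
            self.writes = {}

    def read(self, key):
        self._begin_if_needed()
        if key in self.writes:
            return self.writes[key]
        return self.snapshot[key][0]

    def write(self, key, value):
        self._begin_if_needed()
        # Refuse to update a row that another session changed and committed
        # after our snapshot was taken: our value was computed from stale
        # data and would silently overwrite their change.
        if self.db.committed[key][1] != self.snapshot[key][1]:
            self.failed = True
            raise SerializationFailure(
                "could not serialize access due to concurrent update")
        self.writes[key] = value

    def commit(self):
        if not self.failed:              # toy choice: a failed transaction's
            for key, value in self.writes.items():   # writes are discarded
                _, version = self.db.committed[key]
                self.db.committed[key] = (value, version + 1)
        self.snapshot, self.writes, self.failed = None, {}, False

    def rollback(self):
        self.snapshot, self.writes, self.failed = None, {}, False


toy_db = ToyDatabase({"p001": 75.0})
toy_gibson, toy_paxton = ToySession(toy_db), ToySession(toy_db)

g = toy_gibson.read("p001")          # step 1: Gibson reads 75.0
p = toy_paxton.read("p001")          # step 2: Paxton reads 75.0
toy_gibson.write("p001", g - 10)     # step 3: Gibson corrects to 65.0 ...
toy_gibson.commit()                  # ... and commits

try:
    toy_paxton.write("p001", p + 10)     # step 4: based on the stale 75.0
    lost_update_prevented = False
except SerializationFailure as exc:
    lost_update_prevented = True
    print(exc)   # could not serialize access due to concurrent update

toy_paxton.rollback()                # end the failed transaction
print(toy_paxton.read("p001"))       # 65.0, from a fresh snapshot
```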

![Gibson](images/gibson.png) **Step 1:** Gibson starts a transaction and looks at `p001`'s weight.

In [None]:
pd.read_sql_query('''
        SELECT patient_id, patient_name, weight_kg
        FROM patient
        WHERE patient_id = 'p001';
        ''', gibson)

Gibson has now entered a transaction:


In [None]:
transaction_status(gibson)

![Paxton](images/paxton.png) **Step 2:** Paxton also starts a transaction and looks at `p001`'s weight, while Gibson's transaction is still open.

In [None]:
pd.read_sql_query('''
        SELECT patient_id, patient_name, weight_kg
        FROM patient
        WHERE patient_id = 'p001';
        ''', paxton)

Now, both doctors are in the middle of transactions.

In [None]:
print(f"Gibson : {transaction_status(gibson)}")
print(f"Paxton : {transaction_status(paxton)}")

![Gibson](images/gibson.png) **Step 3:** Gibson calculates what `p001`'s weight should have been recorded as (65kg), updates the record, then commits the transaction. Gibson's subsequent query shows the record has been updated.

In [None]:
with gibson.cursor() as gibson_cursor:
    
    gibson_cursor.execute('''
        UPDATE patient 
        SET weight_kg = 65
            WHERE patient_id = 'p001';
        ''')
    
    gibson.commit()

In [None]:
pd.read_sql_query('''
    SELECT patient_id, patient_name, weight_kg 
    FROM patient 
    WHERE patient_id = 'p001';
    ''', gibson)

Gibson's query has opened another transaction, so we complete it with a commit so that `gibson` is no longer in a transaction:

In [None]:
gibson.commit()

In [None]:
print(f"Gibson : {transaction_status(gibson)}")
print(f"Paxton : {transaction_status(paxton)}")

Although Gibson's transaction has been committed, Paxton is still in a transaction, and so does not see the change. Paxton still sees the weight of the patient as 75kg:

In [None]:
pd.read_sql_query('''
    SELECT patient_id, patient_name, weight_kg 
    FROM patient 
    WHERE patient_id = 'p001';
    ''', paxton)

![Paxton](images/paxton.png) **Step 4:** Paxton now erroneously attempts to update the weight of the patient to 85kg:

In [None]:
with paxton.cursor() as paxton_cursor:

    # No query parameters are needed, so nothing is passed after the SQL
    paxton_cursor.execute('''
        UPDATE patient
        SET weight_kg = 85.0
            WHERE patient_id = 'p001';
        ''')

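The attempt to make the update should have raised the error `TransactionRollbackError: could not serialize access due to concurrent update`. PostgreSQL has refused the change: the row was modified by Gibson's transaction, which committed after Paxton's transaction took its snapshot, so Paxton's update was based on out-of-date data and would have silently overwritten Gibson's correction.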

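![Paxton](images/paxton.png) If Paxton now tries to look at `p001`'s record again, PostgreSQL reports that the current transaction is aborted and that commands are ignored until the end of the transaction block. In other words, he must end his failed transaction before he can get a consistent view of the database again: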

In [None]:
pd.read_sql_query('''
    SELECT patient_id, patient_name, weight_kg 
    FROM patient 
    WHERE patient_id = 'p001';
    ''', paxton)
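A program cannot read the error message on screen the way Paxton can, but it can detect this state from the connection's transaction status: a failed transaction block is reported as `TRANSACTION_STATUS_INERROR`. The hypothetical helper below rolls back only in that case. It hard-codes the value `3`, which is the value psycopg2 gives that constant (in the notebook you could compare against `pge.TRANSACTION_STATUS_INERROR` directly), and the usage example uses a stub connection so it runs without a server.

```python
# Value of psycopg2.extensions.TRANSACTION_STATUS_INERROR
TRANSACTION_STATUS_INERROR = 3


def rollback_if_failed(conn):
    """Roll back conn if it is idle in a failed transaction block.

    Returns True if a rollback was issued, False otherwise.
    """
    if conn.get_transaction_status() == TRANSACTION_STATUS_INERROR:
        conn.rollback()
        return True
    return False


class StubConnection:
    """Minimal stand-in exposing the two methods the helper uses."""

    def __init__(self, status):
        self.status = status

    def get_transaction_status(self):
        return self.status

    def rollback(self):
        self.status = 0   # back to TRANSACTION_STATUS_IDLE


failed = StubConnection(TRANSACTION_STATUS_INERROR)
print(rollback_if_failed(failed), failed.get_transaction_status())  # True 0
```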

![Paxton](images/paxton.png)

Paxton therefore rolls back his update transaction. He can now see the current contents of the database, after Gibson's update is applied.

In [None]:
paxton.rollback()

The table should now contain the value 65 in the `weight_kg` column for both Gibson and Paxton:

In [None]:
pd.read_sql_query('''
    SELECT patient_id, patient_name, weight_kg 
    FROM patient 
    WHERE patient_id = 'p001';
    ''', paxton)

In [None]:
pd.read_sql_query('''
    SELECT patient_id, patient_name, weight_kg 
    FROM patient 
    WHERE patient_id = 'p001';
    ''', gibson)
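In application code, the usual response to a serialization failure like Paxton's is to roll back and run the *whole* transaction again, so that it re-reads the data and makes its decision on the current values. The sketch below is a hypothetical helper, not part of `psycopg2`: the exception class to retry on is passed in (with `psycopg2` it would be `psycopg2.extensions.TransactionRollbackError`), and the usage example uses a stub connection. Retrying only helps when the work re-reads its inputs; it cannot correct a mistaken calculation such as Paxton's.

```python
import time


def run_with_retry(conn, work, retry_on, max_attempts=3, pause_s=0.1):
    """Run work(conn) and commit, retrying the whole unit on retry_on errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = work(conn)      # must re-read anything it depends on
            conn.commit()
            return result
        except retry_on:
            conn.rollback()          # leave the failed transaction block
            if attempt == max_attempts:
                raise
            time.sleep(pause_s)      # brief pause before trying again
        except Exception:
            conn.rollback()          # other errors: clean up, don't retry
            raise


class Conflict(Exception):
    """Stand-in for a serialization failure."""


class StubConnection:
    def __init__(self):
        self.commits = 0
        self.rollbacks = 0

    def commit(self):
        self.commits += 1

    def rollback(self):
        self.rollbacks += 1


attempts = []


def adjust_weight(conn):
    """Pretend unit of work that conflicts on its first attempt only."""
    attempts.append(len(attempts) + 1)
    if len(attempts) == 1:
        raise Conflict("could not serialize access due to concurrent update")
    return "updated"


stub = StubConnection()
print(run_with_retry(stub, adjust_weight, Conflict, pause_s=0))  # updated
```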

## Cleanup: close connections
Close both connections so that the next notebook can reset the database.

In [None]:
paxton.close()
gibson.close()

## Summary

In this Notebook you have seen how PostgreSQL supports transactions as **atomic** units of work which are **isolated** from each other. Where transactions interfere, PostgreSQL detects this and forces rollback of some transactions: this ensures **consistency**. We don't explore how transactions are **durable** in this notebook.

## What next?
If you are working through this Notebook as part of an inline exercise, return to the module materials now.

If you are working through this set of Notebooks as a whole, move on to [`12.2 Transaction anomalies`](12.2%20Transaction%20anomalies.ipynb).