# SQL injection hacks

The aim of this Notebook is to show how a badly designed database security schema, coupled with poor management of user access to the database, can enable a malicious user to compromise your data security, or even obtain full access to your system.

You should spend around two hours on this notebook.

## Setup

This Part uses a number of notebooks:

1. `23.1  SQL injection hacks.ipynb`: This notebook, containing calls to the remaining notebooks to generate webforms and complete them.

2. `part_23_authentication_notebook.ipynb`: Contains the user name and password to connect to the database. 

3. `reset_form_server.ipynb`: Redefines and repopulates the tables that are accessed using the web forms.

4. `form_server.ipynb`: Creates a webform to access a subset of the hospital database.

5. `form_server_safe.ipynb`:  Creates a webform to access a subset of the hospital database which is safe from an injection attack.

For this week, we have put the authentication details in a separate notebook (`part_23_authentication_notebook.ipynb`) to emphasise that the injection attacks do not require the user to be logged in to the database.

The notebooks are designed so that `reset_form_server.ipynb`, `form_server.ipynb` and `form_server_safe.ipynb` replicate the code that might sit on some remote server, whereas this notebook represents the client process. The server notebooks each call `part_23_authentication_notebook.ipynb` so that they can interact with the database.

(Imagine that the form in this notebook connects to such a remote server.)

The database authentication code is in the following notebook:
```
part_23_authentication_notebook.ipynb
```
As in earlier parts using databases, before starting, you should edit this notebook so that it contains the appropriate authentication details for your environment.

After you have entered your authentication credentials into the authentication notebook, run the next cell to reset the tables that will be used in this notebook:

In [None]:
%run reset_form_server.ipynb

If this raises an error:
```
OperationalError: FATAL:  password authentication failed for user XXX
```
or
```
NameError: name 'DB_USER' is not defined
```
then go back to the notebook `part_23_authentication_notebook.ipynb` and check that you have set the authentication correctly, and that you have saved the notebook.

## A Simple Query Form

The weakest form of security is 'security by obscurity'. In this approach, you just hope that someone will not guess a query term that can return information that a user should not be privy to.

For example, suppose we create a simple form that allows a patient to enter their identifier and look up the name and identifier of their doctor.

The notebook `form_server.ipynb` contains a simple form such as might exist on a web server, and a connection to a database to query the hospital records. Feel free to look at that notebook to see how the SQL query is constructed and executed.

To open an instance of the form, run the notebook with the following cell (`%run <notebook>` will run the given notebook). You should see a web form asking you to enter a patient identifier.

In [None]:
%run form_server.ipynb

If you enter the value `p088` in the form, for example, you should find that a row of a table is returned which tells you that the patient with id p088 is called Boswell and has a responsible doctor called Paxton, whose id is d06.

How secure is that form? Do you think you could guess the pattern of any patient identifiers and so find out what their name is and who their doctor is?

## Try hacking a Simple Query Form

One of the ways in which SQL injection attacks can be used is to subvert queries that are made through web forms.

The previous form was not very secure, but it would take an attacker a long time to guess every patient identifier and so find out each patient's name and which doctor they were registered with.

Although the current version of the form is insensitive to case, it does not enforce any other checks around the text entered as part of the query.

So is there a way you can attack this form to get all the data out in one go?

One of the problems with the query we are running is that it does not guard against SQL injection attacks. A user can add additional SQL query code into their response that will then modify the query that is actually run on the database.

For example, can you come up with a query string that will list all the contents of the table, or just the first three items?

### Activity 1

Try to come up with an input to the form which will allow you to find out the names, and responsible doctors, for all the patients in the database.

To get you started, the database query is constructed from the Python code:

```python
    f'''SELECT *
        FROM form_view
        WHERE LOWER(patient_id) = '{input_id.lower()}';
      '''
```
where `input_id` is the value obtained from the input into the form.

(Note the use of the SQL function `LOWER` and the Python string method `.lower()` to ensure that the form handles discrepancies in case.)
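To see what the database actually receives, it can help to print the constructed query text. The following is a minimal sketch: the helper name `build_query` is hypothetical and is not taken from `form_server.ipynb`; it simply wraps the f-string shown above.

```python
def build_query(input_id: str) -> str:
    """Build the query text with the same f-string pattern as above (hypothetical helper)."""
    return f'''SELECT *
        FROM form_view
        WHERE LOWER(patient_id) = '{input_id.lower()}';
      '''


# A well-behaved input is lower-cased and placed between the two single quotes.
benign_query = build_query("P088")
print(benign_query)
```

Whatever the user types is pasted between the quotes unchanged (apart from lower-casing), which is the property the activity asks you to exploit.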

In [None]:
# Run this cell to give the form:

%run form_server.ipynb

What value of `input_id` could allow all the contents of the table to be displayed? (Rerun the previous cell to try different values in the form.)

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

If we wanted to query the whole view `form_view`, we would want to send a command of the form:

```python
    f'''SELECT *
        FROM form_view
        WHERE <some condition>;
      '''
```
where `<some condition>` would always evaluate to True.

In this case, we could set up the `WHERE` clause of the query to be something like:
```sql
    WHERE LOWER(patient_id) = 'p008' OR 'A'='A'
```
Because `'A'='A'` is always True, the whole `WHERE` clause is True, and so the query will return all the rows in the view.

The key to this simple attack is first to close the text value with a single quote, and then to follow it with an `OR` and a condition that is always true, omitting the final single quote because the query template supplies it.

So one possible malicious entry to the form could be:
```
    p008' OR 'A'='A
```
or even just:
```
    ' OR 'A'='A
```
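The effect can be reproduced without the hospital database. The sketch below is an illustration only: it swaps PostgreSQL for Python's built-in `sqlite3` module, and the table contents, column names and the function `unsafe_lookup` are made up for the example. Only the string-formatting pattern is the same as in the vulnerable form.

```python
import sqlite3

# Stand-in for the hospital view: an in-memory SQLite table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE form_view (patient_id TEXT, patient_name TEXT, doctor_id TEXT)"
)
conn.executemany(
    "INSERT INTO form_view VALUES (?, ?, ?)",
    [
        ("p001", "Patient A", "d01"),
        ("p002", "Patient B", "d02"),
        ("p003", "Patient C", "d01"),
    ],
)


def unsafe_lookup(input_id):
    # Same string-formatting pattern as the vulnerable form (hypothetical helper).
    query = f"SELECT * FROM form_view WHERE LOWER(patient_id) = '{input_id.lower()}';"
    return conn.execute(query).fetchall()


honest_rows = unsafe_lookup("p002")           # one matching patient
injected_rows = unsafe_lookup("' OR 'A'='A")  # the condition is always true

print(len(honest_rows))    # 1
print(len(injected_rows))  # 3
```

The injected input turns the `WHERE` clause into `LOWER(patient_id) = '' or 'a'='a'`, so every row in the stand-in table is returned.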

More sophisticated attacks may also be possible, depending on how thoroughly the function that passes the constructed query to the database checks the input. Such checking is recommended precisely to intercept this kind of attack.

#### End of Activity 1

-------------------------------

## To make things worse...

In the above example, we were "only" concerned with accessing data which should have been kept private. But if the database had been very carelessly set up, perhaps you could do even more damage...

### Activity 2

Try to come up with an input to the form which will drop the `form_view` view.

Remember that you can use the command:

```
%run reset_form_server.ipynb
```
to reset the database if you need to.

In [None]:
# Run this cell to give the form:

%run form_server.ipynb

What input to the form could allow `form_view` to be dropped?

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

As with Activity 1, to carry out our malicious wish to drop `form_view`, we have to try to create some SQL which will result in the view being dropped. The easiest way to do this is to complete the `SELECT` query, and then add a separate `DROP VIEW` statement, to give something like:

```sql
    SELECT *
    FROM form_view
    WHERE LOWER(patient_id)='p088' ;
    DROP VIEW form_view;
    --' ;
```
Note that we have used an SQL comment at the end of the commands: this lets us take care of the closing quote mark.

So taking the necessary parts out of our attacking command, we could force the view to be dropped by entering:
    `p088' ; DROP VIEW form_view; --`
into the form.
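If you would like to see the mechanics in isolation first, the following sketch reproduces the attack against a throwaway in-memory SQLite database rather than the PostgreSQL server. It rests on several assumptions: the column names are invented, and because `sqlite3`'s `execute()` refuses to run more than one statement, `executescript()` is used here to stand in for a driver call that accepts several statements at once.

```python
import sqlite3

# Throwaway stand-in database: one base table and a view on top of it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE form_patient (patient_id TEXT, patient_name TEXT)")
conn.execute("CREATE VIEW form_view AS SELECT * FROM form_patient")


def view_exists():
    # Look the view up in SQLite's catalogue table.
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'view' AND name = 'form_view'"
    ).fetchone()
    return row[0] == 1


malicious_input = "p088' ; DROP VIEW form_view; --"

# Same string-formatting pattern as the vulnerable form.
query = f'''SELECT *
    FROM form_view
    WHERE LOWER(patient_id) = '{malicious_input.lower()}';
'''
print(query)

existed_before = view_exists()
conn.executescript(query)  # runs the SELECT, then the injected DROP VIEW
exists_after = view_exists()

print(existed_before, exists_after)  # True False
```

The printed query shows the three parts described above: the completed `SELECT`, the injected `DROP VIEW`, and the comment that swallows the closing quote.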

Run the form notebook, and try it:

In [None]:
%run form_server.ipynb

A successful hack is likely to generate an error, because the SQL completes with a statement rather than a query. However, you can test whether the view has been dropped by calling the form with any input. Try `p088` again:

In [None]:
%run form_server.ipynb

Entering this value into the form now should generate a sizable error message, which contains the explanation that:
```python
ProgrammingError: relation "form_view" does not exist
```

If you wanted to try corrupting the database further, the base tables of `form_view` are called `form_patient` and `form_doctor`; try querying and dropping those tables.

#### End of Activity 2

-------------------------------

As you have seen in Activities 1 and 2, the dangers of an SQL injection attack can range from a breach of data security to the whole system being compromised.

In fact, if the account hosting the form were a superuser, a malicious attacker could go as far as creating their own user account with extensive privileges. For this reason, it is important to be clear about whether a given account actually requires all privileges. For example, in this case, it might be worth creating a further account to host the web form, which only has `SELECT` (read-only) privileges on the database, rather than the ability to modify the data or the database structure.

## Sanitising your inputs

As you have seen, using string formatting tools to pass information into a database query is a very dangerous practice. You leave yourself open to the danger of a malicious hacker being able to access private data or, worse, being able to penetrate your whole system.

Fortunately, most, if not all, languages for accessing databases provide mechanisms for ensuring that a user's input cannot be used in a way that the programmer did not intend. This is known as "sanitising" the input.

Before starting Activity 3, reset the tables in the form server:

In [None]:
%run reset_form_server.ipynb

### Activity 3

We have provided a second version of the form notebook which incorporates the form input safely into the underlying SQL query.

Use the following form to find the name and identifier of the doctor who is responsible for the patient with identifier p088. Then as in Activity 1, try to use the form to return the names, identifiers and responsible doctors of all patients in the hospital.

In [None]:
%run form_server_safe.ipynb

What happens when you try to inject some malicious code?

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

Following Activity 1, to try to access all the data in `form_view`, we might try the same input to the form:
```
    ' OR 'A'='A
```
However, if we now try to enter this into the safe form, we should find that the form simply returns an empty table:

In [None]:
%run form_server_safe.ipynb

Similarly, you should find that you cannot use the techniques from Activity 2 to modify or drop the views or tables in the database.

#### End of Activity 3

-------------------------------

In Activity 3, you saw that the form generated by `form_server_safe.ipynb` behaved as we would hope for correct inputs, but did not allow the malicious entries to the form to create unwanted behaviour.

We recommend that you look at the code in the `form_server_safe.ipynb` file to see how the input is handled in context. The relevant call to the database is:
```python
pd.read_sql('''
    SELECT *
    FROM form_view
    WHERE LOWER(patient_id) = %s
    ''',
                  form_connection,
                  params=[input_id.lower()])
```
Note that rather than using string formatting to create the SQL query that we want, the query is written with a placeholder marker (`%s`), and the value is supplied separately through the `params` parameter of the `pd.read_sql` function. The value is then handed to the database interface as data rather than being pasted into the SQL text, which protects against injection attacks.

The exact syntax depends on the database interface. Although in this case we have used `pandas.read_sql` to query the database, the placeholder syntax is determined by `psycopg2`, which is the underlying interface to PostgreSQL. If we were using a different database, such as MySQL, then the details of the query syntax might be different.
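As an illustration of both points, here is a minimal sketch of a parameterised lookup using Python's built-in `sqlite3` module instead of `psycopg2`. The table contents, column names and the function `safe_lookup` are invented for the example. Note that `sqlite3` writes its placeholder as `?` rather than `%s`.

```python
import sqlite3

# In-memory stand-in for the view, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE form_view (patient_id TEXT, doctor_id TEXT)")
conn.executemany(
    "INSERT INTO form_view VALUES (?, ?)",
    [("p001", "d01"), ("p002", "d02")],
)


def safe_lookup(input_id):
    # The value travels separately from the SQL text, so it is only ever
    # compared with patient_id as a string and is never parsed as SQL.
    return conn.execute(
        "SELECT * FROM form_view WHERE LOWER(patient_id) = ?",
        (input_id.lower(),),
    ).fetchall()


safe_honest = safe_lookup("P002")
safe_injected = safe_lookup("' OR 'A'='A")

print(safe_honest)    # [('p002', 'd02')]
print(safe_injected)  # []
```

The malicious string is simply treated as an (unlikely) patient identifier, which matches no rows, so the result is empty.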

For psycopg2 and pandas, the relevant documentation is [here (for `psycopg2`)](https://www.psycopg.org/docs/sql.html) and [here (for `pandas.read_sql`)](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html). However, when working with applications that allow user input (not just SQL!), you should always consider the dangers of injection attacks, and find the relevant documentation for the environment that you are working in.

## What next?

SQL injection attacks are often made possible by incorrectly designed web forms that do not escape certain characters such as apostrophes.

In certain respects, you might think of such strings as a particularly dirty form of data, and one that can create serious side effects for your database if you don't clean it before using it in an SQL query.
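One way of treating the input as data to be cleaned is to reject anything that does not look like a patient identifier before it gets near the database. The sketch below is illustrative only: the identifier format (the letter `p` followed by three digits) is an assumption based on the example `p088`, and the function name is hypothetical. A check like this complements parameterised queries; it is not a replacement for them.

```python
import re

# Assumed identifier format: the letter "p" followed by exactly three digits.
PATIENT_ID_PATTERN = re.compile(r"p[0-9]{3}")


def is_plausible_patient_id(input_id: str) -> bool:
    """Return True only if the whole input matches the assumed identifier format."""
    cleaned = input_id.strip().lower()
    return PATIENT_ID_PATTERN.fullmatch(cleaned) is not None


print(is_plausible_patient_id("P088"))                             # True
print(is_plausible_patient_id("' OR 'A'='A"))                      # False
print(is_plausible_patient_id("p088' ; DROP VIEW form_view; --"))  # False
```

Using `fullmatch` matters here: a pattern that only had to match the start of the string would let the second attack string through.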

You have now completed the Part 23 Notebooks.