# Writing SQL Queries in Python: The Basics

We started you off by having you write SQL queries in PostgreSQL's interactive shell (`psql`). That's cool, but you're probably wondering: "Hey, how do I actually get information from a SQL database into my Python programs? That's where I can do some real damage."

## Python PostgreSQL libraries

To connect to a SQL database and make queries in Python, we need to use a *database adapter library*. Such libraries provide the functionality needed to connect to a database, send SQL statements, and receive responses. They also take care of details like converting SQL types safely to Python types.

Each RDBMS has several different adapter libraries. For this tutorial, we're going to use a library called [pg8000](https://github.com/mfenniak/pg8000). There are a number of other PostgreSQL adapter libraries; the reason we're using pg8000 is that it's easy to install, as it has no external dependencies. [psycopg2](http://initd.org/psycopg/docs/index.html) is another popular library for accessing PostgreSQL in Python. It's harder to install but offers significant performance advantages over pg8000. The good news is that nearly *all* of the database adapter libraries in Python conform to the [DB-API specification](https://www.python.org/dev/peps/pep-0249/), so once you've learned one, you should be able to switch over to the others with relative ease.
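
To make "conforming to DB-API" a bit more concrete, here's a minimal sketch using `sqlite3`, the adapter for SQLite that ships with Python. It's used here only because it runs without a database server; the assumption is that pg8000 exposes the same attributes and methods, with different values where the specification allows (such as the placeholder style).

```python
import sqlite3

# DB-API (PEP 249) asks every adapter module to define these attributes.
print(sqlite3.apilevel)    # the DB-API version the module implements
print(sqlite3.paramstyle)  # the placeholder style the module uses in queries

# Every DB-API module provides connect(); every connection provides cursor().
conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
cursor = conn.cursor()
cursor.execute("SELECT 1 + 1")
result = cursor.fetchone()
print(result[0])
conn.close()
```

The `connect()`, `.cursor()`, `.execute()` and `.fetchone()` calls are the same ones we'll use with pg8000 in the rest of this tutorial.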

This tutorial assumes that you've completed the tutorial in the [SQL introduction](SQL_notes.md) notes, and that you have a PostgreSQL server with a copy of the [MONDIAL database](http://www.dbis.informatik.uni-goettingen.de/Mondial/) running on your computer on the default port.

## Installing pg8000

The pg8000 library can be installed with `pip`. (If you're using a version of Python that doesn't include `pip`, make sure to [install pip first](https://pip.pypa.io/en/stable/installing/). You may also want to create a virtual environment first, to isolate different projects from each other's upgrades and uninstalls.) On macOS and other UNIX-like operating systems, type the following on the command line:

    pip3 install pg8000
    
(This should also work if you've installed Python on Windows using the instructions in class. If you're using a different Python packaging system, you're on your own!)

## Connecting to a database with pg8000

When using a SQL server from Python, you'll be working with two kinds of objects:

* A *connection object*, which gives you access to the server; and
* *Cursor objects*, which you use to make SQL queries and retrieve data returned from those queries.

To create a connection object, call `pg8000`'s `connect()` function:

In [3]:
import pg8000
conn = pg8000.connect(database="mondial")
print(type(conn))

<class 'pg8000.core.Connection'>


The `connect()` function takes a number of named parameters. The only one we're using here is `database`, which specifies which database to connect to. If we were attempting to connect to a PostgreSQL server on someone else's machine, we might need to use the `host` and `port` parameters. [Consult the documentation](http://pythonhosted.org/pg8000/dbapi.html#pg8000.connect) for more information.
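
Here's a rough sketch of what connecting to a server on another machine might look like. The helper functions, the host name `db.example.com` and the port `5433` are all made up for illustration. The sketch also passes a `user` argument, which `connect()` accepts and which some pg8000 versions require; the fallback to the `postgres` role is an assumption. The actual connection call is left commented out so the example runs without a server.

```python
import os


def mondial_connection_settings(host="localhost", port=5432):
    """Build keyword arguments for pg8000.connect() (hypothetical helper)."""
    return {
        # Assumption: fall back to the "postgres" role if PGUSER is not set.
        "user": os.environ.get("PGUSER", "postgres"),
        "host": host,          # machine the PostgreSQL server runs on
        "port": port,          # 5432 is PostgreSQL's default port
        "database": "mondial",
    }


def open_connection(settings):
    """Open a connection; needs pg8000 installed and a reachable server."""
    import pg8000
    return pg8000.connect(**settings)


# "db.example.com" and 5433 are placeholder values, not a real server.
settings = mondial_connection_settings(host="db.example.com", port=5433)
print(settings["host"], settings["port"], settings["database"])
# conn = open_connection(settings)  # uncomment once the details are real
```

If the server requires a password, `connect()` also takes a `password` parameter.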

## Making a query

Now that we have an open connection, let's attempt to query the database. To perform a query, we first need a cursor object, which we can create using the connection object's `.cursor()` method:

In [4]:
cursor = conn.cursor()
print(type(cursor))

<class 'pg8000.core.Cursor'>


The cursor object has several methods of interest to us. The first and most important is `.execute()`, which takes a SQL statement (in a Python string) as a parameter:

In [5]:
cursor.execute("SELECT name, length FROM river WHERE length > 4000")

The `.execute()` method performs the query, but doesn't evaluate to anything useful. After calling `.execute()`, you can call the cursor's `.fetchone()` method to get the first row returned from the query:

In [6]:
cursor.fetchone()

['Irtysch', Decimal('4248')]

Subsequent requests to `.fetchone()` will return subsequent rows:

In [7]:
cursor.fetchone()

['Jenissej', Decimal('4092')]

To retrieve all of the rows returned from a query, you can use the cursor object in a `for` loop, like so:

In [9]:
cursor.execute("SELECT name, length FROM river WHERE length > 4000")
for row in cursor:
    print(row)

['Irtysch', Decimal('4248')]
['Jenissej', Decimal('4092')]
['Lena', Decimal('4400')]
['Hwangho', Decimal('4845')]
['Jangtse', Decimal('6380')]
['Mekong', Decimal('4350')]
['Missouri', Decimal('4130')]
['Niger', Decimal('4184')]
['Zaire', Decimal('4374')]


Calling the `.fetchone()` method, or iterating over the cursor object, yields a series of lists, with one element in each list per field requested in the query.
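
Two more details of the DB-API are worth knowing here: `.fetchone()` returns `None` once the rows run out, and `.fetchall()` returns all remaining rows at once. The sketch below shows both. It uses `sqlite3` from the standard library as a stand-in so that it runs without a PostgreSQL server, with a tiny `river` table that includes one made-up row. Note that `sqlite3` returns rows as tuples and uses `?` as its placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cursor = conn.cursor()
cursor.execute("CREATE TABLE river (name TEXT, length INTEGER)")
cursor.executemany(
    "INSERT INTO river VALUES (?, ?)",
    # "Example Creek" is a made-up row that the query should filter out.
    [("Lena", 4400), ("Mekong", 4350), ("Example Creek", 12)],
)

cursor.execute(
    "SELECT name, length FROM river WHERE length > 4000 ORDER BY name")
first = cursor.fetchone()         # the first matching row
rest = cursor.fetchall()          # every row not yet fetched, as a list
nothing_left = cursor.fetchone()  # None: the result set is used up

print(first)
print(rest)
print(nothing_left)
```

Checking for `None` is how you find out that a query matched no rows at all.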

## Interpolating values in a query

Let's say that we're starting with data from some other source, say, a list of cities whose population we're interested in.

In [10]:
cities_of_interest = ['Paris', 'Nairobi', 'Buenos Aires', 'Kyoto']

Now we want to perform queries on the MONDIAL database to get the population for each of these cities. Somehow, we need to *build* a series of SQL queries in Python that include the names in the list.

You might think that you could simply do something like this:

In [11]:
query = "SELECT population FROM city WHERE name = '" + cities_of_interest[0] + "'"
print(query)

SELECT population FROM city WHERE name = 'Paris'


This looks good, until you have a name with problematic punctuation:

In [12]:
problematic_city = "Martha's Vineyard"
query = "SELECT population FROM city WHERE name = '" + problematic_city + "'"
print(query)

SELECT population FROM city WHERE name = 'Martha's Vineyard'


See the trouble? The apostrophe in the name of the city made its way into our query string. This query would be a syntax error in SQL, since SQL will believe the string to have ended at the apostrophe in `Martha's`. Troublesome!

To solve this problem, the cursor object's `.execute()` method comes with a built-in means of interpolating values into queries. Simply put `%s` in your query string wherever you want to insert a value, and then pass as a second parameter to `.execute()` a list of values that you want to be included in the query:

In [13]:
cursor.execute("SELECT population FROM city WHERE name = %s",
              ["Martha's Vineyard"])

pg8000 will take care of the nasty business of quoting your string for you, and you'll be protected from [SQL injection attacks](https://en.wikipedia.org/wiki/SQL_injection).
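
To see the difference in action without a PostgreSQL server, here's a small sketch that uses `sqlite3` from the standard library as a stand-in. Two things in it are assumptions for the sake of the demo: the population figure is made up, and `sqlite3` writes its placeholder as `?` where pg8000 uses `%s`. The idea is the same in both libraries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cursor = conn.cursor()
cursor.execute("CREATE TABLE city (name TEXT, population INTEGER)")
# 12345 is an invented population, used only for this demonstration.
cursor.execute("INSERT INTO city VALUES (?, ?)", ("Martha's Vineyard", 12345))

name = "Martha's Vineyard"

# Building the query by hand: the apostrophe breaks the SQL string literal.
broken = "SELECT population FROM city WHERE name = '" + name + "'"
try:
    cursor.execute(broken)
    concatenation_worked = True
except sqlite3.Error as err:
    concatenation_worked = False
    print("concatenated query failed:", err)

# Using a placeholder: the value is passed separately from the SQL text.
cursor.execute("SELECT population FROM city WHERE name = ?", [name])
row = cursor.fetchone()
print(row[0])
```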

Here's a complete example, iterating over a list of cities and getting the population for each one:

In [15]:
for city_name in cities_of_interest:
    cursor.execute("SELECT population FROM city WHERE name = %s",
                   [city_name])
    population = cursor.fetchone()[0] # fetchone() returns a list w/1 value
    print(city_name, population)

Paris 2249975
Nairobi 3133518
Buenos Aires 2768772
Kyoto 1382113
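
One thing to watch out for: if a city isn't in the table, `.fetchone()` returns `None`, and indexing it with `[0]` raises a `TypeError`. Here's one possible way to guard against that. The `city_population` function is a hypothetical helper, and the sketch uses `sqlite3` (with its `?` placeholder) in place of pg8000 so that it runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cursor = conn.cursor()
cursor.execute("CREATE TABLE city (name TEXT, population INTEGER)")
cursor.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("Paris", 2249975), ("Kyoto", 1382113)],
)


def city_population(cursor, city_name):
    """Return the city's population, or None if the city isn't found."""
    cursor.execute("SELECT population FROM city WHERE name = ?", [city_name])
    row = cursor.fetchone()
    if row is None:  # the query matched no rows
        return None
    return row[0]


# "Atlantis" is deliberately missing from the table.
populations = {}
for city_name in ["Paris", "Atlantis", "Kyoto"]:
    populations[city_name] = city_population(cursor, city_name)
    print(city_name, populations[city_name])
```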


## Example: Percentages to population

Let's do a quick example of using Python to perform calculations on the results of SQL queries. Here's what we'll do: given a particular country and a religion in that country, find the *number of adherents* of that religion in the country.

> There's nothing in this example that you couldn't do with a sophisticated SQL query. But there's nothing wrong with solving a problem like this in Python! It might not end up being as fast or efficient as doing it all in SQL, but sometimes it's helpful to work through these problems as a series of discrete steps rather than as a monolithic query.

First, we'll set two variables for the country and religion:

In [25]:
country = "United States"
religion = "Muslim"

Now, we'll select the relevant data from the `country` table.

In [35]:
query = "SELECT code, population FROM country WHERE name = %s"
cursor.execute(query, [country])
country_info = cursor.fetchone()

At this point, `country_info` will be a list of values corresponding to the columns named in the query:

In [36]:
print(country_info)

['USA', Decimal('318857056')]


Let's break that out into separate variables, for clarity:

In [37]:
country_code = country_info[0]
country_pop = country_info[1]

What is that weird `Decimal` thing anyway? Well, it's a special type of Python object that can represent numerical data with the same precision as the SQL `numeric` type. It looks weird when you `print` the list that contains it, but it behaves like a regular number in most other contexts:

In [33]:
print(country_info[1] * 2)

637714112
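
Here's a short sketch of how `Decimal` values behave, using only Python's standard `decimal` module (no database needed). The one caveat it shows is that a `Decimal` can be combined with integers and other `Decimal` values, but not directly with a `float`.

```python
from decimal import Decimal

country_pop = Decimal("318857056")

doubled = country_pop * 2        # Decimal and int mix freely
as_int = int(country_pop)        # convert when you need a plain int
as_float = float(country_pop)    # or a plain float

# Mixing a Decimal with a float in arithmetic raises a TypeError.
try:
    country_pop * 0.5
    mixed_with_float = True
except TypeError:
    mixed_with_float = False

# Use another Decimal (or convert first) instead.
half = country_pop * Decimal("0.5")

print(doubled, as_int, as_float, half)
print(mixed_with_float)
```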


Now that we have the data for the country, let's find all of the religion records for that country.

In [56]:
query = "SELECT percentage FROM religion WHERE country = %s AND name = %s"
cursor.execute(query, [country_code, religion])

This query is a bit different from the others in that we've used *two* placeholders in the query instead of just one. This means that we also need to include two expressions inside the square brackets in the `.execute()` method.

In [57]:
religion_info = cursor.fetchone()

Now the variable `religion_info` contains a list with a single element: the percentage of adherents of that religion in the given country. Let's store that in a separate variable:

In [63]:
percentage = religion_info[0] / 100
print(percentage)

0.006


Now, we'll get the number of adherents by multiplying the population of the country by the religion percentage:

In [65]:
print(percentage * country_pop)

1913142.336


That's nearly two million Muslims living in the USA! Let's put the whole thing into a function:

In [71]:
def religion_pop(country, religion):
    query = "SELECT code, population FROM country WHERE name = %s"
    cursor.execute(query, [country])
    country_info = cursor.fetchone()
    country_code = country_info[0]
    country_pop = country_info[1]
    query = "SELECT percentage FROM religion WHERE country = %s AND name = %s"
    cursor.execute(query, [country_code, religion])
    religion_info = cursor.fetchone()
    percentage = religion_info[0] / 100
    return percentage * country_pop

This function takes a country and a religion as parameters, and returns the number of adherents of the religion in that country:

In [73]:
print(religion_pop("Canada", "Roman Catholic"))
print(religion_pop("United Kingdom", "Hindu"))
print(religion_pop("Lithuania", "Protestant"))

14977437.504
641056.54
56606.890
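
For comparison, here's roughly what the "do it all in SQL" version mentioned earlier could look like: a single query that joins the two tables and does the arithmetic itself. This is a sketch, not a drop-in replacement. It runs against a tiny `sqlite3` stand-in for the `country` and `religion` tables (loaded with the United States figures from above), and `sqlite3` uses `?` where pg8000 uses `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE country (name TEXT, code TEXT, population INTEGER)")
cursor.execute(
    "CREATE TABLE religion (country TEXT, name TEXT, percentage REAL)")
cursor.execute("INSERT INTO country VALUES (?, ?, ?)",
               ("United States", "USA", 318857056))
cursor.execute("INSERT INTO religion VALUES (?, ?, ?)",
               ("USA", "Muslim", 0.6))

# One query: match each religion row to its country, then multiply.
query = """
    SELECT country.population * religion.percentage / 100
    FROM country
    JOIN religion ON religion.country = country.code
    WHERE country.name = ? AND religion.name = ?
"""
cursor.execute(query, ["United States", "Muslim"])
adherents = cursor.fetchone()[0]
print(round(adherents))
```

Because this stand-in stores the percentage as a floating-point number, the result is a `float` rather than a `Decimal`.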


## Coping with errors

pg8000 is very persnickety about errors. If you have a syntax error, like so:

In [17]:
cursor = conn.cursor()
cursor.execute("SMELLECT * FORM cheese WERE stink > 15 ODOR DESC")

ProgrammingError: ('ERROR', '42601', 'syntax error at or near "SMELLECT"', '1', 'scan.l', '1053', 'scanner_yyerror', '', '')

... you'll get a syntax error as expected. But then subsequent attempts to use the cursor will frustratingly fail:

In [18]:
cursor.execute("SELECT population FROM city WHERE name = 'Paris'")

ProgrammingError: ('ERROR', '25P02', 'current transaction is aborted, commands ignored until end of transaction block', 'postgres.c', '1283', 'exec_parse_message', '', '')

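What's happening here: pg8000 runs your statements inside a transaction, and once a statement in a transaction fails, PostgreSQL refuses further commands until that transaction ends. The way to fix this problem is to close the connection and re-open it, or simply call the connection object's `.rollback()` method: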

In [49]:
conn.rollback()

Now your queries can proceed as planned:

In [20]:
cursor.execute("SELECT population FROM city WHERE name = 'Paris'")
cursor.fetchone()

[Decimal('2249975')]
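
If you'd rather not remember to call `.rollback()` by hand, one option is to wrap `.execute()` in a small helper that rolls back whenever a statement fails. The `execute_or_rollback` function below is a hypothetical helper, not part of pg8000. The demonstration uses `sqlite3` so that it runs without a server; `sqlite3` doesn't lock up after an error the way PostgreSQL does, so this only shows the shape of the pattern.

```python
import sqlite3


def execute_or_rollback(conn, cursor, sql, params=()):
    """Run a statement; if it fails, roll back and re-raise the error."""
    try:
        cursor.execute(sql, params)
    except Exception:
        conn.rollback()  # end the failed transaction so later queries work
        raise


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cursor = conn.cursor()

try:
    execute_or_rollback(conn, cursor, "SMELLECT * FORM cheese")
    bad_query_failed = False
except sqlite3.Error as err:
    bad_query_failed = True
    print("query failed:", err)

# The connection is still usable after the failed statement.
execute_or_rollback(conn, cursor, "SELECT 1")
value = cursor.fetchone()[0]
print(value)
```

The helper re-raises the error on purpose: you still find out that the query failed, but the connection is ready for the next one.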

## More information

For more information, consult [pg8000's documentation](http://pythonhosted.org/pg8000/).