# Writing SQL Queries in Python

It's finally time (just one week in) to liberate yourself from running SQL in the shell! 
Now we will move into the peaceful iPython Notebook where we will happily play with our data for the next month.

## Python PostgreSQL libraries

To connect from Python to the PostgreSQL database installed on your computer, we need a database adapter library. For this class we will use [pg8000](https://github.com/tlocke/pg8000). It is written entirely in Python, and it is easy to install (fingers crossed!).

The other library many people use, [psycopg2](https://pynative.com/python-postgresql-tutorial/), is [much faster](http://blog.fizyk.net.pl/blog/sqlalchemy-speed-tests-on-postgres-and-mysql.html) but tricky to install, so we will stick with the basics. If you find you need greater performance in the future, you will find the coding syntax quite similar. Both of these packages follow the same basic syntax based on the [Python Database API](https://www.python.org/dev/peps/pep-0249/).


## Installing pg8000

First make sure you have `pip` installed on your computer! Then write:

    pip install pg8000
    
(Hopefully this will do the trick. Use sudo if that's what you do.)
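To check whether the install worked, a small sketch like the one below can report the installed version from inside Python. The helper name `installed_version` is made up for this example; it relies only on the standard library's `importlib.metadata` (Python 3.8 or newer).

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical check for this class's adapter library.
version = installed_version("pg8000")
if version is None:
    print("pg8000 is not installed; try: pip install pg8000")
else:
    print("pg8000 version", version)
```

If the message says the package is missing, the `pip` you ran probably belongs to a different Python than the one your notebook uses.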

## Connecting to a database with pg8000

When using a SQL server from Python, two main objects are necessary to make the connection and to send queries and commands:

1. The *connection object*, returned by `pg8000.dbapi.connect()` (`pg8000.connect()` in older versions), which connects you to the database and server. You can pass a few parameters to `connect()`, including the name of the database, user, password and others. [See documentation here](https://github.com/tlocke/pg8000#api-docs).

2. *cursor objects*, which you use to make SQL queries and retrieve data returned from those queries.
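The connection parameters from item 1 can be gathered in one place instead of being typed into every notebook. The sketch below is one way to do that, under some assumptions: the helper names `connection_settings` and `open_connection` are invented here, the `PG...` environment variable names follow the convention of PostgreSQL's own command-line tools (reading them is this sketch's choice, not something pg8000 is assumed to do for you), and the default values are placeholders.

```python
import os


def connection_settings(env=None):
    """Collect keyword arguments for pg8000.dbapi.connect() from a mapping."""
    env = os.environ if env is None else env
    return {
        "user": env.get("PGUSER", "postgres"),        # assumed default
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "database": env.get("PGDATABASE", "mondial"),
        "password": env.get("PGPASSWORD"),            # None means no password
    }


def open_connection(env=None):
    """Open a connection; pg8000 is imported here so the rest runs without it."""
    import pg8000.dbapi
    return pg8000.dbapi.connect(**connection_settings(env))


# Build the settings from an explicit mapping; no database is contacted here.
settings = connection_settings({"PGUSER": "rachelp"})
print(settings["user"], settings["database"], settings["port"])
```

Calling `open_connection()` would then hand back the same kind of connection object described above, provided the server is running and the settings match your setup.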

To create a connection object, call `pg8000`'s `connect()` function:

In [9]:
import pg8000.dbapi
##########for older versions##########
#import pg8000
#conn = pg8000.connect(database="mondial")

##########connecting##########
#You may need to specify the Username or More!!!
#Here are all the possible parameters, you may need a few

#pg8000.dbapi.connect(user, host='localhost', unix_sock=None, port=5432, database=None, password=None, ssl=None, timeout=None, application_name=None, tcp_keepalive=True)
conn = pg8000.dbapi.connect(database="mondial", user="rachelp")
print(type(conn))

<class 'pg8000.dbapi.Connection'>


## Making a query

Now that we've connected, we need to create a cursor object through which we will send commands. (Imagine this as an actual blinking cursor on a page that you will type into.) We create it with the connection object's `.cursor()` method:

In [10]:
cursor = conn.cursor()
print(type(cursor))

<class 'pg8000.dbapi.Cursor'>


The cursor object has several methods of interest to us. You can find all of the properties and methods of the cursor object on the [documentation page](http://pythonhosted.org/pg8000/dbapi.html). The first and most important is `.execute()`, which takes a SQL statement (in a Python string) as a parameter:

In [11]:
#Highest mountains
conn.rollback()
cursor.execute("SELECT * FROM mountain;")

The `.execute()` method performs the query, but doesn't return any values. After calling `.execute()`, you can call the cursor's `.fetchone()`, `.fetchmany()` or `.fetchall()` methods to get rows returned from the query:

In [12]:
#This gets one row, if you run again it gets the next row
cursor.fetchone()

['Gunnbjørn Fjeld', None, Decimal('3694'), None, '(68.92,-29.9)']

In [None]:
#This gets the next 8 rows
cursor.fetchmany(8)

In [None]:
#This gets the rest of the rows (or all of the rows if you call it first)
cursor.fetchall()
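The three fetch methods all draw from the same stream of rows, so each call picks up where the last one stopped. The sketch below shows that behavior and wraps `.fetchmany()` in a hypothetical helper, `iter_batches`, for reading a big result in chunks. So that it runs without a PostgreSQL server, it uses Python's built-in `sqlite3` module as a stand-in. That is an assumption of this example, not part of the class setup: `sqlite3` hands back rows as tuples and uses `?` placeholders, and the table contents are made up.

```python
import sqlite3


def iter_batches(cursor, batch_size):
    """Yield lists of rows until the cursor has no rows left."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty result means the rows are used up
            break
        yield list(batch)


# Stand-in database: an in-memory SQLite table with made-up rows.
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE mountain (name TEXT, elevation REAL)")
demo_cursor.executemany(
    "INSERT INTO mountain VALUES (?, ?)",
    [("Peak %d" % i, 1000.0 + i) for i in range(5)],
)

demo_cursor.execute("SELECT name, elevation FROM mountain ORDER BY elevation")
first_row = demo_cursor.fetchone()  # consumes the first of the five rows
batch_sizes = [len(batch) for batch in iter_batches(demo_cursor, 2)]
leftover = demo_cursor.fetchall()   # nothing is left by now
print(first_row, batch_sizes, leftover)
```

The helper only calls `.fetchmany()`, so it should work the same way with a pg8000 cursor after an `.execute()` call.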

To get all of the rows returned from a query, you can also use the cursor object in a `for` loop, like so:

In [1]:
myquery = "SELECT name, max(elevation) FROM mountain GROUP BY name ORDER BY max(elevation) DESC NULLS LAST;"
cursor.execute(myquery)
for row in cursor:
    print(row)


In [None]:
#Same thing but an extra variable
myquery = "SELECT name, max(elevation) FROM mountain GROUP BY name ORDER BY max(elevation) DESC NULLS LAST;"
cursor.execute(myquery)
myresult = cursor.fetchall()
for row in myresult:
    print(row)

In [None]:
#Same thing but with space for a big query
myquery = '''
SELECT name, max(elevation) 
FROM mountain 
GROUP BY name 
ORDER BY max(elevation) DESC NULLS LAST;
'''
cursor.execute(myquery)
myresult = cursor.fetchall()
for row in myresult:
    print(row)


The `.fetchone()` method returns a single row as a LIST of column values. The `.fetchall()` method returns a TUPLE of those LISTS, one list per row.
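Rows that arrive as plain sequences are easier to work with once each value is paired with its column name. A minimal sketch, assuming the cursor exposes the standard `description` attribute from the Python Database API, where each entry starts with the column name. The helper `rows_as_dicts` is invented for this example, and `sqlite3` with made-up rows again stands in for PostgreSQL so the code runs anywhere. Check what your pg8000 version puts in `description` before relying on it.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Pair each remaining row with the column names from cursor.description."""
    column_names = [column[0] for column in cursor.description]
    return [dict(zip(column_names, row)) for row in cursor.fetchall()]


# Stand-in database with made-up rows.
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE mountain (name TEXT, elevation REAL)")
demo_cursor.executemany(
    "INSERT INTO mountain VALUES (?, ?)",
    [("Sample Peak", 1200.0), ("Example Ridge", 800.0)],
)

demo_cursor.execute("SELECT name, elevation FROM mountain ORDER BY elevation DESC")
records = rows_as_dicts(demo_cursor)
for record in records:
    print(record["name"], record["elevation"])
```

Dictionaries like these can be handed directly to tools such as `pandas.DataFrame` later in the class.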

## Flexible Queries
Now that we are in Python we can be more flexible about what goes in and out of our queries. For example, if we are getting names of mountains inputted by the users of our website, we can set up a query to get information about the mountains from the database.

We could build our query as a simple variable (but you will see we don't want to):

In [None]:
userwants = "Kilimanjaro"
badquery = "SELECT elevation FROM mountain WHERE name = '" + userwants + "'"
print(badquery)

In [None]:
cursor.execute(badquery)
cursor.fetchone()

So why did I name that variable "badquery"? A couple of reasons. First, if a mountain has a `'` in its name, like "Pica d'Estats", the query would break.

In [None]:
bad_mountain = "Pica d'Estats"
query = "SELECT elevation FROM mountain WHERE name = '" + bad_mountain + "'"
print(query)

See the **unmatched single quotation marks** there?

For this reason and others, the cursor object's `.execute()` method comes with a built-in means of interpolating values into queries. Use `%s` as a placeholder in your query string wherever you want to insert a value, and then pass a list of the values you want included in the query as the second parameter to `.execute()`:

In [None]:
cursor.execute("SELECT elevation FROM mountain WHERE name = %s",
              ["Pica d'Estats"])
cursor.fetchone()

pg8000 handles quoting the interpolated values properly, and protects you from [SQL injection attacks](https://en.wikipedia.org/wiki/SQL_injection).
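The difference between gluing strings together and passing parameters can be tried without a PostgreSQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, which is an assumption of this example: `sqlite3` writes its placeholder as `?` where pg8000 uses `%s`, and the table holds a single sample row.

```python
import sqlite3

# Stand-in database with one sample row.
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE mountain (name TEXT, elevation REAL)")
demo_cursor.execute(
    "INSERT INTO mountain VALUES (?, ?)", ("Pica d'Estats", 3143.0)
)

name = "Pica d'Estats"

# String concatenation: the apostrophe in the name ends the SQL string early.
try:
    demo_cursor.execute(
        "SELECT elevation FROM mountain WHERE name = '" + name + "'"
    )
    concatenation_failed = False
except sqlite3.OperationalError:
    concatenation_failed = True

# Placeholder: the value travels separately from the SQL text.
demo_cursor.execute("SELECT elevation FROM mountain WHERE name = ?", [name])
row = demo_cursor.fetchone()
print(concatenation_failed, row)
```

The same name that breaks the concatenated query is found without trouble once it is passed as a parameter.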

Here's an example looping through a list of mountains to get their elevation:

In [None]:
user_wants_mount = ["Pica d'Estats","Kilimanjaro","Ararat","Mont Blanc"]
for mt_name in user_wants_mount:
    cursor.execute("SELECT elevation FROM mountain WHERE name = %s",
                   [mt_name])
    height = cursor.fetchone() # fetchone() returns a list w/1 val (or None if no match)
    print(mt_name, height)
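Looping works, but it sends one query per mountain. As an alternative sketch, the whole list can go into a single query with one `%s` placeholder per name. The helper name `build_elevation_query` is hypothetical. Only the placeholders are joined into the SQL text; the user's values still travel separately as parameters.

```python
def build_elevation_query(names):
    """Return (sql, params) selecting name and elevation for the given names."""
    if not names:
        raise ValueError("need at least one mountain name")
    # One %s per name; the names themselves never touch the SQL string.
    placeholders = ", ".join(["%s"] * len(names))
    sql = (
        "SELECT name, elevation FROM mountain "
        "WHERE name IN (" + placeholders + ")"
    )
    return sql, list(names)


sql, params = build_elevation_query(["Pica d'Estats", "Kilimanjaro", "Ararat"])
print(sql)
print(params)
# With a live pg8000 cursor this would be run as: cursor.execute(sql, params)
```

Mountains that are not in the table simply do not appear in the result, so compare the returned names against the requested ones if you need to report misses.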

## Errors in pg8000

Simple errors can leave your connection to the database in a broken state.

In [None]:
cursor = conn.cursor()
cursor.execute("SELECTT elevation FROM mountain WHERE name = 'Ararat'")

... you'll get a syntax error. But you will also keep getting errors even after you fix the typo:

In [None]:
cursor = conn.cursor()
cursor.execute("SELECT elevation FROM mountain WHERE name = 'Ararat'")
cursor.fetchone()

The way to fix this problem is to call the connection object's `rollback` method. This ends the failed transaction so the connection will accept new commands again; you don't need to close and re-open the connection:

In [None]:
conn.rollback()

Now your queries can proceed as planned:

In [None]:
cursor.execute("SELECT elevation FROM mountain WHERE name = 'Ararat'")
cursor.fetchone()
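To avoid getting stuck after every typo, the rollback can be folded into a small wrapper. This is a sketch under assumptions: the helper name `run_query` is made up, and `sqlite3` stands in for PostgreSQL so the code runs without a server. SQLite does not get stuck after a failed statement the way PostgreSQL does, so the stand-in only demonstrates the control flow.

```python
import sqlite3


def run_query(conn, sql, params=()):
    """Run a query and return all rows; roll back if the statement fails."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    except Exception:
        conn.rollback()  # clear the failed transaction before re-raising
        raise


# Stand-in database with one sample row, committed so rollback keeps it.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE mountain (name TEXT, elevation REAL)")
demo_conn.execute("INSERT INTO mountain VALUES (?, ?)", ("Ararat", 5137.0))
demo_conn.commit()

try:
    run_query(demo_conn, "SELECTT elevation FROM mountain")
    typo_failed = False
except sqlite3.OperationalError:
    typo_failed = True

# The connection is still usable for the corrected query.
rows = run_query(
    demo_conn, "SELECT elevation FROM mountain WHERE name = ?", ["Ararat"]
)
print(typo_failed, rows)
```

With pg8000 the same wrapper would take your `conn` object and `%s` placeholders; the error is still raised so you see what went wrong, but the next query starts from a clean transaction.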

## More information

For more information, go to [pg8000's documentation](https://github.com/tlocke/pg8000).