## Lesson - Querying SQL from Python
(The Python scripts are available in the accompanying `.py` files.)

SQLite is a database that doesn't require a standalone server; it stores the entire database as a file on disk. This makes it ideal for working with larger datasets that can fit on disk but not in memory.

By default, Python holds the data set we're working with in memory (RAM), which makes SQLite a compelling alternative for data sets that exceed the available memory (roughly 8 gigabytes on many computers).
Because an entire database lives in a single file, SQLite databases are easy to share; some data sets are available online as SQLite database files (using the extension `.db`).

We can interact with a SQLite database in two main ways:

- Through the `sqlite3` Python module
- Through the SQLite shell

In this lesson, we'll learn how to use the `sqlite3` module to interact with the database.

**Database**
We'll work with a subset of the American Community Survey data on college majors and job outcomes.
The full dataset is available at [FiveThirtyEight's GitHub repository](https://github.com/fivethirtyeight/data/tree/master/college-majors).

Here are descriptions of some of the columns:

- `Rank`: The major's rank by median earnings
- `Major_code`: The major's code or ID
- `Major`: The name of the major
- `Major_category`: The broader category the major belongs to
- `Total`: The total number of people who studied the major
- `Sample_size`: The sample size (unweighted) of graduates with full-time jobs
- `Men`: The number of male graduates
- `Women`: The number of female graduates
- `ShareWomen`: Women as a proportion of the total number of graduates (a number ranging from 0 to 1)
- `Employed`: The number of employed graduates

We've loaded a subset of the data into a table named `recent_grads` in a database. The subset contains the 2010-2012 data for recent college grads only. The database file we'll be working with is called `jobs.db`.

In [1]:
```
import sqlite3
conn = sqlite3.connect("jobs.db")
```

### Cursor objects and Tuples
Before we can execute a query, we need to express our SQL query as a `string`. While we use the `Connection` class to represent the database we're working with, we use the `Cursor` class to:

- Run a query against the database
- Parse the results from the database
- Convert the results to native Python objects
- Store the results within the `Cursor` instance

After we run a query, the `Cursor` instance holds the results internally as tuples until we fetch them.
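
The Cursor responsibilities listed above can be seen in a small sketch. Since `jobs.db` may not be at hand, the sketch assumes a throwaway in-memory database (`:memory:`) with a cut-down, hypothetical version of the `recent_grads` table holding three rows copied from this lesson's output.

```python
import sqlite3

# Assumption: an in-memory stand-in for jobs.db with only two columns
conn = sqlite3.connect(":memory:")
conn.execute("create table recent_grads (Major text, Major_category text);")
conn.executemany(
    "insert into recent_grads values (?, ?);",
    [
        ("PETROLEUM ENGINEERING", "Engineering"),
        ("MINING AND MINERAL ENGINEERING", "Engineering"),
        ("METALLURGICAL ENGINEERING", "Engineering"),
    ],
)

cursor = conn.cursor()
# Run the query against the database
cursor.execute("select Major, Major_category from recent_grads;")
# The rows come back as native Python objects: a list of tuples
rows = cursor.fetchall()
print(type(rows).__name__, type(rows[0]).__name__)
print(rows[0])
conn.close()
```

Each row is an ordinary tuple, so everything in the next section about tuples applies to query results.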


### Working with Sequences of Values as Tuples
A [tuple](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences) is a core data structure that Python uses to represent a sequence of values, similar to a list. Unlike lists, tuples are immutable, which means we can't modify existing ones. Python represents each row in the results set as a tuple.

To create an empty tuple, assign a pair of empty parentheses to a variable: `t=()`

Python indexes tuples from 0 to n-1, just like it does with lists. We access the values in a tuple using bracket notation.
```
t = ('Apple', 'Banana')
apple = t[0] 
banana = t[1]
```
Tuples are generally more lightweight than lists (they take less memory and are quicker to create), which helps with larger databases and larger results sets.

----------------------------------------------
```
>>> t = 12345, 54321, 'hello!'
>>> t[0]
12345
>>> t
(12345, 54321, 'hello!')
>>> # Tuples may be nested:
... u = t, (1, 2, 3, 4, 5)
>>> u
((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))
>>> # Tuples are immutable:
... t[0] = 88888
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> # but they can contain mutable objects:
... v = ([1, 2, 3], [3, 2, 1])
>>> v
([1, 2, 3], [3, 2, 1])
```


### Creating a Cursor and Running a Query
We need to use the `Connection` instance method `cursor()` to return a `Cursor` instance corresponding to the database we want to query: `cursor = conn.cursor()`

In the following code block, we:

- Write a basic `select` query that will return all of the values from the `recent_grads` table, and **store this query as a string named `query`**.
- Use the `Cursor` method `execute()` to run the query against our database.
- Return the full results set and store it as `results`.
- Print the first three tuples in the list `results`.

In [2]:
```
import sqlite3
conn = sqlite3.connect("jobs.db")
cursor = conn.cursor()

# SQL query as a string
query = "select * from recent_grads;"
# Execute the query, convert the results to tuples, and store as a local variable
cursor.execute(query)
# Fetch the full results set as a list of tuples
results = cursor.fetchall()
# Display the first three results
print(results[0:3])
```

```
[(0, 1, 2419, 'PETROLEUM ENGINEERING', 'Engineering', 2339, 36, 2057, 282, 0.120564344, 1976, 1849, 270, 1207, 37, 0.018380527, 110000, 95000, 125000, 1534, 364, 193), (1, 2, 2416, 'MINING AND MINERAL ENGINEERING', 'Engineering', 756, 7, 679, 77, 0.10185185199999999, 640, 556, 170, 388, 85, 0.117241379, 75000, 55000, 90000, 350, 257, 50), (2, 3, 2415, 'METALLURGICAL ENGINEERING', 'Engineering', 856, 3, 725, 131, 0.153037383, 648, 558, 133, 340, 16, 0.024096386, 73000, 50000, 105000, 456, 176, 0)]
```
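
Rows returned by `select *` are plain tuples, so it is not obvious which value belongs to which column. One way to label them is the cursor's `description` attribute, sketched below. The in-memory database and the three-column table are assumptions made so the example runs without `jobs.db`; the values come from the first row of the output above.

```python
import sqlite3

# Assumption: an in-memory stand-in for jobs.db with three of the columns
conn = sqlite3.connect(":memory:")
conn.execute("create table recent_grads (Major_code integer, Major text, Total integer);")
conn.execute("insert into recent_grads values (2419, 'PETROLEUM ENGINEERING', 2339);")

cursor = conn.cursor()
cursor.execute("select * from recent_grads;")
# description holds one 7-item tuple per column; the column name is the first item
column_names = [column[0] for column in cursor.description]
row = cursor.fetchone()
# Pair each column name with its value to make the row easier to read
labelled = dict(zip(column_names, row))
print(labelled)
conn.close()
```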


Write a query that returns all of the values in the `Major` column from the `recent_grads` table.
- Store the full results set (a list of tuples) in `majors`.
- Then, `print` the first three tuples in `majors`.

In [5]:
```
import sqlite3
conn = sqlite3.connect("jobs.db")
cursor = conn.cursor()

# SQL query as a string
query = "select Major from recent_grads;"
# Execute the query, convert the results to tuples, and store as a local variable
cursor.execute(query)
# Fetch the full results set as a list of tuples
majors = cursor.fetchall()
# Display the first three results
print(majors[0:3])
```

```
[('PETROLEUM ENGINEERING',), ('MINING AND MINERAL ENGINEERING',), ('METALLURGICAL ENGINEERING',)]
```
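
Even when a query selects a single column, each row is still a tuple, which is why every value above is followed by a trailing comma. A minimal sketch of pulling the bare strings out, using the three rows from the output above (the names `first_major` and `major_names` are only illustrative):

```python
# Rows as they appear in the output above
majors = [
    ("PETROLEUM ENGINEERING",),
    ("MINING AND MINERAL ENGINEERING",),
    ("METALLURGICAL ENGINEERING",),
]

# Each row is a one-element tuple, so index 0 gives the value itself
first_major = majors[0][0]
# Flatten the list of one-element tuples into a list of strings
major_names = [row[0] for row in majors]
print(first_major)
print(major_names)
```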


### Execute as a Shortcut for Running Queries
So far, we've run queries by creating a `Cursor` instance, and then calling the `execute()` method on the instance. The `sqlite3` module actually allows us to skip creating a `Cursor` ourselves by calling the `execute()` method on the `Connection` object itself. The module will create a `Cursor` instance for us under the hood, run our query against the database, and return that `Cursor`, allowing us to skip a step. Here's what the code looks like:
```
conn = sqlite3.connect("jobs.db")
query = "select * from recent_grads;"
conn.execute(query).fetchall()
```
We didn't explicitly create a separate `Cursor` instance ourselves in this code example.
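
To check that the shortcut really does hand back a cursor, here is a small sketch. It assumes an in-memory database with a one-column stand-in for `recent_grads`; the variable names are illustrative.

```python
import sqlite3

# Assumption: an in-memory stand-in for jobs.db
conn = sqlite3.connect(":memory:")
conn.execute("create table recent_grads (Major text);")
conn.execute("insert into recent_grads values ('PETROLEUM ENGINEERING');")

# Connection.execute() creates a cursor, runs the query, and returns the cursor
shortcut_cursor = conn.execute("select Major from recent_grads;")
is_cursor = isinstance(shortcut_cursor, sqlite3.Cursor)
shortcut_results = shortcut_cursor.fetchall()
print(is_cursor)
print(shortcut_results)
conn.close()
```

Because the returned object is a regular `Cursor`, the fetch methods covered in the next section work on it as well.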

### Fetching a Specific Number of Results
To make it easier to work with large results sets, the `Cursor` class allows us to control the number of results we want to retrieve at any given time. To return a single result (as a tuple), we use the Cursor method `fetchone()`. To return `n` results, we use the Cursor method `fetchmany()`.

Each Cursor instance contains an internal counter that updates every time we retrieve results. When we call the `fetchone()` method, the Cursor instance will return a single result, and then increment its internal counter by 1. This means that if we call `fetchone()` again, the Cursor instance will actually return the second tuple in the results set (and increment by 1 again).

The `fetchmany()` method takes in an integer (`n`) and returns up to `n` results, starting from the current position. It then advances the Cursor instance's counter by the number of results returned.

In the following code, we return the first two results using the `fetchone()` method, then the next five results using the `fetchmany()` method.
```
first_result = cursor.fetchone()
second_result = cursor.fetchone()
next_five_results = cursor.fetchmany(5)

```
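
To see the cursor's position advance, the sketch below runs the same three calls against a hypothetical in-memory table named `numbers` that holds the integers 1 through 8, then drains what is left. The table and the names `remaining` and `exhausted` are assumptions for illustration.

```python
import sqlite3

# Assumption: a throwaway table whose rows make the cursor position easy to follow
conn = sqlite3.connect(":memory:")
conn.execute("create table numbers (n integer);")
conn.executemany("insert into numbers values (?);", [(i,) for i in range(1, 9)])

cursor = conn.cursor()
cursor.execute("select n from numbers order by n;")
first_result = cursor.fetchone()         # (1,)
second_result = cursor.fetchone()        # (2,)
next_five_results = cursor.fetchmany(5)  # rows 3 through 7
remaining = cursor.fetchall()            # whatever is left: [(8,)]
exhausted = cursor.fetchone()            # None once no rows remain
print(first_result, second_result, next_five_results, remaining, exhausted)
conn.close()
```

Once the results are exhausted, `fetchone()` returns `None` and `fetchmany()` returns an empty list, which is a convenient way to detect the end of a results set.
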
Write and run a query that returns the `Major` and `Major_category` columns from `recent_grads`.
Then, fetch the first five results and store them as `five_results`.


In [9]:
```
import sqlite3
conn = sqlite3.connect("jobs.db")
cursor = conn.cursor()

# Select only the two columns we need
query = "select Major, Major_category from recent_grads;"
cursor.execute(query)
# Fetch only the first five results
five_results = cursor.fetchmany(5)
print(five_results)
```

```
[('PETROLEUM ENGINEERING', 'Engineering'), ('MINING AND MINERAL ENGINEERING', 'Engineering'), ('METALLURGICAL ENGINEERING', 'Engineering'), ('NAVAL ARCHITECTURE AND MARINE ENGINEERING', 'Engineering'), ('CHEMICAL ENGINEERING', 'Engineering')]
```


### Closing the Database Connection
Because SQLite can restrict access to the database file while we're connected to it, we should close the connection when we're done working with it. Closing the connection lets other processes access the database, which matters when you're in a production environment or working with other team members.

To close a connection to a database, use the Connection instance method `close()`. When we're working with multiple databases and multiple Connection instances, we want to make sure we call the `close()` method on the correct instance.

After closing the connection, attempting to query the database using any linked `Cursor` instances will raise the following error:
```
ProgrammingError: Cannot operate on a closed database.

```
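
The error can be reproduced safely with a throwaway in-memory database, as in this sketch (the `try`/`except` wrapper and the name `error_message` are illustrative choices, not part of the lesson's scripts):

```python
import sqlite3

# Assumption: an in-memory database, so no file is touched
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
conn.close()

try:
    # The cursor is still linked to the connection we just closed
    cursor.execute("select 1;")
    error_message = None
except sqlite3.ProgrammingError as error:
    error_message = str(error)
print(error_message)
```
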
Close the connection to the database using the Connection instance method `close()`.

In [10]:
```
import sqlite3
conn = sqlite3.connect("jobs.db")
conn.close()
```

### Complete Workflow

- Connect to the database `jobs2.db`, which contains the same data as `jobs.db`.
- Write and execute a query that returns all of the majors (`Major`) in reverse alphabetical order (Z to A).
- Assign the full result set to `reverse_alphabetical`.
- Finally, close the connection to the database.

In [15]:
```
import sqlite3
conn2 = sqlite3.connect("jobs2.db")
cursor = conn2.cursor()

query = "select Major from recent_grads order by Major desc;"
cursor.execute(query)
reverse_alphabetical = cursor.fetchall()
print(reverse_alphabetical[0:3])
conn2.close()
```

```
[('ZOOLOGY',), ('VISUAL AND PERFORMING ARTS',), ('UNITED STATES HISTORY',)]
```
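
If a query fails partway through, an explicit `close()` call at the end of the script may never run. One possible variation on the workflow, sketched here, wraps the connection in `contextlib.closing` from the standard library so it is closed either way. The in-memory database and the three inserted rows (taken from the output above) are assumptions made so the example runs without `jobs2.db`.

```python
import sqlite3
from contextlib import closing

# closing() calls conn.close() when the block exits, even if a query raises
with closing(sqlite3.connect(":memory:")) as conn:
    # Assumption: a one-column stand-in for the recent_grads table
    conn.execute("create table recent_grads (Major text);")
    conn.executemany(
        "insert into recent_grads values (?);",
        [("UNITED STATES HISTORY",), ("ZOOLOGY",), ("VISUAL AND PERFORMING ARTS",)],
    )
    query = "select Major from recent_grads order by Major desc;"
    reverse_alphabetical = conn.execute(query).fetchall()

print(reverse_alphabetical)
```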
