# Converting TLC SQL Queries to Python DataFrames
- This notebook is adapted from our 6/7 class lecture to fit my capstone project.

In this notebook, you'll see how to connect to a PostgreSQL database using the `sqlalchemy` library, run a SQL query, and load the results into a pandas DataFrame.

For this notebook, you'll need the `sqlalchemy`, `psycopg2`, and `pandas` libraries installed.

In [1]:
from sqlalchemy import create_engine

First, we need to create a connection string. The format is

 ```<dialect(+driver)>://<username>:<password>@<hostname>:<port>/<database>```

To connect to the local database used here, build the connection string as follows.

In [2]:
database_name = 'ag_aid'

connection_string = f"postgresql://postgres:postgres@localhost:5432/{database_name}"
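If a username or password contains characters such as `@` or `/`, they collide with the separators in the URL format above and need to be percent-encoded first. The sketch below shows one way to do that with the standard library. The helper name `build_connection_string` and the password `p@ss/word` are hypothetical and only serve the illustration.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port, database, dialect="postgresql"):
    """Assemble a <dialect>://<user>:<password>@<host>:<port>/<database> URL.

    The credentials are percent-encoded so that characters like '@' or '/'
    are not mistaken for URL separators.
    """
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical password containing '@' and '/', for illustration only.
example_url = build_connection_string("postgres", "p@ss/word", "localhost", 5432, "ag_aid")
print(example_url)
# postgresql://postgres:p%40ss%2Fword@localhost:5432/ag_aid
```

The resulting string can be passed to `create_engine` in the same way as the hand-written one.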

Now, we need to create an engine and use it to connect.

In [3]:
engine = create_engine(connection_string)

Now, we can write our query and pass it to the engine's `.execute()` method.

In [5]:
query = '''
WITH cit AS (
SELECT
    (RIGHT(begin_date,2)::int) AS year,
    job_title,
    workers_req,
    SUM(workers_req::FLOAT) OVER() as wr_total,
    employer_state,
    worksite_state
FROM main
WHERE job_title ILIKE '%%citrus%%'
    AND (RIGHT(begin_date,2)::int) >= 10
GROUP BY year, job_title, employer_state, workers_req, worksite_state
ORDER BY year)

SELECT year,
    job_title,
    workers_req,
    wr_total,
    SUM(workers_req::FLOAT) OVER(PARTITION BY year) AS wr_total_year,
    employer_state,
    worksite_state
FROM cit
''' # Triple quotes let the query string span multiple lines

result = engine.execute(query)
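Note that `engine.execute()` is only available in older SQLAlchemy releases; it was removed in version 2.0. A minimal sketch of the connection-based pattern that works in both older and newer versions is shown below. To keep it runnable without a Postgres server, it assumes an in-memory SQLite database with a tiny stand-in `main` table; the `APPLE PICKER` row is made up for the example.

```python
from sqlalchemy import create_engine, text

# Assumption: in-memory SQLite stands in for the Postgres database.
demo_engine = create_engine("sqlite:///:memory:")

with demo_engine.connect() as conn:
    # Build a tiny stand-in for the `main` table.
    conn.execute(text("CREATE TABLE main (job_title TEXT, workers_req TEXT)"))
    conn.execute(
        text("INSERT INTO main (job_title, workers_req) VALUES (:title, :req)"),
        [
            {"title": "CITRUS HAND HARVESTER", "req": "117"},
            {"title": "APPLE PICKER", "req": "30"},  # made-up row
        ],
    )

    # Wrap raw SQL in text() and pass values as bound parameters.
    demo_result = conn.execute(
        text("SELECT job_title, workers_req FROM main WHERE job_title LIKE :pattern"),
        {"pattern": "%citrus%"},
    )
    rows = demo_result.fetchall()

print(rows)
```

Passing the search pattern as a bound parameter also avoids having to escape the `%` characters inside the SQL string.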

You can then fetch the results as tuples using either `fetchone` or `fetchall`:

In [6]:
result.fetchone()

(10, 'CITRUS HAND HARVESTER', '117', 57652.0, 1045.0, 'FL', 'FL')

In [7]:
result.fetchall()

[(10, 'CITRUS HAND HARVESTER', '121', 57652.0, 1045.0, 'FL', 'FL'),
 (10, 'CITRUS HAND HARVESTER', '156', 57652.0, 1045.0, 'FL', 'FL'),
 (10, 'CITRUS HAND HARVESTER', '16', 57652.0, 1045.0, 'FL', 'FL'),
 (10, 'CITRUS HAND HARVESTER', '47', 57652.0, 1045.0, 'FL', 'FL'),
 (10, 'CITRUS HAND HARVESTER', '64', 57652.0, 1045.0, 'FL', 'FL'),
 (10, 'CITRUS HAND HARVESTER', '97', 57652.0, 1045.0, 'FL', 'FL'),
 (10, 'CITRUS HAND HARVESTING', '200', 57652.0, 1045.0, 'FL', 'FL'),
 (10, 'CITRUS HAND HAVESTER', '210', 57652.0, 1045.0, 'MS', 'FL'),
 (10, 'CITRUS TRUCK DRIVER, HAULER', '17', 57652.0, 1045.0, 'FL', 'FL'),
 (11, 'CITRUS FRUIT HARVESTING', '24', 57652.0, 2115.0, 'FL', 'FL'),
 (11, 'CITRUS HAND HARVESTER', '106', 57652.0, 2115.0, 'FL', 'FL'),
 (11, 'CITRUS HAND HARVESTER', '137', 57652.0, 2115.0, 'FL', 'FL'),
 (11, 'CITRUS HAND HARVESTER', '151', 57652.0, 2115.0, 'FL', 'FL'),
 (11, 'CITRUS HAND HARVESTER', '16', 57652.0, 2115.0, 'FL', 'FL'),
 (11, 'CITRUS HAND HARVESTER', '24', 57652.0, 2
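Notice that `fetchall()` above starts at the row with `'121'`, not `'117'`: the result is read forward only, so the row already returned by `fetchone()` is not repeated. The sketch below reproduces that behavior with Python's built-in `sqlite3` module as a stand-in for the Postgres result; the three rows are taken from the output above, and the in-memory table is an assumption made so the example runs anywhere.

```python
import sqlite3

# Assumption: in-memory SQLite table used only to demonstrate fetch behavior.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE main (job_title TEXT, workers_req TEXT)")
demo_conn.executemany(
    "INSERT INTO main VALUES (?, ?)",
    [
        ("CITRUS HAND HARVESTER", "117"),
        ("CITRUS HAND HARVESTER", "121"),
        ("CITRUS HAND HARVESTER", "156"),
    ],
)

cursor = demo_conn.execute(
    "SELECT job_title, workers_req FROM main ORDER BY workers_req"
)
first = cursor.fetchone()   # consumes the first row
rest = cursor.fetchall()    # returns only the rows not yet fetched
again = cursor.fetchall()   # nothing is left, so this is an empty list
demo_conn.close()

print(first)
print(rest)
print(again)
```

If you need every row, call `fetchall()` on a fresh result rather than after `fetchone()`.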

Alternatively, pandas can run the query through the SQLAlchemy engine and return the results directly as a DataFrame.

In [8]:
import pandas as pd

In [9]:
cit = pd.read_sql(query, con = engine)
cit.head()

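|   | year | job_title | workers_req | wr_total | wr_total_year | employer_state | worksite_state |
|---|------|-----------|-------------|----------|---------------|----------------|----------------|
| 0 | 10 | CITRUS HAND HARVESTER | 117 | 57652.0 | 1045.0 | FL | FL |
| 1 | 10 | CITRUS HAND HARVESTER | 121 | 57652.0 | 1045.0 | FL | FL |
| 2 | 10 | CITRUS HAND HARVESTER | 156 | 57652.0 | 1045.0 | FL | FL |
| 3 | 10 | CITRUS HAND HARVESTER | 16 | 57652.0 | 1045.0 | FL | FL |
| 4 | 10 | CITRUS HAND HARVESTER | 47 | 57652.0 | 1045.0 | FL | FL |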
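The tuples returned earlier show `workers_req` as text (for example `'117'`), so the column will most likely arrive in the DataFrame as strings rather than numbers. A minimal sketch of converting it and recomputing the per-year total in pandas follows. It rebuilds the ten year-10 rows from the `fetchone`/`fetchall` output by hand instead of querying the database, and the name `cit_sample` is only for this example.

```python
import pandas as pd

# The ten year-10 rows from the fetchone/fetchall output, typed in by hand;
# workers_req is kept as text, as the database returned it.
cit_sample = pd.DataFrame(
    {
        "year": [10] * 10,
        "workers_req": ["117", "121", "156", "16", "47", "64", "97", "200", "210", "17"],
    }
)

# Convert the text column to numbers so pandas can aggregate it.
cit_sample["workers_req"] = pd.to_numeric(cit_sample["workers_req"])

# Recompute the per-year total that the SQL window function produced.
wr_total_year = cit_sample.groupby("year")["workers_req"].sum()
print(wr_total_year)
```

The year-10 sum comes to 1045, which matches the `wr_total_year` value in the query output.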


For much more information about SQLAlchemy and to see a more “Pythonic” way to execute queries, see Introduction to Databases in Python: https://www.datacamp.com/courses/introduction-to-relational-databases-in-python