# SQL Basics

Import the required Python packages:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.exc import ProgrammingError
```

To connect to the local PostgreSQL database that we set up for the Meerkat development environment, create an SQLAlchemy engine and open a session on it:

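```python
# Connection URL format: dialect+driver://username:password@host/database
url = 'postgresql+psycopg2://postgres:postgres@localhost/meerkat_db'
engine = create_engine(url)
Session = sessionmaker(bind=engine)  # factory for sessions bound to this engine
session = Session()
```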
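The connection URL follows SQLAlchemy's `dialect+driver://username:password@host/database` pattern. As a quick sketch that uses only Python's standard `urllib.parse` (not SQLAlchemy itself), the pieces of the URL above can be pulled apart like this:

```python
from urllib.parse import urlsplit

url = 'postgresql+psycopg2://postgres:postgres@localhost/meerkat_db'
parts = urlsplit(url)

# The scheme combines the SQLAlchemy dialect and the DBAPI driver
dialect, driver = parts.scheme.split('+')
print("dialect: ", dialect)                 # postgresql
print("driver:  ", driver)                  # psycopg2
print("user:    ", parts.username)          # postgres
print("password:", parts.password)          # postgres
print("host:    ", parts.hostname)          # localhost
print("port:    ", parts.port)              # None: libpq falls back to its default (normally 5432)
print("database:", parts.path.lstrip('/'))  # meerkat_db
```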

## Exercise 1: Basic SQL queries

In this exercise, we write standard SQL queries. PostgreSQL keeps its data in its own storage format rather than in files you can open and edit in a text editor, so we send our queries to the database through the Python client set up above.

For example, this query fetches the name of the location with ID 1:

```python
from sqlalchemy import text  # Plain SQL strings must be wrapped in text() in SQLAlchemy 2.0

sql_query = \
"SELECT NAME FROM LOCATIONS WHERE ID = 1;"

session.commit()  # Commit before and after so each cell starts and ends with a clean session
try:
    result = session.execute(text(sql_query))
    for r in result:
        print(r)
except ProgrammingError as e:
    session.rollback()  # Discard the failed transaction so later queries still run
    print("SQL command wasn't valid:", e)
session.commit()
```


```
('Demo',)
```
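Each result row behaves like a tuple, which is why a single selected column prints as `('Demo',)`. The sketch below reproduces this with Python's built-in `sqlite3` module and a small in-memory stand-in for the "locations" table (the rows are made up for illustration, not taken from meerkat_db). It also passes the ID as a bound parameter instead of pasting it into the SQL string; with SQLAlchemy the equivalent is a named placeholder, e.g. `session.execute(text("SELECT NAME FROM LOCATIONS WHERE ID = :loc_id"), {"loc_id": 1})`.

```python
import sqlite3

# In-memory stand-in for the "locations" table (hypothetical rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT, parent_location INTEGER)")
conn.executemany(
    "INSERT INTO locations (id, name, parent_location) VALUES (?, ?, ?)",
    [(1, "Demo", None), (2, "Region A", 1), (3, "Clinic X", 2)],
)

# Bind the ID as a parameter (sqlite3 uses "?" placeholders)
rows = conn.execute("SELECT name FROM locations WHERE id = ?", (1,)).fetchall()
for r in rows:
    print(r)       # ('Demo',): a one-column row is a 1-tuple
name = rows[0][0]  # take the value out of the first row
print(name)        # Demo
```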


### a)
The "locations" table has a column called parent_location, which refers to the larger region the location is in.

Complete the query below to find the IDs and names of the locations that have location 1 as their parent_location.


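```python
from sqlalchemy import text  # Plain SQL strings must be wrapped in text() in SQLAlchemy 2.0

sql_query = \
"SELECT ID, NAME FROM LOCATIONS WHERE"
# Complete the WHERE clause above

session.commit()  # Commit before and after so each cell starts and ends with a clean session
try:
    result = session.execute(text(sql_query))
    for r in result:
        print(r)
except ProgrammingError as e:
    session.rollback()  # Discard the failed transaction so later queries still run
    print("SQL command wasn't valid:", e)
session.commit()
```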


```
SQL command wasn't valid: (psycopg2.ProgrammingError) syntax error at end of input
LINE 1: SELECT ID, NAME FROM LOCATIONS WHERE
                                            ^
 [SQL: 'SELECT ID, NAME FROM LOCATIONS WHERE']
```
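Because parent_location points back into the same table, the locations form a tree (for example country, region, clinic). The sketch below builds a made-up in-memory `sqlite3` table to list the direct children of location 2 (rather than location 1, so the exercise query is still yours to write) and then walks up the chain of parents in a loop. The table contents and the `children_of` and `ancestors_of` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT, parent_location INTEGER)")
conn.executemany(
    "INSERT INTO locations VALUES (?, ?, ?)",
    [
        (1, "Demo", None),  # top-level location, no parent
        (2, "Region A", 1),
        (3, "Region B", 1),
        (4, "Clinic X", 2),
        (5, "Clinic Y", 2),
    ],
)

def children_of(parent_id):
    """Direct children: rows whose parent_location equals parent_id."""
    return conn.execute(
        "SELECT id, name FROM locations WHERE parent_location = ? ORDER BY id",
        (parent_id,),
    ).fetchall()

def ancestors_of(location_id):
    """Follow parent_location upwards until a location without a parent is reached."""
    chain = []
    current = location_id
    while True:
        (parent,) = conn.execute(
            "SELECT parent_location FROM locations WHERE id = ?", (current,)
        ).fetchone()
        if parent is None:
            return chain
        chain.append(parent)
        current = parent

print(children_of(2))   # [(4, 'Clinic X'), (5, 'Clinic Y')]
print(ancestors_of(4))  # [2, 1]
```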


### b)
The table "data" holds rows of pre-calculated variables. The Abacus module calculates these from the ODK data sent by the tablets, and most of the reports query their data from this table.

Using an SQL reference or tutorial, e.g.
https://www.w3schools.com/sql/default.asp,
edit the query below to count the entries with type "case" and a date after 2016/07/01.

```python
from sqlalchemy import text  # Plain SQL strings must be wrapped in text() in SQLAlchemy 2.0

sql_query = \
""
# Write your own SQL query
# HINT 1: To combine more than one filter condition, see the w3schools section on the SQL AND and OR operators
# HINT 2: To aggregate several rows into e.g. sums or counts, see the w3schools sections on SQL functions such as COUNT()

session.commit()  # Commit before and after so each cell starts and ends with a clean session
try:
    result = session.execute(text(sql_query))
    for r in result:
        print(r)
except ProgrammingError as e:
    session.rollback()  # Discard the failed transaction so later queries still run
    print("SQL command wasn't valid:", e)
session.commit()
```


```
SQL command wasn't valid: (psycopg2.ProgrammingError) can't execute an empty query
```
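Filtering on two conditions and counting the matches combines `WHERE ... AND ...` with the aggregate `COUNT(*)`, which collapses all matching rows into a single number. Below is a sketch on a made-up in-memory `sqlite3` table; it counts a different, hypothetical combination (type "visit" on or after 2017-01-01) so the exercise query is still yours to write. SQLite has no timestamp type, so the dates are stored as ISO `YYYY-MM-DD` strings, which compare correctly as text; in PostgreSQL the date column holds timestamps, and a literal such as '2016-07-01' is converted to a timestamp for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, type TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO data (type, date) VALUES (?, ?)",
    [  # hypothetical rows
        ("case", "2016-12-30"),
        ("visit", "2016-12-31"),
        ("visit", "2017-01-01"),
        ("visit", "2017-02-15"),
        ("case", "2017-02-16"),
    ],
)

# COUNT(*) returns one row containing one value: the number of matching rows
(count,) = conn.execute(
    "SELECT COUNT(*) FROM data WHERE type = ? AND date >= ?",
    ("visit", "2017-01-01"),
).fetchone()
print(count)  # 2
```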


## Exercise 2: Querying JSON objects in PostgreSQL

The basic SQL used in Exercise 1 has no way to look inside JSON values. PostgreSQL provides this through its own JSON data types (json and jsonb) and the operators that come with them.

The main data table "data" has several JSON columns. The column "variables" is a JSON column that has pre-calculated variables as key-value pairs. These variables are used to filter and sum data in the frontend reports.

This sample query fetches the rows of the "data" table whose "variables" JSON column contains both of the following key-value pairs:
 - gen_1 : 1
 - nat_1 : 1

```python
from sqlalchemy import text  # Plain SQL strings must be wrapped in text() in SQLAlchemy 2.0

sql_query = \
"SELECT ID, CLINIC, DATE FROM DATA WHERE VARIABLES->>'gen_1' = '1' AND VARIABLES->>'nat_1' = '1'"
# ->> extracts the value stored under a key in a JSON column and returns it as text,
# which is why the values are compared with the string '1'

session.commit()  # Commit before and after so each cell starts and ends with a clean session
try:
    result = session.execute(text(sql_query))
    for r in result:
        print("ID:", r.id, " clinic:", r.clinic, " date:", r.date)
except ProgrammingError as e:
    session.rollback()  # Discard the failed transaction so later queries still run
    print("SQL command wasn't valid:", e)
session.commit()
```


```
ID: 9  clinic: 10  date: 2017-02-18 00:00:00
ID: 5  clinic: 7  date: 2017-02-18 00:00:00
ID: 8  clinic: 8  date: 2017-02-18 00:00:00
ID: 32  clinic: 10  date: 2017-02-11 00:00:00
ID: 21  clinic: 10  date: 2017-02-27 00:00:00
ID: 22  clinic: 10  date: 2017-02-17 00:00:00
ID: 25  clinic: 7  date: 2017-02-24 00:00:00
ID: 26  clinic: 7  date: 2017-02-16 00:00:00
ID: 27  clinic: 8  date: 2017-02-23 00:00:00
ID: 28  clinic: 8  date: 2017-02-13 00:00:00
ID: 34  clinic: 7  date: 2017-02-26 00:00:00
ID: 46  clinic: 8  date: 2017-02-20 00:00:00
ID: 58  clinic: 11  date: 2017-02-13 00:00:00
ID: 68  clinic: 7  date: 2017-02-25 00:00:00
ID: 82  clinic: 7  date: 2017-02-26 00:00:00
ID: 86  clinic: 8  date: 2017-02-26 00:00:00
ID: 90  clinic: 11  date: 2017-02-11 00:00:00
ID: 94  clinic: 7  date: 2017-02-20 00:00:00
ID: 98  clinic: 7  date: 2017-02-12 00:00:00
ID: 102  clinic: 7  date: 2017-02-24 00:00:00
ID: 106  clinic: 8  date: 2017-02-10 00:00:00
ID: 111  clinic: 8  date: 2017-02-14 00:00:00
```
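Because `->>` returns text, the sample query compares against the string '1'; in PostgreSQL, writing `VARIABLES->>'gen_1' = 1` with an integer fails with an "operator does not exist: text = integer" error. Below is a rough pure-Python sketch of what `->>` does for a single key, assuming the column holds a JSON object. The `arrow_text` helper and the sample rows are hypothetical, and nested values may differ from PostgreSQL's output in whitespace.

```python
import json

def arrow_text(json_doc, key):
    """Rough equivalent of PostgreSQL's  json_doc ->> key  for a JSON object.

    Returns None (SQL NULL) if the key is missing or its value is JSON null;
    strings come back unquoted, everything else as its JSON text.
    """
    value = json.loads(json_doc).get(key)
    if value is None:
        return None
    if isinstance(value, str):
        return value
    return json.dumps(value)

# Hypothetical "variables" values, keyed by row ID
rows = {
    1: '{"gen_1": 1, "nat_1": 1}',
    2: '{"gen_1": 1, "nat_2": 1}',
    3: '{"gen_2": 1, "nat_1": 1}',
}

# Same filter as the sample query: both values must equal the text '1'
matches = [
    row_id for row_id, doc in rows.items()
    if arrow_text(doc, "gen_1") == "1" and arrow_text(doc, "nat_1") == "1"
]
print(matches)                                   # [1]
print(arrow_text('{"gen_1": 1}', "gen_1") == 1)  # False: the result is the string '1'
```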

### a) 
Write a query that counts the rows in the "data" table whose date is between 2017/02/19 and 2017/02/23 and that have either the variable "nat_1" or "nat_2" equal to 1.


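```python
from sqlalchemy import text  # Plain SQL strings must be wrapped in text() in SQLAlchemy 2.0

sql_query = \
""
# Write your SQL query here

session.commit()  # Commit before and after so each cell starts and ends with a clean session
try:
    result = session.execute(text(sql_query))
    for r in result:
        print(r)
except ProgrammingError as e:
    session.rollback()  # Discard the failed transaction so later queries still run
    print("SQL command wasn't valid:", e)
session.commit()
```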


```
SQL command wasn't valid: (psycopg2.ProgrammingError) can't execute an empty query
```
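When mixing AND and OR, keep in mind that AND binds more tightly than OR, so the "either ... or" part of this condition needs parentheses. The sketch below shows the difference on a made-up in-memory `sqlite3` table in which nat_1 and nat_2 are plain integer columns (in meerkat_db they live inside the "variables" JSON column, so you would reach them with `->>` as in the sample query). Note also that BETWEEN includes both end points; if the date column held timestamps with a time of day, an upper bound of '2017-02-23' would only match midnight on that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, date TEXT, nat_1 INTEGER, nat_2 INTEGER)")
conn.executemany(
    "INSERT INTO data (date, nat_1, nat_2) VALUES (?, ?, ?)",
    [  # hypothetical rows
        ("2017-02-18", 0, 1),  # before the range, nat_2
        ("2017-02-20", 1, 0),  # in range, nat_1
        ("2017-02-21", 0, 1),  # in range, nat_2
        ("2017-02-22", 0, 0),  # in range, neither
        ("2017-02-25", 1, 0),  # after the range, nat_1
    ],
)

# Without parentheses this parses as (date BETWEEN ... AND nat_1 = 1) OR nat_2 = 1
unbracketed = conn.execute(
    "SELECT COUNT(*) FROM data "
    "WHERE date BETWEEN '2017-02-19' AND '2017-02-23' AND nat_1 = 1 OR nat_2 = 1"
).fetchone()[0]

# With parentheses the two alternatives form a single condition, as intended
bracketed = conn.execute(
    "SELECT COUNT(*) FROM data "
    "WHERE date BETWEEN '2017-02-19' AND '2017-02-23' AND (nat_1 = 1 OR nat_2 = 1)"
).fetchone()[0]

print(unbracketed, bracketed)  # 3 2
```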


### b) 
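Rewrite the query above using the "@>" or "<@" JSON containment operators. Refer to the PostgreSQL documentation on JSON functions and operators for guidelines. These operators are defined for jsonb, so if the column is of type json you may need to cast it first (e.g. VARIABLES::jsonb). Also leave a space after each colon inside a JSON literal (write '{"nat_1": 1}', not '{"nat_1":1}'): SQLAlchemy reads a colon followed directly by a word character as a bind parameter.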

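```python
from sqlalchemy import text  # Plain SQL strings must be wrapped in text() in SQLAlchemy 2.0

sql_query = \
""
# Write your SQL query here

session.commit()  # Commit before and after so each cell starts and ends with a clean session
try:
    result = session.execute(text(sql_query))
    for r in result:
        print(r)
except ProgrammingError as e:
    session.rollback()  # Discard the failed transaction so later queries still run
    print("SQL command wasn't valid:", e)
session.commit()
```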


```
SQL command wasn't valid: (psycopg2.ProgrammingError) can't execute an empty query
```
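`@>` asks whether the left jsonb value contains the right one, and `<@` is the same test with the operands swapped. The matching is structural and type-aware: a stored `{"gen_1": 1, "nat_1": 1}` contains `{"nat_1": 1}`, but `{"nat_1": "1"}` does not match it because a string is not a number. The pure-Python function below is a simplified sketch of these rules for illustration only; `jsonb_contains` is a hypothetical helper, and it ignores PostgreSQL's special case in which an array can contain a bare scalar.

```python
import json

def _kind(value):
    """JSON type of a decoded value (bool is checked before int, since True == 1 in Python)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"

def jsonb_contains(left, right):
    """Simplified model of PostgreSQL's  left @> right  for decoded JSON values."""
    if _kind(left) != _kind(right):
        return False
    if isinstance(right, dict):
        # Every key on the right must exist on the left with a contained value
        return all(k in left and jsonb_contains(left[k], v) for k, v in right.items())
    if isinstance(right, list):
        # Every element on the right must be contained in some element on the left
        return all(any(jsonb_contains(item, r) for item in left) for r in right)
    # Scalars of the same JSON type must be equal
    return left == right

variables = json.loads('{"gen_1": 1, "nat_1": 1}')
print(jsonb_contains(variables, {"nat_1": 1}))    # True
print(jsonb_contains(variables, {"nat_1": "1"}))  # False: string "1" is not number 1
print(jsonb_contains(variables, {"nat_2": 1}))    # False: key missing
# '{"nat_1": 1}' <@ variables is the same test as variables @> '{"nat_1": 1}'
```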
