# Question 1: Convert XML to a SQL database

Create two tables named `LOW` and `HIGH`, holding the data for the low and high temperature ranges, respectively.
Each should have the following column names:

- `SPECIES_NAME`
- `TLOW`
- `THIGH`
- `COEFF_1`
- `COEFF_2`
- `COEFF_3`
- `COEFF_4`
- `COEFF_5`
- `COEFF_6`
- `COEFF_7`

Populate the tables using the XML file you created in the last assignment. If you did not complete the last assignment, you may use the `example_thermo.xml` file instead.

`TLOW` should refer to the temperature at the low range and `THIGH` should refer to the temperature at the high range.  For example, in the `LOW` table, $H$ would have `TLOW` at $300$ and `THIGH` at $1000$ and in the `HIGH` table, $H$ would have `TLOW` at $1000$ and `THIGH` at $5000$.

For both tables, `COEFF_1` through `COEFF_7` should be populated with the corresponding coefficients for the low temperature data and high temperature data.
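
The population step can be made independent of the order in which the two `NASA` elements appear by sorting them on `TMIN`. The following is a minimal sketch, assuming the XML layout `speciesData` > `species` > `thermo` > `NASA` > `floatArray`. The inline XML fragment, its root tag, and the helper name `parse_ranges` are illustrative assumptions rather than part of the assignment; the numbers for H are copied from the tables shown later in this notebook.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only (assumed layout, a single species)
SAMPLE_XML = """
<ctml>
  <speciesData>
    <species name="H">
      <thermo>
        <NASA TMAX="5000.0" TMIN="1000.0">
          <floatArray>
            2.50000001, -2.30842973e-11, 1.61561948e-14, -4.7351523499999995e-18,
            4.98197357e-22, 25473.6599, -0.446682914
          </floatArray>
        </NASA>
        <NASA TMAX="1000.0" TMIN="300.0">
          <floatArray>
            2.5, 7.05332819e-13, -1.9959196400000004e-15, 2.30081632e-18,
            -9.27732332e-22, 25473.6599, -0.446682853
          </floatArray>
        </NASA>
      </thermo>
    </species>
  </speciesData>
</ctml>
"""

def parse_ranges(root):
    """Hypothetical helper: map species name -> {'low': ..., 'high': ...}.

    Each value is a tuple (tmin, tmax, coeffs) with numeric entries.
    """
    result = {}
    for sp in root.iter("species"):
        ranges = []
        for nasa in sp.find("thermo").findall("NASA"):
            # float() ignores the whitespace and newlines around each value
            coeffs = [float(x) for x in nasa.find("floatArray").text.split(",")]
            ranges.append((float(nasa.get("TMIN")), float(nasa.get("TMAX")), coeffs))
        ranges.sort(key=lambda r: r[0])  # lowest TMIN first
        result[sp.get("name")] = {"low": ranges[0], "high": ranges[-1]}
    return result

parsed = parse_ranges(ET.fromstring(SAMPLE_XML))
print(parsed["H"]["low"][:2], parsed["H"]["high"][:2])
```

Sorting on `TMIN` means the low range is picked correctly whether the file lists it first or second.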

In [1]:
import xml.etree.ElementTree as ET
import sqlite3
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)

In [2]:
tree = ET.parse('../HW9/thermo.xml')
root = tree.getroot()
species = root.find("phase")
species_array = species.find("speciesArray").text
species_name = species_array.split()

species_name

['O', 'O2', 'H', 'H2', 'OH', 'H2O', 'HO2', 'H2O2']

In [3]:
dic = {}
for specie in root.findall("speciesData"):
    for s in specie.findall("species"):
        name = s.get("name")
        T_max, T_min, coeff = [], [], []
        # Each <NASA> element holds one temperature range and its 7 coefficients
        for c in s.find("thermo").findall("NASA"):
            T_max.append(c.get("TMAX"))
            T_min.append(c.get("TMIN"))
            # Convert to float so no whitespace or newlines end up in the database
            coeff.append([float(x) for x in c.find("floatArray").text.split(',')])
        dic[name] = {"TMAX": T_max, "TMIN": T_min, "coeffs": coeff}

In [4]:
# dic

In [5]:
db = sqlite3.connect('HW10.sqlite')
cursor = db.cursor()
cursor.execute("DROP TABLE IF EXISTS HIGH")
cursor.execute("DROP TABLE IF EXISTS LOW")
cursor.execute("PRAGMA foreign_keys=1")

# Temperatures and coefficients are declared REAL so that comparisons in
# WHERE clauses are numeric rather than text-based
cursor.execute('''CREATE TABLE HIGH (
               SPECIES_NAME TEXT PRIMARY KEY NOT NULL,
               TLOW REAL,
               THIGH REAL,
               COEFF_1 REAL,
               COEFF_2 REAL,
               COEFF_3 REAL,
               COEFF_4 REAL,
               COEFF_5 REAL,
               COEFF_6 REAL,
               COEFF_7 REAL)''')

cursor.execute('''CREATE TABLE LOW (
               SPECIES_NAME TEXT PRIMARY KEY NOT NULL,
               TLOW REAL,
               THIGH REAL,
               COEFF_1 REAL,
               COEFF_2 REAL,
               COEFF_3 REAL,
               COEFF_4 REAL,
               COEFF_5 REAL,
               COEFF_6 REAL,
               COEFF_7 REAL)''')

db.commit() # Commit changes to the database

In [6]:
def viz_tables(cols, query):
    """Run `query` and return the rows as a DataFrame with column names `cols`."""
    q = cursor.execute(query).fetchall()
    # DataFrame.from_items was removed in pandas 1.0; build the frame from the rows directly
    return pd.DataFrame(q, columns=cols)

In [7]:
# In this XML file the first <NASA> entry (index 0) of each species is the high-temperature range
for key, value in dic.items():
    vals_to_insert_high = (key, float(value["TMIN"][0]), float(value["TMAX"][0]),
                           *value["coeffs"][0][:7])
    cursor.execute('''INSERT INTO HIGH
                  (SPECIES_NAME, TLOW, THIGH, COEFF_1, COEFF_2, COEFF_3, COEFF_4, COEFF_5, COEFF_6, COEFF_7)
                  VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)''', vals_to_insert_high)


In [8]:
HIGH_cols = [col[1] for col in cursor.execute("PRAGMA table_info(HIGH)")]
query_HIGH = '''SELECT * FROM HIGH'''
viz_tables(HIGH_cols, query_HIGH)

| | SPECIES_NAME | TLOW | THIGH | COEFF_1 | COEFF_2 | COEFF_3 | COEFF_4 | COEFF_5 | COEFF_6 | COEFF_7 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | O | 1000.0 | 5000.0 | 2.56942078 | -8.59741137e-05 | 4.19484589e-08 | -1.00177799e-11 | 1.2283369100000001e-15 | 29217.5791 | 4.78433864 |
| 1 | O2 | 1000.0 | 5000.0 | 3.28253784 | 0.00148308754 | -7.57966669e-07 | 2.09470555e-10 | -2.16717794e-14 | -1088.45772 | 5.45323129 |
| 2 | H | 1000.0 | 5000.0 | 2.50000001 | -2.30842973e-11 | 1.61561948e-14 | -4.7351523499999995e-18 | 4.98197357e-22 | 25473.6599 | -0.446682914 |
| 3 | H2 | 1000.0 | 5000.0 | 3.3372792 | -4.94024731e-05 | 4.99456778e-07 | -1.79566394e-10 | 2.00255376e-14 | -950.158922 | -3.20502331 |
| 4 | OH | 1000.0 | 5000.0 | 3.09288767 | 0.000548429716 | 1.26505228e-07 | -8.79461556e-11 | 1.17412376e-14 | 3858.657 | 4.4766961 |
| 5 | H2O | 1000.0 | 5000.0 | 3.03399249 | 0.00217691804 | -1.64072518e-07 | -9.7041987e-11 | 1.68200992e-14 | -30004.2971 | 4.9667701 |
| 6 | HO2 | 1000.0 | 5000.0 | 4.0172109 | 0.00223982013 | -6.3365815e-07 | 1.1424637e-10 | -1.07908535e-14 | 111.856713 | 3.78510215 |
| 7 | H2O2 | 1000.0 | 5000.0 | 4.16500285 | 0.00490831694 | -1.90139225e-06 | 3.71185986e-10 | -2.87908305e-14 | -17861.7877 | 2.91615662 |


In [9]:
# The second <NASA> entry (index 1) of each species is the low-temperature range
for key, value in dic.items():
    vals_to_insert_low = (key, float(value["TMIN"][1]), float(value["TMAX"][1]),
                          *value["coeffs"][1][:7])
    cursor.execute('''INSERT INTO LOW
                  (SPECIES_NAME, TLOW, THIGH, COEFF_1, COEFF_2, COEFF_3, COEFF_4, COEFF_5, COEFF_6, COEFF_7)
                  VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)''', vals_to_insert_low)

In [10]:
LOW_cols = [col[1] for col in cursor.execute("PRAGMA table_info(LOW)")]
query_LOW = '''SELECT * FROM LOW'''
viz_tables(LOW_cols, query_LOW)

| | SPECIES_NAME | TLOW | THIGH | COEFF_1 | COEFF_2 | COEFF_3 | COEFF_4 | COEFF_5 | COEFF_6 | COEFF_7 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | O | 300.0 | 1000.0 | 3.1682671 | -0.00327931884 | 6.64306396e-06 | -6.12806624e-09 | 2.11265971e-12 | 29122.2592 | 2.05193346 |
| 1 | O2 | 300.0 | 1000.0 | 3.78245636 | -0.00299673416 | 9.84730201e-06 | -9.68129509e-09 | 3.24372837e-12 | -1063.94356 | 3.65767573 |
| 2 | H | 300.0 | 1000.0 | 2.5 | 7.05332819e-13 | -1.9959196400000004e-15 | 2.30081632e-18 | -9.27732332e-22 | 25473.6599 | -0.446682853 |
| 3 | H2 | 300.0 | 1000.0 | 2.34433112 | 0.00798052075 | -1.9478151e-05 | 2.01572094e-08 | -7.37611761e-12 | -917.935173 | 0.683010238 |
| 4 | OH | 300.0 | 1000.0 | 3.99201543 | -0.00240131752 | 4.61793841e-06 | -3.88113333e-09 | 1.3641147e-12 | 3615.08056 | -0.103925458 |
| 5 | H2O | 300.0 | 1000.0 | 4.19864056 | -0.0020364341 | 6.52040211e-06 | -5.48797062e-09 | 1.77197817e-12 | -30293.7267 | -0.849032208 |
| 6 | HO2 | 300.0 | 1000.0 | 4.30179801 | -0.00474912051 | 2.11582891e-05 | -2.42763894e-08 | 9.29225124e-12 | 294.80804 | 3.71666245 |
| 7 | H2O2 | 300.0 | 1000.0 | 4.27611269 | -0.000542822417 | 1.67335701e-05 | -2.15770813e-08 | 8.62454363e-12 | -17702.5821 | 3.43505074 |


# Question 2: `WHERE` Statements

1. Write a `Python` function `get_coeffs` that returns an array of 7 coefficients.  
   
   The function should take in two parameters: 1.) `species_name` and 2.) `temp_range`, an indicator variable ('low' or 'high') to indicate whether the coefficients should come from the low or high temperature range.  
   The function should use `SQL` commands and `WHERE` statements on the table you just created in Question 1 (rather than taking data from the XML directly).
```python
def get_coeffs(species_name, temp_range):
    ''' Fill in here'''
    return coeffs
```
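
One possible shape for `get_coeffs` is sketched below. It is a sketch under stated assumptions, not a reference solution: it runs against a stand-in in-memory database that holds only H (values copied from the `LOW` and `HIGH` tables above), and the optional `cursor` argument and the names `demo_db`, `demo_cursor`, `COEFF_COLS` and `H_ROWS` are hypothetical additions for the demo.

```python
import sqlite3

# Stand-in for the Question 1 database: in-memory, holding only H
demo_db = sqlite3.connect(":memory:")
demo_cursor = demo_db.cursor()
COEFF_COLS = [f"COEFF_{i}" for i in range(1, 8)]
H_ROWS = {
    "LOW": ("H", 300.0, 1000.0, 2.5, 7.05332819e-13, -1.9959196400000004e-15,
            2.30081632e-18, -9.27732332e-22, 25473.6599, -0.446682853),
    "HIGH": ("H", 1000.0, 5000.0, 2.50000001, -2.30842973e-11, 1.61561948e-14,
             -4.7351523499999995e-18, 4.98197357e-22, 25473.6599, -0.446682914),
}
for table, row in H_ROWS.items():
    col_defs = ", ".join(f"{c} REAL" for c in COEFF_COLS)
    demo_cursor.execute(
        f"CREATE TABLE {table} (SPECIES_NAME TEXT PRIMARY KEY, TLOW REAL, THIGH REAL, {col_defs})"
    )
    demo_cursor.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", row)

def get_coeffs(species_name, temp_range, cursor=demo_cursor):
    """Return the 7 coefficients of `species_name` from the LOW or HIGH table."""
    tables = {"low": "LOW", "high": "HIGH"}
    if temp_range not in tables:
        raise ValueError("temp_range must be 'low' or 'high'")
    # The table name comes from the fixed dict above; the species name is bound with '?'
    query = f"SELECT {', '.join(COEFF_COLS)} FROM {tables[temp_range]} WHERE SPECIES_NAME = ?"
    row = cursor.execute(query, (species_name,)).fetchone()
    if row is None:
        raise KeyError(f"species {species_name!r} not found")
    return list(row)

print(get_coeffs("H", "low"))
```

Table names cannot be bound as `?` parameters, so the sketch picks the table from a fixed dictionary instead of formatting user input into the query.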

2. Write a `Python` function `get_species` that returns all species that have a temperature range above or below a given value. The function should take in two parameters: 1.) `temp` and 2.) `temp_range`, an indicator variable ('low' or 'high').

   When `temp_range` is 'low', we are looking for species with a temperature range lower than the given temperature, and when `temp_range` is 'high', we want species with a temperature range higher than the given temperature.

   This exercise may be useful if different species have different `LOW` and `HIGH` ranges.

   As before, you should accomplish this through `SQL` queries and `WHERE` statements.

```python
def get_species(temp, temp_range):
    ''' Fill in here'''
    return species
```
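
The wording above leaves some room for interpretation. The sketch below assumes one reading: for 'low' it returns the species whose `TLOW` in the `LOW` table is below `temp`, and for 'high' the species whose `THIGH` in the `HIGH` table is above `temp`. The stand-in in-memory tables (temperature columns only, for O and H), the optional `cursor` argument, and the names `demo_db` and `demo_cursor` are hypothetical and serve only to make the example self-contained.

```python
import sqlite3

# Stand-in tables with the temperature columns only (bounds as in Question 1)
demo_db = sqlite3.connect(":memory:")
demo_cursor = demo_db.cursor()
for table, tlow, thigh in (("LOW", 300.0, 1000.0), ("HIGH", 1000.0, 5000.0)):
    demo_cursor.execute(
        f"CREATE TABLE {table} (SPECIES_NAME TEXT PRIMARY KEY, TLOW REAL, THIGH REAL)"
    )
    demo_cursor.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?)",
        [(name, tlow, thigh) for name in ("O", "H")],
    )

def get_species(temp, temp_range, cursor=demo_cursor):
    """Return the species names whose range bound lies below/above `temp`."""
    if temp_range == "low":
        # Assumed reading: lower bound of the low range is below temp
        query = "SELECT SPECIES_NAME FROM LOW WHERE TLOW < ? ORDER BY SPECIES_NAME"
    elif temp_range == "high":
        # Assumed reading: upper bound of the high range is above temp
        query = "SELECT SPECIES_NAME FROM HIGH WHERE THIGH > ? ORDER BY SPECIES_NAME"
    else:
        raise ValueError("temp_range must be 'low' or 'high'")
    return [row[0] for row in cursor.execute(query, (temp,))]

print(get_species(500, "low"))
print(get_species(6000, "high"))
```

The comparison is numeric only because the temperature columns are declared `REAL`; with `TEXT` columns SQLite would compare the values as strings.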

# Question 3: `JOIN` Statements

Create a table named `ALL_TEMPS` that has the following columns:

- `SPECIES_NAME`
- `TEMP_LOW`
- `TEMP_HIGH`

This table should be created by joining the tables `LOW` and `HIGH` on the value `SPECIES_NAME`.

1. Write a `Python` function `get_range` that returns the range of temperatures for a given `species_name`.

The range should be computed within the `SQL` query (i.e. you should subtract within the `SELECT` statement in the `SQL` query).
```python
def get_range(species_name):
    '''Fill in here'''
    return range
```

Note that `TEMP_LOW` is the lowest temperature in the `LOW` range and `TEMP_HIGH` is the highest temperature in the `HIGH` range.
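
A minimal sketch of the join and of `get_range` follows. It assumes `ALL_TEMPS` may be built with `CREATE TABLE ... AS SELECT` over an inner join, and it runs against stand-in in-memory `LOW` and `HIGH` tables with the temperature columns only (for O and H). The optional `cursor` argument and the names `demo_db` and `demo_cursor` are hypothetical.

```python
import sqlite3

# Stand-in LOW and HIGH tables (temperature columns only, bounds as in Question 1)
demo_db = sqlite3.connect(":memory:")
demo_cursor = demo_db.cursor()
for table, tlow, thigh in (("LOW", 300.0, 1000.0), ("HIGH", 1000.0, 5000.0)):
    demo_cursor.execute(
        f"CREATE TABLE {table} (SPECIES_NAME TEXT PRIMARY KEY, TLOW REAL, THIGH REAL)"
    )
    demo_cursor.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?)",
        [(name, tlow, thigh) for name in ("O", "H")],
    )

# Join LOW and HIGH on SPECIES_NAME: lowest bound from LOW, highest bound from HIGH
demo_cursor.execute("""
    CREATE TABLE ALL_TEMPS AS
    SELECT LOW.SPECIES_NAME AS SPECIES_NAME,
           LOW.TLOW         AS TEMP_LOW,
           HIGH.THIGH       AS TEMP_HIGH
    FROM LOW
    INNER JOIN HIGH ON LOW.SPECIES_NAME = HIGH.SPECIES_NAME
""")

def get_range(species_name, cursor=demo_cursor):
    """Return TEMP_HIGH - TEMP_LOW for one species, computed inside the query."""
    row = cursor.execute(
        "SELECT TEMP_HIGH - TEMP_LOW FROM ALL_TEMPS WHERE SPECIES_NAME = ?",
        (species_name,),
    ).fetchone()
    if row is None:
        raise KeyError(f"species {species_name!r} not found")
    return row[0]

print(get_range("H"))
```

With the bounds used here (300 and 5000), the subtraction in the `SELECT` statement yields 4700.0 for H.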

In [11]:
db.commit()
# db.close()