# Question 1: Convert XML to a SQL database

Create two tables named `LOW` and `HIGH`, holding the data given for the low and high temperature ranges, respectively.
Each should have the following column names:

- `SPECIES_NAME`
- `TLOW`
- `THIGH`
- `COEFF_1`
- `COEFF_2`
- `COEFF_3`
- `COEFF_4`
- `COEFF_5`
- `COEFF_6`
- `COEFF_7`

Populate the tables using the XML file you created in the last assignment. If you did not complete the last assignment, you may also use the `example_thermo.xml` file.

`TLOW` should refer to the temperature at the low range and `THIGH` should refer to the temperature at the high range.  For example, in the `LOW` table, $H$ would have `TLOW` at $300$ and `THIGH` at $1000$ and in the `HIGH` table, $H$ would have `TLOW` at $1000$ and `THIGH` at $5000$.

For both tables, `COEFF_1` through `COEFF_7` should be populated with the corresponding coefficients for the low temperature data and high temperature data.

In [1]:
import xml.etree.ElementTree as ET
import sqlite3
import pandas as pd
import numpy as np
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)

In [2]:
tree = ET.parse('../HW9/thermo.xml')
root = tree.getroot()
species = root.find("phase")
species_array = species.find("speciesArray").text
species_name = species_array.split()

species_name

['O', 'O2', 'H', 'H2', 'OH', 'H2O', 'HO2', 'H2O2']

In [3]:
dic = {}
for specie in root.findall("speciesData"):
    for s in specie.findall("species"):
        name = s.get("name")
        dic[name] = {}
        coeffs = s.find("thermo").findall("NASA")
        T_max = []
        T_min = []
        coeff = []
        for c in coeffs:
            T_max.append(c.get("TMAX"))
            T_min.append(c.get("TMIN"))   
            coeff.append(c.find("floatArray").text.split(','))
        dic[name]["TMAX"] = T_max
        dic[name]["TMIN"] = T_min
        dic[name]["coeffs"] = coeff
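
The loop above relies on the order of the `NASA` blocks in the file, so index 0 or 1 only means "high" or "low" by convention. A minimal sketch of an order-independent variant follows. The `<ctml>` root tag, the trimmed layout (inferred from the loop), and the placeholder coefficient values are assumptions for illustration, not the contents of `thermo.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed fragment; tag layout inferred from the loop above,
# coefficient values are placeholders rather than real thermo data.
sample_xml = """
<ctml>
  <speciesData>
    <species name="O">
      <thermo>
        <NASA TMAX="5000.0" TMIN="1000.0">
          <floatArray>1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0</floatArray>
        </NASA>
        <NASA TMAX="1000.0" TMIN="300.0">
          <floatArray>0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7</floatArray>
        </NASA>
      </thermo>
    </species>
  </speciesData>
</ctml>
"""

sample_root = ET.fromstring(sample_xml)
parsed = {}
for s in sample_root.iter("species"):
    ranges = []
    for nasa in s.find("thermo").findall("NASA"):
        # float() tolerates the surrounding whitespace left by split(",").
        coeffs = [float(x) for x in nasa.find("floatArray").text.split(",")]
        ranges.append((float(nasa.get("TMIN")), float(nasa.get("TMAX")), coeffs))
    # Sort by TMIN so index 0 is always the low range, whatever the file order.
    ranges.sort(key=lambda r: r[0])
    parsed[s.get("name")] = {"low": ranges[0], "high": ranges[1]}
```

Each entry of `parsed` is then a `(TMIN, TMAX, coefficients)` tuple keyed by `'low'` or `'high'`, which maps directly onto a row of the `LOW` or `HIGH` table.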

In [4]:
# dic

In [5]:
db = sqlite3.connect('HW10.sqlite')
cursor = db.cursor()
cursor.execute("DROP TABLE IF EXISTS HIGH")
cursor.execute("DROP TABLE IF EXISTS LOW")
cursor.execute("PRAGMA foreign_keys=1")

cursor.execute('''CREATE TABLE HIGH (
               SPECIES_NAME TEXT PRIMARY KEY NOT NULL, 
               TLOW FLOAT, 
               THIGH FLOAT, 
               COEFF_1 FLOAT,
               COEFF_2 FLOAT,
               COEFF_3 FLOAT,
               COEFF_4 FLOAT,
               COEFF_5 FLOAT,
               COEFF_6 FLOAT,
               COEFF_7 FLOAT)''')

cursor.execute('''CREATE TABLE LOW (
               SPECIES_NAME TEXT PRIMARY KEY NOT NULL, 
               TLOW FLOAT, 
               THIGH FLOAT, 
               COEFF_1 FLOAT,
               COEFF_2 FLOAT,
               COEFF_3 FLOAT,
               COEFF_4 FLOAT,
               COEFF_5 FLOAT,
               COEFF_6 FLOAT,
               COEFF_7 FLOAT)''')

db.commit() # Commit changes to the database

In [6]:
def viz_tables(cols, query):
    # Each fetched tuple becomes one row; cols supplies the column labels.
    q = cursor.execute(query).fetchall()
    return pd.DataFrame(q, columns=cols)

In [7]:
# Index 0 of each list is the high-temperature NASA block in this file (TMIN = 1000, TMAX = 5000).
for key, value in dic.items():
    vals_to_insert_high = (key, float(value["TMIN"][0]), float(value["TMAX"][0]), float(value["coeffs"][0][0]), \
                           float(value["coeffs"][0][1]), float(value["coeffs"][0][2]), \
                           float(value["coeffs"][0][3]), float(value["coeffs"][0][4]), \
                           float(value["coeffs"][0][5]), float(value["coeffs"][0][6]))
    cursor.execute('''INSERT INTO HIGH 
                  (SPECIES_NAME, TLOW, THIGH, COEFF_1, COEFF_2, COEFF_3, COEFF_4, COEFF_5, COEFF_6, COEFF_7)
                  VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)''', vals_to_insert_high)


In [8]:
HIGH_cols = [col[1] for col in cursor.execute("PRAGMA table_info(HIGH)")]
query_HIGH = '''SELECT * FROM HIGH'''
viz_tables(HIGH_cols, query_HIGH)

Unnamed: 0,SPECIES_NAME,TLOW,THIGH,COEFF_1,COEFF_2,COEFF_3,COEFF_4,COEFF_5,COEFF_6,COEFF_7
0,O,1000.0,5000.0,2.569421,-8.597411e-05,4.194846e-08,-1.001778e-11,1.228337e-15,29217.5791,4.784339
1,O2,1000.0,5000.0,3.282538,0.001483088,-7.579667e-07,2.094706e-10,-2.167178e-14,-1088.45772,5.453231
2,H,1000.0,5000.0,2.5,-2.30843e-11,1.615619e-14,-4.735152e-18,4.981974000000001e-22,25473.6599,-0.446683
3,H2,1000.0,5000.0,3.337279,-4.940247e-05,4.994568e-07,-1.795664e-10,2.002554e-14,-950.158922,-3.205023
4,OH,1000.0,5000.0,3.092888,0.0005484297,1.265052e-07,-8.794616e-11,1.174124e-14,3858.657,4.476696
5,H2O,1000.0,5000.0,3.033992,0.002176918,-1.640725e-07,-9.704199e-11,1.68201e-14,-30004.2971,4.96677
6,HO2,1000.0,5000.0,4.017211,0.00223982,-6.336581e-07,1.142464e-10,-1.079085e-14,111.856713,3.785102
7,H2O2,1000.0,5000.0,4.165003,0.004908317,-1.901392e-06,3.71186e-10,-2.879083e-14,-17861.7877,2.916157


In [9]:
# Index 1 of each list is the low-temperature NASA block in this file (TMIN = 300, TMAX = 1000).
for key, value in dic.items():
    vals_to_insert_low = (key, float(value["TMIN"][1]), float(value["TMAX"][1]), float(value["coeffs"][1][0]), \
                          float(value["coeffs"][1][1]), float(value["coeffs"][1][2]), \
                          float(value["coeffs"][1][3]), float(value["coeffs"][1][4]), \
                          float(value["coeffs"][1][5]), float(value["coeffs"][1][6]))
    cursor.execute('''INSERT INTO LOW 
                  (SPECIES_NAME, TLOW, THIGH, COEFF_1, COEFF_2, COEFF_3, COEFF_4, COEFF_5, COEFF_6, COEFF_7)
                  VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)''', vals_to_insert_low)

In [10]:
LOW_cols = [col[1] for col in cursor.execute("PRAGMA table_info(LOW)")]
query_LOW = '''SELECT * FROM LOW'''
viz_tables(LOW_cols, query_LOW)

Unnamed: 0,SPECIES_NAME,TLOW,THIGH,COEFF_1,COEFF_2,COEFF_3,COEFF_4,COEFF_5,COEFF_6,COEFF_7
0,O,300.0,1000.0,3.168267,-0.003279319,6.643064e-06,-6.128066e-09,2.11266e-12,29122.2592,2.051933
1,O2,300.0,1000.0,3.782456,-0.002996734,9.847302e-06,-9.681295e-09,3.243728e-12,-1063.94356,3.657676
2,H,300.0,1000.0,2.5,7.053328e-13,-1.99592e-15,2.300816e-18,-9.277323e-22,25473.6599,-0.446683
3,H2,300.0,1000.0,2.344331,0.007980521,-1.947815e-05,2.015721e-08,-7.376118e-12,-917.935173,0.68301
4,OH,300.0,1000.0,3.992015,-0.002401318,4.617938e-06,-3.881133e-09,1.364115e-12,3615.08056,-0.103925
5,H2O,300.0,1000.0,4.198641,-0.002036434,6.520402e-06,-5.487971e-09,1.771978e-12,-30293.7267,-0.849032
6,HO2,300.0,1000.0,4.301798,-0.004749121,2.115829e-05,-2.427639e-08,9.292251e-12,294.80804,3.716662
7,H2O2,300.0,1000.0,4.276113,-0.0005428224,1.673357e-05,-2.157708e-08,8.624544e-12,-17702.5821,3.435051


# Question 2: `WHERE` Statements

1. Write a `Python` function `get_coeffs` that returns an array of 7 coefficients.  
   
   The function should take in two parameters: 1.) `species_name` and 2.) `temp_range`, an indicator variable ('low' or 'high') to indicate whether the coefficients should come from the low or high temperature range.  
   The function should use `SQL` commands and `WHERE` statements on the table you just created in Question 1 (rather than taking data from the XML directly).
```python
def get_coeffs(species_name, temp_range):
    ''' Fill in here'''
    return coeffs
```

2. Write a `Python` function `get_species` that returns all species that have a temperature range above or below a given value. The function should take in two parameters: 1.) `temp` and 2.) `temp_range`, an indicator variable ('low' or 'high').

   When `temp_range` is 'low', we are looking for species with a temperature range lower than the given temperature, and for a 'high' `temp_range`, we want species with a temperature range higher than the given temperature.

   This exercise may be useful if different species have different `LOW` and `HIGH` ranges.

   As before, you should accomplish this through `SQL` queries and `WHERE` statements.

```python
def get_species(temp, temp_range):
    ''' Fill in here'''
    return species
```

In [30]:
def get_coeffs(species_name, temp_range):
    # Pick the table from the range indicator; table names cannot be bound as parameters.
    if temp_range == 'high':
        table = 'HIGH'
    elif temp_range == 'low':
        table = 'LOW'
    else:
        raise ValueError("temp_range must be 'low' or 'high'")
    # Bind species_name with a ? placeholder so only the matching row is returned.
    query = '''SELECT COEFF_1, COEFF_2, COEFF_3, COEFF_4, COEFF_5, COEFF_6, COEFF_7
               FROM {} WHERE SPECIES_NAME = ?'''.format(table)
    coeffs = cursor.execute(query, (species_name,)).fetchone()
    return coeffs

In [31]:
get_coeffs('O', 'high')

(2.56942078,
 -8.59741137e-05,
 4.19484589e-08,
 -1.00177799e-11,
 1.22833691e-15,
 29217.5791,
 4.78433864)

In [13]:
# def get_species(temp, temp_range):
    
#     return coeffs
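
The stub above is left unimplemented. A minimal sketch of one possible `get_species` follows, under an assumed reading of the prompt: for `'low'` it returns species whose `TLOW` in the `LOW` table is below `temp`, and for `'high'` it returns species whose `THIGH` in the `HIGH` table is above `temp`. The in-memory database, the explicit `cursor` argument, and the extra species `X` with a 200 to 6000 range are hypothetical, added only so the example runs on its own.

```python
import sqlite3

def get_species(cursor, temp, temp_range):
    """Return species names whose range bound lies below/above temp (assumed reading)."""
    if temp_range == 'low':
        query = "SELECT SPECIES_NAME FROM LOW WHERE TLOW < ?"
    elif temp_range == 'high':
        query = "SELECT SPECIES_NAME FROM HIGH WHERE THIGH > ?"
    else:
        raise ValueError("temp_range must be 'low' or 'high'")
    return [row[0] for row in cursor.execute(query, (temp,))]

# Hypothetical stand-in tables holding only the columns the queries need.
demo_db = sqlite3.connect(':memory:')
demo_cursor = demo_db.cursor()
for table in ('LOW', 'HIGH'):
    demo_cursor.execute(
        "CREATE TABLE {} (SPECIES_NAME TEXT PRIMARY KEY, TLOW FLOAT, THIGH FLOAT)".format(table))
demo_cursor.executemany("INSERT INTO LOW VALUES (?, ?, ?)",
                        [('O', 300.0, 1000.0), ('X', 200.0, 1000.0)])
demo_cursor.executemany("INSERT INTO HIGH VALUES (?, ?, ?)",
                        [('O', 1000.0, 5000.0), ('X', 1000.0, 6000.0)])

low_species = get_species(demo_cursor, 250, 'low')     # species with TLOW < 250
high_species = get_species(demo_cursor, 5500, 'high')  # species with THIGH > 5500
```

With the eight species from Question 1 every range is identical, so any `temp` either matches all of them or none; the hypothetical `X` row is what makes the filter visible here.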

# Question 3: `JOIN` Statements

Create a table named `ALL_TEMPS` that has the following columns:

- `SPECIES_NAME`
- `TEMP_LOW`
- `TEMP_HIGH`

This table should be created by joining the tables `LOW` and `HIGH` on the value `SPECIES_NAME`.

1. Write a `Python` function `get_range` that returns the range of temperatures for a given species_name.

The range should be computed within the `SQL` query (i.e. you should subtract within the `SELECT` statement in the `SQL` query).
```python
def get_range(species_name):
    '''Fill in here'''
    return range
```

Note that `TEMP_LOW` is the lowest temperature in the `LOW` range and `TEMP_HIGH` is the highest temperature in the `HIGH` range.

In [17]:
cursor.execute("DROP TABLE IF EXISTS ALL_TEMPS")
cursor.execute('''CREATE TABLE ALL_TEMPS (
               SPECIES_NAME TEXT PRIMARY KEY NOT NULL, 
               TEMP_LOW FLOAT, 
               TEMP_HIGH FLOAT)''')

db.commit()

In [28]:
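query = '''INSERT INTO ALL_TEMPS (SPECIES_NAME, TEMP_LOW, TEMP_HIGH)
           SELECT LOW.SPECIES_NAME, LOW.TLOW, HIGH.THIGH
           FROM LOW INNER JOIN HIGH ON LOW.SPECIES_NAME = HIGH.SPECIES_NAME'''
cursor.execute(query)

<sqlite3.Cursor at 0x10ed52b20>

In [29]:
ALL_cols = [col[1] for col in cursor.execute("PRAGMA table_info(ALL_TEMPS)")]
query_ALL = '''SELECT * FROM ALL_TEMPS ORDER BY SPECIES_NAME'''
viz_tables(ALL_cols, query_ALL)

Unnamed: 0,SPECIES_NAME,TEMP_LOW,TEMP_HIGH
0,H,300.0,5000.0
1,H2,300.0,5000.0
2,H2O,300.0,5000.0
3,H2O2,300.0,5000.0
4,HO2,300.0,5000.0
5,O,300.0,5000.0
6,O2,300.0,5000.0
7,OH,300.0,5000.0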


In [None]:
# def get_range(species_name):
    
#     return range
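
The stub above is left unimplemented. A minimal sketch of `get_range` follows, assuming `ALL_TEMPS` has been populated from the join of `LOW` and `HIGH`. The subtraction happens inside the `SELECT`, as the prompt asks. The in-memory database and the explicit `cursor` argument are hypothetical so the example is self-contained; the 300 and 5000 bounds for `O` are the values stored in the `LOW` and `HIGH` tables from Question 1.

```python
import sqlite3

def get_range(cursor, species_name):
    """Return TEMP_HIGH - TEMP_LOW for one species, computed by SQLite."""
    query = "SELECT TEMP_HIGH - TEMP_LOW FROM ALL_TEMPS WHERE SPECIES_NAME = ?"
    row = cursor.execute(query, (species_name,)).fetchone()
    # fetchone() yields None when the species is not in the table.
    return None if row is None else row[0]

# Hypothetical stand-in tables with a single species.
demo_db = sqlite3.connect(':memory:')
demo_cursor = demo_db.cursor()
demo_cursor.execute("CREATE TABLE LOW (SPECIES_NAME TEXT PRIMARY KEY, TLOW FLOAT, THIGH FLOAT)")
demo_cursor.execute("CREATE TABLE HIGH (SPECIES_NAME TEXT PRIMARY KEY, TLOW FLOAT, THIGH FLOAT)")
demo_cursor.execute("INSERT INTO LOW VALUES ('O', 300.0, 1000.0)")
demo_cursor.execute("INSERT INTO HIGH VALUES ('O', 1000.0, 5000.0)")

# Build ALL_TEMPS directly from the join of LOW and HIGH.
demo_cursor.execute('''CREATE TABLE ALL_TEMPS AS
                       SELECT LOW.SPECIES_NAME AS SPECIES_NAME,
                              LOW.TLOW AS TEMP_LOW,
                              HIGH.THIGH AS TEMP_HIGH
                       FROM LOW INNER JOIN HIGH
                       ON LOW.SPECIES_NAME = HIGH.SPECIES_NAME''')

temp_range = get_range(demo_cursor, 'O')
```

The result is named `temp_range` rather than `range` to avoid shadowing the Python builtin.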

In [16]:
db.commit()
# db.close()