# Homework 9: Getting Familiar with NASA Polynomials
## Due Date:  Tuesday, November 7th at 11:59 PM

Read the NASA polynomial data in raw format, parse it, and store it in an .xml file.

### Review of the NASA Polynomials
You can find the NASA Polynomial file in `thermo.txt`.

You can find some details on the NASA Polynomials [at this site](http://combustion.berkeley.edu/gri_mech/data/nasa_plnm.html) in addition to the Lecture 16 notes.


The NASA polynomials for species $i$ have the form:
$$
    \frac{C_{p,i}}{R}= a_{i1} + a_{i2} T + a_{i3} T^2 + a_{i4} T^3 + a_{i5} T^4
$$

$$
    \frac{H_{i}}{RT} = a_{i1} + a_{i2} \frac{T}{2} + a_{i3} \frac{T^2}{3} + a_{i4} \frac{T^3}{4} + a_{i5} \frac{T^4}{5} + \frac{a_{i6}}{T}
$$

$$
    \frac{S_{i}}{R}  = a_{i1} \ln(T) + a_{i2} T + a_{i3} \frac{T^2}{2} + a_{i4} \frac{T^3}{3} + a_{i5} \frac{T^4}{4} + a_{i7}
$$

where $a_{i1}$, $a_{i2}$, $a_{i3}$, $a_{i4}$, $a_{i5}$, $a_{i6}$, and $a_{i7}$ are the numerical coefficients supplied in NASA thermodynamic files. 
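The following minimal sketch evaluates the three expressions above for a single set of coefficients. The function name `nasa7_properties` is hypothetical. The sketch also assumes that `a` already holds the seven coefficients for the temperature range that contains `T`.

```python
import math


def nasa7_properties(a, T):
    """Evaluate Cp/R, H/(RT) and S/R at temperature T (K).

    `a` is [a1, ..., a7] for the range containing T (assumption: the
    caller has already picked the low- or high-temperature set).
    """
    a1, a2, a3, a4, a5, a6, a7 = a
    cp_R = a1 + a2 * T + a3 * T**2 + a4 * T**3 + a5 * T**4
    h_RT = (a1 + a2 * T / 2 + a3 * T**2 / 3 + a4 * T**3 / 4
            + a5 * T**4 / 5 + a6 / T)
    s_R = (a1 * math.log(T) + a2 * T + a3 * T**2 / 2 + a4 * T**3 / 3
           + a5 * T**4 / 4 + a7)
    return cp_R, h_RT, s_R


# Constant Cp/R = 3.5: Cp/R and H/(RT) are both 3.5, S/R is 3.5*ln(300).
print(nasa7_properties([3.5, 0, 0, 0, 0, 0, 0], 300.0))
```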

### Some Notes on `thermo.txt`
The first seven numbers of each species entry, starting on its second line (all five numbers on the second line and the first two on the third line), are the seven coefficients ($a_{i1}$ through $a_{i7}$, respectively) for the high-temperature range (above 1000 K; the upper bound is given on the first line of the species entry). The number at the end of each line of an entry (1 to 4, in column 80) is a line index, not a coefficient.

The next seven numbers (the last three on the third line and all four on the fourth line) are the coefficients ($a_{i1}$ through $a_{i7}$, respectively) for the low-temperature range (below 1000 K; the lower bound is given on the first line of the species entry).
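The sketch below shows how the 14 numbers of one entry map onto the two ranges. It assumes the numbers were read in file order and that the midpoint is 1000 K unless another value is passed. The helper names `split_nasa_coeffs` and `coeffs_for_temperature` are hypothetical. Neither helper checks the outer bounds Tmin and Tmax.

```python
def split_nasa_coeffs(coeffs):
    """Split the 14 coefficients of one entry (file order) into (low, high)."""
    if len(coeffs) != 14:
        raise ValueError(f'expected 14 coefficients, got {len(coeffs)}')
    high = list(coeffs[:7])  # first seven numbers: high-temperature range
    low = list(coeffs[7:])   # next seven numbers: low-temperature range
    return low, high


def coeffs_for_temperature(coeffs, T, Tmid=1000.0):
    """Return the seven coefficients that apply at temperature T (K)."""
    low, high = split_nasa_coeffs(coeffs)
    return high if T >= Tmid else low


# Stand-in values 0..13 make the split easy to see.
demo = list(range(14))
print(coeffs_for_temperature(demo, 300.0))   # [7, 8, 9, 10, 11, 12, 13]
print(coeffs_for_temperature(demo, 1500.0))  # [0, 1, 2, 3, 4, 5, 6]
```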

### Additional Specifications
Your final .xml file should contain the following specifications:

1. A `speciesArray` field that contains a space-separated list of all species in the file.
2. A `species` field for each species, with the species name as its `name` attribute. Within it:

    1. For each temperature range, a sub-field with the minimum and maximum temperatures as attributes.
    2. A `floatArray` field that contains the range's coefficients as a comma-separated string.
    
You can reference the `example_thermo.xml` file for an example .xml output.
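You can sanity-check an output file against the specifications above with a sketch like the one below. It assumes the Cantera CTML-style element names used in the solution later in this notebook (`speciesArray`, `species`, `NASA` with `Tmin`/`Tmax` attributes, `floatArray`) and seven coefficients per range. `check_thermo_xml` is a hypothetical helper, and `example_thermo.xml` may differ in details.

```python
import xml.etree.ElementTree as ET


def check_thermo_xml(xml_text, n_coeffs=7):
    """Return a list of problems found in xml_text (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []

    # 1. speciesArray must list exactly the species defined in the file.
    array = root.find('.//speciesArray')
    listed = array.text.split() if array is not None and array.text else []
    defined = [sp.get('name') for sp in root.iter('species')]
    if sorted(listed) != sorted(defined):
        problems.append(f'speciesArray {listed} does not match species {defined}')

    # 2. Each species needs temperature ranges with bounds and coefficients.
    for sp in root.iter('species'):
        name = sp.get('name')
        ranges = sp.findall('.//NASA')
        if not ranges:
            problems.append(f'{name}: no temperature ranges')
        for rng in ranges:
            if rng.get('Tmin') is None or rng.get('Tmax') is None:
                problems.append(f'{name}: range is missing Tmin or Tmax')
            arr = rng.find('floatArray')
            text = arr.text if arr is not None and arr.text else ''
            fields = [v for v in text.split(',') if v.strip()]
            try:
                values = [float(v) for v in fields]
            except ValueError:
                problems.append(f'{name}: non-numeric coefficient in {text!r}')
                continue
            if len(values) != n_coeffs:
                problems.append(
                    f'{name}: expected {n_coeffs} coefficients, got {len(values)}')
    return problems


sample = (
    "<ctml><phase><speciesArray>H2</speciesArray></phase><speciesData>"
    "<species name='H2'><thermo>"
    "<NASA Tmin='200' Tmax='1000'><floatArray>1, 2, 3, 4, 5, 6, 7</floatArray></NASA>"
    "<NASA Tmin='1000' Tmax='3500'><floatArray>1, 2, 3, 4, 5, 6, 7</floatArray></NASA>"
    "</thermo></species></speciesData></ctml>"
)
print(check_thermo_xml(sample))  # []
```

To check the homework output, pass the file's contents to the same function, for example `check_thermo_xml(open('finalhw.xml').read())`.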

**Hint**: First parse the file into a Python dictionary. 

In [1]:
###############################################################################
# Parse data from thermo.txt.
###############################################################################

species_data = {}

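def split_line_by_column_width(line, width=15, conv_float=True):
    """Split a fixed-width line into fields of `width` characters.

    Coefficients in thermo.txt occupy 15-character fields, so slicing by
    column also separates numbers with no space between them (e.g. when a
    negative value follows another number). If conversion to float fails,
    a message is printed and the stripped strings are returned unchanged.
    """
    result = [line[i:i+width].strip() for i in range(0, len(line), width)]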
    result = [q for q in result if q != ''] # remove empty strings
    if conv_float:
        try:
            result = [float(i) for i in result]
        except ValueError:
            print('Encountered error converting line values to floats.')
            print(result)
    return result

def get_species_data(lines):
    """Returns (species, dict) containing the data needed for the XML file.
        
    Args:
        lines: The four lines of one species entry in the thermo file.

    Returns None if fewer than four lines are given (e.g. at end of file).
    """
    # Exit if this was end-of-file garbage.
    if len(lines) < 4:
        return None
    
    # Format lines to make data more easily retrievable.
    lines[0] = lines[0].split()
    lines[1] = split_line_by_column_width(lines[1])
    lines[2] = split_line_by_column_width(lines[2])
    lines[3] = split_line_by_column_width(lines[3])
    
    # Extract data from lines.
    data = {} # container for data that will be put into XML for species
    coeffs = []
    coeffs.extend(lines[1][0:-1])
    coeffs.extend(lines[2][0:-1])
    coeffs.extend(lines[3][0:-1])
    data['coeffs'] = coeffs
    data['Tmin'] = lines[0][-4]
    data['Tmax'] = lines[0][-3]
    data['Tmid'] = lines[0][-2]
    
    species = lines[0][0]
    return species, data
    
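with open('thermo.txt', 'r', encoding='utf8') as f:
    lines = f.readlines()

# Species entries start after the 5-line header and are 4 lines long.
# The loop bound runs past the last complete entry, so the final start
# indices land on (or beyond) the trailing lines of the file.
for i in range(5, len(lines) + 5, 4):
    try:
        species, data = get_species_data(lines[i:i+4])
        species_data[species] = data
    except (TypeError, IndexError):
        # get_species_data returns None for a short block (TypeError when
        # unpacking); IndexError covers a block with a malformed first line.
        # Assume either case is end-of-file garbage.
        print('Error getting data for i={0}'.format(i))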

Error getting data for i=37
Error getting data for i=41


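The error messages above appear because the last start indices in the loop point at or past the trailing lines of `thermo.txt`, after the final species entry. Those blocks are incomplete, so `get_species_data` returns `None` and unpacking the result fails. I didn't want to hardcode the index of the final entry, so I settled on printing a warning instead.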
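Another approach avoids relying on exceptions. It accepts a 4-line block only when the block's lines carry the indices 1 to 4 in column 80, and it skips anything else, such as an `END` line. This sketch is not what the cell above does. `group_species_blocks` is a hypothetical name, and the sketch keeps the same 5-line header assumption and the standard 80-column layout.

```python
def group_species_blocks(lines, header_lines=5):
    """Return the 4-line species entries found after the header.

    A block is accepted only if its lines end with the indices 1, 2, 3, 4
    in column 80; any other line (e.g. END) is skipped one line at a time.
    """
    blocks = []
    i = header_lines
    while i + 4 <= len(lines):
        block = lines[i:i + 4]
        tags = [line.rstrip('\r\n')[79:80] for line in block]
        if tags == ['1', '2', '3', '4']:
            blocks.append(block)
            i += 4
        else:
            i += 1  # not the start of an entry; move on one line
    return blocks


def _fake_line(index):
    # Stand-in data: 79 filler characters, then the line index in column 80.
    return '{:<79}{}\n'.format('X', index)


header = ['THERMO\n'] + ['!\n'] * 4
entry = [_fake_line(n) for n in (1, 2, 3, 4)]
blocks = group_species_blocks(header + entry + entry + ['END\n'])
print(len(blocks))  # 2
```

With the real file, each returned block could be passed directly to `get_species_data`. This removes the need for the `try`/`except`.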

In [2]:
###############################################################################
# Write data to XML file.
###############################################################################

import xml.etree.ElementTree as ET

def create_str_from_coeffs(coeffs):
    str_coeffs = [str(c) for c in coeffs]
    return ', '.join(str_coeffs)
    

def add_species_element(name, data, parent):
    species = ET.SubElement(parent, 'species', name=name)
    thermo = ET.SubElement(species, 'thermo')
    # Low-temperature range (Tmin to Tmid): the file lists these
    # coefficients second, so they are data['coeffs'][7:14].
    nasa1 = ET.SubElement(thermo, 'NASA', Tmax=data['Tmid'], Tmin=data['Tmin'])
    coeffs1 = data['coeffs'][7:14]
    ET.SubElement(nasa1, 'floatArray', name='coeffs',
                  size=str(len(coeffs1))).text = create_str_from_coeffs(coeffs1)
    # High-temperature range (Tmid to Tmax): the first seven coefficients.
    nasa2 = ET.SubElement(thermo, 'NASA', Tmax=data['Tmax'], Tmin=data['Tmid'])
    coeffs2 = data['coeffs'][0:7]
    ET.SubElement(nasa2, 'floatArray', name='coeffs',
                  size=str(len(coeffs2))).text = create_str_from_coeffs(coeffs2)

root = ET.Element('ctml')

# <phase> element.
phase = ET.SubElement(root, 'phase', id='hw9temp')
ET.SubElement(phase, 'speciesArray', datasrc="#species_data").text = ' '.join(
    list(species_data.keys()))

# <speciesData> element holding one <species> element per species.
speciesData = ET.SubElement(root, 'speciesData', id='species_data')

for species, data in species_data.items():
    add_species_element(species, data, speciesData)

# Save to file.
tree = ET.ElementTree(root)
tree.write('finalhw.xml')
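`tree.write` above writes the whole document on a single line. For a more readable file, `ET.indent` (Python 3.9+) can add indentation before writing, and `xml_declaration=True` adds the `<?xml ...?>` header. The following minimal sketch uses a hypothetical helper, `write_pretty_xml`, on a tiny stand-in tree rather than the homework data.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_pretty_xml(root, path):
    """Write `root` to `path` as indented UTF-8 XML with a declaration line.

    ET.indent modifies the tree in place and needs Python 3.9 or newer.
    """
    tree = ET.ElementTree(root)
    ET.indent(tree, space='  ')
    tree.write(path, encoding='utf-8', xml_declaration=True)


# Stand-in tree with the same top-level layout as the homework output.
root = ET.Element('ctml')
ET.SubElement(root, 'phase', id='hw9temp')
path = os.path.join(tempfile.mkdtemp(), 'pretty.xml')
write_pretty_xml(root, path)
with open(path, encoding='utf-8') as f:
    text = f.read()
print(text)
```

Calling `write_pretty_xml(root, 'finalhw.xml')` in place of the last two lines of the cell above would apply the same formatting to the homework output.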