---
Eli Schwat

elilouis@uw.edu

Created for Professor Michael Brett's CEWA547 Course, Winter 2021

---

# Check Modified SSM Input File

In [1]:
import pandas as pd

Provide a path to a modified input file.

**<span style="color:red">USER INPUT REQUIRED</span>**


In [2]:
input_file = "/Users/elischwat/Google Drive/UW/Classes Winter 2021/Watershed MGMT/salish sea model/SSM_WQM_model_inputs/inputs/ssm_pnt_wqMODIFIED.dat"

In [3]:
with open(input_file) as src:
    df = pd.DataFrame(src.readlines())

In [4]:
df

|       | 0 |
|---|---|
| 0 | point calculated !2014\n |
| 1 | 193\n |
| 2 | 8778 ! FVCOM ID/Node: 87 / 3856 [Distribute... |
| 3 | 8861 ! FVCOM ID/Node: 88 / 3919 [Distribute... |
| 4 | 7542 ! FVCOM ID/Node: 59 / 2914 [Distribute... |
| ... | ... |
| 13524 | 0.000E+00 0.000E+00 0.000E+00 0.000E+00 0.000... |
| 13525 | 0.000E+00 0.000E+00 0.000E+00 0.000E+00 0.000... |
| 13526 | 0.000E+00 0.000E+00 0.000E+00 0.000E+00 0.000... |
| 13527 | 1.138E+03 1.138E+03 5.268E+02 5.268E+02 6.018... |


There are 389 lines before any actual data is written. Remove those lines.

In [5]:
n_header_lines = 389

In [6]:
df = df.iloc[n_header_lines:]

Create a new column indicating how many whitespace-separated tokens are on each line. This lets us tell the two kinds of lines apart: a line that gives the day (timestep) of the data that follows contains a single token, while a data line contains 193 values (one for each point source).

In [7]:
df = df.copy()  # work on an explicit copy of the slice to avoid pandas' SettingWithCopyWarning
df['n_tokens'] = df[0].apply(lambda x: len(x.split()))
df = df.reset_index()


In [8]:
df.n_tokens.unique()

array([  1, 193])
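As an aside, the same per-line token count can be computed with pandas' vectorized string methods rather than `apply`. A minimal sketch on a few made-up lines (the sample strings are illustrative and shortened; real data lines carry 193 values):

```python
import pandas as pd

# Illustrative lines: one timestep line followed by two (shortened) data lines.
lines = pd.Series([
    "0.00\n",
    "6.400E+02 6.400E+02 3.107E+01\n",
    "0.000E+00 0.000E+00 0.000E+00\n",
])

# str.split() with no pattern splits on any whitespace (including the trailing "\n"),
# and str.len() then counts the tokens produced for each line.
n_tokens = lines.str.split().str.len()
print(n_tokens.tolist())  # [1, 3, 3]
```

In the notebook, the equivalent would be `df[0].str.split().str.len()`.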

When parsing and modifying the data, we assumed that the day (timestep) is given on every 36th line.
Check that this holds by selecting every line that has only one token and verifying that its index is a multiple of 36.

In [19]:
(df[df.n_tokens == 1].index % 36 == 0).all()

True
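This check confirms that every single-token line sits at a multiple of 36, but it would still pass if some multiple of 36 held a data line instead. A slightly stricter sketch compares the positions of the single-token lines against every multiple of 36 in the frame. It assumes, as above, that the index was reset to start at 0 after the header was removed; the function name `timestep_lines_are_regular` and the small `n_tokens` series are hypothetical, for illustration only.

```python
import pandas as pd

BLOCK = 36  # one timestep line + 35 parameter lines


def timestep_lines_are_regular(n_tokens: pd.Series, block: int = BLOCK) -> bool:
    """Return True if single-token lines occur exactly at indices 0, block, 2*block, ..."""
    actual = set(n_tokens[n_tokens == 1].index)
    expected = set(range(0, len(n_tokens), block))
    return actual == expected


# Synthetic example: two complete blocks, with timestep lines at 0 and 36.
tokens = pd.Series([1] + [193] * 35 + [1] + [193] * 35)
print(timestep_lines_are_regular(tokens))  # True

# Turn one timestep line into a data line: the stricter check now fails.
broken = tokens.copy()
broken[36] = 193
print(timestep_lines_are_regular(broken))  # False
```

In the notebook this would be called as `timestep_lines_are_regular(df.n_tokens)`.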

Our understanding of the file is that every 36th line gives the timestep for the 35 lines of data that follow.
Each of those 35 lines holds the values of one parameter for all 193 point sources.

Likewise, because ammonium is the 14th data line after each timestep line, the index of every ammonium line should leave a remainder of 14 when divided by 36 (i.e., 14, 50, 86, ...).
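The expected positions follow from simple arithmetic: if the timestep lines sit at indices 0, 36, 72, ..., then the ammonium line of the k-th block (counting from 0) is at index 36k + 14. A short sketch:

```python
BLOCK = 36            # lines per timestep block: 1 timestep line + 35 parameter lines
AMMONIUM_OFFSET = 14  # position of the ammonium line within each block

# Expected (zero-based) indices of the ammonium lines for the first three timesteps.
ammonium_rows = [BLOCK * k + AMMONIUM_OFFSET for k in range(3)]
print(ammonium_rows)  # [14, 50, 86]

# Each one leaves a remainder of 14 when divided by 36.
print([i % BLOCK for i in ammonium_rows])  # [14, 14, 14]
```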

We can check that we correctly modified the ammonium, nitrate, LDON, and LPON values by inspecting those values in our dataframe.

In [21]:
df.head()

|   | index | 0 | n_tokens |
|---|---|---|---|
| 0 | 389 | 0.00\n | 1 |
| 1 | 390 | 6.400E+02 6.400E+02 3.107E+01 3.107E+01 5.635... | 193 |
| 2 | 391 | 3.730E+00 3.730E+00 5.939E+00 5.939E+00 5.939... | 193 |
| 3 | 392 | 0.000E+00 0.000E+00 0.000E+00 0.000E+00 0.000... | 193 |
| 4 | 393 | 0.000E+00 0.000E+00 0.000E+00 0.000E+00 0.000... | 193 |


Mark the line number (out of 35) for the parameters that we change.

In [22]:
# 14. Ammonium (NH4)
ammonium_index = 14
# 15. Nitrate + Nitrite (NO3+NO2)
nitrate_index = 15
# 17. Labile DON (Labile dissolved organic nitrogen)
ldon_index = 17
# 19. Labile PON (Labile particulate organic nitrogen)
lpon_index = 19


We create a new column, `n_param`, to hold the "parameter index": the position of each line within its 36-line timestep block. For example, in every block the line containing the timestep value has an `n_param` value of 0, and the line containing the ammonium values has an `n_param` value of 14.

In [24]:
df['n_param'] = list(df.index % 36)
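The same block arithmetic can also attach each data line to its timestep. A sketch on a small synthetic frame shaped like `df` (column `0` holds the raw line text and `n_tokens` the token count, after the header has been removed and the index reset): integer division gives the block number, the remainder gives `n_param`, and forward-filling the single-token lines copies each timestep value down to its 35 data lines. The `block` and `timestep` columns are hypothetical additions, not part of the original notebook.

```python
import pandas as pd

BLOCK = 36

# Synthetic stand-in for df: two timesteps, each followed by 35 (shortened) data lines.
raw = (["0.00\n"] + ["1.0 2.0 3.0\n"] * 35
       + ["1.00\n"] + ["4.0 5.0 6.0\n"] * 35)
frame = pd.DataFrame({0: raw})
frame["n_tokens"] = frame[0].str.split().str.len()

# Block number and position within the block, both derived from the index.
frame["block"] = frame.index // BLOCK
frame["n_param"] = frame.index % BLOCK

# Timestep lines carry a single value; keep it there, blank out data lines, then fill down.
is_timestep = frame["n_tokens"] == 1
frame["timestep"] = frame[0].str.strip().where(is_timestep).ffill().astype(float)

print(frame.loc[[0, 14, 36, 50], ["block", "n_param", "timestep"]])
```

Keeping the timestep value next to each line makes it possible to check, for example, whether a modification applies only over a particular range of days.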

Now we look at the tokens at zero-based positions 119 and 126 (the 120th and 127th values) on every line that contains ammonium, nitrate, LDON, or LPON values.
Positions 119 and 126 correspond to the South King and West Point point sources.

In [25]:
df[df.n_param == ammonium_index][0].apply(lambda x: x.split()[119]).unique()

array(['2.500E+01'], dtype=object)

In [29]:
df[df.n_param == ammonium_index][0].apply(lambda x: x.split()[126]).unique()

array(['2.500E+01'], dtype=object)

In [30]:
df[df.n_param == nitrate_index][0].apply(lambda x: x.split()[119]).unique()

array(['4.000E+00'], dtype=object)

In [31]:
df[df.n_param == nitrate_index][0].apply(lambda x: x.split()[126]).unique()

array(['4.000E+00'], dtype=object)

In [32]:
df[df.n_param == ldon_index][0].apply(lambda x: x.split()[119]).unique()

array(['2.000E+00'], dtype=object)

In [33]:
df[df.n_param == ldon_index][0].apply(lambda x: x.split()[126]).unique()

array(['2.000E+00'], dtype=object)

In [28]:
df[df.n_param == lpon_index][0].apply(lambda x: x.split()[119]).unique()

array(['1.000E+00'], dtype=object)

In [34]:
df[df.n_param == lpon_index][0].apply(lambda x: x.split()[126]).unique()

array(['1.000E+00'], dtype=object)

Each check above should return exactly one unique value, and these values should match the concentrations that we provided in the `modify_ssm_inputs.ipynb` notebook.
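The eight checks above can be collapsed into a single loop. A sketch, assuming a `df` with the raw line text in column `0` and the `n_param` column built earlier; the parameter positions and the zero-based source positions 119 and 126 come from the cells above, and the expected concentrations are the values printed above (edit `EXPECTED` to match whatever was entered in `modify_ssm_inputs.ipynb`). The helper names `unique_values` and `check_modifications` are hypothetical.

```python
import pandas as pd

BLOCK = 36

# Parameter line positions within each timestep block (from the cells above).
PARAM_INDEX = {"ammonium": 14, "nitrate": 15, "ldon": 17, "lpon": 19}
# Zero-based token positions of the South King and West Point point sources.
SOURCES = {"South King": 119, "West Point": 126}
# Values printed by the checks above; edit to match your own modifications.
EXPECTED = {"ammonium": 25.0, "nitrate": 4.0, "ldon": 2.0, "lpon": 1.0}


def unique_values(df: pd.DataFrame, n_param: int, source: int) -> set:
    """Return the distinct values at token `source` on every line with this n_param."""
    lines = df.loc[df["n_param"] == n_param, 0]
    return set(lines.apply(lambda x: float(x.split()[source])))


def check_modifications(df: pd.DataFrame) -> list:
    """Return (parameter, source, values) for every mismatch; an empty list means all match."""
    problems = []
    for param, n_param in PARAM_INDEX.items():
        for name, source in SOURCES.items():
            values = unique_values(df, n_param, source)
            if values != {EXPECTED[param]}:
                problems.append((param, name, values))
    return problems
```

In the notebook, `check_modifications(df)` should return an empty list. Parsing the tokens as floats means `2.500E+01` and `25.0` compare equal, so the check does not depend on how the numbers were formatted when the file was written.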