---
Eli Schwat

elilouis@uw.edu

Created for Professor Michael Brett's CEWA547 Course, Winter 2021

---

# Modify Salish Sea Model Point Source Inputs

This notebook walks you through the process of modifying point source effluent concentration data. 

User input is required wherever you see this marker:

**<span style="color:red">USER INPUT REQUIRED</span>**

Good luck.

In [1]:
import pandas as pd

# Input Variables:

**<span style="color:red">USER INPUT REQUIRED</span>**

Enter the path to the file `ssm_pnt_wq.dat`, which should be included with your copy of the SSM model.

In [2]:
input_file = "/Users/elischwat/Google Drive/UW/Classes Winter 2021/Watershed MGMT/salish sea model/SSM_WQM_model_inputs/inputs/ssm_pnt_wq.dat"
# input_file = "/Users/elischwat/Google Drive/UW/Classes Winter 2021/Watershed MGMT/salish sea model/SSM_WQM_model_inputs/inputs/ssm_pnt_wqSHORT.dat"

**<span style="color:red">USER INPUT REQUIRED</span>**

Enter the path where the modified file will be written.

In [3]:
output_file = "/Users/elischwat/Google Drive/UW/Classes Winter 2021/Watershed MGMT/salish sea model/SSM_WQM_model_inputs/inputs/ssm_pnt_wqMODIFIED.dat"
# output_file = "/Users/elischwat/Google Drive/UW/Classes Winter 2021/Watershed MGMT/salish sea model/SSM_WQM_model_inputs/inputs/ssm_pnt_wqSHORTMODIFIED.dat"

In [4]:
variable_name_dict = {
    0: "Flow (OFF, b/c from FVCOM)",
    1: "Temperature (OFF, b/c from FVCOM)",
    2: "Salinity (OFF, b/c from FVCOM)",
    3: "TSS",
    4: "Algal 1 (Algal group 1)",
    5: "Algal 2 (Algal group 2)",
    6: "Algal 3 (Algal group 3) (unused)",
    7: "Zooplankton 1 (Zooplankton species 1)",
    8: "Zooplankton 2 (Zooplankton species 2)",
    9: "Labile DOC (Labile dissolved organic carbon)",
    10: "Refractory DOC (Refractory dissolved organic carbon)",
    11: "Labile POC (Labile particulate organic carbon)",
    12: "Refractory POC (Refractory particulate organic carbon)",
    13: "Ammonium (NH4)",
    14: "Nitrate + Nitrite (NO3+NO2)",
    15: "Urea",
    16: "Labile DON (Labile dissolved organic nitrogen)",
    17: "Refractory DON (Refractory dissolved organic nitrogen)",
    18: "Labile PON (Labile particulate organic nitrogen)",
    19: "Refractory PON (Refractory particulate organic nitrogen)",
    20: "Total PO4 (Total phosphate)",
    21: "Labile DOP (Labile dissolved organic phosphate)",
    22: "Refractory DOP (Refractory dissolved organic phosphate)",
    23: "Labile POP (Labile particulate organic phosphate)",
    24: "Refractory POP (Refractory particulate organic phosphate)",
    25: "Particulate inorganic P (Particulate inorganic phosphate)",
    26: "COD (Chemical oxygen demand)",
    27: "DO (Dissolved Oxygen)",
    28: "Particulate Silica",
    29: "Dissolved Silica",
    30: "internal P group for Alga 1, Droop model (currently off)",
    31: "internal P group for Alga 2, Droop model (currently off)",
    32: "internal P group for Alga 3, Droop model (currently off)",
    33: "DIC",
    34: "Alkalinity"
}

In [5]:
def read_data(input_file, num_params):
    """
    Params:
    input_file (str): path to input file
    num_params (int): number of parameters contained in the file. Usually 35.
    
    Returns:
    (df, header_lines): df is a dataframe containing the data separated by parameter, point source, 
        and date. header_lines is a list of strings containing all the header data that must be 
        written to the new file.
    """
    with open(input_file) as f:
        lines = [line.rstrip() for line in f]
    num_point_sources = int(lines[1])
    print(f"Found {num_point_sources} point sources")
    header_lines = lines[:num_point_sources*2+3]
    data_lines =  lines[num_point_sources*2+3:]
    num_daily_data = int(header_lines[-1])
    print(f"Found {num_daily_data} days of data")
    df_list = []
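    block_len = num_params + 1  # one time-stamp line followed by num_params data lines
    for n_day in range(num_daily_data):
        start = n_day * block_len
        day_num = data_lines[start]
        # use a separate name so the full list of file lines is not overwritten
        day_lines = data_lines[start + 1: start + block_len]
        df_list.append(__extract_daily_data(day_lines, day_num))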
    df = pd.concat(df_list)
    df = df.reset_index(drop=True)
    df['hour'] = df['hour'].astype('float')
    return df, header_lines
          
def write_data(output_file, df, header_lines):
    """
    Params:
    output_file (str): path to output file.
    df (pandas.DataFrame): a dataframe containing data. such as is returned by the read_data function
                defined above.
    header_lines: list of strings, such as is returned by the read_data function defined above.
    """
    # the context manager closes the file even if an error occurs while writing
    with open(output_file, "w") as writer:
        writer.write("\n".join(header_lines))
        for day, day_df in df.groupby('hour'):
            # Each day's block is composed of:
            # 1. a single first line with the Julian day
            writer.write("\n")
            writer.write("     {:.2f}".format(day))
            # 2. num_params lines of data. Each line holds num_point_sources floats in
            #    scientific notation, separated by single spaces. Rows are written in
            #    their existing order, which must be the order of the parameter number.
            for index, row in day_df.iloc[:, 2:].iterrows():
                # the leading space matches the original file
                line_string = ' ' + ' '.join('{:.3E}'.format(value) for value in row)
                writer.write("\n")
                writer.write(line_string)
        writer.write("\n")  # terminate the last line with a newline, as the original files do
          
def __extract_daily_data(lines, day_num):
    # num_params is read from the notebook's global scope, so it must be defined
    # before read_data is called
    assert len(lines) == num_params, f"Expecting {num_params} lines of data"
    # split() with no argument tolerates runs of whitespace between values
    arr_list = [[float(x) for x in line.split()] for line in lines]
    df = pd.DataFrame(arr_list)
    df.insert(0, 'hour', day_num)
    df.insert(0, 'param', df.index)  # the row index is the parameter number
    return df
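
The file layout that `read_data` expects can be inferred from its slicing arithmetic: a header of `2 * num_point_sources + 3` lines, whose second line is the source count and whose last line is the number of days, followed by one block per day made up of a time-stamp line and `num_params` lines of space-separated values. The sketch below builds a tiny synthetic list of lines in that shape (2 sources, 3 parameters, 2 days) and walks it with the same index arithmetic. Every value and header string is invented for illustration; the real SSM header text is not reproduced here.

```python
# Synthetic stand-in for the lines of ssm_pnt_wq.dat; all content is made up.
num_point_sources = 2
num_params = 3

lines = [
    "placeholder title line",   # line 0
    str(num_point_sources),     # line 1: number of point sources
    "source line A",            # first per-source block (one line per source)
    "source line B",
    "second block line A",      # second per-source block
    "second block line B",
    "2",                        # last header line: number of days
    "     0.00",                # day 1 time stamp
    " 1.000E+00 2.000E+00",     # parameter 0, one value per source
    " 3.000E+00 4.000E+00",     # parameter 1
    " 5.000E+00 6.000E+00",     # parameter 2
    "     24.00",               # day 2 time stamp
    " 1.500E+00 2.500E+00",
    " 3.500E+00 4.500E+00",
    " 5.500E+00 6.500E+00",
]

header_len = num_point_sources * 2 + 3
header_lines = lines[:header_len]
data_lines = lines[header_len:]
num_days = int(header_lines[-1])

block_len = num_params + 1  # one time-stamp line plus num_params value lines
days = []
for n_day in range(num_days):
    start = n_day * block_len
    stamp = float(data_lines[start])
    values = [[float(x) for x in row.split()]
              for row in data_lines[start + 1:start + block_len]]
    days.append((stamp, values))

print(days[1][0], days[1][1][2])
```

If the number of data lines is not a multiple of `num_params + 1`, the value passed as `num_params` probably does not match the file.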

# Read Data 

In [6]:
num_params = 35

In [7]:
df, header_lines = read_data(input_file, num_params)
num_point_sources = int(header_lines[1])  # the same count that read_data prints
source_lines = header_lines[2:2 + num_point_sources]
source_names_series = pd.Series(source_lines).apply(lambda x: x.split(',')[1].split('---')[0].strip())
point_source_types = source_names_series.apply(lambda x: x.split(' - ')[1].split(' (')[0].strip())

Found 193 point sources
Found 365 days of data
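
The two `apply` calls above extract each source's name and type with chained string splits. Here is a minimal sketch of the same splits in plain Python. The header lines are hypothetical: the real line format is not shown in this notebook, so they only mimic the pieces the parsing relies on (a comma before the name and `---` after it). The helper names `parse_source_name` and `parse_source_type` are also illustrative.

```python
def parse_source_name(header_line):
    """Return the text between the first comma and the '---' marker."""
    return header_line.split(',')[1].split('---')[0].strip()


def parse_source_type(source_name):
    """Return the text between ' - ' and the opening ' (' of the ID."""
    return source_name.split(' - ')[1].split(' (')[0].strip()


# Hypothetical header lines; only the comma and '---' positions matter here.
hypothetical_lines = [
    "1 1, Fraser - River (ECY ID: 258) --- placeholder",
    "2 1, West Point - Point Source (ECY ID: 233) --- placeholder",
]

names = [parse_source_name(line) for line in hypothetical_lines]
types = [parse_source_type(name) for name in names]
print(names)
print(types)
```

Because the type is taken from the second piece after splitting on `' - '`, a source name that itself contains `' - '` would be parsed incorrectly.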


In [8]:
len(source_names_series), len(point_source_types)

(193, 193)

In [9]:
for key,src in source_names_series.to_dict().items():
    print(str(key) + ': ' + src)

0: Fraser - River (ECY ID: 258)
1: Fraser - River (ECY ID: 258)
2: Nooksack - River (ECY ID: 238)
3: Nooksack - River (ECY ID: 238)
4: Samish_Bell south - River (ECY ID: 246)
5: Samish_Bell south - River (ECY ID: 246)
6: Skagit - River (ECY ID: 249)
7: Skagit - River (ECY ID: 249)
8: Stillaguamish - River (ECY ID: 253)
9: Stillaguamish - River (ECY ID: 253)
10: Snohomish - River (ECY ID: 251)
11: Snohomish - River (ECY ID: 251)
12: Lake Washington - River (ECY ID: 223)
13: Lake Washington - River (ECY ID: 223)
14: Green_Duwamish - River (ECY ID: 222)
15: Green_Duwamish - River (ECY ID: 222)
16: Puyallup - River (ECY ID: 201)
17: Puyallup - River (ECY ID: 201)
18: Nisqually - River (ECY ID: 207)
19: Nisqually - River (ECY ID: 207)
20: Budd_Deschutes - River (ECY ID: 209)
21: Budd_Deschutes - River (ECY ID: 209)
22: Tahuya - River (ECY ID: 254)
23: Tahuya - River (ECY ID: 254)
24: Skokomish - River (ECY ID: 250)
25: Skokomish - River (ECY ID: 250)
26: Hamma Hamma - River (ECY ID: 233)
27

In [10]:
source_names_series[source_names_series.str.contains('West Point')]

126    West Point - Point Source (ECY ID: 233)
dtype: object

In [11]:
df.head()

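| | param | hour | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | ... | 183 | 184 | 185 | 186 | 187 | 188 | 189 | 190 | 191 | 192 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0.0 | 640.0 | 640.0 | 31.07 | 31.07 | 5.635 | 5.635 | 158.7 | 158.7 | ... | 0.5063 | 0.113 | 0.006036 | 6.3e-05 | 0.000985 | 0.01353 | 0.1927 | 0.7577 | 0.1218 | 0.00019 |
| 1 | 1 | 0.0 | 3.73 | 3.73 | 5.939 | 5.939 | 5.939 | 5.939 | 5.939 | 5.939 | ... | 15.06 | 15.06 | 15.06 | 15.06 | 15.06 | 15.06 | 15.06 | 15.06 | 15.06 | 15.06 |
| 2 | 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |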


**<span style="color:red">USER INPUT REQUIRED</span>**

Choose sources and parameters to modify.
1. Look at the printed list of sources above and enter the numerical index (on the left side) of each source you want to modify in the `sources` list below.
2. Look at where the `variable_name_dict` variable is created above. Write the numerical index of each variable you want to modify, along with the constant value you want to set, in the `params_values_dict` dictionary below.

For example, if I want to modify the West Point (126) and South King (119) WWTPs, setting the following constant concentrations:

| Index | Nitrogen parameter | Value (mg/L) |
| ------------- |:-------------:| -----:|
| 13 | Ammonium          | 25 |
| 14 | Nitrate + Nitrite | 4 |
| 16 | Labile DON        | 2 |
| 18 | Labile PON        | 1 |

I enter the following:

In [12]:
sources = [126, 119]
params_values_dict = {
    13: 25.0, 
    14: 4.0,
    16: 2.0, 
    18: 1.0
}

# Modify Specified Parameters and Point Sources

In [14]:
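# Work on a copy so the original df stays unchanged for the comparison further down.
# mod_df is not defined in any cell shown in this notebook; df.copy() is the assumed definition.
mod_df = df.copy()
print(f'Modifying sources {sources}')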
for src in sources:
    print(f'\t{source_names_series.to_dict()[src]}')
print()
for param_index, param_value in params_values_dict.items():
    print(f"\tSetting param {param_index} {variable_name_dict[param_index]} to {param_value}")
    mod_df.loc[mod_df.param==param_index, sources] = param_value

Modifying sources [126, 119]
	West Point - Point Source (ECY ID: 233)
	South King - Point Source (ECY ID: 226)

	Setting param 13 Ammonium (NH4) to 25.0
	Setting param 14 Nitrate + Nitrite (NO3+NO2) to 4.0
	Setting param 16 Labile DON (Labile dissolved organic nitrogen) to 2.0
	Setting param 18 Labile PON (Labile particular organic nitrogen) to 1.0
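
The assignment in the loop above combines a boolean row mask (`mod_df.param == param_index`) with a list of column labels (`sources`), so every time step of the chosen parameter is overwritten for the chosen sources only. A small sketch on a made-up frame with the same column layout shows the effect. The names `toy`, `toy_mod`, `toy_sources`, and `toy_params_values` are illustrative, as are all of the numbers.

```python
import pandas as pd

# Toy frame shaped like df: a 'param' column, an 'hour' column, then one
# integer-labelled column per source. Three parameters, two time steps.
toy = pd.DataFrame({
    'param': [0, 1, 2, 0, 1, 2],
    'hour': [0.0, 0.0, 0.0, 24.0, 24.0, 24.0],
    0: [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],
    1: [4.0, 5.0, 6.0, 4.5, 5.5, 6.5],
    2: [7.0, 8.0, 9.0, 7.5, 8.5, 9.5],
})

toy_mod = toy.copy()          # leave the original untouched
toy_sources = [0, 2]          # column labels of the sources to change
toy_params_values = {1: 25.0} # parameter index -> constant value

for param_index, param_value in toy_params_values.items():
    # rows: every time step of this parameter; columns: only the chosen sources
    toy_mod.loc[toy_mod.param == param_index, toy_sources] = param_value

print(toy_mod)
```

Source 1 and parameters 0 and 2 keep their original values, which is the behavior the notebook relies on when it edits only a few WWTPs.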


In [16]:
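# Raise the display limits so wide or long previews are not truncated.
pd.options.display.max_rows = 500  # assumed value; the original line named this option without assigning it
pd.options.display.max_columns = 500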

In [19]:
df[[126,119]].head(50)

| | 126 | 119 |
| --- | --- | --- |
| 0 | 3.966 | 3.14 |
| 1 | 15.06 | 15.06 |
| 2 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 |
| 8 | 0.0 | 0.0 |
| 9 | 3.893 | 4.083 |


In [20]:
mod_df[[126,119]].head(50)

| | 126 | 119 |
| --- | --- | --- |
| 0 | 3.966 | 3.14 |
| 1 | 15.06 | 15.06 |
| 2 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 |
| 8 | 0.0 | 0.0 |
| 9 | 3.893 | 4.083 |
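
The rows shown in the two previews above are identical because they belong to parameters 0 through 9, which were not modified. A more direct check is to keep only the rows that actually differ. The sketch below does that on made-up data for a single source column. The names `before`, `after`, `changed_mask`, and `side_by_side` are illustrative, and in the notebook `df` and `mod_df` would take the place of `before` and `after`.

```python
import pandas as pd

# Made-up stand-ins for df and mod_df, restricted to one source column (126).
before = pd.DataFrame({
    'param': [12, 13, 14, 12, 13, 14],
    'hour': [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    126: [0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
})
after = before.copy()
after.loc[after.param.isin([13, 14]), 126] = 9.0

# True for every row where at least one compared source column changed
changed_mask = (before[[126]] != after[[126]]).any(axis=1)

side_by_side = pd.DataFrame({
    'param': before.loc[changed_mask, 'param'],
    'hour': before.loc[changed_mask, 'hour'],
    'before': before.loc[changed_mask, 126],
    'after': after.loc[changed_mask, 126],
})
print(side_by_side)
```

A row whose new value happens to equal its old value is not flagged, so an empty result does not by itself prove that the assignment was skipped.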


# Write Data

In [22]:
write_data(output_file, mod_df, header_lines)
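
As a quick illustration of what `write_data` emits, the two format strings it uses can be tried on a few sample numbers. The values below are taken from the first row of the `df.head()` preview and are only an example, and the names `day_line` and `value_line` are illustrative.

```python
day = 0.0
row = [640.0, 31.07, 6.3e-05]

# time-stamp line: five leading spaces, two decimal places
day_line = "     {:.2f}".format(day)

# data line: a leading space, then values in scientific notation with a
# three-decimal mantissa, separated by single spaces
value_line = ' ' + ' '.join('{:.3E}'.format(v) for v in row)

print(repr(day_line))
print(repr(value_line))
```

The `{:.3E}` format keeps four significant figures, so a value stored with more precision than that is rounded when it is written.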

# Check Whether the Input and Output Files Differ

In [None]:
# Escape spaces for the shell; "\\ " avoids Python's invalid-escape warning for "\ "
infile = input_file.replace(" ", "\\ ")
outfile = output_file.replace(" ", "\\ ")
!cmp --silent {infile} {outfile} && echo '### SUCCESS: Files are the same! ###' || echo '### WARNING: Files are different! ###'

In [None]:
infile, outfile
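
The shell-based check above depends on the `cmp` utility being available and on escaping spaces in the paths. As an alternative sketch, the standard library's `filecmp.cmp` with `shallow=False` compares file contents and takes paths as ordinary strings. The example uses two throwaway files in a temporary directory; in the notebook, `input_file` and `output_file` would be passed instead.

```python
import filecmp
import os
import tempfile

# Two throwaway files stand in for input_file and output_file.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "original.dat")
    path_b = os.path.join(tmp, "modified.dat")
    with open(path_a, "w") as f:
        f.write(" 1.000E+00 2.000E+00\n")
    with open(path_b, "w") as f:
        f.write(" 1.000E+00 2.500E+01\n")

    # shallow=False forces a comparison of the file contents
    same_as_self = filecmp.cmp(path_a, path_a, shallow=False)
    same_as_other = filecmp.cmp(path_a, path_b, shallow=False)

print(same_as_self, same_as_other)
```

After parameters have been modified, a `False` result is the expected outcome. A `True` result is the goal only when the file is read and written back without any changes.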