# Notebook for predicting water masses from a file with sample data


In [None]:
from joblib import load
import csv
import sklearn  # version 1.1.2 or later required

If you need to upgrade scikit-learn, you can do so with the following command. WARNING: the result may depend on your system configuration.

!pip install scikit-learn==1.1.2
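
To see whether an upgrade is needed at all, the installed version can be checked first. The following is a minimal sketch that uses only the standard library. The helper name `version_tuple` is hypothetical, and the parsing assumes a plain `major.minor.patch` style version string.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn a version string such as '1.1.2' into a comparable tuple (1, 1, 2)."""
    parts = []
    for piece in text.split('.')[:3]:
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'dev0'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '1.2' to three fields
    return tuple(parts)


REQUIRED = '1.1.2'  # minimum version stated in this notebook

try:
    installed = version('scikit-learn')
except PackageNotFoundError:
    installed = None

if installed is None:
    print('scikit-learn is not installed')
elif version_tuple(installed) < version_tuple(REQUIRED):
    print(f'scikit-learn {installed} is older than {REQUIRED}; consider upgrading')
else:
    print(f'scikit-learn {installed} satisfies the requirement')
```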

We download the model. WARNING: large file, approx. 319 MB.

This step is only necessary if you don't have the model already downloaded.

The following command downloads it using wget. If wget is not available on your system, you can either install it or download the model directly from the following URL:
https://github.com/joheras/water-masses-inference/releases/download/v1.0/model_regr_et_combined.joblib

Once downloaded, place it in the same folder as this notebook.

In [None]:
!wget https://github.com/joheras/water-masses-inference/releases/download/v1.0/model_regr_et_combined.joblib -O model_regr_et_combined.joblib   
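
If wget is not available, the same download can be done from Python. The sketch below uses the standard library's `urllib.request.urlretrieve` and skips the download when the file is already present. The function name `download_if_missing` and the constants `MODEL_URL` and `MODEL_PATH` are hypothetical names introduced for illustration; the URL and file name are the ones given above.

```python
import os
from urllib.request import urlretrieve

MODEL_URL = ('https://github.com/joheras/water-masses-inference/'
             'releases/download/v1.0/model_regr_et_combined.joblib')
MODEL_PATH = 'model_regr_et_combined.joblib'


def download_if_missing(url, path):
    """Download url to path unless a non-empty file is already there.

    Returns True if a download happened, False if the existing file was kept.
    """
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return False
    urlretrieve(url, path)
    return True


# Uncomment to fetch the model (large download, approx. 319 MB):
# download_if_missing(MODEL_URL, MODEL_PATH)
```

The size check only guards against an empty file; it does not detect a partially downloaded model, so delete the file and retry if loading fails.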

Once the model file is downloaded, we load it into memory.

In [2]:
model = load('model_regr_et_combined.joblib') 
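
After loading, it can be useful to check that the model expects the five input columns described in this notebook. The sketch below assumes the loaded object is a fitted scikit-learn estimator that exposes the `n_features_in_` and `n_outputs_` attributes; this has not been verified against this particular model file, so missing attributes are reported as `None`. The helper name `describe_model` is hypothetical.

```python
def describe_model(model):
    """Return the input and output sizes recorded on a fitted estimator.

    Attributes that the object does not expose are reported as None.
    """
    return {
        'n_features_in': getattr(model, 'n_features_in_', None),
        'n_outputs': getattr(model, 'n_outputs_', None),
    }


# In the notebook, after the load cell:
# print(describe_model(model))
```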

Your data should be in a CSV file called 'input_file.csv'; if your file has a different name, change 'input_file.csv' in the code below. The file must contain columns named Latitude (decimal degrees), Longitude (decimal degrees), CTDPRS (depth in meters), CTDPOT (potential temperature in degrees Celsius) and CTDSAL (salinity), each holding the corresponding data. Place the file in the same folder as this notebook, or give its complete path instead.
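
Before running the prediction, the header of the input file can be checked against the required column names. This is a minimal sketch; the function name `missing_columns` and the constant `REQUIRED_COLUMNS` are hypothetical, and the check is case-sensitive because the prediction code looks the columns up by exact name.

```python
import csv

REQUIRED_COLUMNS = ['Latitude', 'Longitude', 'CTDPRS', 'CTDPOT', 'CTDSAL']


def missing_columns(path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    with open(path, newline='') as csvfile:
        # fieldnames is None for an empty file, so fall back to an empty list
        header = csv.DictReader(csvfile).fieldnames or []
    return [name for name in required if name not in header]


# Example use in the notebook:
# print(missing_columns('input_file.csv'))  # an empty list means the header is complete
```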

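A CSV file named 'output_file.csv' will be produced, containing your original input data plus one added column per entry in the labels list below, named after the water masses used in the model. Each added column holds the model output for that water mass multiplied by 100.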

In [4]:
labels = ['EDW', 'ENACW12', 'WNACW7', 'SPMW', 'SACWT12', 'SACWE12', 'WW', 'AAIW5', 'AAIW3', 'MW', 'LSW', 'ISOW', 'DSOW', 'CDW', 'WSDW', 'SAIW']
# Input columns, in the order in which they are passed to the model
features = ['Latitude', 'Longitude', 'CTDPRS', 'CTDPOT', 'CTDSAL']
with open('input_file.csv', newline='') as csvfile, open('output_file.csv', 'w', newline='') as f:
    data = csv.DictReader(csvfile)
    # Output columns: the original input columns followed by one column per water mass
    writer = csv.DictWriter(f, fieldnames=list(data.fieldnames) + labels)
    writer.writeheader()
    for row in data:
        # The csv module yields strings, so convert the features to floats before predicting
        sample = [float(row[name]) for name in features]
        prediction = model.predict([sample])
        probs = prediction[0]
        # Scale the model output by 100 and store it under the water-mass names
        row.update({label: float(p) * 100 for label, p in zip(labels, probs)})
        writer.writerow(row)
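
Once 'output_file.csv' exists, a common next step is to find the water mass with the largest value for each sample. The sketch below reads the file back with the standard `csv` module. The function names `dominant_water_mass` and `summarize` are hypothetical, and the code assumes that every water-mass column holds a number.

```python
import csv

LABELS = ['EDW', 'ENACW12', 'WNACW7', 'SPMW', 'SACWT12', 'SACWE12', 'WW', 'AAIW5',
          'AAIW3', 'MW', 'LSW', 'ISOW', 'DSOW', 'CDW', 'WSDW', 'SAIW']


def dominant_water_mass(row, labels=LABELS):
    """Return (label, value) for the water-mass column with the highest value in a row."""
    best = max(labels, key=lambda label: float(row[label]))
    return best, float(row[best])


def summarize(path, labels=LABELS):
    """Return the dominant water mass for every row of an output CSV file."""
    with open(path, newline='') as f:
        return [dominant_water_mass(row, labels) for row in csv.DictReader(f)]


# Example use in the notebook:
# for label, value in summarize('output_file.csv'):
#     print(label, value)
```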