# Visualizing Open Source Acoustic Metrics and Their Relationship to Satellite Oceanography: Part 1 (Python)

## Emily Speciale, Charles Anderson, Veronica Martinez, Carrie Wall Bell, & Adrienne Copeland

This script outlines how to extract backscatter data from a cruise, mask the data for seafloor and surface noise, and collect the cruise's echometric data into a CSV file. We recommend using PyCharm for this procedure. 

Upon downloading and opening PyCharm, click "Get from VCS". Paste the URL for pyEchoLab (https://github.com/CI-CMG/pyEcholab), then press "Clone". Repeat this for Echometrics (https://github.com/ElOceanografo/EchoMetrics). Then, open up your pyEchoLab workspace, click File > Open, and choose your Echometrics folder. When the pop-up appears, click "Attach" to install Echometrics into your pyEchoLab library. 

If using Jupyter Notebook, you can use the terminal to install both pyEchoLab and Echometrics with the commands:
pip install git+https://github.com/CI-CMG/pyEcholab.git
pip install git+https://github.com/ElOceanografo/EchoMetrics

The other step we must take before beginning is downloading our cruise data. We recommend storing this data on a hard drive, as cruise data can total hundreds of gigabytes. For this notebook, we are working with the Okeanos Explorer EX1903L1 and EX1903L2 raw data, which can be seen here: https://noaa-wcsd-pds.s3.amazonaws.com/index.html#data/raw/Okeanos_Explorer/EX1903L1/EK60/. 

In order to download the data onto your computer, make sure to set up an AWS account and install the AWS CLI, then use your terminal for the download. We created folders specifically for the EX1903L1 and EX1903L2 data on our hard drive. For instance, to download EX1903L1, we entered into our terminal: 
aws s3 sync s3://noaa-wcsd-pds/data/raw/Okeanos_Explorer/EX1903L1/EK60/  /Volumes/Emily_Passport/EX1903L1/ --no-sign-request

We then did the same for EX1903L2. Both .raw and .idx files will download into these folders. To make processing the data easier, we created an EX1903_raw folder and moved only the .raw files from EX1903L1 and EX1903L2 into that folder. *Warning: depending on your computer, downloading this data may take a few days, and the download may occasionally stop. If it does, re-enter the terminal command above and the data will continue to sync and download.*
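
The step of moving only the .raw files into one folder can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `collect_raw_files` is hypothetical, and the EX1903L2 path in the commented example is assumed to mirror the EX1903L1 path used above.

```python
from pathlib import Path
import shutil


def collect_raw_files(source_dirs, dest_dir):
    """Move every .raw file found directly inside source_dirs into dest_dir.

    Files with other extensions (such as .idx) are left where they are.
    Returns the list of destination paths for the files that were moved.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for source in source_dirs:
        for raw_path in sorted(Path(source).glob("*.raw")):
            target = dest / raw_path.name
            # Do not overwrite a file of the same name that is already in dest.
            if target.exists():
                continue
            shutil.move(str(raw_path), str(target))
            moved.append(target)
    return moved


# Example call (assumed folder layout; adjust the paths to your own drive):
# collect_raw_files(
#     ["/Volumes/Emily_Passport/EX1903L1", "/Volumes/Emily_Passport/EX1903L2"],
#     "/Volumes/Emily_Passport/EX1903_raw",
# )
```

Because the function skips files that already exist in the destination, it can be re-run after each interrupted sync without duplicating work.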

Now that we have everything downloaded, we can start coding!

In [None]:
from echolab2.instruments import EK80
from echolab2.processing.batch_utils import FileAggregator as fa
from echometrics import echometrics as ec
import numpy as np
from echolab2.plotting.matplotlib import echogram
from matplotlib.pyplot import figure, show, subplots_adjust
from echolab2.processing import afsc_bot_detector
from echolab2.processing.mask import mask
from echolab2.processing.line import line
try:
    import matplotlib.pyplot as plt
except ImportError:
    plt = None
    print("Warning: could not import from matplotlib. Echogram.show() will not work.")

## Setting Up New CSV File and Cruise Data

First we create a blank .csv file and write in our headings for our columns. 
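
As an aside, the same header can be written with Python's `csv` module inside a `with` block, which closes the file even if a later step raises an error. This is only an alternative sketch, not the cell this notebook uses; the names `COLUMNS` and `write_rows` are hypothetical.

```python
import csv

# Column names in the same order as the header used in this notebook.
COLUMNS = ["Latitude", "Longitude", "Range", "Time", "Sv_Avg", "Center_of_Mass",
           "Inertia", "Proportion_Occupied", "Aggregation_Index", "Equivalent_Area"]


def write_rows(path, rows):
    """Write the header, then one line per row; the file closes automatically."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        for row in rows:
            writer.writerow(row)
```

Here `rows` can be any iterable, including a generator that yields one list of values per index as it is processed.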

In [None]:
output_file = open('cruise_data.csv', 'w')
output_file.write("Latitude,Longitude,Range,Time,Sv_Avg,Center_of_Mass,Inertia,"
                  "Proportion_Occupied,Aggregation_Index,Equivalent_Area\n")

For this cruise, we only want to examine data from the 18 kHz channel. After setting our freq to 18000, we use the file aggregator to bin the files into thirty-minute time intervals. Thus, we will be collecting the average latitude, longitude, range, echometric value, etc. for each thirty-minute interval. We will be calling each time interval an index. 

In [None]:
freq = 18000
raw_files = fa('/Volumes/Emily_Passport/EX1903_raw', interval=30)
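
To picture what binning files into thirty-minute intervals means, here is a small standard-library sketch. It assumes file names carry a Simrad-style `-DYYYYMMDD-THHMMSS` timestamp and aligns bins to the clock; the function `bin_by_interval` and the file names in the comments are hypothetical, and the actual logic inside `FileAggregator` may differ.

```python
import re
from datetime import datetime

# Assumed file-name pattern: ...-D20190620-T003000.raw
STAMP = re.compile(r"D(\d{8})-T(\d{6})")


def bin_by_interval(file_names, interval_minutes=30):
    """Group file names into clock-aligned bins of interval_minutes each."""
    stamped = []
    for name in file_names:
        match = STAMP.search(name)
        if match is None:
            continue  # skip anything without a recognizable timestamp
        start = datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
        stamped.append((start, name))

    bins = {}
    for start, name in sorted(stamped):
        # Floor the start time to the beginning of its interval.
        minutes = start.hour * 60 + start.minute
        floored = minutes - minutes % interval_minutes
        bin_start = start.replace(hour=floored // 60, minute=floored % 60, second=0)
        bins.setdefault(bin_start, []).append(name)

    # One list of files per interval, in chronological order.
    return [bins[key] for key in sorted(bins)]
```

Each inner list plays the role of one index: all files whose start time falls in the same thirty-minute window are read and summarized together.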

## Extracting Data, Masking Seafloor, and Collecting Echometric Data for each Index

We will be using a for loop to run through each index to extract the Sv data, apply masks, and collect that index's average lat/lon, range, and echometrics.
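
Before the full loop, the masking idea can be shown on a toy grid. The sketch below uses plain Python lists rather than pyEcholab objects: samples at or shallower than a surface limit, or at or deeper than the bottom minus an offset, are replaced by NaN. The function name, its parameters, and the strict inequalities are assumptions made for illustration, not pyEcholab's behavior.

```python
import math


def mask_surface_and_bottom(sv, ranges, bottom_ranges,
                            surface_limit=6.0, bottom_offset=3.0):
    """Return a copy of sv with surface and near-bottom samples set to NaN.

    sv            -- list of pings, each a list of Sv samples (dB)
    ranges        -- range in meters of each sample, shared by all pings
    bottom_ranges -- detected bottom range in meters, one value per ping
    """
    masked = []
    for ping, bottom in zip(sv, bottom_ranges):
        cutoff = bottom - bottom_offset  # exclusion line above the bottom
        masked.append([
            value if surface_limit < r < cutoff else math.nan
            for value, r in zip(ping, ranges)
        ])
    return masked
```

With the defaults, only samples strictly between 6 m and 3 m above the detected bottom survive, which mirrors the surface line and offset bottom line applied to the mask in the loop.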

In [None]:
for index, file_list in enumerate(raw_files.file_bins):
    
    # Print the index number and the files within the index.
    print(index, file_list)
    
    # Use the EK80 class to read the raw files within our index and collect data from all 
    # its channels. *Although EX1903 used an EK60, its EK60 had parts of an EK80, thus its
    # file format was produced in EK80 format. If using EK60 data, simply change the ek80s 
    # to ek60s. 
    ek80 = EK80.EK80()
    try:
        ek80.read_raw(file_list)
        
        # Get channel data only from our frequency of interest (18 kHz).
        raw_data = ek80.get_channel_data(frequencies=freq)
        
        # Get our mean volume-backscattering strength data, or Sv (backscatter) data, from our 18 kHz channel.
        Sv = raw_data[freq][0].get_Sv(heave_correct=True)
        
        # Optional: to check data, plot an echogram of the raw Sv data.
        fig = figure()
        ax1 = fig.add_subplot(2, 1, 1)
        eg = echogram.Echogram(ax1, Sv, threshold=[-145, 10])
        ax1.set_title('Original 18 kHz Sv Data')
        
        # There will be noise within the first 6 meters of data from the transducer. The seafloor/bottom 
        # also produces a lot of noise that we want to remove from our data. Create a mask to hold 
        # bottom and surface exclusion.
        bot_surf_mask = mask(like=Sv)
        
        # Create a surface exclusion line at a RANGE of 6 m (data=6).
        surf_line = line(ping_time=Sv.ping_time, data=6)
        
        # Now apply our surface line to this same mask.
        bot_surf_mask.apply_line(surf_line, apply_above=True)
        
        # Since we'll be passing our bottom detector data on a depth grid, set this to the minimum 
        # DEPTH in meters to search for the bottom
        search_min = 15
        
        # Since we'll be passing Sv data to the bottom detector, set the backstep in Sv (dB). 
        back_step = 30
        bot_detector = afsc_bot_detector.afsc_bot_detector(search_min=search_min, backstep=back_step)
        
        # Now use our simple bottom detector to pick a bottom line for the data. The bottom detector 
        # class returns a pyEcholab2 line object representing the bottom.
        bottom_detected = bot_detector.detect(Sv)
        
        # Next, create a new line that is 3 m shallower than the detected bottom (in-place operators
        # would change the existing line instead).
        bottom_line = bottom_detected - 3.0
        
        # Now apply our bottom line to the mask.
        bot_surf_mask.apply_line(bottom_line, apply_above=False)
        
        # Now use this mask to set sample data above the surface line, and from 3 m above the
        # bottom downward, to NaN.
        Sv[bot_surf_mask] = np.nan
        
        # Optional: to check data, plot an echogram of the Sv data now with a masked surface and bottom.
        ax_3 = fig.add_subplot(2, 1, 2)
        echogram_3 = echogram.Echogram(ax_3, Sv, 'data', threshold=[-145, 10])
        ax_3.set_title("Sv after bottom and surface removal")
        
    except Exception as e:
        print(e)
        continue
    
    # Now that we have our clean data, we want to find the latitude and longitude positions
    # of our indexes. We use the interpolated latitude and longitude at position 1 (the second
    # value) of each index. 
    try:
        positions = ek80.nmea_data.interpolate(Sv, 'position')
        latitude = positions[1]['latitude']
        latitude = latitude[1]
        longitude = positions[1]['longitude']
        longitude = longitude[1]

    except Exception as e:
        print(e)
        continue
    
    # We will now use the class echogram from Echometrics to begin our calculations. The echogram
    # class requires these parameters: (data, depth, index, scale=decibel, threshold=[-80, 0], bad_data=None). 
    try:
        data = ec.Echogram(np.transpose(Sv.data), Sv.range, Sv.ping_time, scale="linear", 
                           threshold=[-80, -34], bad_data=None)
        
        # Get the average range and the first ping time for each index.
        Sv.range = np.average(Sv.range)
        Sv.ping_time = Sv.ping_time[0]
        
        # We can begin calculating the echometrics (more specifically, their averages for our indexes). 
        # Our first echometric is Sv, which is a proxy for density and has a unit of dB re 1 m^-1. 
        sv_avg = ec.sv_avg(data)
        sv_avg = np.average(sv_avg)
        
        # Our second echometric is center of mass, which describes the average location of
        # backscatter in the ping, in units of meters (m).
        com = ec.center_of_mass(data)
        com = np.average(com)
        
        # Our third echometric is inertia, which measures the dispersion or spread of backscatter
        # in the ping in units of m^2. 
        inertia = ec.inertia(data)
        inertia = np.average(inertia)
       
        # Our fourth echometric is proportion occupied, which calculates the proportion of the ping
        # with an Sv above -90 dB. 
        po = ec.proportion_occupied(data)
        po = np.average(po)
        
        # Our fifth echometric is the aggregation index, which measures the patchiness of the ping's
        # backscatter; when small areas are denser than the rest of the distribution, AI is high.
        # Its units are m^-1.
        ai = ec.aggregation_index(data)
        ai = np.average(ai)
    
        # Our sixth echometric is equivalent area, the reciprocal of the aggregation index, which
        # measures the evenness of the ping's backscatter. It has units of meters (m).
        ea = ec.equivalent_area(data)
        ea = np.average(ea)
        
        # Finally, we add all of our calculations of lat/lon, range, ping time, and our echometrics to our
        # .csv file. 
        data_list = f"{latitude},{longitude},{Sv.range},{Sv.ping_time},{sv_avg},{com},{inertia},{po},{ai},{ea}\n"
        output_file.write(data_list)
        
    except Exception as e:
        print(e)
        continue

# Upon collecting data for all the indexes, we close the .csv file.
output_file.close()
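
To make the six echometrics more concrete, the sketch below computes them for a single ping with plain Python, following the standard definitions as we understand them. It is not the Echometrics implementation: the function name is hypothetical, the samples are assumed to be evenly spaced with no masked (NaN) values, and the -90 dB default for proportion occupied is taken from the comment in the loop above.

```python
import math


def echometrics_for_ping(sv_db, depths, threshold_db=-90.0):
    """Compute six echometrics for one ping.

    sv_db  -- Sv samples in dB re 1 m^-1 (no NaN values)
    depths -- evenly spaced sample depths in meters
    """
    dz = depths[1] - depths[0]                    # sample thickness (m)
    sv_lin = [10 ** (s / 10.0) for s in sv_db]    # dB to linear units (m^-1)
    total = sum(sv_lin) * dz                      # integral of sv over depth
    height = dz * len(depths)                     # thickness of the sampled layer

    center = sum(z * s for z, s in zip(depths, sv_lin)) * dz / total
    inertia = sum((z - center) ** 2 * s for z, s in zip(depths, sv_lin)) * dz / total
    equivalent_area = total ** 2 / (sum(s ** 2 for s in sv_lin) * dz)

    return {
        "sv_avg": 10.0 * math.log10(total / height),            # dB re 1 m^-1
        "center_of_mass": center,                                # m
        "inertia": inertia,                                      # m^2
        "proportion_occupied": sum(s > threshold_db for s in sv_db) / len(sv_db),
        "aggregation_index": 1.0 / equivalent_area,              # m^-1
        "equivalent_area": equivalent_area,                      # m
    }
```

For a perfectly uniform ping, the equivalent area equals the thickness of the sampled layer and the center of mass sits at its mean depth, which is a quick sanity check on the definitions.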