<a href="https://www.hydroffice.org/epom/"><img src="../../python_basics/images/000_000_epom_logo.png" alt="ePOM" title="Open ePOM home page" align="center" width="12%"></a>

# Lab A, Step 1: File Parsing

In this Notebook you will create classes for handling various ocean mapping data. You may use these classes for the Lab A assignment in the Integrated Seabed Mapping Systems course. 

To refresh your memory on reading **text files** please refer to the ePOM *Programming Basics with Python for Ocean Mapping* [Read and Write Text Files](../../python_basics/006_Read_and_Write_Text_Files.ipynb) notebook. Similarly for help with **classes** use the [A class as a Data Container](../../python_basics/008_A_Class_as_a_Data_Container.ipynb) notebook.

As you progress through this notebook you will create the class definitions and keep adding **code** to them. Each class definition will be contained in a single code cell in a notebook that has the same name as the class.

---
___

## 1.1 Time Series Data Class Definitions


---
### 1.1.1 Class Definition

In the code cell below [define the class](../../python_basics/008_A_Class_as_a_Data_Container.ipynb) using the `class` keyword, the class name `WaterLevelData`, a `:` and the docstring `"""A Class for Water Level Data"""`.


In [17]:
import os.path
from datetime import datetime, timezone

class WaterLevelData:
    """A Class for Water Level Data"""

    def __init__(self):

        # The data attributes
        self.epochs = list()
        self.water_levels = list()
        self.metadata = dict()
        self.metadata["units"] = "m"
        self.metadata["geoid"] = None
        self.metadata["start_time"] = None
        self.metadata["end_time"] = None
        self.metadata["count"] = None

    # The I/O methods:

    def read_jhc_file(self, fullpath):

        # Check the File's existence
        if os.path.exists(fullpath):
            self.metadata["Source File"] = fullpath
            print('Opening water level data file:' + fullpath)
        else:  # Raise a meaningful error
            raise RuntimeError('Unable to locate the input file: ' + fullpath)

        # Open, read and close the file
        wl_file = open(fullpath)
        wl_content = wl_file.read()
        wl_file.close()

        # Tokenize the contents
        wl_lines = wl_content.splitlines()
        count = 0  # initialize the counter for the number of rows read
        for wl_line in wl_lines:
            observations = wl_line.split()  # Tokenize the string
            epoch=datetime.fromtimestamp(float(observations[5]), timezone.utc)
            self.epochs.append(epoch)
            self.water_levels.append(float(observations[6]))
            count += 1

---
### 1.1.2 Create a Class instance 

In the code cell below create an instance of the `WaterLevelData` class called `water_level_data` and check its type.

In [18]:
water_level_data = WaterLevelData()
type(water_level_data)

__main__.WaterLevelData

---
### 1.1.3 Add Attributes to the Class


<img src="../../Images/TideFile.png">

In the image above you see the contents of the file `Lab_A_TIDE.txt`. As you can see, no **metadata** is contained within the file. Each row in the data file represents a record consisting of a specific date and time (**epoch**) and an associated water level observation in meters. The epoch for each record is represented twice in this file. The first representation is a combination of the year, year-day, hour and minute as **integer** values and the seconds as a **float** value. The second representation is the number of seconds since midnight January 1, 1970, Coordinated Universal Time (UTC).
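The two time representations can be cross-checked against each other. The sketch below is illustrative only: `epoch_from_fields` is a hypothetical helper, not part of the lab code, and the field values are the calendar equivalent of POSIX time 1304479800.

```python
from datetime import datetime, timedelta, timezone


def epoch_from_fields(year, year_day, hour, minute, seconds):
    """Build an aware UTC datetime from year, year-day and time-of-day fields."""
    start_of_year = datetime(year, 1, 1, tzinfo=timezone.utc)
    # The year-day counts from 1, so day 1 adds zero days
    return start_of_year + timedelta(days=year_day - 1, hours=hour,
                                     minutes=minute, seconds=seconds)


# First representation: year, year-day, hour, minute, seconds
epoch_a = epoch_from_fields(2011, 124, 3, 30, 0.0)

# Second representation: seconds since midnight January 1, 1970 UTC
epoch_b = datetime.fromtimestamp(1304479800.000, timezone.utc)

print(epoch_a)
print(epoch_b)
print("Representations agree:", epoch_a == epoch_b)
```

If the two values ever disagree for a record, that is a hint that one of the columns was misread.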

<img align="left" width="6%" style="padding-right:10px;" src="../../Images/info.png">

**POSIX Time** (alternatively **Unix time** or **Unix Epoch time**) is the number of elapsed seconds since  midnight January 1, 1970 UTC.

The benefit of using POSIX time for data is that it is independent of the time zone in which the data is collected, and that you may represent an epoch by a single number, thereby simplifying math involving time spans.
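As a small illustration of why a single number simplifies time span math, the sketch below computes the same span from POSIX seconds and from `datetime` objects. The second epoch is a made-up value chosen for the example.

```python
from datetime import datetime, timezone

t_start = 1304479800.0    # POSIX seconds
t_end = t_start + 90.0    # hypothetical second epoch, 90 s later

# With POSIX time a time span is a plain subtraction
span_posix = t_end - t_start

# The same span computed from aware datetime objects
d_start = datetime.fromtimestamp(t_start, timezone.utc)
d_end = datetime.fromtimestamp(t_end, timezone.utc)
span_datetime = (d_end - d_start).total_seconds()

print(span_posix, span_datetime)

# An aware datetime converts back to the same POSIX number
print(d_start.timestamp() == t_start)
```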

Add the [`__init__`](../../python_basics/008_A_Class_as_a_Data_Container.ipynb) method to the `WaterLevelData` class in the code cell of step 1.1.1. In the `__init__` method add the attributes `epochs` and `water_levels` as `lists`. If you forget how to do this you may look at the example in [A Class as a Data Container](../../python_basics/008_A_Class_as_a_Data_Container.ipynb).

**Click on Kernel $\rightarrow$ Restart & Run All.** This makes sure that the updates you made propagate through.

The code cell below lists the attributes of the `water_level_data` instance using the `__dict__` attribute, one of Python's `magic` or `dunder` (double underscore) names. For now you do not have to know how `dunder` names work, just that this particular one provides a convenient way to get a list of the `attributes` of an object.

In [19]:
water_level_data.__dict__.keys()

dict_keys(['epochs', 'water_levels', 'metadata'])
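The built-in function `vars()` returns the same dictionary as `__dict__`. The sketch below uses a hypothetical stand-in class (`Container`) so that it runs on its own. It also shows that methods defined on the class do not appear among the instance attributes.

```python
class Container:
    """Hypothetical stand-in for a data container class."""

    def __init__(self):
        self.epochs = list()
        self.water_levels = list()

    def read(self):
        pass


container = Container()

# vars() returns the instance's __dict__
print(list(vars(container).keys()))
print(vars(container) is container.__dict__)

# Methods live on the class, not in the instance dictionary
print("read" in container.__dict__)
```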

---
### 1.1.4 [Metadata](../../python_basics/007_Dictionaries_and_Metadata.ipynb)


The `WaterLevelData` class you created currently defines a container that can hold water level data. To make it more useful we will add [`metadata`](../../python_basics/007_Dictionaries_and_Metadata.ipynb). You may ask: what metadata should I add? The answer is not at all trivial!

Take for example the epochs in the tide data file: they are simply represented as numbers and it is up to us to interpret them. It is desirable that in `WaterLevelData` the epochs are represented as specific moments in time that are not open to interpretation. We could achieve this by adding `metadata` describing the time base used, but that can get very messy (are the years Julian or Gregorian, is the time 'UTC' time, 'GPS' time, etc.). Fortunately Python provides the `datetime` module, which supplies the functionality needed.

In the code snippet below the use of a `datetime` object is illustrated, using the POSIX time of the first data record in the file `Lab_A_TIDE.txt`:

In [20]:
from datetime import datetime, timezone
epoch_naive=datetime.fromtimestamp(1304479800.000)
print("Epoch in YYYY-MM-DD HH:MM:SS format: " + str(epoch_naive))

# You can verify the year date (as is shown in the tide file) as follows
print("Year date: " + str(epoch_naive.timetuple().tm_yday))
print("Time Zone: " +str(epoch_naive.tzinfo))

Epoch in YYYY-MM-DD HH:MM:SS format: 2011-05-04 03:30:00
Year date: 124
Time Zone: None


The time zone is `None`! **This does not represent 'a specific moment in time that is not open to interpretation'!** A `datetime` object such as `epoch_naive` is said to be **naive** if its time zone is `None`, as opposed to **aware** when the time zone is specified. Without a time zone argument, `datetime.fromtimestamp` returns the local date and time of the computer running the code, so the printed value depends on where the code is executed. The `epoch_naive.tzinfo` attribute is metadata that describes the properties of the time contained in an object of type `datetime`.

It is important that you start understanding the help provided by the Python documentation. Look at the class method `datetime.fromtimestamp` section in the `datetime` Python documentation and use it to update the code below so that the time zone is printed as `UTC`.

In [21]:
epoch=datetime.fromtimestamp(1304479800.000, timezone.utc)
print("Epoch in YYYY-MM-DD HH:MM:SS format: " + str(epoch))

# You can verify the year date (as is shown in the tide file) as follows
print("Year date: " + str(epoch.timetuple().tm_yday))
print("Time Zone: " +str(epoch.tzinfo))

Epoch in YYYY-MM-DD HH:MM:SS format: 2011-05-04 03:30:00+00:00
Year date: 124
Time Zone: UTC


<img align="left" width="6%" style="padding-right:10px;" src="../../Images/key.png">

In ocean mapping almost all data is integrated on a time basis. It is therefore of key importance that you handle time consistently. By specifying the epochs as `aware` objects of class `datetime` the metadata is included in the epochs as the `tzinfo` attribute.
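A quick way to guard against mixing the two kinds of objects is to test for awareness before using an epoch. The sketch below follows the definition of aware objects given in the Python `datetime` documentation. The helper name `is_aware` is an assumption made for this example.

```python
from datetime import datetime, timezone


def is_aware(moment):
    """Return True if a datetime carries usable time zone information."""
    return (moment.tzinfo is not None
            and moment.tzinfo.utcoffset(moment) is not None)


naive = datetime(2011, 5, 4, 3, 30)           # no tzinfo
aware = naive.replace(tzinfo=timezone.utc)    # same clock reading, tagged as UTC

print(is_aware(naive), is_aware(aware))

# Python refuses to do arithmetic on a mix of naive and aware objects
try:
    aware - naive
    mixed_ok = True
except TypeError:
    mixed_ok = False
print("Mixing naive and aware allowed:", mixed_ok)
```

Storing only aware epochs in the data classes means such errors surface early instead of silently shifting data in time.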

---
### 1.1.5 Add a Metadata Attribute to the Class

In the previous section you have seen that a `datetime` object includes the necessary metadata to interpret the epochs consistently. Thus if we add `datetime` objects to the `WaterLevelData` `epochs` list we do not need to add further metadata to correctly interpret the time.

We can, however, add other metadata to the class. In the code cell from step 1.1.1 add the attribute `metadata` as a `dict` (for more information on dictionaries see the [Dictionaries and Metadata](../../python_basics/007_Dictionaries_and_Metadata.ipynb) notebook).

**Click on Kernel $\rightarrow$ Restart & Run All.** This makes sure that the updates you made to the cell propagate through.

In the code cell below add a line that prints the metadata contained in the `water_level_data` object. 

In [22]:
print(water_level_data.metadata)

{'units': 'm', 'geoid': None, 'start_time': None, 'end_time': None, 'count': None}


---
### 1.1.6 Populate the Metadata Attribute

The purpose of the `WaterLevelData` class is to hold the data contained in a water level file such as `Lab_A_TIDE.txt`. To interpret the numbers specified in the file we need to know what the observations represent. In this case they are heights in meters above the ['EGM96' **geoid**](https://en.wikipedia.org/wiki/EGM96) model (a **geoid** is a model that represents the shape of the Earth and is defined by the equipotential gravity surface that most closely matches Mean Sea Level).

#### 1.1.6.a The 'units' attribute

We always want the units to be meters by default, so add the key `"units"` with the string `"m"` as its value to the `metadata` attribute in the `__init__` method of the `WaterLevelData` class in the code cell of section 1.1.1.

**Click on Kernel $\rightarrow$ Restart & Run All.** This makes sure that the updates you made propagate through. Note that the code cell of section 1.1.5 now prints the dictionary `{'units': 'm'}`.

#### 1.1.6.b The 'geoid' attribute

To properly reference the *orthometric* heights (heights above the geoid) you need to know which geoid is used. Unlike the units, we do not have a preference for what the geoid should be. Therefore at the time of creation we can set this attribute to `None` by default and then change it later as needed.

Add the key `"geoid"` with `None` as its value to the `metadata` attribute in the `__init__` method of the `WaterLevelData` class in the code cell of section 1.1.1.

**Click on Kernel $\rightarrow$ Restart & Run All.** This makes sure that the updates you made propagate through. Note that the code cell of section 1.1.5 now prints the dictionary `{'units': 'm', 'geoid': None}`.

In the code cell below change the value associated with the key `"geoid"` in `water_level_data.metadata` to the string `"EGM96"` and then print the complete metadata for the `water_level_data` instance.

In [23]:
water_level_data.metadata["geoid"]="EGM96"
print(water_level_data.metadata)

{'units': 'm', 'geoid': 'EGM96', 'start_time': None, 'end_time': None, 'count': None}
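Because `None` is used as the placeholder for values that are not yet known, it is easy to list which metadata entries still need to be filled in. The sketch below works on a plain dictionary with the same content as the output above. The variable names are chosen for this example only.

```python
metadata = {"units": "m", "geoid": "EGM96",
            "start_time": None, "end_time": None, "count": None}

# Collect the keys whose value is still the None placeholder
unset_keys = [key for key, value in metadata.items() if value is None]
print(unset_keys)
```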


### 1.1.7 Add the read_jhc_file method to the class

There are a myriad of file formats for water level data. The file included with this assignment was created by John Hughes Clarke (jhc) so for convenience we will refer to the format of the file as the 'jhc' water level file format.

<img align="center"  src="../../Images/read_jhc_file.png">

To read the data and load it into your `water_level_data` object it would be convenient if we could tell the object to read a 'jhc' water level file. To achieve this add the method `read_jhc_file` shown above to the class definition in section 1.1.1. Also, make sure to import the `os.path` module in that code cell.

<img align="left" width="6%" style="padding-right:10px;" src="../../Images/info.png">

Visit the ePOM [Write Your Own Functions](../../python_basics/005_Write_Your_Own_Functions.ipynb) notebook to learn more about writing functions.<br>
Visit the ePOM [Read and Write Text Files](../../python_basics/006_Read_and_Write_Text_Files.ipynb) notebook to learn more about text file I/O.  

In the code cell below add code that assigns the absolute path of the current directory to the variable `fullpath`. Then append a slash and the name of the water level data file, i.e., `"Lab_A_TIDE.txt"`, to the `fullpath` string and print the value of `fullpath`.

In [24]:
fullpath=os.path.abspath(os.path.curdir)
fullpath+="/Lab_A_TIDE.txt"
print(fullpath)

/home/jupyter-semme/ESCI_OE_774_874/Lab_A/Lab_A_TIDE.txt
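Appending a slash by hand works on systems that use `/` as the path separator. As an alternative sketch, `os.path.join` inserts the separator that is appropriate for the operating system the code runs on:

```python
import os.path

# Join the absolute path of the current directory and the file name
fullpath = os.path.join(os.path.abspath(os.path.curdir), "Lab_A_TIDE.txt")
print(fullpath)
```

The printed path depends on the directory from which the notebook is run.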


In the code cell below call the `read_jhc_file()` method of the `water_level_data` instance with the argument `fullpath`, then print its metadata.

In [25]:
water_level_data.read_jhc_file(fullpath)
print(water_level_data.metadata)

Opening water level data file:/home/jupyter-semme/ESCI_OE_774_874/Lab_A/Lab_A_TIDE.txt
{'units': 'm', 'geoid': 'EGM96', 'start_time': None, 'end_time': None, 'count': None, 'Source File': '/home/jupyter-semme/ESCI_OE_774_874/Lab_A/Lab_A_TIDE.txt'}
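The printed metadata still shows `None` for `start_time`, `end_time` and `count`. One possible way to fill these in once epochs are available is sketched below. The helper `summarize_epochs` and the two example epochs are assumptions made for illustration and are not part of the lab code.

```python
from datetime import datetime, timezone


def summarize_epochs(epochs):
    """Return start time, end time and count for a list of aware datetimes."""
    if not epochs:
        return {"start_time": None, "end_time": None, "count": 0}
    return {"start_time": min(epochs),
            "end_time": max(epochs),
            "count": len(epochs)}


# Two example epochs; the second is a hypothetical value six minutes later
epochs = [datetime.fromtimestamp(t, timezone.utc)
          for t in (1304479800.0, 1304480160.0)]

metadata = {"units": "m", "geoid": "EGM96",
            "start_time": None, "end_time": None, "count": None}
metadata.update(summarize_epochs(epochs))

print(metadata["count"])
print(metadata["start_time"], metadata["end_time"])
```

Inside a class the same idea could be applied to `self.epochs` and `self.metadata` at the end of the read method.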


### 1.1.8 Read a File Using the read_jhc_file Method

You have now written a lot of code, but you still have not added any of the contents of the data file to your `water_level_data` object! Look at the code in the [Read and Write Text Files](../../python_basics/006_Read_and_Write_Text_Files.ipynb) notebook to:

- Add code that opens the file specified by the argument `fullpath`
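The reading step can be tried out in isolation before adding it to the class. The sketch below writes two records to a temporary file and parses them. The column layout follows the description in section 1.1.3, but the water level values (and the second record as a whole) are made up for the example. It uses a `with` statement, which closes the file automatically, as an alternative to calling `close()` explicitly.

```python
import os
import tempfile
from datetime import datetime, timezone

# Two hypothetical records: year, year-day, hour, minute, seconds,
# POSIX time, water level (the water level values are made up)
sample = ("2011 124 03 30 00.000 1304479800.000 1.25\n"
          "2011 124 03 36 00.000 1304480160.000 1.31\n")

# Write the sample to a temporary file so there is something to read
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

epochs = []
water_levels = []

# The with statement closes the file when the block ends
with open(sample_path) as wl_file:
    for wl_line in wl_file:
        observations = wl_line.split()  # Tokenize the string
        if not observations:            # Skip empty lines
            continue
        epochs.append(datetime.fromtimestamp(float(observations[5]),
                                             timezone.utc))
        water_levels.append(float(observations[6]))

os.remove(sample_path)  # Clean up the temporary file

print(len(epochs), epochs[0], water_levels)
```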



## 1.2 Class Definition for GNSS Data 

In [26]:
import os.path
from datetime import datetime, timezone

class GNSS_Data:
    """A Class for GNSS Data"""

    def __init__(self):

        # The data attributes
        self.epochs = list()
        self.latitudes = list()
        self.longitudes = list()
        self.ortho_heights = list()
        self.metadata = dict()
        self.metadata["units"] = "m"
        self.metadata["geoid"] = None        
        self.metadata["start_time"] = None
        self.metadata["end_time"] = None
        self.metadata["count"] = None


    # The I/O methods:

    def read_jhc_file(self, fullpath):

        # Check the File's existence
        if os.path.exists(fullpath):
            self.metadata["Source File"] = fullpath
            print('Opening water level data file:' + fullpath)
        else:  # Raise a meaningful error
            raise RuntimeError('Unable to locate the input file: ' + fullpath)

        # Open, read and close the file
        gnss_file = open(fullpath)
        gnss_content = gnss_file.read()
        gnss_file.close()

        # Tokenize the contents
        gnss_lines = gnss_content.splitlines()
        count = 0  # initialize the counter for the number of rows read
        for gnss_line in gnss_lines:
            observations = gnss_line.split()  # Tokenize the string
            epoch=datetime.fromtimestamp(float(observations[5]), timezone.utc)
            self.epochs.append(epoch)
            self.latitudes.append(float(observations[6]))
            self.longitudes.append(float(observations[7]))
            self.ortho_heights.append(float(observations[8]))
            count += 1

In [27]:
pos_data = GNSS_Data()
fullpath=os.path.abspath(os.path.curdir)
fullpath+="/Lab_A_GNSS.txt"
print(fullpath)
pos_data.read_jhc_file(fullpath)

/home/jupyter-semme/ESCI_OE_774_874/Lab_A/Lab_A_GNSS.txt
Opening water level data file:/home/jupyter-semme/ESCI_OE_774_874/Lab_A/Lab_A_GNSS.txt


## 1.3 Class Definition for TWTT Data

In [28]:
import os.path
from datetime import datetime, timezone

class TWTT_Data:
    """A Class for Two Way Travel Time Data"""

    def __init__(self):

        # The data attributes
        self.epochs = list()
        self.twtts = list()
        self.metadata = dict()
        self.metadata["units"] = "s"
        self.metadata["start_time"] = None
        self.metadata["end_time"] = None
        self.metadata["count"] = None

    # The I/O methods:

    def read_jhc_file(self, fullpath):

        # Check the File's existence
        if os.path.exists(fullpath):
            self.metadata["Source File"] = fullpath
            print('Opening water level data file:' + fullpath)
        else:  # Raise a meaningful error
            raise RuntimeError('Unable to locate the input file: ' + fullpath)

        # Open, read and close the file
        twtt_file = open(fullpath)
        twtt_content = twtt_file.read()
        twtt_file.close()

        # Tokenize the contents
        twtt_lines = twtt_content.splitlines()
        count = 0  # initialize the counter for the number of rows read
        for twtt_line in twtt_lines:
            observations = twtt_line.split()  # Tokenize the string
            epoch=datetime.fromtimestamp(float(observations[5]), timezone.utc)
            self.epochs.append(epoch)
            self.twtts.append(float(observations[6]))
            count += 1

In [29]:
twtt_data = TWTT_Data()
fullpath=os.path.abspath(os.path.curdir)
fullpath+="/Lab_A_TWTT.txt"
print(fullpath)
twtt_data.read_jhc_file(fullpath)


/home/jupyter-semme/ESCI_OE_774_874/Lab_A/Lab_A_TWTT.txt
Opening water level data file:/home/jupyter-semme/ESCI_OE_774_874/Lab_A/Lab_A_TWTT.txt


## 1.4 Class Definition for MRU Data

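Note that the angles are stored in radians. The conversion from degrees requires `pi`, which is imported from the `math` module (another module that provides this constant, such as SciPy, would also work).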
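As a brief sketch of the conversion, the factor `pi/180` and the standard library function `math.radians` give the same result. The angle used here is an arbitrary example value.

```python
from math import pi, radians, isclose

yaw_degrees = 90.0  # arbitrary example angle

yaw_by_factor = yaw_degrees * pi / 180    # explicit conversion factor
yaw_by_function = radians(yaw_degrees)    # standard library equivalent

print(yaw_by_factor, yaw_by_function)
print(isclose(yaw_by_factor, yaw_by_function))
```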

In [33]:
import os.path
from datetime import datetime, timezone
from math import pi

class Motion_Data:
    """A Class for motion Data"""

    def __init__(self):

        # The data attributes
        self.epochs = list()
        self.yaw = list()        
        self.roll = list()        
        self.pitch = list()        
        self.heave = list()
        self.metadata = dict()
        self.metadata["units"] = "rad"        
        self.metadata["start_time"] = None
        self.metadata["end_time"] = None
        self.metadata["count"] = None


    # The I/O methods:

    def read_jhc_file(self, fullpath):

        # Check the File's existence
        if os.path.exists(fullpath):
            self.metadata["Source File"] = fullpath
            print('Opening water level data file:' + fullpath)
        else:  # Raise a meaningful error
            raise RuntimeError('Unable to locate the input file: ' + fullpath)

        # Open, read and close the file
        motion_file = open(fullpath)
        motion_content = motion_file.read()
        motion_file.close()

        # Tokenize the contents
        motion_lines = motion_content.splitlines()
        count = 0  # initialize the counter for the number of rows read
        for motion_line in motion_lines:
            observations = motion_line.split()  # Tokenize the string
            epoch=datetime.fromtimestamp(float(observations[5]), timezone.utc)
            self.epochs.append(epoch)
            self.yaw.append(float(observations[6])*pi/180)
            self.pitch.append(float(observations[7])*pi/180)
            self.roll.append(float(observations[8])*pi/180)
            self.heave.append(float(observations[9]))  # heave is a distance, not an angle
            count += 1

In [35]:
motion_data = Motion_Data()
fullpath=os.path.abspath(os.path.curdir)
fullpath+="/Lab_A_MRU.txt"
print(fullpath)
motion_data.read_jhc_file(fullpath)


/home/jupyter-semme/ESCI_OE_774_874/Lab_A/Lab_A_MRU.txt
Opening water level data file:/home/jupyter-semme/ESCI_OE_774_874/Lab_A/Lab_A_MRU.txt


# 2 Spatial Data Class Definition

In this section you will define a class for a sound speed profile. This is a little different in nature, as it contains data obtained at a single epoch at a given location.


## 2.1 Class Definition for Sound Speed Profile Data

Each object will hold a single profile.

In [136]:
import os.path
from datetime import datetime, timezone
from math import pi


class SVP_Data:
    """A Class for Sound Speed Profile Data"""

    def __init__(self):

        # The data attributes
        self.obs_epoch = None
        self.log_epoch = None
        self.obs_latitude = None
        self.obs_longitude = None
        self.vessel_latitude = None
        self.vessel_longitude = None
        self.obs_sample = list()
        self.obs_depth = list()
        self.obs_ss = list()

        self.metadata = dict()
        self.metadata["units"] = "rad"
        self.metadata["count"] = None

    # The I/O methods:

    def read_jhc_file(self, fullpath):

        # Check the File's existence
        if os.path.exists(fullpath):
            self.metadata["Source File"] = fullpath
            print('Opening water level data file:' + fullpath)
        else:  # Raise a meaningful error
            raise RuntimeError('Unable to locate the input file: ' + fullpath)

        # Open, read and close the file
        motion_file = open(fullpath)
        motion_content = motion_file.read()
        motion_file.close()

        # Tokenize the contents
        motion_lines = motion_content.splitlines()
        self.obs_epoch = datetime.fromtimestamp(float(motion_lines[1].split()[0]), timezone.utc)
        self.log_epoch = datetime.fromtimestamp(float(motion_lines[2].split()[0]), timezone.utc)
        self.obs_latitude = float(motion_lines[3].split()[0])
        self.obs_longitude = float(motion_lines[3].split()[1])
        self.vessel_latitude = float(motion_lines[4].split()[0])
        self.vessel_longitude = float(motion_lines[4].split()[1])
        self.metadata["count"] = float(motion_lines[5].split()[0])

        count = 0  # initialize the counter for the number of rows read

        for motion_line in motion_lines[16:]:
            observations = motion_line.split()  # Tokenize the string
            self.obs_sample.append(float(observations[0]))
            self.obs_depth.append(float(observations[1]))
            self.obs_ss.append(float(observations[2]))
            count += 1

        if self.metadata["count"] != count:
            raise RuntimeError('Nr of Samples read ('+str(count) +
                               ') does not match metadata count (' +
                               str(self.metadata["count"])+')')

In [138]:
svp_data = SVP_Data()
fullpath=os.path.abspath(os.path.curdir)
fullpath+="/Lab_A_SVP.txt"
print(fullpath)
svp_data.read_jhc_file(fullpath)


/home/jupyter-semme/ESCI_OE_774_874/Lab_A/Lab_A_SVP.txt
Opening water level data file:/home/jupyter-semme/ESCI_OE_774_874/Lab_A/Lab_A_SVP.txt


<img align="left" width="5%" style="padding-right:10px;" src="../../python_basics/images/email.png">

*For issues or suggestions related to this notebook, write to: epom@ccom.unh.edu*