<a href="https://www.hydroffice.org/epom/"><img src="images/000_000_epom_logo.png" alt="ePOM" title="Open ePOM home page" align="center" width="12%"></a>

<a href="https://piazza.com/e-learning_python_for_ocean_mapping/summer2019/om000/home"><img src="images/help.png" alt="ePOM" title="Ask questions on Piazza.com" align="right" width="10%"></a>
# Wrapping Up Notions

This is the last notebook of this collection. We will not introduce any major new concepts; instead, we will apply what has been discussed in the previous notebooks.

We will do so by creating a class to hold the data and the functions for reading and writing a data format that is more complex than the ones that we have met so far.

This is the text content of the `ctd.txt` file in the `data` folder:

![ctd_txt](images/010_000_ctd_txt.png)

As you can see in the above image, the first four rows contain some metadata information about when and where the data were collected.

Starting from the fifth line, each row has four columns, with measurements of depth, sound speed, temperature, and salinity.

Collecting several oceanographic measurements at once is common for a [CTD instrument](https://en.wikipedia.org/wiki/CTD_(instrument)).

## Data Class Creation

As done in the [A Class as a Data Container notebook](008_A_Class_as_a_Data_Container.ipynb), we will first create a class with the `__init__(self)` special method:

In [24]:
import os

class CTDData:
    """A class for CTD data"""
    
    def __init__(self):
        self.metadata = dict()        
        self.depth_values = list()
        self.ss_values = list()        
        self.temp_values = list()
        self.sal_values = list()

The above class has more attributes than the previous ones that we have created. In fact, we need to accommodate the metadata and four columns representing different types of measurements.
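
As a quick check, the sketch below instantiates the class and confirms that every attribute starts out empty. The class definition is repeated so that the snippet runs on its own, and the variable name `empty_ctd` is only illustrative:

```python
class CTDData:
    """A class for CTD data"""

    def __init__(self):
        self.metadata = dict()
        self.depth_values = list()
        self.ss_values = list()
        self.temp_values = list()
        self.sal_values = list()


empty_ctd = CTDData()  # illustrative name, not used elsewhere in this notebook
print("metadata: %s" % (empty_ctd.metadata, ))
print("Nr. of depth values: %s" % (len(empty_ctd.depth_values), ))
```

Each instance gets its own dictionary and lists, so data loaded into one `CTDData` object does not leak into another.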

## Data Path Retrieval

We will now retrieve the full path to the `ctd.txt` file:

In [25]:
def get_data_paths():
    data_paths = list()
    cur_folder = os.path.abspath(os.path.curdir)
    data_folder = os.path.join(cur_folder, "data")
    data_filenames = os.listdir(data_folder)
    
    for data_filename in data_filenames:
        data_path = os.path.join(data_folder, data_filename)
        data_paths.append(data_path)
    
    data_paths.sort()  # sort in alphabetical order
    
    return data_paths

retrieved_paths = get_data_paths()
input_path = retrieved_paths[0]
print("input path: " + input_path)

input path: C:\code\hyo2\epom\python_basics\data\ctd.txt
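
Selecting the path with index `0` relies on `ctd.txt` being the first file after sorting. A possible alternative is to look the path up by file name. The sketch below is only an illustration: the helper name `find_data_path` and the example paths are hypothetical, not part of this notebook's code:

```python
import os


def find_data_path(data_paths, filename):
    """Return the first path whose file name matches, or None (hypothetical helper)"""
    for data_path in data_paths:
        if os.path.basename(data_path) == filename:
            return data_path
    return None


# made-up paths, used only to exercise the helper
example_paths = [
    os.path.join("data", "another_file.txt"),
    os.path.join("data", "ctd.txt"),
]

found_path = find_data_path(example_paths, "ctd.txt")
print("found path: %s" % (found_path, ))
```

With this approach, adding more files to the `data` folder would not change which file is selected.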


## Creating a Data Reading Function

Similarly to what we did in the [Read a Text File section](006_Read_and_Write_Text_Files.ipynb#Read-a-Text-File), we will define a function to read the data:

In [26]:
def read_ctd_data(path, data):
    # raise an error if the passed file does not exist
    if not os.path.exists(path):
        raise RuntimeError("Unable to locate %s" % (path, ))

    # read the file content
    ctd_file = open(path)
    ctd_content = ctd_file.read()
    ctd_file.close()

    ctd_lines = ctd_content.splitlines()
    count = 0  # to count the number of read rows
    for ctd_line in ctd_lines:

        if count < 4:  # metadata
            meta_pair = ctd_line.split()
            data.metadata[meta_pair[0]] = meta_pair[1]

        else:  # measures
            measures = ctd_line.split()
            data.depth_values.append(float(measures[0]))
            data.ss_values.append(float(measures[1]))
            data.temp_values.append(float(measures[2]))
            data.sal_values.append(float(measures[3]))

        count += 1  # equivalent to: count = count + 1

In the above code we have used the [`str.split()`](https://docs.python.org/3.6/library/stdtypes.html?#str.split) method. 

This method returns a list of the words in a string by splitting it using the delimiter (e.g., `":"`) passed as a parameter. 

In [27]:
time_str = "14:02:39"
time_list = time_str.split(":")
print("The resulting list after splitting time_str is: %s" % (time_list, ))

The resulting list after splitting time_str is: ['14', '02', '39']


If a delimiter is **not** specified (as in both `split()` calls of the `read_ctd_data` function), the following splitting algorithm is applied: *"runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace."*

In [28]:
sample_str = "0.003    1501.09     3.7610    25.0900"
sample_list = sample_str.split()
print("The resulting list after splitting sample_str is: %s" % (sample_list, ))

The resulting list after splitting sample_str is: ['0.003', '1501.09', '3.7610', '25.0900']
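
To see why the default behavior suits column-aligned data, the following sketch compares `split()` with `split(" ")` on a made-up string that has leading, repeated, and trailing spaces. The variable names are only illustrative:

```python
padded_str = "  0.003    1501.09  "  # made-up string with extra spaces

default_split = padded_str.split()   # runs of whitespace act as one separator
space_split = padded_str.split(" ")  # every single space is a separator

print("split():    %s" % (default_split, ))
print("split(' '): %s" % (space_split, ))
```

When a single space is passed explicitly, the result contains empty strings, and passing one of them to `float()` would raise a `ValueError`.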


## Reading the Data

It is now time to create an instance of (**instantiate**) our `CTDData` class and to call our `read_ctd_data` function:

In [31]:
ctd_data = CTDData()
read_ctd_data(input_path, ctd_data)
print("The metadata are: %s" % (ctd_data.metadata, ))
print("Nr. of samples: %s" % (len(ctd_data.depth_values), ))

The metadata are: {'date': '02/10/2016', 'time': '14:02:39', 'latitude': '43.13555', 'longitude': '-70.9395'}
Nr. of samples: 30
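
Note that `read_ctd_data` stores the metadata values as strings, as the quotes in the output above show. If numeric coordinates are needed, they can be converted on demand. A minimal sketch, using a dictionary copied from the printed output rather than the `ctd_data` instance so that it runs on its own:

```python
# metadata as printed above: all values are strings
metadata = {'date': '02/10/2016', 'time': '14:02:39',
            'latitude': '43.13555', 'longitude': '-70.9395'}

latitude = float(metadata['latitude'])
longitude = float(metadata['longitude'])
print("Position: %.5f, %.5f" % (latitude, longitude))
```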


The data should now be **loaded in memory**. 

We can check the success of this operation by printing the depth and sound speed values. We will access the values by index with the help of the [`range()`](https://docs.python.org/3.6/library/stdtypes.html?#range) type.

A `range()` with an integer value as single parameter represents an immutable sequence of numbers from 0 up to, but not including, the value passed as a parameter:

In [34]:
for value in range(10):
    print("Current range value: %s" % (value, ))

Current range value: 0
Current range value: 1
Current range value: 2
Current range value: 3
Current range value: 4
Current range value: 5
Current range value: 6
Current range value: 7
Current range value: 8
Current range value: 9


Thus, we can use `range()` with the number of loaded samples to print all the values in the `depth_values` and `ss_values` lists preceded by the corresponding index:

In [35]:
nr_of_samples = len(ctd_data.depth_values)

for index in range(nr_of_samples):
    print("%s %.3f %.2f" % (index, ctd_data.depth_values[index], ctd_data.ss_values[index]))

0 0.003 1501.09
1 0.076 1506.66
2 0.167 1511.29
3 0.256 1513.82
4 0.574 1514.82
5 0.801 1514.35
6 1.589 1515.11
7 1.658 1515.11
8 2.535 1515.15
9 3.394 1515.24
10 4.087 1515.33
11 4.761 1515.35
12 6.879 1515.54
13 7.721 1515.54
14 8.412 1515.54
15 9.453 1515.55
16 9.912 1515.55
17 10.010 1515.55
18 10.087 1515.55
19 11.480 1515.56
20 12.104 1515.56
21 13.342 1515.58
22 13.499 1515.58
23 13.642 1515.56
24 14.383 1515.43
25 14.508 1515.43
26 14.968 1515.23
27 15.532 1515.24
28 15.924 1515.15
29 16.088 1511.91
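
The same listing can also be produced without explicit indexing, by combining the built-in `enumerate()` and `zip()` functions. This is an alternative technique, not the approach used above. The sketch below uses only the first three samples, copied from the output, so that it is self-contained:

```python
# first three samples copied from the output above
depth_values = [0.003, 0.076, 0.167]
ss_values = [1501.09, 1506.66, 1511.29]

output_lines = list()
# zip() pairs the values of the two lists; enumerate() adds the index
for index, (depth, ss) in enumerate(zip(depth_values, ss_values)):
    output_lines.append("%s %.3f %.2f" % (index, depth, ss))

for output_line in output_lines:
    print(output_line)
```

Note that `zip()` stops at the shortest list, so lists of different lengths would be truncated silently.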


It worked! We have been able to read a complex format in just a few lines of code.
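
Writing follows the same pattern in reverse. The following is a minimal sketch, assuming the layout that `read_ctd_data` parses: one `key value` pair per metadata row, followed by four whitespace-separated columns. The function name `write_ctd_data`, the number formats, and the temporary output path are assumptions made for illustration. The class definition is repeated so that the snippet runs on its own:

```python
import os
import tempfile


class CTDData:
    """A class for CTD data"""

    def __init__(self):
        self.metadata = dict()
        self.depth_values = list()
        self.ss_values = list()
        self.temp_values = list()
        self.sal_values = list()


def write_ctd_data(path, data):
    """Hypothetical writer: metadata rows first, then four-column measure rows"""
    ctd_file = open(path, "w")

    for key, value in data.metadata.items():
        ctd_file.write("%s %s\n" % (key, value))

    for index in range(len(data.depth_values)):
        ctd_file.write("%.3f %.2f %.4f %.4f\n"
                       % (data.depth_values[index], data.ss_values[index],
                          data.temp_values[index], data.sal_values[index]))

    ctd_file.close()


# a tiny data set built from values shown earlier in this notebook
sample_data = CTDData()
sample_data.metadata["date"] = "02/10/2016"
sample_data.metadata["time"] = "14:02:39"
sample_data.metadata["latitude"] = "43.13555"
sample_data.metadata["longitude"] = "-70.9395"
sample_data.depth_values.append(0.003)
sample_data.ss_values.append(1501.09)
sample_data.temp_values.append(3.7610)
sample_data.sal_values.append(25.0900)

output_path = os.path.join(tempfile.mkdtemp(), "ctd_copy.txt")  # throwaway location
write_ctd_data(output_path, sample_data)

# read the file back to inspect what was written
output_file = open(output_path)
written_lines = output_file.read().splitlines()
output_file.close()
print(written_lines)
```

Because the sketch writes four metadata rows before the measures, a file produced this way has the structure that `read_ctd_data` expects.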

***

<img align="left" width="6%" style="padding-right:10px; padding-top:10px;" src="images/refs.png">

## Useful References

* [The official Python 3.6 documentation](https://docs.python.org/3.6/index.html)
* [CTD instrument](https://en.wikipedia.org/wiki/CTD_(instrument))

<img align="left" width="5%" style="padding-right:10px;" src="images/email.png">

*For issues or suggestions related to this notebook, write to: epom@ccom.unh.edu*

<!--NAVIGATION-->
[< A Class as a Data Container](008_A_Class_as_a_Data_Container.ipynb) | [Contents](index.ipynb) | [Congrats >](congrats.ipynb)