<a href="https://www.hydroffice.org/epom/"><img src="images/000_000_epom_logo.png" alt="ePOM" title="Open ePOM home page" align="center" width="12%"></a>

<a href="https://piazza.com/class/js5dnu0q39n6qe"><img src="images/help.png" alt="Piazza.com" title="Ask questions on Piazza.com" align="right" width="10%"></a>
# Dictionaries

It is time to add to your knowledge of Python containers. In this notebook we introduce the useful `dict` (dictionary) container.

Each item in a `dict` is represented by a pair: a key and a corresponding value. 

For example, a `dict` can be used to map a [chemical symbol](https://en.wikipedia.org/wiki/Symbol_(chemistry)) to the corresponding element name. Thus, a possible pair would be `H` as key and `Hydrogen` as the value.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

A `dict` maps a set of indices, called **keys**, to a set of **values**.

## How to create and populate a `dict`

Similarly to [the creation approach #2 for lists](002_Lists_of_Variables.ipynb#Creation-of-a-List:-Approach-#2), you may create an empty dictionary by calling its constructor: `dict()`. Items are then added by using square brackets and assignments, as in the code below:

In [None]:
chem_dict = dict()
chem_dict["H"] = "Hydrogen"
chem_dict["He"] = "Helium"
chem_dict["Li"] = "Lithium"
chem_dict["Be"] = "Beryllium"
chem_dict["B"] = "Boron"

print(chem_dict)

Printing a `dict` will show the contents as follows:

- All the item pairs are printed between curly brackets (i.e., `{`, `}`). 
- The items are separated by a comma (e.g., the pair `"Li": "Lithium"` is an item). 
- For each item, the two parts are separated by a `:` with the key (e.g., `"Li"`) on the left and the value (e.g., `"Lithium"`) on the right.

We can also create a dictionary using the method shown below:

In [None]:
chem_dict = {"H": "Hydrogen", "He": "Helium", "Li": "Lithium", "Be": "Beryllium", "B": "Boron"}

print(chem_dict)

As in [approach #1 for lists](002_Lists_of_Variables.ipynb#Creation-of-a-List:-Approach-#1), the above method creates a `dict` and its items with a single statement.

The example below uses the same code as above, but splits the dictionary creation statement into rows for readability:

In [None]:
chem_dict = {
    "H": "Hydrogen",
    "He": "Helium",
    "Li": "Lithium",
    "Be": "Beryllium",
    "B": "Boron"
}

print(chem_dict)
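
Once a `dict` is populated, its values are read back using the same square-bracket syntax. The cell below is a minimal sketch of a few common operations (key lookup, the `in` operator, the `get()` method, and `len()`); the variable names `element_name`, `has_lithium`, `has_oxygen`, and `missing_name` are only illustrative.

```python
# Build a small dictionary like the one used in the examples above.
chem_dict = {"H": "Hydrogen", "He": "Helium", "Li": "Lithium"}

# Retrieve a value by placing its key in square brackets.
element_name = chem_dict["He"]
print(element_name)

# The `in` operator tests whether a key is present.
has_lithium = "Li" in chem_dict
has_oxygen = "O" in chem_dict
print(has_lithium, has_oxygen)

# get() returns a fallback value instead of raising a KeyError for a missing key.
missing_name = chem_dict.get("O", "unknown")
print(missing_name)

# len() reports the number of key-value pairs.
print(len(chem_dict))
```

Asking for a missing key with square brackets (e.g., `chem_dict["O"]`) raises a `KeyError`, which is why `get()` with a fallback value can be convenient.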

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

By now, you may have noticed that the `str` values in the above `chem_dict` are printed within single quotes `'` rather than double quotes `"`. Both are valid ways to define strings in Python. However, a string must be closed with the same type of quote that opened it: starting with `'` and ending with `"` (or vice versa) results in an error. For consistency, we always use `"` here.

When you print the content of a dictionary, the items are presented in the same order that you used to populate the `dict`. This behavior is guaranteed by the language since Python 3.7; in earlier versions, the order was not guaranteed.

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

A `dict` preserves the insertion order of its items (Python 3.7 and later), but the items are accessed by key, not by position. Python also provides a specialized dictionary called [`OrderedDict`](https://docs.python.org/3.9/library/collections.html?highlight=ordereddict#ordereddict-objects), which adds features such as reordering methods and order-sensitive equality comparisons.
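
In Python 3.7 and later, a plain `dict` keeps its items in insertion order. The short sketch below checks this and also shows one of the extra features of `OrderedDict`, the `move_to_end()` method; the names `key_order`, `ordered`, and `ordered_keys` are only illustrative.

```python
from collections import OrderedDict

# Populate a dict in a deliberately non-alphabetical order.
chem_dict = dict()
chem_dict["Li"] = "Lithium"
chem_dict["H"] = "Hydrogen"
chem_dict["He"] = "Helium"

# On Python 3.7+, listing the keys follows the insertion order.
key_order = list(chem_dict)
print(key_order)

# An OrderedDict offers additional reordering methods, such as move_to_end().
ordered = OrderedDict(chem_dict)
ordered.move_to_end("Li")
ordered_keys = list(ordered)
print(ordered_keys)
```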

<img align="left" width="6%" style="padding-right:10px;" src="images/test.png">

Populate and print a dictionary that pairs symbols with sediment types based on the [International Scale](https://en.wikipedia.org/wiki/Grain_size#International_scale) (e.g., `"Cl"` for `"Clay"`).

In [None]:
int_scale_dict = dict()
int_scale_dict["LBo"] = "Large boulder"
int_scale_dict["Bo"] = "Boulder"
int_scale_dict["Co"] = "Cobble"
int_scale_dict["CGr"] = "Coarse gravel"
int_scale_dict["MGr"] = "Medium gravel"
int_scale_dict["FGr"] = "Fine gravel"
int_scale_dict["CSa"] = "Coarse sand"
int_scale_dict["MSa"] = "Medium sand"
int_scale_dict["FSa"] = "Fine sand"
int_scale_dict["CSi"] = "Coarse silt"
int_scale_dict["MSi"] = "Medium silt"
int_scale_dict["FSi"] = "Fine silt"
int_scale_dict["Cl"] = "Clay"

print(int_scale_dict)

In [None]:
int_scale_dict = dict()

print(int_scale_dict)

***

## Comparison between `dict` and `list`

A dictionary differs from a [list](002_Lists_of_Variables.ipynb) in several aspects:

| Topic | List  | Dictionary |
| :-----| :---- | :--------- |
| Brackets | Square brackets: `[`, `]` | Curly brackets: `{`, `}` |
| Empty Constructor | `list()`, `[]` | `dict()`, `{}` |
| Indexing | Indices are only integers (`int`) | The indices can be of (almost) any type  |
| Ordered | Yes (Items are accessed by position.) | Insertion order is preserved (Python 3.7+), but items are accessed by key, not by position. |


<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

In the table above, the *"(almost) any type"* for `dict` indexing is because the type must be [hashable](https://docs.python.org/3.9/glossary.html). For now, you do not have to worry about what this means, as it is outside the scope of this notebook. If, however, you want to learn about hash functions, go [here](https://en.wikipedia.org/wiki/Hash_function).
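
As a rough illustration of the hashable requirement, the sketch below uses keys of three different types and then shows that a `list` is rejected as a key. The dictionary name `mixed_keys` and the example values are assumptions made only for this demonstration.

```python
# Keys of different hashable types can coexist in one dictionary.
mixed_keys = dict()
mixed_keys["H"] = "Hydrogen"            # str key
mixed_keys[2] = "Helium"                # int key (here: an atomic number)
mixed_keys[(43.13, -70.93)] = "a site"  # tuple key (immutable, so hashable)
print(mixed_keys)

# A list is mutable and therefore not hashable: using it as a key fails.
try:
    mixed_keys[[43.13, -70.93]] = "a site"
    list_key_error = None
except TypeError as err:
    list_key_error = str(err)
print(list_key_error)
```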

***

# What is Metadata?

In ocean mapping you will encounter [Metadata](https://en.wikipedia.org/wiki/Metadata), which is of critical importance. 

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

**Metadata** is a set of data that gives information about other data.

One of the most common uses of metadata is to help discover and identify data resources. 

There are different [metadata standards](https://en.wikipedia.org/wiki/Metadata#Standards) for different fields of study. In ocean mapping you may encounter many of these standards. However, in this notebook we will not explore them.

***

## A `dict` as a Metadata Container

We will now explore the use of a `dict` as a [metadata](https://en.wikipedia.org/wiki/Metadata) container.  

Following our previous examples of experiments collecting water salinity and temperature values, we will use a `dict` to store metadata such as:

- The author of the observations (`"first_name"` and `"last_name"`).
- The location where the observations took place (`"latitude"` and `"longitude"`).
- The time range during which observations were taken (`"start_timestamp"` and `"end_timestamp"`).

A set of metadata may be represented by a `dict` containing the following six item pairs:

- `"first_name"` &#x279C; `str` type
- `"last_name"` &#x279C; `str` type
- `"latitude"` &#x279C; `float` type
- `"longitude"` &#x279C; `float` type
- `"start_timestamp"` &#x279C; `datetime` type
- `"end_timestamp"` &#x279C; `datetime` type

This is the first time that we use the [`datetime`](https://docs.python.org/3.9/library/datetime.html?#module-datetime) type.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

A variable of `datetime` type represents a specific date and a time. Integration of data is typically done based on time in ocean mapping. Consistent handling of time is therefore critical. 

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

At CCOM/JHC there is an entire course dealing with the integration of data from various sensors used to map the seafloor.

The `datetime` constructor is part of the `datetime` module (they have the same name; see the [Python documentation](https://docs.python.org/3.9/library/datetime.html?#datetime-objects)). It takes several parameters:

- `datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0)` 

A `datetime` object must be constructed with at least three arguments (`year`, `month`, and `day`). The other parameters have default values (e.g., `hour=0`): if you do *not* pass values for them, Python assigns those defaults.
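
The effect of the default values can be checked with a short sketch like the one below, which passes only the required arguments and then a couple of optional ones by name. The variable names `date_only` and `with_time` are only illustrative.

```python
from datetime import datetime

# Only the three required arguments: the time defaults to midnight.
date_only = datetime(2019, 2, 22)
print(date_only)

# Optional parameters can also be passed by name; the rest keep their defaults.
with_time = datetime(2019, 2, 22, hour=12, minute=32)
print(with_time)

# Individual components are available as attributes.
print(date_only.hour, with_time.minute, with_time.second)
```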

For the purpose of this notebook we will call the `datetime` constructor with 6 values (from `year` to `second`) and ignore the others.

In [None]:
from datetime import datetime  # imports only the datetime class from the datetime module

example_timestamp = datetime(2019, 2, 22, 12, 32, 40)
print(str(example_timestamp))

We can now write our `metadata`:

In [None]:
metadata = dict()
metadata["first_name"] = "John"
metadata["last_name"] = "Doe"
metadata["latitude"] = 43.135555
metadata["longitude"] = -70.939534
metadata["start_timestamp"] = datetime(2019, 2, 22, 12, 32, 40)
metadata["end_timestamp"] = datetime(2019, 2, 22, 12, 34, 14)

print(metadata)
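
Storing timestamps as `datetime` values (rather than as text) makes it possible to compute with them. As a minimal sketch, the cell below reads the two timestamps back from the dictionary and subtracts them; the names `duration` and `duration_s` are only illustrative.

```python
from datetime import datetime

# A reduced version of the metadata dictionary, holding only the timestamps.
metadata = dict()
metadata["start_timestamp"] = datetime(2019, 2, 22, 12, 32, 40)
metadata["end_timestamp"] = datetime(2019, 2, 22, 12, 34, 14)

# Subtracting two datetime values gives a timedelta (a duration).
duration = metadata["end_timestamp"] - metadata["start_timestamp"]
print(duration)

# total_seconds() converts the duration to a float number of seconds.
duration_s = duration.total_seconds()
print(duration_s)
```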

<img align="left" width="6%" style="padding-right:10px;" src="images/test.png">

Populate and print a `metadata` dictionary containing the following three keys: your `"username"`, the `"begin_time"` and the `"end_time"` for the execution of this exercise.

In [None]:
metadata = dict()
metadata["username"] = "jdoe"
metadata["begin_time"] = datetime(2019, 2, 22, 12, 34, 20)
metadata["end_time"] = datetime(2019, 2, 22, 12, 34, 21)

print(metadata)

***

# More on String Formatting

Finally, we will explore a different mechanism for building the text to print (**string formatting**) in Python.

You already know how to print a value with `str` type:

In [None]:
metadata = dict()
metadata["first_name"] = "John"

print("The first name is: " + metadata["first_name"])

You also know that `str()` can be used to **type-cast** values whose type is not `str`:

In [None]:
metadata = dict()
metadata["latitude"] = 43.135555
metadata["longitude"] = -70.939534
metadata["start_timestamp"] = datetime(2019, 2, 22, 12, 32, 40)

print("The position is: " + str(metadata["latitude"]) + ", " + str(metadata["longitude"]))
print("Start time: " + str(metadata["start_timestamp"]))

We now introduce a new method that achieves the same result by using the modulo operator: `%`.

Below are a couple of examples of its usage:

In [None]:
metadata = dict()
metadata["first_name"] = "John"

print("The first name is: %s" % (metadata["first_name"]))

In [None]:
metadata = dict()
metadata["latitude"] = 43.135558
metadata["longitude"] = -70.939534
metadata["start_timestamp"] = datetime(2019, 2, 22, 12, 32, 40)

print("The position is: %s, %s" % (metadata["latitude"], metadata["longitude"]))
print("Start time: %s" % (metadata["start_timestamp"]))

By examining the above examples, you will notice that: 

* The `%s` is used as a **placeholder** in the string containing the message. 
* The message string is followed by the `%` operator, then by one or more variables enclosed in round brackets.

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

In the above code, when two or more values are listed inside the round brackets after the `%` operator, they create a so-called [`tuple`](https://docs.python.org/3.9/library/stdtypes.html?#tuples). With a single value, the brackets only group the expression; a one-item `tuple` requires a trailing comma, as in `(value,)`. <br>
A `tuple` is a Python container similar to a list, except that you cannot modify its content after creation (an **immutable sequence**).
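
The sketch below illustrates the behavior of a `tuple`: reading an item by index, the error raised when trying to modify it, and the trailing comma needed for a one-item `tuple`. All variable names are only illustrative.

```python
# A tuple with two items, created with round brackets.
position = (43.135555, -70.939534)
print(type(position))

# Items are read by index, as with a list.
latitude = position[0]
print(latitude)

# Unlike a list, a tuple cannot be modified: assignment raises a TypeError.
try:
    position[0] = 0.0
    was_modified = True
except TypeError:
    was_modified = False
print(was_modified)

# Brackets around a single value do not create a tuple; a trailing comma does.
not_a_tuple = ("John")
one_item_tuple = ("John",)
print(type(not_a_tuple), type(one_item_tuple))

# Both forms work with the % operator when there is a single placeholder.
message = "The first name is: %s" % one_item_tuple
print(message)
```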

String formatting using the `%` operator provides [additional printing options](https://docs.python.org/3.9/library/stdtypes.html#printf-style-string-formatting). One of these options is to define how many decimal digits to print for a `float` value. 

For instance, by using `%.4f` as a placeholder, Python will print the value **rounded** to four decimal digits:

In [None]:
metadata = dict()
metadata["latitude"] = 43.135558
metadata["longitude"] = -70.939534

print("The position is: %.4f, %.4f" % (metadata["latitude"], metadata["longitude"]))
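
Two details of `%` formatting are worth a quick check: the value is rounded (not truncated) to the requested number of decimals, and a `dict` can be passed directly after the `%` operator when the placeholders name its keys, as in `%(latitude).4f`. The cell below is a minimal sketch of both; the names `rounded_text`, `padded_text`, and `position_text` are only illustrative.

```python
metadata = dict()
metadata["latitude"] = 43.135558
metadata["longitude"] = -70.939534

# %.4f rounds to four decimals: the fifth decimal (5) rounds the fourth one up.
rounded_text = "%.4f" % metadata["latitude"]
print(rounded_text)

# A number before the dot sets the minimum total width (padded with spaces).
padded_text = "%10.2f" % metadata["latitude"]
print(padded_text)

# With a dict on the right side, placeholders can refer to keys by name.
position_text = "The position is: %(latitude).4f, %(longitude).4f" % metadata
print(position_text)
```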

<img align="left" width="6%" style="padding-right:10px;" src="images/test.png">

Print the depth value in the `metadata` dictionary created below rounded to two decimal digits (centimeter-level resolution):

In [None]:
metadata = dict()
metadata["depth"] = 129.121 # depth in meters

print("The depth is %.2f m." % (metadata["depth"]))

In [None]:
metadata = dict()
metadata["depth"] = 129.121 # depth in meters

***

<img align="left" width="6%" style="padding-right:10px; padding-top:10px;" src="images/refs.png">

## Useful References

* [The official Python 3.9 documentation](https://docs.python.org/3.9/index.html)
  * [Glossary](https://docs.python.org/3.9/glossary.html)
  * [Mapping Types - dict](https://docs.python.org/3.9/library/stdtypes.html#mapping-types-dict)
  * [Collections - OrderedDict](https://docs.python.org/3.9/library/collections.html?highlight=ordereddict#ordereddict-objects)
  * [`datetime`](https://docs.python.org/3.9/library/datetime.html?#module-datetime) 
  * [`tuple`](https://docs.python.org/3.9/library/stdtypes.html?#tuples)
* [Hash function](https://en.wikipedia.org/wiki/Hash_function)
* [Metadata](https://en.wikipedia.org/wiki/Metadata)

<img align="left" width="5%" style="padding-right:10px;" src="images/email.png">

*For issues or suggestions related to this notebook, write to: epom@ccom.unh.edu*

<!--NAVIGATION-->
[< Read and Write Text Files](006_Read_and_Write_Text_Files.ipynb) | [Contents](index.ipynb) | [A Class as a Data Container >](008_A_Class_as_a_Data_Container.ipynb)