# Dictionaries and Pandas

## References

- [DataCamp - Intermediate Python: Dictionaries & Pandas](https://campus.datacamp.com/courses/intermediate-python/dictionaries-pandas?ex=1)


## Overview

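A dictionary is a collection of key-value pairs in which each key is unique. It is written with curly braces `{}`; each key is separated from its value by a colon `:`, and the pairs are separated from one another by commas. Since Python 3.7, dictionaries preserve the order in which keys were inserted. Dictionaries are useful when we need to look up a value by its key, because a lookup takes roughly constant time on average regardless of how many pairs the dictionary holds.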


## Creating a Dictionary

To create a dictionary in Python, we enclose the key-value pairs in curly braces `{}`, put a colon between each key and its value, and separate the pairs with commas. Here's an example:

```python
hot_data = {
    # keys        :   values
    'dataset_name': 'Hawaii Ocean Time-series data',
    'dataset_description': 'HOT dataset',
    'dataset_source': 'BCO-DMO',
    'dataset_variables': ['temperature', 'salinity', 'pressure'],  # not including everything
    'dataset_years': (1988, 2019),
    'dataset_ctd': 'http://optserv1.whoi.edu/jg/listfnzm//dmoserv3.bco-dmo.org:80/BCO-DMO/HOT/ctd',
    'dataset_bottle': 'http://optserv1.whoi.edu/jg/listfnzm//dmoserv3.bco-dmo.org:80/BCO-DMO/HOT/niskin_v2'
}
```

In this example, we have created a dictionary called `hot_data` with several key-value pairs. The keys are strings (e.g. `'dataset_name'`), and the values can be of any data type; here they are strings, a list, and a tuple.
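
Curly-brace literals are not the only way to build a dictionary. As a minimal sketch, the built-in `dict()` constructor combined with `zip()` pairs up two sequences. The variable names `keys`, `values`, `meta`, and `dup` are made up for this illustration, and the `'unknown'` value is a placeholder.

```python
# Illustrative only: variable names are hypothetical.
keys = ['dataset_name', 'dataset_source']
values = ['Hawaii Ocean Time-series data', 'BCO-DMO']

# zip() pairs each key with the value at the same position.
meta = dict(zip(keys, values))
print(meta)

# Keys are unique: repeating a key in a literal keeps only the last value.
dup = {'dataset_source': 'unknown', 'dataset_source': 'BCO-DMO'}
print(dup)
print(len(dup))
```

The second half shows what "each key is unique" means in practice: Python does not raise an error on a repeated key, it silently keeps the last assignment.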

## Accessing Dictionary Values

You can access the value stored under a key by placing the key inside square brackets `[]` after the dictionary name. For example, to access the value for `'dataset_name'` in `hot_data`, we would do the following:

```python
print(hot_data['dataset_name'])
```

```text
Hawaii Ocean Time-series data
```
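
Square-bracket access raises a `KeyError` when the key does not exist. The following is a minimal sketch of two common ways to guard against that. It uses `sample`, a shortened stand-in for `hot_data`, so the block runs on its own, and `'dataset_license'` is a hypothetical key chosen because it is absent.

```python
# Shortened stand-in for hot_data so this block is self-contained.
sample = {
    'dataset_name': 'Hawaii Ocean Time-series data',
    'dataset_years': (1988, 2019),
}

# Membership test: `in` checks the keys, not the values.
has_license = 'dataset_license' in sample
print(has_license)

# .get() returns a default instead of raising KeyError for a missing key.
license_info = sample.get('dataset_license', 'not specified')
print(license_info)

# Values keep their type, so a tuple value can be indexed further.
first_year = sample['dataset_years'][0]
print(first_year)
```

If no default is passed, `.get()` returns `None` for a missing key.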


## Updating a Dictionary

You can add a new key-value pair to a dictionary, or update an existing one, by assigning a value to a key: if the key is not yet present it is added, otherwise its value is replaced. Here's an example of adding a new key-value pair to `hot_data`:

```python
hot_data['dataset_processor'] = 'Fernando C. Pacheco'
print(hot_data)
```

```text
{'dataset_name': 'Hawaii Ocean Time-series data', 'dataset_description': 'HOT dataset', 'dataset_source': 'BCO-DMO', 'dataset_variables': ['temperature', 'salinity', 'pressure'], 'dataset_years': (1988, 2019), 'dataset_ctd': 'http://optserv1.whoi.edu/jg/listfnzm//dmoserv3.bco-dmo.org:80/BCO-DMO/HOT/ctd', 'dataset_bottle': 'http://optserv1.whoi.edu/jg/listfnzm//dmoserv3.bco-dmo.org:80/BCO-DMO/HOT/niskin_v2', 'dataset_processor': 'Fernando C. Pacheco'}
```
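
Assignment also overwrites a key that already exists, and the built-in `update()` and `pop()` methods cover changing several keys at once and removing a key. The sketch below again uses a shortened stand-in named `sample`; the `'dataset_format'` key and its value are made up for illustration.

```python
# Shortened stand-in for hot_data so this block is self-contained.
sample = {
    'dataset_description': 'HOT dataset',
    'dataset_source': 'BCO-DMO',
}

# Assigning to an existing key replaces its value.
sample['dataset_description'] = 'Hawaii Ocean Time-series (HOT) dataset'

# update() sets several keys at once; 'dataset_format' is a hypothetical key.
sample.update({'dataset_source': 'BCO-DMO', 'dataset_format': 'tab-delimited'})

# pop() removes a key and returns the value it held.
removed = sample.pop('dataset_format')
print(removed)
print(sample)
```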


## Iterating over a Dictionary

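You can iterate over a dictionary using a `for` loop. The `.items()` method yields each key together with its value, so both can be unpacked in the loop header. Here's an example of iterating over the `hot_data` dictionary we created earlier: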



```python
for key, value in hot_data.items():
    print(key, ':', value)
```

```text
dataset_name : Hawaii Ocean Time-series data
dataset_description : HOT dataset
dataset_source : BCO-DMO
dataset_variables : ['temperature', 'salinity', 'pressure']
dataset_years : (1988, 2019)
dataset_ctd : http://optserv1.whoi.edu/jg/listfnzm//dmoserv3.bco-dmo.org:80/BCO-DMO/HOT/ctd
dataset_bottle : http://optserv1.whoi.edu/jg/listfnzm//dmoserv3.bco-dmo.org:80/BCO-DMO/HOT/niskin_v2
dataset_processor : Fernando C. Pacheco
```
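
Besides `.items()`, a dictionary can be looped over by keys only or by values only. The sketch below shows both, plus a dictionary comprehension that filters an existing dictionary. `sample` is a shortened stand-in for `hot_data`, and `text_only` is a name introduced just for this example.

```python
# Shortened stand-in for hot_data so this block is self-contained.
sample = {
    'dataset_name': 'Hawaii Ocean Time-series data',
    'dataset_source': 'BCO-DMO',
    'dataset_years': (1988, 2019),
}

# Looping over the dictionary itself yields its keys, in insertion order.
for key in sample:
    print(key)

# .values() yields only the values.
for value in sample.values():
    print(value)

# A comprehension builds a new dictionary; here it keeps only string values.
text_only = {k: v for k, v in sample.items() if isinstance(v, str)}
print(text_only)
```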


# Pandas

Pandas is a Python library for data manipulation and analysis. Its main data structure, the `DataFrame`, organizes data as a table of labeled rows and columns.
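
Dictionaries and DataFrames connect naturally: passing a dictionary of equal-length lists to `pd.DataFrame` turns each key into a column. The following is a minimal sketch; the numbers in `profile` are invented for illustration and are not HOT measurements, and the name `df_profile` is hypothetical.

```python
import pandas as pd

# Made-up numbers for illustration only; they are not HOT measurements.
profile = {
    'pressure': [10, 50, 100],
    'temperature': [25.1, 24.8, 22.3],
    'salinity': [35.0, 35.1, 35.2],
}

# Each key becomes a column name and each list becomes that column's values.
df_profile = pd.DataFrame(profile)
print(df_profile)
print(df_profile.shape)  # (number of rows, number of columns)
```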

## Importing Pandas and Loading Data

Before we start using Pandas, we need to install it and import it into our Python environment. Install it with either

`conda install pandas`

or

`pip install pandas`

Next, we will load our data into a Pandas DataFrame. We will use the CTD (conductivity, temperature, depth) dataset from HOT. This dataset contains measurements of water temperature, salinity, and pressure at various depths in the ocean.

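```python
import pandas as pd

# Reuse the CTD URL stored in the dictionary created earlier.
url = hot_data['dataset_ctd']

# The file is tab-delimited; header=1 takes the column names from the second line.
df = pd.read_csv(url, delimiter='\t', header=1)

# Show the first five rows.
df.head()
```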
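
To see what `delimiter='\t'` and `header=1` do without downloading anything, the same call can be run on a small in-memory string. This is a sketch under stated assumptions: the text below is made up, including its column names and the leading comment line, and it is not claimed to match the layout of the real HOT file.

```python
import io

import pandas as pd

# Made-up tab-delimited text standing in for a downloaded file; not real HOT data.
text = (
    "# hypothetical comment line\n"
    "press\ttemp\tsal\n"
    "10\t25.1\t35.0\n"
    "50\t24.8\t35.1\n"
)

# header=1 takes the column names from the second line (index 1);
# the line before it is skipped.
df_sample = pd.read_csv(io.StringIO(text), delimiter='\t', header=1)
print(df_sample.columns.tolist())
print(len(df_sample))
```

Wrapping the string in `io.StringIO` gives `read_csv` a file-like object, which it accepts in place of a path or URL.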