# Dictionaries and Pandas

## References

- [DataCamp - Intermediate Python: Dictionaries & Pandas](https://campus.datacamp.com/courses/intermediate-python/dictionaries-pandas?ex=1)


## Overview

A dictionary is a collection of key-value pairs in which each key is unique. It is written with curly braces `{}`; each key is separated from its value by a colon `:`, and the pairs are separated by commas. Since Python 3.7, dictionaries preserve insertion order, but items are looked up by key rather than by position. Dictionaries are useful when we need to store data and retrieve it quickly by key.


## Creating a Dictionary

To create a dictionary in Python, we put comma-separated key-value pairs inside curly braces `{}`, with a colon between each key and its value. Here's an example:


In [None]:
hot_data = {
    # keys        :   values 
    'dataset_name': 'Hawaii Ocean Time-series data',
    'dataset_description': 'HOT dataset',
    'dataset_source': 'BCO-DMO',
    'dataset_variables': ['temperature', 'salinity', 'pressure'], # not including everything
    'dataset_years': (1988, 2019),
    'dataset_ctd':'https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_3937.csv',
    'dataset_bottle':'https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_3773.csv'
}

In this example, we have created a dictionary called `hot_data` with several key-value pairs. The keys are strings (e.g. `'dataset_name'`), and the values can be of any data type (here strings, a list, and a tuple).
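
As a small illustration of key uniqueness and mixed value types, the sketch below builds a hypothetical `sample` dictionary (its keys and values are made up for this example and are not part of the HOT data). It repeats one key on purpose to show that only the last value is kept.

```python
# Hypothetical dictionary used only for illustration.
sample = {
    'name': 'first value',
    'depths_m': [5, 25, 45],      # a list as a value
    'years': (1988, 2019),        # a tuple as a value
    'name': 'second value',       # repeated key: replaces the earlier entry
}

print(len(sample))       # counts unique keys only
print(sample['name'])    # the value assigned last
```

Python accepts the repeated key without an error, so duplicates in a long dictionary literal can go unnoticed.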


## Accessing Dictionary Values

You can access the value of a specific key in a dictionary by using the key inside square brackets `[]`. For example, to access the value for 'dataset_name' in `hot_data`, we would do the following:

In [None]:
print(hot_data['dataset_name'])
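
Square brackets raise a `KeyError` when the key does not exist. The sketch below, using a small stand-in dictionary and a hypothetical missing key `'dataset_license'`, shows that behaviour next to `dict.get()`, which returns a default value instead.

```python
# Small stand-in for hot_data so the example runs on its own.
metadata = {'dataset_source': 'BCO-DMO', 'dataset_years': (1988, 2019)}

# Square brackets raise KeyError when the key is missing.
try:
    metadata['dataset_license']          # hypothetical key, not in the dictionary
except KeyError as err:
    missing_key = err.args[0]
    print('Missing key:', missing_key)

# .get() returns the supplied default instead of raising.
license_name = metadata.get('dataset_license', 'unknown')
print(license_name)

# A tuple value can be unpacked after it is retrieved.
first_year, last_year = metadata['dataset_years']
print(first_year, last_year)
```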

## Updating a Dictionary

You can add new key-value pairs to a dictionary or update existing ones by assigning a value to a specific key. Here's an example of adding a new key-value pair to `hot_data`:

In [None]:
hot_data['dataset_processor'] = 'Fernando C. Pacheco'

In [None]:
print(hot_data)
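
Assigning to a key that already exists overwrites its value. The following sketch, on a small stand-in dictionary with a hypothetical `'dataset_format'` key, also shows `update()` for adding several pairs at once and `pop()` and `del` for removing them.

```python
# Small stand-in dictionary so the example runs on its own.
metadata = {'dataset_description': 'HOT dataset'}

# Assigning to an existing key replaces the old value.
metadata['dataset_description'] = 'Hawaii Ocean Time-series CTD data'

# update() adds or overwrites several pairs in one call.
metadata.update({'dataset_source': 'BCO-DMO', 'dataset_format': 'csv'})

# pop() removes a key and returns its value; del removes it without returning.
removed = metadata.pop('dataset_format')
del metadata['dataset_source']

print(removed)
print(metadata)
```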

## Iterating over a Dictionary

You can iterate over a dictionary using a for loop. Here's an example of iterating over the `hot_data` dictionary we created earlier:



In [None]:
for key, value in hot_data.items():
    print(key, ':', value)
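
Besides `.items()`, a dictionary can be iterated by keys or by values alone. A minimal sketch on a stand-in dictionary:

```python
# Small stand-in dictionary so the example runs on its own.
metadata = {'dataset_source': 'BCO-DMO', 'dataset_years': (1988, 2019)}

# Looping over the dictionary itself yields its keys.
for key in metadata:
    print(key)

# .keys() and .values() give views that can be turned into lists.
keys = list(metadata.keys())
values = list(metadata.values())

# The `in` operator tests for a key, not for a value.
has_source = 'dataset_source' in metadata
has_value = 'BCO-DMO' in metadata
print(has_source, has_value)
```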

# Pandas

Pandas is a powerful Python library used for data manipulation and analysis. It provides a data structure called DataFrame, which allows you to organize and manipulate data in a tabular form.
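
Dictionaries and DataFrames connect directly: a dictionary whose values are equal-length lists can be passed to `pd.DataFrame`, with each key becoming a column name. The sketch below uses made-up numbers; only the column names `CTDPRS` and `CTDTMP` are taken from this notebook.

```python
import pandas as pd

# Made-up values for illustration; each key becomes a column.
casts = {
    'CTDPRS': [2.0, 4.0, 6.0],      # pressure
    'CTDTMP': [25.1, 25.0, 24.8],   # temperature
}

casts_df = pd.DataFrame(casts)
print(casts_df)
print(casts_df.shape)   # (rows, columns)
```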

## Importing Pandas and Loading Data

Before we start using Pandas, we need to install it and import it into our Python environment. Install it with conda:

`conda install pandas`

or with pip:

`pip install pandas`

Next, we will load our data into a Pandas DataFrame. We will use the CTD (conductivity, temperature, depth) dataset from HOT. This dataset contains measurements of water temperature, salinity, and pressure at various depths in the ocean.

In [None]:
import pandas as pd

# To read directly from the online source instead, use: url = hot_data['dataset_ctd']
url = "dataset_ctd.csv"

In [None]:
df = pd.read_csv(
    url,
    skiprows=[1],            # skip the second line of the file, which holds the units
    parse_dates=['time'],    # parse the 'time' column as datetime instead of object
)
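
To see what `skiprows=[1]` and `parse_dates` do without the full file, the sketch below reads a tiny in-memory CSV. The layout (a header line followed by a units line) mimics the file read above, but the rows and unit strings are made up for this example.

```python
import io

import pandas as pd

# Made-up CSV text: line 0 is the header, line 1 holds units, then the data.
csv_text = (
    "time,CTDPRS,CTDTMP\n"
    "UTC,dbar,degree_C\n"
    "1988-10-31 00:00:00,2.0,25.1\n"
    "1988-10-31 00:00:00,4.0,25.0\n"
)

# Without skiprows, the units line is read as data and every column becomes text.
raw_df = pd.read_csv(io.StringIO(csv_text))

# Skipping line 1 leaves numeric columns; parse_dates converts 'time' to datetime.
sample_df = pd.read_csv(io.StringIO(csv_text), skiprows=[1], parse_dates=['time'])

print(raw_df.dtypes)
print(sample_df.dtypes)
```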

In [None]:
df

In [None]:
df.dtypes

In [None]:
df = df.set_index('time')

In [None]:
df

In [None]:
# Print out the cruise_name column as a Pandas Series
print(df['cruise_name'])

In [None]:
# Print out the cruise_name column as a Pandas DataFrame
print(df[['cruise_name']])

In [None]:
# Print out a DataFrame with the pressure and temperature columns
print(df[['CTDPRS','CTDTMP']])
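
The difference between single and double square brackets is easiest to see by checking the type of the result. The sketch below uses a small made-up DataFrame (the column names come from this notebook, the values do not).

```python
import pandas as pd

# Made-up data for illustration.
demo = pd.DataFrame({
    'cruise_name': ['cruise_a', 'cruise_b'],
    'CTDPRS': [2.0, 4.0],
    'CTDTMP': [25.1, 25.0],
})

one_col_series = demo['cruise_name']       # single brackets: a Series
one_col_frame = demo[['cruise_name']]      # double brackets: a one-column DataFrame
two_cols = demo[['CTDPRS', 'CTDTMP']]      # a list of names selects several columns

print(type(one_col_series).__name__)
print(type(one_col_frame).__name__)
print(two_cols.shape)
```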

## loc and iloc

In pandas, `loc` and `iloc` are indexers for selecting data. They are attributes used with square brackets rather than functions called with parentheses. They let you access and modify specific rows and columns of a DataFrame or Series. Here's an explanation of both, with some examples:


### loc

- `loc` is used to access rows and columns by label.
- The syntax is `df.loc[row_indexer, column_indexer]`, where `row_indexer` and `column_indexer` can each be a single label, a list of labels, or a slice of labels.
- Row labels come from the DataFrame's index, and column labels from its column names. Unlike ordinary Python slices, label slices include both endpoints.
- It returns a DataFrame, a Series, or a single value, depending on how many rows and columns are selected.

In [None]:
print(df.loc[:,'CTDPRS'])
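
The three kinds of `loc` indexer can be sketched on a small made-up DataFrame. The row labels `cast_1` to `cast_3` are hypothetical and stand in for whatever index the real data has.

```python
import pandas as pd

# Made-up data with string row labels, for illustration only.
demo = pd.DataFrame(
    {'CTDPRS': [2.0, 4.0, 6.0], 'CTDTMP': [25.1, 25.0, 24.8]},
    index=['cast_1', 'cast_2', 'cast_3'],
)

# A single row label and a single column label give one value.
single_value = demo.loc['cast_2', 'CTDTMP']

# Lists of labels give a DataFrame with just those rows and columns.
subset = demo.loc[['cast_1', 'cast_3'], ['CTDPRS']]

# Label slices include both endpoints.
label_slice = demo.loc['cast_1':'cast_2', 'CTDPRS':'CTDTMP']

print(single_value)
print(subset)
print(label_slice)
```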

### iloc

- `iloc` is used to access rows and columns by integer position.
- The syntax is `df.iloc[row_indexer, column_indexer]`, where `row_indexer` and `column_indexer` can each be a single integer, a list of integers, or a slice.
- The integers are the zero-based positions of the rows or columns. Slices exclude the end position, as in ordinary Python slicing.
- It returns a DataFrame, a Series, or a single value, depending on how many rows and columns are selected.

In [None]:
print(df.iloc[:,23])
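
A hard-coded position such as `23` is easy to get wrong if the column order changes. The sketch below, on a small made-up DataFrame, shows positional selection and one way to look up a column's position by name with `columns.get_loc`.

```python
import pandas as pd

# Made-up data for illustration only.
demo = pd.DataFrame(
    {'CTDPRS': [2.0, 4.0, 6.0], 'CTDTMP': [25.1, 25.0, 24.8]},
    index=['cast_1', 'cast_2', 'cast_3'],
)

# Row position 1, column position 1 (positions start at 0).
single_value = demo.iloc[1, 1]

# Negative positions count from the end, so -1 is the last column.
last_col = demo.iloc[:, -1]

# Positional slices exclude the end: rows 0 and 1, column 0 only.
pos_slice = demo.iloc[0:2, 0:1]

# Look up a column's position by name instead of hard-coding it.
col_pos = demo.columns.get_loc('CTDTMP')
same_col = demo.iloc[:, col_pos]

print(single_value)
print(pos_slice)
print(col_pos)
```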

In [None]:
df['CTDPRS'] = -df['CTDPRS']     # flip the sign so pressure values become negative