# Dictionaries

Dictionaries are an essential data structure for storing data in Python and a convenient format for exchanging data with the outside world:

+ configuration files
+ structured data files
+ JSON from web services

## Dictionaries as flexible data storage

Let's first look at the ways structured data usually reaches us from the outside world.

### CSV document

```{}
country,state,locality,collectors,scientific_name
República Dominicana,Santiago,"Loma La Pelona, Coordillera Central","Juan Pérez, Pancho Luis Díaz Ramírez",Pinus occidentalis
```
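
To connect the CSV form to dictionaries, here is a minimal sketch using the standard library's `csv.DictReader`, which turns each data row into a dictionary keyed by the header row. The `csv_text` string is only a stand-in for a real file, with the header written without stray spaces.

```python
import csv
import io

# The same header and row as the CSV example, held in a string for illustration.
csv_text = (
    "country,state,locality,collectors,scientific_name\n"
    'República Dominicana,Santiago,"Loma La Pelona, Coordillera Central",'
    '"Juan Pérez, Pancho Luis Díaz Ramírez",Pinus occidentalis\n'
)

# DictReader uses the header row as keys and yields one dictionary per data row.
rows = list(csv.DictReader(io.StringIO(csv_text)))
row = rows[0]

print(row["country"])
print(row["collectors"])
```

Every value comes back as a plain string, so the two collectors are still a single piece of text that would have to be split by hand.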

### Basic JSON

```{json}
{
    "country": "República Dominicana",
    "state": "Santiago",
    "locality": "Loma La Pelona, Coordillera Central",
    "collectors": "Juan Pérez, Pancho Luis Díaz Ramírez",
    "scientific_name": "Pinus occidentalis"
}
```

### Better JSON

```{json}
{
    "country": "República Dominicana",
    "state": "Santiago",
    "locality": "Loma La Pelona, Coordillera Central",
    "collectors": ["Juan Pérez", "Pancho Luis Díaz Ramírez"],
    "scientific_name": "Pinus occidentalis"
}
```

### Maybe best JSON

```{json}
{
    "country": "República Dominicana",
    "state": "Santiago",
    "locality": "Loma La Pelona, Coordillera Central",
    "collectors": [
        {
            "first_name": "Juan",
            "last_name": "Pérez"
        },
        {
            "first_name": "Pancho",
            "last_name": "Díaz Ramírez",
            "middle_name": "Luis"
        }
    ],
    "taxonomy": {
        "genus": "Pinus",
        "specific_epithet": "occidentalis"
    }
}
```
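
A nested JSON document like this maps directly onto Python dictionaries and lists. The sketch below parses a shortened copy of the record with the standard `json` module; the variable names `json_text` and `parsed` are just for illustration.

```python
import json

# A shortened copy of the nested record, as JSON text.
json_text = """
{
    "country": "República Dominicana",
    "collectors": [
        {"first_name": "Juan", "last_name": "Pérez"},
        {"first_name": "Pancho", "last_name": "Díaz Ramírez", "middle_name": "Luis"}
    ],
    "taxonomy": {"genus": "Pinus", "specific_epithet": "occidentalis"}
}
"""

# json.loads turns JSON objects into dictionaries and JSON arrays into lists.
parsed = json.loads(json_text)

# Nested values are reached by chaining lookups.
genus = parsed["taxonomy"]["genus"]
last_names = [collector["last_name"] for collector in parsed["collectors"]]

print(genus)
print(last_names)
```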

In [None]:
record = {
    "country": "República Dominicana",
    "state": "Santiago",
    "locality": "Loma La Pelona, Coordillera Central",
    "collectors": [
        {
            "first_name": "Juan",
            "last_name": "Pérez"
        },
        {
            "first_name": "Pancho",
            "last_name": "Díaz Ramírez",
            "middle_name": 'Luis'
        }
    ],
    "taxonomy": {
        "genus":"Pinus",
        "specific_epithet": "occidentalis"
    }
}

### Live examples: Working with dictionaries

#### Create dictionary from scratch
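
As a starting point for this part, here is a small sketch of a few common ways to build a dictionary. The variable names are made up for illustration; the values reuse the specimen record from earlier.

```python
# Start with an empty dictionary and add key-value pairs one at a time.
specimen = {}
specimen["country"] = "República Dominicana"
specimen["state"] = "Santiago"

# A literal, keyword arguments, and a list of pairs all build the same dictionary.
from_literal = {"genus": "Pinus", "specific_epithet": "occidentalis"}
from_keywords = dict(genus="Pinus", specific_epithet="occidentalis")
from_pairs = dict([("genus", "Pinus"), ("specific_epithet", "occidentalis")])

print(specimen)
print(from_literal == from_keywords == from_pairs)
```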

#### Key-value pair query syntax
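
A short sketch of the basic syntax for reading, testing, adding, and removing keys. The `taxonomy` dictionary and the added `family` value are illustrative only.

```python
taxonomy = {"genus": "Pinus", "specific_epithet": "occidentalis"}

# Square brackets look up the value stored under a key.
genus = taxonomy["genus"]

# `in` tests whether a key is present without raising an error.
has_family = "family" in taxonomy

# Assigning to a new key adds it; assigning to an existing key overwrites it.
taxonomy["family"] = "Pinaceae"

# del removes a key and its value.
del taxonomy["specific_epithet"]

# Looking up a missing key with square brackets raises KeyError.
try:
    taxonomy["specific_epithet"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(genus, has_family, lookup_failed, taxonomy)
```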

#### Dictionary methods
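
The methods used most often in the rest of this notebook can be tried on a small example first. This is a sketch with a made-up `collector` dictionary, not an exhaustive list of methods.

```python
collector = {"first_name": "Pancho", "last_name": "Díaz Ramírez"}

# keys(), values() and items() give views over the contents.
keys = list(collector.keys())
values = list(collector.values())
pairs = list(collector.items())

# get() returns a default instead of raising KeyError for a missing key.
middle = collector.get("middle_name", "")

# update() merges another dictionary into this one.
collector.update({"middle_name": "Luis", "role": "collector"})

# pop() removes a key and returns its value.
role = collector.pop("role")

# setdefault() only sets the value when the key is missing.
collector.setdefault("first_name", "Unknown")

print(keys, values, pairs)
print(middle, role, collector)
```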

## API example

In [None]:
import requests
import json
import html
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
%matplotlib inline

## iDigBio API

In [None]:
def search_idigbio(params):
    idigbio_base_url = "https://search.idigbio.org/v2/search/records"
    payload = {
        "rq": json.dumps(params)
    }
    response = requests.get(idigbio_base_url, params=payload)
    return response
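
To see what `search_idigbio` sends, the sketch below builds the query string by hand with the standard library, without touching the network. It approximates what `requests` does with `params`; the exact escaping it produces may differ in small details.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

params = {"genus": "Asclepias", "country": "United States"}

# The search dictionary is serialized to a JSON string and sent as the "rq" parameter.
payload = {"rq": json.dumps(params)}

# urlencode escapes the JSON text so it is safe to place in a URL.
url = "https://search.idigbio.org/v2/search/records?" + urlencode(payload)
print(url)

# Decoding the query string and parsing the JSON recovers the original dictionary.
decoded = json.loads(parse_qs(urlsplit(url).query)["rq"][0])
print(decoded)
```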

In [None]:
response = search_idigbio({"genus": "Asclepias", "country":"United States"})

In [None]:
len(response.json()['items'])

In [None]:
response.url

In [None]:
records = response.json()

In [None]:
records['items'][0].keys()

In [None]:
records['items'][0]['data'].keys()

### Looping over key-value pairs

In [None]:
for key, value in records['items'][0]['data'].items():
    print(key)
    print(value)
    print("+=================================+")

In [None]:
for key, value in records['items'][0]['data'].items():
    print(key, type(value))

Retrieve only numeric fields

In [None]:
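def is_float(value):
    """Return True if value can be converted to a float, False otherwise."""
    try:
        float(value)
        return True
    except (ValueError, TypeError):
        # ValueError: non-numeric string; TypeError: value such as a list or None
        return False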

In [None]:
numeric_fields = []
for key, value in records['items'][0]['data'].items():
    if is_float(value):
        numeric_fields.append(key)

In [None]:
numeric_fields

In [None]:
# First attempt: direct indexing raises a KeyError if a record lacks one of the numeric fields
numeric_records = []
for record in records['items']:
    new_record = {
        "uuid": record['uuid']
    }
    for field in numeric_fields:
        new_record[field] = record['data'][field]
    numeric_records.append(new_record)

In [None]:
# Safer version: .get() returns None when a record lacks a field
numeric_records = []
for record in records['items']:
    new_record = {
        "uuid": record['uuid']
    }
    for field in numeric_fields:
        new_record[field] = record['data'].get(field, None)
    numeric_records.append(new_record)
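
The inner loop over fields can also be written as a dictionary comprehension. This is a sketch on made-up data: `sample_item` and the two field names stand in for one entry of `records['items']` and for `numeric_fields`.

```python
# Hypothetical stand-ins for numeric_fields and one item from the API response.
numeric_fields = ["dwc:year", "dwc:decimalLatitude"]
sample_item = {
    "uuid": "example-uuid",
    "data": {"dwc:year": "1987", "dwc:country": "United States"},
}

# One key-value pair per field; .get() fills in None when the field is missing.
new_record = {field: sample_item["data"].get(field) for field in numeric_fields}
new_record["uuid"] = sample_item["uuid"]

print(new_record)
```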

In [None]:
numeric_records[0:2]

In [None]:
years = [int(record['dwc:year']) for record in numeric_records if record.get('dwc:year', 0)]

In [None]:
f, ax = plt.subplots(figsize=(10, 8))
# histplot and rugplot together replace the deprecated distplot
sns.histplot(x=years, bins=20, ax=ax)
sns.rugplot(x=years, ax=ax)
plt.show()

In [None]:
latitudes = [float(record['dwc:decimalLatitude']) for record in numeric_records if record.get('dwc:decimalLatitude', 0)]

In [None]:
f, ax = plt.subplots(figsize=(10, 8))
# histplot and rugplot together replace the deprecated distplot
sns.histplot(x=latitudes, bins=20, ax=ax)
sns.rugplot(x=latitudes, ax=ax)
plt.show()

Species summary

In [None]:
species_summary = dict()

for record in records['items']:

    taxon = record['data'].get('dwc:scientificName', 'Unknown')
    state = record['data'].get('dwc:stateProvince', 'Unknown')

    if taxon in species_summary:
        # Taxon already seen: record the state and increment the count
        species_summary[taxon]['states'].add(state)
        species_summary[taxon]['count'] += 1
    else:
        # First record for this taxon: start its set of states and its count
        species_summary[taxon] = {
            'states': {state},
            'count': 1
        }
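
The same kind of summary can be built with `collections.Counter` and `collections.defaultdict`, which remove the need for the if/else branch. This is an alternative sketch, not the approach used above; `sample_items` is made-up data shaped like `records['items']`.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for records['items'].
sample_items = [
    {"data": {"dwc:scientificName": "Asclepias syriaca", "dwc:stateProvince": "Ohio"}},
    {"data": {"dwc:scientificName": "Asclepias syriaca", "dwc:stateProvince": "Iowa"}},
    {"data": {"dwc:scientificName": "Asclepias tuberosa"}},
]

counts = Counter()          # missing keys start at 0
states = defaultdict(set)   # missing keys start as an empty set

for item in sample_items:
    data = item["data"]
    taxon = data.get("dwc:scientificName", "Unknown")
    counts[taxon] += 1
    states[taxon].add(data.get("dwc:stateProvince", "Unknown"))

print(counts)
print(dict(states))
```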

In [None]:
species_summary

In [None]:
species_summary.keys()

In [None]:
len(species_summary)

### Sequence reverse complement

In [None]:
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}


In [None]:
seq = "TCGGGCCCAAATCTCCGGAG"


In [None]:
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))

In [None]:
reverse_complement
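
For reuse, the dictionary lookup can be wrapped in a function. The sketch below also shows an alternative based on `str.translate`; the function names `revcomp` and `revcomp_translate` are made up for this example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement via dictionary lookup; unknown bases pass through unchanged."""
    return "".join(COMPLEMENT.get(base, base) for base in reversed(seq))


# str.maketrans builds a character-to-character translation table from two strings.
TABLE = str.maketrans("ACGT", "TGCA")


def revcomp_translate(seq):
    """Reverse complement via str.translate, then reversal with a slice."""
    return seq.translate(TABLE)[::-1]


seq = "TCGGGCCCAAATCTCCGGAG"
print(revcomp(seq))
print(revcomp_translate(seq))
```

Applying either function twice returns the original sequence, which makes a handy sanity check.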