## The Python `dict`

`{key1: value1, key2: value2}`

Let's make a list of dictionaries:

In [1]:
quotes = [
    {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'},
    {'author': 'RALPH WALDO EMERSON', 'text': 'Every man is my superior in some way. In that, I learn of him'},
    {'author': 'RALPH WALDO EMERSON', 'text': 'The purpose of life is not to be happy. It is to be useful, to be honorable, to be compassionate, to have it make some difference that you have lived and lived well'},
    {'author': 'Ralph Waldo Emerson', 'text': 'Every man alone is sincere.  At the entrance of a second person, hypocrisy begins'},
    {'author': 'Majjha Nikaya', 'text': 'This is, because that is.  This is not, because that is not.  This is like this, because this is like that'}
]

We can select our first quote dictionary:

In [2]:
fitz = quotes[0]

fitz

{'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'}

We can iterate over the keys:

In [3]:
for k in fitz.keys():
    print(k)

author
text


And the same for the values:

In [4]:
for v in fitz.values():
    print(v)

F. SCOTT FITZGERALD
Action is character


## Exercise

Use `zip` to iterate over the keys & values at the same time, printing each key next to its value:

author F. SCOTT FITZGERALD
text Action is character
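
One possible approach to the exercise is sketched below. It redefines `fitz` so the block runs on its own, and it relies on `keys()` and `values()` walking the dict in the same order, which holds as long as the dict is not modified between the two calls. The name `pairs` is only for illustration.

```python
fitz = {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'}

# zip pairs the i-th key with the i-th value
pairs = list(zip(fitz.keys(), fitz.values()))

for k, v in pairs:
    print(k, v)
```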


We can iterate over both the keys & values at the same time:

In [5]:
for k, v in fitz.items():
    print(k, v)

author F. SCOTT FITZGERALD
text Action is character


Be careful when iterating over dicts. Since Python 3.7 a dict iterates in insertion order, but in earlier versions the order is not guaranteed to be consistent.
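
To make the ordering point concrete, here is a small sketch. It assumes Python 3.7 or later, where a dict iterates in the order its keys were inserted. The name `d` is only for illustration. Sorting the keys gives an order that does not depend on how the dict was built.

```python
d = {}
d['text'] = 'Action is character'      # inserted first
d['author'] = 'F. SCOTT FITZGERALD'    # inserted second

# iteration follows insertion order (Python 3.7+)
insertion_order = list(d)

# sorted() gives an order independent of how the dict was built
sorted_order = sorted(d)

print(insertion_order)
print(sorted_order)
```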

## Dict comprehension

The syntax is similar to that of a list comprehension.

Without a dict comprehension, we might do the following:

In [8]:
processed = {}
for k, v in fitz.items():
    processed[k.lower()] = v.lower()
    
processed 

{'author': 'f. scott fitzgerald', 'text': 'action is character'}

A dict comprehension of the above:

In [9]:
{k.lower(): v.lower() for k, v in fitz.items()}

{'author': 'f. scott fitzgerald', 'text': 'action is character'}
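
The same pattern extends to a whole list of records, and an `if` clause can filter items. A minimal sketch, using a shortened, hypothetical `sample` list in place of the full `quotes` list:

```python
sample = [
    {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'},
    {'author': 'Majjha Nikaya', 'text': 'This is, because that is.'},
]

# a list comprehension wrapping a dict comprehension:
# lower-case the keys and values of every record
lowered = [{k.lower(): v.lower() for k, v in q.items()} for q in sample]

# an `if` clause keeps only the items that pass the test
text_only = {k: v for k, v in sample[0].items() if k == 'text'}

print(lowered)
print(text_only)
```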

## Dict & JSON

A Python dict and its JSON representation look very similar, but `json.dumps` returns a string:

In [10]:
fitz

{'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'}

In [11]:
import json

js = json.dumps(fitz)

js

'{"author": "F. SCOTT FITZGERALD", "text": "Action is character"}'

In [13]:
type(fitz)

dict

In [14]:
type(js)

str
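
Going the other way, `json.loads` parses a JSON string back into a dict. A short sketch of the round trip, where the name `restored` is only for illustration:

```python
import json

fitz = {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'}

js = json.dumps(fitz)       # dict -> str
restored = json.loads(js)   # str -> dict

print(type(restored))
print(restored == fitz)
```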

## Exercise

Write the `quotes` list of dictionaries to a `.json` file, one record per line.
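
One possible approach is sketched below: dump each record with `json.dumps` and write it on its own line. The file name `quotes.json`, the temporary directory and the shortened `quotes` list are assumptions made so the block runs on its own.

```python
import json
import os
import tempfile

# shortened stand-in for the full quotes list
quotes = [
    {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'},
    {'author': 'Majjha Nikaya', 'text': 'This is, because that is.'},
]

# write into a temporary directory to avoid touching the working directory
path = os.path.join(tempfile.mkdtemp(), 'quotes.json')

with open(path, 'w') as f:
    for record in quotes:
        # one JSON object per line
        f.write(json.dumps(record) + '\n')

# read the file back, parsing one line at a time
with open(path) as f:
    loaded = [json.loads(line) for line in f]

print(loaded == quotes)
```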

## Sets

Sets are useful for finding uniques:

In [22]:
set(['Bob', 'Bob', 'Dylan'])

{'Bob', 'Dylan'}

Like dicts, sets are **hashed**
- this makes lookup constant time on average

In [23]:
#  the `in` test takes roughly constant time on average,
#  even for very large sets (building the set is extra work)
'Bob' in set(['Bob', 'Bob', 'Dylan'])

True
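
Note that the cell above builds the set every time it runs, and building a set takes time proportional to the number of items. The fast part is the membership test itself. A sketch that builds the set once and reuses it, with `names` and `name_set` as illustrative names:

```python
names = ['Bob', 'Bob', 'Dylan']

# building the set visits every item once
name_set = set(names)

# a list membership test scans the items one by one
in_list = 'Dylan' in names

# a set membership test is a hash lookup, constant time on average
in_set = 'Dylan' in name_set
missing = 'Joan' in name_set

print(in_list, in_set, missing)
```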

Sets are **unordered**

We can't index them:

In [24]:
zimmerman = set(['Bob', 'Bob', 'Dylan'])

In [25]:
zimmerman[0]

TypeError: 'set' object does not support indexing

But we can iterate over them:

In [26]:
[i for i in zimmerman]

['Dylan', 'Bob']
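
If a position is needed, one option is to convert the set to a sorted list first, which gives a well-defined order to index into. A small sketch, where `ordered` and `first` are illustrative names:

```python
zimmerman = set(['Bob', 'Bob', 'Dylan'])

# sorted() returns a list, which supports indexing
ordered = sorted(zimmerman)
first = ordered[0]

print(ordered)
print(first)
```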

Common set operations include the **union** (a join):

In [27]:
beatles = set(['john', 'paul', 'george', 'ringo'])

new_york = beatles.union(zimmerman)

new_york

{'Bob', 'Dylan', 'george', 'john', 'paul', 'ringo'}

The **intersection** gets items in both:

In [28]:
new_york.intersection(beatles)

{'george', 'john', 'paul', 'ringo'}

The **difference** does what it says on the tin, keeping the items of the first set that are not in the second:

In [29]:
new_york.difference(beatles)

{'Bob', 'Dylan'}

The **symmetric difference** has the elements that are in either set, but not in both:

In [46]:
new_york.symmetric_difference(beatles)

{'Bob', 'Dylan'}

In [47]:
new_york.symmetric_difference(zimmerman)

{'george', 'john', 'paul', 'ringo'}
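
Each of these methods also has an operator form when both operands are sets. A short sketch using the same sets as above, with `common`, `only_zimmerman` and `either_not_both` as illustrative names:

```python
beatles = {'john', 'paul', 'george', 'ringo'}
zimmerman = {'Bob', 'Dylan'}

new_york = beatles | zimmerman          # union
common = new_york & beatles             # intersection
only_zimmerman = new_york - beatles     # difference
either_not_both = beatles ^ zimmerman   # symmetric difference

print(new_york)
print(common)
print(only_zimmerman)
print(either_not_both)
```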

## Exercise

For the quotes dataset:
- find the unique authors
- find the unique words
- find the words that appear in both Emerson's and Fitzgerald's quotes

In [97]:
quotes = [
    {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'},
    {'author': 'RALPH WALDO EMERSON', 'text': 'Every man is my superior in some way. In that, I learn of him'},
    {'author': 'RALPH WALDO EMERSON', 'text': 'The purpose of life is not to be happy. It is to be useful, to be honorable, to be compassionate, to have it make some difference that you have lived and lived well'},
    {'author': 'Ralph Waldo Emerson', 'text': 'Every man alone is sincere.  At the entrance of a second person, hypocrisy begins'},
    {'author': 'Majjha Nikaya', 'text': 'This is, because that is.  This is not, because that is not.  This is like this, because this is like that'}
]

In [30]:
# Unique authors and words

Unique authors are {'majjha nikaya', 'f. scott fitzgerald', 'ralph waldo emerson'}
Unique words are {'that,', 'be', 'i', 'some', 'it', 'a', 'happy.', 'not.', 'learn', 'life', 'honorable,', 'to', 'make', 'purpose', 'is,', 'this,', 'action', 'sincere.', 'hypocrisy', 'him', 'useful,', 'person,', 'is', 'compassionate,', 'every', 'at', 'in', 'superior', 'lived', 'not', 'difference', 'because', 'not,', 'character', 'have', 'this', 'second', 'is.', 'like', 'of', 'my', 'you', 'well', 'begins', 'the', 'alone', 'entrance', 'man', 'way.', 'that', 'and'}
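
A sketch of one way to produce output like the above, using set comprehensions. It assumes that lower-casing and splitting on whitespace is the intended notion of a word (punctuation stays attached, as in the output shown), and it uses a shortened, hypothetical `sample` list instead of the full `quotes` list.

```python
sample = [
    {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'},
    {'author': 'RALPH WALDO EMERSON', 'text': 'Every man is my superior in some way. In that, I learn of him'},
    {'author': 'Ralph Waldo Emerson', 'text': 'Every man alone is sincere.'},
]

# lower-casing merges the two spellings of Emerson's name
authors = {q['author'].lower() for q in sample}

# split() breaks on whitespace only, so 'way.' keeps its full stop
words = {w for q in sample for w in q['text'].lower().split()}

print('Unique authors are', authors)
print('Unique words are', words)
```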


In [32]:
# Fitz and Emerson common words
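
One possible approach is to build a set of words per author and intersect them. The helper `words_by` is hypothetical, as is the shortened `sample` list, and words are again taken to be lower-cased, whitespace-separated tokens.

```python
sample = [
    {'author': 'F. SCOTT FITZGERALD', 'text': 'Action is character'},
    {'author': 'RALPH WALDO EMERSON', 'text': 'Every man is my superior in some way. In that, I learn of him'},
    {'author': 'Ralph Waldo Emerson', 'text': 'Every man alone is sincere.'},
]


def words_by(name, records):
    """Hypothetical helper: words from quotes whose author contains `name`."""
    return {
        w
        for q in records
        if name in q['author'].lower()   # case-insensitive author match
        for w in q['text'].lower().split()
    }


fitz_words = words_by('fitzgerald', sample)
emerson_words = words_by('emerson', sample)

# the intersection keeps the words used by both authors
common = fitz_words & emerson_words

print(common)
```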