[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/scott2b/PythonReview/blob/main/notebooks/Python.07.Dictionaries.ipynb)

# Python Dictionaries

## Dictionaries

A simple dictionary that maps digits to their words:

In [None]:
digits = {
    1: 'one',
    2: 'two',
    3: 'three',
    4: 'four',
    5: 'five',
}
digits

{1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five'}

Another way to construct a dictionary is from a list of key-value tuples:

In [None]:
dict([ (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'five') ])

{1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five'}
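
Because `dict` accepts any iterable of key-value pairs, it combines naturally with `zip` when the keys and values start out in separate sequences. A minimal sketch; the names `keys`, `names`, and `paired` are illustrative only:

```python
keys = [1, 2, 3]
names = ['one', 'two', 'three']

# zip pairs the sequences element by element; dict consumes the pairs
paired = dict(zip(keys, names))
print(paired)
```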

To de-reference an item, use the `[]` syntax:

In [None]:
digits[3]

'three'

Or use the `get` method:

In [None]:
digits.get(5)

'five'
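
The two lookups behave differently when the key is missing: `[]` raises a `KeyError`, while `get` returns `None`. A small sketch, redefining a shortened `digits` dictionary so the block stands alone:

```python
digits = {1: 'one', 2: 'two', 3: 'three'}

# [] raises KeyError when the key is absent
try:
    digits[10]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# get returns None instead of raising
missing = digits.get(10)

print(lookup_failed, missing)
```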

## Assignment / updating

You can also use the `[key]` syntax to add new values to the dictionary:

In [None]:
digits[6] = 'six'
digits

{1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five', 6: 'six'}

You can also merge another dictionary into this one with the `update` method:

In [None]:
digits.update({ 7: 'seven', 8: 'eight', 9: 'nine' })
digits

{1: 'one',
 2: 'two',
 3: 'three',
 4: 'four',
 5: 'five',
 6: 'six',
 7: 'seven',
 8: 'eight',
 9: 'nine'}
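
Two details of `update` are worth seeing in isolation: it changes the dictionary in place, and when a key already exists the incoming value replaces the old one. If you would rather leave the original untouched, unpacking both dictionaries into a new literal is one option. A sketch with made-up dictionaries `base` and `extra`:

```python
base = {1: 'one', 2: 'two'}
extra = {2: 'TWO', 3: 'three'}

# Build a new dictionary; base is not modified. Later entries win on conflicts.
merged = {**base, **extra}
base_before = dict(base)  # shallow copy, to show base was unchanged so far

# update mutates base in place, overwriting the value for key 2
base.update(extra)
print(merged, base)
```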

The `get` method also allows you to pass in a default value:

In [None]:
digits.get(10, 'unknown digit')

'unknown digit'

We could use this to implement a simple style guide with the rule to print the name of the number for 1-9, and print the numeric form for anything over 9:

In [None]:
def get_numberstring(digit):
    return digits.get(digit, str(digit))

print(get_numberstring(9))
print(get_numberstring(10))

nine
10
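
Another common use of the default argument is counting: `get(key, 0)` supplies a starting value the first time a key is seen. A minimal sketch with an invented word list:

```python
words = ['one', 'two', 'one', 'three', 'one']

counts = {}
for word in words:
    # 0 is used the first time a word appears, the running total afterwards
    counts[word] = counts.get(word, 0) + 1

print(counts)
```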


## Iterating

The keys of the dictionary are given by the `keys` method:

In [None]:
digits.keys()

dict_keys([1, 2, 3, 4, 5, 6, 7, 8, 9])

We can then iterate over the keys:

In [None]:
for k in digits.keys():
    print(k, digits[k])

1 one
2 two
3 three
4 four
5 five
6 six
7 seven
8 eight
9 nine


The key-value pairs can be obtained with the `items` method. Note the parity between what you get from calling `items` and the format used above for creating a dictionary from key-value pairs:

In [None]:
digits.items()

dict_items([(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'five'), (6, 'six'), (7, 'seven'), (8, 'eight'), (9, 'nine')])

The most common way to iterate a dictionary is by iterating the items:

In [None]:
for k, v in digits.items():
    print(k, v)

1 one
2 two
3 three
4 four
5 five
6 six
7 seven
8 eight
9 nine
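
For completeness, iterating over a dictionary directly yields its keys (the same as iterating over `keys()`), and the `values` method gives just the values. A short sketch on a shortened `digits` dictionary:

```python
digits = {1: 'one', 2: 'two', 3: 'three'}

# Iterating the dictionary itself yields keys
keys_seen = [k for k in digits]

# values() yields only the values
names = list(digits.values())

print(keys_seen, names)
```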


Note: there is a caveat here with respect to dictionary order. Before Python 3.6, the iteration order of a dictionary was undefined. CPython 3.6 preserves insertion order as an implementation detail, and from Python 3.7 onward insertion order is guaranteed by the language.

To make this concrete, consider this unordered digits dictionary:

In [None]:
unordered_digits = {
    5: 'five',
    2: 'two',
    9: 'nine'
}
unordered_digits.items()

dict_items([(5, 'five'), (2, 'two'), (9, 'nine')])

In [None]:
unordered_digits[3] = 'three'
unordered_digits.items()

dict_items([(5, 'five'), (2, 'two'), (9, 'nine'), (3, 'three')])

In [None]:
unordered_digits.update( { 7: 'seven', 1: 'one' })
unordered_digits

{1: 'one', 2: 'two', 3: 'three', 5: 'five', 7: 'seven', 9: 'nine'}

Note that this last display is sorted by key, even though 7 and 1 were inserted last. The notebook's pretty-printer appears to sort the keys when it displays a bare dictionary; `unordered_digits.items()`, as in the cells above, shows the actual insertion order. For reference, these outputs were produced with the following Python version:

In [None]:
import sys
sys.version_info

sys.version_info(major=3, minor=6, micro=9, releaselevel='final', serial=0)
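
The ordering rules can be checked without relying on how the notebook displays a dictionary. The sketch below assumes Python 3.7 or later (or CPython 3.6): reassigning an existing key keeps its position, and `sorted` over the items can be used to build a key-ordered copy.

```python
unordered = {5: 'five', 2: 'two', 9: 'nine'}

# Reassigning an existing key changes the value but not its position
unordered[2] = 'TWO'
insertion_order = list(unordered)

# Build a new dictionary whose insertion order is the sorted key order
by_key = dict(sorted(unordered.items()))
sorted_order = list(by_key)

print(insertion_order, sorted_order)
```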

Of course, keys don't have to be digits -- this is just an example. We could have created a dictionary that looks like this:

```
{
    'one': 1,
    'two': 2,
    'three': 3
}
```

... and so on.
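
The one requirement on keys is that they be hashable. Strings, numbers, and tuples of hashable items qualify; lists do not. A brief sketch with a hypothetical `grid` dictionary keyed by coordinate tuples:

```python
# Tuples are hashable, so they work as keys
grid = {(0, 0): 'origin', (1, 0): 'east'}

# Lists are mutable and unhashable, so using one as a key raises TypeError
try:
    bad = {[0, 0]: 'origin'}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(grid[(1, 0)], list_key_allowed)
```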


## Exercise: invert the digits dictionary

Using the digits dictionary, create an inverted version of this dictionary that is keyed by the names of the numbers rather than the numeric digits.

In [None]:
numbers = {}
for digit, name in digits.items():
    numbers[name] = digit

numbers

{'eight': 8,
 'five': 5,
 'four': 4,
 'nine': 9,
 'one': 1,
 'seven': 7,
 'six': 6,
 'three': 3,
 'two': 2}
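
The same inversion can be written as a dictionary comprehension. This sketch uses a shortened `digits` dictionary; note that inverting only round-trips cleanly when the values are unique and hashable.

```python
digits = {1: 'one', 2: 'two', 3: 'three'}

# Swap each key-value pair while building the new dictionary
numbers = {name: digit for digit, name in digits.items()}

print(numbers)
```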

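What if we want the entries inserted in numeric order, regardless of the order in which `digits` was built? Iterating over the sorted keys ensures that. Note that the notebook displays a bare dictionary with its keys sorted, so the output below is alphabetical; `list(ordered_numbers.items())` would show the numeric insertion order.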

In [None]:
ordered_numbers = {}
for digit in sorted(digits.keys()):
    ordered_numbers[digits[digit]] = digit

ordered_numbers

{'eight': 8,
 'five': 5,
 'four': 4,
 'nine': 9,
 'one': 1,
 'seven': 7,
 'six': 6,
 'three': 3,
 'two': 2}
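
To confirm the insertion order independently of how a dictionary is displayed, convert the items to a list. A sketch assuming Python 3.7 or later (or CPython 3.6), with a deliberately shuffled `digits` dictionary:

```python
digits = {3: 'three', 1: 'one', 2: 'two'}  # deliberately out of order

# Insert names in ascending order of their digit
ordered_numbers = {digits[d]: d for d in sorted(digits)}

# A list of the items reflects the true insertion order
order = list(ordered_numbers.items())
print(order)
```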

In [1]:
import json
from google.colab import drive
from pathlib import Path

drive.mount('/content/drive')
root = Path('drive/My Drive/')
# Use a context manager so the file is closed after reading
with open(root / 'MyProject/twitter_apiresponse_example.json') as f:
    data = json.load(f)
data = data[0]  # we just want the first tweet

Mounted at /content/drive
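
The `json` module maps JSON objects to dictionaries, arrays to lists, and `null` to `None`, which is why the loaded tweet can be explored with ordinary dictionary methods. The sketch below uses a tiny invented payload as a stand-in for the file above (it is not the real API response) and `json.loads`, which parses a string rather than a file:

```python
import json

# Hypothetical payload for illustration only
payload = '[{"id": 1, "text": "hello", "user": {"screen_name": "example"}, "geo": null}]'

tweets = json.loads(payload)  # JSON array -> list
first = tweets[0]             # JSON object -> dict

print(type(tweets), type(first), first['geo'])
```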


Get the dictionary keys:

In [2]:
data.keys()

dict_keys(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'retweeted_status', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'possibly_sensitive', 'lang'])

Simple dereferencing: use the `[]` syntax:

In [3]:
data['text']

'RT @TwitterDev: 1/ Today we’re sharing our vision for the future of the Twitter API platform!\nhttps://t.co/XweGngmxlP'

Another way to get a value by key is with the `get` method:

In [4]:
data.get('id')

850007368138018817

... which has the advantage of allowing you to provide a default if the key does not exist:

In [5]:
data.get('nonexisting-key', 'some-default-value')

'some-default-value'

The `items` method returns a view of the dictionary's key-value pairs as tuples:

In [6]:
data.items()

dict_items([('created_at', 'Thu Apr 06 15:28:43 +0000 2017'), ('id', 850007368138018817), ('id_str', '850007368138018817'), ('text', 'RT @TwitterDev: 1/ Today we’re sharing our vision for the future of the Twitter API platform!\nhttps://t.co/XweGngmxlP'), ('truncated', False), ('entities', {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'TwitterDev', 'name': 'TwitterDev', 'id': 2244994945, 'id_str': '2244994945', 'indices': [3, 14]}], 'urls': [{'url': 'https://t.co/XweGngmxlP', 'expanded_url': 'https://cards.twitter.com/cards/18ce53wgo4h/3xo1c', 'display_url': 'cards.twitter.com/cards/18ce53wg…', 'indices': [94, 117]}]}), ('source', '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'), ('in_reply_to_status_id', None), ('in_reply_to_status_id_str', None), ('in_reply_to_user_id', None), ('in_reply_to_user_id_str', None), ('in_reply_to_screen_name', None), ('user', {'id': 6253282, 'id_str': '6253282', 'name': 'Twitter API', 'screen_name': 'twitterapi', 'l

The most common way to iterate a dictionary is to iterate the items. Let's look at the value types, which will show us which keys have further nested dictionary structure:

In [7]:
for k, v in data.items():
    print(k, type(v))

created_at <class 'str'>
id <class 'int'>
id_str <class 'str'>
text <class 'str'>
truncated <class 'bool'>
entities <class 'dict'>
source <class 'str'>
in_reply_to_status_id <class 'NoneType'>
in_reply_to_status_id_str <class 'NoneType'>
in_reply_to_user_id <class 'NoneType'>
in_reply_to_user_id_str <class 'NoneType'>
in_reply_to_screen_name <class 'NoneType'>
user <class 'dict'>
geo <class 'NoneType'>
coordinates <class 'NoneType'>
place <class 'NoneType'>
contributors <class 'NoneType'>
retweeted_status <class 'dict'>
is_quote_status <class 'bool'>
retweet_count <class 'int'>
favorite_count <class 'int'>
favorited <class 'bool'>
retweeted <class 'bool'>
possibly_sensitive <class 'bool'>
lang <class 'str'>
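
A type listing like this only covers the top level. To see the keys inside nested dictionaries as well, one option is a small recursive walk. The helper `key_paths` below is a hypothetical name, and `sample` is an invented structure rather than the real tweet:

```python
def key_paths(d, prefix=''):
    """Return dotted paths for every key, descending into nested dicts."""
    paths = []
    for key, value in d.items():
        path = f'{prefix}.{key}' if prefix else str(key)
        paths.append(path)
        if isinstance(value, dict):
            # Recurse, carrying the path so far as the new prefix
            paths.extend(key_paths(value, path))
    return paths

sample = {'id': 1, 'user': {'name': 'x', 'entities': {'url': {}}}}
paths = key_paths(sample)
print(paths)
```

This sketch does not descend into lists; extending it to do so is straightforward if the data calls for it.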


## Working with nested dictionaries

In [8]:
data['user'].keys()

dict_keys(['id', 'id_str', 'name', 'screen_name', 'location', 'description', 'url', 'entities', 'protected', 'followers_count', 'friends_count', 'listed_count', 'created_at', 'favourites_count', 'utc_offset', 'time_zone', 'geo_enabled', 'verified', 'statuses_count', 'lang', 'contributors_enabled', 'is_translator', 'is_translation_enabled', 'profile_background_color', 'profile_background_image_url', 'profile_background_image_url_https', 'profile_background_tile', 'profile_image_url', 'profile_image_url_https', 'profile_banner_url', 'profile_link_color', 'profile_sidebar_border_color', 'profile_sidebar_fill_color', 'profile_text_color', 'profile_use_background_image', 'has_extended_profile', 'default_profile', 'default_profile_image', 'following', 'follow_request_sent', 'notifications', 'translator_type'])

In [9]:
data['user']['screen_name']

'twitterapi'

In [10]:
data['user']['entities']

{'url': {'urls': [{'url': 'http://t.co/78pYTvWfJd',
    'expanded_url': 'https://dev.twitter.com',
    'display_url': 'dev.twitter.com',
    'indices': [0, 22]}]},
 'description': {'urls': []}}

Get the `expanded_url` of the first URL (in this example there is just one) from the user entities:

In [11]:
data['user']['entities']['url']['urls'][0]['expanded_url']

'https://dev.twitter.com'

But what if we didn't know whether there were any entities, or any URLs within them?

One approach is to chain `get` calls, using an empty dictionary as the default at each level:

In [12]:
first_url = None
urls = data['user'].get('entities', {}).get('url', {}).get('urls')
if urls:
    first_url = urls[0]['expanded_url']
first_url

'https://dev.twitter.com'

Another approach is to attempt the full lookup and catch the exceptions raised when a key or list index is missing:

In [13]:
first_url = None
try:
    urls = data['user']['entities']['url']['urls']
    first_url = urls[0]['expanded_url']
except (KeyError, IndexError):
    pass
first_url

'https://dev.twitter.com'
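
If this pattern comes up often, the try/except version can be wrapped in a small helper. The function name `dig` and its signature are hypothetical, not part of the standard library, and `user` is a trimmed stand-in for `data['user']`:

```python
def dig(obj, *path, default=None):
    """Follow keys/indexes in path; return default if any step fails."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            # Missing key, out-of-range index, or a non-subscriptable value
            return default
    return obj

user = {'entities': {'url': {'urls': [{'expanded_url': 'https://dev.twitter.com'}]}}}

found = dig(user, 'entities', 'url', 'urls', 0, 'expanded_url')
absent = dig(user, 'entities', 'description', 'urls', 0, 'expanded_url')
print(found, absent)
```

Catching `TypeError` as well covers the case where an intermediate value is `None`, which the Twitter response uses for several fields.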