In [41]:
import re
import sys

## Modern `dict` Syntax

### `dict` Comprehensions

> Since Python 2.7, the syntax of `listcomps` ... was adapted to `dict` comprehensions ... A `dictcomp` (`dict` comprehension) builds a `dict` instance by taking `key:value` pairs from any `iterable`.

In [1]:
dial_codes = [                                                  # <1>
    (880, 'Bangladesh'),
    (55,  'Brazil'),
    (86,  'China'),
    (91,  'India'),
    (62,  'Indonesia'),
    (81,  'Japan'),
    (234, 'Nigeria'),
    (92,  'Pakistan'),
    (7,   'Russia'),
    (1,   'United States'),
]

In [2]:
# Let's create a dict using comprehensions
country_dial = {country: code for code, country in dial_codes}

In [3]:
country_dial

{'Bangladesh': 880,
 'Brazil': 55,
 'China': 86,
 'India': 91,
 'Indonesia': 62,
 'Japan': 81,
 'Nigeria': 234,
 'Pakistan': 92,
 'Russia': 7,
 'United States': 1}

In [4]:
# Let's reverse the dictionary, turn countries into upper case, sort by code and limit code < 70
{code: country.upper() for country, code in sorted(country_dial.items(), key=lambda x: int(x[1])) if code < 70}

{1: 'UNITED STATES', 7: 'RUSSIA', 55: 'BRAZIL', 62: 'INDONESIA'}

In [5]:
# Sort by country name
{code: country.upper() for country, code in sorted(country_dial.items(), key=lambda x: x[0]) if code < 70}

{55: 'BRAZIL', 62: 'INDONESIA', 7: 'RUSSIA', 1: 'UNITED STATES'}

### Unpacking Mappings

First of all, Python lets us pass keyword arguments, that is, identify arguments by *name* rather than by position. Here's an example from *Lutz, p. 532*. Note that this is different from *default* arguments.
```python
def f(a, b, c): 
    print(a, b, c)

# We may call this function like this
f(a=1, b=2, c=3)
```
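
To make the difference from *default* arguments concrete, here is a minimal sketch. The function `greet` and its parameters are hypothetical names, not taken from Lutz: a default is declared in the *definition*, while a keyword argument is a way of passing a value in the *call*.

```python
# Hypothetical example: `greeting` has a default value, `name` does not.
def greet(name, greeting='Hello'):
    return f'{greeting}, {name}!'

# Positional call; the default for `greeting` is used.
by_position = greet('Ana')

# Keyword call; the default for `greeting` is still used.
by_keyword = greet(name='Ana')

# Keyword arguments may be given in any order, and may override the default.
reordered = greet(greeting='Hi', name='Ana')

print(by_position, by_keyword, reordered)
```

The first two calls return the same string; only the third overrides the default.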

We also have support for functions that take *any number* of *positional* arguments. Let's first consider a single asterisk `*`. Here we use `*` in the function definition, not in the call.
> The first use... collects any number of *positional* arguments into a tuple.

In [12]:
def f(*args):
    return args

In [13]:
args = f(1, 2, 3, 4)

In [14]:
type(args)

tuple

In [15]:
args

(1, 2, 3, 4)

Finally, we also have support for any number of *keyword* arguments. In this case we use a double asterisk `**`.
> The `**` feature is similar, but it only works for keyword arguments—it collects them into a new dictionary, which can then be processed with normal dictionary tools. 

In [16]:
def g(**kwargs): 
    return kwargs

In [17]:
kwargs = g(x=1, y=2)

In [18]:
type(kwargs)

dict

In [19]:
kwargs

{'x': 1, 'y': 2}

Now we can use `**` in *a call* (not in a function *definition*) to unpack a dictionary into *keyword* arguments. The result of passing `**{'x': 1, 'y': 2}` is exactly the same as before.

In [20]:
kwargs = g(**{'x': 1, 'y': 2})

In [21]:
type(kwargs)

dict

In [22]:
kwargs

{'x': 1, 'y': 2}

Now we are ready to understand what's specified in the book:
> First, we can apply `**` to *more than one argument* in a function call. This works when keys are all strings and unique across all arguments (because duplicate keyword arguments are forbidden).

In [23]:
kwargs = g(**{'x': 1}, y=2, **{'z': 3})

In [24]:
kwargs

{'x': 1, 'y': 2, 'z': 3}
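
The two conditions in the quote (string keys, no duplicates) can be checked with a small sketch. The helper `raises_type_error` is a hypothetical name introduced only for this illustration; the exact error messages vary between Python versions, so they are not shown.

```python
def g(**kwargs):
    return kwargs

# Hypothetical helper: report whether calling `call()` raises TypeError.
def raises_type_error(call):
    try:
        call()
    except TypeError:
        return True
    return False

# The same key appears in two unpacked mappings: duplicate keyword argument.
duplicate_key = raises_type_error(lambda: g(**{'x': 1}, **{'x': 2}))

# A non-string key cannot become a keyword argument name.
non_string_key = raises_type_error(lambda: g(**{1: 'a'}))

print(duplicate_key, non_string_key)
```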

### Merging Mappings with `|`

We may merge dictionaries in place with `|=`, or build a new dictionary without mutating either operand with `|`.

In [25]:
d1 = {'a': 1, 'b': 3}

In [26]:
d2 = {'a': 2, 'b': 4, 'c': 6}

In [27]:
d1 | d2

{'a': 2, 'b': 4, 'c': 6}

In [29]:
# No changes in d1
d1

{'a': 1, 'b': 3}

In [30]:
d1 |= d2

In [31]:
d1

{'a': 2, 'b': 4, 'c': 6}
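
As a side note, the `|` and `|=` operators for `dict` require Python 3.9 or later. The sketch below compares `|` with unpacking inside a `dict` literal, which gives the same result for plain dictionaries, and shows that for overlapping keys the value from the right-hand operand wins. The variable names are only illustrative.

```python
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}

# Merge with the | operator (Python 3.9+); d1 and d2 are left untouched.
merged_pipe = d1 | d2

# The same merge using ** unpacking inside a dict literal.
merged_unpack = {**d1, **d2}

# Swapping the operands changes which values win for the shared keys.
merged_reversed = d2 | d1

print(merged_pipe, merged_unpack, merged_reversed)
```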

## Pattern Matching with Mappings

## Standard API of Mapping Types

### Inserting or Updating Mutable Values

To understand this code we need a short intro to `regex`:
> - `re.findall()` Return **all** non-overlapping matches of pattern in string, as a list of strings or tuples.
> - `re.finditer()` Return an iterator yielding Match objects over all non-overlapping matches for the RE pattern in string.
> - `class re.Match` Match object returned by successful matches and searches.
> - `Match.group([group1, ...])` Without arguments, group1 defaults to zero (the whole match is returned).
> - `Match.start([group])` Return the indices of the start ... of the substring matched by group; group defaults to zero (meaning the whole matched substring). 
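
A minimal sketch of the difference between `findall` and `finditer`, using the first line of the Zen of Python as a sample string. The variable names `words` and `positions` are only illustrative.

```python
import re

WORD_RE = re.compile(r'\w+')
line = 'The Zen of Python, by Tim Peters'

# findall returns the matched substrings directly, as a list of strings.
words = WORD_RE.findall(line)

# finditer yields Match objects, which also know where each match starts.
# Match.start() is zero-based.
positions = [(m.group(), m.start()) for m in WORD_RE.finditer(line)]

print(words)
print(positions)
```

Because `Match.start()` is zero-based, adding 1 gives a human-friendly column number.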

In [54]:
WORD_RE = re.compile(r'\w+')

index = {}
with open('zen.txt', encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1): 
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # this is ugly; coded like this to make a point 
            occurrences = index.get(word, []) 
            occurrences.append(location)
            index[word] = occurrences

#### ====DEBUGGING====

In [50]:
with open('zen.txt', encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        if line_no > 1: break
        print(f"line:{line.strip()}")
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            print(f"word:{word} column_no:{column_no}")
            

line:The Zen of Python, by Tim Peters
word:The column_no:1
word:Zen column_no:5
word:of column_no:9
word:Python column_no:12
word:by column_no:20
word:Tim column_no:23
word:Peters column_no:27


In [53]:
index = {}
with open('zen.txt', encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        if line_no > 1: break
        # print(f"line:{line.strip()}")
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # print(f"word:{word} column_no:{column_no}")
            occurrences = index.get(word, [])
            occurrences.append(location)
            index[word] = occurrences
            print(f"index:{index}")

index:{'The': [(1, 1)]}
index:{'The': [(1, 1)], 'Zen': [(1, 5)]}
index:{'The': [(1, 1)], 'Zen': [(1, 5)], 'of': [(1, 9)]}
index:{'The': [(1, 1)], 'Zen': [(1, 5)], 'of': [(1, 9)], 'Python': [(1, 12)]}
index:{'The': [(1, 1)], 'Zen': [(1, 5)], 'of': [(1, 9)], 'Python': [(1, 12)], 'by': [(1, 20)]}
index:{'The': [(1, 1)], 'Zen': [(1, 5)], 'of': [(1, 9)], 'Python': [(1, 12)], 'by': [(1, 20)], 'Tim': [(1, 23)]}
index:{'The': [(1, 1)], 'Zen': [(1, 5)], 'of': [(1, 9)], 'Python': [(1, 12)], 'by': [(1, 20)], 'Tim': [(1, 23)], 'Peters': [(1, 27)]}


#### ====DEBUGGING====

In [55]:
occurrences = list(sorted(index, key=str.upper))[:10]

In [56]:
for word in occurrences: 
    print(word, index[word])

a [(19, 48), (20, 53)]
Although [(11, 1), (16, 1), (18, 1)]
ambiguity [(14, 16)]
and [(15, 23)]
are [(21, 12)]
aren [(10, 15)]
at [(16, 38)]
bad [(19, 50)]
be [(15, 14), (16, 27), (20, 50)]
beats [(11, 23)]


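So `location` is a tuple like `(15, 23)`. `occurrences = index.get(word, [])` looks up a word in `index`. If the word is not there, it returns the default, a new empty list `[]`. Either way, the new location is appended and the list is stored back under `index[word]`.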

## Automatic Handling of Missing Keys

So what happens when we index by a non-existent key? As we can see below, we get either an exception or a default value, depending on the method we use.
> In line with Python’s *fail-fast philosophy*, dict access with `d[k]` raises an error when k is not an existing key. Pythonistas know that `d.get(k, default)` is an alternative to `d[k]` whenever a default value is more convenient than handling `KeyError`.

In [32]:
d = {'a': 1, 'b': 2}

In [33]:
d['c']

KeyError: 'c'

In [35]:
d.get('c', 0), d.get('c')

(0, None)

In [36]:
d

{'a': 1, 'b': 2}

We can see that the dictionary is unchanged after `d.get('c', 0)`. To also insert the default when the key is missing, we may use `d.setdefault('c', 0)`. If the key already exists, as in `d.setdefault('a', 0)`, it just *returns* the current value for `'a'` and `d` is not changed.

In [37]:
d.setdefault('a', 0)

1

In [38]:
d

{'a': 1, 'b': 2}

In [39]:
d.setdefault('c', 0)

0

In [40]:
d

{'a': 1, 'b': 2, 'c': 0}
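
With `setdefault` in hand, the three-line `get` / `append` / assign pattern from the word index built earlier can be collapsed into one line. The sketch below assumes a short in-memory sample (`text`) instead of `zen.txt`, so that it runs without the file.

```python
import re

WORD_RE = re.compile(r'\w+')

# Assumption: a two-line sample standing in for the contents of zen.txt.
text = 'Beautiful is better than ugly.\nExplicit is better than implicit.\n'

index = {}
for line_no, line in enumerate(text.splitlines(), 1):
    for match in WORD_RE.finditer(line):
        word = match.group()
        location = (line_no, match.start() + 1)
        # Fetch the list for `word`, inserting an empty list first if the
        # key is missing, then append to it: a single key lookup.
        index.setdefault(word, []).append(location)

for word in sorted(index, key=str.upper):
    print(word, index[word])
```

The result is the same as with `index.get(word, [])`, but the list is stored in `index` the first time the word is seen, so there is no need to assign it back.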