# Part 2: Built-in data structures: list, set, dictionary, tuple

## Agenda

1.  Numbers

2.  Strings

3.  Booleans

4.  Tuples

5.  Mutable Data Structures

    a.  Sets

    b.  Lists

    c.  Dictionaries
    
    d.  Composites: list of dict 
    
6. Assignment and Variables

## 1. Numbers

Integers

Floats

Complex

In [1]:
355 + 113

468

In [2]:
355. / 113.

3.1415929203539825

In [3]:
(2 + 3j) * (4 + .5j)

(6.5+13j)

Limits:

Integers have no fixed size limit. A large enough number really can fill memory and crash the process, but it takes a very big number.

Floats are IEEE 64-bit floating-point values.

Complex is a pair of 64-bit floats.

In [4]:
2**2048

32317006071311007300714876688669951960444102669715484032130345427524655138867890893197201411522913463688717960921898019494119559150490921095088152386448283120630877367300996091750197750389652106796057638384067568276792218642619756161838094338476170470581645852036305042887575891541065808607552399123930385521914333389668342420684974786564569494856176035326322058077805659331026192708460314150258592864177116725943603718461857357598351152301645904403697613233287231227125684710820209725157101726931323469678542580656697935045997268352998638215525166389437335543602135433229604645318478604952148193555853611059596230656

## Built-in functions

In [5]:
float("42")

42.0

In [6]:
int(2.718281828459045)

2

## The math library

In [7]:
import math

In [8]:
math.sqrt(2)

1.4142135623730951

In [9]:
help(math)

Help on module math:

NAME
    math

MODULE REFERENCE
    https://docs.python.org/3.9/library/math
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module provides access to the mathematical functions
    defined by the C standard.

FUNCTIONS
    acos(x, /)
        Return the arc cosine (measured in radians) of x.
        
        The result is between 0 and pi.
    
    acosh(x, /)
        Return the inverse hyperbolic cosine of x.
    
    asin(x, /)
        Return the arc sine (measured in radians) of x.
        
        The result is between -pi/2 and pi/2.
    
    asinh(x, /)
        Return the inverse hyperbolic sine of x.
    
    atan(x, /)
        Return the arc tangent (measured in 

## Important

Don't use ``float`` values for currency.

IEEE standards mean **float is an approximation**.

(This is not a Python *problem*.  You'll see Stack Overflow questions that make it seem like it's unique to Python or it's a problem. It's neither.)

Number Theory:

\\[
((a + b) - a) - b == 0
\\]

IEEE Approximations:

In [10]:
((100.0 + .001) - 100.0) - 0.001

4.7748263676261615e-15

It's nearly zero; \\(\approx \frac{1}{2^{48}}\\)

In [11]:
1/2**48

3.552713678800501e-15

It turns out, it's \\(\frac{5505}{2^{60}}\\) 

What's important is that a float stores a fraction whose denominator is a power of 2. Any value whose denominator has a prime factor other than 2 can't be stored exactly and gets truncated. Since \\(10 = 2 \times 5\\), most decimal fractions are only approximated.
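A minimal sketch of this point, using the standard `fractions` module to expose the exact value a float stores. The variable names are illustrative only.

```python
from fractions import Fraction

# Fraction(float) recovers the exact binary value stored in the float.
stored = Fraction(0.001)
wanted = Fraction(1, 1000)

# A positive integer d is a power of two when d & (d - 1) == 0.
denominator = stored.denominator
is_power_of_two = denominator & (denominator - 1) == 0

print(is_power_of_two)    # the stored denominator is a power of 2
print(stored == wanted)   # so the float cannot equal 1/1000 exactly

# The residual from the cell above, shown as an exact fraction.
residual = ((100.0 + .001) - 100.0) - 0.001
print(Fraction(residual))
```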

For equality tests, use ``math.isclose()``

In [12]:
from math import isclose

In [13]:
isclose(((100.0 + .001) - 100.0) - 0.001, 0.0, abs_tol=1E-10)

True
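The `abs_tol` argument matters here because the comparison is against zero. A short sketch of the difference, assuming only the documented defaults of `math.isclose()` (`rel_tol=1e-09`, `abs_tol=0.0`):

```python
from math import isclose

# The default comparison is relative, which suits values away from zero.
sum_ok = isclose(0.1 + 0.2, 0.3)

# Near zero a relative tolerance is not enough: any nonzero value
# is "far" from 0.0 in relative terms.
tiny = 1e-15
relative_only = isclose(tiny, 0.0)
with_abs_tol = isclose(tiny, 0.0, abs_tol=1e-10)

print(sum_ok, relative_only, with_abs_tol)  # True False True
```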

## Using the decimal module

In [14]:
from decimal import Decimal

In [15]:
cost = Decimal('6.98')
tax_rate = Decimal('.0625')
total = cost + cost*tax_rate
total

Decimal('7.416250')

In [16]:
penny = Decimal('0.01')
total.quantize(penny)

Decimal('7.42')
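For currency, the rounding rule applied by `quantize()` is often as important as the precision. The sketch below passes the rule explicitly; the amount is a made-up value chosen to sit exactly on a tie.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

penny = Decimal('0.01')
amount = Decimal('2.665')  # exactly halfway between 2.66 and 2.67

# ROUND_HALF_EVEN: ties go to the neighbor with an even last digit.
banker = amount.quantize(penny, rounding=ROUND_HALF_EVEN)
# ROUND_HALF_UP: ties go away from zero.
half_up = amount.quantize(penny, rounding=ROUND_HALF_UP)

print(banker, half_up)  # 2.66 2.67
```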

## Fractions

In [17]:
from fractions import Fraction

Recipe uses \\(\frac{2}{3}\\) of a yam to serve 4.

But. 

Expecting 5 vaccinated guests.

So. \\(\frac{5}{4}\times\frac{2}{3}\\).


In [18]:
yams = Fraction(2, 3)
serves = 4
guests = 5
guests * yams/serves

Fraction(5, 6)

## 2. Strings

Unicode text (not bytes, not ASCII)

In [19]:
"Hello, 🌐"

'Hello, 🌐'

In [20]:
"""
Triple-quoted strings can be very long.
They're used at the beginning of module, class, method, and function definitions.
"""

"\nTriple-quoted strings can be very long.\nThey're used at the beginning of module, class, method, and function definitions.\n"

Note that Python has a preferred "canonical" display for strings. It uses apostrophes (single quotes) unless the string itself contains an apostrophe, as this one does.

## String Transformations

In [21]:
s = "here's some data."

In [22]:
s.title()

"Here'S Some Data."

In [23]:
s.upper()

"HERE'S SOME DATA."

In [24]:
s.split()

["here's", 'some', 'data.']

The ``[]`` indicate a list object. We'll return to this below.

In [25]:
s.index("'")

4

In [26]:
s[4]

"'"

This selects the character at index 4 of the string, ``s``. Index values start at zero, so this is the fifth character.

## Immutability

Strings are immutable. Like numbers, they have no internal state to change.

String transformations create new strings.

The unused old string is removed from memory when no longer needed.
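A small sketch of what immutability means in practice: a transformation returns a new object, and an attempt to change a character in place is rejected.

```python
s = "here's some data."
shouted = s.upper()          # a new string object

print(s)                     # the original is unchanged
print(shouted is s)          # False: two distinct objects

# Item assignment is not supported by str.
try:
    s[0] = "H"
except TypeError as error:
    message = str(error)
    print(message)
```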

## Fancy f-strings

In [27]:
n = 355
d = 113

In [28]:
f"{n=}, {d=}: {n/d=}"

'n=355, d=113: n/d=3.1415929203539825'

In [29]:
f"{n} / {d} = {n/d:.6f}"

'355 / 113 = 3.141593'
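The part after the ``:`` is a format specification. A few more common ones, as a sketch with made-up values:

```python
amount = 1234567.891
rate = 0.0625
channel = 198

grouped = f"{amount:,.2f}"      # thousands separators, 2 decimals
percent = f"{rate:.2%}"         # multiply by 100, add a % sign
padded = f"{channel:>6d}"       # right-aligned in 6 columns
hexed = f"{channel:02x}"        # two hex digits, zero-padded

print(grouped, percent, repr(padded), hexed)
```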

## Equality Tests

Unlike some languages, Python uses ``==`` to compare the values of two strings.

In [30]:
s_1 = "Some String"
s_2a = "Some "
s_2b = "String"

In [31]:
s_1 == s_2a + s_2b

True

The ``is`` test asks whether these are the same object.

They're not. The ``id()`` function reveals that their internal object IDs are distinct.

In [32]:
s_1 is  s_2a + s_2b

False

In [33]:
id(s_1), id(s_2a + s_2b)

(140579705663344, 140579705546928)

## Raw Strings

In [34]:
r"This has \t in it"

'This has \\t in it'

In [35]:
print(r"This has \t in it")

This has \t in it


In [36]:
"This has \t in it"

'This has \t in it'

In [37]:
print("This has \t in it")

This has 	 in it


Python uses "escape" codes to create characters that are not on your keyboard.

A few of these overlap with regular expressions.

Raw strings don't process escape codes. They leave the ``\`` in place.
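This is why raw strings are the usual choice for regular expression patterns. A minimal sketch with the standard `re` module; the sample text is made up.

```python
import re

# The raw literal keeps the backslash: two characters, \ and t.
print(len(r"\t"), len("\t"))    # 2 1

# In a regular string "\b" is a backspace; in a raw string the
# backslash survives, so the re module sees its word-boundary code.
pattern = re.compile(r"\b\d+\b")
numbers = pattern.findall("355 / 113 is close to pi")
print(numbers)                  # ['355', '113']
```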

In [38]:
"This is a \N{PLACE OF INTEREST SIGN} Symbol"

'This is a ⌘ Symbol'

In [39]:
my_piece = "\u265A"
f"Captured {my_piece} \u0021"

'Captured ♚ !'

## Bytes

These are sequences of numbers in the range 0 to 255. ASCII characters can be used.

A bytes literal must have a ``b`` prefix. 

In [40]:
b'\these a\re \bytes'

b'\these a\re \x08ytes'

In [41]:
bytes([65, 83, 67, 73, 73])

b'ASCII'

We built a ``bytes`` object from a list of individual integer values. The ``[]`` created a list.

In [42]:
data = b'some bytes'
data[0]

115

We examined the byte in position 0. 

In [43]:
bytes([115])

b's'

Which byte has code 115? Python displays the ASCII-coded ``b's'`` as its canonical short-hand.
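Strings and bytes are connected by an encoding. A short sketch using UTF-8, which is one common choice and not the only one:

```python
text = "Hello, 🌐"
encoded = text.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

print(len(text))        # 8 characters
print(len(encoded))     # 11 bytes: the globe needs 4 bytes in UTF-8
print(decoded == text)  # True
```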

## 3.  Booleans

Values are ``True`` and ``False``. 
Operators are ``and``, ``or``, and ``not``.

In [44]:
f"{True and True=}, {True and False=}, {False and True=}, {False and False=}"

'True and True=True, True and False=False, False and True=False, False and False=False'

In [45]:
f"{True or True=}, {True or False=}, {False or True=}, {False or False=}"

'True or True=True, True or False=True, False or True=True, False or False=False'

``and`` and ``or`` operators "short circuit". They only evaluate the right-hand side if necessary.

If the left side of ``and`` is False, that's it. No need to do more.

If the left side of ``or`` is True, that's it. 

In [46]:
False and 2/0

False

In [47]:
True and 2/0

ZeroDivisionError: division by zero

Further.

All Python objects have a "truthiness" to them. Most objects are True. A few objects are False.

Values that count as False include ``0``, ``[]``, ``{}``, ``set()``, and ``""``.

In [48]:
default = "Hello"

user_input = ""
user_input or default

'Hello'

In [49]:
user_input = "Welcome"
user_input or default

'Welcome'
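The `bool()` function shows the truthiness of any object. The sketch below adds a few values beyond the list above (`0.0`, `()`, and `None` are also false); the variable names are illustrative.

```python
falsy = [0, 0.0, "", [], {}, set(), (), None]
truthy = [1, -1, "0", [0], {"k": None}, {0}, (0,)]

print([bool(v) for v in falsy])    # all False
print([bool(v) for v in truthy])   # all True

# A common idiom: test emptiness directly instead of comparing lengths.
rows = []
status = "has data" if rows else "empty"
print(status)   # empty
```

Note that a non-empty container is true even when everything inside it is false, as ``[0]`` shows.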

## 4. Tuples

A fixed-length collection of values. Think ordered pairs or ordered triples. There's no defined limit on the size; only the limit imposed by finite memory resources.

Tuple literals must have ``,``. They're often wrapped in ``()``. The ``tuple()`` function builds a tuple from another collection. An empty tuple is ``()``.

In [50]:
rgb = (0xc6, 0x2d, 0x42)

In [51]:
rgb

(198, 45, 66)

Singleton tuple special case

In [52]:
t = (42,)
t

(42,)
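It's the comma, not the parentheses, that makes the tuple. A quick sketch of the cases:

```python
not_a_tuple = (42)       # just the int 42 in parentheses
singleton = 42,          # the trailing comma makes a tuple
from_list = tuple([1, 2, 3])

print(type(not_a_tuple).__name__)  # int
print(singleton)                   # (42,)
print(from_list)                   # (1, 2, 3)
print(len(()))                     # 0
```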

## Tuples and assignment

In [53]:
here = (35.354490, -82.527040)

In [54]:
lat, lon = here

Note the way the assignment statement decomposes the tuple.

In [55]:
lat

35.35449

In [56]:
lon

-82.52704

In [57]:
here[0]

35.35449

In [58]:
here[1]

-82.52704

## Immutability

You cannot assign a new value to an item inside a tuple.

In [59]:
here[0] = 35.4

TypeError: 'tuple' object does not support item assignment

You can, however, create a new tuple from pieces and parts of other tuples.

This works because tuples must have a fixed size with fixed semantics for each item in the tuple.

When in doubt, think (r,g,b) or (lat, lon) or (x,y,z) or some other fixed collection of values.

In [60]:
new_here = (35.4, here[1])

In [61]:
here

(35.35449, -82.52704)

In [62]:
new_here

(35.4, -82.52704)

## Tuple Data Types

Types can be mixed.

In [63]:
color = ("brick red", (0xc6, 0x2d, 0x42))

The ``color`` tuple has two elements: a string and a tuple.

Mixed types work because tuples have a fixed size, and we need to agree on the order of the items.

We describe it like this in a type annotation.

In [64]:
tuple[str, tuple[int, int, int]]

tuple[str, tuple[int, int, int]]

The notebook doesn't use the annotations. Other tools do. We'll see this in the last section when we talk about tools and workflows.
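As a sketch of where such an annotation would appear, here is a function that accepts the ``color`` structure. The names `Color` and `hex_code` are hypothetical, invented for this illustration, and the `tuple[...]` syntax assumes Python 3.9 or later.

```python
Color = tuple[str, tuple[int, int, int]]   # a type alias

def hex_code(color: Color) -> str:
    """Format the RGB part of a named color as a hex string."""
    name, (red, green, blue) = color       # nested tuple unpacking
    return f"#{red:02x}{green:02x}{blue:02x}"

color = ("brick red", (0xc6, 0x2d, 0x42))
print(hex_code(color))   # #c62d42
```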

## 5a. Sets

Essential math. Set Intersection, Union, Difference, Symmetric Difference.

\\[\cap, \cup, \setminus, \triangle \\]

While mixed types are allowed, you won't be happy with it. 

Set literals are wrapped in ``{}``. 

Note there's no literal value for an empty set; ``{}`` is an empty dictionary. Use ``set()``.

In [65]:
e = {2, 4, 6, 8}
f = {1, 1, 2, 3, 5, 8}

Intersection \\( e \cap f \\)

In [66]:
e & f

{2, 8}

Union \\( e \cup f \\)

In [67]:
e | f

{1, 2, 3, 4, 5, 6, 8}

Subtraction \\( e \setminus f \\)

In [68]:
e - f

{4, 6}

Symmetric Difference \\(e \triangle f\\)

In [69]:
e ^ f

{1, 3, 4, 5, 6}

## Mutability

Sets are mutable -- you can update a set.

In [70]:
values = set()
values.update(e)
values.update(f)
values

{1, 2, 3, 4, 5, 6, 8}

In [71]:
values.intersection_update({3, 5, 7, 9})
values

{3, 5}

## Set Elements

Set elements must be immutable (strictly, hashable): numbers, strings, tuples.

In [72]:
s = set("distinct letters")
s

{' ', 'c', 'd', 'e', 'i', 'l', 'n', 'r', 's', 't'}

In [73]:
s.remove(' ')
s

{'c', 'd', 'e', 'i', 'l', 'n', 'r', 's', 't'}

Let's make three separate set objects.

In [74]:
empty = set()
singleton_string = {'one string'}
singleton_int = {42}

They work as expected.

In [75]:
empty | singleton_string | singleton_int

{42, 'one string'}

Now, let's try to create a set that contains a mutable set and two other immutable objects.

\\[
    \bigl\{ \{ \}, 42, \text{one string} \bigr\}
\\]

In [76]:
{empty, 42, 'one string'}

TypeError: unhashable type: 'set'

The "unhashable" is a hint as to why. We'll return to this when we talk about dictionaries.
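When a set of sets is really needed, the built-in `frozenset` is an immutable, hashable alternative. A minimal sketch:

```python
empty = frozenset()                 # immutable, therefore hashable
nested = {empty, 42, 'one string'}  # works where set() did not

print(len(nested))                  # 3
print(frozenset() in nested)        # True

# A mutable set has no hash value.
try:
    hash(set())
except TypeError as error:
    message = str(error)
    print(message)
```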

## 5b. Lists

Ordered sequence of objects.

They can be of mixed types, but that way lies madness. You're generally happiest with lists of a uniform type.

Literals are wrapped in `[]`.  An empty list is either `[]` or `list()`.

In [77]:
fib = [1, 1]
fib += [2]
fib

[1, 1, 2]

In [78]:
len(fib)

3

Index values are the position of an item in the list. Start from zero. End just before the length of the list.

Length 3: Index positions are 0, 1, and 2.

In [79]:
fib[0]

1

In [80]:
fib[1]

1

In [81]:
fib[2]

2

In [82]:
fib[3]

IndexError: list index out of range

## Reverse Index

Check this out. Negative index values work backwards.

In [83]:
letters = "The quick brown fox"
letters[-1]

'x'

In [84]:
letters[-2]

'o'

In [85]:
pal = "9009"

In [86]:
pal[0] == pal[-1]

True

In [87]:
pal[1] == pal[-2]

True

In [88]:
pal[2] == pal[-3]

True

In [89]:
pal[3] == pal[-4]

True

## Mutability

In [90]:
fib = [1, 1]
fib.append(fib[-1] + fib[-2])
fib.append(fib[-1] + fib[-2])
fib.append(fib[-1] + fib[-2])
fib.append(fib[-1] + fib[-2])
fib.append(fib[-1] + fib[-2])
fib

[1, 1, 2, 3, 5, 8, 13]

The ``append()`` method adds a single item.

In [91]:
words = []
words += ["one"]
words += ["two", "three"]
words

['one', 'two', 'three']

The ``extend()`` method (and the ``+=`` assignment) grow a list with another list.
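The difference between the two methods shows up when the argument is itself a list. A short sketch:

```python
words = ["one"]
words.extend(["two", "three"])   # adds each item of the other list
print(words)                     # ['one', 'two', 'three']

nested = ["one"]
nested.append(["two", "three"])  # adds the list itself as ONE item
print(nested)                    # ['one', ['two', 'three']]
print(len(words), len(nested))   # 3 2
```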

## Sorting and Reversing

We have methods that update a list in place to put it into order.

In [92]:
from random import randint

In [93]:
values = [randint(1, 6) for _ in range(10)]
values

[4, 1, 4, 2, 1, 4, 5, 2, 5, 4]

In [94]:
values.sort()

In [95]:
values

[1, 1, 2, 2, 4, 4, 4, 4, 5, 5]

In [96]:
values.reverse()

In [97]:
values

[5, 5, 4, 4, 4, 4, 2, 2, 1, 1]

## List ordering functions

The ``sorted()`` function creates a new list from an old list.

The ``reversed()`` function creates an "iterator" from which we can build a new list.

In [98]:
v2 = [randint(1, 6) for _ in range(10)]
v2

[6, 4, 4, 2, 6, 4, 4, 3, 4, 6]

In [99]:
sorted(v2)

[2, 3, 4, 4, 4, 4, 4, 6, 6, 6]

In [100]:
list(reversed(v2))

[6, 4, 3, 4, 4, 6, 2, 4, 4, 6]

In [101]:
v2

[6, 4, 4, 2, 6, 4, 4, 3, 4, 6]

In [102]:
min(v2)

2

In [103]:
max(v2)

6

In [104]:
v2.count(min(v2))

1
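A sketch contrasting the in-place methods with the functions, plus the `reverse=` and `key=` options that both `sorted()` and `sort()` accept. The word list is made up for the example.

```python
v2 = [6, 4, 4, 2, 6, 4, 4, 3, 4, 6]

descending = sorted(v2, reverse=True)   # new list; v2 is untouched

scratch = v2.copy()
result = scratch.sort()                 # sorts in place and returns None

print(descending)   # [6, 6, 6, 4, 4, 4, 4, 4, 3, 2]
print(result)       # None

# key= sorts by a computed value; here, by the length of each word.
by_length = sorted(["three", "one", "eleven"], key=len)
print(by_length)    # ['one', 'three', 'eleven']
```

Because ``sort()`` returns ``None``, writing ``values = values.sort()`` loses the list.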

## 5c. Dictionaries

A Key➔Value Mapping. 

Literals have ``:`` and are wrapped in ``{}``. The ``dict()`` function can also build a dictionary from a sequence of two-tuples. 

In [105]:
words = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}

In [106]:
words["two"]

2

In [107]:
words["four"]*10 + words["two"]

42

In [108]:
words.keys()

dict_keys(['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'])

In [109]:
words.values()

dict_values([1, 2, 3, 4, 5, 6, 7, 8, 9])

## Mutability

In [110]:
words["ten"]

KeyError: 'ten'

In [111]:
"ten" in words

False

In [112]:
words["ten"] = 10
words["zero"] = 0

In [113]:
"ten" in words

True

In [114]:
del words["ten"]

In [115]:
"ten" in words

False
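When a missing key is expected rather than exceptional, the `get()` method avoids the `KeyError`. A sketch with a small stand-in dictionary:

```python
words = {"one": 1, "two": 2}

missing = words.get("ten")         # None instead of KeyError
fallback = words.get("ten", 10)    # explicit default
print(missing, fallback)           # None 10
print("ten" in words)              # False: get() does not add the key

# Counting with get(): a default of 0 for keys not yet seen.
counts = {}
for letter in "mississippi":
    counts[letter] = counts.get(letter, 0) + 1
print(counts)   # {'m': 1, 'i': 4, 's': 4, 'p': 2}
```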

## 5d. Composite Objects

The example is a spreadsheet in CSV notation.

Each row becomes a dictionary that maps a column header to that row's value.

In [116]:
import csv
from pathlib import Path

In [117]:
source = Path("series_1.csv")
with source.open() as source_file:
    reader = csv.DictReader(source_file)
    data = list(reader)

We've defined the ``Path`` to our data. It's in the current working directory.

We've opened the file in a context, so that it will be properly closed when we're done.

We've created a Reader for the CSV-format data. This will parse each line of text and create a dictionary for the row of data.

We created a list object from the rows of data.

In [118]:
data

[{'x': '10.0', 'y': '8.04'},
 {'x': '8.0', 'y': '6.95'},
 {'x': '13.0', 'y': '7.58'},
 {'x': '9.0', 'y': '8.81'},
 {'x': '11.0', 'y': '8.33'},
 {'x': '14.0', 'y': '9.96'},
 {'x': '6.0', 'y': '7.24'},
 {'x': '4.0', 'y': '4.26'},
 {'x': '12.0', 'y': '10.84'},
 {'x': '7.0', 'y': '4.82'},
 {'x': '5.0', 'y': '5.68'}]

The type annotation is the following:

In [119]:
list[dict[str, str]]

list[dict[str, str]]

The values are all strings; we really need them to be float values. That's the topic for Part III, working with the built-in data structures.

Here are some teasers.

In [120]:
for row in data:
    print(f"{float(row['x']):5.2f} {float(row['y']):5.2f}")

10.00  8.04
 8.00  6.95
13.00  7.58
 9.00  8.81
11.00  8.33
14.00  9.96
 6.00  7.24
 4.00  4.26
12.00 10.84
 7.00  4.82
 5.00  5.68


In [121]:
x_values = [float(row['x']) for row in data]

In [122]:
from statistics import mean, stdev

In [123]:
mean(x_values)

9.0

In [124]:
stdev(x_values)

3.3166247903554
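One more teaser: the whole structure can be converted so that its type becomes `list[dict[str, float]]`. This sketch uses an in-memory stand-in for ``series_1.csv`` (an assumption, so it runs without the file), holding only the first three rows shown above.

```python
import csv
import io
from statistics import mean

# Stand-in for series_1.csv: header plus the first three rows.
sample = io.StringIO("x,y\n10.0,8.04\n8.0,6.95\n13.0,7.58\n")

reader = csv.DictReader(sample)
data = [
    {name: float(value) for name, value in row.items()}
    for row in reader
]

print(data[0])                           # {'x': 10.0, 'y': 8.04}
print(mean(row['y'] for row in data))    # mean of the three y values
```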

## 6. Assignment and Variables

Important additional note on the language at its foundation.

What is a variable?

Lots of languages have variable declarations where a variable is bound to a type.

There's no such thing in Python.

In [125]:
a = 42

In [126]:
type(a)

int

In [127]:
a = "forty-two"

In [128]:
type(a)

str

In [129]:
del a

In [130]:
type(a)

NameError: name 'a' is not defined

Yes. We can delete variable names. They're dynamic; not declared.

How does this work?

We need to switch to a plain Python session (not IPython) to show this.

```
(python4hr) slott@MacBookPro-SLott ODSC-Live-4hr % python
Python 3.9.6 (default, Aug 18 2021, 12:38:10) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> locals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>}
>>> a = 42
>>> locals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'a': 42}
>>>
```

Python variables are a dictionary. The dictionary maps variable names to objects.

We call it a "namespace" because it is the context in which variable names are understood.

Objects have types.

Variables are just a sticky note hanging off the object.

There are other namespaces for class names, imported modules, warning status, loggers, codecs. Lots of namespaces.

The core assignment statement, ``=``, creates or replaces the labeled value in a namespace. 
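The same idea can be seen inside a function, where `locals()` exposes the function's namespace. A minimal sketch; the function name `show_namespace` is made up for the illustration.

```python
def show_namespace():
    a = 42
    first = dict(locals())     # snapshot of the function's namespace
    a = "forty-two"            # same name, new object of another type
    b = a                      # a second label on the same object
    second = dict(locals())
    return first, second

first, second = show_namespace()
print(first)            # {'a': 42}
print(second["a"])      # forty-two
print(sorted(second))   # ['a', 'b', 'first']
```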

## Shared References

A not completely obvious consequence of this is two variables can share a reference to an object.  

This is how function parameters work.

In [131]:
def palindromic(n: int) -> bool:
    n_text = str(n)
    for i in range(len(n_text)):
        if n_text[i] != n_text[-1-i]:
            return False
    return True

In [132]:
palindromic(9009)

True

In [133]:
palindromic(1234)

False

In [134]:
a = 959
palindromic(a)

True

Consider what happens inside the ``palindromic()`` function:

A single object, ``959``, will have two references:

- both ``a`` (in the global namespace)
- and ``n`` (in the function's namespace)

Other objects, like the string ``"959"`` assigned to ``n_text``, have a reference count of only one.

When the function is done, objects are removed:

1. The namespace associated with the function evaluation is removed.

2. The objects in the ``locals()`` dictionary are no longer referenced by the namespace. These are ``n_text``, ``n``, and ``i``.

3. Objects with a zero reference count (i.e. local objects) are cleaned up.
   Other objects have a non-zero reference count; these are shared.
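Reference counts can be observed with `sys.getrefcount()`. This is a sketch that assumes CPython; the absolute numbers are an implementation detail (the call itself holds a temporary reference), so only the difference is meaningful. The function name `count_refs` is hypothetical.

```python
import sys

def count_refs():
    data = ["a", "fresh", "list"]      # one name refers to this object
    before = sys.getrefcount(data)
    alias = data                       # a second name for the same object
    after = sys.getrefcount(data)
    return before, after, alias is data

before, after, same = count_refs()
print(after - before)   # 1 on CPython
print(same)             # True
```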

## Spooky Action at a Distance

This is a rare mistake, but everyone makes it sooner or later.

Two references to a mutable object.

In [135]:
d_1 = {"Some": "Dictionary", "Of": "Values"}

In [136]:
d_1["Like"] = "This"

In [137]:
d_1

{'Some': 'Dictionary', 'Of': 'Values', 'Like': 'This'}

In [138]:
d_2 = d_1

What just happened?

Copy of the entire dictionary?

Shared reference?

In [139]:
del d_2["Some"]

In [140]:
d_2

{'Of': 'Values', 'Like': 'This'}

What happens to ``d_1``?

In [141]:
d_1

{'Of': 'Values', 'Like': 'This'}

This is super handy when you provide a mutable object to a function as an argument value.

But.

Be wary of simply assigning mutable objects to other variables.

If you want a copy, ask for it.

In [142]:
d_copy = d_1.copy()

In [143]:
d_copy["Some"] = "Collection"

In [144]:
d_1

{'Of': 'Values', 'Like': 'This'}

In [145]:
d_copy

{'Of': 'Values', 'Like': 'This', 'Some': 'Collection'}
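One caveat: the ``copy()`` method makes a shallow copy. If the values are themselves mutable, they're still shared. A sketch using the standard `copy` module, with a made-up nested dictionary:

```python
import copy

original = {"Of": ["Values"], "Like": "This"}
shallow = original.copy()          # new dict, same inner objects
deep = copy.deepcopy(original)     # new dict, new inner objects

original["Of"].append("Changes")   # mutate the shared inner list

print(shallow["Of"])   # ['Values', 'Changes']  (shared with original)
print(deep["Of"])      # ['Values']             (independent)
```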

## Wrap-up

1.  Numbers

2.  Strings

3.  Booleans

4.  Tuples

5.  Mutable Data Structures

    a.  Sets

    b.  Lists

    c.  Dictionaries
    
    d.  Composites: list of dict 
    
6. Assignment and Variables

# Questions?

We'll start again with Part 3,  **The Five Kinds of Python Functions**