jd's python 3 testing/reference notebook  
(work in progress!)

Quotations not otherwise cited or linked are from the [Python 3
documentation](https://docs.python.org/3/), Copyright 2001-2023, Python
Software Foundation

In [1]:
import locale
import warnings
import json
import re
import sys
import random

# Built-in data types overview

|                  | type(x)    | x               | notes |
| ---------------- | ---------- | --------------- | ----- |
| [numbers][num]   | int        | `5`             | arbitrary precision (size grows as needed); carries the same overhead as any other Python object (a pointer to its type, a reference count...), so a small int typically takes 28 bytes on 64-bit CPython (`sys.getsizeof(12345)`); NumPy arrays and the standard `array` module avoid this per-element overhead by storing raw machine values
|                  | float      | `5.0`           | see `sys.float_info` for max value and precision
|                  | complex    | `5 + 3j`        | implemented as two floats
| numeric sequence | range      | `range(`*start*`,`*stop*`,`*step*`)` | ordered, immutable sequence of integers
| text sequence    | str        | `'xyz'`         | ordered, immutable sequence of textual characters
| object sequences | tuple      | `('abc', 123)`  | ordered, immutable sequence of objects
|                  | list       | `['xyz', 890]`  | ordered, *mutable* sequence of objects
| binary sequences | bytes      | `b'Hello'`      | ordered, immutable sequence of bytes (ints 0-255)
|                  | bytearray  | `bytearray(`*size*`)`, or<br>`bytearray(`*iterable*`)`, or<br>`bytearray(`*bytes*`)` | ordered, *mutable* sequence of bytes. Arg can specify array size (which will be zero-filled), an iterable of ints to load in, or a `b'bytes'` sequence.<br>Alternatively, `bytearray.fromhex()` can read a string of hex values to create the array.
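|                  | memoryview | `memoryview(bytes(5))` | exposes the internal buffer of a bytes-like object (*e.g.*, `bytes`, `bytearray`) without copying it; slicing a memoryview yields another view, not a copy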
| mappings         | dict       | `{'a': 1, 'b': 2}` | insertion-ordered (guaranteed since Python 3.7), mutable collection of key:value pairs. keys must be unique, hashable objects; values can be any object
| sets             | set        | `{1, 2, 3}`     | unordered, mutable collection of unique objects
|                  | frozenset  | `frozenset(`*iterable*`)` | unordered, immutable collection of unique objects
| booleans         | bool       | `True`          | https://docs.python.org/3/library/stdtypes.html#truth
| null             | NoneType   | `None`          | "There is exactly one null object, named `None`" (note: automatically returned by functions that don't explicitly return a value)
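
The `bytearray` constructor forms from the table, and the zero-copy behavior of `memoryview`, can be checked with a quick sketch (the byte values are arbitrary examples):

```python
# The three bytearray constructor forms from the table, plus bytearray.fromhex()
zeros = bytearray(3)                   # size: zero-filled
from_ints = bytearray([72, 105])       # iterable of ints in range(0, 256)
from_bytes = bytearray(b'Hi')          # copy of a bytes object
from_hex = bytearray.fromhex('48 69')  # whitespace between hex pairs is ignored

print(zeros)                                 # bytearray(b'\x00\x00\x00')
print(from_ints == from_bytes == from_hex)   # True

# Unlike bytes, a bytearray can be changed in place
from_bytes[0] = ord('h')
print(from_bytes)                            # bytearray(b'hi')

# A memoryview exposes the underlying buffer without copying it,
# so writes through the view show up in the original bytearray
view = memoryview(from_bytes)
view[1] = ord('o')
print(from_bytes)                            # bytearray(b'ho')
```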

Some classifications:
- SEQUENCES (ordered collections of objects accessible with 0-based integer
  indices)
    - Per spec, sequences must implement both `__getitem__()` and `__len__()`.
    - Sequences are therefore both iterable and subscriptable.
    - All sequences are [sliceable](#slicing): `range(0,100,10)[2:4]` yields
      `range(20, 40, 10)`
- HASHABLES (any object which you can call `hash()` on)
    - Dictionary keys and set elements must be hashable (because dictionaries
      and sets are implemented using a hash table for lookup)
    - Built-in immutable objects (*e.g.*, `str`, `int`, `bool`, `bytes`) are
      generally 'hashable'
    - Tuples are hashable only if all of their elements are hashable
      (`frozenset`s are always hashable, since set elements must themselves
      be hashable)
    - Mutable built-in containers (*e.g.*, `list`, `dict`, `set`) are
      'unhashable', as are tuples that contain them.
- ITERABLES (objects which can "return their members one at a time").
    - Generally speaking, there are two ways to make an object iterable:
      either an `__iter__()` method that returns an *iterator*, or a
      `__getitem__()` method that implements *sequence semantics*.
    - Practically speaking, iterables are one of the following:
        - sequences (accessible with integer indices *0 .. n-1*): numeric
          (`range`), text (`str`), object (`list`, `tuple`), and binary
          (`bytes`, `bytearray`, `memoryview`)
        - mappings (accessible with arbitrary, hashable keys): `dict`
        - sets: `set`, `frozenset`
        - generators: [generator functions](#generators),
          [generator expressions](#generator-expressions)
        - [file-type objects](#disk) created with `open()`
        - bespoke iterable objects, which users can create in three ways:
            1. `__iter__`/iterators: an `__iter__()` method returns an
               [iterator][iterator] object. The iterator implements a
               `__next__()` method which returns the next object from the
               stream each time it is called, and raises `StopIteration` when
               no more objects are available.
            2. `__getitem__`/sequence semantics: a `__getitem__()` method which
               returns successive members of the iterable for integer indices
               0 .. n-1, and raises `IndexError` when no more elements are
               available. (The full sequence protocol also includes
               `__len__()`, but `iter()` only needs `__getitem__()`, which is
               why omitting `__len__()` made no difference in my testing.)
            3. sentinel iterators: created from a regular callable and the
               two-argument form of the built-in `iter()`; see
               [sentinel iterators](#case-3-sentinal-iterator-demo), below
    - See also the `itertools` module, which provides a toolkit of iterator
      building blocks.
- SUBSCRIPTABLES:
    - An object is subscriptable if it implements `__getitem__()`.
    - Often iterables are subscriptable and vice versa, but not always:
        - Sets are iterable, but not subscriptable. So is any user-defined
          class which implements an iterator protocol, but not `__getitem__()`.
          (See following example.)
        - `re.Match` objects are subscriptable, but not iterable. (See
          subsequent example.)
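
A quick way to check the hashability rules above is simply to call `hash()` and catch the `TypeError` that unhashable objects raise. A minimal sketch; `is_hashable` is a hypothetical helper, not a built-in:

```python
def is_hashable(obj) -> bool:
    # hash() succeeds for hashable objects and raises TypeError otherwise
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 'a')))            # True: tuple of hashables
print(is_hashable((1, [2, 3])))         # False: tuple containing a list
print(is_hashable(frozenset({1, 2})))   # True
print(is_hashable([1, 2]), is_hashable({1: 2}), is_hashable({1, 2}))  # False False False

# A hashable tuple can therefore serve as a dict key or set element
grid = {(0, 0): 'origin'}
print(grid[(0, 0)])                     # origin
```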

Types reference: https://docs.python.org/3/library/stdtypes.html

[num]: https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex
[iterator]: https://docs.python.org/3/glossary.html#term-iterator

## Iterators and Subscripting

Some test implementations of iterables/iterators and sequences/subscripting

### Case 1: iterator class
(also demonstrates adding subscriptability to a class at runtime)

In an iteration context, such as:
 - unpacking: `*iterable`
 - a loop: `for x in iterable`
 - ...

Python calls `__iter__()` on the specified iterable, receives an iterator
object back, and then repeatedly calls `__next__()` on the iterator, which
returns the members one at a time until it raises `StopIteration`:
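
A rough sketch of that protocol, spelled out with the built-ins `iter()` and `next()`. The helper name `manual_for` is made up for illustration, and a real `for` loop also supports `break`/`else` inside the body, which this ignores:

```python
def manual_for(iterable, body):
    # Roughly what `for x in iterable: body(x)` does under the hood
    iterator = iter(iterable)    # calls __iter__() (or falls back to __getitem__)
    while True:
        try:
            x = next(iterator)   # calls iterator.__next__()
        except StopIteration:
            break                # iterator exhausted: leave the loop
        body(x)

collected = []
manual_for('abc', collected.append)
print(collected)  # ['a', 'b', 'c']
```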

In [2]:
# Primes(n) iterates over the first n primes
class Primes():
    def __init__(self, max_iters = 10):
        self.max_iters = max_iters
        self.iterations = 0
        self.current_try = 0

    def __next__(self):
        if self.iterations >= self.max_iters:
            raise StopIteration
        self.iterations += 1

        while True:
            self.current_try += 1
            if Primes.is_prime(self.current_try):
                return self.current_try

    def __iter__(self):
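        # The iterator protocol requires iterators to implement __iter__()
        # (returning themselves) as well as __next__(), so an iterator can be
        # used anywhere an iterable is expected, e.g., in a for loop. Here each
        # Primes instance acts as both the iterable and its own iterator.
        return self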

    def __len__(self):
        return self.max_iters

    @staticmethod
    def is_prime(i: int) -> bool:
        if i <= 1:
            return False
        for divisor in range(2, (i//2)+1):
            if i % divisor == 0:
                return False
        return True

prime_iter = Primes(15)
print(f'The first {len(prime_iter)} primes are:', *prime_iter)
# our iterator is exhausted; it should continue to raise StopIteration:
print(f'Anything left in the iterator?:', *prime_iter)
# Strictly speaking, this is a broken iterator: after its counter is reset
# (below), it resumes returning values instead of continuing to raise
# StopIteration, as the iterator protocol requires.
prime_iter.iterations = 12
print(f'Anything left in the iterator?:', *prime_iter)


# We can also use Primes() directly in an iterable context,
#     without manually creating an iterator:
for x in Primes(2):
    print(f'For statement context: {x}')
print(f'Iterable unpacking context:', *Primes(2))


prime_iter2 = Primes(10)
# Despite iterator values being ordered, they are not automatically
#     accessible by index as a "sequence"
try:
    print(prime_iter2[4])
except TypeError as e:
    print(f"ERROR: tried to access iterator by index: {e}")

# Adding a __getitem__() method to Primes allows indexing:
def prime_indexer(self, n):
    return f'[prime #{n}]'  # toy indexer function
Primes.__getitem__ = prime_indexer
# inline version:
#Primes.__getitem__ = lambda self, n: f'[Prime #{n}]'
print(f'Now we can access our iterator by index values: {prime_iter2[4]}')

The first 15 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47
Anything left in the iterator?:
Anything left in the iterator?: 53 59 61
For statement context: 2
For statement context: 3
Iterable unpacking context: 2 3
ERROR: tried to access iterator by index: 'Primes' object is not subscriptable
Now we can access our iterator by index values: [prime #4]
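
`Primes` is a single-use iterator. A common alternative design, sketched here under the assumption that reusability is wanted, makes the container a plain iterable whose `__iter__()` is a generator function, so each loop gets a fresh iterator. `ReusablePrimes` is a hypothetical name, and trial division up to the square root of each candidate replaces the original's n/2 bound:

```python
import math

class ReusablePrimes:
    """Hypothetical reusable variant of Primes: iterable, but not its own iterator."""

    def __init__(self, count: int = 10):
        self.count = count

    def __len__(self) -> int:
        return self.count

    def __iter__(self):
        # A generator function: each call returns a brand-new generator object,
        # and a generator keeps raising StopIteration once it is exhausted.
        found = 0
        candidate = 1
        while found < self.count:
            candidate += 1
            # trial division by 2 .. isqrt(candidate); all() of an empty range is True
            if all(candidate % d for d in range(2, math.isqrt(candidate) + 1)):
                found += 1
                yield candidate

primes = ReusablePrimes(5)
print(*primes)              # 2 3 5 7 11
print(*primes)              # 2 3 5 7 11 (a fresh iterator each time)

it = iter(primes)
print(list(it), list(it))   # [2, 3, 5, 7, 11] []
```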


### Case 2: Subscriptable, but not iterable

In [3]:
# subscriptable but not iterable case study: re.Match objects
m = re.match(r"\w+ (\w+)", "Cats woofed, dogs meowed")
print('re.match returns', m)
print('Subscriptable: index "1" is the first parentheses:', m[1])
print('Explicit lookup with re.Match.__getitem__(1):', m.__getitem__(1))
try:
    for item in m:
        print(item)
except TypeError as err:
    print('Tried to iterate but got TypeError:', err)

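# At first glance, Python could fall back to the sequence protocol here, since
# __getitem__ is zero-indexed and raises IndexError past the last group:
try:
    print('1:', m.__getitem__(0))
    print('2:', m.__getitem__(1))
    print('3:', m.__getitem__(2))
except IndexError as err:
    print('Hit the end of the sequence:', err)
# In CPython, the likely reason it doesn't: re.Match implements __getitem__
# through the C-level *mapping* protocol (groups can also be looked up by
# name, e.g. m['name'] for a named group), and iter() only falls back to
# __getitem__ for objects that support the *sequence* protocol. (The missing
# __len__() is not the issue; iter() doesn't need it.)

# note: re.finditer() returns an iterator of Match objects, and
# m.groups() returns the captured groups as an iterable tuple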

re.match returns <re.Match object; span=(0, 11), match='Cats woofed'>
Subscriptable: index "1" is the first parentheses: woofed
Explicit lookup with re.Match.__getitem__(1): woofed
Tried to iterate but got TypeError: 're.Match' object is not iterable
1: Cats woofed
2: woofed
Hit the end of the sequence: no such group
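
When iterating over matches is what's needed, `re.finditer()` produces one `re.Match` object per match, and `Match.groups()` returns the captured groups as an ordinary tuple. A short sketch reusing the cell's pattern and text:

```python
import re

text = "Cats woofed, dogs meowed"
pattern = r"\w+ (\w+)"

# re.finditer() returns an iterator of re.Match objects, one per match
for match in re.finditer(pattern, text):
    print(match.span(), match[0], '->', match[1])
# (0, 11) Cats woofed -> woofed
# (13, 24) dogs meowed -> meowed

# Match.groups() returns the captured groups as a tuple, which is iterable
print(re.match(pattern, text).groups())  # ('woofed',)
```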


### Case 3: Sentinel iterator

In [4]:
def random_iterator():
    return random.randint(0, 5)

sentinal_iter = iter(random_iterator, 5) # new iterator object, with 5 as the sentinel value
iteration = 0
for num in sentinal_iter:
    iteration += 1
    print(f'Iteration {iteration}: {num}')
print(f'After iteration {iteration}, the callable returned the sentinel value')
print('The iterator raised StopIteration immediately, without yielding the sentinel')

# Iterators are supposed to keep raising StopIteration once exhausted;
# sentinel iterators get this good behavior for free:
for num in sentinal_iter:
    print(f'Another iteration: {num}') # not printed!

Iteration 1: 2
Iteration 2: 4
Iteration 3: 3
Iteration 4: 3
After iteration 4, the callable returned the sentinel value
The iterator raised StopIteration immediately, without yielding the sentinel
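
A common practical use of the two-argument `iter(callable, sentinel)` form, shown in the `iter()` documentation, is reading fixed-size blocks until a read returns an empty result at EOF. The sketch below substitutes an in-memory `io.BytesIO` for a real file and uses `functools.partial` to build the zero-argument callable; the block size of 4 is arbitrary:

```python
import io
from functools import partial

# In-memory stand-in for a binary file, so the example is self-contained
stream = io.BytesIO(b'abcdefghij')

# Call stream.read(4) repeatedly until it returns b'' (the sentinel, at EOF)
chunks = list(iter(partial(stream.read, 4), b''))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```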


### Case 4: Sequence protocol iterator

In [5]:
class SequenceIteratable():
    def __getitem__(self, index:int):
        match index:
            case 0:
                return 'first'
            case 1:
                return 'last'
            case _:
                raise IndexError
    
sequence_iter = SequenceIteratable()
print(type(sequence_iter))
for x in sequence_iter:
    print(x)

# bind the iterator explicitly:
sequence_iter2 = iter(SequenceIteratable())
print(type(sequence_iter2))
for x in sequence_iter2:
    print(x)

<class '__main__.SequenceIteratable'>
first
last
<class 'iterator'>
first
last
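
Sequence-protocol iteration also powers `in`, while `reversed()` additionally needs `__len__()` to know where to start. A sketch with hypothetical classes `TwoItems` and `TwoItemsWithLen` (the exact `TypeError` message varies between versions, so it isn't shown):

```python
class TwoItems:
    # Sequence semantics only: __getitem__, with no __iter__ and no __len__
    def __getitem__(self, index: int) -> str:
        if index == 0:
            return 'first'
        if index == 1:
            return 'last'
        raise IndexError(index)

class TwoItemsWithLen(TwoItems):
    def __len__(self) -> int:
        return 2

print(list(TwoItems()))       # ['first', 'last']
print('last' in TwoItems())   # True: `in` falls back to iterating via __getitem__

try:
    reversed(TwoItems())      # no __len__, so reversed() can't find the end
except TypeError as err:
    print('reversed() failed:', err)

print(list(reversed(TwoItemsWithLen())))  # ['last', 'first']
```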


### Coda: iterator types
Internally, iterators come in several types, but we typically needn't sweat
those details. Here are some, for reference.

Note that our custom `Primes` iterator follows the common practice of using the
same object as both iterable and iterator; therefore, the type() of its
iterator is just the class itself. Meanwhile, our sentinel iterator has its own
internal type (`callable_iterator`).

In [6]:
print(type((1,2).__iter__()))
print(type({1,2}.__iter__()))
print(type({1:2,3:4}.__iter__()))
print(type([1,2].__iter__()))
print(type([x for x in (1,)].__iter__()))
print(type((x for x in (1,)).__iter__()))
print(type('sam'.__iter__()))
print(type('s\xfe5m'.__iter__()))
print(type(b'ham'.__iter__()))
print(type(range(10).__iter__()))

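with open('jd-python.ipynb') as notebook_file:  # assumes this notebook's file is in the working directory
    print(type(notebook_file.__iter__()))         # the file is closed when the with block exits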
print(type(prime_iter.__iter__()))
print(type(sentinal_iter.__iter__()))
print(type(iter(sequence_iter))) # NB: has no __iter__ since it's using sequence protocol for iteration

foo = iter((1,2))
print(*foo)
print(*foo)
print('(foo was exhausted at second unpacking)')

<class 'tuple_iterator'>
<class 'set_iterator'>
<class 'dict_keyiterator'>
<class 'list_iterator'>
<class 'list_iterator'>
<class 'generator'>
<class 'str_ascii_iterator'>
<class 'str_iterator'>
<class 'bytes_iterator'>
<class 'range_iterator'>
<class '_io.TextIOWrapper'>
<class '__main__.Primes'>
<class 'callable_iterator'>
<class 'iterator'>
1 2

(foo was exhausted at second unpacking)
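
One practical way to tell an iterator from a plain iterable: calling `iter()` on an iterator returns the same object, while an iterable container returns a new iterator each time. The `collections.abc` classes offer an `isinstance()` check, with the caveat (per the documentation) that `Iterable` only detects `__iter__()`, not objects that iterate purely via `__getitem__()`:

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2]
numbers_iter = iter(numbers)

# An iterator's __iter__() returns the iterator itself...
print(iter(numbers_iter) is numbers_iter)   # True
# ...while an iterable container hands out a new iterator each time
print(iter(numbers) is numbers)             # False
print(iter(numbers) is iter(numbers))       # False

print(isinstance(numbers, Iterable), isinstance(numbers, Iterator))            # True False
print(isinstance(numbers_iter, Iterable), isinstance(numbers_iter, Iterator))  # True True
```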


-------------------------------------------------------------------------------

# Expressions

An [expression](https://docs.python.org/3/reference/expressions.html) is a
syntactic entity which evaluates to ('yields'/'returns') a value. 
* atomic expressions:
    * names: `name` (return the value pointed to by name)
    * literals: `42`, `'foo'` (return themselves)
    * enclosures:
        * parenthesized expression: `(0)` --> `0`
        * parenthesized tuples: empty pairs of parentheses `()` or parentheses
          containing at least one comma return tuples: `(0,)` --> `(0,)`
        * lists, sets, dicts (with contents either explicitly listed, or
          computed via a [comprehension](#list-comprehensions)): return a new
          list/set/dict
        * [generator expressions](#generator-expressions):
          `(x**2 for x in range(10))`: returns a new generator object
        * `yield` expressions in generator functions
* primary expressions: ("the most tightly bound operations of the language")
    * attribute references: `name.attribute`
    * subscription: `container_name[subscript1, subscript2 ...]`
    * slicings: `sequence_name[start:stop:stride]`; slices can also be
      combined with commas, e.g. `array_name[0:2, ::3]` (as used by NumPy)
    * calls: `callable_name(arg1, arg2, arg3='...')` (functions, built-ins,
      methods, classes)
* unary/binary arithmetic/bitwise operator expressions: `1 + 2` or `~mask` or
  `"string" + "addenda"`. See [Operators](#Operators) below.
* comparisons and membership tests: `a < b` or `c not in d`. Yield `True` or
  `False`. See [Comparison operators](#comparison-operators) below.
* boolean negation expressions: `not x` (returns `True` if x is false, `False`
  otherwise)
* boolean conjunction expressions: `x and y` (returns `x` if x is false, `y`
  otherwise)
* boolean disjunction expressions: `x or y` (returns `x` if x is true, `y`
  otherwise)
* assignment expressions: whereas assignment (`x = y`) is a statement that
  yields no value, an assignment expression using the "walrus" operator `:=`
  both binds and yields a value (`x := y` assigns the value of `y` to `x` and
  also returns it)
* conditional expressions (aka ternary operator) `x if condition else y`
  (returns either `x` or `y`)
* lambda expressions: `lambda x: x**2` (returns a function object)
* `await` expressions in asynchronous coroutine functions

"Note: neither `and` nor `or` restrict the value and type they return to
False and True, but rather return the last evaluated argument. This is
sometimes useful, e.g., if `s` is a string that should be replaced by a default
value if it is empty, the expression `s or 'foo'` yields the desired value."
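
The quoted behavior, plus short-circuiting, in a few lines (the variable names are arbitrary):

```python
# `or` returns the first truthy operand, otherwise the last operand evaluated
default = '' or 'foo'          # 'foo'
kept = 'bar' or 'foo'          # 'bar'
all_falsy = [] or {} or 0      # 0: nothing is truthy, so the last operand is returned

# `and` returns the first falsy operand, otherwise the last operand evaluated
both_truthy = 2 and 3          # 3
short_circuit = 0 and 1 / 0    # 0: 1 / 0 is never evaluated, so no ZeroDivisionError

print(repr(default), repr(kept), all_falsy, both_truthy, short_circuit)
# 'foo' 'bar' 0 3 0
```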

In [7]:
def testgen():
    yield None

print(testgen())
print(x for x in range(0))

<generator object testgen at 0x10ec17d70>
<generator object <genexpr> at 0x10eea86c0>


## Operators

Operator precedence:
<https://docs.python.org/3/reference/expressions.html#operator-precedence>

### Mathematical and bitwise operators

In [8]:
a = b = 1   # (int implements all of these)

# Binary mathematical operators
a + b  #Add
a - b  #Sub
a * b  #Mult
a / b  #Div
a // b #FloorDiv
a ** b #Power
a % b  #Mod
#c @ d #matrix multiply (no builtin types have __matmul__ method, cf. NumPy)

# Binary bitwise operators
a | b  # bitwise OR
a ^ b  # bitwise XOR
a & b  # bitwise AND
a << b # left shift
a >> b # right shift

# Unary operators
print(+a) # unary Add
print(-a) # unary Sub
print(~a) # unary bitwise NOT (invert)
print(bin(a), bin(~a))

1
-1
-2
0b1 -0b10
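
Two identities worth keeping in mind for the integer operators above, checked in a short sketch: floor division rounds toward negative infinity and `%` takes the sign of the divisor, so `(a // b) * b + a % b == a` (which is also what `divmod()` returns); and `~x == -x - 1`, which is where the `-2` above comes from:

```python
# // rounds toward negative infinity and % takes the sign of the divisor,
# so (a // b) * b + a % b == a holds for every pair of ints (b != 0)
for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    q, r = a // b, a % b
    assert (q, r) == divmod(a, b)
    assert q * b + r == a
    print(f'{a:>2} // {b:>2} = {q:>2},  {a:>2} % {b:>2} = {r:>2}')

# Bitwise NOT on ints follows two's-complement semantics: ~x == -x - 1
print([~x == -x - 1 for x in (0, 1, -5, 255)])  # [True, True, True, True]
```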


### Comparison operators

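Value comparisons yield `True` or `False`, unless the object's corresponding
dunder method has been overridden to return something fancier (NumPy arrays,
for example, return element-wise arrays). Identity and membership tests always
yield a `bool`.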

In [9]:
a = b = ''   # (str implements all of these)

# value comparisons
a < b   # a.__lt__(b)
a > b   # a.__gt__(b)
a <= b  # a.__le__(b)
a >= b  # a.__ge__(b)
a == b  # a.__eq__(b)
a != b  # a.__ne__(b)

# identity comparisons
a is b      # True iff a and b are the same object, i.e., id(a) == id(b)
a is not b  #   https://docs.python.org/3/library/functions.html#id

# membership tests
a in b      # b.__contains__(a), falling back to iterating over b via __iter__(), then __getitem__()
a not in b; #   https://docs.python.org/3/reference/expressions.html#comparisons
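
Note that membership tests dispatch to the *right-hand* operand. A sketch with two hypothetical classes: `Evens` defines `__contains__()`, while `Letters` only defines `__iter__()`, so `in` falls back to iterating over it:

```python
calls = []  # records the left-hand operand passed to __contains__

class Evens:
    """Hypothetical container that 'contains' every even int."""
    def __contains__(self, item) -> bool:
        calls.append(item)
        return isinstance(item, int) and item % 2 == 0

evens = Evens()
# `4 in evens` runs evens.__contains__(4): the RIGHT operand's method is called
print(4 in evens, 3 not in evens, calls)   # True True [4, 3]

# Without __contains__, `in` falls back to iterating over the right operand
class Letters:
    def __iter__(self):
        yield from 'abc'

print('b' in Letters(), 'z' in Letters())  # True False
```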


### Walrus operator

Assignment *expressions* use the walrus `:=` operator to both yield and assign
the value of an expression. (This is how assignment works by default in C, for
example.)

For regular assignment, see [Assignment statements](#assignment), below.

In [10]:
if (match := re.search('f(o+)b', 'foooooooobar')):
    print(match.group(1))

# is the same as:

match = re.search('f(o+)b', 'foooooooobar')
if match:
    print(match.group(1))

# especially useful in a long if .. elif chain

oooooooo
oooooooo
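
Walrus bindings also help in comprehensions, where a value computed in the filter can be reused in the output expression, and in `while` loops that fetch and test a value in one step. A small sketch with made-up data:

```python
import re

lines = ['id=17', 'no id here', 'id=42']

# Bind each match once and reuse it, instead of calling re.search() twice per line
ids = [int(m.group(1)) for line in lines if (m := re.search(r'id=(\d+)', line))]
print(ids)  # [17, 42]

# In a while loop: fetch, bind, and test in a single expression
queue = [3, 2, 1]
while (item := queue.pop()) != 3:
    print('popped', item)   # popped 1, then popped 2
```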


-------------------------------------------------------------------------------

# Statements

A statement is a syntactic entity which can be executed. (Roughly, statements
are actions or commands.) Unlike expressions, statements do not themselves
produce a value, although an expression statement evaluates an expression (the
interactive interpreter echoes a non-`None` result, and Jupyter does the same
for the last expression in a cell).

  * simple statements (contained within a single logical line):
      * *expression statements*: expressions used on their own, rather than as
        part of a larger statement. Most expressions are not useful on their
        own (`1+1`). Generally, expression statements are useful insofar as
        they cause desired side effects: e.g., `print()` (a call which
        evaluates to `None`) is used on its own for its "side effect" of
        printing to the terminal.
      * *assignment statements*: `name = expression` (bind a name to a value)
  * compound statements (contain other statements, usually spanning several
    lines):
      * `if` statement
      * `for` and `while` loops
      * `try`, `with`, and `match` statements; function (`def`) and class
        definitions
      * ...
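
One way to see these categories concretely is the standard `ast` module, which parses source into statement nodes: expression statements appear as `Expr` nodes, assignments as `Assign`/`AugAssign`, and compound statements as `If`, `For`, and so on. A short sketch (the source snippets are arbitrary):

```python
import ast

sources = ['1 + 1', 'print("hi")', 'x = 1', 'x += 1',
           'if x:\n    pass', 'for i in range(3):\n    pass']

# ast.parse() returns a Module node whose .body is a list of statement nodes
kinds = [type(ast.parse(src).body[0]).__name__ for src in sources]
for src, kind in zip(sources, kinds):
    print(f'{src!r:32} -> {kind}')
# Expr, Expr, Assign, AugAssign, If, For
```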


## Assignment statements

In [11]:
# assignment
a = 1                     # bind target name 'a' to value 'int(1)'
a = b = c = 1             # bind multiple names to the same value
a, b, c = 1, 2, 3         # tuple unpacking before binding: a == int(1)
foo = 1, 2, 3             # implicit grouping as tuple: foo == (1, 2, 3)
d, e, *f, g = range(1,10) # (PEP 3132) a 'starred' target collects all leftover items into a list:
print(f)                  # [3, 4, 5, 6, 7, 8]

# augmented assignment (plus a demo of how the result type can change)
c += 2;     print(str(c).ljust(3), type(c))  # 5
c -= 1;     print(str(c).ljust(3), type(c))  # 4
c *= 2;     print(str(c).ljust(3), type(c))  # 8
c /= 2;     print(str(c).ljust(3), type(c))  # 4.0 - true division always returns a float
c = int(c); print(str(c).ljust(3), type(c))  # 4
c //= 3;    print(str(c).ljust(3), type(c))  # 1 - floor division of two ints returns an int

# full list, from parser:
# augassign: ('+=' | '-=' | '*=' | '@=' | '/=' | '%=' | '&=' | '|=' | '^=' |
#            '<<=' | '>>=' | '**=' | '//=')

[3, 4, 5, 6, 7, 8]
5   <class 'int'>
4   <class 'int'>
8   <class 'int'>
4.0 <class 'float'>
4   <class 'int'>
1   <class 'int'>
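
A subtlety of augmented assignment: for mutable objects such as lists, `x += y` generally mutates the object in place (via `__iadd__`), while `x = x + y` builds a new object. Immutable types like tuples have no in-place version, so `+=` falls back to `+` and rebinds the name. A sketch with arbitrary values:

```python
# For a list, += mutates in place, so every name bound to it sees the change
nums = [1, 2]
alias = nums
nums += [3]                     # in place: still the same object
print(alias, nums is alias)     # [1, 2, 3] True

nums = nums + [4]               # plain + builds a new list and rebinds nums
print(alias, nums is alias)     # [1, 2, 3] False

# A tuple is immutable, so += creates a new tuple and rebinds the name
point = (1, 2)
same_point = point
point += (3,)
print(same_point, point)        # (1, 2) (1, 2, 3)
```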
