# Python3 testing/reference notebook

In [9]:
import locale, warnings, json, re, sys, random

# Contents

* [Data types overview](#Data-types-overview)  
* [Iteratability and Subscriptability](#Iteratability-and-Subscriptability)  
    * [Case study: Primes iterator](#Case-study:-Primes-iterator)
    * ...
* [Expressions](#expressions)
* [Statements](#statements)
* [Data types and methods](#data-types-and-common-methods)
    * [Text sequences](#text-sequences-str)

# Data types overview

|                  | type(x)     | x               | notes |
| ---------------- | ----------- | --------------- | ----- |
| [numbers][num]   | int         | `5`             | dynamically sized -- includes all the overhead of any other object in Python (a pointer to its type, number of references...)
|                  | float       | `5.0`           | see `sys.float_info` for max value and precision
|                  | complex     | `5 + 3j`        | implemented as two floats
| numeric sequence | range       | `range(`*start*`,`*stop*`,`*step*`)` | ordered, immutable sequence of integers
| text sequence    | [str][str]  | `'xyz'`         | ordered, immutable sequence of textual characters
| object sequences | tuple       | `('abc', 123)`  | ordered, immutable sequence of objects
|                  | list        | `['xyz', 890]`  | ordered, *mutable* sequence of objects
| binary sequences | bytes       | `b'Hello'`      | ordered, immutable sequence of bytes (ints 0-255)
|                  | bytearray   | `bytearray(`*size*`)`, or<br>`bytearray(`*iterable*`)`, or<br>`bytearray(`*bytes*`)` | ordered, *mutable* sequence of bytes. Arg can specify array size (which will be zero-filled), an iterable of ints to load in, or a `b'bytes'` sequence.
|                  | memoryview  | `memoryview(bytes(5))`
| mappings         | dict        | `{'a': 1, 'b': 2}` | insertion-ordered (since Python 3.7), mutable collection of key:value pairs. Keys must be unique, hashable objects; values can be any object [hash table]
| sets             | set         | `{1, 2, 3}`     | unordered, mutable collection of unique, hashable objects [hash table]
|                  | frozenset   | `frozenset(`*iterable*`)` | unordered, immutable collection of unique, hashable objects [hash table]
| booleans         | bool        | `True`          | https://docs.python.org/3/library/stdtypes.html#truth
| null             | NoneType    | `None`          | "There is exactly one null object, named `None`" (note: automatically returned by functions that don't explicitly return a value)

Some classifications:
- SEQUENCES (ordered collections of objects accessible with 0-based integer
  indices)
    - Per spec, sequences must implement both `__getitem__()` and `__len__()`.
    - Sequences are therefore both iterable and subscriptable.
    - All sequences are [sliceable](#slicing): `range(0,100,10)[2:4]` yields
      `range(20, 40, 10)`
- HASHABLES (any object which you can call `hash()` on)
    - Dictionary keys and set elements must be hashable (because dictionaries
      and sets are implemented using a hash table for lookup)
    - Built-in immutable objects (*e.g.*, `str`, `int`, `bool`, `bytes`) are
      generally hashable
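    - `frozenset` is always hashable (its elements must themselves be
      hashable); a `tuple` is hashable iff all of its elements are hashable
    - Built-in mutable containers (*e.g.*, `list`, `dict`, `set`) are
      unhashable, as are `tuple`s containing them. (Instances of user-defined
      classes are hashable by default, based on identity.)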
- ITERABLES (objects which can "return their members one at a time").
    - Generally speaking, there are two ways to implement iterability for an
      object: either an `__iter__()` method that returns an *iterator*, or
      `__getitem__()` and `__len__()` methods implementing the *sequence
      protocol*. Practically speaking, iterables are one of the following:
    - sequences (accessible with integer indices *0 .. n*): numeric (`range`),
      text (`str`), object (`list`, `tuple`), and binary (`bytes`, `bytearray`,
      `memoryview`)
    - mappings (accessible with arbitrary, hashable keys): `dict`
    - sets: `set`, `frozenset`
    - generators: [generator functions](#generators),
      [generator expressions](#generator-expressions)
    - [file-type objects](#disk) created with `open()`
    - bespoke iterable objects:
       - users may make custom iterable objects in three ways:
       1. \_\_iter__/iterators: an `__iter__()` method returns an
           [iterator][iterator] object stream. The iterator implements a
           `__next__()` method which yields successive objects from the stream
           each time it is called, and raises `StopIteration` when no more
           objects are available.
       2. \_\_getitem__/sequence semantics: a `__getitem__()` method which
          returns successive members of the iterable for integer indices 0 ..
          n, and raises `IndexError` when no more elements are available. (Per
          spec, a sequence must also implement `__len__()`, but iteration
          itself only needs `__getitem__()`; `__len__()` matters for, e.g.,
          `reversed()`.)
       3. sentinel iterators: created out of a regular, callable function and
          the built-in `iter`; see
          [sentinel iterators](#case-3-sentinal-iterator), below
    - See also, the `itertools` module, a toolkit of iterators and building
      blocks.
- SUBSCRIPTABLES:
    - An object is subscriptable if it implements `__getitem__()`.
    - Often iterables are subscriptable and vice versa, but not always:
        - Sets are iterable, but not subscriptable. So is any user-defined
          class which implements an iterator protocol, but not `__getitem__()`.
          (See following example.)
        - `re.Match` objects are subscriptable, but not iterable. (See
          subsequent example.)
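
The hashability rules above can be checked directly with `hash()`. This is a minimal sketch; the helper name `is_hashable` is made up for illustration and is not part of the standard library.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (illustrative helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

checks = {
    "str": is_hashable("abc"),
    "tuple of ints": is_hashable((1, 2)),
    "tuple holding a list": is_hashable((1, [2])),   # unhashable element inside
    "list": is_hashable([1, 2]),
    "frozenset": is_hashable(frozenset({1, 2})),
}
for label, result in checks.items():
    print(f"{label:22} hashable? {result}")
```

A tuple is only as hashable as its contents, which is why the third check fails while the second succeeds.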

Types reference: https://docs.python.org/3/library/stdtypes.html

[num]: https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex
[iterator]: https://docs.python.org/3/glossary.html#term-iterator
[str]: #text-sequences-str "Text sequences (str)"

# Iteratability and Subscriptability

Not all iterables are subscriptable, and vice versa. These test cases explore
different facets of iterables/iterators and sequences/subscripting.
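
As a quick orientation before the case studies, the two properties can be probed independently. A rough sketch, assuming the hypothetical helper names `is_iterable` and `is_subscriptable`:

```python
import re

def is_iterable(obj):
    """True if iter(obj) succeeds (illustrative helper)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

def is_subscriptable(obj):
    """True if the object's type defines __getitem__ (illustrative helper)."""
    return hasattr(type(obj), "__getitem__")

samples = {
    "list": [1, 2],
    "set": {1, 2},
    "generator": (x for x in ()),
    "re.Match": re.match(r"\w+", "abc"),
}
# Maps each name to an (iterable?, subscriptable?) pair
report = {name: (is_iterable(o), is_subscriptable(o)) for name, o in samples.items()}
for name, (iterable, subscriptable) in report.items():
    print(f"{name:10} iterable={iterable!s:5} subscriptable={subscriptable}")
```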

## Case study: Primes iterator

In an iterable context (*e.g.* `for x in iterable`, `*iterable`,
comprehensions, generator expressions, `any()`/`all()`, `min()`/`max()`,
`sum()`...), Python calls `__iter__()` on the specified iterable, receives an
iterator object back, and then successively calls `__next__()` on the
iterator, which returns its members one at a time. We can create a custom
iterator by defining a class that implements these two methods.

This example also demonstrates adding subscriptability (via `__getitem__`) to
a class at runtime.
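
Roughly speaking, a `for` loop behaves like the following hand-written loop. This is only a sketch of the protocol (the function name `manual_for` is invented here), not how the interpreter is actually implemented.

```python
def manual_for(iterable):
    """Collect items the way a `for` loop would, using iter() and next()."""
    collected = []
    iterator = iter(iterable)          # calls type(iterable).__iter__
    while True:
        try:
            item = next(iterator)      # calls type(iterator).__next__
        except StopIteration:
            break                      # a for loop swallows this silently
        collected.append(item)
    return collected

print(manual_for("abc"))
print(manual_for(range(3)))
```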

In [183]:
class Primes():
    """Iterator yielding prime numbers. Optional parameters:
           max_primes=n: stop iteration after finding the n-th prime
           max_value=v: stop iteration after searching up through v"""
    def __init__(self, max_primes = float('inf'), max_value = float('inf')):
        self.max_primes = max_primes
        self.max_value = max_value
        self.primes = []

    def __next__(self):
        if len(self.primes) == self.max_primes:
            raise StopIteration
        if not self.primes:
            self.primes.append(2)                  # bootstrap
            return 2
        candidate = self.primes[-1]
        while candidate := candidate + 1:
            if candidate > self.max_value:
                raise StopIteration
            candidate_sqrt = candidate ** 0.5      # much faster to pre-compute
            for divisor in self.primes:
                if divisor > candidate_sqrt:       # candidate has no factors
                    self.primes.append(candidate)
                    return candidate
                if candidate % divisor == 0:       # found a factor
                    break                          # skip to next candidate
            else:
                assert False, "This code block should never be reached"

    def __iter__(self):
        # since both the iterable (Primes) and iterator (instances) must
        # implement __iter__(), common practice is to use the same object:
        return self

In [175]:
primes_iter = Primes(10_000)
# (len() support is only added to Primes further down, so read max_primes directly)
print(f'The first {primes_iter.max_primes:,} primes are:', *primes_iter)
print('Iterator should continue to raise StopIteration:', *primes_iter)
# (Strictly speaking this is a broken iterator, because after some meddling...
primes_iter.max_primes = 10_005
# it will no longer continue to raise StopIteration, per spec)
print('Iterator is rejuvenated:', *primes_iter, '\n')

The first 10,000 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 ...output trimmed... 104707 104711 104717 104723 104729
Iterator should continue to raise StopIteration:
Iterator is rejuvenated: 104743 104759 104761 104773 104779 



In [172]:
# The Primes() class can also be used directly in an iterable context:
print(f'First 10 primes:', *Primes(max_primes=10))
print(f'Primes up to 10:', *Primes(max_value=10))

First 10 primes: 2 3 5 7 11 13 17 19 23 29
Primes up to 10: 2 3 5 7


#### Adding len() support to Primes

In [184]:
def primes_len(self):
    if self.max_value == float('inf'):
        if isinstance(self.max_primes, int):
            return self.max_primes
        else:
            raise TypeError(self, 'iterator length is undefined')
    else:
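        # unless we've already hit max_primes...
        if len(self.primes) == self.max_primes:
            return self.max_primes
        # spin up a clone instance to finish calculating, so that calling
        # len() does not exhaust the iterator itself...
        calc_len = Primes(self.max_primes, self.max_value)
        calc_len.primes = self.primes.copy()
        try:
            while next(calc_len):
                pass
        except StopIteration:
            return len(calc_len.primes)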

Primes.__len__ = primes_len

print(f'There are {len(Primes(max_value=100))} prime numbers ≤ 100')

######
# here's a curious problem... (add a print() to the len function to test)
# once Primes has a __len__ method, *-unpacking will internally invoke __len__:
#print(*Primes(5))
# although if the unpacking is inside a set, it won't:
#print({*Primes(5)})
# this is a serious problem for Primes(), since len() is potentially expensive
######
#prime = Primes()
#print(len(prime))
#for x in range(10):
#    print(next(prime))
#print(len(prime))

There are 25 prime numbers ≤ 100


#### Adding getitem() and subscriptability to Primes

In [203]:
if hasattr(Primes, '__getitem__'):
    del Primes.__getitem__  # (reset cell, if it was run before)


prime_indexes = Primes(10)
# Despite iterator values being ordered, they are not automatically
# accessible by index as a "sequence"
try:
    print(prime_indexes[5])
except TypeError as e:
    print(f"ERROR: tried to access iterator by index: {e}")
# Nor is the iterator reversible:
try:
    rev = reversed(prime_indexes)
except TypeError as e:
    print(f"ERROR: tried to create reversed iterator: {e}")


# Adding a __getitem__() method to Primes allows indexing and reversing:
def prime_indexer(self, n):
    if len(self.primes) <= n:
        calc_all = self
        try:
            while next(calc_all):
                pass
        except StopIteration:
            return calc_all.primes[n]
    return self.primes[n]
Primes.__getitem__ = prime_indexer
print(f'Now we can access arbitrary index values: {prime_indexes[5]}')
print(f'And we can run our iterator in reverse: {[x for x in reversed(Primes(10))]}')

ERROR: tried to access iterator by index: 'Primes' object is not subscriptable
ERROR: tried to create reversed iterator: 'Primes' object is not reversible
Now we can access arbitrary index values: 13
And we can run our iterator in reverse: [29, 23, 19, 17, 13, 11, 7, 5, 3, 2]


### Case study: subscriptable but not iterable

`re.Match` objects support subscripting (`m[1]` is equivalent to `m.group(1)`) but cannot be iterated over.

In [24]:
print('re.match returns', m := re.match(r"\w+ (\w+)", "Cats woof, dogs meow"))
print('Subscriptable: index "1" ==', m[1])
try:
    print(group for group in m)
except TypeError as e:
    print('Iterable?: Tried to iterate but got TypeError:', e)

# Python doesn't fall back to the sequence protocol here, even though
# __getitem__ is zero-indexed:
try:
    print('__getitem__(0):', m.__getitem__(0))
    print('__getitem__(1):', m.__getitem__(1))
    print('__getitem__(2):', m.__getitem__(2))
except IndexError as e:
    print('Hit the end of the sequence:', e)
# (As far as I can tell, in CPython this is because re.Match is a C type that
# exposes __getitem__ only through the mapping slot, while iter()'s fallback
# checks for the sequence slot. The missing __len__(), which the sequence API
# requires, doesn't seem to matter in other similar cases.)

re.match returns <re.Match object; span=(0, 9), match='Cats woof'>
Subscriptable: index "1" == woof
Iterable?: Tried to iterate but got TypeError: 're.Match' object is not iterable
__getitem__(0): Cats woof
__getitem__(1): woof
Hit the end of the sequence: no such group


NB: `re.finditer()` can be used when an iterator over successive matches is needed; to loop over the groups of a single match, use `m.groups()`.

### Case 3: Sentinal iterator

Sentinel iterators can be used to turn a function which doesn't follow the iterator protocol into a genuine iterator. We provide a function and a "sentinel value" to `iter()`, which returns our iterator object. Each time it is advanced, the iterator calls the function and yields its return value, unless the function returns the sentinel value, in which case the iterator raises `StopIteration`.
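
A common practical use of the two-argument form of `iter()` is reading a stream in fixed-size chunks. The sketch below uses an in-memory `io.BytesIO` as a stand-in for a real file; the chunk size of 4 is arbitrary.

```python
import io
from functools import partial

stream = io.BytesIO(b"abcdefghij")

# stream.read(4) returns b'' once the stream is exhausted, so b'' is the
# sentinel value that ends the iteration.
chunks = list(iter(partial(stream.read, 4), b""))
print(chunks)
```

`partial(stream.read, 4)` is needed because `iter()` calls the function with no arguments.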

In [136]:
def random_iterator():
    return random.randint(0, 5)

sentinal_iter = iter(random_iterator, 5) # new iterator object, sentinal value=5
print(type(sentinal_iter))

i = -1
for i,num in enumerate(sentinal_iter):
    print(f'Iteration {i+1}: {num}')
print(f'After {i+1} iterations we rolled our sentinal value')
print("Note that we never iterate on the sentinal value; StopIteration is raised immediately")

# Iterators are supposed to continue to raise StopIteration
# Sentinal iterators get this good behavior for free, even if the function
# wouldn't otherwise continue to yield the sentinal value:
for num in sentinal_iter:
    print(f'Another iteration: {num}') # not printed!
try:
    sentinal_iter.__next__()
except StopIteration:
    print("Sentinal iterators (appropriately) continue to raise StopIteration")
    print("Even if the function wouldn't continue to yield the sentinal value")

<class 'callable_iterator'>
Iteration 1: 2
Iteration 2: 1
After 2 iterations we rolled our sentinal value
Note that we never iterate on the sentinal value; StopIteration is raised immediately
Sentinal iterators (appropriately) continue to raise StopIteration
Even if the function wouldn't continue to yield the sentinal value


### Case 4: Sequence protocol iterator

In [34]:
class SequenceIteratable():
    def __getitem__(self, index:int):
        match index:
            case 0:
                return 'first'
            case 1:
                return 'last'
            case _:
                raise IndexError
    
    def __len__(self):
        return 2            # len() support isn't needed for the below uses, but
                            # is required by the sequence protocol, and is used
                            # e.g., by reversed()
    
sequence_iter = SequenceIteratable()
print(type(sequence_iter))
for x in sequence_iter:
    print(x)

# bind the iterator explicitly:
sequence_iter2 = iter(SequenceIteratable())
print(type(sequence_iter2))
for x in sequence_iter2:
    print(x)

<class '__main__.SequenceIteratable'>
first
last
<class 'iterator'>
first
last


### Case 5: generator functions

Calling a generator function returns a generator object, which is another type
of iterator. Generator expressions produce the same kind of object.

In [11]:
def testgen():
    yield None

print(testgen())
print(x for x in range(0))

<generator object testgen at 0x11264c460>
<generator object <genexpr> at 0x112477440>
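
For comparison with the class-based `Primes` iterator above, here is a rough generator-function sketch of the same idea. The name `primes_gen` is invented here, and the trial-division logic is simplified rather than a copy of the class's algorithm. The generator keeps its state in local variables between `yield`s, so no `__iter__`/`__next__` methods need to be written by hand.

```python
from itertools import islice

def primes_gen():
    """Yield primes indefinitely using trial division (illustrative sketch)."""
    found = []
    candidate = 2
    while True:
        # candidate is prime if no earlier prime up to its square root divides it
        if all(candidate % p for p in found if p * p <= candidate):
            found.append(candidate)
            yield candidate
        candidate += 1

first_ten = list(islice(primes_gen(), 10))
print(first_ten)
```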


### Epilogue: iterator miscellanea

In [78]:
print('"*" unpacking:')
foo = iter((1,2,3))
print(type(foo))
print('Everything from foo:', *foo)
print('Everything from foo:', *foo) # foo was exhausted during first unpacking

print('\nreversed() built-in function returns a reverse iterator:')
rev = reversed(sequence_iter)
print(type(rev))
print('Reverse iterator:', *rev)

"*" unpacking:
<class 'tuple_iterator'>
Everything from foo: 1 2 3
Everything from foo:

reversed() built-in function returns a reverse iterator:
<class 'reversed'>
Reverse iterator: last first


### Coda: iterator types
Internally, iterators come in several types, but we typically needn't sweat
those details. Here are some, for reference.

In [79]:
print("tuple's __iter__:      ", type((1,2).__iter__()))
print("set's __iter__:        ", type({1,2}.__iter__()))
print("dict's __iter__:       ", type({1:2,3:4}.__iter__()))
print("list's __iter__:       ", type([1,2].__iter__()))
print("list comp's __iter__:  ", type([x for x in (1,)].__iter__()))
print("generator's __iter__:  ", type((x for x in (1,)).__iter__()))
print("str's __iter__:        ", type('sam'.__iter__()))
print("str's __iter__:        ", type('s\xfe5m'.__iter__()))
print("byte's __iter__:       ", type(b'ham'.__iter__()))
print("range()'s __iter__:    ", type(range(10).__iter__()))
print("file object's __iter__:", type(open('python-reference.ipynb').__iter__()))
# Because our custom Primes iterator follows the common practice of using the
# same object as both iterable and iterator, the type() of its iterator is just
# the class itself:
print("iterator instance's __iter__:", type(primes_iter.__iter__()))
print("sentinal iterator's __iter__:", type(sentinal_iter.__iter__()))
# Sequence protocol iterator has no __iter__:
print("sequence protocol iterator:  ", type(iter(sequence_iter)))

tuple's __iter__:       <class 'tuple_iterator'>
set's __iter__:         <class 'set_iterator'>
dict's __iter__:        <class 'dict_keyiterator'>
list's __iter__:        <class 'list_iterator'>
list comp's __iter__:   <class 'list_iterator'>
generator's __iter__:   <class 'generator'>
str's __iter__:         <class 'str_ascii_iterator'>
str's __iter__:         <class 'str_iterator'>
byte's __iter__:        <class 'bytes_iterator'>
range()'s __iter__:     <class 'range_iterator'>
file object's __iter__: <class '_io.TextIOWrapper'>
iterator instance's __iter__: <class '__main__.Primes'>
sentinal iterator's __iter__: <class 'callable_iterator'>
sequence protocol iterator:   <class 'iterator'>


# Expressions

An [expression](https://docs.python.org/3/reference/expressions.html) is a
syntactic entity which evaluates to ('yields'/'returns') a value. 
* atomic expressions:
    * names: `name` (return the value pointed to by name)
    * literals: `42`, `'foo'` (return themselves)
    * enclosures:
        * parenthesized expression: `(0)` --> `0`
        * parenthesized tuples: empty pairs of parentheses `()` or parentheses
          containing at least one comma return tuples: `(0,)` --> `(0,)`
        * lists, sets, dicts (with contents either explicitly listed, or
          computed via a [comprehension](#list-comprehensions)): return a new
          list/set/dict
        * [generator expressions](#generator-expressions):
          `(x**2 for x in range(10))`: returns a new generator object
        * `yield` expressions in generator functions
* primary expressions: ("the most tightly bound operations of the language")
    * attribute references: `name.attribute`
    * subscription: `container_name[subscript1, subscript2 ...]`
    * slicings: `sequence_name[index1, index2...]`,
      `sequence_name[start:stop:stride]`
    * calls: `callable_name(arg1, arg2, arg3='...')` (functions, built-ins,
      methods, classes)
* unary/binary mathematical/bitwise operator expressions: `1 + 2` or `~bytes` or
  `"string" + "addenda"`. See [Operators](#Operators) below.
* comparisons and membership tests: `a < b` or `c not in d`. Yield `True` or
  `False`. See [Comparison operators](#comparison-operators) below.
* boolean negation expressions: `not x` (returns `True` if x is false, `False`
  otherwise)
* boolean conjunction expressions: `x and y` (returns `x` if x is false, `y`
  otherwise)
* boolean disjunction expressions: `x or y` (returns `x` if x is true, `y`
  otherwise)
* assignment expressions: whereas assignment (`x = y`) is a statement that
    yields no value, assignment expressions using the "walrus" operator `:=`
    both yield and assign an expression (`x := y` returns `y`, in addition to
    assigning it)
* conditional expressions (aka ternary operator) `x if condition else y`
  (returns either `x` or `y`)
* lambda expressions: `lambda x: x**2` (returns a function object)
* `await` expressions in asynchronous coroutine functions

"Note: neither `and` nor `or` restrict the value and type they return to
False and True, but rather return the last evaluated argument. This is
sometimes useful, e.g., if `s` is a string that should be replaced by a default
value if it is empty, the expression `s or 'foo'` yields the desired value."
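
A short sketch of what that note means in practice: `and`/`or` hand back one of their operands unchanged, while `not` always produces a `bool`.

```python
results = {
    "0 and 1": 0 and 1,          # first operand is falsy, so it is returned
    "2 and 3": 2 and 3,          # first operand is truthy, so the second is returned
    "'' or 'foo'": '' or 'foo',  # first operand is falsy, so the second is returned
    "[] or {}": [] or {},        # both falsy: the last evaluated operand is returned
    "not 0": not 0,              # `not` always yields a bool
}
for expression, value in results.items():
    print(f"{expression:12} -> {value!r:6} {type(value).__name__}")
```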

## Operators

Operator precedence:
<https://docs.python.org/3/reference/expressions.html#operator-precedence>

### Mathematical and bitwise operators

In [17]:
a = b = 1   # (int implements all of these)

# Mathematical operators
a + b  #Add
a - b  #Sub
a * b  #Mult
a / b  #Div
a // b #FloorDiv
a ** b #Power
a % b  #Mod
#c @ d #matrix multiply (no builtin types have __matmul__ method, cf. NumPy)
+a     #unary Add: a.__pos__(), generally no effect on value
-a     #unary Sub: a.__neg__(), generally inverts sign

# Bitwise operators
a & b  # bitwise AND           turn off k'th bit: n & ~(1 << (k-1))
a | b  # bitwise OR             turn on k'th bit: n | (1 << (k-1))
a ^ b  # bitwise XOR               flip k'th bit: n ^ (1 << (k-1))
a << b # left shift
a >> b # right shift
~a     # unary bitwise NOT (invert)

-2
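
The bit-twiddling recipes in the comments above can be tried out as follows. This sketch keeps the same convention (`k` is 1-indexed, counting from the least significant bit); the sample value of `n` is arbitrary.

```python
n = 0b1010                 # decimal 10
k = 2                      # 1-indexed: the second bit from the right
mask = 1 << (k - 1)        # 0b0010

turned_off = n & ~mask     # clear the k'th bit
turned_on = n | mask       # set the k'th bit (already set here)
flipped = n ^ mask         # toggle the k'th bit
is_set = bool(n & mask)    # test the k'th bit

for label, value in [("n", n), ("off", turned_off), ("on", turned_on), ("flip", flipped)]:
    print(f"{label:5}{value:04b}")
print("k'th bit set?", is_set)
```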

### Comparison operators

These all yield `True` or `False`, unless the corresponding dunder method for
the object has been changed to return something fancier.

In [9]:
a = b = ''   # (str implements all of these)

# value comparisons
a < b   # a.__lt__(b)
a > b   # a.__gt__(b)
a <= b  # a.__le__(b)
a >= b  # a.__ge__(b)
a == b  # a.__eq__(b)
a != b  # a.__ne__(b)

# identity comparisons
a is b      # these use id() to test if a and b are the same object
a is not b  #   https://docs.python.org/3/library/functions.html#id

# membership tests
a in b      # b.__contains__(a), falling back to __iter__(), then __getitem__()
a not in b; #   https://docs.python.org/3/reference/expressions.html#comparisons
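
The fallback chain for membership tests can be seen with two small classes. Both classes are hypothetical and exist only for this illustration.

```python
class Countdown:
    """Defines only __getitem__ (no __contains__ or __iter__)."""
    def __getitem__(self, index):
        if 0 <= index < 3:
            return 3 - index          # yields 3, 2, 1 for indices 0, 1, 2
        raise IndexError(index)

class EvenNumbers:
    """Defines __contains__ explicitly; not iterable at all."""
    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0

print(2 in Countdown())     # found by trying indices 0, 1, 2, ...
print(5 in Countdown())     # IndexError ends the search, result is False
print(4 in EvenNumbers())   # answered by __contains__ directly
```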

#### Chained comparison

For more concise and readable code, comparison operators can be chained:

In [3]:
5 < 10 <= 10 == 10 != 5
# evaluated as: (5 < 10) and (10 <= 10) and (10 == 10) and (10 != 5)

True
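
One subtlety of chaining is that each operand is evaluated at most once, and evaluation stops at the first false link. A small sketch (the helper functions are invented for the demonstration):

```python
calls = []

def middle():
    """Record each evaluation, then return 5."""
    calls.append('middle')
    return 5

def never_called():
    raise AssertionError('should not be evaluated')

chained = 1 < middle() < 10
calls_for_chained = len(calls)              # middle() ran once

calls.clear()
expanded = 1 < middle() and middle() < 10
calls_for_expanded = len(calls)             # middle() ran twice

short_circuited = 2 < 1 < never_called()    # stops after `2 < 1` is False

print(chained, calls_for_chained)
print(expanded, calls_for_expanded)
print(short_circuited)
```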

### Boolean logic operators

TODO: document how non-bools interact with the boolean operators, e.g., type(0 and 1) -> int

In [2]:
a = True
b = False

print(not a)
print(a and b)
print(a or b)

False
False
True


### Walrus operator

Assignment *expressions* use the walrus `:=` operator to both yield and assign
the value of an expression. (This is how assignment works by default in C, for
example.)

For regular assignment, see
[assignment statements](#assignment-statements), below.

In [10]:
if (match := re.search('f(o+)b', 'foooooooobar')):
    print(match.group(1))

# is the same as:

match = re.search('f(o+)b', 'foooooooobar')
if match:
    print(match.group(1))

# especially useful in a long if .. elif chain

oooooooo
oooooooo
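
To illustrate the `if .. elif` remark, here is a sketch in which each branch both tests and captures a match. The function name `classify` and its patterns are made up for the example.

```python
import re

def classify(line):
    """Return a (kind, value) pair for a line of text (illustrative only)."""
    if (m := re.fullmatch(r'\d+', line)):
        return 'int', int(m[0])
    elif (m := re.fullmatch(r'(\w+)=(\w+)', line)):
        return 'pair', (m[1], m[2])
    elif (m := re.fullmatch(r'#\s*(.*)', line)):
        return 'comment', m[1]
    return 'unknown', line

for line in ['42', 'key=value', '# note', 'a b']:
    print(classify(line))
```

Without `:=`, each branch would need its own assignment before the test, which forces the chain into nested `if` blocks.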


# Statements

A statement is a syntactic entity which can be executed. (Roughly, statements
are actions or commands.) Unlike expressions, statements do not themselves
evaluate to a value.

  * simple statements (comprise only a single line):
      * *expression statements*: expressions used on their own, rather than as
        part of a larger statement. Most expressions are not useful on their
        own (`1+1`). Sometimes, however, expressions cause desired side
        effects: e.g., `print()` (an expression which evaluates to None) is
        used on its own for its "side effect" of printing to the terminal.
      * *assignment statements*: `name = expression()` (bind a name to a value)
      * `del` statements
      * `import`
      * `break`, `continue`
      * `pass` statements: "when a statement is required syntactically, but no
        code needs to be executed"
      * `return`, `yield`
      * `raise`, `assert`
      * `nonlocal`, `global` scope statements
  * compound statements
    * `if` statement
    * `for` and `while` loops
    * `with` statements (and context managers)
    * ...


## Assignment statements

In [11]:
# assignment
a = 1                     # bind target name 'a' to value 'int(1)'
a = b = c = 1             # chain assignment, bind multiple names to the same object
a, b, c = 1, 2, 3         # tuple unpacking before binding: a == int(1)
foo = 1, 2, 3             # implicit grouping as tuple: foo == (1, 2, 3)
d, e, *f, g = range(1,10) # (PEP 3132): a 'starred' target is greedy:
print(f)                  # [3, 4, 5, 6, 7, 8]



[3, 4, 5, 6, 7, 8]


In [None]:
# augmented assignment (plus type coercion demo)
c += 2;     print(str(c).ljust(3), type(c))  # 5
c -= 1;     print(str(c).ljust(3), type(c))  # 4
c *= 2;     print(str(c).ljust(3), type(c))  # 8
c /= 2;     print(str(c).ljust(3), type(c))  # 4.0 - type coercion
c = int(c); print(str(c).ljust(3), type(c))  # 4
c //= 3;    print(str(c).ljust(3), type(c))  # 1 - floor division, no type coercion

# full list, from the parser:
#     +=    -=    *=    /=    //=
#     %=    **=   @=
#     &=    |=    ^=    <<=   >>=

5   <class 'int'>
4   <class 'int'>
8   <class 'int'>
4.0 <class 'float'>
4   <class 'int'>
1   <class 'int'>


# Data types and common methods

## Text sequences (str)

### String literals
Adjacent string literals are automatically concatenated, as if the + operator were given:

In [1]:
'foo'      'bar'

'foobar'

In [None]:
'foo' \
    'bar'

'foobar'

In [None]:
('foo'
 'bar')

'foobar'

#### Raw string literals (r'')
String literals normally interpret [backslash escapes](https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences)
much as C does.  
String literals prefixed with `r` (or `R`) (mostly) ignore backslash escape sequences.

In [None]:
print(r'String \freely\ \using\ backslashes'
     '\nString with newlines,\n \ttabs,'
     '\nUnicode characters by \u2605code point or name\N{Black Star}')

String \freely\ \using\ backslashes
String with newlines,
 	tabs,
Unicode characters by ★code point or name★
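
The "(mostly)" above has two practical consequences, sketched below. The Windows-style path is just sample data.

```python
escaped = '\n'          # one character: a newline
raw = r'\n'             # two characters: a backslash and the letter n
print(len(escaped), len(raw))

# In a raw literal a backslash still keeps a following quote from ending the
# literal, but the backslash itself stays in the string:
quoted = r'\''
print(quoted, len(quoted))

# A raw literal cannot end with an odd number of backslashes, so a trailing
# backslash is usually added with an adjacent regular literal:
path = r'C:\temp' '\\'
print(path)
```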


### String interpolation/formatting

#### (1) % operator (printf style)
`%` is a binary operator that can be applied to any string (not only literals)
for "printf style" formatting.  
Note that `%` works in any context, not just within `print()`.  
https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting

In [None]:
# format operator: %[(name)][conversion-flags][min-width][.precision]type
#    (name): a named key, for dictionary (rather than tuple) lookup
#    conversion-flags: 0 (zero-padded)     - (left align)     # (use alt. form)
#                      + (+ before pos. num)     ' ' (space before pos. num)
#    min-width/.precision: '*' will read width/precision from next tuple value
#    type: i/d (int)               o  (oct)                  x/X (hex) 
#          f/F (decimal float)    e/E (exponential float)    g/G (auto float)
#           s  (str())             r  (repr())                a  (ascii())
#           c  (single character, given as a string or integer)

print('%% string: %s' % 'foo')                # single substitution
print('%% string: %s %05.1f' % ('foo', 42))   # tuple of substitutions (% only accepts one argument)
print('%% string: %(name)s %(num)i' % {'num': 42, 'name': 'foo'})  # dict for keyed substitution

print('%s' % 'foo%s' % 'bar')                 # chained % operators
print('use a second %%, not a backslash, to escape %s' % 'the operator')

% string: foo
% string: foo 042.0
% string: foo 42
foobar
use a second %, not a backslash, to escape the operator
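
Two details from the comment block above are easy to trip over: `*` widths taken from the argument tuple, and the fact that a tuple on the right-hand side is always unpacked. A brief sketch with arbitrary sample values:

```python
# '*' reads the width (and precision) from the tuple, ahead of the value
row = '%-*s|%*.*f' % (8, 'ab', 8, 2, 3.14159)
print(repr(row))

# To format a tuple itself, wrap it in a one-element tuple
pair = (1, 2)
wrapped = '%s' % (pair,)
print(wrapped)

try:
    '%s' % pair              # unpacked as two arguments for one placeholder
except TypeError as e:
    error = str(e)
    print('TypeError:', error)
```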


#### (2) str.format() -- Python 3.0
* https://docs.python.org/3/library/string.html#format-string-syntax  
* [PEP 3101](https://peps.python.org/pep-3101/)

In [None]:
# replacement field: { [name] [! conversion] [: format specifier] }
# conversions: "r" | "s" | "a" (call repr()/str()/ascii() on the value)
# format_spec: see below
print('empty braces:\t',  '{} {:.1f}'.       format('sub', 36))
print('numbered braces:', '{1} {0:.1f}'.     format(36, 'sub'))
print('named string:\t',  '{name} {num:.1f}'.format(name = 'sub', num = 36))

empty braces:	 sub 36.0
numbered braces: sub 36.0
named string:	 sub 36.0
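
Replacement field names may also index into or read attributes of the argument, and the `!` conversions are applied before the format specifier. A small sketch with made-up sample data:

```python
point = {'x': 3, 'y': 4}
z = 3 + 4j

by_key = '{0[x]}, {0[y]}'.format(point)        # item lookup (keys are unquoted)
by_attr = '{0.real} {0.imag}'.format(z)        # attribute lookup
converted = '{!r} vs {!s}'.format('hi', 'hi')  # repr() vs str()
padded = '{name:>8}|{num:06.2f}'.format(name='sub', num=3.14159)

for text in (by_key, by_attr, converted, padded):
    print(text)
```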


#### (3) f-strings (f'') -- Python 3.6
Allows for the direct interpolation of almost-arbitrary single-line expressions
within a string literal. f-strings reuse the main features of str.format() (the
`__format__` protocol, [format specification
syntax](#format-specification-syntax), and `!` conversion) while making them
easier to use and more readable.
* https://docs.python.org/3/reference/lexical_analysis.html#f-strings
* [PEP 498](https://peps.python.org/pep-0498/)

In [None]:
# replacement field: { expression [=] [! conversion] [: format specifier] }
#   =  include the expression in the substitution along with its value
#   !  conversion: "r" | "s" | "a" (call repr()/str()/ascii() on the value)
#   :  format specification: (see next section)

# unlike str.format(), f-strings support near-arbitrary simple expressions:
# (`yield` and function definitions are not allowed; backslashes and comments
# inside the braces are also rejected before Python 3.12, see PEP 701)
data = [-2, 3, -5, 10]
print(f'List comprehension: {[i**2 for i in data]}')
print(f'Ternary conditional: {"Even" if data[-1] % 2 == 0 else "Odd"}')
print(f"Generator expression: {sum(i**2 for i in data)}")
# ":" and "!", which mark the start of a conversion or format specifier, must
# be enclosed in a string or ()/[]/{}, so careful with walrus and lambdas:
print(f'Walrus: {(d := data[0])} {(d := d + 1)} {(d := d + 1)}')   # NB: evaluated L-to-R
print(f'Lambda: {list(filter(lambda x: x>0, data))}')
# However, != is allowed as a special case:
print(f'{0 != 1}')

# nested {} allows use of an expression to generate the format specifier:
names = ['Eddie', 'Sebastian', 'Sam']   # a list, so the output order is stable
for name in names:
    print(f'| {name:^{max(len(n) for n in names)}} |')

print(f'"=" will include the expression in the substitution as well:\n\t{name=}')

print(f'use a second {{ brace }}, not a backslash, to escape a brace')

List comprehension: [4, 9, 25, 100]
Ternary conditional: Even
Generator expression: 138
Walrus: -2 -1 0
Lambda: [3, 10]
True
|   Eddie   |
| Sebastian |
|    Sam    |
"=" will include the expression in the substitution as well:
	name='Sam'
use a second { brace }, not a backslash, to escape a brace
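
The `__format__` protocol mentioned above is what lets a class decide how to interpret the format specifier. A minimal sketch, assuming a hypothetical `Temperature` class and a home-made convention in which a trailing `K` in the specifier selects Kelvin:

```python
class Temperature:
    """Hypothetical class illustrating the __format__ protocol."""
    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # A trailing 'K' is our own convention; the rest of the spec is
        # handed on to the ordinary float formatting.
        if spec.endswith('K'):
            return format(self.celsius + 273.15, spec[:-1]) + 'K'
        return format(self.celsius, spec) + '°C'

t = Temperature(21.5)
print(f'{t}')                  # empty spec
print(f'{t:.1f}')              # spec passed through to float
print(f'{t:.2fK}')             # custom suffix handled by __format__
print('{:.2fK}'.format(t))     # str.format() uses the same protocol
print(format(t, '.2fK'))       # as does the format() builtin
```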


### Format specification syntax

f-strings, str.format, and the builtin format() all use a common "format
specification" syntax to specify formatting for string conversion during
interpolation.  
https://docs.python.org/3/library/string.html#formatspec

In [None]:
# f-string vs str.format() vs format() equivalence:
print(f'a{"b":>5}')
print('a{:>5}'.format('b'))
print('a' + format('b', '>5'))

a    b
a    b
a    b


In [None]:
# [[fill]align][sign]["z"]["#"]["0"][width][grouping_option]["." precision][type]

# fill: any character (default, ' ')

# align:
print(f'| {"left":<20} |')
print(f'| {"center":^20} |')
print(f'| {"right":>20} |')

# sign:
print(format(1000, '+'))    # '+' before positive numbers
print(format(1000, ' '))    # ' ' before positive numbers

# z: coerce negative zero float to positive
print(f'{-1.0 / float("inf"): }')
print(f'{-1.0 / float("inf"): z}')

# #: use alternate form for numeric types
print(f'{0xbad:x}')
print(f'{0xbad:#x}')

# width: total field width, including separators, signs, etc.
# 0: preceding the width with a '0' "enables sign-aware zero-padding":
print(format(1000, '+06'))    # +01000
print(format(1000, '0>+6'))   # 0+1000
print(format(1000, '0<+6'))   # +10000
print(format(1000, '0=+6'))   # +01000  (= alignment puts padding b/w sign and digits)

# grouping_option: use _ or , as thousands separator
print(format(1000, '_'))    # 1_000
print(format(1000, ','))    # 1,000

# precision:
# "The precision is a decimal integer indicating how many digits should be
# displayed after the decimal point for presentation types 'f' and 'F', or
# before and after the decimal point for presentation types 'g' or 'G'. For
# string presentation types the field indicates the maximum field size - in
# other words, how many characters will be used from the field content. The
# precision is not allowed for integer presentation types."


# type
# https://www.w3schools.com/python/ref_string_format.asp lists many type codes for formatting numbers
print(f'the binary of {age} is {age:08b}')




| left                 |
|        center        |
|                right |
+1000
 1000
-0.0
 0.0
bad
0xbad
+01000
0+1000
+10000
+01000
1_000
1,000
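
The precision rules quoted in the cell above have no runnable example, so here is a minimal sketch of each case. The sample values (`math.pi`, `1234.5678`, `'formatting'`) and the variable names are illustrative choices, not part of the original notebook.

```python
import math

# 'f': precision = digits after the decimal point
fixed = format(math.pi, '.2f')
# 'g': precision = total significant digits (before and after the point)
general = format(math.pi, '.3g')
general_big = format(1234.5678, '.3g')    # switches to exponent notation
# str: precision = maximum number of characters taken from the value
truncated = format('formatting', '.6')
# width and precision can themselves come from nested replacement fields
width, prec = 8, 3
nested = f'{math.pi:{width}.{prec}f}'
# integer presentation types reject a precision
try:
    format(1000, '.2d')
    int_precision_error = None
except ValueError as e:
    int_precision_error = type(e).__name__

print(fixed, general, general_big, truncated, repr(nested), int_precision_error)
# 3.14 3.14 1.23e+03 format '   3.142' ValueError
```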


### String methods

https://docs.python.org/3/library/stdtypes.html#string-methods

In [None]:
s = "foo\u00df"

# Case alteration
print(f'{s.capitalize()=}')
print(f'{s.casefold()=}')


print(f'{s.center(20, "_")=}')
print(f'{s=:_^20}')             # centering with f-string format spec, for comparison

print(f'{s.count("o")=}')       # number of non-overlapping occurrences
print(f'{s.find("o")=}')        # index value of first occurrence, -1 if not found
print(f'{s.index("o")=}')       # index value of first occurrence, ERROR if not found
print(f'{s.format("bar")=}')    # s has no replacement fields, so it is returned unchanged
# str.endswith()
# str.startswith()

# Tests
print(s.isalpha())   # True iff nonempty and each character is a Unicode "Letter"
print(s.isdecimal()) # True iff nonempty and each character is Unicode Numeric_Type=Decimal (base-10 number representors)
print(s.isdigit())   # True iff nonempty and each character is Unicode Numeric_Type=Decimal|Digit
print(s.isnumeric()) # True iff nonempty and each character is Unicode Numeric_Type=Decimal|Digit|Numeric
print(s.isalnum())   # True iff nonempty and each character isalpha() or isnumeric()
print(s.isascii())   # True iff empty, or all characters have code point <= 0x7f


print(f'{s.islower()=}  \tTrue iff all cased characters are lowercase and there is at least one cased character')
print(f'{s.isupper()=}  \tTrue iff all cased characters are uppercase and there is at least one cased character')
# istitle




print(s.split('o'))
print('o'.join(s.split('o')))
print(s.join(('a', 'b')))
# splitlines

# also skipping encode, expandtabs




s.capitalize()='Fooß'
s.casefold()='fooss'
s.center(20, "_")='________fooß________'
s=________fooß________
s.count("o")=2
s.find("o")=1
s.index("o")=1
s.format("bar")='fooß'
True
False
False
False
True
False
s.islower()=True  	True iff all cased characters are lowercase and there is at least one cased character
s.isupper()=False  	True iff all cased characters are uppercase and there is at least one cased character
['f', '', 'ß']
fooß
afooßb
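
The cell above names several methods without running them (`startswith`, `endswith`, `istitle`, `splitlines`, `expandtabs`, `encode`). A short sketch of their behavior follows; the sample strings and variable names are illustrative only.

```python
s = "foo\u00df"

# Prefix/suffix tests; both also accept a tuple of alternatives
starts = s.startswith("fo")
ends = s.endswith(("x", "\u00df"))

# istitle(): True iff each word starts with an uppercase (titlecase) character
title_yes = "Hello World".istitle()
title_no = s.istitle()

# splitlines() splits on line boundaries (\n, \r\n, and others) and drops them
lines = "one\ntwo\r\nthree".splitlines()

# expandtabs() replaces each tab with spaces up to the next tab stop
expanded = "a\tb".expandtabs(4)

# encode() converts str to bytes; the non-ASCII character becomes two UTF-8 bytes
encoded = s.encode("utf-8")

print(starts, ends, title_yes, title_no)
print(lines, repr(expanded), encoded)
```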


## Numbers (int, float, complex)

In [25]:
# int
i = 1_000_000                     # underscores in numeric literals are ignored
print(f'{type(i)} {i:_} {i:,}')   # output with underscore/comma grouping
# ints in Python carry all the overhead of any other Python object; use NumPy, et al., to avoid this
print(f'int in Python consumes at minimum {sys.getsizeof(0)=} bytes')   # (28 on the CPython build used here; varies by version)
print(f'ValueError converting int to str when > {sys.int_info.default_max_str_digits} digits')

# float
f = 4.5
print(type(f), f.as_integer_ratio(), f.conjugate(), f.is_integer(), sys.float_info.max)

# complex
c = 3+4j
print(type(c), c.real, c.imag, c.conjugate(), c/c)

<class 'int'> 1_000_000 1,000,000
int in Python consumes at minimum sys.getsizeof(0)=28 bytes
ValueError converting int to str when > 4300 digits
<class 'float'> (9, 2) 4.5 False 1.7976931348623157e+308
<class 'complex'> 3.0 4.0 (3-4j) (1+0j)


### Alternate bases

In [45]:
x = 5
y = 0b101                         # int specified in binary
z = int('10', 5)                  # int specified in base-5
print(x == y == z)                # all three stored as int(5)
print(z, bin(z), oct(z), hex(z))  # convert integer value to bin/oct/hex string representation
print(int(bin(z), 2))             # (inverse functions)
print(f"{z} requires {z.bit_length()} bits; {z.bit_count()} bits are 'on'")

True
5 0b101 0o5 0x5
5
5 requires 3 bits; 2 bits are 'on'
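
The same conversions are also available through the format specification syntax, and `int()` can infer the base from a literal prefix when given base 0. A small sketch, with illustrative variable names:

```python
z = 5

# Format specs give control over padding and prefixes that bin()/hex() lack
padded_bin = format(z, '08b')        # zero-padded to 8 binary digits, no prefix
prefixed_hex = format(255, '#x')     # '#' adds the 0x prefix

# Base 0 tells int() to read the base from the prefix, like a source literal
auto = [int('0b101', 0), int('0o5', 0), int('0x5', 0), int('5', 0)]

# Fixed-width binary representation as bytes, and back again
raw = z.to_bytes(2, 'big')
back = int.from_bytes(raw, 'big')

print(padded_bin, prefixed_hex, auto, raw, back)
# 00000101 0xff [5, 5, 5, 5] b'\x00\x05' 5
```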


## Binary sequences (bytes, bytearray)

In [6]:
print('--Bytes--')
foo = b'Hello'
print(type(foo), foo)
print(type(foo[0:1]), foo[0:1])     # access via slice -> bytes
print(type(foo[0]), foo[0])         # access via index -> int
try:                                # immutable
    foo[2] += 1
except TypeError as e:
    print(f'Tried to modify bytes but {e}.')
print(foo, *foo)                    # iterating bytes yields ints

print('\n--Bytearray--')
bar = bytearray(b'Hello')
baz = bytearray.fromhex('48 65 6c 6c 6f')   # equivalent
print(bar == baz)
print(type(baz), baz)
print(type(bar[0:1]), bar[0:1])     # access via slice -> bytearray
print(type(bar[0]), bar[0])         # access via index -> int
bar[2] += 1                         # mutable
print(bar, *bar)                    # iterating a bytearray yields ints

print('\n--Character value comparison--')
# Indexing a bytes/bytearray yields the byte's integer value (0-255). For ASCII
# text that value equals the character's Unicode code point, because ASCII
# encodes each character as a single byte holding that same number.
# With strings, we need to use ord() to yield the Unicode code point:
print('H:',
      ord('Hello'[0]),
      b'Hello'[0],
      bytearray(b'Hello')[0])

--Bytes--
<class 'bytes'> b'Hello'
<class 'bytes'> b'H'
<class 'int'> 72
Tried to modify bytes but 'bytes' object does not support item assignment.
b'Hello' 72 101 108 108 111

--Bytearray--
True
<class 'bytearray'> bytearray(b'Hello')
<class 'bytearray'> bytearray(b'H')
<class 'int'> 72
bytearray(b'Hemlo') 72 101 109 108 111

--Character value comparison--
H: 72 72 72
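
The equivalence above holds only for ASCII. As a sketch of what changes with a non-ASCII character, the example below encodes `'fooß'` as UTF-8 (the choice of string and encoding is illustrative) and compares code points with byte values.

```python
text = 'foo\u00df'                     # 'fooß'
encoded = text.encode('utf-8')

n_chars, n_bytes = len(text), len(encoded)   # 4 characters, 5 bytes
code_point = ord(text[3])                    # one code point: 223 (U+00DF)
byte_values = list(encoded[3:])              # two bytes: [195, 159]
round_trip = encoded.decode('utf-8') == text

print(n_chars, n_bytes, code_point, byte_values, round_trip)
# 4 5 223 [195, 159] True
```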


Quotations not otherwise cited or linked are from the [Python 3
documentation](https://docs.python.org/3/), Copyright 2001-2023, Python
Software Foundation