# CH1-2

## TOC<a id='toc'></a>
* [Ch1 Notes](#ch1_notes)
* [Ch2 Notes](#ch2_notes)

### CH1 Notes <a id='ch1_notes'></a>
[toc](#toc)

The Python Data Model
* **special methods** aka **magic methods** aka **dunder methods**
    - Methods implemented by Python objects, meant to be called by the Python interpreter (not by you); they are the backbone of the Python data model, i.e. of pythonic behavior
    - `len(obj)` actually calls `obj.__len__()`
        * not always: for built-in types such as `list` or `str`, CPython skips the method call and reads the length straight from a field of the underlying C struct
    - `len()` is called the **built-in function** associated with the special method `__len__`
    - ex: `__len__`, `__repr__`, `__str__`
* **namedtuple** used for bundles of attributes without methods
* to use `from random import choice`, all your object needs is `__len__` and `__getitem__`
    - `__getitem__` also allows iteration
    - the fact that `__getitem__` is enough for iteration is for legacy reasons: it predates the iterator protocol. The current lookup order is
        1. check for an `__iter__` method. If it exists, use the iterator protocol
        2. otherwise try calling `__getitem__` with successive integers starting at 0 until it raises `IndexError`
* interactive console and debugger call `__repr__` on results of expressions
    - use this to "identify" the object: the string should be unambiguous and, if possible, match the code necessary to recreate the object
    - if `__str__` is not implemented, `str(obj)` falls back on `__repr__`
* the *in* operator calls `__contains__`
    - if it is not implemented but the object is iterable, `in` does a sequential scan
    - so `__getitem__` suffices
* What the Python docs call the "Python data model", most authors would call the "Python object model"
    - has also been called the **Metaobject Protocol**
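
A minimal sketch of the points above, assuming a hypothetical `Deck` class that defines only `__len__`, `__getitem__` and `__repr__`. The class and its contents are made up for illustration; the aim is to show how much built-in behavior those methods unlock.

```python
from random import choice


class Deck:
    """Hypothetical minimal sequence: no __iter__, __contains__ or __str__."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]

    def __repr__(self):
        return f"Deck({self._cards!r})"


deck = Deck(["A", "K", "Q"])

print(len(deck))     # len() dispatches to Deck.__len__
print(choice(deck))  # choice() relies on __len__ and __getitem__
print(list(deck))    # iteration falls back on __getitem__(0), (1), ...
print("K" in deck)   # no __contains__, so `in` scans sequentially
print(str(deck))     # no __str__, so str() falls back on __repr__
```

Iteration stops because the underlying list raises `IndexError` once the index runs past the end.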

#### Questions
* <font color=red> Do all dunder methods have associated built-in function(obj) that calls them? </font>

### CH2 Notes <a id='ch2_notes'></a>
[toc](#toc)

* sequence types (implemented in C)
* Division 1:
    - Container sequences: list, tuple, collections.deque
        * hold references to objects, so items can be of any type (including nested containers)
    - Flat sequences: str, bytes, bytearray, memoryview, array.array
        * store the values themselves in their own memory (more compact, but can only hold primitive values such as characters, bytes and numbers)
* Division 2:
    - Mutable: list, collections.deque, bytearray, memoryview, array.array
    - Immutable: tuple, str, bytes

In [4]:
myTup = (2,3)
myTup[1]

3

In [5]:
myTup[1] = 5

TypeError: 'tuple' object does not support item assignment
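
To make the container vs. flat division concrete, here is a small sketch. The variable names and values are made up for illustration.

```python
from array import array

# Container sequence: a list holds references, so items can be of any type,
# including other (nested) containers.
mixed = [1, "two", [3.0, (4, 5)]]

# Flat sequence: array('d') stores raw C doubles in one contiguous block.
floats = array("d", [1.0, 2.5, 4.0])

try:
    floats.append("not a number")  # only values of the declared type fit
except TypeError as exc:
    rejected = str(exc)

print(mixed[2][1])      # nested access works through references
print(floats.itemsize)  # bytes used per stored item
print(rejected)         # the error message from the failed append
```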

### listcomps and genexps
* use listcomps when the intent is to build a list (use for loops for everything else, like processing items)
    - never use a listcomp just for its side effects [not readable code]
* *filter* and *map* can also be used, but readability suffers
    - also, they are not faster than listcomps
* To fill up other sequence types, use a **genexp** (generator expression)
    - saves memory because it yields items one by one using the *iterator protocol* instead of building a whole list just to feed another constructor
    - syntax: replace [] --> ()
    - if the genexp is the single argument to a function call, no need to duplicate the parentheses

In [8]:
list(filter(lambda x: x%2, [1,2,3,4,5]))

[1, 3, 5]

In [9]:
[x for x in [1,2,3,4,5] if x%2]

[1, 3, 5]
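
A short sketch of genexps feeding other constructors and functions. The sample data (`colors`, `sizes`) is made up for illustration.

```python
from array import array

colors = ["black", "white"]
sizes = ["S", "M", "L"]

# A genexp as the single argument needs no extra parentheses.
lengths = tuple(len(c) for c in colors)
total = sum(x * x for x in range(10))

# With more than one argument, the genexp needs its own parentheses.
codes = array("I", (ord(ch) for ch in "abc"))

# The cartesian product is produced one item at a time; no intermediate list.
shirts = ", ".join(f"{c} {s}" for c in colors for s in sizes)

print(lengths)  # (5, 5)
print(total)    # 285
print(codes)    # array('I', [97, 98, 99])
print(shirts)   # black S, black M, black L, white S, white M, white L
```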

### tuples are not just immutable lists
* can be used as immutable lists, but also as *records with no field names* - position of item gives its meaning
* **tuple unpacking** - most commonly used in *parallel assignment*, also used in argument unpacking (putting * ahead of the tuple in a function call)
    - works with any iterable so long as the number of items matches the number of receiving variables
        * can use * to grab excess items
    - works with nested structures (so long as you match the nesting structure)
        * before Python 3, it was possible to define functions with nested tuples in their formal parameters - this was removed for practical reasons (can still pass tuples as arguments, of course)
* if you want to name records - use `collections.namedtuple`
    - it is a factory that produces subclasses of tuple enhanced with field names and a class name
    - data must be passed as positional arguments to the constructor, whereas the tuple constructor takes a single iterable
    - useful attributes: class attribute `_fields`, class method `_make(iterable)` and instance method `_asdict()`

In [11]:
a,b, *rest, c = range(6)

In [13]:
rest

[2, 3, 4]

In [14]:
from collections import namedtuple

In [15]:
Card = namedtuple('Card', ['rank', 'suit'])
myCard = Card(3,'hearts')

In [21]:
myCard, myCard.rank, myCard[1], myCard._fields

(Card(rank=3, suit='hearts'), 3, 'hearts', ('rank', 'suit'))

In [22]:
Card._make([3, 'spades'])

Card(rank=3, suit='spades')

In [23]:
Card([3,'spades'])

TypeError: __new__() missing 1 required positional argument: 'suit'
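
The cells above cover `*rest`, `_fields` and `_make`. The sketch below adds the remaining cases from the notes: parallel assignment, argument unpacking, nested unpacking and `_asdict()`. The `City` record and its values are hypothetical.

```python
from collections import namedtuple

# Parallel assignment and swapping without a temporary variable.
a, b = 1, 2
a, b = b, a

# Argument unpacking: * spreads the tuple over the positional parameters.
quotient, remainder = divmod(*(20, 8))

# Nested unpacking: the target must mirror the nesting of the record.
name, country, (lat, lon) = ("Tokyo", "JP", (35.689722, 139.691667))

# namedtuple helpers, using a hypothetical City record.
City = namedtuple("City", "name country coordinates")
tokyo = City._make([name, country, (lat, lon)])
as_dict = tokyo._asdict()

print(a, b)                 # 2 1
print(quotient, remainder)  # 2 4
print(lat)                  # 35.689722
print(as_dict["country"])   # JP
```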

### Slicing
* excluding the last item in a slice (among other things) makes it easy to split a sequence: `my_list[:x]` and `my_list[x:]`
* the notation a:b:c is only valid within [] when used as the indexing or subscript operator, and it produces a slice object `slice(a,b,c)`
    - `seq[a:b:c]` ---> `seq.__getitem__(slice(a,b,c))`
    - can assign names to slices - like cell ranges in spreadsheets - very useful when parsing fixed-width string docs
* ellipsis, written as ..., is a token recognized by the Python parser. It is an alias for the **Ellipsis** object, the single instance of the **ellipsis** class.
    - can be passed as an argument to functions and as part of a slice specification. NumPy uses it as a shortcut when slicing multidimensional arrays
        * ex: `x[i, ...] --> x[i, :, :, :]`
        * unaware of uses in the standard library
* mutable sequences can be grafted, excised and otherwise modified in place using slice notation on the left-hand side of the assignment operator (need an iterable on the right-hand side)
    - ex: `l = list(range(10));  l[3:7] = [-1,-2]`
    - By contrast, repeated concatenation of immutable sequences is inefficient: instead of appending new items in place, each concatenation copies the whole target sequence into a new object.
        * Claim: this is not the case for str - optimized in CPython because `+=` is so common for strings, so they are allocated with extra memory.
        * <font color=red> So this implies that no extra copying for concats of **mutable** sequences. However, in coding, when I loop extend a list, it takes forever - Dirk's claim is that it is the reallocation of memory and copying. TEST THIS </font>
* <font color=blue> useful recipe: naming a slice - improves readability </font>

In [26]:
...

Ellipsis

In [9]:
l = list(range(10))

In [10]:
l[3:7] = [-1,-2]
l

[0, 1, 2, -1, -2, 7, 8, 9]

In [11]:
l[1] = [11,12,13,14]
l

[0, [11, 12, 13, 14], 2, -1, -2, 7, 8, 9]

In [12]:
l[2:2] = [0,0,0,0,0]
l

[0, [11, 12, 13, 14], 0, 0, 0, 0, 0, 2, -1, -2, 7, 8, 9]
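
A sketch of the "naming a slice" recipe and of what `[]` actually passes to `__getitem__`. The fixed-width layout, the sample lines and the `ShowIndex` helper are all made up for illustration.

```python
# Hypothetical fixed-width layout: 6 columns for the SKU, 20 for the
# description, and the rest of the line for the price.
SKU = slice(0, 6)
DESCRIPTION = slice(6, 26)
PRICE = slice(26, None)

lines = [
    f"{'1909':<6}{'Pimoroni PiBrella':<20}{'$17.50':>8}",
    f"{'1489':<6}{'Tactile Switch x20':<20}{'$4.95':>8}",
]

for line in lines:
    print(line[SKU].strip(), line[DESCRIPTION].strip(), line[PRICE].strip())


class ShowIndex:
    """Hypothetical helper that returns whatever [] receives."""

    def __getitem__(self, index):
        return index


probe = ShowIndex()
print(probe[1:10:2])  # slice(1, 10, 2)
print(probe[1, ...])  # (1, Ellipsis)
```

Named slices keep the column boundaries in one place, so the loop body reads like a field list rather than a set of magic numbers.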

### +, *, +=, *= with sequences
* careful: `x = [['_'] * 3 for _ in range(3)]` differs from `x = [['_'] * 3] * 3`
    - the latter holds three references to the same inner list, so `x[0][0] = 'O'` also changes `x[1][0]` and `x[2][0]`
* `+=` calls `__iadd__` (in-place addition); if that is not implemented, it falls back on `__add__`, i.e. `a = a + b`, which creates a new object
    - similarly, `*=` calls `__imul__` and falls back on `__mul__`

In [13]:
t = (1,2,3)
id(t)

2338855644184

In [14]:
s = t
id(s)

2338855644184

In [15]:
t *=2
t

(1, 2, 3, 1, 2, 3)

In [17]:
s

(1, 2, 3)

In [16]:
id(t), id(s)

(2338855027240, 2338855644184)
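
The tuple cells above show `*=` creating a new object. For contrast, a sketch of the nested-list pitfall and of `*=` on a mutable sequence; the variable names are illustrative.

```python
# Three distinct inner lists: changing one row leaves the others alone.
good = [["_"] * 3 for _ in range(3)]
good[1][2] = "X"

# Three references to one inner list: the change shows up in every row.
bad = [["_"] * 3] * 3
bad[1][2] = "O"

# A list implements __imul__, so *= modifies the same object in place.
nums = [1, 2, 3]
before = id(nums)
nums *= 2
same_object = id(nums) == before

print(good)         # [['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]
print(bad)          # [['_', '_', 'O'], ['_', '_', 'O'], ['_', '_', 'O']]
print(same_object)  # True
```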

**TODO** profile loop extension of lists, as compared to preallocated list and just modification.
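
One possible starting point for that profiling, sketched with `timeit`. The size `N`, the repeat count and the three function names are arbitrary choices for illustration; timings depend on the machine, so no expected numbers are given.

```python
from timeit import timeit

N = 2_000  # arbitrary size chosen for illustration


def grow_with_extend():
    out = []
    for i in range(N):
        out.extend([i])  # in-place growth
    return out


def grow_with_concat():
    out = []
    for i in range(N):
        out = out + [i]  # builds a brand-new list on every pass
    return out


def fill_preallocated():
    out = [None] * N
    for i in range(N):
        out[i] = i  # plain item assignment, no growth
    return out


for func in (grow_with_extend, grow_with_concat, fill_preallocated):
    seconds = timeit(func, number=3)
    print(f"{func.__name__:<20} {seconds:.4f} s")
```

Raising `N` should make any difference between the three strategies easier to see.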

### A += Assignment Puzzler

In [18]:
t = (1, 2, [30, 40])
t[2] += [50, 60]

TypeError: 'tuple' object does not support item assignment

In [19]:
t

(1, 2, [30, 40, 50, 60])

In [20]:
import dis

In [21]:
dis.dis('s[a] += b')

  1           0 LOAD_NAME                0 (s)
              3 LOAD_NAME                1 (a)
              6 DUP_TOP_TWO
              7 BINARY_SUBSCR
              8 LOAD_NAME                2 (b)
             11 INPLACE_ADD
             12 ROT_THREE
             13 STORE_SUBSCR
             14 LOAD_CONST               0 (None)
             17 RETURN_VALUE


<br>
<hr>
**Super cool: `dis.dis` disassembles code into bytecode**
<br>
<hr>
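
The puzzler can be reproduced in a script by catching the exception; this sketch also shows one way to avoid it. The in-place add on the list succeeds first, and only the later item assignment on the tuple fails.

```python
t = (1, 2, [30, 40])

try:
    t[2] += [50, 60]
except TypeError as exc:
    # The list was already extended in place before the store into the tuple failed.
    print("raised:", exc)

print(t)  # (1, 2, [30, 40, 50, 60])

# Mutating the list through a method avoids the item assignment altogether.
t[2].extend([70])
print(t)  # (1, 2, [30, 40, 50, 60, 70])
```

Takeaways: putting mutable items in tuples is risky, and augmented assignment is not an atomic operation.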

### Sort and bisect
* `list.sort` sorts in place and returns **None**. <font color=blue> VIP Python API convention: when modifying in place, return None to make it clear to the caller. </font>
* `sorted(iterable)` returns a new sorted list; works even for immutable sequences
* once sorted, use the **bisect** module - efficient *binary search* algorithms
    - `bisect.bisect(haystack, needle)` returns the index at which needle can be inserted into haystack so that it remains sorted
        * can pass `lo` and `hi` to bisect to limit the search range - they default to `0` and `len(haystack)`
        * actually an alias for `bisect_right`; there is also `bisect_left`
    - `bisect.insort(haystack, needle)` actually does the insertion too. More efficient than doing it yourself.

In [22]:
from bisect import bisect

In [25]:
l = [1,2,3,4,5]
bisect(l,4)

4

In [28]:
l = [1,-2,12, 2,7,3,4,5]
bisect(l,4)

2

**Note: `bisect` doesn't raise if the list is not sorted; it just returns a meaningless index!**
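
A sketch of typical `bisect` uses: a table lookup by breakpoints, the difference between `bisect_left` and `bisect` on equal items, and `insort`. The `grade` function, its breakpoints and the sample list are illustrative.

```python
from bisect import bisect, bisect_left, insort


def grade(score, breakpoints=(60, 70, 80, 90), grades="FDCBA"):
    """Map a numeric score to a letter using a sorted breakpoint table."""
    return grades[bisect(breakpoints, score)]


print([grade(s) for s in (33, 99, 77, 70, 89, 90, 100)])
# ['F', 'A', 'C', 'C', 'B', 'A', 'A']

haystack = [1, 4, 4, 8]
print(bisect_left(haystack, 4))  # 1: insertion point before the equal items
print(bisect(haystack, 4))       # 3: insertion point after the equal items

insort(haystack, 5)              # insert while keeping the list sorted
print(haystack)                  # [1, 4, 4, 5, 8]
```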

### When a list is not the answer
* Consider:
    - if storing a lot of floats (numbers only) - use arrays instead of lists - an array doesn't store full float objects, just the packed bytes representing their machine values (like an array in C)
    - if adding or removing from the ends a lot, use deque
    - if checking for containment a lot, use set
* `array.array`: 
    - useful methods for fast loading and saving
* built-in `memoryview`: shared memory sequence type - inspired by numpy
* `collections.deque`: thread-safe double-ended queue optimized for inserting and removing from both ends.
    - lists also support `.append` and `.pop` (even `.pop(0)` for FIFO use), but inserting/removing at the left end is expensive (the entire list is shifted)
    - can be bounded (max length), so it makes a good buffer
    - can `.rotate` - move items from one end to the other
    - hidden cost: removing items from the middle is not fast
    - `append` and `popleft` are *atomic*, so a deque is safe to use as a FIFO queue in multithreaded code without the need for locks
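
A sketch of the alternatives listed above, with made-up values. The array round-trip uses `tobytes`/`frombytes` to stay self-contained; `tofile`/`fromfile` work the same way with an open binary file.

```python
from array import array
from collections import deque

# Bounded deque: once full, adding on one end discards from the other.
recent = deque(range(5), maxlen=5)
recent.append(5)       # drops 0 from the left
recent.rotate(2)       # moves the two rightmost items to the left end
recent.appendleft(-1)  # drops the rightmost item to respect maxlen
print(recent)          # deque([-1, 4, 5, 1, 2], maxlen=5)

# FIFO use: append on the right, popleft on the left.
queue = deque()
queue.append("first")
queue.append("second")
served = queue.popleft()
print(served)          # first

# array: compact storage with a fast binary round-trip.
floats = array("d", [0.5, 1.5, 2.5])
raw = floats.tobytes()
restored = array("d")
restored.frombytes(raw)
print(restored == floats)  # True

# memoryview shares the array's memory instead of copying it.
view = memoryview(floats)
view[0] = 9.0
print(floats[0])       # 9.0
```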


<font color=red> What is this Blaze tool about? </font>

[toc](#toc)