# 01 - Sequence Types

Sequence types have the general concept of a first element, a second element, and so on: basically an ordering of the sequence items using the natural numbers. In Python (and many other languages) the starting index is set to `0`, not `1`.

So the first item has index `0`, the second item has index `1`, and so on.

Python has built-in mutable and immutable sequence types.

Strings, tuples, ranges and bytes are immutable - we can access but not modify the **content** of the **sequence**:

In [15]:
t = (1, 2, 3)

t[0] = 100

TypeError: 'tuple' object does not support item assignment

But of course, if the sequence contains mutable objects, then although we cannot modify the sequence of elements (cannot replace, delete or insert elements), we certainly **can** change the contents of the mutable objects:

In [17]:
t = ( [1, 2], 3, 4)

`t` is immutable, but its first element is a mutable object:

In [18]:
t[0][0] = 100

t

([100, 2], 3, 4)

Sequences can be **homogeneous** or **heterogeneous**. In a homogeneous sequence, all elements are of the same type; for example, in a string, all characters have the same type. Lists can be heterogeneous, with different elements having different types. Homogeneous sequences are generally more efficient (storage-wise).

#### Iterables

An **iterable** is just something that can be iterated over, for example using a `for` loop. It is a **container** type of object which means it contains other objects, and it becomes an iterable if we can list out the elements one by one. 

But iterables are more general; they are **not necessarily** a sequence type. For example a set `s = {1, 2, 3}` is an iterable because we can type `for e in s` but it's not a sequence because we can't type `s[0]`.

So, we can iterate over both the tuple and the set. Iterating the tuple preserved the **order** of the elements in the tuple, but not for the set. Sets do not have an ordering of elements - they are iterable, but not sequences.
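The comparison described above can be sketched as follows. The values are illustrative (they are not from the original notebook), and the set's items are sorted before printing so the example does not depend on any particular set ordering:

```python
t = (10, 20, 30)
s = {10, 20, 30}

# Iterating the tuple yields the elements in positional order.
tuple_items = [e for e in t]
print(tuple_items)

# The set is iterable too, but its iteration order is not something to rely on.
set_items = [e for e in s]
print(sorted(set_items))

# Indexing works for the sequence, but not for the set.
print(t[0])
try:
    s[0]
except TypeError as exc:
    indexing_error = str(exc)
    print(indexing_error)
```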

Most sequence types support the `in` and `not in` operations. Ranges do too; for integers, a range computes membership arithmetically instead of scanning element by element, as a list or tuple has to.

In [11]:
'a' in ['a', 'b', 100]

True

In [12]:
100 in range(200)

True

#### Min, Max and Length

Sequences also generally support the built-in `len` function to obtain the number of items in the collection. Some non-sequence iterables, such as sets and dictionaries, support it too.

In [13]:
len('python'), len([1, 2, 3]), len({10, 20, 30}), len({'a': 1, 'b': 2})

(6, 3, 3, 2)

Sequences (and even some iterables) may support `max` and `min` as long as the data types in the collection can be **ordered** in some sense (`<` or `>`).

In [14]:
a = [100, 300, 200]
min(a), max(a)

(100, 300)

In [15]:
s = 'python'
min(s), max(s)

('h', 'y')

In [16]:
s = {'p', 'y', 't', 'h', 'o', 'n'}
min(s), max(s)

('h', 'y')

But if the elements do not have an ordering defined:

In [17]:
a = [1+1j, 2+2j, 3+3j]
min(a)

TypeError: '<' not supported between instances of 'complex' and 'complex'

`min` and `max` will work for heterogeneous types as long as the elements are pairwise comparable (`<` or `>` is defined). 

For example:

In [18]:
from decimal import Decimal

In [19]:
t = 10, 20.5, Decimal('30.5')

In [20]:
min(t), max(t)

(10, Decimal('30.5'))

In [21]:
t = ['a', 10, 1000]
min(t)

TypeError: '<' not supported between instances of 'int' and 'str'

Even `range` objects support `min` and `max`:

In [22]:
r = range(10, 200)
min(r), max(r)

(10, 199)

#### Concatenation

We can **concatenate** sequences using the `+` operator:

In [25]:
[1, 2, 3] + [4, 5, 6]

[1, 2, 3, 4, 5, 6]

In [26]:
(1, 2, 3) + (4, 5, 6)

(1, 2, 3, 4, 5, 6)

What happens if we try concatenation on a sequence that contains a mutable object? Below, `[0, 0]` is mutable.

In [27]:
x = [ [0, 0] ]

a = x + x
a

[[0, 0], [0, 0]]

It looks like it worked well, but let's look at the memory addresses of all three `[0, 0]`s mentioned.

In [28]:
id(x[0]) == id(a[0]) == id(a[1])

True

**All** of them have the exact same memory address. So when Python did `x + x`, it did not copy the inner list; it only copied the reference to it. They all point to the same object, and that object now has multiple references to it. So what happens if we modify this single object?

In [29]:
a[0][0] = 100

a

[[100, 0], [100, 0]]

Since the first and second elements are the same object, both appear modified. Even `x` is modified.

In [30]:
x

[[100, 0]]

This doesn't happen with strings or integers because they are both immutable objects.

In [31]:
x = [1, 2]
a = x + x

y = 'python'
b = y + y

print(a)
print(b)

[1, 2, 1, 2]
pythonpython


Note that the type of the concatenated result is the same as the type of the sequences being concatenated, so concatenating sequences of varying types will not work:

In [25]:
(1, 2, 3) + [4, 5, 6]

TypeError: can only concatenate tuple (not "list") to tuple

In [26]:
'abc' + ['d', 'e', 'f']

TypeError: must be str, not list

Note: if you really want to concatenate varying types you'll have to transform them to a common type first:

In [27]:
(1, 2, 3) + tuple([4, 5, 6])

(1, 2, 3, 4, 5, 6)

In [28]:
tuple('abc') + ('d', 'e', 'f')

('a', 'b', 'c', 'd', 'e', 'f')

In [29]:
''.join(tuple('abc') + ('d', 'e', 'f'))

'abcdef'

#### Repetition

Most sequence types also support **repetition**, which is essentially concatenating the same sequence an integer number of times:

In [30]:
'abc' * 5

'abcabcabcabcabc'

In [31]:
[1, 2, 3] * 5

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

We see the same issue of shared objects with repetition as we saw with concatenation:

In [32]:
a = [ [0, 0] ] * 2 
print(a)
id(a[0]) == id(a[1])

[[0, 0], [0, 0]]


True

In [33]:
a[0][0] = 100 
print(a)

[[100, 0], [100, 0]]


But, again, we're safe if we are repeating an immutable object such as a string.

In [36]:
a = ['python'] * 2
print(a)

a[0][0] = 'P'
print(a)

['python', 'python']


TypeError: 'str' object does not support item assignment

So the key takeaway is to be careful using mutable elements inside your sequence.

If you really want these repeated objects to be different objects, you'll have to copy them somehow. A simple list comprehension would work well here:

In [10]:
x = [ [0, 0] ]
m = [e.copy() for e in x*3]

In [11]:
m

[[0, 0], [0, 0], [0, 0]]

In [12]:
m[0][0] = 100

In [13]:
m

[[100, 0], [0, 0], [0, 0]]

The wrong way to do it would be the following (this is pretty much the same as doing `m = x * 3`):

In [4]:
x = [ [0, 0] ]
m = [e for e in x*3]
m

[[0, 0], [0, 0], [0, 0]]

In [5]:
m[0][0] = 100
m

[[100, 0], [100, 0], [100, 0]]

#### Finding things in Sequences

We can find the index of the occurrence of an element in a sequence.

We have 1 required argument and two optional arguments.

`s.index(x, i, j)`: This will return the index of the first occurrence of `x` in `s` at or after index `i` but before index `j`. These final two indices can be left out. It will also only return the first match.

In [32]:
s = "gnu's not unix"

In [33]:
s.index('n')

1

In [34]:
s.index('n', 1), s.index('n', 2), s.index('n', 8)

(1, 6, 11)

An exception is raised if the element is not found, so you'll want to catch it if you don't want your app to crash:

In [35]:
s.index('n', 13)

ValueError: substring not found

In [36]:
try:
    idx = s.index('n', 13)
except ValueError:
    print('not found')

not found


Note that these methods of finding objects in sequences do not assume that the objects in the sequence are ordered in any way. These are basically searches that iterate over the sequence until they find (or not) the requested element.

If you have a sorted sequence, then other search techniques are available - such as binary searches. I'll cover some of these topics in the extras section of this course.
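As a rough preview of the binary-search idea, here is a minimal sketch using the standard library's `bisect` module. The helper name `index_sorted` is hypothetical and not part of the course material; it assumes the input sequence is already sorted in ascending order:

```python
from bisect import bisect_left


def index_sorted(seq, x):
    """Return the index of the first occurrence of x in the sorted sequence seq."""
    i = bisect_left(seq, x)  # leftmost position where x could be inserted
    if i < len(seq) and seq[i] == x:
        return i
    # Mirror the behaviour of s.index(x) when the element is missing.
    raise ValueError(f'{x!r} is not in sequence')


data = [1, 3, 5, 5, 7, 9]
print(index_sorted(data, 5))

try:
    index_sorted(data, 4)
except ValueError as exc:
    print(exc)
```

Unlike `s.index(x)`, which may have to look at every element, this approach halves the search range at each step, but it only gives correct answers when the sequence really is sorted.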

#### Slicing

We'll come back to slicing in a later lecture, but sequence types generally support slicing, even ranges (as of Python 3.2). Just like concatenation, slices will return the same type as the sequence being sliced:

In [37]:
s = 'python'
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

It's ok to extend slices past the bounds of the sequence:

In [40]:
s[4:1000]

'on'

We can even have extended slicing, which provides a start, stop and a step:

In [44]:
s, s[::2]

('python', 'pto')

Technically we can also use negative values in slices, including extended slices (more on that later):

In [45]:
s, s[-3:-1], s[::-1]

('python', 'ho', 'nohtyp')

If we copy a full list to another variable using the `[:]` notation, the new variable refers to a new list object at a different memory address.

In [9]:
l = [1, 2, 3]
l2 = l[:]

l is l2

False

Range objects are more restrictive. They **don't** support concatenation or repetition. `min` and `max` have to iterate over the range, while `in` and `not in` are computed arithmetically for integers. Since ranges are sequences, we can slice them, getting back another range object.

In [46]:
r = range(11)  # numbers from 0 to 10 (inclusive)

In [47]:
print(r)
print(list(r))

range(0, 11)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [48]:
print(r[:5])

range(0, 5)


In [49]:
print(list(r[:5]))

[0, 1, 2, 3, 4]


As you can see, slicing a range returns a range object as well, as expected.

#### Hashing

Immutable sequences generally support the `hash` function, which we'll discuss in detail in the section on mapping types. But immutable sequences are not hashable if they contain mutable types. There are good reasons why we don't want to hash mutable types, which we'll see later on.

In [50]:
l = (1, 2, 3)
hash(l)

2528502973977326415

In [51]:
s = '123'
hash(s)

-1892188276802162953

In [52]:
r = range(10)
hash(r)

-6299899980521991026

But mutable sequences (and mutable types in general) do not:

In [53]:
l = [1, 2, 3]

In [54]:
hash(l)

TypeError: unhashable type: 'list'

Note also that a hashable sequence is no longer hashable if one (or more) of its elements is not hashable:

In [55]:
t = (1, 2, [10, 20])
hash(t)

TypeError: unhashable type: 'list'

But this would work:

In [56]:
t = ('python', (1, 2, 3))
hash(t)

-8790163410081325536

In general, immutable types are likely hashable, while mutable types are not. So numbers, strings, tuples, etc are hashable, but lists and sets are not:

In [57]:
from decimal import Decimal
d = Decimal(10.5)
hash(d)

1152921504606846986

Sets are not hashable:

In [58]:
s = {1, 2, 3}
hash(s)

TypeError: unhashable type: 'set'

But frozensets, an immutable variant of the set, are:

In [59]:
s = frozenset({1, 2, 3})

In [60]:
hash(s)

-7699079583225461316

# 02 - Mutable Sequence Types

#### Lecture

If we create a list and then concatenate to it via `+`, we are **not** mutating the original object. Instead, a new object is created and the elements from the old object are added along with the concatenated values.

So, in the example below, the `names` label just moves over to the new object.

**Concatenation does not mutate objects; it just creates new ones.**

![2.1.png](s2-images/2.1.png)
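A small sketch of what the diagram describes. The second label `original` is introduced here purely for illustration, so that we can still see the old object after `names` has moved on:

```python
names = ['Eric', 'John']
original = names  # a second label pointing at the same list object

# Concatenation builds a brand new list and rebinds the `names` label to it.
names = names + ['Michael']

print(names)              # the new, longer list
print(original)           # the old list is unchanged
print(names is original)  # the two labels now point at different objects
```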

So how do we mutate an object?

We need to use a mutating method, such as `append`, or assign to an index or a slice using `[]`.

In [22]:
names = ['Eric', 'John']
names.append('Michael')

This will add 'Michael' to our `names` object without creating a new object in memory.

In [23]:
print(names)
names[2] = 'Michaelson' 
print(names)

['Eric', 'John', 'Michael']
['Eric', 'John', 'Michaelson']


Or..

In [24]:
print(names)
more_names = ['Mark', 'Harry', 'Brian']
names[1:3] = more_names
print(names)

['Eric', 'John', 'Michaelson']
['Eric', 'Mark', 'Harry', 'Brian']


Take a look at the following:

In [56]:
new_names = ['a', 'b', 'c', 'd', 'e']
print(f'{new_names=}')

more_new = ['A', 'B', 'C', 'D', 'E']
new_names[1:3] = more_new

print(f'{new_names=}')

new_names=['a', 'b', 'c', 'd', 'e']
new_names=['a', 'A', 'B', 'C', 'D', 'E', 'd', 'e']


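In both cases, the elements inside the slice `[1:3]` were replaced by **all** the elements of the right-hand side, however many there were. In the first case the slice reached the end of the list, so 'Michaelson' was overwritten. In the second case only 'b' and 'c' were replaced (by five letters); the original letters 'd' and 'e' were outside the slice, so they were kept and simply shifted to the right.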

Deleting using `del s[i]`, `del s[i:j]` or `s.clear()` (removes all items from `s`) will also mutate the object.

Here is a summary of operations that mutate an object:

- `del s[i:j]`: removes an entire slice from `s`.
- `s.clear()`: removes all items from `s`.
- `s.append(x)`: appends `x` to the **end** of `s`.
- `s.insert(i, x)`: inserts one element `x` at the index `i`.
- `s.extend(iterable)`: appends every element of an iterable to `s`.
- `s.pop(i)`: removes **and** returns element at index `i`. If no argument is specified, then the last element gets popped.
- `s.remove(x)`: removes the first occurrence of a specific object `x` in `s`.
- `s.reverse()`: reverses all elements of `s` **in-place** so no more memory is needed.

One related method does **not** mutate `s`:

- `s.copy()`: copies all elements of `s` into a **new** sequence (with new memory address) and then returns it. This is a **shallow copy**.

and more..

All these methods are supported by the 'list' sequence type but not all sequences will necessarily support them. When we make our own sequences later, we can choose.
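A short walkthrough of the mutating methods listed above, applied to one list. The starting values are arbitrary, and the comments show the state of `s` after each step:

```python
s = [1, 2, 3]
start_id = id(s)

s.insert(1, 'a')   # [1, 'a', 2, 3]
s.extend('bc')     # [1, 'a', 2, 3, 'b', 'c'] - a string is an iterable of characters
last = s.pop()     # returns 'c'; s is [1, 'a', 2, 3, 'b']
first = s.pop(0)   # returns 1;   s is ['a', 2, 3, 'b']
s.remove(2)        # ['a', 3, 'b']
del s[0:1]         # [3, 'b']
s.reverse()        # ['b', 3]

print(s, last, first)
print(id(s) == start_id)  # same object throughout: every step mutated it in place
```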

#### Replacing Elements

Suppose you have the following setup:

In [6]:
suits = ['Spades', 'Hearts', 'Diamonds', 'Clubs']
alias = suits
suits = []
print(suits, alias)

[] ['Spades', 'Hearts', 'Diamonds', 'Clubs']


This is because `suits = []` created a new empty list object and pointed `suits` towards it.

But using clear:

In [7]:
suits = ['Spades', 'Hearts', 'Diamonds', 'Clubs']
alias = suits
suits.clear()
print(suits, alias)

[] []


Big difference! This is because `clear` **mutates the object** itself, whereas the assignment `suits = []` only moved the label to a different object.

#### Extending a Sequence

If we want to add more than one element at a time, we can extend a sequence with the contents of any iterable (not just sequences):

In [11]:
l = [1, 2, 3, 4, 5]
print(id(l))
l.extend({'a', 'b', 'c'})
print(id(l), l)

1979932844488
1979932844488 [1, 2, 3, 4, 5, 'c', 'b', 'a']


Of course, since we extended using a set, there was no guarantee of positional ordering.

If we extend with another sequence, then positional ordering is retained:

In [12]:
l = [1, 2, 3]
l.extend(('a', 'b', 'c'))
print(l)

[1, 2, 3, 'a', 'b', 'c']


#### Reversing a Sequence

As mentioned above, we can do in-place reversal:

In [16]:
l = [1, 2, 3, 4]
print(id(l))
l.reverse()
print(id(l), l)

1979930587080
1979930587080 [4, 3, 2, 1]


We can also reverse a sequence using extended slicing (we'll come back to this later):

In [17]:
l = [1, 2, 3, 4]
l[::-1]

[4, 3, 2, 1]

But this is **NOT** mutating the sequence - the slice is returning a **new** sequence - that happens to be reversed.

In [18]:
l = [1, 2, 3, 4]
print(id(l))
l = l[::-1]
print(id(l), l)

1979932143176
1979932696968 [4, 3, 2, 1]


Remember, **slicing a list always returns a new object**.

#### Copying Sequences

We'll take a quick look at shallow and deep copies now and go into more detail after. Consider the following list and its relevant IDs:

In [62]:
l = [ ['a', 'b'], 'c', 'd']

f'{id(l)=}', f'{id(l[0])=}', f'{id(l[1])=}'

('id(l)=140561331827648',
 'id(l[0])=140561332000320',
 'id(l[1])=140561467903408')

Let's copy it with `l.copy()` which is like `l[:]` as they both create a new object:

In [65]:
l2 = l.copy()

f'{id(l2)=}', f'{id(l2[0])=}', f'{id(l2[1])=}'

('id(l2)=140561331975424',
 'id(l2[0])=140561332000320',
 'id(l2[1])=140561467903408')

So `l` and `l2` have different IDs, but their elements are the very same objects: `l2[0]` has the same ID as `l[0]`, and `l2[1]` has the same ID as `l[1]`.

We have that mutability issue again. The first element `['a', 'b']` is a mutable object, so if we change it through `l`, the change shows up in `l2` too.

In [71]:
l[0].append('x')

print(f'{l}')
print(f'{l2}')

[['a', 'b', 'x'], 'c', 'd']
[['a', 'b', 'x'], 'c', 'd']


This occurred because we did a **shallow copy** with `l.copy()`. To get around this, we've got to do a **deep copy**.

Note that we don't need to worry about the elements `'c'` or `'d'` because they're immutable so we can't modify them anyways.
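As a brief preview, here is a minimal sketch of the deep copy route using `copy.deepcopy` from the standard library, applied to the same list as above:

```python
from copy import deepcopy

l = [['a', 'b'], 'c', 'd']
l2 = deepcopy(l)  # copies the outer list AND the nested list

l[0].append('x')

print(l)               # the original has changed
print(l2)              # the deep copy has not
print(l[0] is l2[0])   # the nested lists are now separate objects
```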

# 03 - Lists vs Tuples

#### Lecture

Generally, tuples are more efficient than lists, so, unless you need mutability of the container, prefer using a tuple over a list.

#### Creating Tuples

We saw some of this already in the first section of this course when we looked at some of the optimizations Python implements, but let's revisit it in this context.

Here is Wikipedia's definition of constant folding:

`
Constant folding is the process of recognizing and evaluating constant expressions at compile time rather than computing them at runtime.
`

To see how this works, we are going to use the `dis` module, which allows us to see the disassembled Python bytecode - not for the faint of heart, but can be really useful!

In [6]:
from dis import dis

We want to understand what Python does when it compiles statements such as:

In [7]:
(1, 2, 3)
[1, 2, 3]

[1, 2, 3]

In [8]:
dis(compile('(1,2,3, "a")', 'string', 'eval'))

  1           0 LOAD_CONST               0 ((1, 2, 3, 'a'))
              2 RETURN_VALUE


In [9]:
dis(compile('[1,2,3, "a"]', 'string', 'eval'))

  1           0 BUILD_LIST               0
              2 LOAD_CONST               0 ((1, 2, 3, 'a'))
              4 LIST_EXTEND              1
              6 RETURN_VALUE


Notice how for a tuple containing constants (such as ints and strings in this case), the values are loaded in one step, a single constant value essentially. 

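Lists, on the other hand, always have to be built at runtime. In the output above, an empty list is created and then extended from a constant tuple; older Python versions instead loaded each element individually before building the list. Either way it is extra work, so that's one reason why tuples can "load" faster than a list.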

But even if we have a single mutable object within our immutable tuple then it has to load the entire thing similar to the mutable list approach.

In [10]:
dis(compile('(1,2,3, [10, 20])', 'string', 'eval'))

  1           0 LOAD_CONST               0 (1)
              2 LOAD_CONST               1 (2)
              4 LOAD_CONST               2 (3)
              6 LOAD_CONST               3 (10)
              8 LOAD_CONST               4 (20)
             10 BUILD_LIST               2
             12 BUILD_TUPLE              4
             14 RETURN_VALUE


So merely replacing our immutable `"a"` with `[10, 20]` forced us to load everything individually. The same is true for any other non-constant object in our tuple, such as a function:

In [11]:
def fn1():
    pass

In [12]:
dis(compile('(fn1, 10, 20)', 'string', 'eval'))

  1           0 LOAD_NAME                0 (fn1)
              2 LOAD_CONST               0 (10)
              4 LOAD_CONST               1 (20)
              6 BUILD_TUPLE              3
              8 RETURN_VALUE


We can easily time this penalty:

In [13]:
from timeit import timeit

In [14]:
timeit("(1,2,3,4,5,6,7,8,9)", number=10_000_000)

0.5144668039999942

In [15]:
timeit("[1,2,3,4,5,6,7,8,9]", number=10_000_000)

1.0272097249999206

As you can see creating a tuple was faster.

#### Copying Lists and Tuples

Let's look at creating a copy of both a list and a tuple:

In [16]:
l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
t1 = (1, 2, 3, 4, 5, 6, 7, 8, 9)

In [17]:
id(l1), id(t1)

(140584363710848, 140584362927136)

In [18]:
l2 = list(l1)
t2 = tuple(t1)

In [19]:
l1 is l2, t1 is t2

(False, True)

Notice how `l1` and `l2` are **not** the same objects, whereas `t1` and `t2` are!

So for lists, the elements had to be copied (shallow copy, more on this later), but for tuples it did not. This is an optimisation that Python does; it doesn't make sense to make a shallow copy of a tuple because we can't mutate it anyways. So, for the tuples in this scenario, Python only needs one memory address with 2 pointers (variables) to it. 

With lists we had that concern when we modified an iterable using one variable and then were surprised to see the modification using another variable. This occurred because those two variables were pointers to the same mutable object. So, in general we want two separate objects in memory.

But for tuples this concern vanishes because we cannot modify the tuple in the first place, so it's more efficient just to create one object in memory.

Note that this is the case even if the tuple contains non constant elements:

In [22]:
t1 = ([1,2], fn1, 3)
t2 = tuple(t1)
t1 is t2

True

#### Storage Efficiency

When mutable container objects such as lists, sets, dictionaries, etc. are created, and during their lifetime, the allocated capacity of these containers (the number of items they can contain) is typically greater than the number of elements in the container. This is done to make adding elements to the collection more efficient, and is called over-allocating.

Immutable containers on the other hand, since their item count is fixed once they have been created, do not need this overallocation - so their storage efficiency is greater.

Let's look at the size (memory) of lists and tuples as they get larger:

In [21]:
import sys

In [24]:
prev = 0
for i in range(10):
    c = tuple(range(i+1))
    size_c = sys.getsizeof(c)
    delta, prev = size_c - prev, size_c
    print(f'{i+1} items: {size_c}, delta={delta}')

1 items: 56, delta=56
2 items: 64, delta=8
3 items: 72, delta=8
4 items: 80, delta=8
5 items: 88, delta=8
6 items: 96, delta=8
7 items: 104, delta=8
8 items: 112, delta=8
9 items: 120, delta=8
10 items: 128, delta=8


In [25]:
prev = 0
for i in range(10):
    c = list(range(i+1))
    size_c = sys.getsizeof(c)
    delta, prev = size_c - prev, size_c
    print(f'{i+1} items: {size_c}, delta={delta}')

1 items: 96, delta=96
2 items: 104, delta=8
3 items: 112, delta=8
4 items: 120, delta=8
5 items: 128, delta=8
6 items: 136, delta=8
7 items: 144, delta=8
8 items: 160, delta=16
9 items: 192, delta=32
10 items: 200, delta=8


As you can see the size delta for tuples as they get larger, remains a constant 8 bytes (the pointer to the new element added), but not so for lists which will over-allocate space (this is done to achieve better performance when appending elements to a list).

Let's see what happens to the same list when we keep appending elements to it:

In [23]:
c = []
prev = sys.getsizeof(c)
print(f'0 items: {sys.getsizeof(c)}')
for i in range(16):
    c.append(i)
    size_c = sys.getsizeof(c)
    delta, prev = size_c - prev, size_c
    print(f'{i+1} items: {size_c}, delta={delta}')

0 items: 56
1 items: 88, delta=32
2 items: 88, delta=0
3 items: 88, delta=0
4 items: 88, delta=0
5 items: 120, delta=32
6 items: 120, delta=0
7 items: 120, delta=0
8 items: 120, delta=0
9 items: 184, delta=64
10 items: 184, delta=0
11 items: 184, delta=0
12 items: 184, delta=0
13 items: 184, delta=0
14 items: 184, delta=0
15 items: 184, delta=0
16 items: 184, delta=0


So each delta tells us how much we're over-allocating. For 0 items we use 56 bytes, all of which are in use. For 1 item we use 88 bytes, but only (56+8=) 64 out of 88 bytes are in use. The remaining 24 bytes will not be used until we add 3 more elements.

The bigger the list, the more over-allocation we perform which we recognise by the increasing deltas above.

# 04 - Copying Sequences

Firstly, remember mutable objects can be modified either directly or through a function. As a result, these modifications may be inadvertent. 

If we have a function called `reverse` that takes an iterable and reverses it, we need to make it clear to the caller that an in-place modification has taken place. How do we do that? **We should not return the object we modified if we are performing an in-place modification/mutation**.

In [27]:
# BAD, DON'T DO THIS

def reverse(s):
    s.reverse()
    return s

# BETTER BUT BEST NOT TO DO IN-PLACE IN THE FIRST PLACE

def reverse(s):
    s.reverse()
    
# BEST

def reverse(s):
    s2 = s[:]  # shallow copy (any copy approach works), so the caller's sequence is untouched
    s2.reverse()
    return s2

Now for copying, there are five methods, each of which will create a new memory address for the list, **but the elements of the copy will point to the memory addresses of the original list's elements**.

**These five are all shallow copy methods. They copy the object references from one sequence to another.**

In [29]:
s = [10, 20, 30]

**Simple loop**

In [30]:
cp = []
for e in s:
    cp.append(e)

**List comprehension: (identical to the above)**

In [31]:
cp = [e for e in s]

**Copy method (only implemented for mutable types like lists, NOT immutables like tuples or strings):**

In [32]:
cp = s.copy()

**Slicing:**

In [33]:
cp = s[:]

**list:**

In [34]:
cp = list(s)

Just a reminder that `tuple_2 = tuple(tuple_1)` and `tuple_2 = tuple_1[:]` **do not create a new tuple with a new memory address - we just add another pointer to the same tuple**. 

So, `id(tuple_1) == id(tuple_2)`.
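A quick check of this reminder, with illustrative values. Note that returning the same object is a CPython optimisation rather than something the language guarantees, so treat the `True` results as implementation behaviour:

```python
tuple_1 = (1, 2, 3)
tuple_2 = tuple(tuple_1)
tuple_3 = tuple_1[:]

# In CPython, both "copies" are just extra labels on the same tuple object.
print(tuple_1 is tuple_2, tuple_1 is tuple_3)

# Compare with a list, where both approaches produce a new object.
list_1 = [1, 2, 3]
print(list_1 is list(list_1), list_1 is list_1[:])
```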

For all of these methods, if we mutate the **list** in the copy, it has no impact on the original. But, if we mutate the **element**, then it will have an impact on the original. This is only possible if the element is a mutable such as a list.

In [35]:
s = [ [10, 20], [30, 40] ]
cp = s.copy()

cp[0] = 'python' # THIS IS A MUTATION OF THE LIST/CONTAINER, NOT OF THE ELEMENT, THEREFORE ORIGINAL UNIMPACTED.

print(s)
print(cp)

[[10, 20], [30, 40]]
['python', [30, 40]]


In [36]:
s = [ [10, 20], [30, 40] ]
cp = s.copy()

cp[1][0] = 100 # THIS IS A MUTATION OF THE ELEMENT, THEREFORE ORIGINAL IMPACTED

print(s)
print(cp)

[[10, 20], [100, 40]]
[[10, 20], [100, 40]]


This effect occurred because we copied the **object reference**, not the object itself. 

![2.2.png](s2-images/2.2.png)

Below, if we did either `s[0][0] = 100` or `cp[0][0] = 100`, we would be modifying the objects. We do nothing to `s` and `cp` themselves.

![2.3.png](s2-images/2.3.png)

**Partial Deep copies**

This gives us a way of copying the objects as opposed to the object references.

How do we understand a deep copy approach like:

In [None]:
s = [ [0, 0], [0, 0] ]
     
cp = [e.copy() for e in s]

When we take each element `e` in `s`, we don't just put it into `cp` - this would copy the object reference. Instead we make a copy of the object and put that into `cp`.

![2.4.png](s2-images/2.4.png)

Why is it a partial deep copy instead of a deep copy? Because `e.copy()` is performing a shallow copy. 

![2.5.png](s2-images/2.5.png)

In the above, if we took our `cp = [e.copy() for e in s]` approach, we would perform a shallow copy on the two objects indicated by the yellow line (not orange or white). In doing so, we return to the original problem. So, we want to copy the elements indicated by the white lines too. 

This is why we need a recursive approach. But, in this case we don't need to go any further with recursion because the elements in these objects (`[0, 1]` for example) are immutable (the `0` and `1` are immutable).

To break down the steps in the above image, we first start with `cp = [e.copy() for e in s]`.

Then, `e = [ [0, 1], [2, 3] ]` initially. 

Then, `e.copy()` will take the container denoted by `e` and create an identical empty structure. Then, the **pointers** to each of `e`'s elements, first `[0, 1]` and then `[2, 3]` will be copied. These elements and their memory addresses were created when we first initialised `s`. So, when we wish to modify `[0, 1]` using `cp`, we will be using a pointer to get to the original element, as opposed to a pointer to a copied version of `[0, 1]`.

In [46]:
s = [ [ [0, 1], [2, 3] ], [ [4, 5], [6, 7] ] ]

cp = [e.copy() for e in s]

cp

[[[0, 1], [2, 3]], [[4, 5], [6, 7]]]

In [47]:
print(cp[0])
print(cp[0][0])
print(cp[0][0][0])

cp[0][0][0] = 100

[[0, 1], [2, 3]]
[0, 1]
0


In [48]:
print(f'{s=}')
print(f'{cp=}')

s=[[[100, 1], [2, 3]], [[4, 5], [6, 7]]]
cp=[[[100, 1], [2, 3]], [[4, 5], [6, 7]]]


Again, compare that with what we did before (see below). As we see at the end, `cp` was modified but `s` was untouched.

In [49]:
s = [ [0, 1], [2, 3] ]

cp = [e.copy() for e in s]

cp

[[0, 1], [2, 3]]

In [50]:
print(cp[0])
print(cp[0][0])

cp[0][0] = 100

[0, 1]
0


In [51]:
print(f'{s=}')
print(f'{cp=}')

s=[[0, 1], [2, 3]]
cp=[[100, 1], [2, 3]]


I think an important takeaway is that `.copy()` on an iterable creates an identical container whose elements point towards the memory address of the original container's elements. We don't look any further.

Deep copies can get messy very quickly when handling nested objects and circular references. Consider the infinite recursion that will occur in the example below.

In [66]:
a = [10, 20]
b = [a, 30]
a.append(b)
a[2][0][2][0][2][0][2][0][2][0][2][0][2][0][2][0][2][0][2][0][2][0][2][0][2][0][2][0][2][0][2] # this can go on forever

[[10, 20, [...]], 30]

The standard library module **copy** has generic `copy` and `deepcopy` operations. We can also implement the dunder methods `__copy__` and `__deepcopy__` on our own classes to override how copying works.
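As a minimal sketch, `copy.deepcopy` copes with the circular structure above: it keeps track of the objects it has already copied, so the recursion terminates and the copy has the same circular shape as the original. The names `cp_a` and `cp_b` are introduced here for illustration:

```python
from copy import deepcopy

a = [10, 20]
b = [a, 30]
a.append(b)  # a contains b, and b contains a

cp_a = deepcopy(a)
cp_b = cp_a[2]

print(cp_a is a)         # False: a new outer list
print(cp_b is b)         # False: the nested list was copied too
print(cp_b[0] is cp_a)   # True: the cycle is reproduced inside the copy
```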

Here's an example with classes. It works as we expect.

![2.6.png](s2-images/2.6.png)

Here's a more complicated example:

![2.7.png](s2-images/2.7.png)

Do note that `y = MyClass(x)` means that `y.a` is a mutable object, and that mutable object is the object `x`. (Instances of user-defined classes are mutable objects.)

The important thing to note is that Python intelligently understands that since there was a relationship between `y.a` and `x`, there should also be the same relationship between `cp_y.a` and `cp_x`.
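Since the images are not reproduced in text, here is a hedged reconstruction of the idea. Only the names `MyClass`, `x`, `y`, `a`, `cp_x` and `cp_y` come from the text; the class body, the value `100` and the container list `lst` are assumptions made for the sketch:

```python
from copy import deepcopy


class MyClass:
    def __init__(self, a):
        self.a = a


x = MyClass(100)
y = MyClass(x)   # y.a is the very same object as x
lst = [x, y]     # assumed container so both objects are copied in one deepcopy call

cp_x, cp_y = deepcopy(lst)

print(y.a is x)        # True in the original
print(cp_y.a is cp_x)  # True: the relationship is preserved in the copy
print(cp_y.a is x)     # False: the copy does not point back at the original
```

The relationship is preserved because both objects are copied within a single `deepcopy` call; copying `x` and `y` in two separate calls would produce two independent copies of `x`.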

Shallow and deep copies are not just for sequences. They're applicable to objects in general because objects can nest other objects. We demonstrated this with the classes above.

# 05 - Slicing

#### Lecture

In the conventional notation `my_list[i:j]`, the slice definition is `i:j` and this is an object - in particular, a **slice** object. 

We can create our own identical object using the constructor: `s = slice(i, j)`. This has properties, `s.start` and `s.stop`. Our slice object can be used in substitution of our traditional literal.

In [4]:
l = [1, 2, 3, 4, 5]

s = slice(0, 2)

print(s.start)
print(s.stop)

0
2


In [5]:
l[s]

[1, 2]

It's important to recognise that in `l[3:100]` for example, the slice object `3:100` is independent of the list object `l`. We take the slice and apply it to the list. This is why we can specify slice boundaries that are larger than the list that they're being applied to.

When it comes to slicing, Python follows a set of rules. For a given list `l = ['a', 'b', 'c', 'd', 'e', 'f']` with slice indices `[i:j]`:

- if `i > len(seq)` -> `len(seq)`
- if `j > len(seq)` -> `len(seq)`

- if `i < 0` -> `max(0, len(seq) + i)`
- if `j < 0` -> `max(0, len(seq) + j)`

For our example list above, if we wanted to determine `[-10:3]`, since `i < 0`, we compute `max(0, len(seq) + i) = max(0, 6 + (-10)) = max(0, -4) = 0` for our first index. For our second index, `j` is within range so we take its value.

Thus, `[-10:3] -> range(0, 3)`.

For another example, if we wanted to determine `[-5:3]`, since `i < 0`, we compute `max(0, len(seq) + i) = max(0, 6 + (-5)) = max(0, 1) = 1` for our first index. For our second index, `j` is within range so we take its value.

Thus, `[-5:3] -> range(1, 3)`.

This makes sense because index `-5` of a sequence with 6 objects is equivalent to index `1`.
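The two worked examples above can be checked with a short sketch. It uses the slice object's `indices` method, which reports the effective `(start, stop, step)` for a sequence of a given length:

```python
l = ['a', 'b', 'c', 'd', 'e', 'f']

# Effective (start, stop, step) for a sequence of length 6.
first = slice(-10, 3).indices(len(l))
second = slice(-5, 3).indices(len(l))
third = slice(3, 100).indices(len(l))
print(first, second, third)

# The slices themselves agree with the computed ranges.
print(l[-10:3], [l[i] for i in range(*first)])
print(l[-5:3], [l[i] for i in range(*second)])
```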


The rules for transformations `[i:j:k]` are pretty intuitive, but you just have to be careful when `k < 0`. 

If you ever get confused, the **slice** object has a method called **indices** that returns the equivalent range with its start/stop/step for any slice (but as a tuple), **given the length** of the sequence being sliced.

`slice(start, stop, step).indices(length)` -> `(start, stop, step)`.

In [6]:
slice(10, -5, -1).indices(6)

(5, 1, -1)

This is a tuple object. We can convert it into a range object by unpacking:

In [8]:
range(*slice(10, -5, -1).indices(6))

range(5, 1, -1)

We can see the actual values in this range by converting it into a list:

In [11]:
list(range(*slice(10, -5, -1).indices(6)))

[5, 4, 3, 2]

#### Coding

A quick example where this may be useful is if you need to keep repeating the slice object. 

Here's the literal approach:

In [9]:
data = []

for row in data:
    first_name = row[0:51]
    last_name = row[51:101]
    ssn = row[101:111]

And here's the constructor approach. You could keep these slice definitions at the top of your file so it's easy to find and modify.

In [10]:
range_first_name = slice(0, 51)
range_last_name = slice(51, 101)
range_ssn = slice(101, 111)

for row in data:
    first_name = row[range_first_name]
    last_name = row[range_last_name]
    ssn = row[range_ssn]
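To see the named slices working on actual data, here is a small sketch. The record layout (10 characters per name, 11 for the SSN) and the sample rows are invented purely for illustration.

```python
# Hypothetical fixed-width layout: 10 chars first name, 10 chars last name, 11 chars SSN
FIRST_NAME = slice(0, 10)
LAST_NAME = slice(10, 20)
SSN = slice(20, 31)

# Build the sample rows with padding so that the column widths are exact
data = [
    f"{'Eric':<10}{'Idle':<10}123-45-6789",
    f"{'John':<10}{'Cleese':<10}987-65-4321",
]

records = [
    (row[FIRST_NAME].strip(), row[LAST_NAME].strip(), row[SSN])
    for row in data
]
print(records)
```

If the layout ever changes, only the three slice definitions at the top need to be edited.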

# 06 - Custom Sequences - Part 1

#### Lecture

We're going to be creating our own sequence types. For part 1, we'll focus on immutable sequence types only. At its most basic, an immutable sequence type should support two things:

- returning the **length** of the sequence,
- given an **index**, return the element at that index.

If an object provides this functionality, then in theory we should be able to:

- retrieve elements **by index** using square brackets `[]`,
- **iterate** through the elements using Python's native looping mechanisms such as `for` loops, list comprehensions, etc.

**How does Python do it?** 

The sequence types must at the minimum implement:

- `__len__`: We'll find out that this is not necessary but still recommended
- `__getitem__`: This is the main one for taking a single integer argument - the index. However, it may also choose to handle a **slice** type argument. We decide.


#### The `__getitem__` method

**This method should return an element of the sequence based on the specified index or raise an `IndexError` exception if the index is out of bounds.**

If we implement these **two** things then Python can actually utilise our `__getitem__` method to iterate through our sequence, e.g. by using the `for` loop.

(We may choose to support negative indices and slices but this is optional.)

Python's `list` already implements `__getitem__`; calling it directly is equivalent to using `[]`. Negative indices are valid so long as they are not out of bounds.

In [18]:
my_list = ['a', 'b', 'c', 'd', 'e', 'f']

print(my_list.__getitem__(0))
print(my_list.__getitem__(slice(0, 6, 2)))
print(my_list.__getitem__(-1))

a
['a', 'c', 'e']
f


This includes raising an `IndexError` when the index is out of bounds:

In [16]:
print(my_list.__getitem__(100))

IndexError: list index out of range

**How would we implement a `for` loop ourselves?**

In [19]:
my_list = [1, 2, 3, 4, 5]

for item in my_list:
    print(item**2)

1
4
9
16
25


In [21]:
my_list = [1, 2, 3, 4, 5]
index = 0

while True:
    try:
        item = my_list.__getitem__(index)
    except IndexError:
        break
    
    print(item**2)
    index += 1

1
4
9
16
25


The approach above of using `while True` paired with `except` is more Pythonic than something like `while index < len(our_sequence)`.

When we implement `__getitem__` in our class, the `for` loop (or list comprehension) notation will work: Python calls `__getitem__` with the indices `0`, `1`, `2`, ... in turn and binds the `x` in `for x in sequence` to each returned value. Again, it will keep iterating until it hits an `IndexError`.

In [31]:
class Silly:
    def __init__(self, n):
        self.n = n
        
    def __getitem__(self, value): 
        print(f'{value=}')
        
        if value < 0 or value >= self.n:
            raise IndexError
        
        else:
            return 'This is a silly element'

In [32]:
silly = Silly(3)

for e in silly:
    print(e)

value=0
This is a silly element
value=1
This is a silly element
value=2
This is a silly element
value=3


The small difference we notice with the list comprehension approach below is that the `__getitem__` is being called multiple times as the list is being built up, and then finally, the list object is returned. But of course, we won't normally have a `print` statement within the `__getitem__` method.

In [34]:
[e for e in silly]

value=0
value=1
value=2
value=3


['This is a silly element',
 'This is a silly element',
 'This is a silly element']

**This object `silly` *is* a sequence**

We can also use the slice notation such as `silly[0:5:2]` and Python will use the `__getitem__` method where the value is a `slice` object with the specified parameters. 

It won't work with our current setup because the line `if value < 0 or value >= self.n:` uses comparison operators, which don't support comparing a slice object with an integer.

The appropriate approach is to check the instance/type of `value` and deal with it differently depending on whether it's a **slice** object or an integer. We'll demonstrate this below.
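A minimal sketch of that type check, applied to a variant of `Silly` (the class name `SillySliceable` and the returned strings are made up for this illustration):

```python
class SillySliceable:
    """Variant of Silly that accepts both integers and slices (illustrative only)."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, value):
        if isinstance(value, slice):
            # Translate the slice into concrete indices for a sequence of length n
            return [self[i] for i in range(*value.indices(self.n))]
        if value < 0 or value >= self.n:
            raise IndexError(value)
        return f'silly element {value}'

silly = SillySliceable(6)
print(silly[1])
print(silly[0:5:2])
```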

Now let's work through a more practical example.

**Fibonacci Sequence Example**

Here, we're going to implement our own Fibonacci sequence type and we want to support slicing.

We'll make our sequence type bounded (i.e. we'll have to specify the size of the sequence). But we are not going to pre-generate the entire sequence of Fibonacci numbers, we'll only generate the ones that are being requested as needed.

So, when we create our object, we'll have to specify the length of the sequence, i.e. how many Fibonacci numbers it can produce (but we won't generate any Fibonacci value during the creation of the object). See below.

```
f = Fib(10)    # Creating our Fibonacci sequence that can generate up to 10 values.
f[3:5]         # It can handle slices. In this case, return a list of the element 3 and 4 of the Fibonacci sequence (i.e. 4th and 5th).
f[::-1]        # Return a list of all Fibonacci values in reverse order. In this case, the 10th value and below. 
```

Here are the steps:


1. First we'll need to handle requesting a single index e.g. `f[3]` for the 4th Fibonacci value. To do this, we just need `__getitem__` which will allow the `[]` notation. 


2. To actually calculate the nth Fibonacci number, we'll use memoisation as well (see lecture on decorators and memoisation if you need to refresh your memory on that).

3. To be able to use `list(f)`, we need to raise an `IndexError` exception when the index is out of bounds as this is one of the two essential criteria for iterating through a sequence. 

4. We also need to remap negative indices (for example `-1` should correspond to the last element of the sequence, and so on). This is done with:     

```
if s < 0:
    s = self._n + s
```

5. Now let's handle slice requests. If you recall from Section 05 - Slicing, you might think it to be difficult to implement all those rules. But remember, the slice method `indices(<length of sequence>)` returns a tuple of 3 values which correspond to the equivalent start, stop and step values. These can be converted into a range object which we can iterate through, e.g. using a list comprehension or a 'for' loop, computing the corresponding Fibonacci value at each step. We can go further by generating the full list to show us which indices are going to be asked for.

   For example:
   
   ```
   my_slice = slice(2, -7, -2)          # When applied to a sequence s, it'd be equivalent to s[2:-7:-2]. Hard to see which indices we are requiring.

   list(range(*my_slice.indices(6)))    # If we pass in the length of our sequence (6) then it outputs that we're asking for...
   
   > [2, 0]                             # ...index 2 and index 0. So find _fib(2) and then _fib(0), but we've not implemented this extra detail below.
   ```
   
   We're done.

In [27]:
from functools import lru_cache

class Fib:
    def __init__(self, n):
        self._n = n
    
    def __getitem__(self, s):
        if isinstance(s, int):
            # single item requested
            if s < 0:
                s = self._n + s
            if s < 0 or s > self._n - 1:
                raise IndexError
            return self._fib(s)                                 # this is basically our 'else' statement.
            
        else:
            # slice being requested
            idx = s.indices(self._n)                            # the argument required for s.indices is the length of the sequence which is self._n
            rng = range(*idx)                                   # This will create a range sequence object e.g. range(0, 10, 2), corresponding to the indices
            return [self._fib(n) for n in rng]                  # in the Fibonacci sequence that we want.
            
    @staticmethod
    @lru_cache(2**32)
    def _fib(n):
        if n < 2:
            return 1
        else:
            return Fib._fib(n-1) + Fib._fib(n-2)

In [28]:
f = Fib(10)

print(f[0])
print(f[9])
print(list(f))
print(list(item for item in f))
print(f[2:9:2])

1
55
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
[2, 5, 13, 34]


------------------
**Sidenote:**

A static method lives in the class's namespace but is not bound to the class or to its instances: it receives neither `self` nor `cls`. This means that a static method can be called without creating an object of that class. It also means that a static method cannot modify the state of an object unless that object is passed in explicitly.

Static methods have a clear use case: when some functionality relates to the class as a whole rather than to any particular object, we make the method static. This is handy for utility methods, which usually aren’t tied to an object's lifecycle. Finally, note that a static method does not take `self` as its first argument.
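A small sketch of what this looks like in practice. The `Temperature` class is hypothetical and only serves to show the calling conventions.

```python
class Temperature:
    """Hypothetical class used only to illustrate a static method."""
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def c_to_f(celsius):          # note: no `self` parameter
        return celsius * 9 / 5 + 32

print(Temperature.c_to_f(100))       # called on the class, no instance needed
print(Temperature(0).c_to_f(100))    # also callable through an instance
```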

------------------------

------------------------
**Sidenote:**

You may be wondering why the last line is `Fib._fib(n-1)` as opposed to `self._fib(n-1)` that we see in the `__getitem__` method. We could do it that way but things will start to look convoluted very quickly. For starters `def _fib(n)` will need to become `def _fib(self, n)`. Then, in `__getitem__`, `return self._fib(s)` will need to become `return self._fib(self, s)`. The code below is the version with these changes made.


```    
def __getitem__(self, s):
    if isinstance(s, int):
        # single item requested
        if s < 0:
            s = self._n + s
        if s < 0 or s > self._n - 1:
            raise IndexError
                
        return self._fib(self, s)                                 # this is basically our 'else' statement.
    else:
        # slice being requested
        print(f'requesting [{s.start}:{s.stop}:{s.step}]')
        idx = s.indices(self._n)                            # the argument required for s.indices is the length of the sequence which is self._n
        rng = range(*idx)
        print(f'\trange({idx[0]}, {idx[1]}, {idx[2]}) --> {list(rng)}')  # \t is tab special character
        return [self._fib(self, n) for n in rng]
            
@staticmethod
@lru_cache(2**32)
def _fib(self, n):
    if n < 2:
        return 1
    else:
        return self._fib(self, n-1) + self._fib(self, n-2)
```
------------------------

One thing I want to point out here: we did not need to use inheritance! There was no need to inherit from another sequence type. All we really needed was to implement the `__getitem__` method; adding `__len__` is recommended too (so that `len(f)` works), even though the `Fib` class above does not define it.

The other thing I want to mention, is that I would not use recursion for production purposes for a Fibonacci sequence, even with memoization - partly because of the cost of recursion and the limit to the recursion depth that is possible.
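One possible non-recursive alternative is sketched below. It keeps the same indexing behaviour as `Fib` but computes each value with a loop, and it also adds `__len__`. The class name `IterFib` is made up here, and this is just one way to do it, not the course's implementation.

```python
class IterFib:
    """Same indexing behaviour as Fib, but computed with a loop (a sketch)."""
    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, s):
        if isinstance(s, slice):
            return [self._fib(i) for i in range(*s.indices(self._n))]
        if s < 0:
            s += self._n
        if s < 0 or s >= self._n:
            raise IndexError(s)
        return self._fib(s)

    @staticmethod
    def _fib(n):
        a, b = 1, 1               # same convention as above: fib(0) = fib(1) = 1
        for _ in range(n):
            a, b = b, a + b
        return a

f = IterFib(10)
print(len(f), f[-1], f[2:9:2])
print(list(f))
```

Each lookup costs a loop of `n` steps since nothing is cached, but there is no recursion depth to worry about.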

Also, when we look at generators, and more particularly generator expressions, we'll see better ways of doing this as well.

I really wanted to show you a simple example of how to create your own sequence types.

# 07 - In-Place Concatenation and Repetition

#### In-Place Concatenation

We saw that using concatenation ended up creating a new sequence object:

In [36]:
l1 = [1, 2, 3, 4]
l2 = [5, 6]
print(f'{id(l1)=} (before concatenation)', l1)
print(f'{id(l2)=}', l2)

l1  = l1 + l2 
print(f'{id(l1)=} (after concatenation)', l1)

id(l1)=140117538850368 (before concatenation) [1, 2, 3, 4]
id(l2)=140117624613760 [5, 6]
id(l1)=140117538861760 (after concatenation) [1, 2, 3, 4, 5, 6]


But watch what happens when we use the in-place concatenation operator `+=`:

In [37]:
l1 = [1, 2, 3, 4]
l2 = [5, 6]
print(f'{id(l1)=} (before concatenation)', l1)
print(f'{id(l2)=}', l2)

l1 += l2 
print(f'{id(l1)=} (after concatenation)', l1)

id(l1)=140117538807616 (before concatenation) [1, 2, 3, 4]
id(l2)=140117539448192 [5, 6]
id(l1)=140117538807616 (after concatenation) [1, 2, 3, 4, 5, 6]


**When using `+=` on a mutable object (that supports this operator), the object is mutated.**

If we used a tuple instead of a list above, we have no other option but to create a new sequence object and store the concatenated data because tuples are not mutable.
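A short illustrative check of the tuple case: after `+=`, the name refers to a new tuple, and the original object is untouched.

```python
t1 = (1, 2, 3, 4)
original = t1            # keep a second reference to the original tuple

t1 += (5, 6)             # tuples are immutable, so a new tuple is built

print(t1)
print(original)
print(t1 is original)
```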

Another point to note is that normally we cannot concatenate a list and a tuple.

In [45]:
l1 = [1, 2, 3, 4]
t1 = (5, 6)

l1 = l1 + t1
print(l1)

TypeError: can only concatenate list (not "tuple") to list

But we CAN do so **if we use the in-place concatenation operator `+=`**, because for lists `+=` behaves like `extend` and accepts any iterable. It only works in this direction, though: a tuple has no in-place version of `+`, so `t1 += l1` falls back to `t1 + l1`, which raises a `TypeError`.

In [49]:
l1 = [1, 2, 3, 4]
t1 = (5, 6)

l1 += t1
print(l1)

[1, 2, 3, 4, 5, 6]
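The two directions can be compared side by side (a sketch for illustration):

```python
l1 = [1, 2, 3, 4]
l1 += range(5, 7)        # any iterable works, just like l1.extend(...)
print(l1)

t1 = (1, 2, 3, 4)
try:
    t1 += [5, 6]         # falls back to t1 + [5, 6], which is not allowed
except TypeError as ex:
    print(f'TypeError: {ex}')
```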


#### In-Place Repetition

**The same is true for the `*=` operator.**

In [41]:
l1 = [1, 2, 3, 4]
l2 = [5, 6]
print(f'{id(l1)=} (before repetition)', l1)
print(f'{id(l2)=}', l2)

l1 *= 2
print(f'{id(l1)=} (after repetition)', l1)

id(l1)=140117539326464 (before repetition) [1, 2, 3, 4]
id(l2)=140117539440000 [5, 6]
id(l1)=140117539326464 (after repetition) [1, 2, 3, 4, 1, 2, 3, 4]


So the key takeaway from this section is that `a = a + b` is NOT equivalent to `a += b` if `a` is a mutable object; in the former, a new object is created and `a` points to it, and in the latter, `a` is modified in place.
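The difference matters as soon as a second name refers to the same list. A small sketch (the variable names are arbitrary):

```python
a = [1, 2]
alias_a = a
a += [3]                 # mutates the list that both names refer to
print(alias_a)

b = [1, 2]
alias_b = b
b = b + [3]              # builds a new list; alias_b still refers to the old one
print(alias_b)
```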

# 08 - Assignments in Mutable Sequences

#### Replacing

Up until now we've used slicing to read elements from a sequence. But, as you know, we can also replace the elements in a slice using assignment. The replacement elements can come from any iterable, e.g. a tuple, a set, etc.

For regular slices (step size = 1, i.e. non-extended), the slice and iterable **need *not* be the same length**.

This operation **performs a mutation**.

In [52]:
l = [1, 2, 3, 4, 5]
l[1:2] = (10, 20, 30)
l

[1, 10, 20, 30, 3, 4, 5]

For extended slices, the slice and the iterable on the RHS must have the **same length**:

In [54]:
l = [1, 2, 3, 4, 5]

In [57]:
l[::2]

[1, 3, 5]

In [58]:
l[::2] = ['a', 'b']

ValueError: attempt to assign sequence of size 2 to extended slice of size 3

In the last example, we are telling Python to replace the elements 1, 3 and 5. But we are only assigning two elements on the RHS. How will Python know where those two elements should go? Perhaps the 'a' should replace 1 and 'b' replace 3 and leave 5 as it is? That would be confusing so Python requires the same length on each side.

In [56]:
l = [1, 2, 3, 4, 5]
l[::2] = ['a', 'b', 'c']
l

['a', 2, 'b', 4, 'c']

#### Deleting

Deleting is a special case of replacement: we replace the slice with an **empty** iterable. This only works for standard slicing, not extended slicing, because extended slice assignment requires the iterable to have the same length as the slice. We can't assign, say, 3 empty iterables to replace 3 non-contiguous (i.e. non-adjacent) elements.

In [60]:
l = [1, 2, 3, 4, 5]
l[1:3]

[2, 3]

In [61]:
l[1:3] = []
l

[1, 4, 5]

The empty iterable we've assigned on the RHS is `[]`, but we could equally have used `''`, `()` or `{}` (an empty dict), since any empty iterable works.

In [69]:
l2 = [1, 2, 3, 4, 5]
print(l2[::2])

l2[::2] = []

[1, 3, 5]


ValueError: attempt to assign sequence of size 0 to extended slice of size 3
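Note that this restriction applies to deletion by slice *assignment*. As a brief aside, the `del` statement does accept an extended slice on a list:

```python
l2 = [1, 2, 3, 4, 5]
del l2[::2]              # removes the elements at indices 0, 2 and 4
print(l2)
```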

#### Insertion

The trick for inserting elements is that the slice must be empty and the RHS must contain an iterable. If the slice is not empty, then it will replace those elements as opposed to insert next to it, so it will no longer be considered insertion.

Here's how we insert a string iterable `abc` at index 1.

In [70]:
l = [1, 2, 3, 4, 5]
print(l[1:1])
l[1:1] = 'abc'
l

[]


[1, 'a', 'b', 'c', 2, 3, 4, 5]

Again, this won't work for extended slicing because insertion requires a single empty slice at a given location which is replaced by our iterable, whereas extended slicing will always give us a non-empty slice (unless we write something like `l[3:3:1]` but that's just a technicality - it's identical to `l[3:3]`).

So a key takeaway for this section is that we can use regular slicing for replacing, deleting and inserting, and generally, the LHS and RHS **need not be the same length**. 

If we want to perform an operation on non-contiguous elements, then we'll have to use extended slicing to select those elements, but now the LHS and RHS **need to be the same length**. Since inserting requires an empty slice on the LHS, while deleting requires an empty iterable on the RHS, these two operations are impossible using this approach.

# 09 - Custom Sequences - Part 2a

We have seen before how we could define our own custom sequence type by implementing the `__len__` and `__getitem__` methods.

Here we are going to look at how to implement:
* concatenation (`+`) : same as `.__add__()`
* in-place concatenation (`+=`) : same as `.__iadd__()`
* repetition (`seq * n`) : same as `.__mul__()`
* repetition reversed (`n * seq`) : same as `.__rmul__()`; Python tries this when the left operand's `__mul__` cannot handle the operation (it returns `NotImplemented`)
* in-place repetition (`*=`) : same as `.__imul__()`
* index assignment (`seq[i]=val`) : same as `.__setitem__()`
* slice assignment (`seq[i:j]=iter` and `seq[i:j:k]=iter`) : same as `.__setitem__()`
* `del seq[i]` and `del seq[i:j]` : same as `.__delitem__()`
* `<value> in <sequence>` : same as `__contains__()`
* append, extend, pop : These are *not* special methods. If we want them, we just implement them as regular methods.

Implementing these methods will **overload** the current operator methods with our method.
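As a rough preview of how some of these hooks map onto syntax, here is a sketch that simply delegates to an internal list. The `Bag` class is hypothetical and is not the example developed in these notes.

```python
class Bag:
    """Hypothetical list wrapper, used only to show which method each syntax calls."""
    def __init__(self, *items):
        self._items = list(items)

    def __repr__(self):
        return f'Bag({self._items})'

    def __setitem__(self, index, value):     # bag[i] = v  and  bag[i:j] = iterable
        self._items[index] = value

    def __delitem__(self, index):            # del bag[i]  and  del bag[i:j]
        del self._items[index]

    def __contains__(self, value):           # value in bag
        return value in self._items

b = Bag(1, 2, 3, 4)
b[0] = 10
b[1:3] = ['a', 'b', 'c']
del b[-1]
print(b, 'a' in b)
```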

#### The `+` and `+=` Operators

First we look at how we can overload the `+` and `+=` operators in a custom class in general. Then we'll look at how to use this in the context of sequences.

We use the special methods `__add__` and `__iadd__`.

Just to see how those methods get called, we're actually going to implement them to just print out that they were called. As you can see, we can implement them however we want!

In [7]:
class MyClass:
    def __init__(self, name):
        self.name = name
        
    def __repr__(self):
        return f'MyClass(name={self.name})'
    
    def __add__(self, other):
        print(f'You called + on {self} and {other}')
        return 'Hello from __add__'
        
    def __iadd__(self, other):
        print(f'You called += on {self} and {other}')
        return 'Hello from __iadd__'

In [8]:
c1 = MyClass('instance 1')
c2 = MyClass('instance 2')

In [9]:
c3 = c1 + c2

You called + on MyClass(name=instance 1) and MyClass(name=instance 2)


Let's try the in-place addition operator. We may expect mutation to occur and therefore, the id should stay the same.

In [10]:
print(id(c1))
c1 += c2
print(id(c1))
print(c1)

140528194484784
You called += on MyClass(name=instance 1) and MyClass(name=instance 2)
140528049348144
Hello from __iadd__


But it doesn't! That's because `c1` is now the string `"Hello from __iadd__"` which is a different object. So the special method `__iadd__` doesn't inherently impose this behaviour - we don't have to make it do that - but it's what everyone expects it to do so we should. The only behaviour that it imposes is allowing for `+=` functionality.

So, `__add__` expects us to take two objects (typically of the same type) and return a new object of that type.

`__iadd__` expects us to take two objects and typically return the first object, mutated. So we apply a change to the first object instead of creating a new instance altogether.

How do we do these two things in code?

In [11]:
class MyClass:
    def __init__(self, name):
        self.name = name
        
    def __repr__(self):
        return f'MyClass(name={self.name})'
    
    def __add__(self, other):
        return MyClass(self.name + ' ' + other.name)
        
    def __iadd__(self, other):
        self.name += ' ' + other.name   # Remember, this is NOT inplace concatenation / mutation 
                                        # because 'name' is a string which is immutable so Python is forced to create a new object.
        return self
        

In [12]:
c1 = MyClass('Eric')
c2 = MyClass('Idle')

In [13]:
c3 = c1 + c2

In [14]:
c3

MyClass(name=Eric Idle)

These two methods are not restrictive (which is not necessarily an issue). All we require of the second operand is that it has a `name` attribute of a type that supports concatenation with a string.

Also, to emphasise once more, `__iadd__` now ensures that the ID of the object stays the same. This is because we return `self` (the original object) rather than a new `MyClass(self.name + ' ' + other.name)`.
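A quick check of that claim, using a trimmed copy of the class so the snippet runs on its own:

```python
class MyClass:
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return MyClass(self.name + ' ' + other.name)

    def __iadd__(self, other):
        self.name += ' ' + other.name
        return self

c1 = MyClass('Eric')
c2 = MyClass('Idle')

before = id(c1)
c1 += c2
print(c1.name, id(c1) == before)     # same object, new name

c3 = c1 + c2
print(c3 is c1)                      # __add__ built a brand-new object
```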

#### The `*` and `*=` Operators

Just as easily we can overload the `*` and `*=` operators too, using the `__mul__` and `__imul__` methods.

In [14]:
class MyClass:
    def __init__(self, name):
        self.name = name
        
    def __repr__(self):
        return f'MyClass(name={self.name})'
    
    def __add__(self, other):
        return MyClass(self.name + ' ' + other.name)
        
    def __iadd__(self, other):
        self.name += ' ' + other.name
        return self
    
    def __mul__(self, n):
        return MyClass(self.name * n)
        
    def __imul__(self, n):
        self.name *= n
        return self

In [15]:
c1 = MyClass('Eric')

In [16]:
c1 * 3

MyClass(name=EricEricEric)

In [17]:
print(id(c1), c1)
c1 *= 3
print(id(c1), c1)

140528194484304 MyClass(name=Eric)
140528194484304 MyClass(name=EricEricEric)


What about multiplying an integer by the sequence?

In [25]:
c1 = MyClass('Monty')
2 * c1

TypeError: unsupported operand type(s) for *: 'int' and 'MyClass'

To handle this we need to implement the `__rmul__` method:

In [18]:
class MyClass:
    def __init__(self, name):
        self.name = name
        
    def __repr__(self):
        return f'MyClass(name={self.name})'
    
    def __mul__(self, n):
        return MyClass(self.name * n)
    
    def __rmul__(self, n):
        return self.__mul__(n)

In [19]:
c1 = MyClass('Monty')

In [20]:
2 * c1

MyClass(name=MontyMonty)

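Python first tries `(2).__mul__(c1)`. The `int` type does not know how to multiply by a `MyClass` instance, so that call returns `NotImplemented`. Python then tries the reflected method with the operands swapped, `c1.__rmul__(2)`, where `self` is `c1` and `n` is `2`. Our `__rmul__` simply calls `self.__mul__(n)`, i.e. `c1.__mul__(2)`, which works.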
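The fallback can be observed directly. In this sketch, `__mul__` also returns `NotImplemented` for non-integers; that extra check is an addition for illustration and is not in the class above.

```python
class MyClass:
    def __init__(self, name):
        self.name = name

    def __mul__(self, n):
        if not isinstance(n, int):
            return NotImplemented        # lets Python try the other operand
        return MyClass(self.name * n)

    def __rmul__(self, n):
        return self.__mul__(n)

c1 = MyClass('Monty')

print((2).__mul__(c1))       # int does not know about MyClass
print((2 * c1).name)         # so Python falls back to c1.__rmul__(2)
```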

#### Implementing the `in` operator

For this example, we'll want `in` to test whether something is contained in the name string of our class:

In [21]:
class MyClass:
    def __init__(self, name):
        self.name = name
        
    def __repr__(self):
        return f'MyClass(name={self.name})'
    
    def __contains__(self, value):
        return value in self.name

In [22]:
c1 = MyClass('MontyPython')

In [23]:
'ty' in c1

True
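As a point of comparison (a sketch with a made-up `Letters` class): if a class does not define `__contains__`, Python falls back to iterating over the object, here via `__getitem__`, and compares each element in turn. That gives element-wise membership rather than the substring test we implemented above.

```python
class Letters:
    """Hypothetical class with __getitem__ but no __contains__."""
    def __init__(self, text):
        self._text = text

    def __getitem__(self, index):
        return self._text[index]     # raises IndexError past the end

print('t' in Letters('Monty'))       # matches the single element 't'
print('ty' in Letters('Monty'))      # no single element equals 'ty'
```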

# 10 - Custom Sequences - Part 2b_c

For this example we'll re-use the Polygon class from a previous lecture on extending sequences.

We are going to consider a polygon as nothing more than a collection of points (and we'll stick to a 2-dimensional space).

So, we'll need a `Point` class, but we're going to use our own custom class instead of just using a named tuple.

We do this because we want to enforce a rule that our Point co-ordinates will be real numbers. We would not be able to use a named tuple to do that and we could end up with points whose `x` and `y` coordinates could be of any type.

In [2]:
from collections import namedtuple

Point = namedtuple('Point', 'x y')
p1 = Point(10, 5)
p2 = Point('abc', [1,2,3])          # We want to be able to prevent passing in the wrong type in our custom sequence

x, y = p1                           # But, we would like this functionality of namedtuples in our custom sequence
print(x)
print(y)

10
5


First we'll need to see how we can test if a type is a numeric real type.

We can do this by using the numbers module.

In [5]:
import numbers

This module contains certain base types for numbers that we can use, such as Number, Real, Complex, etc.

In [6]:
print(isinstance(10, numbers.Number))
print(isinstance(1+1j, numbers.Number))

True
True


We will want our points to be real numbers only, so we can do it this way:

In [7]:
isinstance(1+1j, numbers.Real)

False
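It can be useful to see which common types count as `numbers.Real`. The values below are arbitrary examples; note in particular that `bool` and `Fraction` pass the test, while `Decimal` does not.

```python
import numbers
from decimal import Decimal
from fractions import Fraction

for value in (10, 3.14, True, Fraction(1, 3), Decimal('1.5'), 1 + 1j, '10'):
    print(f'{value!r:>16}: {isinstance(value, numbers.Real)}')
```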

So now let's write our Point class. We want it to have these properties:

  1. The `x` and `y` coordinates should be real numbers only
  2. Point instances should be a sequence type so that we can unpack it as needed in the same way we were able to unpack the values of a named tuple.

In [8]:
class Point:
    def __init__(self, x, y):
        if isinstance(x, numbers.Real) and isinstance(y, numbers.Real):
            self._pt = (x, y)
        else:
            raise TypeError('Point co-ordinates must be real numbers.')
            
    def __repr__(self):
        return f'Point(x={self._pt[0]}, y={self._pt[1]})'
    
    def __len__(self):
        return 2
    
    def __getitem__(self, s):
        return self._pt[s]

In `__getitem__(self, s)`, recall that `s` is the index or slice we provide in square brackets to the sequence: `sequence[s]`. You might think we'll need to worry about whether `s` is an integer or a slice but we don't. Why?

`self._pt` is a tuple. Tuples know how to index and slice/extended slice perfectly well. **We are *delegating* this request to the tuple**.

Let's use our point class and make sure it works as intended:

In [9]:
p = Point(1, 2)

In [10]:
p

Point(x=1, y=2)

In [11]:
len(p)

2

In [12]:
p[0], p[1]

(1, 2)

The indexing above works because `p` is a custom sequence, and it's a custom sequence because of the `__getitem__` method. The same method also lets us unpack `p`: the unpacking below retrieves the same two values and assigns them to variables.

In [13]:
x, y = p

But why would we want to do this? This will allow us to take one point and unpack it into a new point. When we create a polygon class (which takes a series of xy points), we can just feed in our `Point` object (or we can feed in a regular 2-element list or tuple or any 2-element sequence for that matter).

For example:

In [16]:
p2 = Point(*p1)
p2

Point(x=10, y=5)
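The following sketch repeats the `Point` class so that it runs on its own, and exercises the delegated slicing, the unpacking, and the type check together:

```python
import numbers

class Point:
    def __init__(self, x, y):
        if isinstance(x, numbers.Real) and isinstance(y, numbers.Real):
            self._pt = (x, y)
        else:
            raise TypeError('Point co-ordinates must be real numbers.')

    def __repr__(self):
        return f'Point(x={self._pt[0]}, y={self._pt[1]})'

    def __len__(self):
        return 2

    def __getitem__(self, s):
        return self._pt[s]

p = Point(1, 2)
print(p[-1], p[::-1])        # negative index and slice are handled by the tuple
print(Point(*p[::-1]))       # unpack a slice into a new Point

try:
    Point('abc', [1, 2, 3])
except TypeError as ex:
    print(f'TypeError: {ex}')
```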

Now, we can start creating our Polygon class, which will essentially be a mutable sequence of points making up the vertices of the polygon.

In [17]:
class Polygon:
    def __init__(self, *pts):
        if pts:
            self._pts = [Point(*pt) for pt in pts]  # if pt is already a Point object, we don't have to worry about dealing with it separately 
                                                    # since we implemented unpacking functionality
        else:
            self._pts = []
            
    def __repr__(self):
        return f'Polygon({self._pts})'

Let's try it and see if everything is as we expect:

In [18]:
p = Polygon((0,0), [1,1])

In [19]:
p

Polygon([Point(x=0, y=0), Point(x=1, y=1)])

Now, to see if the `__repr__` is accurate, we should copy and paste it to see if it creates the expected object.

In [20]:
Polygon([Point(x=0, y=0), Point(x=1, y=1)])

TypeError: Point co-ordinates must be real numbers.

Our representation contains those square brackets which technically should not be there as the Polygon class `__init__` assumes multiple comma-separated arguments, not a single iterable.

So we should fix that by taking each Point in the iterable, converting it into a string and append it to a list. Then join the list via `,` so that we get comma-separated arguments.

In [21]:
class Polygon:
    def __init__(self, *pts):
        if pts:
            self._pts = [Point(*pt) for pt in pts]
        else:
            self._pts = []
            
    def __repr__(self):
        pts_str = ', '.join([str(pt) for pt in self._pts])
        return f'Polygon({pts_str})'

In [22]:
p = Polygon((0,0), [1,1])
p

Polygon(Point(x=0, y=0), Point(x=1, y=1))

In [25]:
p2 = Polygon(Point(x=0, y=0), Point(x=1, y=1)) # Works, no errors

So now we can start making our Polygon into a sequence type, by implementing methods such as `__len__` and `__getitem__`.

Then we'll support concatenation with `__add__`. (We don't need to support repetition - it doesn't make sense with polygons.)

In [27]:
class Polygon:
    def __init__(self, *pts):
        if pts:
            self._pts = [Point(*pt) for pt in pts]
        else:
            self._pts = []
            
    def __repr__(self):
        pts_str = ', '.join([str(pt) for pt in self._pts])
        return f'Polygon({pts_str})'
    
    def __len__(self):
        return len(self._pts)
    
    def __getitem__(self, s):
        return self._pts[s]         # No need to worry whether s is an integer or a slice because _pts is a list 
                                    # and lists are sequences and sequences support indexing and slicing. We've delegated again.
    
    def __add__(self, other):
        if isinstance(other, Polygon):
            new_pts = self._pts + other._pts
            return Polygon(*new_pts)
        else:
            raise TypeError('can only concatenate with another Polygon')

In [28]:
p1 = Polygon((0,0), (1,1))
p2 = Polygon((2,2), (3,3))

new_polygon = p1 + p2
new_polygon

Polygon(Point(x=0, y=0), Point(x=1, y=1), Point(x=2, y=2), Point(x=3, y=3))

Now, let's add in-place concatenation. Remember, we need to mutate ourselves (`self`) by adding the points from the other polygon and then **return ourselves (`self`)**.

In [29]:
class Polygon:
    def __init__(self, *pts):
        if pts:
            self._pts = [Point(*pt) for pt in pts]
        else:
            self._pts = []
            
    def __repr__(self):
        pts_str = ', '.join([str(pt) for pt in self._pts])
        return f'Polygon({pts_str})'
    
    def __len__(self):
        return len(self._pts)
    
    def __getitem__(self, s):
        return self._pts[s]
    
    def __add__(self, other):
        if isinstance(other, Polygon):
            new_pts = self._pts + other._pts
            return Polygon(*new_pts)
        else:
            raise TypeError('can only concatenate with another Polygon')
            
    def __iadd__(self, pt):
        if isinstance(pt, Polygon):
            self._pts = self._pts + pt._pts   # THIS IS NOT IN-PLACE CONCATENATION; self._pts on the LHS will be a new object, but that's okay!
            return self
        else:
            raise TypeError('can only concatenate with another Polygon')

You'll notice that in `__iadd__` we have `self._pts = self._pts + pt._pts` instead of `self._pts += pt._pts`. That's because we don't care whether `self._pts` gets a new memory address or not - we only want to ensure that the memory address of the object `self` stays the same.

In [32]:
p1 = Polygon((0,0), (1,1))
p2 = Polygon((2,2), (3,3))

print(f'{id(p1)=}')
p1 += p2                # Equivalent to p1 = p1.__iadd__(p2)
print(f'{id(p1)=}')

id(p1)=139642594553872
id(p1)=139642594553872


But what we can't do yet is concatenate with an iterable of things that look like points, such as a list of 2-element lists or tuples.

In [75]:
p1 = Polygon((0,0), (1,1))
p1 += [(2,2), (3,3)]

To fix this, we need to rewrite our `__iadd__` so that it no longer raises a `TypeError` when its argument is not a `Polygon`. If the argument is a `Polygon`, we deal with it as we have above.

But, if it's an iterable, we'll need to take each point in the iterable and unpack its 2 elements into a `Point` object. If those elements are not real numbers, our `Point` class should catch it.

For example:

`[(3,3), (4,4)]` --> `[Point(3,3), Point(4,4)]`

In [41]:
class Polygon:
    def __init__(self, *pts):
        if pts:
            self._pts = [Point(*pt) for pt in pts]
        else:
            self._pts = []
            
    def __repr__(self):
        pts_str = ', '.join([str(pt) for pt in self._pts])
        return f'Polygon({pts_str})'
    
    def __len__(self):
        return len(self._pts)
    
    def __getitem__(self, s):
        return self._pts[s]
    
    def __add__(self, pt):
        if isinstance(pt, Polygon):
            new_pts = self._pts + pt._pts
            return Polygon(*new_pts)
        else:
            raise TypeError('can only concatenate with another Polygon')
            
    def __iadd__(self, pts):
        if isinstance(pts, Polygon):
            self._pts = self._pts + pts._pts
        else:
            # assume we are being passed an iterable containing Points
            # or something compatible with Points
            points = [Point(*pt) for pt in pts]
            self._pts = self._pts + points
        return self

In [42]:
p1 = Polygon((0,0), (1,1))
p1 += [(2,2), (3,3), Point(5,5), {3,4}]

Now, let's implement `append`, `insert` and `extend`. 

Remember, these are not special methods. But, everyone expects them to behave in a particular way so we should replicate that. 

For example, the method `extend` generally does a mutation and **has no return**. 

In fact, **`extend` is identical to `__iadd__` except that it has no return, so we can refactor our code to reflect this.** We do this by making `__iadd__` call our `extend` method and then return `self`.

In [44]:
class Polygon:
    def __init__(self, *pts):
        if pts:
            self._pts = [Point(*pt) for pt in pts]
        else:
            self._pts = []
            
    def __repr__(self):
        pts_str = ', '.join([str(pt) for pt in self._pts])
        return f'Polygon({pts_str})'
    
    def __len__(self):
        return len(self._pts)
    
    def __getitem__(self, s):
        return self._pts[s]
    
    def __add__(self, pt):
        if isinstance(pt, Polygon):
            new_pts = self._pts + pt._pts
            return Polygon(*new_pts)
        else:
            raise TypeError('can only concatenate with another Polygon')

    def append(self, pt):
        self._pts.append(Point(*pt))
        
    def extend(self, pts):
        if isinstance(pts, Polygon):
            self._pts = self._pts + pts._pts
        else:
            # assume we are being passed an iterable containing Points
            # or something compatible with Points
            points = [Point(*pt) for pt in pts]
            self._pts = self._pts + points
    
    def __iadd__(self, pts):
        self.extend(pts)
        return self
    
    def insert(self, i, pt):
        self._pts.insert(i, Point(*pt))

In [50]:
p1 = Polygon((0,0), Point(1,1))
p2 = Polygon([2, 2], [3, 3])

print(f"p1 original. Result: {p1}")

p1.append((4, 4))
print(f"p1 appended. Result: {p1}")

p1.insert(1, Point(-1, -1))
print(f"p1 inserted. Result: {p1}")

p3 = Polygon((6,6), Point(20,20))
p1.extend(p3)
print(f"p1 extended. Result: {p1}")

p1 original. Result: Polygon(Point(x=0, y=0), Point(x=1, y=1))
p1 appended. Result: Polygon(Point(x=0, y=0), Point(x=1, y=1), Point(x=4, y=4))
p1 inserted. Result: Polygon(Point(x=0, y=0), Point(x=-1, y=-1), Point(x=1, y=1), Point(x=4, y=4))
p1 extended. Result: Polygon(Point(x=0, y=0), Point(x=-1, y=-1), Point(x=1, y=1), Point(x=4, y=4), Point(x=6, y=6), Point(x=20, y=20))


**Part 2c: `__setitem__` Method**

The two things we want to do are:

- provide an index and a new point, and replace our Polygon's element at that index with the new point.
- same as above but with a slice.

For example:

`p1[3] = Point(100, 100)`
`p1[0:2] = [Point(-1,-1), Point(-2,-2)]`

This is going to be easier than we may think because we can use **delegation** to delegate our slice operation to the list type because lists support setting items.

We'll start off from where we left off above and add in our `__setitem__` method. Before we write it in our class above, I'll write one possible approach in isolation below, and then we'll see what issues may arise.

In [None]:
def __setitem__(self, s, value):
    
    if isinstance(s, int):
        self._pts[s] = Point(*value)
    else:
        self._pts[s] = [Point(*pt) for pt in value]           

Consider 

`p[0] = [Point(10, 10), Point(20, 20)]`

This won't work. Why not?

The LHS is fine. It's the RHS that's causing the issue. RHS = `[Point(10, 10), Point(20, 20)]`, which is what gets passed in as `value`.

Since `s` is an `int`, we go through the first condition which evaluates to: `self._pts[0] = Point( Point(10, 10) , Point(20, 20) )` 

In other words, we are trying to build a Point out of Points instead of real numbers.

**This throws an error as it should but the error is found in Point which we created way back: 'Point co-ordinates must be real numbers.'**

-----------
```
class Point:
    def __init__(self, x, y):
        if isinstance(x, numbers.Real) and isinstance(y, numbers.Real):
            self._pt = (x, y)
        else:
            raise TypeError('Point co-ordinates must be real numbers.')
            
    def __repr__(self):
        return f'Point(x={self._pt[0]}, y={self._pt[1]})'
    
    def __len__(self):
        return 2
    
    def __getitem__(self, s):
        return self._pt[s]
```
-------------
The error is technically correct - `Point(10, 10)` isn't a real number after all - but we'd want to be more specific.

We'd find a similar sort of somewhat meaningless/unintuitive error if we tried 

`p[0:2] = Point(20, 20)`

`> TypeError: type object argument after * must be an iterable, not int`

Again, the fact that an error is thrown is good but we just want a clear error.

**How are we going to fix this?**

We need to understand these two rules:

- If `s` is an integer, `value` must be a single `Point`.
- If `s` is a slice, `value` must be a sequence/iterable of Points

So what's our method?

- We'll try to take the RHS e.g `[Point(10, 10), Point(20, 20)]` and **`try`** to make a list of points. If we succeed, we know we have a list of points.
- Otherwise, we probably have a single point when we want a list of points. So, we have a TypeError that we want to catch.
- But we *may not* have a single point, so let's **`try`** to make a `Point` from the RHS. If we succeed, we have a Point.
- Otherwise, we have something useless like a complex value or string. For this, we need to **`raise`** our own error.

- If no exceptions were raised throughout this process, **then we have a valid RHS**.
- Only now we can implement the two rules above which can be done with a single `x or y` instead of two `if` statements. 

Here's our isolated method solution:

In [51]:
def __setitem__(self, s, value):
    # we first should see if we have a single Point
    # or an iterable of Points in value
    try:
        rhs = [Point(*pt) for pt in value]
        is_single = False
    except TypeError:
        # not a valid iterable of Points
        # maybe a single Point?
        try:
            rhs = Point(*value)
            is_single = True
        except TypeError:
            # still no go
            raise TypeError('Invalid Point or iterable of Points')

    # reached here, so rhs is either an iterable of Points, or a Point
    # we want to make sure we are assigning to a slice only if we have an iterable of points,
    # or, assigning to an index if we have a single Point only
 
    if (isinstance(s, int) and is_single) or (isinstance(s, slice) and not is_single):
        self._pts[s] = rhs
    else:
        raise TypeError('Incompatible index/slice assignment')

Let's add `del` and `pop` functionality to our class quickly using **delegation** as we've done previously before testing out our exception handling.

Here's our complete class:

In [57]:
class Polygon:
    def __init__(self, *pts):
        if pts:
            self._pts = [Point(*pt) for pt in pts]
        else:
            self._pts = []
            
    def __repr__(self):
        pts_str = ', '.join([str(pt) for pt in self._pts])
        return f'Polygon({pts_str})'
    
    def __len__(self):
        return len(self._pts)
    
    def __getitem__(self, s):
        return self._pts[s]
    
    def __setitem__(self, s, value):
        # we first should see if we have a single Point
        # or an iterable of Points in value
        try:
            rhs = [Point(*pt) for pt in value]
            is_single = False
        except TypeError:
            # not a valid iterable of Points
            # maybe a single Point?
            try:
                rhs = Point(*value)
                is_single = True
            except TypeError:
                # still no go
                raise TypeError('Invalid Point or iterable of Points')
        
        # reached here, so rhs is either an iterable of Points, or a Point
        # we want to make sure we are assigning to a slice only if we 
        # have an iterable of points, and assigning to an index if we 
        # have a single Point only
        if (isinstance(s, int) and is_single) \
            or (isinstance(s, slice) and not is_single):
            self._pts[s] = rhs
        else:
            raise TypeError('Incompatible index/slice assignment')
                
    def __add__(self, pt):
        if isinstance(pt, Polygon):
            new_pts = self._pts + pt._pts
            return Polygon(*new_pts)
        else:
            raise TypeError('can only concatenate with another Polygon')

    def append(self, pt):
        self._pts.append(Point(*pt))
        
    def extend(self, pts):
        if isinstance(pts, Polygon):
            self._pts = self._pts + pts._pts
        else:
            # assume we are being passed an iterable containing Points
            # or something compatible with Points
            points = [Point(*pt) for pt in pts]
            self._pts = self._pts + points
    
    def __iadd__(self, pts):
        self.extend(pts)
        return self
    
    def insert(self, i, pt):
        self._pts.insert(i, Point(*pt))
        
    def __delitem__(self, s):
        del self._pts[s]
        
    def pop(self, i=-1):
        return self._pts.pop(i)

So now let's see if we get better error messages:

In [58]:
p1 = Polygon((0,0), (1,1), (2,2))

In [59]:
p1[0:2] = (10,10)

TypeError: Incompatible index/slice assignment

In [60]:
p1[0] = [(0,0), (1,1)]

TypeError: Incompatible index/slice assignment
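
The two failing cases above each have a valid counterpart. As a quick check, the sketch below repeats trimmed-down copies of `Point` and `Polygon` (only the methods needed here, so this is not the complete class from this section) and confirms that index assignment accepts a single point while slice assignment accepts an iterable of points:

```python
import numbers


class Point:
    def __init__(self, x, y):
        if isinstance(x, numbers.Real) and isinstance(y, numbers.Real):
            self._pt = (x, y)
        else:
            raise TypeError('Point co-ordinates must be real numbers.')

    def __repr__(self):
        return f'Point(x={self._pt[0]}, y={self._pt[1]})'

    def __len__(self):
        return 2

    def __getitem__(self, s):
        return self._pt[s]


class Polygon:
    # trimmed copy: just enough to exercise __setitem__
    def __init__(self, *pts):
        self._pts = [Point(*pt) for pt in pts]

    def __getitem__(self, s):
        return self._pts[s]

    def __setitem__(self, s, value):
        try:
            rhs = [Point(*pt) for pt in value]
            is_single = False
        except TypeError:
            try:
                rhs = Point(*value)
                is_single = True
            except TypeError:
                raise TypeError('Invalid Point or iterable of Points')

        if (isinstance(s, int) and is_single) \
                or (isinstance(s, slice) and not is_single):
            self._pts[s] = rhs
        else:
            raise TypeError('Incompatible index/slice assignment')


p = Polygon((0, 0), (1, 1), (2, 2))

p[0] = (10, 10)                      # index <- a single point
p[1:3] = [(20, 20), Point(30, 30)]   # slice <- an iterable of points

print(p[:])
# [Point(x=10, y=10), Point(x=20, y=20), Point(x=30, y=30)]
```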

In [61]:
p = Polygon(*zip(range(6), range(6)))
p.pop(1)

Point(x=1, y=1)

# 11 - Sorting Sequences

#### Lecture

Just like with the concatenation and in-place concatenation we saw previously, we have two different ways of sorting a mutable sequence:

* returning a new sorted sequence
* in-place sorting (mutating sequence) - obviously this works for mutable sequence types only!


For any iterable, the built-in `sorted` function will return a **list** containing the sorted elements of the iterable.

So a few things here: 
* any iterable can be sorted (as long as it is finite)
* the elements must be pair-wise comparable (possibly indirectly via a sort key)
* the returned result is always a list
* the original iterable is not mutated

In addition:
* optionally specify a `key` - a function that extracts a comparison key for each element. If no key is specified, Python uses the natural ordering of the elements themselves (via `__lt__`), so the sort fails if the elements don't define one!
* optionally specify the `reverse` argument to sort in descending order

Of course, numbers have a natural sort order and strings do too (alphabetical, but don't forget that uppercase and lowercase letters are treated differently: uppercase letters sort before lowercase ones). But what if we have numerous strings and we want to sort by the last character of each string? What would the **sort key** be?

```
item:    'hello'    'python'    'parrot'    'bird'
key:       'o'        'n'         't'         'd'
```

And we know that since these sort keys are strings, they have a natural sort order.

So what would the key function look like?

`key = lambda s: s[-1]`
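
As a small illustration, here is that key function applied to the four words from the table above (the variable names are just for this example):

```python
words = ['hello', 'python', 'parrot', 'bird']

# each word is compared using its last character: 'o', 'n', 't', 'd'
by_last_char = sorted(words, key=lambda s: s[-1])

print(by_last_char)   # ['bird', 'python', 'hello', 'parrot']
print(words)          # the original list is unchanged
```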

What do we mean by the natural sort order?

This is how we'd generally and intuitively sort the items. If we do not provide the `key` keyword argument, then we can always think of the default key as the elements themselves. In other words,

`sorted(iterable) <-> sorted(iterable, key=lambda x: x)`

For things like dictionaries, this works slightly differently. Remember what happens when we iterate a dictionary?

In [8]:
d = {3: 100, 2: 200, 1: 10}
for item in d:
    print(item)

3
2
1


We actually are iterating the keys.

Same thing happens with sorting - we'll end up just sorting the keys:

In [9]:
d = {3: 100, 2: 200, 1: 10}

sorted(d)

[1, 2, 3]

But what if we wanted to sort the dictionary keys based on the values instead?

This is where the `key` argument of `sorted` will come in handy.

We are going to specify to the `sorted` function that it should use the value of each item to use as a sort key:

In [11]:
d = {'a': 100, 'b': 50, 'c': 10}

sorted(d, key=lambda k: d[k])

['c', 'b', 'a']
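
A couple of variations on the same idea, offered as a sketch rather than the only way to do it: `d.get` can stand in for the lambda, and sorting `d.items()` keeps each value alongside its key.

```python
d = {'a': 100, 'b': 50, 'c': 10}

# same result as key=lambda k: d[k]
keys_by_value = sorted(d, key=d.get)

# sort (key, value) pairs by the value, i.e. the second element of each pair
items_by_value = sorted(d.items(), key=lambda item: item[1])

print(keys_by_value)    # ['c', 'b', 'a']
print(items_by_value)   # [('c', 10), ('b', 50), ('a', 100)]
```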

Note: **The `sorted` function makes a copy of the iterable and returns the sorted elements in a `list` always**.

#### Stable Sorting

Suppose we sort the words `this`, `late` and `bird` by length. They all have four characters, so they all have the same sort key - how does Python determine which one should come first? Randomly? No!

The sort algorithm that Python uses, called the *TimSort* (named after Python core developer Tim Peters - yes, the same Tim Peters that wrote the Zen of Python!!), is what is called a **stable** sort algorithm.

This means that items with equal sort keys maintain their relative position.

In [None]:
t = 'aaaa', 'bbbb', 'cccc', 'dddd', 'eeee'

In [None]:
sorted(t, key = lambda s: len(s))

['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee']

Now let's change our tuple a bit:

In [None]:
t = 'bbbb', 'cccc', 'aaaa', 'eeee', 'dddd'

In [None]:
sorted(t, key = lambda s: len(s))

['bbbb', 'cccc', 'aaaa', 'eeee', 'dddd']

As you can see, when the sort keys are equal (they are all equal to 4), the original ordering of the iterable is preserved. `bbbb` came before `cccc` in our tuple so `bbbb` will come before `cccc` after sorting, even though they have the same sort key.
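
Stability is what makes multi-pass sorting work: sort by the secondary key first, then by the primary key, and ties in the second pass keep the order established by the first. A minimal sketch with made-up sample data:

```python
# (name, age) pairs - hypothetical sample data
people = [('Eric', 30), ('John', 25), ('Michael', 30), ('Graham', 25)]

# pass 1: secondary key (name)
by_name = sorted(people, key=lambda p: p[0])

# pass 2: primary key (age); people of equal age stay in name order
by_age_then_name = sorted(by_name, key=lambda p: p[1])

print(by_age_then_name)
# [('Graham', 25), ('John', 25), ('Eric', 30), ('Michael', 30)]
```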

#### Reversed Sort

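We also have the `reverse` keyword-only argument that we can use to sort in descending order. The sort stays stable, so elements with equal sort keys still keep their original relative order (see `this`, `bird`, `late` below):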

In [1]:
t = 'this', 'bird', 'is', 'a', 'late', 'parrot'

In [40]:
sorted(t, key=lambda s: len(s), reverse=True)

['parrot', 'this', 'bird', 'late', 'is', 'a']
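
Note that `reverse=True` is not quite the same as sorting in ascending order and then flipping the result, because flipping also reverses the order of elements with equal keys. The comparison below illustrates the difference:

```python
t = 'this', 'bird', 'is', 'a', 'late', 'parrot'

descending = sorted(t, key=len, reverse=True)
flipped = list(reversed(sorted(t, key=len)))

print(descending)   # ['parrot', 'this', 'bird', 'late', 'is', 'a']
print(flipped)      # ['parrot', 'late', 'bird', 'this', 'is', 'a']
```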

#### In-Place Sort

If the sequence is mutable, in-place sorting is possible. The `list` class has a `.sort()` instance method that does in-place sorting. **It returns `None`**, so don't assign its result expecting a sorted list.

This method is slightly more efficient than the `sorted()` function because it doesn't have to make a copy of the iterable before sorting.

In [4]:
l = ['this', 'bird', 'is', 'a', 'late', 'parrot']
result = l.sort(key = lambda x: len(x))
result
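
The cell above displays nothing because `result` is `None`. Printing both the return value and the list makes the mutation visible:

```python
l = ['this', 'bird', 'is', 'a', 'late', 'parrot']
result = l.sort(key=len)

print(result)   # None
print(l)        # ['a', 'is', 'this', 'bird', 'late', 'parrot']
```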

#### Natural Ordering for Custom Classes

I just want to quickly show you that in order to have a "natural ordering" for our custom classes, we just need to implement the `<` or `>` operators. (I discuss these operators in Part 1 of this course)

In fact, if we add a `print` call to `__lt__`, we can see that `sorted` calls our `__lt__` method repeatedly to perform the sort:

In [8]:
class MyClass:
    def __init__(self, name, val):
        self.name = name
        self.val = val
        
    def __repr__(self):
        return f'MyClass({self.name}, {self.val})'
    
    def __lt__(self, other):
        print(f'called {self.name} < {other.name}')
        return self.val < other.val

In [9]:
c1 = MyClass('c1', 20)
c2 = MyClass('c2', 10)
c3 = MyClass('c3', 20)
c4 = MyClass('c4', 10)

In [10]:
sorted([c1, c2, c3, c4])

called c2 < c1
called c3 < c2
called c3 < c1
called c4 < c1
called c4 < c2


[MyClass(c2, 10), MyClass(c4, 10), MyClass(c1, 20), MyClass(c3, 20)]

Now we can sort those objects, without specifying a key, since that class has a natural ordering (`<` in this case). Moreover, notice that the sort is stable.

But we can still sort by using a key. For example:

In [11]:
l = [c2, c4, c1, c3]
sorted(l, key=lambda c: c.name)

[MyClass(c1, 20), MyClass(c2, 10), MyClass(c3, 20), MyClass(c4, 10)]

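If you want the other ordering operators (`<=`, `>`, `>=`) as well, you can use the `@total_ordering` decorator (`from functools import total_ordering`). It fills in the missing comparisons from the one you wrote; the class should also define `__eq__`.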
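
A minimal sketch of `total_ordering` applied to a class shaped like `MyClass` above. The `__eq__` method is an addition for this example; `total_ordering` derives the remaining comparisons from `__eq__` and `__lt__`:

```python
from functools import total_ordering


@total_ordering
class MyClass:
    def __init__(self, name, val):
        self.name = name
        self.val = val

    def __repr__(self):
        return f'MyClass({self.name}, {self.val})'

    def __eq__(self, other):
        if not isinstance(other, MyClass):
            return NotImplemented
        return self.val == other.val

    def __lt__(self, other):
        if not isinstance(other, MyClass):
            return NotImplemented
        return self.val < other.val


c1 = MyClass('c1', 20)
c2 = MyClass('c2', 10)

print(c2 < c1)    # True  (written by us)
print(c1 > c2)    # True  (supplied by total_ordering)
print(c1 >= c2)   # True
print(c1 <= c2)   # False
```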

# 12 - List Comprehensions

#### Lecture

See Part 1 for introduction on comprehensions if needed.

**Internal Mechanics of List Comprehensions**

Comprehensions have their own **local scope** - just like a function.

Functions can be nested inside other functions, creating an inner and an outer scope. If the inner references a variable in the outer, that variable becomes a free variable and the inner function is known as a **closure** - so closures have free variables.

We need to recognize that a list comprehension is essentially a temporary function that Python creates and executes, and whose return value is the resulting list.

Let's break this down with an example:

`sq = [item**2 for item in range(10)]`

- The entire RHS *is* the list comprehension. 
- There are two stages: compilation and execution. 
- During **compilation**, Python creates a temporary function, `item` being a variable in the local scope, that'll be used to evaluate the comprehension. 
- Something like:

```
def temp():
    new_list = []
    for item in range(10):
        new_list.append(item**2)
       
    return new_list
```

- When the original line is **executed**, python executes `temp()`.
- Then, it stores the returned object (`new_list`) at some memory address and points `sq` on the LHS to it.

#### Comprehension Scopes

As mentioned `item` in the list comprehension is a local symbol of the temporary function. 

But the comprehension has access to **global** variables just like how normal functions will access the global scope if it can't find a variable in the local scope.

But what about **nonlocal** variables? 

These are variables that are "neither local nor global": they live in the scope of an enclosing function. (`nonlocal` is also a Python keyword, used inside a nested function to declare that a name refers to a variable in the nearest enclosing scope that is not global.)

Consider the function:

In [13]:
def my_func(num):
    sq = [item**2 for item in range(num)]

Remember to think of the RHS as a function. It contains `num` which is a variable in the outer scope. We therefore have a comprehension function nested inside a regular function. Since the comprehension function is referencing `num`, a `nonlocal` symbol, it becomes a free variable - so we have a closure.

**The RHS *is* a closure - a function nested in another function that has access to one or more free variables in an outer scope because it references those nonlocal symbols.**
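
To see that the comprehension really picks up the enclosing function's variable rather than a global of the same name, consider this small sketch (the `scale` function and the name `factor` are made up for illustration):

```python
factor = 1000   # global with the same name; scale() never uses it


def scale(values, factor):
    # inside the comprehension, `factor` resolves to scale()'s parameter,
    # which sits in the enclosing scope, before the global is considered
    return [v * factor for v in values]


print(scale([1, 2, 3], 10))   # [10, 20, 30]
```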

#### Things to Watch Out For

**Example 1**

To drive the point home about comprehension scopes, consider the following below. We have a regular `for` loop containing the symbol `number`:

In [13]:
l=[]

for number in range(5):
    l.append(number**2)
    
print(number)
print('number' in globals())

4
True


So, Python automatically creates a symbol within the scope that it's residing in. But what if we create this loop in a comprehension? (First, delete `number` from `globals()`.)

In [14]:
if 'number' in globals():
    del number

l = [number**2 for number in range(5)]

print('number' in globals())

False


It's not in the `globals()` scope because it was created within scope of the temporary function that Python created for the comprehension.

**But be careful!**

In `[number**2 for number in range(5)]`, the `number` in `for number` is the **declaration** of the variable within the function scope. But, the `number` in `number**2` is a **reference**.

First, note that, although the first mention of `number` comes first in the comprehension, it comes at the very end of the temporary function that Python creates - that's why it's considered a reference and NOT a declaration by Python.

Secondly, from what we know about functions in Python, if a variable is referenced within a function, Python first looks for it in the local scope of the function. If it isn't found there, Python looks outward: first in any enclosing function scopes, then in the global scope.

So, take a look at the example below:

In [15]:
number = 100

[number * i for i in range(5)] # number is in the first part of the statement, so it's a reference NOT a DECLARATION
                               # i is AFTER the first part of the statement, it's a declaration.

[0, 100, 200, 300, 400]

- When Python created the temporary function and began with `for i in range(5):`, it declared `i` as a local variable. 
- Then, it got round to `number * i` but it couldn't find any declaration of `number` within its local (comprehension) scope, so it looked in globals and found `number = 100`.

**Example 2**

Now let's look at an example we've seen before when we studied closures.

Suppose we want to generate a list of functions that will calculate powers of their argument, i.e. we want to define a bunch of functions.

We could certainly define a bunch of functions one by one:

In [45]:
fn_0 = lambda x: x**0
fn_1 = lambda x: x**1
fn_2 = lambda x: x**2
fn_3 = lambda x: x**3
# etc

But this would be very tedious if we had to do it more than just a few times.

Instead, why don't we create those functions as lambdas and put them into a list where the index of the list will correspond to the power we are looking for.

Something like this if we were doing it manually:

In [46]:
funcs = [lambda x: x**0, lambda x: x**1, lambda x: x**2, lambda x: x**3]

Now we can call these functions this way:

In [47]:
print(funcs[0](10))
print(funcs[1](10))
print(funcs[2](10))
print(funcs[3](10))

1
10
100
1000


Now all we need to do is to create these functions using a loop - the traditional way first:

First let's make sure `i` is not in our global symbol table:

In [17]:
if 'i' in globals():
    del i

In [18]:
funcs = []
for i in range(6):
    funcs.append(lambda x: x**i)

And let's use them as before:

In [20]:
print(funcs[0](10))
print(funcs[1](10))

funcs

100000
100000


[<function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>]

What happened?? It looks like every function is actually calculating `10**5`. To break it down:

- `lambda x: x**i` is a function which references `i`, a name that lives in an outer scope (here the global scope, since the loop runs at module level).
- Inside `funcs`, each of those lambdas contains `x**i`, but `i` is not evaluated when the lambda is created. It is looked up each time the lambda is **called**.
- So, for whichever lambda we fetch, when it computes `x**i` it looks up the current value of the global `i`.
- Since the `for` loop has terminated, `i` ended as `5`. So every one of those lambdas computes `x**5`.

Take a look at the equivalent breakdown:

In [None]:
i = 0
def fn_0(x):
    return x ** i

i = 1
def fn_1(x):
    return x ** i

i = 2
def fn_2(x):
    return x ** i
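
One way to convince ourselves that the lambdas look `i` up at call time is to rebind `i` after the loop and call the same function again. A short sketch:

```python
funcs = []
for i in range(6):
    funcs.append(lambda x: x**i)

before = funcs[0](10)   # i is 5 once the loop has finished
i = 2                   # rebind the name the lambdas refer to
after = funcs[0](10)    # the very same lambda now computes 10**2

print(before)   # 100000
print(after)    # 100
```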

What's the solution? Recall that if a function is defined with a default parameter value, e.g. `def log(current_dt=datetime.now()):`, that default is evaluated once, when the function is **created** (when the `def` statement runs), not each time the function is **called**. So, if our `log` function is called on different days without a `current_dt` argument, the same default value will be used every time.
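
Here is a small sketch of that behaviour, using a counter in place of `datetime.now()` so the result is easy to check (the names `counter` and `f` are just for this example):

```python
import itertools

counter = itertools.count()   # yields 0, 1, 2, ...


def f(n=next(counter)):       # next(counter) runs once, when def executes
    return n


first, second = f(), f()

print(first, second)     # 0 0  -> the default was not re-evaluated
print(next(counter))     # 1    -> the counter was advanced only once
```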

We can use this to our advantage in this scenario. Take a look:

In [2]:
funcs = [lambda x, p=i: x**p for i in range(4)]

print(funcs[0](10))
print(funcs[1](10))
print(funcs[2](10))
print(funcs[3](10))

1
10
100
1000


It works because the default value `p=i` is evaluated at the moment each lambda is created. When the first lambda is created `i` is `0`, so `p=0` becomes the default for that particular lambda, and so on for each iteration. If no `p` argument is passed when the lambda is called, Python uses that stored default. It might be useful to think of a value for `p` being hardcoded for each value of `i` in the loop.
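
One caveat with the default-argument trick is that a caller can override `p` by passing a second argument. An alternative that avoids this, shown here as a sketch and not taken from the original notes, is a small factory function (the name `make_power` is hypothetical) that gives each lambda its own enclosing scope:

```python
def make_power(p):
    # each call creates a fresh scope, so every lambda closes over its own p
    return lambda x: x**p


funcs = [make_power(i) for i in range(4)]

print(funcs[0](10))   # 1
print(funcs[1](10))   # 10
print(funcs[2](10))   # 100
print(funcs[3](10))   # 1000
```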

**Compilation**

Let's show that Python is indeed creating a function by compiling a comprehension, and then disassembling the compiled code to see what's happened:

In [11]:
import dis

In [12]:
compiled_code = compile('[i**2 for i in (1, 2, 3)]', 
                        filename='', mode='eval')

In [13]:
dis.dis(compiled_code)

  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x000001F77210ED20, file "", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_CONST               5 ((1, 2, 3))
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE


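As you can see, at offset 4 Python created a function (`MAKE_FUNCTION`), called it at offset 10 (`CALL_FUNCTION`), and then returned the result (`RETURN_VALUE`) in the last step. This output is from a CPython version earlier than 3.12. From 3.12 onward comprehensions are compiled inline (PEP 709), so the disassembly no longer shows `MAKE_FUNCTION`, although the loop variable is still kept out of the surrounding scope.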

**Nested Comprehensions**

Comprehensions can be nested within each other. And since they're functions, a nested comprehension can access variables in the outer comprehension, which are nonlocal to the inner comprehension.

If this is the case, the inner/nested comprehension becomes a closure because it's a nested 'function' accessing a free variable.

In the example below, `i` is a nonlocal variable to the nested comprehension because `i` is declared in the outer comprehension/function. So, the nested comprehension is a closure.

In [14]:
[ [i * j for j in range(5)] for i in range(5)]

[[0, 0, 0, 0, 0],
 [0, 1, 2, 3, 4],
 [0, 2, 4, 6, 8],
 [0, 3, 6, 9, 12],
 [0, 4, 8, 12, 16]]
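
Written out with regular loops, the nested comprehension above corresponds roughly to the following: the inner comprehension becomes an inner loop that builds a fresh row for every value of `i`.

```python
outer = []
for i in range(5):
    inner = []                 # a new row for each i
    for j in range(5):
        inner.append(i * j)    # the inner loop reads i from the outer loop
    outer.append(inner)

print(outer == [[i * j for j in range(5)] for i in range(5)])   # True
```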

**Nested Loops in Comprehensions**

These are NOT nested comprehensions. Let's take a look at the regular way and compare it to having nested loops in a comprehension.

In [18]:
l = []

for i in range(2):
    for j in range(2):    
        for k in range(2):    
            l.append((i, j, k))
            
print(l)

[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]


In [20]:
l = [(i, j, k) for i in range(2) for j in range(2) for k in range(2)]

print(l)

[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]


We can add `if` clauses, but be careful - the order of the `for` and `if` clauses matters. An `if` condition may only reference loop variables introduced by a `for` clause to its left. (The leading expression doesn't count: `(i, j)` below uses `i` and `j` but does not declare them.)

```
l = [(i, j) for i in range(2) for j in range(2) if i==j] # CORRECT
l = [(i, j) for i in range(2) if i==j for j in range(2)] # WRONG, j not declared until 'for j...' but referenced earlier in 'i==j'
```
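
Running the second form shows the failure. In this sketch the exception is caught as `NameError`; in practice Python raises its subclass `UnboundLocalError`, because `j` is local to the comprehension but has not been assigned yet when `i==j` is first evaluated.

```python
raised = False

try:
    l = [(i, j) for i in range(2) if i == j for j in range(2)]
except NameError as exc:   # UnboundLocalError is a subclass of NameError
    raised = True
    print(f'{type(exc).__name__}: {exc}')

print(raised)   # True
```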

Here's an example with two conditions:

In [23]:
[(i, j)
 for i in range(1,6) if i%2==0 
 for j in range(1,6) if j%3==0]

[(2, 3), (4, 3)]

But note, we could put both `if` statements at the end because both `i` and `j` are defined by then:

In [24]:
[(i, j)
 for i in range(1,6) 
 for j in range(1,6)
 if i%2==0
 if j%3==0]

[(2, 3), (4, 3)]

Here's another example. Note that there's a significant difference between enclosing and not enclosing the first expression in `[ ]`.

If we want a flat output of `['ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy', 'cz']`, then we need a **single** comprehension containing nested loops, not a nested comprehension. In that case the two loop variables are both local to the same comprehension because they are both within the same `[ ]`.

In [12]:
l1 = ['a', 'b', 'c']
l2 = ['x', 'y', 'z']

nested_comp = [[s1 + s2 for s1 in l1] for s2 in l2]
nested_loop = [s1 + s2 for s1 in l1 for s2 in l2]

print(nested_comp)
print(nested_loop)

[['ax', 'bx', 'cx'], ['ay', 'by', 'cy'], ['az', 'bz', 'cz']]
['ax', 'ay', 'az', 'bx', 'by', 'bz', 'cx', 'cy', 'cz']


The easy way to interpret the nested loop expression is to recognise that the loops are in the same order as the traditional way, i.e.,

```
nested_loop = []
for s1 in l1:
    for s2 in l2:
        nested_loop.append(s1 + s2)
```