# Python101 - part 1

### Generators and iterators

Mentoring material for a better understanding of some basic Python concepts.

**BEFORE CONTINUING, CLICK ON `Kernel > Restart & Run All`**

## Generators

In [1]:
def my_generator_func():
    yield 'start'
    for i in range(3):
        yield i
    yield 'end'
    return 'done'  # ignored by for loops and list()

for item in my_generator_func():
    print(item)

start
0
1
2
end
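
The `return 'done'` line is not completely inert: a generator's return value is attached to the `StopIteration` exception raised when the generator finishes, even though `for` loops and `list()` discard it. A minimal sketch, reusing the generator function above (`drain` is a hypothetical helper added only for illustration):

```python
def my_generator_func():
    yield 'start'
    for i in range(3):
        yield i
    yield 'end'
    return 'done'


def drain(generator):
    """Consume a generator by hand and return (items, return_value)."""
    items = []
    while True:
        try:
            items.append(next(generator))
        except StopIteration as stop:
            # the generator's return value travels on the exception
            return items, stop.value


items, returned = drain(my_generator_func())
print(items)     # ['start', 0, 1, 2, 'end']
print(returned)  # done
```

In everyday code you rarely need this value, which is why the plain loop above never sees it.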


In [2]:
# retrieve all elements from a generator
list(my_generator_func())

['start', 0, 1, 2, 'end']

In [3]:
# think of yield as "push" or "append"!
def list_version():
    l = []
    l.append('start')
    for i in range(3):
        l.append(i)
    l.append('end')
    return l

list_version()

['start', 0, 1, 2, 'end']

### Generator != Generator function

In [4]:
# the generator is stateful
my_generator = my_generator_func()

print('next:', next(my_generator))
print('next:', next(my_generator))
print('next:', next(my_generator))
print('remaining:', list(my_generator))
print('remaining:', list(my_generator))

next: start
next: 0
next: 1
remaining: [2, 'end']
remaining: []
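
The second `remaining: []` shows that an exhausted generator stays exhausted. The sketch below, which redefines the generator function so it runs on its own, illustrates two related facts: a generator is its own iterator, and `next` raises `StopIteration` on an exhausted generator unless you give it a default.

```python
def my_generator_func():
    yield 'start'
    for i in range(3):
        yield i
    yield 'end'


my_generator = my_generator_func()

# a generator is its own iterator: iter() hands back the same object
same_object = iter(my_generator) is my_generator
print(same_object)  # True

consumed = list(my_generator)  # exhausts the generator

# once exhausted, next() raises StopIteration...
try:
    next(my_generator)
    raised = False
except StopIteration:
    raised = True
print(raised)  # True

# ...unless a default value is passed as second argument
fallback = next(my_generator, 'nothing left')
print(fallback)  # nothing left
```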


In [5]:
# the generator function returns a new generator each time
print('new call:', list(my_generator_func()))
print('new call:', list(my_generator_func()))

new call: ['start', 0, 1, 2, 'end']
new call: ['start', 0, 1, 2, 'end']


### Generators can be infinite!

In [6]:
# does not have to be finite
def new_counter():
    i = 0
    while True:
        yield i
        i += 1

counter1 = new_counter()
print('next1:', next(counter1))
print('next1:', next(counter1))
print('next1:', next(counter1))
print('next1:', next(counter1))

next1: 0
next1: 1
next1: 2
next1: 3


In [7]:
# already exists in stdlib's itertools!
# https://docs.python.org/3/library/itertools.html#itertools.count
import itertools

counter2 = itertools.count()

print('next2:', next(counter2))
print('next2:', next(counter2))
print('next2:', next(counter2))

next2: 0
next2: 1
next2: 2


In [8]:
# DON'T DO THIS THOUGH!
# list(counter2)

### Generators make lazy evaluation possible

In [9]:
def take(n, iterable):
    """
    Return first n items of the iterable as a list
    from https://docs.python.org/3/library/itertools.html#recipes
    """
    return list(itertools.islice(iterable, n))

counter3 = itertools.count()
print('take 5:', take(5, counter3))
print('take 8:', take(8, counter3))

take 5: [0, 1, 2, 3, 4]
take 8: [5, 6, 7, 8, 9, 10, 11, 12]


In [10]:
import time

def heavy_generator():
    for i in range(4):
        time.sleep(0.2)
        print('processing...', i)
        yield i

generator = heavy_generator()
print('generator:', generator)
print('> still not evaluated!')
evaluated = list(generator)
print('evaluated:', evaluated)

generator: <generator object heavy_generator at 0x7f2fa88d34c0>
> still not evaluated!
processing... 0
processing... 1
processing... 2
processing... 3
evaluated: [0, 1, 2, 3]
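
Laziness also means that work you never ask for is never done. The sketch below is a variation on the cell above: it swaps `time.sleep` and `print` for a hypothetical `log` list, so it is easy to check which iterations actually ran.

```python
log = []  # hypothetical stand-in for the print() calls above


def heavy_generator():
    for i in range(4):
        log.append(f'processing {i}')
        yield i


generator = heavy_generator()
print(log)  # [] : creating the generator runs none of its body

first = next(generator)
print(first, log)  # 0 ['processing 0']

second = next(generator)
print(second, log)  # 1 ['processing 0', 'processing 1']
# items 2 and 3 are never processed unless someone asks for them
```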


In [11]:
def fibonacci():
    a, b = 1, 1
    while True:
        yield a
        a, b = a + b, a

# we only compute the terms we need
take(10, fibonacci())

[1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

#### Exercises

In [12]:
# Implement an infinite generator yielding 'A', 'B', 'C', 'A', 'B', ...

def abc_generator():
    yield None

take(5, abc_generator())  # should be ['A', 'B', 'C', 'A', 'B']

[None]

In [13]:
# Implement take without using itertools.islice

def my_take(n, iterator):
    return None

my_take(5, itertools.count())  # should be [0, 1, 2, 3, 4]

In [14]:
# Implement an infinite fizzbuzz

def fizzbuzz():
    yield None

expected = [
    1,
    2,
    'fizz',
    4,
    'buzz',
    'fizz',
    7,
    8,
    'fizz',
    'buzz',
    11,
    'fizz',
    13,
    14,
    'fizzbuzz',
]

take(15, fizzbuzz())  # should be == expected

[None]

## Iterables

Basically everything you can loop on:

In [15]:
# A string:
for c in 'Hello':
    print(c)

H
e
l
l
o


In [16]:
# A list:
for i in [5, 7, 1, 0]:
    print(i)

5
7
1
0


In [17]:
# A range:
for i in range(4):
    print(i)

0
1
2
3


In [18]:
# A tuple:
for i in (6, 8, 9):
    print(i)

6
8
9


In [19]:
# A set:
for i in {1, 2, 3, 4}:
    print(i)

1
2
3
4


In [20]:
# A generator:
for i in my_generator_func():
    print(i)

start
0
1
2
end


In [21]:
# but also real-world examples: a file, a database cursor...
i = 0
with open('.gitignore', 'r') as f:
    for chunk in f:
        i += 1
        print(i, chunk)
        if i > 5:
            break

1 

2 # Byte-compiled / optimized / DLL files

3 __pycache__/

4 *.py[cod]

5 *$py.class

6 



> BTW, you should never keep a manual counter like this in Python. Use `enumerate`!

In [22]:
# use enumerate(iterable) to get a new iterator of (index, item) pairs
with open('.gitignore', 'r') as f:
    for i, chunk in enumerate(f):
        print(i, chunk)
        if i > 5:
            break

0 

1 # Byte-compiled / optimized / DLL files

2 __pycache__/

3 *.py[cod]

4 *$py.class

5 

6 # C extensions
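
Note that `enumerate` counts from 0 by default, which is why the numbering above is shifted by one compared with the manual counter. Passing `start=1` restores 1-based numbering. A small sketch, using an in-memory list of lines as a stand-in for the file (an assumption, so that it runs without a `.gitignore`):

```python
lines = ['first\n', 'second\n', 'third\n']  # stand-in for an open file

# start=1 makes the index begin at 1 instead of 0
numbered = list(enumerate(lines, start=1))
for i, line in numbered:
    print(i, line.rstrip())
```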



### Pro-tip: the list constructor

You can use the list constructor on any (non-infinite) iterable to see all its elements:

In [23]:
# a string:
list('Hello')

['H', 'e', 'l', 'l', 'o']

In [24]:
# a list:
list([5, 7, 1, 0])

[5, 7, 1, 0]

In [25]:
# a range:
list(range(4))

[0, 1, 2, 3]

In [26]:
# a tuple:
list((6, 8, 9))

[6, 8, 9]

In [27]:
# a set:
list({1, 2, 3, 4})

[1, 2, 3, 4]

In [28]:
# a generator:
list(my_generator_func())

['start', 0, 1, 2, 'end']

In [29]:
# an enumerate:
list(enumerate([55, 777]))

[(0, 55), (1, 777)]

### Pro-tip: use constructors for casting!

You can use other constructors on any iterable (simple casting)

In [30]:
# a tuple:
tuple(range(5))

(0, 1, 2, 3, 4)

In [31]:
# a set:
set('SnoopDoggyDog')

{'D', 'S', 'g', 'n', 'o', 'p', 'y'}

In [32]:
# a dict (from an iterable of (key, value) pairs):
dict([(1, 3), (4, 9)])

{1: 3, 4: 9}

In [33]:
# a dict (again):
dict(enumerate('hello'))

{0: 'h', 1: 'e', 2: 'l', 3: 'l', 4: 'o'}
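
A common variant of the same idea pairs two iterables with the built-in `zip` before handing them to `dict`. The names and ages below are made up for illustration:

```python
names = ['pierre', 'paul', 'jack']
ages = [31, 25, 47]  # illustrative data

# zip lazily yields ('pierre', 31), ('paul', 25), ... and dict consumes the pairs
age_by_name = dict(zip(names, ages))
print(age_by_name)  # {'pierre': 31, 'paul': 25, 'jack': 47}
```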

### Pro-tip: Use list-comprehensions!

1. Less code, clearer code
2. Usually better performance than the equivalent `for` loop with `.append`!
3. They are idiomatic of good python code (i.e. they are "pythonic")
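
The performance point can be checked with the standard `timeit` module. This is only a rough sketch: the function names are hypothetical, and the timings depend on the machine and the Python version, so no particular numbers are promised.

```python
import timeit


def with_loop(n):
    results = []
    for i in range(n):
        results.append(i * 2)
    return results


def with_comprehension(n):
    return [i * 2 for i in range(n)]


# both build exactly the same list
same = with_loop(1000) == with_comprehension(1000)
print(same)  # True

# timings are indicative only
loop_time = timeit.timeit(lambda: with_loop(1000), number=200)
comp_time = timeit.timeit(lambda: with_comprehension(1000), number=200)
print(f'loop: {loop_time:.4f}s, comprehension: {comp_time:.4f}s')
```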

In [34]:
def next_ascii_char(c):
    """
    A simple function we'll use below to illustrate
    """
    return chr(ord(c) + 1)

In [35]:
# same as the "map" function
[next_ascii_char(c) for c in 'Hello']

['I', 'f', 'm', 'm', 'p']

In [36]:
# same as the "filter" function
[c for c in 'Hello' if c != 'l']

['H', 'e', 'o']

In [37]:
# filter & map at the same time
[next_ascii_char(c) for c in 'Hello' if c != 'l']

['I', 'f', 'p']

In [38]:
# two nested loops - not necessarily a good idea ;)
emails = [f'{x}@{y}' for x in ['pierre', 'paul', 'jack'] for y in ['gmail.com', 'hotmail.com', 'yopmail.com']]
emails

['pierre@gmail.com',
 'pierre@hotmail.com',
 'pierre@yopmail.com',
 'paul@gmail.com',
 'paul@hotmail.com',
 'paul@yopmail.com',
 'jack@gmail.com',
 'jack@hotmail.com',
 'jack@yopmail.com']
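
When the nested loops start to hurt readability, one possible alternative is `itertools.product`, which yields every combination as a tuple. A sketch with the same data (the variable names `users` and `domains` are introduced here for illustration):

```python
import itertools

users = ['pierre', 'paul', 'jack']
domains = ['gmail.com', 'hotmail.com', 'yopmail.com']

# product yields every (user, domain) pair, in the same order as the nested loops
emails = [f'{x}@{y}' for x, y in itertools.product(users, domains)]
print(emails[:4])
# ['pierre@gmail.com', 'pierre@hotmail.com', 'pierre@yopmail.com', 'paul@gmail.com']
```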

### It's not just *list* comprehensions!

In [39]:
# dict comprehensions:
{ord(c): c for c in 'Hello'}

{72: 'H', 101: 'e', 108: 'l', 111: 'o'}

In [40]:
# set comprehensions:
{i for i in range(8) if i % 3}

{1, 2, 4, 5, 7}

In [41]:
# "tuple comprehensions":
tuple(ord(c) for c in 'Hello')

(72, 101, 108, 108, 111)

In [42]:
# Generator comprehensions (officially called "generator expressions"):
all_ints_with_one = (i for i in itertools.count() if '1' in str(i))

all_ints_with_one # not evaluated here

<generator object <genexpr> at 0x7f2fa88d9410>

In [43]:
# evaluated lazily!
take(15, all_ints_with_one)  # replay this block

[1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 31, 41, 51]

In [44]:
filtered = (x for x in heavy_generator() if x % 2)
print('filtered:', filtered)
print('> still not evaluated!')

squared = (x ** 2 for x in filtered)
print('squared:', squared)
print('> still not evaluated!')

evaluated = list(squared)
print('evaluated:', evaluated)

filtered: <generator object <genexpr> at 0x7f2fa88d97d8>
> still not evaluated!
squared: <generator object <genexpr> at 0x7f2fa88d9830>
> still not evaluated!
processing... 0
processing... 1
processing... 2
processing... 3
evaluated: [1, 9]
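
You do not always need `list` to consume such a pipeline. When a generator expression is the only argument of a call, the extra parentheses can be dropped, and built-ins such as `sum` or `any` consume it item by item. A short sketch:

```python
import itertools

# sum consumes the generator expression item by item, no list is built
total = sum(x ** 2 for x in range(4) if x % 2)
print(total)  # 10  (1 + 9)

# any() stops at the first truthy item, so it terminates even on an infinite source
found = any(i > 1000 for i in itertools.count())
print(found)  # True
```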


### Anti-pattern: never do this for simple loops!

In [45]:
birds = ('pidget', 'eagle', 'falcon', 'pidget')  # some iterable

# ugly golang-style ;)
results = []
for bird in birds:
    results.append(bird.upper())
    
results

['PIDGET', 'EAGLE', 'FALCON', 'PIDGET']

In [46]:
# or even worse: ugly C-style ;)
results = set()
for i in range(len(birds)):
    results.add(birds[i].upper())
    
results

{'EAGLE', 'FALCON', 'PIDGET'}

> These kinds of small loops are:
- less pythonic
- less performant
- more verbose
- less readable
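
For comparison, here is a sketch of the same two results written as comprehensions (the names `upper_list` and `upper_set` are introduced here for illustration):

```python
birds = ('pidget', 'eagle', 'falcon', 'pidget')  # same iterable as above

# list comprehension instead of the append loop
upper_list = [bird.upper() for bird in birds]
print(upper_list)  # ['PIDGET', 'EAGLE', 'FALCON', 'PIDGET']

# set comprehension instead of the index-based loop
upper_set = {bird.upper() for bird in birds}
print(sorted(upper_set))  # ['EAGLE', 'FALCON', 'PIDGET']
```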

### Another typical example: welcome `str.join`!

- unlike JS, `join` is defined on the string (the separator) and not on the array
- makes sense because it is not restricted to arrays!
- works on any `Iterable[str]`

In [47]:
'-'.join('ABCDEF')  # a str is a sequence of chars that are themselves str

'A-B-C-D-E-F'

In [48]:
','.join(['First', 'Second', 'Third'])  # a list

'First,Second,Third'

In [49]:
''.join('You are tearing me apart Lisa'.split())  # another list

'YouaretearingmeapartLisa'

In [50]:
# even with a generator comprehension:
','.join(s for s in ['First', 'Second', 'Third'] if 'i' in s)

'First,Third'

### Pro-tip: A lot of functions take any iterable (not just a list)

In [51]:
sorted('Awesome')

['A', 'e', 'e', 'm', 'o', 's', 'w']

In [52]:
list(reversed('Awesome'))

['e', 'm', 'o', 's', 'e', 'w', 'A']

In [53]:
# reversed returns a lazy iterator as well (not a list):
reversed('Awesome')

<reversed at 0x7f2fa887c860>
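
One caveat: `sorted` accepts any iterable, but `reversed` needs a sequence (or an object that defines `__reversed__`), so it rejects a generator. A small sketch of the difference:

```python
letters = (c for c in 'Awesome')  # a generator: iterable, but not a sequence

# sorted accepts any iterable
sorted_letters = sorted(letters)
print(sorted_letters)  # ['A', 'e', 'e', 'm', 'o', 's', 'w']

# reversed refuses a generator because it cannot start from the end
try:
    reversed(c for c in 'Awesome')
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```

If you need the items of a generator in reverse order, materialize it first, for example with `list`.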

#### Exercises

In [54]:
# How many distinct characters in a phrase?
# (should ignore the case: 'A' and 'a' should be counted as the same char)

def distinct_chars(s):
    return 0

distinct_chars('This Is A Really Long String') # should be 13

0

In [55]:
# Remove all words that have n characters or fewer
def remove_small_words(s, n):
    return ''

remove_small_words('What the hell are you doing', 3) # should be 'What hell doing'

''

In [56]:
# Create a string in the shape 1-2-3-4...
def generate_int_list_string(n):
    return ''

generate_int_list_string(5) # should be '1-2-3-4-5'

''