# Agenda

1. Threads + multiprocessing with `map`
2. `asyncio`
3. NumPy

In [1]:
# list comprehension

numbers = range(10)

[one_number ** 2
 for one_number in numbers]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [2]:

numbers = range(10)

[one_number ** 2               # expression ... what we used to use "map" for
 for one_number in numbers
 if one_number % 2]            # condition ... what we used to use "filter" for

[1, 9, 25, 49, 81]
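
For comparison, here is a small sketch of the same computation written with `map` and `filter`, to show what the comprehension replaces. The variable names `with_comprehension` and `with_map_filter` are made up for this illustration.

```python
numbers = range(10)

# The comprehension from above: square the odd numbers
with_comprehension = [one_number ** 2
                      for one_number in numbers
                      if one_number % 2]

# The same thing with the older tools: filter picks the odd numbers,
# then map squares each one that survived
with_map_filter = list(map(lambda n: n ** 2,
                           filter(lambda n: n % 2, numbers)))

print(with_comprehension)
print(with_map_filter)
```

Both `map` and `filter` return lazy iterators, which is why the result is wrapped in `list`.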

In [5]:
# how does map work?

# we pass map two arguments:
# 1. a function that takes one argument
# 2. an iterable of data

words = 'this is a bunch of words'.split()

list(map(len, words))

[4, 2, 1, 5, 2, 5]

In [7]:
def count_vowels(s):
    total = 0

    for one_character in s:
        if one_character in 'aeiou':
            total += 1

    return total

words = 'this is a fantastic and enticing and superfabulous sentence'.split()

list(map(count_vowels, words))

[1, 1, 1, 3, 1, 3, 1, 6, 3]

In [9]:
print(*map(count_vowels, words))

1 1 1 3 1 3 1 6 3


# Exercise: Longest words

1. Write a function that takes a filename (string) as an argument, and returns the longest word found in that file.
2. Use the `map` method of both `ThreadPoolExecutor` and `ProcessPoolExecutor` (from `concurrent.futures`) to call this function on a list of filenames.
3. Print the longest words that you found.
4. Also: How much time does it take to run these? (You can use `time.time` to get the number of seconds since January 1, 1970, the Unix epoch.)

In [10]:
import time
time.time()

1704269326.356544
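
One possible shape for the threaded half of the exercise is sketched below. It is not the only solution. The sample files and their contents are invented here so that the example is self-contained, and "word" is assumed to mean anything separated by whitespace. `ProcessPoolExecutor` offers the same `map` method, so it could be swapped in, although with processes the code that starts the pool generally belongs under an `if __name__ == '__main__':` guard.

```python
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def longest_word(filename):
    """Return the longest whitespace-separated word in the named file."""
    longest = ''
    with open(filename) as infile:
        for one_line in infile:
            for one_word in one_line.split():
                if len(one_word) > len(longest):
                    longest = one_word
    return longest


# Hypothetical sample data, created only so the sketch can run anywhere
sample_texts = ['this is a bunch of words',
                'a fantastic and superfabulous sentence']

with tempfile.TemporaryDirectory() as dirname:
    filenames = []
    for index, text in enumerate(sample_texts):
        path = Path(dirname) / f'sample{index}.txt'
        path.write_text(text)
        filenames.append(str(path))

    start_time = time.time()
    with ThreadPoolExecutor() as executor:
        # Executor.map works like the builtin map, but each call
        # is handed to a worker thread
        longest_words = list(executor.map(longest_word, filenames))
    elapsed = time.time() - start_time

print(longest_words)
print(f'{elapsed:.4f} seconds')
```

`Executor.map` yields results in the same order as the inputs, regardless of which worker finishes first.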

# `asyncio`

Reactor model -- one process and one thread.

The way it works:
- We put all of our tasks (functions, basically) as elements on a list
- We run a `for` loop on the list, and let each function run a little bit
- When the function ends, we remove it from our list
- We can always add new functions
- So long as there are tasks on the list, we re-run our `for` loop from the start
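
The loop described above can be modeled in a few lines with plain generators. This is only a toy sketch of the idea, not how `asyncio` is actually implemented; the `countdown` function and the `log` list are made up for the illustration.

```python
def countdown(name, n, log):
    """A toy task: records one step, then hands control back to the loop."""
    while n > 0:
        log.append(f'{name} {n}')
        n -= 1
        yield          # pause here so that other tasks get a turn


log = []
tasks = [countdown('a', 2, log), countdown('b', 3, log)]

# So long as there are tasks on the list, re-run the for loop
while tasks:
    for one_task in list(tasks):       # loop over a copy, since we remove items
        try:
            next(one_task)             # let the task run a little bit
        except StopIteration:
            tasks.remove(one_task)     # the task ended, so drop it

print(log)
# → ['a 2', 'b 3', 'a 1', 'b 2', 'b 1']
```

The output interleaves the two tasks, even though everything runs in one thread and each task is paused only at its `yield`.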

## Advantages
- Many more tasks can run concurrently than with threads or processes, because each task is very lightweight
- We don't have to worry about thread safety
- We know exactly when a function might be stopped
- We have access to global variables

## Disadvantages
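- No ordinary blocking I/O: a blocking call stalls the entire event loop, so we need `asyncio`-aware libraries for I/O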
- We don't run our functions directly; calling an `async def` function only creates a coroutine, which the event loop then runs
- We add two words to Python's vocabulary:
    - `async` -- `async def` creates a function that can be run via the event loop
    - `await` -- very similar to `yield`, in that it says: Whatever is to my right will take a long time, so I'll go to sleep waiting for it

# Next up:

- More `asyncio`
- NumPy

Resume at 11:20

# Terms

- `async def` -- syntax used in Python to create a function that, when run, gives us a coroutine back
- coroutine -- the body of a function, along with its stack frame, that we get back from calling an `async def` function. We can run a coroutine directly with `asyncio.run` or by putting it to the right of `await`.
- task -- a wrapper for a coroutine, placed on the event loop, and executed whenever the event loop wants.
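
A short sketch of the first two terms: calling an `async def` function does not run its body, but returns a coroutine object that still has to be run. The function name `add` is made up for this example.

```python
import asyncio
import inspect


async def add(x, y):
    return x + y


coro = add(2, 3)                      # nothing has been computed yet
is_coro = inspect.iscoroutine(coro)   # True: we got a coroutine back

result = asyncio.run(coro)            # now the body actually runs

print(is_coro)
print(result)
```

In a Jupyter notebook an event loop is already running, so `asyncio.run` raises an error there; writing `await add(2, 3)` directly in a cell is the usual alternative.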

If I really want to see things run concurrently, then I need to:
- Create several tasks, based on coroutines
- Run all of the tasks as a group
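
A minimal sketch of those two steps, using `asyncio.create_task` to wrap each coroutine in a task and `asyncio.gather` to wait for all of them. The `greet` coroutine and its 0.1-second delay are invented for the illustration.

```python
import asyncio
import time


async def greet(name, delay):
    await asyncio.sleep(delay)     # stand-in for something slow, such as network I/O
    return f'Hello, {name}'


async def main():
    # Step 1: create several tasks, based on coroutines
    tasks = [asyncio.create_task(greet(one_name, 0.1))
             for one_name in ['a', 'b', 'c']]

    # Step 2: run all of the tasks as a group and collect their results
    return await asyncio.gather(*tasks)


start_time = time.time()
results = asyncio.run(main())
elapsed = time.time() - start_time

print(results)
print(f'{elapsed:.2f} seconds')
```

Because the three sleeps overlap, the total time should be close to 0.1 seconds rather than 0.3. `asyncio.gather` returns the results in the order the tasks were passed in.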



# NumPy

NumPy array:
- Python API
- C implementation



In [11]:
import numpy as np

In [12]:
# np.ndarray is the data structure we want
# but we'll build it with np.array, passing it a list

a = np.array([10, 20, 30, 40, 50, 60])
a

array([10, 20, 30, 40, 50, 60])
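
A quick way to confirm what `np.array` handed back is to inspect the object. This small sketch only uses standard attributes of NumPy arrays.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50, 60])

print(type(a))     # the class of the object that np.array returned
print(a.ndim)      # number of dimensions
print(a.shape)     # length along each dimension, as a tuple
print(a.dtype)     # element type; the default integer type depends on the platform
```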

In [13]:
# many standard Python things work with a NumPy array

a[0]

10

In [14]:
a[1]

20

In [15]:
a[-1]

60

In [17]:
a[2:5]

array([30, 40, 50])

In [18]:
for one_item in a:
    print(one_item)

10
20
30
40
50
60


In [19]:
# some other ways to create NumPy arrays

a = np.random.randint(0, 100, 10)    # 10 ints from 0-99 (the upper bound, 100, is excluded)
a

array([12, 25, 33, 61, 32, 52, 97, 91, 40, 66])

In [20]:
a = np.random.rand(10)   # 10 floats from 0 up to (but not including) 1
a

array([0.29683241, 0.93433433, 0.84712284, 0.47561611, 0.74451551,
       0.85910408, 0.86507627, 0.50345427, 0.43890329, 0.6056343 ])

In [21]:
a = np.arange(20)   # all numbers from 0-19
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [22]:
# fancy indexing -- I can request multiple indexes

np.random.seed(0)   # fix the seed, so that we get the same "random" values each time
a = np.random.randint(0, 100, 10)
a

array([44, 47, 64, 67, 67,  9, 83, 21, 36, 87])

In [23]:
a[  [3, 5, 2]  ]

array([67,  9, 64])

In [24]:
# methods on our NumPy array

a.sum()

525

In [25]:
a.min()

9

In [26]:
a.max()

87

In [27]:
a.mean()

52.5

In [28]:
a.std()

24.3567239176372

In [29]:
a.size   # not a method!

10

In [30]:
a

array([44, 47, 64, 67, 67,  9, 83, 21, 36, 87])

In [31]:
# what happens when I add a to itself?
a + a

array([ 88,  94, 128, 134, 134,  18, 166,  42,  72, 174])

In [32]:
a1 = np.array([10, 20, 30, 40, 50])
a2 = np.array([100, 200, 300, 400, 500])
a3 = np.array([25, 50, 75])

a1 + a3

ValueError: operands could not be broadcast together with shapes (5,) (3,) 

In [33]:
a1 / a2

array([0.1, 0.1, 0.1, 0.1, 0.1])

In [34]:
# broadcast

a

array([44, 47, 64, 67, 67,  9, 83, 21, 36, 87])

In [35]:
a + a

array([ 88,  94, 128, 134, 134,  18, 166,  42,  72, 174])

In [36]:
a + 3 

array([47, 50, 67, 70, 70, 12, 86, 24, 39, 90])

In [37]:
a ** 3

array([ 85184, 103823, 262144, 300763, 300763,    729, 571787,   9261,
        46656, 658503])

In [40]:
# the ** operator invokes the __pow__ method behind the scenes
a.__pow__(3)

array([ 85184, 103823, 262144, 300763, 300763,    729, 571787,   9261,
        46656, 658503])
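
Broadcasting is also what decides whether two arrays of different shapes can be combined. The sketch below reuses `a1` and `a3` from the earlier cell: shapes `(5,)` and `(3,)` are incompatible, but reshaping `a1` into a column of shape `(5, 1)` lets NumPy stretch both operands into a `(5, 3)` result. The names `column` and `table` are made up for the illustration.

```python
import numpy as np

a1 = np.array([10, 20, 30, 40, 50])
a3 = np.array([25, 50, 75])

# Shapes (5,) and (3,) cannot be broadcast together
try:
    a1 + a3
except ValueError as e:
    error_message = str(e)
    print(error_message)

# A dimension of length 1 can be stretched to match the other operand
column = a1.reshape(5, 1)      # shape (5, 1)
table = column + a3            # shape (5, 3): every element of a1 plus every element of a3

print(table.shape)
print(table)
```

A scalar such as the `3` in `a + 3` is the simplest case: it is treated as if it were stretched to the shape of `a`.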

# Exercise: NumPy basics

1. Create two NumPy arrays -- one with the minimum temps for the next 10 days, and the other with the maximum temps for the next 10 days.
2. Find the mean high temp in the next 10 days.
3. Find the mean high temp in the next 3 days.
4. Find the mean of the difference in temperatures in the next few days.
5. Convert the two arrays to Fahrenheit, then find the mean difference between temperatures in the coming days.

In [41]:
min_temps = np.array([13, 12, 12, 11, 11, 11, 10, 12, 10, 8])
max_temps = np.array([19, 20, 18, 22, 24, 22, 19, 18, 16, 15])


In [42]:
max_temps.mean()

19.3

In [44]:
max_temps[:3].mean()

19.0

In [46]:
# fancy indexing
max_temps[range(3)].mean()

19.0

In [47]:
max_temps[[0, 1, 2]].mean()

19.0

In [51]:
(max_temps - min_temps).mean()

8.3

In [52]:
min_temps * (9/5) + 32

array([55.4, 53.6, 53.6, 51.8, 51.8, 51.8, 50. , 53.6, 50. , 46.4])

In [53]:
max_temps * (9/5) + 32

array([66.2, 68. , 64.4, 71.6, 75.2, 71.6, 66.2, 64.4, 60.8, 59. ])

In [56]:
((max_temps * (9/5) + 32) - 
 (min_temps * (9/5) + 32)).mean()

14.940000000000001
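
The Fahrenheit result can be cross-checked: the `+ 32` offsets cancel when one array is subtracted from the other, so the mean difference should equal the Celsius mean difference multiplied by 9/5. The helper `c_to_f` is a made-up name for this sketch.

```python
import numpy as np

min_temps = np.array([13, 12, 12, 11, 11, 11, 10, 12, 10, 8])
max_temps = np.array([19, 20, 18, 22, 24, 22, 19, 18, 16, 15])


def c_to_f(celsius):
    """Convert Celsius to Fahrenheit; works on scalars and on whole arrays."""
    return celsius * (9 / 5) + 32


mean_diff_f = (c_to_f(max_temps) - c_to_f(min_temps)).mean()
shortcut = (max_temps - min_temps).mean() * (9 / 5)

print(mean_diff_f)
print(shortcut)
print(np.isclose(mean_diff_f, shortcut))
```

The two values may differ in the last decimal places because of floating-point rounding, which is why the comparison uses `np.isclose` rather than `==`.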

In [57]:
a

array([44, 47, 64, 67, 67,  9, 83, 21, 36, 87])

In [58]:
a = np.array([10, 20, 30, 40, 50])
a

array([10, 20, 30, 40, 50])

In [59]:
a[[2, 4]]

array([30, 50])

In [60]:
# what happens if I do this?
# this is a boolean index / mask index -- only where we have a True value do we get a result
a[ [True, False, False, True, True] ]

array([10, 40, 50])
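
In practice a mask like this is rarely typed by hand. A common way to build one is with a comparison, which is broadcast across the array and returns one boolean per element. The threshold of 25 is arbitrary, chosen only for this sketch.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

mask = a > 25          # compares every element with 25, giving a boolean array
print(mask)

selected = a[mask]     # keeps only the elements where the mask is True
print(selected)

print(a[a > 25])       # the same thing in one step
```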

In [None]:
# broadca