Welcome to Lesson 3 of the Noisebridge Python class! ([Noisebridge Wiki](https://www.noisebridge.net/wiki/PyClass) | [GitHub](https://github.com/audiodude/PythonClass))

In this lesson, we will begin studying **algorithms**. An algorithm is a process or sequence of steps used to perform a task or complete a computation. We will start with basic, everyday algorithms and proceed to some more traditional "computer science" algorithms.

Before we jump into that, however, we will try to make sure you can write basic Python scripts on your own. This means we will go over some of the more nitpicky details of the language that we may have glossed over in previous lessons, such as indentation and function definitions.

In part 1, you will learn:

* Specifics about how Python indentation/whitespace works
* Definitions of positional and keyword arguments to functions
* The special Python arguments: `*args` and `**kwargs`
* Using keyword arguments to extend functionality in a backwards compatible way

And in part 2:

* An algorithm for finding the biggest of a list of numbers
* An algorithm for counting characters in a string
* Using the output of one algorithm as input to another

# Part 1: Language Details

## Whitespace
You may have heard that in Python, "whitespace matters". If you've been editing the code examples in these notebooks, or writing code in a Replit, you might have already come across an `IndentationError` or two.

The rule in Python, in general, is that additional levels of indentation are used to define increasingly nested blocks of code. Inside a given "block", all of the lines that comprise the block must have the same indentation, i.e. the same number of tabs or spaces.

In [None]:
# Top level pseudo-block (not actually a block), no indentation
x = 42
y = x * 2
if y > x:
    # The colon above starts a block, inside the if statement
    print('y is greater')
    z = y * 2
    # Note that every line of code in this block aligns
    
for i in range(x):
    if i > 39 and i % 2 == 0:
        # A second block requires a new level of indentation
        print('%s is over 39 and even' % i)
        continue

    # Blank lines don't matter
    if i % 17 == 0:
            # The indentation of a block doesn't need to match
            # the indentation of other sibling blocks. It just
            # needs to be internally consistent
            print('%s is divisible by 17' % i)
            x2 = i + 10
            
    # Outdenting means the end of the block. This line runs after
    # each of the if statements above
    y2 = x - i
    # This is an IndentationError, because there's no new block,
    # but the indentation of the next line doesn't match the previous
    #  y3 = y2 * 3

*(note to self, uncomment the indentation error and demonstrate that)*
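
One way to see the error without breaking the rest of a cell is to hand the badly indented text to Python's built-in `compile()` function, which parses code without running it. This is only a sketch: the `bad_source` string and the `result` variable are made up for illustration.

```python
# Hypothetical snippet: the second assignment is indented even though
# no new block was started on the line before it.
bad_source = "y2 = 10\n  y3 = y2 * 3\n"

try:
    # compile() only parses the text, it does not execute it
    compile(bad_source, '<example>', 'exec')
    result = 'compiled fine'
except IndentationError as err:
    result = 'IndentationError: %s' % err.msg

print(result)
```

Because the problem is caught while Python is still reading the code, none of the statements in the snippet get a chance to run.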

---

## Function definitions

Let's move on to function definitions. Functions can have any number of **positional arguments** and **keyword arguments**. Positional arguments are what we have seen so far. They are required when calling a function:

In [None]:
def my_func(pos1, pos2):
    print(pos1, pos2)

my_func(42, 'foo')
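
To illustrate what "required" means, here is a small sketch that leaves out the second positional argument. The `outcome` variable is only there for illustration; the `try`/`except` is used so the example keeps running after the error.

```python
def my_func(pos1, pos2):
    print(pos1, pos2)

try:
    my_func(42)  # pos2 is missing
    outcome = 'no error'
except TypeError as err:
    # Python refuses the call because a required argument is missing
    outcome = 'TypeError'
    print('TypeError:', err)
```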

Keyword arguments are optional and are defined with a **default value**. If the function is called with a given keyword argument missing, the default value is used inside the function. Otherwise, you can assign a value to a keyword argument when calling a function by specifying the name of the argument with an equal sign, then the value.

In [None]:
def my_func2(pos1, pos2, kw1=42):
    print(pos1, pos2, kw1)
    
my_func2(10, 'red', kw1=100)
my_func2(20, 'blue')

When calling a function, you must specify the keyword arguments *after* the positional arguments. So the following is an error:

In [None]:
my_func2('foo', kw1='bar', 100)

Keyword arguments themselves, however, can be specified in any order.

In [None]:
def my_func3(pos1, pos2, kw1='foo', kw2='bar', kw3='baz'):
    print(pos1, pos2, kw1, kw2, kw3)
    
my_func3(10, 20, kw3='red', kw2='blue')

You can also specify keyword arguments as if they were positional arguments:

In [None]:
my_func3('red', 'yellow', 'blue')

And positional arguments as if they were keyword arguments:

In [None]:
my_func3(pos1=10, pos2=20)

Though in practice, doing so can cause confusion for folks who are reading your code.

---

## Function 'scope'

In programming, scope refers to the places where you can refer to a variable. A variable that you can refer to without a `NameError` is referred to as being "in-scope".

In [None]:
def get_discount(price):
    return round(price * 0.8, 2)

prices = [1.50, 2.10, 3.29]
for p in prices:
    print(get_discount(p))

The scope of the variable `price`, as defined in the `get_discount` function definition, is only within the `get_discount` function. You can't refer to that variable outside of the function:

In [None]:
discount_multiplier = 0.8

def get_discount(price):
    return round(price * discount_multiplier, 2)

prices = [1.50, 2.10, 3.29]
print(f'Applying discounts with {discount_multiplier}')
for p in prices:
    print(get_discount(p))
    print(price)  # NameError

You can use the same variable name in multiple places, and they will refer to different things:

In [None]:
def get_discount(price):
    return price * 0.8

def apply_discounts():
    price = 1.29
    # The get_discount function doesn't use the price variable
    # we just defined.
    discounted = get_discount(10.59)
    print(discounted)

apply_discounts()

In Python, blocks do not interact with scope. So if you have a variable that is assigned in an `if` or `for` block, it is still available after the block is finished. This can be surprising!

In [None]:
prices = [1.50, 2.10, 3.29]

for price in prices:
    get_discount(price)

# Price still refers to the last thing it was assigned in the for loop!
print(price)

## Adding default arguments

A popular pattern when writing Python code is to use keyword arguments to introduce new features to a function without having to update all of the existing places where it is called.

In [None]:
def find_job(database, cpu):
    workers = []
    for name, cycles in database.items():
        if cycles >= cpu:
            workers.append(name)
    return workers
        
def find_increasing_jobs(database):
    candidates = {}
    for i in range(0, 100, 10):
        candidates[i] = find_job(database, i)
    return candidates
        
db = {
    'alpha': 45,
    'beta': 55,
    'gamma': 91,
    'phi': 27,
}

data = find_increasing_jobs(db)
print(data)

We can add an argument for only returning the first job that meets our criteria. The main thing here to consider is that the default value of the argument should match the behavior before we modified the code. Here we introduce the `first_only` keyword argument, and set it to `False` because the old version of the function behaved as if this value was `False`.

In [None]:
def find_job(database, cpu, first_only=False):
    workers = []
    for name, cycles in database.items():
        if cycles >= cpu:
            workers.append(name)
            if first_only:
                break
    return workers

def find_first_increasing_jobs(database):
    candidates = {}
    for i in range(0, 100, 10):
        candidates[i] = find_job(database, i, first_only=True)
    return candidates

data = find_increasing_jobs(db)
print(data)

print('===')

data2 = find_first_increasing_jobs(db)
print(data2)

## Iterating over dictionaries

We can iterate over dictionaries by calling the dictionary's `.items()` method. It returns an iterable of tuples, with each tuple containing a key and its value:

In [None]:
prices = {
    'apple': 0.99,
    'orange': 1.29,
    'watermelon': {'one': 1.79, 'two': 2.99},
}

for key, value in prices.items():
    print(f'Key is {key}, value is {value}')
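
Dictionaries also have `.keys()` and `.values()` methods, for when you only need one half of each pair. The sketch below uses a smaller, made-up `prices` dictionary; the `names` and `total` variables are just for illustration.

```python
prices = {'apple': 0.99, 'orange': 1.29}

# .keys() gives just the keys
names = []
for name in prices.keys():
    names.append(name)

# .values() gives just the values
total = 0
for price in prices.values():
    total += price

print(names)
print(round(total, 2))
```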

---

# Part 2: Algorithms

Let's say we have a list of numbers:

In [9]:
numbers = [10, 42, -5, 17, 23, -12, 34, 35, 8]

How do we find the largest number in the list?

The algorithm looks something like this:

1. Take the first number from the list, and assume it is the biggest.
1. Compare it to the next number in the list. If the next number is bigger, consider that the biggest.
1. Continue in this way, comparing to the next number in the list each time, until you reach the end of the list.

So we "walk" or "iterate" through the list, comparing the current "biggest number candidate" to the next number, until we reach the end of the list.

How would we write code for this algorithm in Python?

In [7]:
# Implement the biggest number algorithm

In [11]:
def find_biggest(nums):
    # Use the `nums` parameter, not the global `numbers` list, so the
    # function works for whatever list is passed in
    biggest = nums[0]
    for n in nums[1:]:
        if n > biggest:
            biggest = n
    return biggest

print(find_biggest(numbers))

42


Here, on the first line of the function body, we assign the first number in the list (`nums[0]`) to the variable `biggest`. On the next line, we create a for loop for iterating over the rest of the numbers (`nums[1:]`, from index 1 to the end). Remember from a previous lesson that a for loop assigns each element in the list to the loop variable (`n`) in order.

Now, taking each number `n` in order, we compare it to our candidate `biggest`. If it is bigger than our candidate, it becomes the new candidate for biggest (the assignment inside the `if` block).
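
As a sanity check, we can compare our function against Python's built-in `max` function, which solves the same problem. The sketch below repeats the function so it runs on its own, and also shows what happens with an empty list: the algorithm assumes there is a first number to start from, so there is nothing to assign to `biggest`. The `empty_result` variable is only for illustration.

```python
def find_biggest(nums):
    biggest = nums[0]
    for n in nums[1:]:
        if n > biggest:
            biggest = n
    return biggest

numbers = [10, 42, -5, 17, 23, -12, 34, 35, 8]
# Our answer should agree with the built-in max()
print(find_biggest(numbers) == max(numbers))

try:
    find_biggest([])  # there is no first number to start with
    empty_result = 'no error'
except IndexError:
    empty_result = 'IndexError'
print(empty_result)
```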

Let's try another algorithm. How would you count the number of occurrences of each letter, number, and punctuation mark in a string?

In [3]:
# We use \' to "escape" the single quote,
# since we are using single quotes as the string delimiter
s = 'Hello, how are you? I\'m learning Python'

The steps are as follows:

1. Create an empty dict that will hold the mapping between character and number of occurrences.
1. Iterate over each character in the string. For each one:
    1. If there is an entry in the dict for the character, increment the entry by 1.
    1. Otherwise, create an entry in the dictionary for the character and set it to 1.
  
How would we write *this* algorithm in Python?

In [12]:
# Implement the occurrences counting algorithm

In [4]:
from pprint import pprint

def count_occurrences(string):
    answer = {}  # Empty dictionary for holding our character -> occurrences mapping
    for char in string:
        if char in answer:
            answer[char] += 1 # `foo += 1` is the same as `foo = foo + 1`
        else:
            answer[char] = 1
    return answer

pprint(count_occurrences(s))

{' ': 6,
 "'": 1,
 ',': 1,
 '?': 1,
 'H': 1,
 'I': 1,
 'P': 1,
 'a': 2,
 'e': 3,
 'g': 1,
 'h': 2,
 'i': 1,
 'l': 3,
 'm': 1,
 'n': 3,
 'o': 4,
 'r': 2,
 't': 1,
 'u': 1,
 'w': 1,
 'y': 2}


The important thing to take away from these exercises is that **an algorithm is different than the code that implements it**. The algorithm is the abstract set of steps that lead to a solution in the general case. The code *implements* the algorithm, but it's possible there are multiple ways of implementing the same algorithm. Think especially of implementing an algorithm in different programming languages. The code is of course different but the algorithm is the same.
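
To make this concrete, here is a sketch of a second implementation of the same counting algorithm. It is not the version used in this lesson: the name `count_occurrences_get` is made up, and it relies on the dictionary method `.get(key, default)`, which returns the default when the key is missing. Python's standard library also ships `collections.Counter`, which does the same counting for us, so we can use it to check our work.

```python
from collections import Counter

def count_occurrences_get(string):
    answer = {}
    for char in string:
        # .get(char, 0) is the current count, or 0 if we haven't seen char yet
        answer[char] = answer.get(char, 0) + 1
    return answer

word = 'hello'
print(count_occurrences_get(word))
# Counter is a kind of dict; dict() converts it to a plain dict for comparison
print(count_occurrences_get(word) == dict(Counter(word)))
```

The steps are the same in both cases (walk the string, keep a running count per character); only the code differs.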

Sometimes the output of one algorithm can be used for a different purpose, such as implementing a different algorithm. For example, consider an **isogram**, which is a word with no repeating letters or numbers, whether in a row or not. How would we use the output of our occurrences counting algorithm to create an algorithm which determines if a given string is an isogram?

In [1]:
'''
1. Go through the steps of the occurrence counting algorithm
2. Take the final dictionary, and check if any of the values are greater than 1
'''
print('(Steps described here)')

(Steps described here)


And what would the resulting code look like?

In [25]:
def is_isogram1(string):
    occurrences = count_occurrences(string)
    for v in occurrences.values():
        if v > 1:
            return False
    return True

def is_isogram2(string):
    return not any([x > 1 for x in count_occurrences(string).values()])

s = 'Tower'
print(is_isogram1(s))
print(is_isogram2(s))

True
True


These are two ways of implementing an algorithm for checking if a string is an isogram. Notice that both functions call the `count_occurrences` function (which we defined above). The output of that algorithm is the input to this algorithm. If you didn't have a `count_occurrences` function already defined, you could put the implementation "inline" inside the `is_isogram` functions: 

In [4]:
def is_isogram3(string):
    # Let's assume that this function doesn't exist
    # occurrences = count_occurrences(string)
    
    # We can include its code in our is_isogram function:
    occurrences = {}  # Empty dictionary for holding our character -> occurrences mapping
    for char in string:
        if char in occurrences:
            occurrences[char] += 1 # `foo += 1` is the same as `foo = foo + 1`
        else:
            occurrences[char] = 1

    for v in occurrences.values():
        if v > 1:
            return False
    return True

print(is_isogram3('Water'))

True
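
One detail worth knowing: because the counting step treats `'D'` and `'d'` as different characters, all of these functions are case-sensitive. If you wanted to ignore case, one possible approach (a sketch, with the made-up name `is_isogram_ignoring_case`) is to lower-case the string before counting. The helper functions are repeated here so the example runs on its own.

```python
def count_occurrences(string):
    answer = {}
    for char in string:
        if char in answer:
            answer[char] += 1
        else:
            answer[char] = 1
    return answer

def is_isogram(string):
    for v in count_occurrences(string).values():
        if v > 1:
            return False
    return True

def is_isogram_ignoring_case(string):
    # Hypothetical variant: treat 'D' and 'd' as the same letter
    return is_isogram(string.lower())

print(is_isogram('Dad'))
print(is_isogram_ignoring_case('Dad'))
```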


Before we move on to other algorithms, let's quickly introduce a new kind of loop. So far, we have seen the `for` loop, which iterates over a given list (or other iterable data structure), and assigns each item in the list to a "loop variable":

In [5]:
stuff = [3, 5, 2, 4]
for s in stuff:
    print(s)

3
5
2
4


There is also a construct called a `while` loop. A `while` loop continues executing the body of the loop over and over as long as the condition of the `while` loop evaluates to `True`. Here is an example:

In [6]:
x = ''
while len(x) < 30:
    x += 'hello '
print(x)

hello hello hello hello hello 


The most important thing about a `while` loop is that the body has to update the data that eventually makes the loop end. In the above example, the loop ends based on the length of `x`, and the body of the loop makes `x` longer every time. If your loop body does not influence the *condition* being tested, you could end up with an **infinite loop**, a common programming error where your program just "hangs" and runs forever.

In [None]:
idx = 0
while idx < len(stuff):
    print(stuff[idx])
    # This is an infinite loop. It will print the item at stuff[0] forever
    # What we need is an assignment statement for `idx` so that it grows
    # every time and eventually is not less than the length of `stuff`.
    # idx += 1
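
For comparison, here is a sketch of the same loop with the missing update restored, so that it ends once `idx` reaches the length of the list. The `printed` list is only there for illustration, to record what the loop visited.

```python
stuff = [3, 5, 2, 4]
printed = []

idx = 0
while idx < len(stuff):
    print(stuff[idx])
    printed.append(stuff[idx])
    # Move to the next index so the condition eventually becomes False
    idx += 1
```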

Finally, let's consider an algorithm like those taught in computer science classes. Here we will look at the problem of sorting a list, in this case a list of numbers (you can also sort lists of strings, or anything that has a defined order).

Given the following input, we would like to produce the indicated output:

In [2]:
def sort(input_list):
    pass
    # The special keyword 'pass' is a placeholder. Python will not
    # allow you to have an empty block after a function definition, if
    # statement or for loop, so if you haven't figured out what goes
    # somewhere yet, or if you've commented out all of the actual statements
    # you will need to use 'pass'.

input_numbers = [10, 42, -5, 17, 23, -12, 34, 35, 8]
output_numbers = sort(input_numbers)
# [-12, -5, 8, 10, 17, 23, 34, 35, 42]

There are many (dozens of) algorithms for sorting lists. Some are more efficient than others, as measured by the number of steps needed to sort the list as it grows. Describing how the number of steps an algorithm takes grows with the size of its input is the job of [Big O Notation](https://www.freecodecamp.org/news/big-o-notation-why-it-matters-and-why-it-doesnt-1674cfa8a23c/), which is covered extensively in undergraduate computer science courses but is not particularly important for a practical coder.

We will be looking at an algorithm called **insertion sort**. The general idea is to iterate through the list and build another list *inside* the list being sorted, that contains all of the numbers that we have already sorted. At each step, we grab the next value in `input_numbers` and place it in the location in the sorted list where it belongs.

The steps are as follows:

1. Assume the first number of the list is sorted, relative to itself. A list with one number can always be considered already "sorted".
2. For each remaining number in the list:
    1. Assign the variable `candidate` to the new number.
    1. Compare `candidate` to the number to its left. (So if `candidate` is at `input_numbers[j]`, compare it to the value at `input_numbers[j-1]`.)
    2. If it is smaller than that number, swap them
    3. Continue until you reach a number that is not smaller, or the left end of the list.
    
Let's look at the Python code that implements this algorithm.

In [16]:
def sort(numbers):
    for idx in range(1, len(numbers)):
        j = idx
        while j > 0 and numbers[j - 1] > numbers[j]:
            temp = numbers[j]
            numbers[j] = numbers[j - 1]
            numbers[j - 1] = temp
            j = j - 1

In [17]:
sort(input_numbers)
print(input_numbers)

[-12, -5, 8, 10, 17, 23, 34, 35, 42]
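
A good habit with any sorting code is to check it against something you trust. The sketch below compares an insertion sort with Python's built-in `sorted()` function on a few hand-picked lists, including an empty list and a list with duplicates. The function is repeated here under the made-up name `insertion_sort` so the example runs on its own, and it swaps with tuple assignment (`a, b = b, a`), which does the same job as the three-line swap with `temp`.

```python
def insertion_sort(numbers):
    # Sorts the list in place, following the swap-based steps above
    for idx in range(1, len(numbers)):
        j = idx
        while j > 0 and numbers[j - 1] > numbers[j]:
            numbers[j], numbers[j - 1] = numbers[j - 1], numbers[j]
            j -= 1

test_lists = [
    [],
    [7],
    [3, 3, 1, 2],
    [5, 4, 3, 2, 1],
    [10, 42, -5, 17, 23, -12, 34, 35, 8],
]

all_match = True
for original in test_lists:
    working = list(original)  # copy, so the original list is untouched
    insertion_sort(working)
    if working != sorted(original):
        all_match = False

print(all_match)
```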


That's it for this lesson! It was expected that this might be a tough one compared to what we have covered before, so it's okay if you got a little bit lost. For extra help, be sure to come to the upcoming **review session**. Here is a [great lesson](https://www.freecodecamp.org/news/what-is-an-algorithm-definition-for-beginners/) on algorithms from Free Code Camp if you'd like to learn more.

---

# Appendix: `*args` and `**kwargs`

Python provides the special parameters `*args` and `**kwargs`, that capture all of the remaining positional (`*args`) and keyword (`**kwargs`) arguments to a function. Let's see this in practice.

In [None]:
def color_them(color, *args):
    for arg in args:
        print('%s: %s' % (color, arg))
        
color_them('red', 1, 2, 3)

The first argument, `'red'`, is assigned to the argument `color`. Then the remaining positional arguments, as many as we want, are collected into `args`, which is a tuple. Notice that when referring to `args` in the code, we omit the asterisk (`*`), which is only used in the function definition to indicate that `args` is a special variable that is capturing all of the remaining positional arguments.
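
If you are curious what `args` actually holds, you can inspect it directly. This is a small sketch with a made-up function, `show_args`, that returns the type name and contents of `args` instead of printing each item.

```python
def show_args(color, *args):
    # type(args).__name__ is the name of the type, as a string
    return type(args).__name__, args

print(show_args('red', 1, 2, 3))
# With no extra positional arguments, args is simply empty
print(show_args('red'))
```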

We can define keyword arguments after `*args` if we like.

In [None]:
def color_them2(color, *args, print_twice=False):
    for arg in args:
        i = 1
        if print_twice:
            i = 2
        # We use a single underscore, '_', to indicate that
        # we're not using a variable. It doesn't have any
        # special meaning, it's just a convention.
        for _ in range(i):
            print('%s: %s' % (color, arg))
            
color_them2('blue', 10, 20, 30, 40, 50, print_twice=True)

What if we want to use a variable length list of `*args` to call a function that can take a variable length list of `*args`?

In [None]:
def color_with_header(color, *args):
    print('=== %s ===' % color)
    color_them(color, *args)
    
color_with_header('green', 100, 150, 200, 250)

Here, we are again using the asterisk (`*`), but it has a different meaning. When we use it on line 3 above, in our call to `color_them`, we are using it as the **unpacking operator**. This means: take an actual sequence of items and pass each one as a separate argument, rather than passing the whole sequence as a single argument.

You may be wondering why we would use `*args` instead of just passing a single item that represents a list. We'll get back to that, promise.

In [None]:
def color_them3(color, things):
    for thing in things:
        print('%s: %s' % (color, thing))

color_them3('yellow', [3, 5, 7])

Just like we have a way to capture any variable number of positional args, we can also capture keyword args using `**kwargs`.

In [None]:
def print_prices(header, multiplier=1, **kwargs):
    print(header)
    for thing, price in kwargs.items():
        print('%s costs %s' % (thing, price * multiplier))
        
print_prices('The prices:', apple=1.29, orange=1.59, banana=0.89)

We can pass literally any valid Python identifier to the `print_prices` function, and they will all be captured in the dictionary `kwargs`. Notice that there is still a positional argument (we can have as many of those as we like) and a named keyword argument (`multiplier`) that can be specified as well and will be captured outside of `kwargs` (so `multiplier` won't be part of the `kwargs` dictionary).

In [None]:
print_prices('Toy prices:', train=5.50, multiplier=2, blocks=1.00)

Like `*args`, we can use the dictionary unpacking operator `**` to pass a dictionary to a function as keyword arguments.

In [None]:
def turn_the_car(direction='left', speed=30):
    print(direction, speed)
    
my_kwargs = {'direction': 'right'}
turn_the_car(**my_kwargs)

It's important to note that you can't call the function `turn_the_car` with an arbitrary unpacked dictionary, because it's not set up to accept arbitrary keyword arguments. The following call raises a `TypeError`:

In [None]:
my_kwargs2 = {'direction': 'up', 'brake': True, 'foo': 'bar'}
turn_the_car(**my_kwargs2)
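
To see the error without stopping the rest of your code, you can wrap the call in `try`/`except`. This is only a sketch: the function and dictionary are repeated so it runs on its own, and the `outcome` variable is made up for illustration.

```python
def turn_the_car(direction='left', speed=30):
    print(direction, speed)

my_kwargs2 = {'direction': 'up', 'brake': True, 'foo': 'bar'}

try:
    turn_the_car(**my_kwargs2)
    outcome = 'no error'
except TypeError as err:
    # 'brake' and 'foo' are not parameters of turn_the_car
    outcome = 'TypeError'
    print('TypeError:', err)
```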

So what's the point of all this? The main reason to capture `*args` and `**kwargs` is so that you can confidently delegate to or wrap helper functions. Let's say we had a function that performs some task. Maybe we want to print out a logging message before and after the task.

In [None]:
def perform_task(data, instruction, preference=False, num_rows=100):
    # Doesn't actually do anything, left to your imagination
    print(data, instruction, preference, num_rows)

def log_perform_task(*args, **kwargs):
    print('About to run perform_task')
    perform_task(*args, **kwargs)
    print('Done with perform_task')
    
perform_task([1,2,3], 'foo')
log_perform_task([4,5,6], 'bar', num_rows=50)

Here, what we're basically saying is: "Whatever positional arguments and keyword arguments were passed to this function, pass those same arguments to the function we're calling". So the `*args` and `**kwargs` arguments in the definition of `log_perform_task` capture the positional and keyword arguments, which are then **unpacked** and passed as the positional and keyword arguments of `perform_task`.

We could also modify or remove parameters:

In [None]:
def perform_twice_as_many_rows(*args, **kwargs):
    if 'num_rows' in kwargs:
        kwargs['num_rows'] *= 2
    perform_task(*args, **kwargs)
    
perform_twice_as_many_rows([1,2,3], 'foo', num_rows=500)

We could have also explicitly defined the necessary parameters for our utility function:

In [None]:
def log_perform_task_worse(data, instruction, preference=False, num_rows=100):
    print('About to run perform_task')
    perform_task(data, instruction, preference=preference, num_rows=num_rows)
    print('Done with perform_task')

The problem with that approach is that we have to update all of our utility functions (and we already have two of them!) whenever the definition of `perform_task` changes. So if we add a new parameter to `perform_task`, the function `log_perform_task_worse` will also need to be updated.

In [None]:
def perform_task(data, instruction, preference=False, num_rows=100, capture=True):
    print(data, instruction, preference, num_rows, capture)
    
def log_perform_task_worse(data, instruction, preference=False, num_rows=100, capture=True):
    print('About to run perform_task')
    perform_task(data, instruction, preference=preference, num_rows=num_rows, capture=capture)
    print('Done with perform_task')

Instead, the `*args`/`**kwargs` approach lets us basically say "We don't care what the arguments to the delegated function are, pass them".
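
As a sketch of that benefit, here is the wrapper from earlier used with the updated `perform_task`, without any change to the wrapper itself. One assumption for illustration: in this version `perform_task` returns its arguments as a tuple instead of printing them, and the wrapper passes that result back, so we can inspect what actually arrived.

```python
def perform_task(data, instruction, preference=False, num_rows=100, capture=True):
    # Illustration only: return the arguments so we can inspect them
    return (data, instruction, preference, num_rows, capture)

def log_perform_task(*args, **kwargs):
    print('About to run perform_task')
    result = perform_task(*args, **kwargs)
    print('Done with perform_task')
    return result

# The new `capture` argument passes straight through the wrapper
result = log_perform_task([4, 5, 6], 'bar', capture=False)
print(result)
```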