# STA 141B Data & Web Technologies for Data Analysis

### Lecture 2, 10/3/23, Basics of Python

### Announcements 

- HW 1 is online and due October 13, 2023 by 11:59 PM
- [HackDavis](https://hackdavis.io/)! 

### Last week's topics

- Course Organization
- Basics of Python

#### Types
- Numeric: <kbd>int</kbd>, <kbd>float</kbd>, <kbd>complex</kbd>
- Boolean: <kbd>bool</kbd>
- String: <kbd>str</kbd>
- Sequence: <kbd>list</kbd>, <kbd>tuple</kbd>, <kbd>range</kbd>
- Mapping: <kbd>dict</kbd>

__Adhere to the principles of proper programming!__

- K.I.S.S. (Keep It Simple, Stupid): Functions should perform one task, and one task only. 
- Rule of Three (avoid code duplication): Duplication is a bad programming habit because it makes code harder to maintain. 
- Clarity before Efficiency: Never sacrifice clarity for some perceived efficiency. Donald Knuth: "Premature optimization is the root of all evil."
- Naming: Stick to consistency and conventions. 

### Today's topics

- Basics of Python (cont.)

#### Set

A <kbd>set</kbd> is an unordered collection of unique items. It is instantiated with curly brackets. To check uniqueness, Python hashes each item, so items must be hashable, which in practice means immutable!

In [None]:
x = {"apple", True, 2} # display order changed, they are unordered
x

In [None]:
{"apple", [2,3], 2} # raises TypeError: a list is mutable and therefore unhashable

Sets are unordered. Hence, they do not support indexing. 

In [None]:
x[1] # raises TypeError: sets do not support indexing

In [None]:
x.add("new item")
x

In [None]:
x.add("new item") # the items are unique
x

In [None]:
x.remove("new item")
x
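
The requirement on set items can be checked directly. The following is a minimal sketch (the names `points` and `list_rejected` are illustrative, not from the lecture) showing that a tuple is accepted as a set item while a list is rejected, and that membership tests work without indexing:

```python
# Hashable (immutable) objects such as tuples can be set members.
points = {(0, 1), (2, 3)}
points.add((0, 1))  # duplicate, so the set does not grow

# A list is mutable and therefore unhashable; adding it raises TypeError.
try:
    points.add([4, 5])
    list_rejected = False
except TypeError:
    list_rejected = True

# Membership tests do not need indexing.
contains_pair = (0, 1) in points

print(len(points), list_rejected, contains_pair)
```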

#### Functions

We have already defined functions in the previous lecture. The function name follows `def`, and an optional return value is passed back via `return`. 

In [None]:
def myfun(x): 
    return x**2

In [None]:
myfun(3)

Default values for arguments are passed in the function definition: 

In [None]:
def myfun(x, n = 2): 
    return x**n

In [None]:
myfun(3)

In [None]:
myfun(3,2)

In [None]:
myfun(3,3)

A well-written function contains a *docstring* that explains what the function does: 

In [None]:
def myfun(x, n = 2): 
    '''Takes in a number x, returns the n-th power of x'''
    return x**n

In [None]:
help(myfun)
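
As a small sketch of how defaults and docstrings interact (the function name `power` is hypothetical and chosen so that `myfun` above is not overwritten), a default can be overridden by keyword, and the docstring that `help()` displays is stored in the `__doc__` attribute:

```python
def power(x, n=2):
    '''Takes in a number x, returns the n-th power of x.'''
    return x**n

square = power(3)       # uses the default n = 2
cube = power(3, n=3)    # overrides the default by keyword
doc = power.__doc__     # the text that help(power) displays

print(square, cube)
print(doc)
```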

(Short) anonymous functions in Python are called *lambda expressions*. They can be used wherever function objects are required, e.g., as the `key` argument when sorting (see below). 

In [None]:
def make_power(n): 
    return lambda x, m = 2: x**n + m

In [None]:
f2 = make_power(2)
type(f2)

In [None]:
f2(2, 1)

In the example below, the lambda expression ensures that the list is sorted by the second element of each tuple (the word), not by the first element (the number)!

In [None]:
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
type(pairs)

In [None]:
type(pairs[0])

In [None]:
pairs.sort(key=lambda pair: pair[1])
pairs

In [None]:
def mysort(pair): 
    return pair[1]
pairs.sort(key=mysort)
pairs

In [None]:
lst = ['one', 'two', 'three', 'four']
lst.sort()
lst
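
The `key` argument also works with the built-in `sorted` function, which returns a new list instead of sorting in place. The sketch below assumes the same `pairs` data as above; the names `by_word` and `by_length` are illustrative:

```python
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

# sorted() leaves the original list untouched and returns a new one.
by_word = sorted(pairs, key=lambda pair: pair[1])         # alphabetical by word
by_length = sorted(pairs, key=lambda pair: len(pair[1]))  # shortest word first

print(by_word)
print(by_length)
```

Python's sort is stable, so pairs with equally long words keep their original relative order.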

#####  `if`

Python's `if` statement allows us to change the behavior of our code depending on whether a condition is met. Conditions are evaluated as Boolean values (<kbd>bool</kbd>).

Indentation determines whether code is inside or outside of a control flow statement! Be careful to get it right!

In [None]:
x = 1
if x > 10:
    print("x is greater than 10")
elif x == 1:
    print("x is one!")
else:
    print("x is less than or equal to 10, and not 1")


#####  `for`

Python's `for` loop allows us to iterate over elements of a string, tuple, list, or other object.

Objects that can be iterated over are iterable. We'll learn more about iterables later.

In [None]:
for i in [1, 2, 3]:
    print(i)

In [None]:
# A weird way to convert to lowercase that shows a non-trivial loop:
for letter in 'STA 141B':
    # Computers compute on numbers, so each letter is represented by a number in memory.
    # ord() gets the number that represents a letter
    num = ord(letter)
    if 65 <= num <= 90: # A-Z are represented by 65-90
        # a-z are represented by 97-122, so a 32 number offset
        new_letter = num + 32
        # chr() converts a number that represents a letter back to the letter
        new_letter = chr(new_letter)
    else:
        new_letter = letter
        
    print(new_letter, end = "") # replaces default line break at end

In [None]:
ord('T')

In [None]:
chr(88)

In [None]:
# In practice, we can just use a built-in method to convert to lowercase
'STA 141B'.lower()
'sta 141b'.upper()
# Behind the scenes, .lower() is implemented in pretty much the same way as our loop above.
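
For comparison, the same ASCII offset trick can be wrapped in a function. This is only a sketch: the name `to_lower_ascii` is hypothetical, and it handles the letters A-Z only, whereas the built-in `.lower()` also covers non-ASCII letters.

```python
def to_lower_ascii(text):
    '''Lowercases A-Z by shifting code points by 32; other characters are kept.'''
    return "".join(
        chr(ord(ch) + 32) if 65 <= ord(ch) <= 90 else ch
        for ch in text
    )

result = to_lower_ascii('STA 141B')
print(result)
```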

### Iterables

The four most important methods to repeat code for identical or similar tasks are:

 1. Loops (`while` and `for`)
 2. Recursion
 3. Comprehensions, Generators, and `map()`
 4. Vectorization (`NumPy` arrays and functions)
    
These methods have tradeoffs. In general:

 1. Loops are the most flexible -- particularly `while` loops
 2. Recursion can lead to complicated code and is susceptible to infinite recursion
 3. Generators tend to use the least memory
 4. Vectorization tends to be fastest 
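
To make the comparison concrete, here is a sketch of the first three approaches applied to the same task, computing the squares of 0, ..., n-1. The function names are illustrative, and the NumPy version is left out so that the example runs with the standard library only.

```python
def squares_loop(n):
    '''Builds the result step by step with a for loop.'''
    out = []
    for i in range(n):
        out.append(i**2)
    return out

def squares_recursive(n):
    '''Solves the problem for n - 1 first, then appends the last square.'''
    if n == 0:  # base case: nothing left to square
        return []
    return squares_recursive(n - 1) + [(n - 1)**2]

def squares_comprehension(n):
    '''Uses a list comprehension.'''
    return [i**2 for i in range(n)]

def squares_map(n):
    '''Uses map() with a lambda expression.'''
    return list(map(lambda i: i**2, range(n)))

print(squares_loop(5))
```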

#### 1. Loop tips and tricks

An iterable object is an object that can be iterated over, element-by-element, like <kbd>tuple</kbd>, <kbd>list</kbd>, <kbd>range</kbd>, or <kbd>str</kbd>.

Python's `for`-loops can automatically retrieve elements from iterable objects.

In [None]:
# bad code
x = 'hello'
for i in [0, 1, 2, 3, 4]:
    print(x[i], end = '')

In [None]:
# good code
for x in 'hello':
    print(x, end = '') # strings are iterable, so no index numbers are needed

You can use `list` to convert <kbd>range</kbd> objects to <kbd>list</kbd> objects. As we have already established, this is computationally intensive and should generally be avoided. You may only need to do this for visual inspection. 

In [None]:
list(range(5))

You can iterate over the keys and values of a <kbd>dict</kbd> object together with the `items()` method.

In [None]:
x = {'hello': 1, "goodbye": 2}

for i in x:
    print(i, x[i])

In [None]:
x.items()

In [None]:
for key, val in x.items():
    print(key, val)

*Zipping* two sequences together means combining them into a sequence of <kbd>tuple</kbd> objects where:

- The first element of each tuple is an element from the first sequence
- The second element of each tuple is an element from the second sequence

Usually it only makes sense to zip sequences that are the same length.

The `zip` function zips two or more sequences. Use it to iterate over multiple sequences at the same time.

In [None]:
x = [1, 2, 3]
y = [4, 5, 6]
z = zip(x, y) # zip returns a lazy iterator, not a list
type(z)

In [None]:
list(z)

In [None]:
x = [1, 2, 3]
y = [4, 5, 6]

for x_elt, y_elt in zip(x, y):
    print(x_elt, y_elt)

In [None]:
list(zip(x, y, [7, 8, 9]))

In [None]:
x = [1, 2, 3]
y = [4, 5]

for x_elt, y_elt in zip(x, y):
    print(x_elt, y_elt)
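
As the last cell shows, `zip` silently stops at the end of the shorter sequence. If the leftover elements matter, `itertools.zip_longest` pads the shorter input instead. A minimal sketch (the names `truncated` and `padded` are illustrative):

```python
from itertools import zip_longest

x = [1, 2, 3]
y = [4, 5]

truncated = list(zip(x, y))                       # stops at the shorter input
padded = list(zip_longest(x, y, fillvalue=None))  # pads the shorter input

print(truncated)
print(padded)
```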

The `enumerate` function zips together index numbers and a sequence. In other words, the function enumerates a sequence.

In [None]:
# If you absolutely must use index numbers, at least use enumerate() to get them
x = 'hello'

enumerate(x)
list(enumerate(x))

In [None]:
for i, x_elt in enumerate(x):
    print("Position", i, "is", x_elt)
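
`enumerate` accepts an optional `start` argument, which is handy when positions should be counted from one. A short sketch (the names `word` and `labels` are illustrative):

```python
word = 'hello'

# Count positions from 1 instead of the default 0.
labels = [f"{position}: {letter}" for position, letter in enumerate(word, start=1)]

print(labels)
```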

#### 2. Recursion

Recursion occurs when a function calls itself. It is useful for problems that can be reduced to smaller instances of the same problem. 

In [None]:
def factorial(n): 
    '''This function computes the factorial of n via recursion.'''
    if n == 0: 
        return 1
    else: 
        recurse = factorial(n-1)
        result = n * recurse
        return result

In [None]:
help(factorial)

In [None]:
factorial(3) 

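Here, infinite recursion can occur: with a non-integer argument such as 4.3, `n` never equals 0. Luckily, the Python interpreter guards against it by raising a `RecursionError` once the recursion limit is exceeded.  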

In [None]:
factorial(4.3)
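
One way to avoid relying on the interpreter's guard is to validate the input before recursing. The following is a sketch, not the lecture's implementation; the name `factorial_checked` and the choice of `ValueError` are assumptions made for illustration:

```python
import sys

def factorial_checked(n):
    '''Computes n! recursively, rejecting inputs that would never reach 0.'''
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    if n == 0:  # base case
        return 1
    return n * factorial_checked(n - 1)

# Maximum recursion depth the interpreter allows before raising RecursionError.
limit = sys.getrecursionlimit()

print(factorial_checked(5))
```

Calling `factorial_checked(4.3)` now fails immediately with a `ValueError` instead of recursing until the limit is hit.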

#### 3. Comprehensions and generators

A comprehension is a Python expression that transforms a sequence, element-by-element.

In [None]:
[x**2 for x in range(5)]

Think of this as Python's counterpart to R's `lapply`. You can include a condition in a comprehension:

In [None]:
# Get all squares of even numbers from 0...10
# [x for x in Z if W]

x = [x**2 for x in range(11) if x % 2 == 0]
x

You can also iterate over subelements.

In [None]:
x = [[1, 2, 3], [4, 5, 6]] # print 1, 2, 3, 4, 5, 6

In [None]:
# somewhat clumsy
for sublist in x:
    for elt in sublist:
        print(elt)

In [None]:
[y for sublist in x for y in sublist]

Be aware that `sublist in x` is the outer loop and the inner loops follow to its right. In other words, the outermost iterables always come first in the comprehension.
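
The ordering rule can be verified by writing the nested loop and the comprehension side by side. The sketch below also shows `itertools.chain.from_iterable`, a standard-library alternative for flattening one level of nesting; the variable names are illustrative:

```python
from itertools import chain

nested = [[1, 2, 3], [4, 5, 6]]

flat_loop = []
for sublist in nested:    # outer loop comes first ...
    for elt in sublist:   # ... inner loop second
        flat_loop.append(elt)

# Same order of the for clauses as in the nested loop above.
flat_comp = [elt for sublist in nested for elt in sublist]

# Standard-library alternative for flattening one level.
flat_chain = list(chain.from_iterable(nested))

print(flat_loop, flat_comp, flat_chain)
```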

A comprehension surrounded by `[ ]` is called a list comprehension and produces a <kbd>list</kbd>. A comprehension surrounded by `{ }` and including `:` is called a dictionary comprehension and produces a <kbd>dict</kbd>. A comprehension surrounded by `{ }` without `:` is called a set comprehension and produces a <kbd>set</kbd>. 

In [None]:
x = ["hello", "goodbye"]

lens = {len(name): name for name in x} # maps the length of each name to the name
lens

Remember that <kbd>dict</kbd> does not support equal keys and <kbd>set</kbd> does not support equal items, but <kbd>list</kbd> does. 

In [None]:
{x**2 for x in [-1, 0, 1]} # set # uniqueness of sets is checked with ==, not is
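
The difference becomes visible when the same expression produces duplicates. In this sketch (the word list and variable names are illustrative), two words share the length 3, so the dictionary keeps only the last one, the set keeps one copy of each length, and the list keeps everything:

```python
words = ["one", "two", "three"]

by_length = {len(w): w for w in words}  # "two" overwrites "one" under key 3
lengths_set = {len(w) for w in words}   # duplicates collapse
lengths_list = [len(w) for w in words]  # duplicates are kept

print(by_length)
print(lengths_set)
print(lengths_list)
```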

There's no such thing as a tuple comprehension. Instead, a comprehension surrounded by `( )` is called a generator expression.

In [None]:
y = (x**2 for x in range(1001) if x % 2 == 0)
type(y)

In [None]:
import sys
sys.getsizeof(y)

In [None]:
sys.getsizeof([x**2 for x in range(1001) if x % 2 == 0]) # produces a list, i.e., is evaluated

Operating on a generator forces its evaluation. 

In [None]:
sum(y)

The loop below prints nothing, because *a generator can only be used once*: `sum(y)` already iterated through it, so it is exhausted. Since a generator never holds all of its elements in memory at once, it is *much* more memory-efficient than a <kbd>list</kbd>.

In [None]:
for i in y:
    print(i, end=" ")

In [None]:
y = (x**2 for x in range(101) if x % 2 == 0)

In [None]:
for i in y:
    print(i, end=" ")
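
Exhaustion can also be observed without a loop. In this minimal sketch (variable names are illustrative), the second `sum` sees no elements, and `next` with a default value returns that default instead of raising `StopIteration`:

```python
squares = (x**2 for x in range(5))

first_total = sum(squares)   # consumes the generator: 0 + 1 + 4 + 9 + 16
second_total = sum(squares)  # nothing left, so the sum over no elements is 0

# next() on an exhausted generator returns the default instead of raising.
leftover = next(squares, None)

print(first_total, second_total, leftover)
```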

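The difference also shows when we time the construction of both objects. Note that the generator expression is only created here, never consumed, so none of its elements are computed. 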

In [None]:
import timeit

In [None]:
print(timeit.timeit('''list_com = [i for i in range(100) if i % 2 == 0]''', number=1000000))
print(timeit.timeit('''gen_exp = (i for i in range(100) if i % 2 == 0)''', number=1000000))
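
Because the timing above only constructs the generator, a comparison that also consumes the elements may be more informative. The sketch below sums the elements in both cases; the repetition count of 10000 is an arbitrary choice, and the measured times depend on the machine, so no particular outcome is assumed:

```python
import timeit

# Both statements compute the same sum; only the intermediate object differs.
list_time = timeit.timeit('sum([i for i in range(100) if i % 2 == 0])', number=10000)
gen_time = timeit.timeit('sum(i for i in range(100) if i % 2 == 0)', number=10000)

print(f"list comprehension:   {list_time:.4f} s")
print(f"generator expression: {gen_time:.4f} s")
```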

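A generator is a special kind of iterable which computes its elements on demand. Generator expressions are the standard example; ranges also compute their elements on demand, although a <kbd>range</kbd> is strictly speaking a lazy sequence rather than a generator. 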
Generators are especially useful for working with data that are __too large__ to fit in memory. While making a huge list (say $10^9$ elements) might use enough memory to crash Python, making a generator with the same number of elements uses almost no memory. See more examples [here](https://zacks.one/python-generators/). 

Python's `itertools` module has functions for manipulating generators and iterable objects.
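
As a brief sketch of what `itertools` offers (the variable names are illustrative), `count` produces an endless stream of integers, `islice` takes a finite slice of any iterable without evaluating the rest, and `chain` concatenates iterables lazily:

```python
from itertools import chain, count, islice

# An infinite generator of even numbers; nothing is computed yet.
evens = (n for n in count() if n % 2 == 0)

# islice evaluates only as many elements as requested.
first_five = list(islice(evens, 5))

# chain iterates over several iterables one after another.
combined = list(chain('ab', [1, 2]))

print(first_five)
print(combined)
```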