# STA 141B Data & Web Technologies for Data Analysis

### Lecture 2, 1/10/24, Basics of Python

### Announcements
- Class size will not be increased. 
- The first homework will be published tomorrow and is due in two weeks.

### Today's topics


- Basics of Python (cont.)

#### Set

A <kbd>set</kbd> is an unordered collection of unique items. It is instantiated with curly brackets. Since uniqueness is checked via hashing, the items must be hashable, which in practice means immutable!

In [2]:
x = {"apple", True, 2} # display order changed, they are unordered
x

{2, True, 'apple'}

In [3]:
x[0]

TypeError: 'set' object is not subscriptable

In [4]:
{"apple", [2,3], 2}

TypeError: unhashable type: 'list'
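A minimal sketch of the hashability requirement, using made-up names (`points`, `coords`): immutable objects such as tuples can be stored in a set, and a list can be converted to a tuple before it is added.

```python
# Hashable (immutable) objects such as tuples can be stored in a set.
points = {(0, 0), (1, 2), (1, 2)}   # the duplicate tuple is dropped
print(len(points))                  # 2

# Membership tests are a typical use of a set.
print((1, 2) in points)             # True

# A list is mutable and therefore unhashable; convert it to a tuple first.
coords = [3, 4]
points.add(tuple(coords))
print((3, 4) in points)             # True
```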

Sets are unordered. Hence, they do not support indexing. 

In [None]:
x[1] 

In [12]:
x.add("new item")
x

{2, True, 'apple', 'new item'}

In [8]:
x.add('''new item''') # the items are unique
x

{2, True, 'apple', 'new item'}

In [13]:
x.remove("new item")

In [17]:
y = [23, 4, 5, 4]
y.pop()

4

In [16]:
y

[23, 4, 5]

#### Functions

We have defined functions already in the previous lecture. The function name follows `def`, and an optional return value is specified with `return`. 

In [18]:
def myfun(x): 
    return x**2

In [19]:
myfun(3)

9

Default values for arguments are set in the function definition: 

In [None]:
def myfun(x, n = 2): 
    return x**n

In [41]:
myfun = lambda x, n = 2: x**n

In [42]:
myfun(3)

9

In [43]:
myfun(3,2)

9

In [44]:
myfun(3,3)

27
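As a small additional sketch, arguments with defaults can also be passed by keyword. The function name `power` is hypothetical and chosen only to avoid overwriting `myfun`.

```python
def power(x, n=2):
    """Return x raised to the n-th power (n defaults to 2)."""
    return x**n

print(power(3))          # 9, uses the default n = 2
print(power(3, n=3))     # 27, the default is overridden by keyword
print(power(n=3, x=2))   # 8, keyword arguments may come in any order
```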

A well-written function contains a *docstring* that explains what the function does: 

In [39]:
def myfun(x, n = 2): 
    '''
    Takes in a number x, returns the n-th power of x
    '''
    return x**n # make additional comments

In [40]:
help(myfun)

Help on function myfun in module __main__:

myfun(x, n=2)
    Takes in a number x, returns the n-th power of x



(Short) anonymous functions in Python are called *lambda expressions*. They can be used wherever a function object is required, e.g., as the `key` argument when sorting or inside a comprehension (see below). 

In [45]:
def make_power(n): 
    return lambda x, m = 2: x**n + m

In [46]:
f2 = make_power(2)
type(f2)

function

In [48]:
make_power(2)(2, 1)

5

In the example below, the lambda expression ensures that the ordering is on the item value, not the key value!

In [200]:
pairs = ['onzXfasdf', [3, 'two'], {'three', 2}, (4, 'four')]
type(pairs)

list

In [202]:
{'three', 2}[0]

TypeError: 'set' object is not subscriptable

In [87]:
tup = pairs[0]
isinstance(tup[0], int)

True

In [201]:
pairs.sort(key=lambda tup: tup[0] if isinstance(tup[0], str) else tup[1])
pairs

TypeError: 'set' object is not subscriptable

In [68]:
def _(pair): 
    return pair[0]
pairs.sort(key=_)
pairs

[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

In [None]:
lst = ['one', 'two', 'three', 'four']
lst.sort()
lst
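A cleaner sketch of sorting with a `key` function, assuming a list that contains only tuples (the mixed-type list above fails because a set cannot be indexed). It uses the built-in `sorted`, which returns a new list instead of sorting in place.

```python
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

# The key function picks the part of each tuple that is compared.
by_number = sorted(pairs, key=lambda pair: pair[0])
by_word = sorted(pairs, key=lambda pair: pair[1])

print(by_number)  # [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
print(by_word)    # [(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
```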

#####  `if`

Python's `if` statement allows us to change the behavior of our code depending on whether a condition is met. Conditions are usually Boolean expressions (<kbd>bool</kbd>); any other object is interpreted by its truth value.

Indentation determines whether code is inside or outside of a control flow statement! Be careful to get it right!

In [75]:
x = 1
if x > 10:
    print("x is greater than 10")
elif x == 1:
    pass
else: 
    print("x is less than or equal to 10, and not 1")
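As a brief sketch of how an `if` statement treats non-Boolean values: empty containers, `0`, and `None` count as false, most other objects as true. The helper `describe` is hypothetical.

```python
def describe(value):
    """Report whether `value` counts as true or false in an `if` statement."""
    if value:
        return "truthy"
    else:
        return "falsy"

for candidate in [0, 1, "", "text", [], [0], None]:
    print(repr(candidate), "is", describe(candidate))
```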


#####  `for`

Python's `for` loop allows us to iterate over elements of a string, tuple, list, or other object.

Objects that can be iterated over are iterable. We'll learn more about iterables later.

In [95]:
{2, 1, 3} == {1, 2, 3}

True

In [134]:
def f(): 
    global count
    count += 1

In [135]:
count = 0

In [139]:
f()
count

4

In [140]:
for i in {1, 2, 3}:
    print(i)

1
2
3


In [102]:
# A weird way to convert to lowercase that shows a non-trivial loop:
for letter in 'StA 141B':
    # Computers compute on numbers, so each letter is represented by a number in memory.
    # ord() gets the number that represents a letter
    num = ord(letter)
    if 65 <= num <= 90: # A-Z are represented by 65-90
        # a-z are represented by 97-122, so a 32 number offset
        new_letter = num + 32
        # chr() converts a number that represents a letter back to the letter
        new_letter = chr(new_letter)
    else:
        new_letter = letter
        
    print(new_letter, end = "") # replaces default line break at end

sta 141b

In [96]:
ord('T')

84

In [98]:
chr(84 + 32)

't'

In [103]:
# In practice, we can just use a built-in method to convert to lowercase
'STA 141B'.lower()
'sta 141b'.upper()
# Behind the scenes, .lower() is implemented in pretty much the same way as our loop above.

'STA 141B'

### Iterables

The four most important methods to repeat code for identical or similar tasks are:

 1. Loops (`while` and `for`)
 2. Recursion
 3. Comprehensions, Generators, and `map()`
 4. Vectorization (`NumPy` arrays and functions)
    
These methods have tradeoffs. In general:

 1. Loops are the most flexible -- particularly `while` loops
 2. Recursion tends to produce complicated code and is susceptible to infinite recursion
 3. Generators tend to use the least memory
 4. Vectorization tends to be fastest 
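To make the comparison concrete, here is a sketch that computes the same quantity (the sum of squares of 0, ..., n-1) with a loop, with recursion, with a generator expression, and with `map()`. All function names are made up for illustration. Vectorization is left out because it requires NumPy.

```python
def sum_squares_loop(n):
    """Explicit for-loop with an accumulator."""
    total = 0
    for i in range(n):
        total += i**2
    return total

def sum_squares_recursive(n):
    """Recursion; n must be a non-negative integer."""
    if n == 0:
        return 0
    return (n - 1)**2 + sum_squares_recursive(n - 1)

def sum_squares_generator(n):
    """Generator expression consumed by sum()."""
    return sum(i**2 for i in range(n))

def sum_squares_map(n):
    """map() applies the lambda to each element lazily."""
    return sum(map(lambda i: i**2, range(n)))

print(sum_squares_loop(5), sum_squares_recursive(5),
      sum_squares_generator(5), sum_squares_map(5))  # 30 30 30 30
```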

#### 1. Loop tips and tricks

An iterable object is an object that can be iterated over, element-by-element, like <kbd>tuple</kbd>, <kbd>list</kbd>, <kbd>range</kbd>, or <kbd>str</kbd>.

Python's `for`-loops can automatically retrieve elements from iterable objects.

In [141]:
# bad code
x = 'hello'
for i in [0, 1, 2, 3, 4]:
    print(x[i], end = '')

hello

In [None]:
# good code
for x in 'hello':
    print(x, end = '') # we can use .index method for strings! 

In [154]:
[i for i in enumerate('hello')]

[(0, 'h'), (1, 'e'), (2, 'l'), (3, 'l'), (4, 'o')]

In [158]:
for i, v in enumerate('hello'): 
    print(str(i) + v)

0h
1e
2l
3l
4o


You can use `list` to recast <kbd>range</kbd> objects to <kbd>list</kbd> objects. As we have already established, this is computationally intensive and should generally be avoided. You may only need to do this for visual inspection. 

In [145]:
list(range(5))

[0, 1, 2, 3, 4]

In [144]:
range(5)

range(0, 5)
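A short sketch of why the conversion is rarely needed: a <kbd>range</kbd> object already supports `len`, indexing, and membership tests without storing its elements. The variable name `r` is arbitrary.

```python
r = range(0, 20, 5)   # represents 0, 5, 10, 15 without storing them

print(len(r))     # 4
print(r[2])       # 10
print(r[-1])      # 15
print(15 in r)    # True
print(list(r))    # [0, 5, 10, 15], only needed for inspection
```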

Iterating over a <kbd>dict</kbd> object yields its keys. To iterate over keys and values together, use the `items()` method.

In [146]:
x = {'hello': 1, "goodbye": 2}

for i in x:
    print(i, x[i])

hello 1
goodbye 2


In [147]:
x.items()

dict_items([('hello', 1), ('goodbye', 2)])

In [148]:
for key, val in x.items():
    print(key, val)

hello 1
goodbye 2


*Zipping* two sequences together means combining them into a collection of <kbd>tuple</kbd> objects (in Python 3, a lazy `zip` object that can be cast to a <kbd>list</kbd>) where:

- The first element of each tuple is an element from the first sequence
- The second element of each tuple is an element from the second sequence

Usually it only makes sense to zip sequences that are the same length.

The `zip` function zips two or more sequences. Use it to iterate over multiple sequences at the same time.

In [180]:
print(x)
print(y)

{'hello': 1, 'goodbye': 2}
['four', 'one', 'three', 'two']


In [181]:
len(y)

4

In [186]:
z = zip(x, y)

In [188]:
list(z) # empty here: z was already consumed once, and a zip object can only be iterated a single time

[]

In [190]:
y

['four', 'one', 'three', 'two']

In [189]:
list(enumerate(y))

[(0, 'four'), (1, 'one'), (2, 'three'), (3, 'two')]

In [192]:
list(zip(range(len(y)), y))

[(0, 'four'), (1, 'one'), (2, 'three'), (3, 'two')]

[('hello', 'four'), ('goodbye', 'one')]

In [None]:
x = [1, 2, 3]
y = [4, 5, 6]

for x_elt, y_elt in zip(x, y):
    print(x_elt, y_elt)

In [None]:
list(zip(x, y, [7, 8, 9]))

In [None]:
x = [1, 2, 3]
y = [4, 5]

for x_elt, y_elt in zip(x, y):
    print(x_elt, y_elt)
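As a sketch of what happens with unequal lengths: `zip` silently stops at the shortest input, while `itertools.zip_longest` pads the shorter one. The fill value `0` is an arbitrary choice for illustration.

```python
from itertools import zip_longest

x = [1, 2, 3]
y = [4, 5]

# zip() stops as soon as the shortest input is exhausted.
print(list(zip(x, y)))                       # [(1, 4), (2, 5)]

# zip_longest() pads missing values with None or a chosen fill value.
print(list(zip_longest(x, y)))               # [(1, 4), (2, 5), (3, None)]
print(list(zip_longest(x, y, fillvalue=0)))  # [(1, 4), (2, 5), (3, 0)]
```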

The `enumerate` function zips together index numbers and a sequence. In other words, the function enumerates a sequence.

In [None]:
# If you absolutely must use index numbers, at least use enumerate() to get them
x = 'hello'

enumerate(x)
list(enumerate(x))

In [None]:
for i, x_elt in enumerate(x):
    print("Position", i, "is", x_elt)
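A small sketch of the optional `start` argument of `enumerate`, which is handy when counting should begin at 1. The example data are made up.

```python
# Count from 1 instead of 0.
for rank, medal in enumerate(['gold', 'silver', 'bronze'], start=1):
    print(rank, medal)

numbered = list(enumerate('abc', start=1))
print(numbered)   # [(1, 'a'), (2, 'b'), (3, 'c')]
```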

#### 2. Recursion

Recursion occurs when a function calls itself. It is useful for processes in which each step is a smaller instance of the same problem. 

In [None]:
def factorial(n): 
    '''This function computes the factorial of n via recursion.'''
    if n == 0: 
        return 1
    else: 
        recurse = factorial(n-1)
        result = n * recurse
        return result

In [None]:
help(factorial)

In [None]:
factorial(3) 

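Here, infinite recursion can occur: for a non-integer argument such as 4.3, the condition `n == 0` is never met. Luckily, the Python interpreter guards against it by raising a `RecursionError` once the maximum recursion depth is exceeded.  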

In [None]:
factorial(4.3)
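One possible way to guard against this, sketched under the assumption that only non-negative integers are valid inputs. The wrapper `checked_factorial` is hypothetical and not part of the lecture code.

```python
def factorial(n):
    """Recursive factorial as defined above (no input checks)."""
    if n == 0:
        return 1
    return n * factorial(n - 1)

def checked_factorial(n):
    """Reject inputs for which the recursion would never reach n == 0."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    return factorial(n)

try:
    factorial(4.3)   # 4.3, 3.3, ..., 0.3, -0.7, ... never equals 0
except RecursionError as err:
    print("RecursionError:", err)

print(checked_factorial(5))   # 120
```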

#### 3. Comprehensions and generators

A comprehension is a Python expression that transforms a sequence, element-by-element.

In [None]:
[x**2 for x in range(5)]

Think of this as Python's counterpart to R's `lapply`. You can include a condition in a comprehension:

In [None]:
# Get all squares of even numbers from 0...10
# [x for x in Z if W]

x = [x**2 for x in range(11) if x % 2 == 0]
x

You can also iterate over subelements.

In [None]:
x = [[1, 2, 3], [4, 5, 6]] # print 1, 2, 3, 4, 5, 6

In [None]:
# somewhat clumsy
for sublist in x:
    for elt in sublist:
        print(elt)

In [None]:
[y for sublist in x for y in sublist]

Be aware that `for sublist in x` is the outer loop and the inner loops follow to its right. In other words, the outermost iterables always come first in the comprehension.
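The ordering can be checked with a sketch that writes the same flattening once as nested loops and once as a comprehension. The names `grid`, `flat_loop`, and `flat_comp` are made up.

```python
grid = [[1, 2, 3], [4, 5, 6]]

flat_loop = []
for row in grid:        # outer loop: written first in the comprehension
    for elt in row:     # inner loop: written second
        flat_loop.append(elt)

flat_comp = [elt for row in grid for elt in row]

print(flat_comp)               # [1, 2, 3, 4, 5, 6]
print(flat_comp == flat_loop)  # True
```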

A comprehension surrounded by `[ ]` is called a list comprehension and produces a <kbd>list</kbd>. A comprehension surrounded by `{ }` and including `:` is called a dictionary comprehension and produces a <kbd>dict</kbd>. A comprehension surrounded by `{ }` without `:` is called a set comprehension and produces a <kbd>set</kbd>. 

In [None]:
x = ["hello", "goodbye"]

lens = {len(name): name for name in x} # map the length of each name to the name
lens

Remember that <kbd>dict</kbd> does not support duplicate keys and <kbd>set</kbd> does not support duplicate items, but <kbd>list</kbd> does. 

In [None]:
{x**2 for x in [-1, 0, 1]} # set # uniqueness of sets is checked with ==, not is
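A side-by-side sketch of how the three comprehension types treat duplicates, using the same input values. In the dictionary comprehension, a later value overwrites an earlier one stored under the same key.

```python
values = [-1, 0, 1]

squares_list = [v**2 for v in values]     # duplicates are kept
squares_set = {v**2 for v in values}      # duplicates are dropped
squares_dict = {v**2: v for v in values}  # the last value for a key wins

print(squares_list)  # [1, 0, 1]
print(squares_set)   # {0, 1}
print(squares_dict)  # {1: 1, 0: 0}
```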

There's no such thing as a tuple comprehension. Instead, a comprehension surrounded by `( )` is called a generator expression.

In [None]:
y = (x**2 for x in range(1001) if x % 2 == 0)
type(y)

In [None]:
import sys
sys.getsizeof(y)

In [None]:
sys.getsizeof([x**2 for x in range(1001) if x % 2 == 0]) # produces a list, i.e., is evaluated

Operating on a generator forces its evaluation. 

In [None]:
sum(y)

The loop below prints nothing, because *a generator can only be used once*: `sum(y)` has already exhausted it. Because a generator never stores all of its elements at once, it uses *much* less memory than a <kbd>list</kbd>.

In [None]:
for i in y:
    print(i, end=" ")

In [None]:
y = (x**2 for x in range(101) if x % 2 == 0)

In [None]:
for i in y:
    print(i, end=" ")
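Exhaustion can also be observed step by step. The sketch below uses the built-in `next` to pull single elements from a small, made-up generator `gen`.

```python
gen = (x**2 for x in range(5))   # will yield 0, 1, 4, 9, 16

print(next(gen))   # 0
print(next(gen))   # 1
print(sum(gen))    # 29, the remaining elements 4 + 9 + 16
print(sum(gen))    # 0, the generator is exhausted
print(list(gen))   # []
```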

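The lazy evaluation also shows when we time operations: creating a generator expression is much faster than building the corresponding list, because no element is computed until it is requested. 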

In [None]:
import timeit

In [None]:
print(timeit.timeit('''list_com = [i for i in range(100) if i % 2 == 0]''', number=1000000))
print(timeit.timeit('''gen_exp = (i for i in range(100) if i % 2 == 0)''', number=1000000))

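A generator is a special kind of iterable which computes its elements on demand. Generator expressions are one example; <kbd>range</kbd> objects are similarly lazy, although strictly speaking they are sequences rather than generators. 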
Generators are especially useful for working with data that are __too large__ to fit in memory. While making a huge list (say $10^9$ elements) might use enough memory to crash Python, making a generator with the same number of elements uses almost no memory. See more examples [here](https://zacks.one/python-generators/). 
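Besides generator expressions, generators can also be written as functions that use `yield`. The following is a minimal sketch with a hypothetical function `squares`. It compares the container sizes reported by `sys.getsizeof`, which does not count the memory of the stored integers themselves.

```python
import sys

def squares(limit):
    """Yield the squares of 0, ..., limit - 1 one at a time."""
    for n in range(limit):
        yield n**2

lazy = squares(100_000)                  # nothing is computed yet
eager = [n**2 for n in range(100_000)]   # all elements are stored

print(sys.getsizeof(lazy) < sys.getsizeof(eager))  # True
print(sum(lazy) == sum(eager))                     # True
```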

Python's `itertools` module has functions for manipulating generators and iterable objects.
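A few of these functions in a short sketch: `count` produces an endless sequence, `islice` takes a finite slice of any iterable, and `chain` concatenates iterables lazily. The variable names are chosen for illustration.

```python
from itertools import chain, count, islice

# count() never ends, so the generator expression is infinite.
evens = (n for n in count() if n % 2 == 0)

# islice() takes only the first five elements without building a list first.
first_five = list(islice(evens, 5))
print(first_five)   # [0, 2, 4, 6, 8]

# chain() iterates over several iterables one after another.
merged = list(chain([1, 2], (3, 4), range(5, 7)))
print(merged)       # [1, 2, 3, 4, 5, 6]
```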