# STA 141B Data & Web Technologies for Data Analysis

### Lecture 2, 10/3/23, Basics of Python

### Announcements 

- Wait list. 
- HW 1 is online and due October 13, 2023 by 11:59 PM
- [HackDavis](https://hackdavis.io/)! 
- Office hours: 
    * Peter Kramlinger	R	10:45-11:45 AM
    * Xiangbo Mo	T	7-8 PM ([Zoom](https://ucdstats.zoom.us/j/96433938896?pwd=cmFiTVN6RzI4TDRaVnU5aVFxWVlqZz09))
    * Jingwei Xiong	F	1-2 PM ([Zoom](https://canvas.ucdavis.edu/courses/823714/external_tools/8022))

### Last week's topics

- Course Organization
- Basics of Python

#### Types
- Numeric: <kbd>int</kbd>, <kbd>float</kbd>, <kbd>complex</kbd>
- Boolean: <kbd>bool</kbd>
- String: <kbd>str</kbd>
- Sequence: <kbd>list</kbd>, <kbd>tuple</kbd>, <kbd>range</kbd>
- Mapping: <kbd>dict</kbd>

In [48]:
x = {"key": "value", 5: "value", 5.0: 4}

In [52]:
5.0 == 5

True

In [30]:
x

[3, 2, 5, 2.5, 4]

In [33]:
type(x)

tuple

__Adhere to the principles of proper programming!__

- K.I.S.S. (Keep It Simple, Stupid): Functions should perform one task, and one task only. 
- Rule of Three (avoid code duplication): Duplication is a bad programming habit because it makes code harder to maintain. 
- Clarity before Efficiency: Never sacrifice clarity for some perceived efficiency. Donald Knuth: "Premature optimization is the root of all evil."
- Naming: Stick to consistency and conventions. 

### Today's topics

- Basics of Python (cont.)

#### Set

A <kbd>set</kbd> is an unordered collection of unique items. It is instantiated with curly brackets. Since uniqueness is checked via hashing, the items must be hashable, which in practice means immutable!

In [53]:
x = {"apple", True, 2} # display order changed, they are unordered
x

{2, True, 'apple'}

In [55]:
{"apple", (2,3), 2}

{(2, 3), 2, 'apple'}

Sets are unordered. Hence, they do not support indexing. 

In [56]:
x[1]    

TypeError: 'set' object is not subscriptable

In [58]:
x.add(4)
x

{2, 4, True, 'apple', 'new item'}

In [62]:
x.add("new item") # the items are unique
x

{2, 4, True, 'apple', 'new item'}

In [63]:
x.remove("new item")
x

{2, 4, True, 'apple'}
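
As a small sketch of the points above (the names `basket`, `more`, and `ones` are made up for illustration and are not from the lecture): a mutable item such as a list is rejected, equal items are stored only once, and sets support the usual set operations.

```python
# Hypothetical example values, chosen only for illustration
basket = {"apple", 2, (2, 3)}

# Lists are mutable and therefore unhashable: they cannot be set items
try:
    basket.add([1, 2])
    list_accepted = True
except TypeError:
    list_accepted = False

# 1, 1.0 and True compare equal, so the set keeps only one of them
ones = {1, 1.0, True}

# Common set operations
more = {2, "pear"}
union = basket | more          # items in either set
common = basket & more         # items in both sets
only_basket = basket - more    # items in basket but not in more
```

Because sets are unordered, compare them with `==` rather than relying on the order in which they are displayed.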

#### Functions

We have already defined functions in the previous lecture. The function name follows `def`, and an optional return value is handed back via `return`. A function without a `return` statement returns `None`.

In [68]:
def myfun(x): 
    return x**2

In [69]:
myfun(3)

9

Default values for arguments are passed in the function definition: 

In [70]:
def myfun(x, n = 2): 
    return x**n

In [71]:
myfun(3)

9

In [72]:
myfun(3,2)

9

In [75]:
myfun(n=3, x=3)

27

A well-written function contains a *docstring* that explains what the function does: 

In [86]:
def myfun(x, n = 2): 
    '''Takes in a number x, returns the n-th power of x
    
    I can not add another line'''
    return x**n

In [87]:
help(myfun)

Help on function myfun in module __main__:

myfun(x, n=2)
    Takes in a number x, returns the n-th power of x
    
    I can not add another line



(Short) anonymous functions in Python are called *lambda expressions*. They can be used wherever a function object is required, e.g., as the `key` argument when sorting (see below). 

In [None]:
def make_power(n): 
    return lambda x, m = 2: x**n + m

In [99]:
def make_power2(n):
    def f(x, m=2):
        return x**n + m  # without `return`, f would return None
    return f

In [90]:
f2 = make_power(2)
type(f2)

function

In [None]:
g = make_power(2)

In [103]:
type(g)

function
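
To see what the returned function objects actually do, here is a minimal sketch that calls them. The names `square_plus` and `cube_plus` are hypothetical; `make_power` is repeated from above so the cell runs on its own.

```python
def make_power(n):
    # The returned lambda remembers n from the enclosing call (a closure)
    return lambda x, m=2: x**n + m

square_plus = make_power(2)
cube_plus = make_power(3)

a = square_plus(3)        # 3**2 + 2
b = square_plus(3, m=0)   # 3**2 + 0
c = cube_plus(2)          # 2**3 + 2
print(a, b, c)
```

Each call to `make_power` produces an independent function with its own value of `n`.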

In the example below, the lambda expression ensures that the list is sorted by the second element of each tuple (the word), not by the first (the number)!

In [104]:
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
type(pairs)

list

In [105]:
type(pairs[0])

tuple

In [107]:
pairs.sort(key=lambda pair: pair[1])
pairs

[(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]

In [108]:
def mysort(pair): 
    return pair[1]
pairs.sort(key=mysort)
pairs

[(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]

In [109]:
lst = ['one', 'two', 'three', 'four']
lst.sort()
lst

['four', 'one', 'three', 'two']
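
The `key` argument is also accepted by the built-in `sorted`, which returns a new list instead of sorting in place. The following sketch (variable names are made up for illustration) sorts the same words by different criteria.

```python
words = ['one', 'two', 'three', 'four']

# Sort by length; the sort is stable, so ties keep their original order
by_length = sorted(words, key=len)

# Sort by the last letter of each word
by_last = sorted(words, key=lambda w: w[-1])

# Longest words first
longest_first = sorted(words, key=len, reverse=True)

print(by_length)
print(by_last)
print(longest_first)
```

Unlike `lst.sort()`, `sorted` leaves `words` itself unchanged.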

#####  `if`

Python's `if` statement allows us to change the behavior of our code depending on whether a condition is met. Conditions are evaluated as Boolean values (<kbd>bool</kbd>); objects that are not already Boolean are tested for their truth value.

Indentation determines whether code is inside or outside of a control flow statement! Be careful to get it right!

In [110]:
x = 1.0
if x > 10:
    print("x is greater than 10")
elif x == 1:
    print("x is one!")
else:
    print("x is less than or equal to 10, and not 1")


x is one!
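
A condition does not have to be a comparison: Python tests the truth value of any object, and empty containers, empty strings, zero, and `None` count as false. A small sketch, where `describe` is a hypothetical helper written only for this illustration:

```python
def describe(value):
    """Return a short label for value (hypothetical helper)."""
    if value is None:
        return "missing"
    elif not value:          # 0, 0.0, '', [], {} all count as false
        return "empty or zero"
    else:
        return "something"

labels = [describe(v) for v in (None, 0, "", [], [0], "a", 3.5)]
print(labels)
```

Note that `[0]` is a non-empty list and therefore counts as true, even though its only element is zero.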


#####  `for`

Python's `for` loop allows us to iterate over elements of a string, tuple, list, or other object.

Objects that can be iterated over are iterable. We'll learn more about iterables later.

In [113]:
for i in range(1, 4, 1):
    print(i)

1
2
3


In [127]:
# A weird way to convert to lowercase that shows a non-trivial loop:
for letter in 'STA 141B':
    # Computers compute on numbers, so each letter is represented by a number in memory.
    # ord() gets the number that represents a letter
    num = ord(letter)
    if 65 <= num <= 90: # A-Z are represented by 65-90
        # a-z are represented by 97-122, so a 32 number offset
        new_letter = num + 32
        # chr() converts a number that represents a letter back to the letter
        new_letter = chr(new_letter)
    else:
        new_letter = letter
        
    print(new_letter, end = "") # replaces default line break at end

sta 141b

In [126]:
ord(' ')

32

In [124]:
chr(65 + 32)

'a'

In [129]:
# In practice, we can just use a built-in method to convert to lowercase
'STA 141B'.lower()

'sta 141b'

In [131]:
'sta 141b'.upper()
# Behind the scenes, .lower() is implemented in pretty much the same way as our loop above.

'STA 141B'

### Iterables

The four most important methods to repeat code for identical or similar tasks are:

 1. Loops (`while` and `for`)
 2. Recursion
 3. Comprehensions, Generators, and `map()`
 4. Vectorization (`NumPy` arrays and functions)
    
These methods have tradeoffs. In general:

 1. Loops are the most flexible -- particularly `while` loops
 2. Recursion can lead to complicated code and is susceptible to infinite recursion
 3. Generators tend to use the least memory
 4. Vectorization tends to be fastest 
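
To make the comparison concrete, the sketch below squares the same numbers with the first three approaches. The names are made up for illustration; the NumPy variant is only indicated in a comment, on the assumption that NumPy may not be installed.

```python
values = [1, 2, 3, 4]

# 1. Loop
squares_loop = []
for v in values:
    squares_loop.append(v**2)

# 2. Recursion: square the first element, then recurse on the rest
def squares_rec(seq):
    if not seq:              # base case: empty sequence
        return []
    return [seq[0]**2] + squares_rec(seq[1:])

# 3. Comprehension and map()
squares_comp = [v**2 for v in values]
squares_map = list(map(lambda v: v**2, values))

# 4. Vectorization would be np.array(values)**2 with NumPy (not run here)

print(squares_loop, squares_rec(values), squares_comp, squares_map)
```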

#### 1. Loop tips and tricks

An iterable object is an object that can be iterated over, element-by-element, like <kbd>tuple</kbd>, <kbd>list</kbd>, <kbd>range</kbd>, <kbd>str</kbd>.

Python's `for`-loops can automatically retrieve elements from iterable objects.

In [132]:
# bad code
x = 'hello'
for i in [0, 1, 2, 3, 4]:
    print(x[i], end = '')

hello

In [134]:
# good code
for x in 'hello':
    print(x, end = '') # we can use .index method for strings! 

hello

In [136]:
'hello'.index('l')

2

You can use `list` to recast <kbd>range</kbd> objects to <kbd>list</kbd> objects. As we have already established, this is computationally intensive and should generally be avoided. You may only need to do this for visual inspection. 

In [137]:
list(range(5))

[0, 1, 2, 3, 4]

Iterating over a <kbd>dict</kbd> object yields its keys. To iterate over keys and values together, use the `items()` method.

In [138]:
x = {'hello': 1, "goodbye": 2}

for i in x:
    print(i, x[i])

hello 1
goodbye 2


In [139]:
x.items()

dict_items([('hello', 1), ('goodbye', 2)])

In [140]:
for key, val in x.items():
    print(key, val)

hello 1
goodbye 2


*Zipping* two sequences together means combining them into a sequence of <kbd>tuple</kbd> objects where:

- The first element of each tuple is an element from the first sequence
- The second element of each tuple is an element from the second sequence

Usually it only makes sense to zip sequences that are the same length.

The `zip` function zips two or more sequences. Use it to iterate over multiple sequences at the same time.

In [142]:
x = [1, 2, 3, 4]
y = 'days'

In [None]:
z = zip(x, y)
type(z)

In [144]:
z

<zip at 0x7fee000d6080>

In [145]:
list(z)

[(1, 'd'), (2, 'a'), (3, 'y'), (4, 's')]

In [146]:
x = [1, 2, 3]
y = 'day'

for x_elt, y_elt in zip(x, y):
    print(x_elt, y_elt)

1 d
2 a
3 y


In [147]:
list(zip(x, y, [7, 8, 9]))

[(1, 'd', 7), (2, 'a', 8), (3, 'y', 9)]

In [149]:
x = [1, 2, 3]
y = [4, 5]
list(zip(x, y))

[(1, 4), (2, 5)]

In [150]:
for x_elt, y_elt in zip(x, y):
    print(x_elt, y_elt)

1 4
2 5
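
As the last two cells show, `zip` silently stops at the end of the shortest sequence. If the leftover elements matter, `itertools.zip_longest` from the standard library pads the shorter sequence instead. A minimal sketch (the names `truncated` and `padded` are made up for illustration):

```python
from itertools import zip_longest

x = [1, 2, 3]
y = [4, 5]

truncated = list(zip(x, y))                        # stops after two pairs
padded = list(zip_longest(x, y, fillvalue=None))   # pads y with None

print(truncated)
print(padded)
```

Which behavior is appropriate depends on the task; a silent truncation can hide a data problem.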


The `enumerate` function zips together index numbers and a sequence. In other words, the function enumerates a sequence.

In [153]:
# If you absolutely must use index numbers, at least use enumerate() to get them
x = 'hello'

enumerate(x)

<enumerate at 0x7fee000dc680>

In [154]:
x.index('l')

2

In [152]:
list(enumerate(x))

[(0, 'h'), (1, 'e'), (2, 'l'), (3, 'l'), (4, 'o')]

In [155]:
for i, x_elt in enumerate(x):
    print("Position", i, "is", x_elt)

Position 0 is h
Position 1 is e
Position 2 is l
Position 3 is l
Position 4 is o
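
Two further uses of `enumerate`, as a short sketch: the optional `start` argument changes the first index, and, unlike `x.index('l')`, which only reports the first match, `enumerate` can find every position of an element. The variable names are made up for illustration.

```python
x = 'hello'

# Count from 1 instead of 0
numbered = list(enumerate(x, start=1))

# All positions of 'l', not just the first one
l_positions = [i for i, ch in enumerate(x) if ch == 'l']

print(numbered)
print(l_positions)
```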


#### 2. Recursion

Recursion occurs when a function calls itself. It is useful for problems that can be reduced to smaller instances of the same problem. 

In [162]:
def factorial(n): 
    '''This function computes the factorial of n via recursion.'''
    if n <= 0: 
        return 1
    else: 
        recurse = factorial(n-1)
        result = n * recurse
        return result

In [159]:
help(factorial)

Help on function factorial in module __main__:

factorial(n)
    This function computes the factorial of n via recursion.



In [160]:
factorial(3) # 1 * 1 * 2 * 3

6

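Here, infinite recursion can occur for inputs that never hit the base case exactly. Luckily, the Python interpreter guards against it with a recursion limit, and the check `n <= 0` above stops the recursion for non-integer inputs such as 4.3 (the result is then not a meaningful factorial).  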

In [163]:
factorial(4.3)

12.728429999999989
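
The sketch below illustrates both points. `naive_factorial` is a hypothetical variant whose base case only matches exactly 0, so a non-integer argument never reaches it and Python stops with a `RecursionError`. `safe_factorial` is one possible alternative, not the lecture's version: it rejects invalid input and uses a loop instead of recursion.

```python
def naive_factorial(n):
    # Base case only matches exactly 0 (hypothetical variant for illustration)
    if n == 0:
        return 1
    return n * naive_factorial(n - 1)

def safe_factorial(n):
    # Reject inputs for which the factorial is not defined
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

try:
    naive_factorial(4.3)     # 4.3, 3.3, ..., 0.3, -0.7, ... never equals 0
    hit_limit = False
except RecursionError:
    hit_limit = True

print(hit_limit, safe_factorial(5))
```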

#### 3. Comprehensions and generators

A comprehension is a Python expression that transforms a sequence, element-by-element.

In [165]:
[x**2 for x in range(5)]

[0, 1, 4, 9, 16]

In [169]:
x = [4, 3, 1]

In [171]:
[y + 12 for y in x]

[16, 15, 13]

Think of this as Python's counterpart to R's `lapply`. You can include a condition in a comprehension:

In [172]:
# Get all squares of even numbers from 0...10
# [x for x in Z if W]

y = [x**2 for x in range(11) if x % 2 == 0]
y

[0, 4, 16, 36, 64, 100]
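
The same result can be obtained with the built-in functions `map()` and `filter()`, which were listed above but not shown. The sketch below compares the two styles; the variable names are made up for illustration. It also shows a conditional expression, which transforms every element instead of dropping some.

```python
# Comprehension with a filtering condition
evens_squared_comp = [x**2 for x in range(11) if x % 2 == 0]

# Equivalent with filter() (keep even numbers) and map() (square them)
evens_squared_map = list(map(lambda x: x**2,
                             filter(lambda x: x % 2 == 0, range(11))))

# A conditional expression before `for` keeps all elements
parity = ["even" if x % 2 == 0 else "odd" for x in range(4)]

print(evens_squared_comp == evens_squared_map, parity)
```

Many Python programmers find the comprehension easier to read than nested `map()`/`filter()` calls.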

You can also iterate over subelements.

In [None]:
x = [[1, 2, 3], [4, 5, 6]] # print 1, 2, 3, 4, 5, 6

In [None]:
# somewhat clumsy
for sublist in x:
    for elt in sublist:
        print(elt)

In [None]:
[y for sublist in x for y in sublist]

Be aware that `for sublist in x` is the outer loop and the inner loops follow to its right. In other words, the `for` clauses appear in the same order as in the equivalent nested loops, outermost first.
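
A minimal sketch of this ordering, with made-up variable names: the nested loops and the comprehension list their `for` clauses in the same order and produce the same list.

```python
x = [[1, 2, 3], [4, 5, 6]]

# Nested loops: outer loop first, inner loop second
flat_loop = []
for sublist in x:
    for elt in sublist:
        flat_loop.append(elt)

# Comprehension: the for clauses appear in the same order
flat_comp = [elt for sublist in x for elt in sublist]

# The same rule applies to independent iterables
pairs = [(i, j) for i in range(2) for j in range(3)]

print(flat_loop, flat_comp)
print(pairs)
```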

A comprehension surrounded by `[ ]` is called a list comprehension and produces a <kbd>list</kbd>. A comprehension surrounded by `{ }` and including `:` is called a dictionary comprehension and produces a <kbd>dict</kbd>. A comprehension surrounded by `{ }` without `:` is called a set comprehension and produces a <kbd>set</kbd>. 

In [None]:
x = ["hello", "goodbye"]

lens = {len(name): name for name in x} # map the length of each name to the name
lens

Remember that <kbd>dict</kbd> does not support equal keys and <kbd>set</kbd> does not support equal items, but <kbd>list</kbd> does. 

In [None]:
{x**2 for x in [-1, 0, 1]} # set # uniqueness of sets is checked with ==, not is

There's no such thing as a tuple comprehension. Instead, a comprehension surrounded by `( )` is called a generator expression.

In [None]:
y = (x**2 for x in range(1001) if x % 2 == 0)
type(y)

In [None]:
import sys
sys.getsizeof(y)

In [None]:
sys.getsizeof([x**2 for x in range(1001) if x % 2 == 0]) # produces a list, i.e., is evaluated

Operating on a generator forces its evaluation. 

In [None]:
sum(y)

The loop below does not print anything, because *a generator can only be used once*. Once iterated through, it is exhausted. Since a generator never stores all of its elements at the same time, it needs *much* less memory than a <kbd>list</kbd>.

In [None]:
for i in y:
    print(i, end=" ")

In [None]:
y = (x**2 for x in range(101) if x % 2 == 0)

In [None]:
for i in y:
    print(i, end=" ")
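
The exhaustion can be checked directly. In this sketch (variable names are made up for illustration), the second pass over the same generator yields nothing, so a generator has to be recreated whenever its elements are needed again.

```python
y = (x**2 for x in range(5))

first_pass = list(y)     # consumes the generator
second_pass = list(y)    # the generator is already exhausted

# Recreate the generator expression to iterate again
total = sum(x**2 for x in range(5))

print(first_pass, second_pass, total)
```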

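The difference also shows when we time the two constructions. Note that the generator expression is only created here, not consumed, so none of its elements are computed. 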

In [None]:
import timeit

In [None]:
print(timeit.timeit('''list_com = [i for i in range(100) if i % 2 == 0]''', number=1000000))
print(timeit.timeit('''gen_exp = (i for i in range(100) if i % 2 == 0)''', number=1000000))
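
For a like-for-like comparison, both variants should be consumed, for example by summing their elements. The sketch below does this; `total_list` and `total_gen` are hypothetical helper names. The measured times depend on the machine and are therefore only printed, not interpreted here. The memory difference is shown with `sys.getsizeof`.

```python
import sys
import timeit

def total_list():
    # Builds the full list first, then sums it
    return sum([i for i in range(100) if i % 2 == 0])

def total_gen():
    # Feeds elements to sum() one at a time
    return sum(i for i in range(100) if i % 2 == 0)

t_list = timeit.timeit(total_list, number=10_000)
t_gen = timeit.timeit(total_gen, number=10_000)
print(t_list, t_gen)

# Size of the container objects themselves, in bytes
size_list = sys.getsizeof([i for i in range(1000)])
size_gen = sys.getsizeof(i for i in range(1000))
print(size_list, size_gen)
```

For inputs this small, the generator's main advantage is memory use rather than speed.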

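A generator is a special kind of iterable which computes its elements on demand. Generator expressions are one example. <kbd>range</kbd> objects are similarly lazy, although strictly speaking they are sequences rather than generators. 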
Generators are especially useful for working with data that are __too large__ to fit in memory. While making a huge list (say $10^9$ elements) might use enough memory to crash Python, making a generator with the same number of elements uses almost no memory. See more examples [here](https://zacks.one/python-generators/). 

Python's `itertools` module has functions for manipulating generators and iterable objects.
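
A brief sketch of three such functions from the standard library, with made-up variable names: `count` produces an endless stream of integers, `islice` takes a finite slice of any iterable without building a list first, and `chain` concatenates several iterables.

```python
from itertools import chain, count, islice

# An infinite generator of even numbers; nothing is computed yet
evens = (n for n in count() if n % 2 == 0)

# Take only the first five elements
first_five = list(islice(evens, 5))

# Iterate over several iterables as if they were one
joined = list(chain([1, 2], 'ab', range(2)))

print(first_five)
print(joined)
```

Calling `list(evens)` directly would never finish, which is why `islice` is used to bound the iteration.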