##### STA 141B Data & Web Technologies for Data Analysis

## Lecture 2, 1/8/26, Basics of Python

## Today's topics


- Basics of Python (cont.)

## Discussion

The discussion sections take place on Wednesday, 2:10 - 3:00 PM and 3:10 - 4:00 PM, in Olson Hall 147.

This week's topics are:
- Install Python
- Install Jupyter notebook
- Introduction to Jupyter notebook

## Basics of Python (cont.)

### Sets

A <kbd>set</kbd> is an unordered collection of unique items. It is instantiated with curly brackets. Since uniqueness is checked via hashing, the items must be hashable, which in practice means immutable!
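
As a minimal sketch of these properties (the names `fruits` and `ok` are made up for illustration): duplicates collapse, membership tests work without indexing, and a mutable item such as a list is rejected, while a tuple with the same contents is accepted.

```python
# Duplicates collapse, because a set keeps each item only once.
fruits = {"apple", "pear", "apple"}
print(len(fruits))       # 2

# Membership tests work without indexing.
print("pear" in fruits)  # True

# A list is mutable and therefore unhashable, so it cannot be a set item.
try:
    bad = {"apple", [2, 3]}
except TypeError as err:
    print("TypeError:", err)

# A tuple with the same contents is immutable and works fine.
ok = {"apple", (2, 3)}
print((2, 3) in ok)      # True
```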

In [None]:
x = {"apple", True, 2} # display order changed, they are unordered
x

In [None]:
x[0]

In [None]:
{"apple", [2,3], 2}

Sets are unordered. Hence, they do not support indexing. 

In [None]:
x[1] 

In [None]:
x.add("new item")
x

In [None]:
x.add('''new item''') # the items are unique
x

In [None]:
x.remove("new item")

In [None]:
y = [23, 4, 5, 4]
y.pop()

In [None]:
y

### Functions

We have already defined functions in the previous lecture. The function name follows `def`, and an optional return value is passed back via `return`.

In [13]:
def myfun(x):
    return x**2

In [14]:
myfun(3)

9

Default values for arguments are passed in the function definition: 

In [15]:
def myfun(x, n = 2): 
    return x**n

In [16]:
myfun(3)

9

In [17]:
myfun(3,3)

27

In [18]:
myfun(3, n=3)

27

In [19]:
myfun(n=3, x=2)

8

In [21]:
myfun(n=3, x=4)

64

In [30]:
def myfun(x,n=2):
    # do something
    return(x**n)
    

In [25]:
polynomials = [lambda x: x**i for i in range(5)]

In [26]:
type(polynomials)

list

In [27]:
type(polynomials[0])

function
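
One caveat about the list of lambdas above, shown as a minimal sketch: a lambda looks up `i` when it is called, not when it is created, so all five functions end up using the last value of `i`. Binding `i` as a default argument is one common workaround (the name `polynomials_fixed` is made up for illustration).

```python
polynomials = [lambda x: x**i for i in range(5)]
# Every lambda sees i == 4, the final value of the loop variable.
print([p(2) for p in polynomials])        # [16, 16, 16, 16, 16]

# Default values are evaluated at definition time, so each lambda keeps its own i.
polynomials_fixed = [lambda x, i=i: x**i for i in range(5)]
print([p(2) for p in polynomials_fixed])  # [1, 2, 4, 8, 16]
```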

In [29]:
myfun = lambda x, n = 2: x**n


In [23]:
type(myfun)

function

In [None]:
myfun(3,3)

A well-written function contains a *docstring* that explains what the function does: 

In [33]:
def some_other_fun(x):
    pass

In [34]:
help(some_other_fun)

Help on function some_other_fun in module __main__:

some_other_fun(x)



In [31]:
def myfun(x, n = 2): 
    '''
    Takes in a number x, returns the n-th power of x
    '''
    return x**n # make additional comments

In [32]:
help(myfun)

Help on function myfun in module __main__:

myfun(x, n=2)
    Takes in a number x, returns the n-th power of x



(Short) anonymous functions in Python are called *lambda expressions*. They can be used wherever function objects are required, e.g., when a function is passed as an argument to another function (see below). 

In [35]:
def make_power(n): 
    return lambda x, m = 2: x**n + m

In [36]:
f2 = make_power(2)
type(f2)

function

In [37]:
f2(1)

3

In [38]:
make_power(2)(2, 1)

5

In the sorting example further below, the key function (written first as a named function, then as a lambda expression) determines which element of each item the ordering is based on!

In [None]:
# N
myfun(n = 1,x = 2)

In [None]:
myfun(2, n=1)

In [None]:
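# N
# once an argument is passed by keyword, all following arguments must be passed by keyword as well!
# e.g., myfun(x = 1, 2) is a SyntaxError, whereas a purely positional call works:
myfun(1, 2)

In [None]:
# N
# mixing is also fine, as long as the positional arguments come first:
myfun(2, n = 1)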

In [40]:
pairs = ['onzXfasdf', [2, 'two'], ['three', 3], (4, 'four')] # N
type(pairs)

list

In [None]:
tup = pairs[0]
print(tup)
isinstance(tup[0], int)

In [41]:
def fun(tup):
    if isinstance(tup[0], str):
        return tup[0]
    else:
        return tup[1]

In [44]:
pairs.sort(key=fun)
pairs

[(4, 'four'), 'onzXfasdf', ['three', 3], [2, 'two']]

In [45]:
pairs.sort(key=lambda tup: tup[0] if isinstance(tup[0], str) else tup[1])
pairs

[(4, 'four'), 'onzXfasdf', ['three', 3], [2, 'two']]

In [46]:
def _(pair):
    return pair[0]
pairs.sort(key=_)
pairs

TypeError: '<' not supported between instances of 'str' and 'int'

In [47]:
lst = ['one', 'two', 'three', 'four']
lst.sort()
lst

['four', 'one', 'three', 'two']
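
As a minimal sketch of sorting on the value rather than the key, assume a list of (number, name) pairs (the lists `number_names` and `words` are made up for illustration). The `key` function decides which part of each item is compared.

```python
# Hypothetical data: (number, name) pairs.
number_names = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

# Sort alphabetically by the name (second element) instead of the number.
by_name = sorted(number_names, key=lambda pair: pair[1])
print(by_name)          # [(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]

# sorted() returns a new list; the original order is unchanged.
print(number_names[0])  # (1, 'one')

# The same idea works for strings, e.g., sorting by length. Ties keep their original order.
words = ['one', 'two', 'three', 'four']
by_length = sorted(words, key=len)
print(by_length)        # ['one', 'two', 'four', 'three']
```

In contrast to `sorted()`, the `.sort()` method used above modifies the list in place and returns `None`.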

####  `if`

Python's `if` statement allows us to change the behavior of our code depending on whether a condition is met. Conditions are evaluated as Boolean values (<kbd>bool</kbd>).

Indentation determines whether code is inside or outside of a control flow statement! Be careful to get it right!
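
A minimal sketch of how indentation decides what belongs to the `if` block (the list `log` is made up for illustration): only the indented lines depend on the condition.

```python
x = 5
log = []

if x > 10:
    log.append("inside the if block")   # indented: runs only if x > 10
    log.append("still inside")          # same indentation, same block
log.append("outside the if block")      # not indented: always runs

print(log)  # ['outside the if block']
```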

In [48]:
x = 1
if x > 10:
    print("x is greater than 10")
    # this is happening if the condition is true
elif x == 1:
    print('x is one.')
else:
    print("x is less than or equal to 10, and not 1")

x is one.


In [52]:
# note that it is called elif rather than else if:
x = 1
if x > 10:
    print("x is greater than 10")
elif x == 1:
    pass # let's deal with this later
else: 
    print("x is less than or equal to 10, and not 1")

####  `for`

Python's `for` loop allows us to iterate over elements of a string, tuple, list, or other object.

Objects that can be iterated over are iterable. We'll learn more about iterables later.

In [54]:
{2, 1, 3} == {1, 2, 3}

True

In [55]:
def f(): 
    global count
    count += 1

In [56]:
count = 0

In [57]:
f()
count

1

In [53]:
for i in {1, 2, 3}:
    print(i)

1
2
3


In [67]:
# A weird way to swap upper- and lowercase that shows a non-trivial loop:
for letter in 'StA 141B':
    # Computers compute on numbers, so each letter is represented by a number in memory.
    # ord() gets the number that represents a letter
    num = ord(letter)
    if 65 <= num <= 90: # A-Z are represented by 65-90
        # a-z are represented by 97-122, so a 32 number offset
        new_letter = num + 32
        # chr() converts a number that represents a letter back to the letter
        new_letter = chr(new_letter)
    elif 97 <= num <= 122:
        new_letter = chr(num-32)
    else:
        new_letter = letter
        
    print(new_letter, end = "") # replaces default line break at end

sTa 141b

In [68]:
ord('A')

65

In [69]:
chr(84 + 32)

't'

In [70]:
# In practice, we can just use a built-in method to convert to lowercase
print('STA 141B'.lower())
print('sta 141b'.upper())
# Behind the scenes, .lower() is implemented in pretty much the same way as our loop above.

sta 141b
STA 141B


## END OF SECOND LECTURE

### Iterables

The four most important methods to repeat code for identical or similar tasks are:

 1. Loops (`while` and `for`)
 2. Recursion (i.e., a function calling itself)
 3. Comprehensions, Generators, and `map()`
 4. Vectorization (`NumPy` arrays and functions)
    
These methods have tradeoffs. In general:

 1. Loops are the most flexible -- particularly `while` loops
 2. Recursion tends to produce complicated code and is susceptible to infinite recursion
 3. Generators tend to use the least memory
 4. Vectorization tends to be fastest 
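
To make the comparison concrete, here is a sketch of one task, the sum of the squares 0², 1², ..., (n-1)², written with the first three methods (the function names are made up for illustration). A vectorized NumPy version is left out to keep the example free of third-party packages.

```python
def squares_loop(n):
    total = 0
    for k in range(n):
        total += k**2
    return total

def squares_recursive(n):
    # Base case first; otherwise the recursion would never stop.
    if n <= 0:
        return 0
    return (n - 1)**2 + squares_recursive(n - 1)

def squares_generator(n):
    # The generator expression yields one square at a time.
    return sum(k**2 for k in range(n))

def squares_map(n):
    return sum(map(lambda k: k**2, range(n)))

print(squares_loop(5), squares_recursive(5), squares_generator(5), squares_map(5))  # 30 30 30 30
```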

#### 1. Loop tips and tricks

An iterable object is an object that can be iterated over, element-by-element, like <kbd>tuple</kbd>, <kbd>list</kbd>, <kbd>range</kbd>, or <kbd>str</kbd>.

Python's `for`-loops can automatically retrieve elements from iterable objects.

In [71]:
# bad code
x = 'hello'
for i in [0, 1, 2, 3, 4]:
    print(x[i], end = '')

hello

In [72]:
for i in range(5):
    print(x[i], end = '')

hello

In [74]:
# good code
for x in 'hello':
    print(x, end = '+') # we can use .index method for strings! 

h+e+l+l+o+

In [75]:
[i for i in enumerate('hello')]

[(0, 'h'), (1, 'e'), (2, 'l'), (3, 'l'), (4, 'o')]

In [77]:
for i, v in enumerate('hello'): 
    print(str(i+1) + v)

1h
2e
3l
4l
5o


You can use `list` to convert <kbd>range</kbd> objects to <kbd>list</kbd> objects. As we have already established, this is computationally intensive and should generally be avoided. You may only need to do this for visual inspection. 

In [78]:
print(range(5))

range(0, 5)


In [79]:
print(list(range(5)))

[0, 1, 2, 3, 4]


In [None]:
range(5)

In [80]:
# the difference can also be seen by its size
import sys

print(sys.getsizeof(range(0,5)))
print(sys.getsizeof(list(range(0,5))))

48
104


You can iterate over the keys and values of a <kbd>dict</kbd> object at the same time with the `items()` method.

In [81]:
x = {'hello': 1, "goodbye": 2}

for i in x:
    print(i, x[i])

hello 1
goodbye 2


In [82]:
x.items()

dict_items([('hello', 1), ('goodbye', 2)])

In [84]:
for key, val in x.items():
    print(key, val)

hello 1
goodbye 2


*Zipping* two sequences together means combining them into a sequence of <kbd>tuple</kbd> objects where:

- The first element of each tuple is an element from the first sequence
- The second element of each tuple is an element from the second sequence

Usually it only makes sense to zip sequences that are the same length.

The `zip` function zips two or more sequences. Use it to iterate over multiple sequences at the same time.

In [85]:
y = ['four', 'one', 'two', 'three']
print(x)
print(y)

{'hello': 1, 'goodbye': 2}
['four', 'one', 'two', 'three']


In [86]:
len(y)

4

In [87]:
len(x)

2

In [92]:
z = zip(x, y)

In [93]:
list(z)

[('hello', 'four'), ('goodbye', 'one')]

In [94]:
list(z) # a zip object is an iterator and is exhausted after the first pass

[]

In [None]:
y

In [None]:
list(enumerate(y))

In [None]:
list(zip(range(len(y)), y))

In [5]:
x = [1, 2, 3]
y = [4, 5, 6]

for x_elt, y_elt in zip(x, y):
    print(x_elt, y_elt)

1 4
2 5
3 6


In [6]:
list(zip(x, y, [7, 8, 9]))

[(1, 4, 7), (2, 5, 8), (3, 6, 9)]

In [10]:
import itertools # in contrast, this gives all combinations of the two lists
list(itertools.product(x, y))

[(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]

In [None]:
# N
# zip stops when the shortest list has come to an end
x = [1, 2, 3]
y = [4, 5]

for x_elt, y_elt in zip(x, y):
    print(x_elt, y_elt)
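
If the leftover elements matter, `itertools.zip_longest` is one option. The sketch below also shows how `zip(*...)` reverses the zipping; the fill value `0` and the names `padded`, `first`, and `second` are arbitrary choices for illustration.

```python
import itertools

x = [1, 2, 3]
y = [4, 5]

pairs = list(zip(x, y))
print(pairs)            # [(1, 4), (2, 5)]

# zip_longest pads the shorter input with fillvalue (None by default).
padded = list(itertools.zip_longest(x, y, fillvalue=0))
print(padded)           # [(1, 4), (2, 5), (3, 0)]

# zip(*pairs) reverses the zipping ("unzipping").
first, second = zip(*pairs)
print(first, second)    # (1, 2) (4, 5)
```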

The `enumerate` function zips together index numbers and a sequence. In other words, the function enumerates a sequence.

In [None]:
# If you absolutely must use index numbers, at least use enumerate() to get them
x = 'hello'

enumerate(x)
list(enumerate(x))

In [None]:
for i, x_elt in enumerate(x):
    print("Position", i, "is", x_elt)
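
When the numbering should start at 1, `enumerate` accepts a `start` argument. A minimal sketch (the names `position`, `letter`, and `labels` are made up for illustration):

```python
x = 'hello'

# start=1 shifts the counter, so no manual i + 1 is needed.
for position, letter in enumerate(x, start=1):
    print(position, letter)

labels = [str(i) + v for i, v in enumerate(x, start=1)]
print(labels)  # ['1h', '2e', '3l', '4l', '5o']
```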

#### 2. Recursion

Recursion occurs when a function calls itself. It is useful for problems that can be reduced to smaller instances of the same problem. 

In [None]:
def factorial(n): 
    '''This function computes the factorial of n via recursion.'''
    if n == 0: 
        return 1
    elif n < 0:
        print("n must be a non-negative integer!")
        return 0
    else: 
        recurse = factorial(n-1)
        result = n * recurse
        return result

In [None]:
help(factorial)

In [None]:
factorial(3) 

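Here, infinite recursions can occur: without the `n < 0` guard, a negative or non-integer argument would never reach the base case `n == 0`. Luckily, the Python interpreter guards against this with a recursion limit and raises a `RecursionError`. With the guard in place, the call below stops as soon as `n` drops below zero.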

In [None]:
factorial(4.3)
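
As an alternative sketch, the input can be validated up front and the product computed with a loop, which avoids deep recursion altogether. The function name `factorial_checked` is made up for illustration; `math.factorial` is the standard library version.

```python
import math
import sys

def factorial_checked(n):
    '''Computes the factorial of n with a loop; rejects invalid input up front.'''
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

print(factorial_checked(5))      # 120
print(math.factorial(5))         # 120
print(sys.getrecursionlimit())   # maximum recursion depth allowed by the interpreter

try:
    factorial_checked(4.3)
except ValueError as err:
    print("ValueError:", err)
```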

#### 3. Comprehensions and generators

A comprehension is a Python expression that transforms a sequence, element-by-element.

In [None]:
[x**2 for x in range(5)]

Think of this as Python's counterpart to R's `lapply`. You can include a condition in a comprehension:

In [None]:
# Get all squares of even numbers from 0...10
# [x for x in Z if W]

x = [x**2 for x in range(11) if x % 2 == 0]
x

You can also iterate over subelements.

In [None]:
x = [[1, 2, 3], [4, 5, 6]] # print 1, 2, 3, 4, 5, 6

In [None]:
# somewhat clumsy
for sublist in x:
    for elt in sublist:
        print(elt)

print(sublist)

In [None]:
# wrong order: raises a NameError, since third_list is not yet defined
[y for y in third_list for third_list in x]

In [None]:
[y for sublist in x for y in sublist]

Be aware that `sublist in x` is the outer loop and the nested loops follow to its right. In other words, the outermost iterables always come first in the comprehension.
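
The ordering rule can be checked by writing the comprehension as explicit nested loops. The sketch below also uses `itertools.chain.from_iterable` as another way to flatten one level (the names `flat_loop`, `flat_comp`, and `flat_chain` are made up for illustration).

```python
import itertools

x = [[1, 2, 3], [4, 5, 6]]

flat_loop = []
for sublist in x:          # outer loop: first `for` in the comprehension
    for y in sublist:      # inner loop: second `for` in the comprehension
        flat_loop.append(y)

flat_comp = [y for sublist in x for y in sublist]
flat_chain = list(itertools.chain.from_iterable(x))

print(flat_comp)                             # [1, 2, 3, 4, 5, 6]
print(flat_loop == flat_comp == flat_chain)  # True
```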

A comprehension surrounded by `[ ]` is called a list comprehension and produces a <kbd>list</kbd>. A comprehension surrounded by `{ }` and including `:` is called a dictionary comprehension and produces a <kbd>dict</kbd>. A comprehension surrounded by `{ }` without `:` is called a set comprehension and produces a <kbd>set</kbd>. 

In [None]:
x = ["hello", "goodbye"]

lens = {len(name): name for name in x} # maps each length to a name
lens

In [None]:
lens = {name: len(name) for name in x}
print(lens)
lens["hello"]

Remember that <kbd>dict</kbd> does not support duplicate keys and <kbd>set</kbd> does not support duplicate items, but <kbd>list</kbd> does. 

In [None]:
{x**2 for x in [-1, 0, 1]} # set # uniqueness of sets is checked with ==, not is
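
A small sketch of what this means for the three kinds of comprehension, using a made-up list `names` in which two entries have the same length: the list keeps both lengths, the set keeps one, and in the dictionary the later value overwrites the earlier one.

```python
names = ["hello", "world", "goodbye"]  # "hello" and "world" both have length 5

as_list = [len(name) for name in names]
as_set = {len(name) for name in names}
as_dict = {len(name): name for name in names}

print(as_list)           # [5, 5, 7]
print(as_set == {5, 7})  # True
print(as_dict)           # {5: 'world', 7: 'goodbye'}
```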

There's no such thing as a tuple comprehension. Instead, a comprehension surrounded by `( )` is called a generator expression.

In [None]:
y = (x**2 for x in range(1001) if x % 2 == 0)
type(y)

In [None]:
import sys
sys.getsizeof(y)

In [None]:
sys.getsizeof([x**2 for x in range(1001) if x % 2 == 0]) # produces a list, i.e., is evaluated

Operating on a generator forces its evaluation. 

In [None]:
sum(y)

The following loop prints nothing, because *a generator can only be used once*. Once iterated through, it is exhausted. Since a generator never stores all of its elements at once, it is *much* more memory-efficient than a <kbd>list</kbd>.

In [None]:
for i in y:
    print(i, end=" ")

In [None]:
y = (x**2 for x in range(101) if x % 2 == 0)

In [None]:
for i in y:
    print(i, end=" ")

In [None]:
# N
# now do it again:
for i in y:
    print(i, end=" ")
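
To step through a generator by hand, or to reuse the same recipe several times, something like the following sketch can help. The helper `even_squares` is hypothetical; it simply returns a new generator expression on every call.

```python
def even_squares(limit):
    # Hypothetical helper: returns a new generator on every call.
    return (x**2 for x in range(limit) if x % 2 == 0)

gen = even_squares(7)
print(next(gen))   # 0
print(next(gen))   # 4
print(list(gen))   # [16, 36], only the remaining elements
print(list(gen))   # [], the generator is exhausted

# next() with a default avoids the StopIteration error on an exhausted generator.
print(next(gen, "nothing left"))  # nothing left

# Calling the helper again gives a fresh generator.
print(sum(even_squares(7)))  # 56
```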

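The economics of memory also show when we time operations. Note that the second statement below only creates the generator and does not evaluate its elements. 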

In [None]:
import timeit

In [None]:
print(timeit.timeit('''list_com = [i for i in range(100) if i % 2 == 0]''', number=1000000))
print(timeit.timeit('''gen_exp = (i for i in range(100) if i % 2 == 0)''', number=1000000))
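
For a comparison in which both variants actually compute their elements, each can be passed to `sum`. This is only a sketch: the repetition count and sizes are arbitrary choices, and the timings depend on the machine, so no particular ordering should be expected. The difference in memory is the more reliable effect.

```python
import sys
import timeit

# Both statements now evaluate all elements.
list_time = timeit.timeit('sum([i for i in range(100) if i % 2 == 0])', number=10000)
gen_time = timeit.timeit('sum(i for i in range(100) if i % 2 == 0)', number=10000)
print(list_time, gen_time)   # timings depend on the machine

# The memory footprint differs much more clearly than the run time.
big_list = [i for i in range(100000)]
big_gen = (i for i in range(100000))
print(sys.getsizeof(big_list) > sys.getsizeof(big_gen))  # True
```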

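A generator is a special kind of iterable which computes its elements on demand. Generator expressions are one example; <kbd>range</kbd> objects are similarly lazy, although strictly speaking they are sequences rather than generators. 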
Generators are especially useful for working with data that are __too large__ to fit in memory. While making a huge list (say $10^9$ elements) might use enough memory to crash Python, making a generator with the same number of elements uses almost no memory. See more examples [here](https://zacks.one/python-generators/). 

Python's `itertools` module has functions for manipulating generators and iterable objects.
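
A few of these functions, as a minimal sketch (the variable names are made up for illustration): `count` produces an endless stream, `islice` takes a lazy slice of it, and `chain` iterates over several iterables one after the other.

```python
import itertools

# count() is an endless generator: 0, 1, 2, ...
naturals = itertools.count()

# islice() lazily takes a slice without building the whole sequence.
first_five = list(itertools.islice(naturals, 5))
print(first_five)   # [0, 1, 2, 3, 4]

# chain() iterates over several iterables one after the other.
chained = list(itertools.chain('ab', [1, 2]))
print(chained)      # ['a', 'b', 1, 2]

# Lazy evaluation also works on generator expressions.
squares = (k**2 for k in itertools.count(1))
first_squares = list(itertools.islice(squares, 3))
print(first_squares)  # [1, 4, 9]
```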

In [None]:
x = [1, 2, 3]
y = [4, 5, 6]

for x_elt, y_elt in zip(x, y):
    print(x_elt, y_elt)

Note that the `zip` function zips together the first elements of the lists with each other, afterwards the second elements, and so on. 
If you want to have all possible combinations, you should either use the `itertools` module or a list comprehension as follows:

In [None]:
import itertools

list1 = ['a', 'b', 'c']
list2 = [1, 2]

combinations = list(itertools.product(list1, list2))
print(combinations)

Alternatively, you could also have used a list comprehension:

In [None]:
combinations2 = [(first_el, second_el) for first_el in list1 for second_el in list2]
print(combinations2)
print(combinations == combinations2)

Note that the lists are equal, but not identical:

In [None]:
print(combinations is combinations2)

#### Map functions

`map` applies a function to each element of an iterable.

It takes a function as the first argument and an iterable as the second argument.

In [None]:
# ever wondered how to interpret fahrenheit?

fahrenheit = range(0, 110, 25)
print(list(fahrenheit))

celsius = map(lambda x: (x-32)*5/9, fahrenheit)
print(celsius) 

Note that a map object is returned, rather than the results you might have anticipated.

To get the actual values, use the list command.

We could also use map again to get the rounded values.
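
These two steps might look as follows. This is a sketch that repeats the setup from the cell above; the names `celsius_values` and `rounded` are made up for illustration.

```python
fahrenheit = range(0, 110, 25)

celsius = map(lambda x: (x - 32) * 5 / 9, fahrenheit)
celsius_values = list(celsius)   # forces evaluation of the map object
print(celsius_values)

# A second map rounds each value to one decimal place.
rounded = list(map(lambda c: round(c, 1), celsius_values))
print(rounded)  # [-17.8, -3.9, 10.0, 23.9, 37.8]
```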

Alternatively, we just could have used:

In [None]:
list(map(lambda x: round((x-32)*5/9,1), fahrenheit))

#### Filters

`filter` takes a function as the first argument and an iterable as the second argument. It keeps only the elements for which the function returns a true value.



In [None]:
fahrenheit = range(0, 110, 10)
hot_tmp = filter(lambda x: x > 85, fahrenheit)

In [None]:
print(hot_tmp)

In [None]:
print(list(hot_tmp))

In [None]:
lt = list(filter(lambda x: x < 35, fahrenheit))

for tmp in lt:
    print(tmp, "F is quite cold.")

`filter` also works for lists.

In [None]:
fahrenheit = list(fahrenheit)
hot_tmp = filter(lambda x: x>85, fahrenheit)
print(hot_tmp)
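
As with `map`, the filter object has to be consumed to see its values. The sketch below also compares `filter` and `map` with the equivalent list comprehension; the names `hot_filter`, `hot_comp`, and `hot_celsius` are made up for illustration.

```python
fahrenheit = list(range(0, 110, 10))

hot_filter = list(filter(lambda x: x > 85, fahrenheit))
hot_comp = [x for x in fahrenheit if x > 85]
print(hot_filter)              # [90, 100]
print(hot_filter == hot_comp)  # True

# map and filter can be combined; the comprehension does both in one expression.
hot_celsius = list(map(lambda x: round((x - 32) * 5 / 9, 1),
                       filter(lambda x: x > 85, fahrenheit)))
print(hot_celsius == [round((x - 32) * 5 / 9, 1) for x in fahrenheit if x > 85])  # True
```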