# PhysCat Spring 2021, Homework 1
## Part 2: Python Building Blocks

v1.0 (2021 Spring): Aditya Sengupta, Aled Cuda

## Introduction

In this part of the homework, we'll recap the essential Python concepts and syntax that we'll build on throughout the rest of the course.

This notebook is designed for a time budget of 1 hour, assuming it isn't your first exposure to these ideas. There are no graded questions: just run and understand everything. If you skim this and feel like you're already comfortable with everything, feel free to skip to Part 3, although we recommend you still run everything as there are a few subtleties/fun tricks. On the other hand, if you're ever unsure about something we use later, you can come back here and check out example usage.

You don't have to submit this on Gradescope, but feel free to ask us questions about it!

In [1]:
import tqdm
import random

It's possible that in a future homework or lecture we occasionally use a feature we forgot to explicitly introduce here and don't explain at the time. If we do that, feel free to ask us about it in the lecture/office hours/Discord! This section was partially adapted from the ULAB Physics and Astronomy CS modules from Fall 2020.

Surprisingly few pieces are needed to express any possible computation in a programming language (a theoretical concept known as "Turing completeness": Google it!). Technically, all you need to know is

- how to store data (assigning to variables and using them);
- how to make decisions with conditional statements (`if`, `else`); and
- how to repeat work with loops (`for`, `while`).

Everything else, including everything you'll learn in this entire class, is extra - you *could* program anything you want out of just these basic building blocks. But that would take way too much time, way too much computer power, and wouldn't be very fun as you'd have to reinvent thousands to millions of wheels. 

Instead of this, we'll spend this semester exploring how other people have cleverly built their own building blocks to make research programming faster and easier (right down to [the Jupyter notebook you're looking at right now](https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/)), and how you can do it too! Before we do that, though, let's review the basic elements of Python to make sure we know how everything after this works.

### Section 2.1: Storing Data and Basic Operations

Python data comes in several basic types, or in custom types constructed from these basic ones. In programming languages, a variable is a named spot in computer memory that stores a particular value or a reference to an object. All languages have variables, but variables in Python are *dynamically* typed (unlike in statically typed languages such as Java). This means that you can assign a value of one type (e.g. an int) to a variable `x` and later in your code assign a value of a different type (e.g. a string) to the same `x`.

You can always check the type of an object `x` by running `type(x)`.

In [2]:
type(6)

int
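
As a small illustration of dynamic typing, the sketch below rebinds one name to values of two different types and checks the type each time. The variable names (`value`, `first_type`, `second_type`) are made up for this example.

```python
# Dynamic typing: the same name can hold values of different types over time.
value = 6
first_type = type(value)    # the int class

value = "six"
second_type = type(value)   # the str class

print(first_type, second_type)
```

In a statically typed language, the second assignment would typically be rejected before the program runs.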

The types we'll look at here are Booleans (true/false), ints, floats, strings, lists, and dictionaries.

**Booleans** just store `True` or `False`. This will be useful when we look at conditionals and loops in a bit: we'll be able to tell Python to do something if a certain expression is `True`, and not to do it if it's `False`.

In [3]:
to_be = True
not to_be

False

Multiple Booleans can be combined in arbitrary patterns, so you can check multiple conditions at once, using the `and` and `or` keywords. Generally, Python usage follows how you would use this in real life: if this is true "or" that is true, you go ahead.

In [4]:
the_question = to_be or not to_be
the_question

True
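
Here is a short sketch of combining `and`, `or`, and `not` in one place. The scenario and the variable names are invented purely for illustration.

```python
# Hypothetical example: deciding whether you stay dry on a walk.
raining = True
have_umbrella = False

# You stay dry if it isn't raining, or if you have an umbrella.
stay_dry = (not raining) or have_umbrella

# You get wet only if it's raining and you have no umbrella.
get_wet = raining and not have_umbrella

print(stay_dry, get_wet)
```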

**Ints** represent the positive and negative integers. Unlike some other languages, [integers in Python 3](https://docs.python.org/3.1/whatsnew/3.0.html#integers) are unbounded, that is, there is no theoretical maximum integer that Python can store. **Floats** (floating-point numbers) extend this to decimals, so that we can do algebra using the entire real line.

In [5]:
x = 1
y = 2.3
print("the type of x is {0}, while the type of y is {1}".format(type(x), type(y)))
z = x + y

the type of x is <class 'int'>, while the type of y is <class 'float'>


With ints and floats (which, together with complex numbers that we won't discuss here, are called "numeric types"), we can start doing arithmetic. Along with the four basic operations (`+`, `-`, `*`, `/`), we can
- check if numbers are equal (`==`) or not equal (`!=`)
- compare numbers using the greater-than (`>`) or less-than (`<`) operators, or the equivalents with equality (`>=`, `<=`)
- divide and keep only the whole-number quotient, discarding the remainder (`//`)
- divide and keep *only* the remainder (`%`)
- take an absolute value (`abs`).

Refer to the [Python documentation on built-in types](https://docs.python.org/3/library/stdtypes.html) for more!
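
The operators listed above can be tried out with a quick sketch like the following. The numbers 17 and 5 and the variable names are arbitrary choices for the example.

```python
a = 17
b = 5

true_division = a / b   # ordinary division always gives a float
quotient = a // b       # whole-number part of the division
remainder = a % b       # what's left over

print(true_division, quotient, remainder)
print(a == b, a != b, a >= b)   # comparisons give Booleans
print(abs(-2.5))

# The quotient and remainder recombine into the original number.
recombined = quotient * b + remainder
print(recombined == a)
```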

Floats are slightly limited by the finite space assigned to them in computer memory, meaning they're prone to rounding errors. This is relevant in a few places we'll discuss: for example, in probabilistic programming, you often make decisions based on very small values, where floating-point errors may influence the actual effects you're trying to see. We'll discuss how to work around this as and when it becomes a problem: for now, just keep this idea in the back of your mind.

In [6]:
0.1 + 0.2

0.30000000000000004
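
One common way to cope with this, sketched here as a suggestion rather than the approach the course will necessarily take, is to compare floats with a tolerance instead of using `==`. The standard library's `math.isclose` does this.

```python
import math

total = 0.1 + 0.2

exactly_equal = (total == 0.3)               # exact comparison is tripped up by rounding
approximately_equal = math.isclose(total, 0.3)  # comparison within a small relative tolerance

print(exactly_equal, approximately_equal)
```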

**Strings** (the `str` type) are objects that contain any text you give them.

In [7]:
type("Hi!")

str

The major operations we'll use on strings are indexing into them to get the characters at a certain position, and finding their length using `len`. "Indexing" a string means referring to the individual characters making up the string based on their position. For example, the index expression below extracts the word "California" by taking the characters at positions 14 through 23 (indexing in Python starts from 0, and a slice is inclusive on the left and exclusive on the right, which is why we write `14:24`).

In [8]:
univ_name = "University of California, Berkeley"
state = univ_name[14:24]
state

'California'

We can get the length of the original string and the resulting one using `len`:

In [9]:
len(univ_name)

34

In [10]:
len(state)

10
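
A few more indexing patterns, as a sketch going slightly beyond the cells above: a single index returns one character, a negative index counts from the end, and leaving out one side of a slice means "from the start" or "to the end". The variable names other than `univ_name` are made up for this example.

```python
univ_name = "University of California, Berkeley"

first_char = univ_name[0]     # a single index gives a single character
last_char = univ_name[-1]     # negative indices count backwards from the end
first_word = univ_name[:10]   # omitted start: slice from the beginning
campus = univ_name[26:]       # omitted end: slice through to the last character

print(first_char, last_char, first_word, campus)
```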

Strings can be manipulated using many of the same rules as numeric types (for example, you can add strings together to combine them, using the `+` operator), but this doesn't come up as much. We'll revisit this when we talk about dataframes, which are structures used to contain a large amount of data: we'll use strings to name the columns of a dataframe.
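
For instance, here is a minimal sketch of combining strings with `+` (the strings and variable names are arbitrary):

```python
first = "Physics"
second = "Cat"

combined = first + second              # strings are joined end to end
with_space = first + " " + second      # add your own separator if you want one

print(combined, len(combined))
print(with_space)
```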

Strings have a large number of built-in operations, or *methods*, that may be useful. Generally if you have a string `s` and a string method `f`, you call it by running `s.f()`. Sometimes the method may need inputs `q`, in which case you can run `s.f(q)`.

For a more concrete example, let's look at the `islower` method. This checks whether all the letters in the string are lowercase (characters without a case, like spaces, are ignored).

In [11]:
"all lowercase".islower()

True

In [12]:
"This has capital letters!".islower()

False
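
Some methods take inputs, following the `s.f(q)` pattern described above. The sketch below uses a few standard string methods; the example strings and variable names are made up.

```python
course = "physcat"

starts_with_phys = course.startswith("phys")   # method with one input
shouting = course.upper()                      # method with no inputs
renamed = course.replace("cat", "dog")         # method with two inputs
pieces = "3,8,10".split(",")                   # split a string wherever a comma appears

print(starts_with_phys, shouting, renamed, pieces)
```

Note that none of these change `course` itself: each method returns a new object.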

Generally, it's a better idea to look up operations and methods as needed than to try and remember everything that strings (or anything else you encounter) can do. If you're ever doing something with strings that you think someone may have done before, or it seems like a common problem, it's a good idea to look up the [Python documentation](https://docs.python.org/3/library/stdtypes.html#str) for string methods to see if it exists.

For instance, suppose we have a string with a lot of trailing and leading space.

In [13]:
spaced = "            there's so much empty space                "

If we want to get rid of it, we could try and find the index where the actual text starts, where it ends, and index into it. This sounds annoying and labor-intensive. However, you might expect that this is a common use case; for example, if you're reading text from a file, you might carry over some extra spaces. So, you might look up the documentation, or Google something like "python delete spaces in string", and you'll find the `strip` method. Try it:

In [14]:
spaced.strip()

"there's so much empty space"

**Lists** and **dictionaries** are the last types we'll talk about. 

Lists let us collect objects of any type in order and index into them, similar to strings. You can make a list by putting any other objects in square brackets `[...]`:

In [15]:
num_lst = [3, 8, 10]
num_lst[1]

8

Lists can mix types and can even have other lists in them.

In [16]:
mixed_lst = [3, 1.8, "python", [10, 2.4]]
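
To reach into a nested list, you can chain index expressions. The sketch below also shows `len` and `append` on a list; `example_lst` is a fresh name used here so the cell above is left untouched.

```python
example_lst = [3, 1.8, "python", [10, 2.4]]

inner_list = example_lst[3]       # the fourth element is itself a list
inner_value = example_lst[3][1]   # index into the inner list

example_lst.append(True)          # unlike strings, lists can be changed in place
new_length = len(example_lst)

print(inner_list, inner_value, new_length)
```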

In practice, you're unlikely to use lists very much: it's usually better to use a numpy `array` or a pandas `DataFrame`, both of which we'll discuss later. However, both of these structures build on the idea of a list.

A dictionary is similar to a list, except instead of accessing elements by their order, we access elements through "key" values. Dictionaries are useful to save intermediate results, or to keep track of interrelationships between data.

In [17]:
squares = {3: 9, 2: 4, -1: 1}

In [18]:
squares[3]

9
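
A few other things you'll commonly do with dictionaries are sketched below: adding a new key, checking whether a key is present, and looking up a key that might be missing. The name `squares_copy` is chosen for this example only.

```python
squares_copy = {3: 9, 2: 4, -1: 1}

squares_copy[5] = 25             # assigning to a new key adds an entry

has_two = 2 in squares_copy      # `in` checks the keys, not the values
has_seven = 7 in squares_copy

missing = squares_copy.get(7)    # .get returns None instead of raising an error

print(squares_copy, has_two, has_seven, missing)
```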

### Section 2.2: Conditionals

Conditional statements let us execute different actions based on a certain condition: if something is true, do X, and if it's not, do Y. The statement we use for this is `if...elif...else`.

In [19]:
if True:
    print("this is true")

this is true


In [20]:
if False:
    print("you'll never see this")
else:
    print("you'll always see this")

you'll always see this


The conditions are often based on other variables:

In [21]:
x = 3
if x == 1:
    print("not here")
elif x == 2:
    print("still not yet...")
else:
    print("here!")

here!


In [22]:
x = 3 
print("x is even:")
if x % 2 == 0:
    print(True)
else:
    print(False)

x is even:
False


In [23]:
x = 4
print("x is even:")
if x % 2 == 0:
    print(True)
else:
    print(False)

x is even:
True


`if` statements are useful for decision-making. For example (and you'll see this almost immediately in the next part of this homework), suppose you want to "flip a coin" using only the `random.random()` function, which gives us a number chosen uniformly from the interval $[0, 1)$, as in the first line of the cell below. You can do this:

In [24]:
# run me a few times!
r = random.random()
if r < 0.5:
    print("heads")
elif r > 0.5:
    print("tails")
else: # it's basically impossible to get this
    print("coin fell on its side")

tails


An aside: the even-checking code can be written to be more "pythonic" (a bit of a subjective term that basically means "elegant and uses Python features") in a way that avoids the `if` altogether.

In [25]:
x = 4
print("x is even: {}".format(not bool(x % 2)))

x is even: True


For more on what constitutes pythonic code, run the cell below to get more questions than answers:

In [26]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### Section 2.3: Loops

Loops are how we tell Python to do something many times, until some condition is met. Similar to `if`, the `while` statement checks a condition: Python keeps repeating the body of the loop as long as the condition is `True`, and stops as soon as it becomes `False`.

In [27]:
y = 0
while y <= 10:
    y += 1

print("after breaking out, y = {}".format(y))

after breaking out, y = 11


The more commonly used loop structure in Python, however, is `for`. This takes in an iterable (usually a list or a `range` object) and lets a variable run over every element of that iterable.

In [28]:
all_messages = ["welcome to physcat!", "we hope you learn something cool", "join the discord"]
for message in all_messages:
    print(message)

welcome to physcat!
we hope you learn something cool
join the discord


The `range` object takes in the start and end points of a range of integers, and returns them one by one (including the first, excluding the last). This is better than making a list of them, because Python doesn't have to store all of them in memory. If you only provide one input $n$, it runs $0, 1, \dots, n - 1$.

In [29]:
list(range(5)) # the `list` function takes in an iterable and stores everything in it in a list

[0, 1, 2, 3, 4]

In [30]:
list(range(3, 7))

[3, 4, 5, 6]

In [31]:
for n in range(5, 10):
    print(n ** 2)

25
36
49
64
81


If you ever see a loop taking a long time, you can create a progress bar to see how far along it is using the `tqdm` package. This provides a progress bar both in a notebook and in terminal Python (go to your terminal, run `python3` and copy and paste the cell below to see what it looks like there as well).

In [32]:
# multiplying a list by a number repeats it that many times
# here we're just doing it to make more things to loop over
for i in tqdm.tqdm(all_messages * 1000000):
    pass # Python-speak for "do nothing"
print("done")

100%|██████████| 3000000/3000000 [00:00<00:00, 3196853.68it/s]

done





`tqdm.tqdm` wraps any iterable. If you're wrapping a `range`, you can also use `tqdm.trange`:

In [33]:
for i in tqdm.trange(10000000):
    pass
print("done")

100%|██████████| 10000000/10000000 [00:02<00:00, 3488683.27it/s]

done





We'll often use "list comprehensions" or "dictionary comprehensions" as ways of defining lists or dictionaries in a single line, which use a slightly rearranged `for` loop. A list comprehension follows the syntax

`[new_item_expression for item in iterable]`.

For example:

In [34]:
[x - 3 for x in range(5)]

[-3, -2, -1, 0, 1]

A dictionary comprehension is similar, except we specify both a key and value:

In [35]:
{x : x ** 2 for x in range(5)}

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
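
To see how a comprehension relates to an ordinary loop, the sketch below builds the same list both ways. It also shows an optional `if` filter at the end of a comprehension, which is an extra feature not covered above. All variable names are made up for the example.

```python
# The long way: start with an empty list and append in a for loop.
shifted_loop = []
for k in range(5):
    shifted_loop.append(k - 3)

# The same thing as a list comprehension.
shifted_comp = [k - 3 for k in range(5)]

# A trailing `if` keeps only the items that pass a test.
even_squares = {k: k ** 2 for k in range(5) if k % 2 == 0}

print(shifted_loop == shifted_comp, even_squares)
```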

### Section 2.4: Functions

Often, we want to *encapsulate* a particular piece of code so we can use it multiple times, or so we can separate it from the rest of a program. In this case we use a function. You can think of this in the same way you think of mathematical functions in algebra class:

In [36]:
def square(x):
    return x * x

def cube(x):
    return x * x * x

In [37]:
square(3), cube(4)

(9, 64)

Functions can have multiple inputs and outputs:

In [38]:
def power(x, n):
    # this is essentially the same as the builtin function 'pow'
    return x ** n

def divide(N, m):
    return N // m, N % m

In [39]:
power(3, 4) # multiple inputs

81

In [40]:
divide(5, 2) # multiple inputs and multiple outputs

(2, 1)
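
When a function returns several values, they come back together as a tuple, and you can unpack them into separate variables in one line. A minimal sketch, redefining `divide` so the block runs on its own:

```python
def divide(N, m):
    # Returns the whole-number quotient and the remainder as a pair.
    return N // m, N % m

result = divide(17, 5)                # one variable holds the whole tuple
quotient, remainder = divide(17, 5)   # or unpack into two variables

print(result, quotient, remainder)
```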

Functions can also have no inputs (like `random.random()`) or no outputs (like `print`). Let's look at what happens if we try to assign the result of printing something to a variable:

In [41]:
printval = print("5")

5


You see the 5 from the print statement, but what about `printval`?

In [42]:
printval

Looks like it's nothing! In fact, it's a Python object called `None`:

In [43]:
printval is None 
# 'is' means they're the same object, but two things can be different objects with the same value
# this is a subtle point we won't pay much attention to, but ask us if you're curious!

True
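
If you're curious about the difference between `is` and `==`, here is a small sketch using lists. The variable names are arbitrary.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list that happens to hold the same values
c = a           # a second name for the very same list

same_value = (a == b)    # True: the contents match
same_object = (a is b)   # False: they are two distinct objects
alias = (a is c)         # True: both names refer to one object

print(same_value, same_object, alias)
```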

Functions aren't restricted to math operations; they can take in and return anything in Python.

In [44]:
def fifth_char_of_string(s):
    return s[4]

fifth_char_of_string(univ_name)

'e'

Usually when we write code (at least in Python, or in any other language that encourages a "functional" style), we want to write a set of functions that each do a part of what we want to do, and then combine them all somehow.

If you ever see a function and you don't know what it does, try `help`:

In [45]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.



In [46]:
help(pow)

Help on built-in function pow in module builtins:

pow(base, exp, mod=None)
    Equivalent to base**exp with 2 arguments or base**exp % mod with 3 arguments
    
    Some types, such as ints, are able to use a more efficient algorithm when
    invoked using the three argument form.



If you're writing a function, you can add your own "docstring", which will be displayed when you call `help` on the function you wrote. In your docstring, it's a good idea to clearly explain what the function does and what the inputs and outputs are. For instance, let's try writing `power` again:

In [47]:
def power(x, n):
    """
    Raises x to the power n.
    Arguments
    ---------
    x : float
        The base to exponentiate.
    n : float
        The power to which x should be raised.
        
    Returns
    -------
    x ** n : float
        The result of exponentiating.
    """
    # This is a docstring: it should be in triple quotes at the top of the function.
    return x ** n

In [48]:
help(power)

Help on function power in module __main__:

power(x, n)
    Raises x to the power n.
    Arguments
    ---------
    x : float
        The base to exponentiate.
    n : float
        The power to which x should be raised.
        
    Returns
    -------
    x ** n : float
        The result of exponentiating.



Finally, we may use "lambda functions" from time to time. These are just normal functions that we define in a single expression instead of with a `def`.

In [49]:
square = lambda x: x ** 2

In [50]:
square(3)

9

Lambdas can't have docstrings, and generally aren't a good idea if you can't just immediately read off what they're supposed to do; however, they're sometimes useful to concisely express a simple idea.
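
One place where a lambda often reads naturally is as a short "key" passed to another function. The sketch below sorts a list of words by length using the built-in `sorted`; the word list is made up for the example.

```python
words = ["python", "is", "fun"]

# `key` tells sorted what to compare: here, the length of each word.
by_length = sorted(words, key=lambda w: len(w))

print(by_length)
```

In this particular case `key=len` would work just as well, which is a good reminder that a lambda is only worth it when it's the clearest option.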

We're now at the end of the module! Feel free to move on to part 3, after you've run the last cell below.

In [51]:
import antigravity