# Basics (Part 1): Jupyter Notebook and Python

Welcome to my sequence of ML tutorials! In the coming lessons we'll cover all kinds of important topics related to machine learning. Before diving into that stuff, however, I'd like to get some preliminaries out of the way for those who have a weak or rusty background in coding or math. You don't need to be an advanced coder to do machine learning, but you do need to know some common techniques. Similarly, you don't need to be any kind of expert at math. There's no need to remember everything you learned in calculus or linear algebra, let alone anything from more advanced math courses. In this particular tutorial we'll talk about how Jupyter Notebook and Python work at a basic level.

## Jupyter Notebook

Before delving into how Python works, we need to talk about the environment we're currently in. The page you have up is running in what's called a **Jupyter Notebook**. Essentially, a Jupyter Notebook (or notebook) is a sequence of **cells**. In each cell you can perform some task or another. 

To run any Jupyter cell (including this one), select it and press `SHIFT-ENTER`. You can also click the `Run` button in the upper toolbar, but that's slower and more annoying.

To move between cells you can either click on the one you want, or press `ESC` to enter command mode and then use the up and down arrow keys on your keyboard.

### Markdown Cells

The text you're currently reading is all contained in what's called a **markdown cell**. A markdown cell is essentially a cell for writing text: whatever you type into it renders on the screen as formatted text. The text is written in **Markdown**, a lightweight markup language that gets converted to HTML and lets you quickly do basic styling with plain-text symbols.

To turn any cell in a notebook into a markdown cell, press `ESC` to enter command mode and then press `M`.

Here are some basic examples of things you can do in Markdown:
- If you want to bold the word `hey`, you'd type `**hey**`.
- If you want to italicize `hey`, you'd type `*hey*`.
- If you want to create a large heading out of `Hello`, you'd type `# Hello`.
    - A subheading one level down would use two `#` instead of one, so `## Hello`.
    - A sub-subheading two levels down would be `### Hello`, and so on.
- If you want to turn a word `link` into a hyperlink to `https://url.com`, you'd type `[link](https://url.com)`.
- If you want to create bullets of some text `a, b, c`, each its own bullet, you'd type:

```
- a
- b
- c
```

- If you want to number some text `a, b, c`, you'd instead type:

```
1. a
2. b
3. c
```
- If you want to render a table, you'd type:

```
|some text|other text|
|--|--|
|1|a|
|2|b|
```

which renders as:

|some text|other text|
|--|--|
|1|a|
|2|b|

See [this Markdown guide](https://www.markdownguide.org/basic-syntax/) for more specifics on things you can do. We'll use many of them over the course of these tutorials.

It turns out that Jupyter also supports a useful extension to Markdown that lets us render math using LaTeX. To render an equation inline with the text, sandwich it between single `$` signs, e.g. `$x^2$`; to display it centered on its own line, use `$$` on each side instead. Either way, the contents must be proper LaTeX math commands. For examples of how to work with basic LaTeX in math mode, see [here](https://linuxhint.com/use-latex-jupyter-notebook/).

As a simple example, suppose you wanted to render the following equation as proper, beautifully-rendered math:
$$\frac{a^2+b^2}{\alpha_1 + \beta_2} + 4\log(x) + e^{i\pi} = \frac{dy}{dx}.$$
You'd just type the following command into a Jupyter markdown cell and it'll render as LaTeX:
```
$$\frac{a^2+b^2}{\alpha_1 + \beta_2} + 4\log(x) + e^{i\pi} = \frac{dy}{dx}.$$
```

### Python Cells

Jupyter is much more powerful than a text renderer, though. The most important feature of a Jupyter notebook is its Python support. To turn a cell into a Python cell, press `ESC` to enter command mode and then press `Y`. You'll rarely need to do this, since any new cell in a notebook is a Python cell by default. The cells below are examples of Python cells. You type the input into the cell, and running the cell shows any output beneath it. For example, running `2 + 2` in the first cell below produces an output of `4`.

Note that only the value of the *last* line in a cell is displayed automatically. To show output from any other line, wrap it in a call to the `print` function.

In [1]:
2 + 2

4

In [2]:
2 + 2
1 + 1

2

In [3]:
print(2 + 2)
print(1 + 1)

4
2


## Python

Let's now start introducing Python, the language we'll be coding in. Python is a "high level" programming language. That essentially means it makes it easy to do abstract operations without having to think about boring, low-level details like how bits move around, how unused memory (called garbage) is freed, declaring the type of every object you're using, etc. Python lets you easily work with the objects you want to work with at a high level, with minimal fuss. It's perhaps for this reason that Python has become the de facto language of machine learning: nearly all of the major machine learning libraries today are built to be used from Python.

### Arithmetic
As shown above, we can do any kind of usual arithmetic in python just by typing what we'd expect. There are a few major exceptions to note:
- To take the power of a number, we use the `**` operator, not the `^` you might be used to.
- There's a subtle difference between *integer division* and *floating point* division. 
    - Floating point division is exactly what you'd expect your calculator to do, and is done using the `/` operator.
    - Integer division is division without the remainder, i.e. division rounded down to the nearest integer. This is done with the `//` operator. It may seem weird to have an operator for this, but you'll see cases where it's useful going forward.
- Modulo arithmetic, i.e. getting the *remainder* when one number is divided by another, uses the `%` operator. For example, the remainder of $3/2$, written in math notation as $3 \bmod 2$, is computed as `3 % 2`, which turns out to be `1`.
- Though we won't use it in these tutorials, the imaginary number $i$ is denoted `1j` in python. Any complex number is then written as usual. For example, $1+2i$ would be written `1 + 2*1j`, or simply `1 + 2j`.

In [6]:
1 + 2

3

In [7]:
1 - 2

-1

In [8]:
1 * 2

2

In [25]:
2 ** 3

8

In [26]:
2 ** (1/2)

1.4142135623730951

In [13]:
3 / 2

1.5

In [14]:
3 // 2

1

In [15]:
3 % 2

1

In [16]:
1 + 2j

(1+2j)
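Two details from the list above are easy to trip over. Because `//` rounds *down* rather than toward zero, a negative quotient moves away from zero, and `//` and `%` always fit together so that `(a // b) * b + a % b == a`. Also, `^` is not an error in python: it's the bitwise XOR operator, so typing `2 ^ 3` silently gives the wrong answer instead of `8`. A small sketch checking these facts (the variable names are just for illustration):

```python
a, b = 7, 2
quotient = a // b      # 3: 7 / 2 = 3.5, rounded down
remainder = a % b      # 1: what's left over after removing 3 * 2
print(quotient, remainder)             # 3 1
print(quotient * b + remainder == a)   # True

# Floor division rounds down (toward negative infinity), not toward zero
print(-7 // 2)   # -4, since -3.5 rounds down to -4
print(-7 % 2)    # 1, so that (-4) * 2 + 1 == -7

# ^ is bitwise XOR, not exponentiation
print(2 ^ 3)     # 1  (binary 10 XOR 11 = 01)
print(2 ** 3)    # 8
```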

### Variables and Types

It would be nice to be able to save the state of our operations so we can use them later to do other things, analogous to how calling something $x$ in algebra lets us use it as a placeholder that we can later plug values into. In python we can do this via **assignment**. Suppose we wanted to assign the value of `2` to a variable called `x`. We can do that by writing `x = 2`. This means "take the value on the right of the equals sign, `2`, and assign it to the variable name defined on the left of the equals sign, `x`". Then `x` can be used as a drop-in replacement for the number `2`.

In this example, `x` is called a **variable**. Just like in algebra, a variable is a shorthand for some kind of value, which we can then pass to other operations.

In [17]:
x = 2
x + 2

4

In [18]:
x ** 3

8

In [24]:
y = 2.3 * x + 7.5
y

12.1

In the statements above, the **type** of `2` is called an integer, or **int** for short. The type of `7.5` is called a floating point number, or **float** for short. It may seem pedantic to make a distinction between the two types, but there are many cases where keeping track of this is important. On a computer, integers are represented differently than floats, and the two types of representations have very different properties. In particular, floats have unusual subtleties to deal with, which we'll see in future tutorials.

In [31]:
type(2), type(7.5)

(int, float)
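A quick way to see how python tracks the two types: arithmetic that mixes an `int` with a `float` produces a `float`, `/` always produces a `float` even when the division comes out even, and `//` on two ints stays an `int`. The built-in `int` and `float` functions convert between the two.

```python
print(type(2 + 3))           # <class 'int'>
print(type(2 + 7.5))         # <class 'float'>: mixing int and float gives a float
print(4 / 2, type(4 / 2))    # 2.0 <class 'float'>: / always returns a float
print(4 // 2, type(4 // 2))  # 2 <class 'int'>: // on two ints returns an int
print(int(7.9), float(2))    # 7 2.0: int() drops the fractional part
```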

While we can print the value of a variable by placing it on the last line of a cell, what if we wanted to print it on some other line? To do this we'd call the `print` function on the variable, which displays the value of whatever is passed to it.

In [22]:
print(x)

2



In [27]:
print(y ** 2 + 100 * x)

346.40999999999997


Under the hood, `print` converts whatever we pass it into a string and displays that text. A **string** is any sequence of text wrapped in quotation marks. For example, if we wanted to assign the variable `a` the sequence of text `this is a string` we'd type the following.

Note that using either `'` or `"` works in python for defining a string. My personal preference is `'` since it's quicker to type, but many (especially folks from other languages) prefer to use `"`.

In [28]:
a = 'this is a string'
print(a)

this is a string
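One practical reason to keep both quote styles in mind: a string that itself contains an apostrophe is easiest to write with double quotes, and a string containing double quotes is easiest to write with single quotes. Escaping the quote with a backslash also works. A short sketch (the variable names are arbitrary):

```python
s1 = "it's a string"   # double quotes let us use ' inside
s2 = 'she said "hi"'   # single quotes let us use " inside
s3 = 'it\'s a string'  # or escape the quote with a backslash
print(s1)
print(s2)
print(s1 == s3)        # True: both define exactly the same text
```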


Python 3.6 and later let us mix strings and variables using **format strings** (often called f-strings). To create a format string, we prefix the string with an `f`. Any variable we want to insert into the string is placed inside `{}`. Format strings are useful for printing more complex statements than just a single variable's value.

In [30]:
x = 500
print(f'The value of x is {x}')

The value of x is 500
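The braces in a format string aren't limited to bare variable names: any python expression can go inside them, and it's evaluated when the string is created. For example (the values here are illustrative):

```python
x = 500
y = 2.5
message = f'x + y = {x + y}, and x squared is {x ** 2}'
print(message)   # x + y = 502.5, and x squared is 250000
```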


What if we wanted to work with a sequence of numbers (or some other type) instead of sequences of characters? We can do that in python using a list. A **list** is a type for representing arrays, i.e. sequences of things. To create a list of some sequence, say `1, 2, 3, 4, 5`, we'd just wrap it in square brackets like this.

In [32]:
array = [1, 2, 3, 4, 5]
array

[1, 2, 3, 4, 5]

For both lists and strings, we can **index** into them to get particular values. Suppose we wanted the first element of the above list, i.e. `1`. We could get it as follows. Note that python is **zero-indexed**, which means indexes always start from 0, not 1. That can take some getting used to if you haven't done much programming before.

In [33]:
array[0]

1

In [34]:
array[1]

2

One of the neat things about python is that negative indexing makes it easy to get elements from the end of a list too. To get the last element of the array, use the index `-1`. To get the second to last element, use `-2`, and so on.

In [36]:
array[-1], array[-2]

(5, 4)

To get the **length** or size of an array, python has a `len` function that will work for lists, strings, and many other data types. Note that since python is zero-indexed, the last index in a list is actually `len(array) - 1`, not `len(array)`.

In [41]:
len(array)

5
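To make the off-by-one point concrete: the last valid index is `len(array) - 1`, which picks out the same element as `array[-1]`, while `len(array)` itself is one past the end.

```python
array = [1, 2, 3, 4, 5]
last_index = len(array) - 1            # 4, the largest valid index
print(last_index)
print(array[last_index] == array[-1])  # True: both are the last element, 5
# array[len(array)] would fail with "IndexError: list index out of range"
```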

We can also get subsets of a list using **slicing**. Suppose we wanted the subset of elements in a list from index 2 to index 4. Then we could simply type `array[2:4]`. Note that by convention the lower index is inclusive, while the upper index is exclusive. This means that `array[2:4]` will contain `array[2], array[3]`, but *not* `array[4]`.

In [39]:
array[2:4]

[3, 4]

One shortcut to remember is that if slicing from the beginning of a list, we can omit the `0`. So `array[0:2]` means the same thing as just leaving out the `0` and writing `array[:2]`.

Similarly, when slicing to the end of a list we can leave off the upper index, so in the above example `array[2:5]` means the same thing as `array[2:]`.

In [42]:
array[:2]

[1, 2]

In [43]:
array[2:]

[3, 4, 5]
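Because the lower bound is inclusive and the upper bound exclusive, splitting a list at any index `k` into `array[:k]` and `array[k:]` gives two pieces that don't overlap and together recover the whole list, and `array[i:j]` has `j - i` elements when both indexes are in range. A quick check, with `k = 2` chosen arbitrarily:

```python
array = [1, 2, 3, 4, 5]
k = 2
left, right = array[:k], array[k:]
print(left, right)            # [1, 2] [3, 4, 5]
print(left + right == array)  # True: + on lists joins them end to end
print(len(array[1:4]))        # 3, i.e. 4 - 1
```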

The same tricks work with strings as well, which can be thought of as lists of characters.

In [37]:
string = 'this is a string'
string[0]

't'

In [38]:
string[-1]

'g'

In [40]:
string[0:5]

'this '

In [44]:
len(string)

16

Other useful data types we'll see are the **dictionary** and the **set**. A dictionary, or **dict**, is basically a list that allows arbitrary "indexes". Suppose you didn't want to use the indexes `0, 1, 2, 3` for some object, but wanted to use something like `a, b, c, d`. You'd use a dictionary whose **keys** are `a, b, c, d`, and whose **values** are the actual values. In python, a dictionary is written using the `{}` symbols. To specify the key-value pairs we type `key: value` for each pair, separated by commas. Indexing works the usual way, but using the keys, so `dictionary[key]` returns the `value` stored under `key`.

Note that keys don't have to be single-character strings like these. We can use *any* "hashable" object we want, including ints, floats, strings, tuples, etc.

In [45]:
dictionary = {'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1}
dictionary

{'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1}

In [46]:
dictionary['a']

5
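Dictionaries can also be updated by assigning to a key, new or existing, and you can check whether a key is present with `in`. Keys needn't be strings; in this sketch an int and a tuple serve as keys too (the keys and values are arbitrary):

```python
dictionary = {'a': 5, 'b': 4}
dictionary['c'] = 3        # assigning to a new key adds it
dictionary['a'] = 50       # assigning to an existing key overwrites its value
print(dictionary)          # {'a': 50, 'b': 4, 'c': 3}
print('b' in dictionary)   # True: `in` checks the keys
print('z' in dictionary)   # False

mixed = {1: 'one', (2, 3): 'a tuple key'}  # ints and tuples are hashable too
print(mixed[(2, 3)])       # a tuple key
```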

A **set** is basically a dictionary with only keys and no values. The defining difference between a set and a list is that the elements of a set *must* be unique, while a list can have items repeated as many times as we wish. Sets are also unordered, so unlike lists they can't be indexed. To create a set, we just wrap a list of items inside the `set` function.

In [47]:
array = [1, 1, 2, 2, 3, 3]
array

[1, 1, 2, 2, 3, 3]

In [48]:
set(array)

{1, 2, 3}
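A common use of a set is working with distinct items. Since a set keeps only one copy of each element, `len(set(...))` gives the number of unique values, and `in` checks membership just as it does for dictionary keys:

```python
array = [1, 1, 2, 2, 3, 3]
unique = set(array)
print(len(array), len(unique))   # 6 3
print(2 in unique, 7 in unique)  # True False
unique.add(4)                    # sets can grow with .add
unique.add(1)                    # 1 is already present, so nothing changes
print(len(unique))               # 4
```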

### Logic and Conditionals

Frequently when programming we want to only perform operations in certain cases but not others. To compare objects we can use the **logical** operators. The output of any logical statement will always be either `True` or `False`, called a Boolean, or **bool**, in python.

Suppose we want to see if two variables are equal. To do that we use the `==` operator (*not* the `=` operator, which is already used for assignment, something very different!). To check if two variables are *not* equal, we can use the `!=` operator.

For types that can be ordered, like ints or floats, we can use the usual `<`, `>`, `<=`, `>=` operators as well.

In [50]:
x = 2
y = 2
x == y

True

In [51]:
x != y

False

In [52]:
x < y

False

In [53]:
x >= y

True

Statements like these can also be chained together if we need more complex logic. Suppose we want to test, for example, whether x and y are equal, *and* whether both variables are positive. We can do that using the `and` operator.

Similarly, we can test whether x and y are equal, *or* x is less than y by using the `or` operator.

If we want to negate a statement, we can use the `not` operator. If `a` is `True`, then `not a` is `False`.

In [54]:
(x == y) and (x > 0) and (y > 0)

True

In [55]:
(x == y) or (x < y)

True

In [56]:
not (x == y)

False

Conditional statements are statements that make some kind of decision based on some input. Conditionals are often phrased as "if x is true then do y". We can create conditionals using `if` statement blocks. For example, suppose we want to add 1 to x if x is less than y. We can do the following. Note that when the cell runs the same conditional a second time, nothing happens, since x is no longer less than y.

Note the use of the `#` inside the cell below. These are **comments**. Anything after a `#` on a line (outside of a string) is ignored by python. We use comments to explain what a particular piece of code is doing to a human reader.

In [58]:
x = 1
y = 2

if x < y:
    x = x + 1 # set x = 2
    
if x < y:     # note that x == y
    x = x + 1
    
print(x, y)

2 2


We can create more complex conditionals by using if-else-if type statements. Suppose we now want to:
- add 1 to x if x is less than y
- if x and y are equal, square both x and y
- otherwise, add 1 to y

We can do this using the `if`, `elif` (short for else if), and `else` sequence. The `elif` block only executes if the `if` condition is false and the `elif` condition is true. The `else` block only executes if both the `if` and `elif` conditions are false.

In [61]:
x = 1
y = 1

if x < y:
    print('the if got executed')
    x = x + 1
elif x == y:
    print('the elif got executed')
    x = x ** 2
    y = y ** 2
else:
    print('the else got executed')
    y = y + 1
    
print(x, y)

the elif got executed
1 1
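With `x = y = 1` the `elif` branch runs, but squaring 1 leaves both values unchanged, which hides its effect. To watch every branch fire, one option is to wrap the same logic in a function and call it on three inputs. Functions haven't been covered yet, so treat `update` as a hypothetical helper for illustration: `def` names a reusable block of code, and `return` hands back the new `x` and `y`.

```python
def update(x, y):
    """Apply the if/elif/else rules from above and return the new (x, y)."""
    if x < y:
        x = x + 1
    elif x == y:
        x = x ** 2
        y = y ** 2
    else:
        y = y + 1
    return x, y

print(update(1, 2))   # (2, 2): x < y, so the if branch runs
print(update(3, 3))   # (9, 9): x == y, so the elif branch runs
print(update(5, 2))   # (5, 3): neither holds, so the else branch runs
```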


### Loops and Comprehensions

Frequently we'll want to operate on the elements of an iterable object like a list or a string one at a time. For example, we might want to go through a list of numbers to find its maximum element, or create a new list based on the values in another list. We can do this using a **loop**. The most important loop in python is the **for loop**. A for loop steps through a sequence one element at a time, and at each step performs whatever is inside the loop block.

Suppose we want to iterate over the list `[3, 4, 5, 3, 2, 1]` to find its maximum element (clearly `5`). We can do that as follows. We'll set a `maximum` value to some low number, say `-1` in this case. We'll loop over the array. For each element in the array, if that element is larger than `maximum`, we'll replace `maximum` with that value. If we do this all the way across the list we're guaranteed to find the max.

Note that python actually has a shortcut to find the max of a list, namely the `max` function. Similarly, it also has a `min` function.

In [63]:
array = [3, 4, 5, 3, 2, 1]

maximum = -1
for x in array:
    print(x)
    if x > maximum:
        maximum = x
        
print(f'The maximum is {maximum}')

3
4
5
3
2
1
The maximum is 5


In [64]:
max(array), min(array)

(5, 1)
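Starting `maximum` at `-1` only works because every element in that list is larger than `-1`. For a list of all-negative numbers the loop would wrongly report `-1`. A safer sketch starts from the first element of the list itself (assuming the list isn't empty):

```python
array = [-7, -3, -10, -5]

maximum = array[0]        # start from an actual element, not a guessed low value
for x in array[1:]:       # the first element is already accounted for
    if x > maximum:
        maximum = x

print(maximum)                # -3
print(maximum == max(array))  # True
```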

Sometimes we might want to loop over the indexes of an array instead of its values. Or we might want to loop over both its indexes and values.

To loop over just the indexes we can use the python `range` function. To create a sequence of integers that goes from a min up to a max in steps of 1, we'd write something like `range(min, max)`. If we omit the min, python assumes the range starts from 0. Just like list slices, the range is inclusive at the min and exclusive at the max.

Note that in python a `range` is not a list. It's a lightweight sequence object that produces its values on demand rather than storing them all in memory at once. You can iterate over it (and even index into it, so `range(1, 5)[0]` is `1`), but if you want to see all its values as an actual list, just wrap it with the `list` function.

In [65]:
range(1, 5)

range(1, 5)

In [66]:
list(range(1, 5))

[1, 2, 3, 4]
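A few more quick checks on `range`: leaving out the start means starting from 0, the stop value is never included, and `len` tells you how many values a range covers without building a list first.

```python
from_zero = list(range(5))     # [0, 1, 2, 3, 4]: starts at 0, stops before 5
two_to_five = list(range(2, 6))  # [2, 3, 4, 5]
count = len(range(1, 5))       # 4, i.e. 5 - 1
print(from_zero)
print(two_to_five)
print(count)
```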

To iterate over the indexes of a list `array`, we can just iterate over `range(0, len(array))`, or more simply over `range(len(array))`.

In [67]:
array = [3, 4, 5, 3, 2, 1]

for i in range(len(array)):
    value = array[i]
    print(f'index: {i}, value: {value}')

index: 0, value: 3
index: 1, value: 4
index: 2, value: 5
index: 3, value: 3
index: 4, value: 2
index: 5, value: 1


What if we wanted to loop over *both* the indexes and values of a list (or any other iterable)? The most natural way to do this in python is by wrapping the list inside `enumerate`, which iterates over the *pair* `(index, value)` across the array.

In [69]:
for i, x in enumerate(array):
    print(f'index: {i}, value: {x}')

index: 0, value: 3
index: 1, value: 4
index: 2, value: 5
index: 3, value: 3
index: 4, value: 2
index: 5, value: 1
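A typical use of `enumerate` is tracking *where* something occurs, not just what it is. For example, this sketch extends the earlier maximum search to record the index of the largest element. If the max appears more than once, the strict `>` keeps the first occurrence.

```python
array = [3, 4, 5, 3, 2, 1]

best_index = 0
for i, x in enumerate(array):
    if x > array[best_index]:   # strict > keeps the first index on ties
        best_index = i

print(f'max value {array[best_index]} at index {best_index}')  # max value 5 at index 2
```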


Sometimes we might want to do multiple, nested for loops. For example, suppose we want to iterate over a **list of lists**. Consider the following list, where each element of the outer list is a list of numbers. To iterate over this list of lists, we can iterate first over the outer list, and then over the inner list inside the loop body of the outer list.

In [71]:
array = [[1, 2, 3], [4, 5], [6, 7], [8]]
len(array) # there are only 4 "elements" in array, the 4 lists inside it

4

In [72]:
array[0]

[1, 2, 3]

In [74]:
for row in array:
    print(row)

[1, 2, 3]
[4, 5]
[6, 7]
[8]


In [73]:
for row in array:
    for x in row:
        print(x)

1
2
3
4
5
6
7
8
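Nested loops are how you'd reduce a list of lists to a single value or flatten it into one list. Here's a sketch that does both for the `array` above; the inner loop visits every number no matter how long each row is. (`append`, which adds an item to the end of a list, is explained just below.)

```python
array = [[1, 2, 3], [4, 5], [6, 7], [8]]

total = 0
flat = []
for row in array:        # outer loop: one inner list at a time
    for x in row:        # inner loop: each number in that list
        total = total + x
        flat.append(x)   # add x to the end of flat

print(total)   # 36
print(flat)    # [1, 2, 3, 4, 5, 6, 7, 8]
```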


One common use case of loops is to create a new list (or other iterable) from an old list. In python, the easiest way to do that is to create an empty new list `new_array`, and then **append** a `value` to it inside the loop using `new_array.append(value)`. Note that appending is an "in-place" operation: it modifies `new_array` directly and returns `None`, so you shouldn't assign its result to anything. Just calling `new_array.append(value)` on its own line will add `value` to the end of `new_array`.

As an example, suppose we have a list `array = [1, 2, 3, 4, 5]`, and want to create a new list whose values are the squares of `array`. We can do that as follows.

In [76]:
array = [1, 2, 3, 4, 5]

new_array = []               # initialize empty new_array
for x in array:
    new_x = x ** 2
    new_array.append(new_x)  # append is done in place
    
new_array

[1, 4, 9, 16, 25]
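Here's a quick check of the in-place behavior of `append`, showing both the modified list and what the call itself hands back. Writing `new_array = new_array.append(9)` would therefore replace the list with `None`.

```python
new_array = [1, 4]
result = new_array.append(9)   # modifies new_array in place
print(new_array)               # [1, 4, 9]
print(result)                  # None: append itself returns nothing useful
```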

Oftentimes we find when doing loops that we're just performing a single operation in the loop body. In such cases, python has a nifty shorthand called a **list comprehension**. A list comprehension lets you write such a loop in one line, collecting the value computed for each element into a new list.
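For instance, the squaring loop above collapses into a single line. The comprehension reads almost like the loop: first the expression for each new value (`x ** 2`), then the `for` clause that supplies `x`. A quick sketch confirming the two approaches agree:

```python
array = [1, 2, 3, 4, 5]

# loop version
new_array = []
for x in array:
    new_array.append(x ** 2)

# list comprehension version
squares = [x ** 2 for x in array]

print(squares)               # [1, 4, 9, 16, 25]
print(squares == new_array)  # True
```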