# Introduction to Python

_Some of this content is based on a WDTS RENEW notebook authored by Dakota Blair (BNL)._

In this introductory tutorial, we will introduce the basics of Python. This is by no means meant to be a complete introduction, as the Python programming language is vast and much more powerful than what will be discussed here. Many of these more powerful features will be discussed later on in the bootcamp, but for this very first section, we'll stick with the foundational aspects of the language.

We'll also take this opportunity to introduce [Google Colaboratory](https://colab.research.google.com/) ("Google Colab"), the environment in which these tutorials will take place. Google Colab is essentially a cloud instance for running these `.ipynb` notebooks. It's a "batteries included" environment, so most things should "just work." This also means that we're leaving out a lot of foundational _development_ knowledge from these tutorials, so do keep that in mind. There is a difference between writing production code (not what we're doing) and understanding how to quickly leverage existing data science tools and various library APIs to solve scientific problems quickly and efficiently (this _is_ what we're doing).

Note, you will need a Google/Gmail account to save notebooks that you make changes to. If you haven't done so yet and don't have one, you should make one!

## Learning Objectives

By the end of this tutorial you should be able to

* Open, run and save a notebook in Google Colab.
* Use basic Python commands and syntax.
* Make a function that opens another file.
* Download contents of a git repository.
* Set up a conda virtual environment.

# Basic syntax and data types

The core of any object-oriented language is its objects. In Python _everything_ is an object, from a function to a class to a variable. One of the simplest things to create is a **variable**, which is a name that refers to an object. Let's define the first variable here, and print its type.

In [None]:
x = 2
print(type(x))

We've done a few things already. First, we instantiated the variable `x` and set its value to 2. Then we printed its type using `print`, which is a function that takes one or more arguments and prints their values. You may have also noticed that `type` is a function too, and that it outputs the type of the variable fed to it.

There are other types of variables, such as strings, floats, and booleans. Let's see a few examples of these.

In [None]:
# String
x = "hi there"

# Float
y = 6.2

# Boolean
z = True

In [None]:
print(x)
print(y)
print(z)

Note: The `#` symbol indicates the start of a **comment**. Comments are in-line notes that developers make for themselves and anyone else who is reading their code. They are not interpreted by Python (Python "skips" these lines). Use comments to better document what your code is doing!

The basic data types, in summary, are:
* Strings (`str`) - store characters
* Integers (`int`) - store integers only
* Floats (`float`) - store real (floating point) numbers with roughly 15 significant digits of precision
* Booleans (`bool`) - store True or False
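
As a quick check of the summary above, the sketch below prints the type of one value of each kind and shows what finite float precision looks like in practice. The variable names are illustrative only.

```python
# One example value for each basic data type (names are illustrative).
name = "hi there"   # str
count = 2           # int
ratio = 6.2         # float
flag = True         # bool

print(type(name))
print(type(count))
print(type(ratio))
print(type(flag))

# Floats have finite precision, so small rounding errors can appear.
total = 0.1 + 0.2
print(total == 0.3)             # False, because of rounding
print(abs(total - 0.3) < 1e-9)  # True, comparing with a tolerance instead
```

When comparing floats, checking that two values are close is usually safer than checking that they are exactly equal.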

Variables are useful because of the operations we can perform on them.

## Basic Operators

Python contains many **operators** for performing operations on variables. Operators obey the usual "PEMDAS" precedence rules, and operators of equal precedence generally evaluate left to right (exponentiation, which groups right to left, is the main exception). A quick summary (which is not exhaustive) is as follows.
* You've actually already learned about the **assignment** operator above: `=`. It assigns a value to a variable.
* The double equals sign `==` checks for **equality**.
* `+`, `-` are the **addition** and **subtraction** operators, respectively.
* `*`, `/` are the **multiplication** and **division** operators, respectively.
* The double asterisk `**` is used for **exponentiation**.
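
A minimal sketch of the operators listed above, using two made-up values `a` and `b`:

```python
a = 7
b = 2

print(a + b)    # addition: 9
print(a - b)    # subtraction: 5
print(a * b)    # multiplication: 14
print(a / b)    # division always gives a float: 3.5
print(a ** b)   # exponentiation: 49
print(a == b)   # equality check: False

# Precedence: exponent first, then multiplication, then addition.
result = 1 + 2 * 3 ** 2
print(result)   # 1 + 2 * 9 = 19

# Parentheses override the default order.
grouped = (1 + 2) * 3 ** 2
print(grouped)  # 3 * 9 = 27
```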

Let's see how a few of these work by computing the expression for degrees Celsius in terms of degrees Fahrenheit. Note that the expression mathematically is

$$ C = 5(F - 32)/9.$$

In code, this is:

In [None]:
F = 96  # it's quite hot out in America
C = 5 * (F - 32) / 9
print(f"It's equivalently hot in Europe: {C:.02f} degrees C!")

We've combined a few pieces of knowledge already while introducing another: [f-strings](https://docs.python.org/3/tutorial/inputoutput.html). These allow us to format variables into our print statements. For now, know that `C:.02f` formatted the `C` variable as a float with two decimal places! f-strings can be quite complex and the formatting is largely optional. However, it will make your outputs look more professional. It's highly recommended to check out the documentation for more details!
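
To get a feel for format specifiers, here is a small sketch that formats the same temperature in a few different ways. The variable names other than `C` are illustrative.

```python
C = 5 * (96 - 32) / 9  # the same conversion as above

two_places = f"{C:.2f}"   # two decimal places
no_places = f"{C:.0f}"    # rounded to a whole number
padded = f"{C:10.2f}"     # right-aligned in a field 10 characters wide

print(two_places)  # 35.56
print(no_places)   # 36
print(padded)

# `.02f` and `.2f` request the same precision.
print(f"{C:.02f}" == f"{C:.2f}")  # True
```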

# Basic Python data structures

In science, organizing our data is extremely important. Python offers many ways in which we can organize our data into different objects, which each consist of other objects. In this section, we'll go over quite a few of these.

## Lists

The most intuitive way to list a bunch of objects is, of course, a `list`! Say we have several values that all carry some common meaning. For example, perhaps these are the first five elements of the Fibonacci sequence: 0, 1, 1, 2, 3. Logically, we might want to group these together into a list. Here's how to do it:

In [None]:
first_fib = [0, 1, 1, 2, 3]
print(first_fib)

Super easy, right? Python also knows how to print a list - definitely experiment with how Python prints different data structures!

Now, how do we access a number in a list? This is where things might get a little confusing. Python, like many (but not all!) programming languages, is "zero-indexed". That means the "first" element of the list is indexed by the number `0`. So for example, to get the first Fibonacci number in that list, I would use `first_fib[0]`, and **not** `first_fib[1]`. To see this, we can print each of these elements:

In [None]:
print(first_fib[0])
print(first_fib[1])

Note how the "0th" element of `first_fib` is of course 0 and is actually accessed via the index `0`. Be sure to keep this in mind. Mixing up indices is such a common mistake, even amongst programmers who are used to languages with different indexing conventions, that it has its own name:

![image.png](attachment:d5a6d659-256e-46d7-95e5-408229908fa8.png)

Now, what happens if we try to access an element that is outside of the list? There are 5 elements in `first_fib`, which can be accessed via indices 0, 1, 2, 3 and 4 (not 5!). But what happens if we try to access element 5?

In [None]:
first_fib[5]

You see that Python has thrown an `Exception` called an `IndexError`. Python is telling you that you tried to access an element of the list that doesn't exist. More on `Exception`s later. For now though, note that this error will stop your program from running, and needs to be addressed in some manner. The length of the list can be accessed using `len`, and for the zero-indexing reasons we mentioned above, the last index of the list is actually `len(first_fib) - 1`.

In [None]:
first_fib[len(first_fib) - 1]
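
As a small aside, Python also accepts negative indices, which count backwards from the end of the list. The sketch below redefines `first_fib` so that it runs on its own and shows that `-1` refers to the same element as `len(first_fib) - 1`.

```python
first_fib = [0, 1, 1, 2, 3]

last_index = len(first_fib) - 1
print(last_index)             # 4
print(first_fib[last_index])  # 3

# Negative indices count from the end, so -1 is the last element.
print(first_fib[-1])          # 3
print(first_fib[-2])          # 2

# An index is valid only if it is smaller than the length of the list.
print(5 < len(first_fib))     # False, which is why first_fib[5] fails
```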

## Dictionaries

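See how a list can be accessed by its index? A list is an _ordered_ collection of items accessed by position. A dictionary, or `dict`, is a collection of values indexed by "keys" rather than by position. (Since Python 3.7, dictionaries remember the order in which keys were inserted, but you still cannot look up an entry by its position.) There are quite a few fundamental differences between lists and dictionaries, but for now, we're going to focus on the two most important ones.
* Lists are accessed by _position_, dictionaries are accessed by _key_
* List indices are integers starting from 0, dictionary keys can be of almost any type, as long as it is hashable (strings, numbers and tuples, for example)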

This second property especially makes them useful as lookup tables. For example, the following demonstrates how dictionaries are constructed and accessed.

In [None]:
staff_ids = {
    "Matt": 1234,
    "Mike": 1235,
    "Mary": 5431,
}

In [None]:
staff_ids["Matt"]

If you try to access a key that does not exist, you will get (somewhat expectedly) a `KeyError` exception:

In [None]:
staff_ids["Nate"]

This is Python telling you that the key `Nate` does not exist!
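
One way to avoid a `KeyError` is to check whether a key exists before using it, or to ask for a fallback value. The sketch below uses the `in` operator and the built-in `dict.get` method. The fallback value `-1` and the ID assigned to `"Nate"` are made up for illustration.

```python
staff_ids = {
    "Matt": 1234,
    "Mike": 1235,
    "Mary": 5431,
}

# `in` checks whether a key is present.
print("Matt" in staff_ids)  # True
print("Nate" in staff_ids)  # False

# `get` returns None (or a fallback of your choice) instead of raising.
print(staff_ids.get("Nate"))      # None
print(staff_ids.get("Nate", -1))  # -1

# Assigning to a new key adds it to the dictionary.
staff_ids["Nate"] = 9999  # hypothetical ID
print(staff_ids["Nate"])  # 9999
```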

## Other objects

There are many other objects you can use to keep objects organized in Python. Two others are called **Tuples** and **Sets**. These are similar to lists but serve different purposes: a tuple cannot be changed after it is created, and a set is an unordered collection of unique items. As always, you should consult the documentation to learn about some of the things we don't cover in this tutorial.
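
A very brief sketch of both, with made-up values:

```python
# A tuple is written with parentheses and is indexed like a list,
# but its contents cannot be changed after creation.
point = (1.0, 2.5)
print(point[0])      # 1.0
print(len(point))    # 2

# A set keeps only unique items, so the duplicate 1 is stored once.
unique_fib = set([0, 1, 1, 2, 3])
print(len(unique_fib))  # 4
print(1 in unique_fib)  # True
```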

# Control flow

Read the docs [here](https://docs.python.org/3/tutorial/controlflow.html)!

**Control flow** is the process of creating "branches" in your code. If some condition is met, some block of code may be executed; if some other condition is met, some other block may be executed. By far the most common types of control flow are **if/elif/else** statements and **for/while loops**. These are the two concepts we'll discuss here, using some neat examples in the process.

## If statements

The `if` keyword is used in Python to create code branches. If an _if_ statement's condition evaluates to `True`, the interpreter proceeds to execute the code in that code block. There can be zero or more `elif` parts, and the `else` part is optional, executing only if none of the prior conditions were met. The keyword `elif` is short for `else if`, and is useful to avoid excessive indentation. It's best to see this as an example.

In [None]:
my_variable = 3
if not isinstance(my_variable, int):
    print("This isn't even a number!")
elif my_variable % 2 == 0:
    print("This number is even")
else:
    print("This number is odd")

Play around with this! Note we've combined things you've already seen (`==`, `print` statements, etc.) with things you haven't! Try to figure out what each of these new functions and operators does. For example, `%` is the modulo operator, which gives the remainder of a division. The `isinstance` function takes two inputs, a variable and a type, and checks whether that variable is an instance of that type.
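
To make the two new pieces concrete, here is a short sketch of `%` and `isinstance` on their own, using arbitrary example values:

```python
# The modulo operator returns the remainder after division.
print(7 % 2)   # 1, so 7 is odd
print(8 % 2)   # 0, so 8 is even
print(17 % 5)  # 2

# isinstance checks whether a value is an instance of a type.
print(isinstance(3, int))    # True
print(isinstance(3.0, int))  # False, 3.0 is a float
print(isinstance("3", int))  # False, "3" is a string

# A small surprise: booleans count as integers in Python.
print(isinstance(True, int))  # True
```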

## For loops

For loops are essentially syntax for moving, one step at a time, through an "iterable". An iterable is anything that can be "iterated through". This includes lists and dictionaries. There is quite a bit "under the hood" here, but for all intents and purposes, that is what a for loop is for. Once again it's best to see this through an example:

In [None]:
min_number = 2
max_number = 30

for index in range(min_number, max_number + 1):
    if index % 2 == 0:
        print(f"The number {index} is even")
    if index == 27:
        print("You've encountered the number 27. Break out of the loop!")
        break

Once again we've combined some concepts while introducing some new ones. The `range` function takes one, two or three arguments. The full syntax of `range` is as follows: `range(start, stop, step)`.
* If only one argument is provided, such as `range(3)`, that is equivalent to `range(0, 3, 1)`. I.e., the `range` iterable will step through indices 0, 1 and 2.
* If two arguments are provided, such as `range(2, 5)`, this is equivalent to `range(2, 5, 1)`. I.e., the `range` iterable will step through indices 2, 3, 4.
* Providing all three arguments explicitly specifies the step size. Try it out!
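
Wrapping a `range` in `list` is a convenient way to see exactly which values it produces. The sketch below shows the three forms described above, plus a negative step. The specific numbers are arbitrary.

```python
one_arg = list(range(3))
two_args = list(range(2, 5))
with_step = list(range(0, 10, 3))
backwards = list(range(5, 0, -1))

print(one_arg)    # [0, 1, 2]
print(two_args)   # [2, 3, 4]
print(with_step)  # [0, 3, 6, 9]
print(backwards)  # [5, 4, 3, 2, 1]
```

Note that in every case the `stop` value itself is excluded.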

It is also helpful to note that you can access an object's **docstring** in a Jupyter or Colab environment by simply using the `help` function, or by putting a `?` after the object. For example:

In [None]:
range?

For loops can also be used to iterate through some of the lists you created. For example,

In [None]:
for fib_number in first_fib:
    print(fib_number)
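
Dictionaries are iterable too. As a sketch, looping over a dictionary directly yields its keys, and the built-in `items` method yields key/value pairs. The dictionary is redefined here so the example runs on its own, and the `seen_names` list is only there to record the order of iteration.

```python
staff_ids = {
    "Matt": 1234,
    "Mike": 1235,
    "Mary": 5431,
}

# Iterating over a dictionary gives its keys, in insertion order.
seen_names = []
for name in staff_ids:
    seen_names.append(name)
print(seen_names)  # ['Matt', 'Mike', 'Mary']

# items() gives each key together with its value.
for name, staff_id in staff_ids.items():
    print(f"{name} has ID {staff_id}")
```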

# A note on the standard library

The Python [standard library](https://docs.python.org/3/library/index.html) consists of a variety of importable modules which can be used to accomplish tasks that are more complicated than what can be solved with "clean" Python code.

One example of this is how one would go about generating (pseudo) random numbers. This is a non-trivial task, and there's really no reason to reinvent the wheel when Python has its own implementation of it! Here's how we can go about generating [pseudo-random numbers](https://en.wikipedia.org/wiki/Pseudorandom_number_generator) on the half-open interval $[0,1)$. First, we have to `import` the `random` module. Before doing this, the Python interpreter has no idea what `random` is. We have to load that module (and the code, so to speak) into the current working session with an `import` statement.

In [None]:
import random

Once we have `random` (the module) loaded, we can use `random` (the function) which is present in that module:

In [None]:
random.random()
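
Because these numbers are only pseudo-random, the sequence is fully determined by the generator's starting state, called the seed. The sketch below uses `random.seed` to show that resetting the seed reproduces the same number. The seed value 42 is arbitrary.

```python
import random

random.seed(42)           # fix the starting state of the generator
first = random.random()

random.seed(42)           # reset to the same starting state
second = random.random()

print(first == second)    # True, same seed gives the same number
print(0 <= first < 1)     # True, values lie in [0, 1)
```

Setting a seed is handy when you want results that can be reproduced from one run to the next.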

Here's another example. The `Counter` is basically a dictionary, but with a few special properties. For the following, we'll also utilize `random.randint` to generate a list of random integers and showcase list comprehension (see the aside below).

In [None]:
from collections import Counter  # "collections" is also a part of the standard library

In [None]:
random_integers = [random.randint(1, 5) for _ in range(20)]
random_integers

In [None]:
Counter(random_integers)

What did `Counter` do? It efficiently counted the number of times each of the integers occurred in the list! Pretty neat.
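
Since the list above is random, the counts change on every run. Here is a sketch with a fixed, made-up list so the output is predictable, which also shows two of the "special properties" of `Counter`.

```python
from collections import Counter

values = [1, 3, 3, 5, 1, 3]
counts = Counter(values)

print(counts[3])  # 3, the number 3 appears three times
print(counts[1])  # 2

# Unlike a plain dictionary, a missing key gives 0 instead of a KeyError.
print(counts[4])  # 0

# most_common(n) returns the n most frequent items with their counts.
print(counts.most_common(1))  # [(3, 3)]
```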

# Aside: List comprehensions

The venerable for loop is a mainstay of computer programming, but in Python it holds extra functionality that is not always required. In many applications what is really desired is to take a list of items and create another list with the items transformed or filtered in some way. Python has a specific pattern for this type of transformation called a list comprehension. List comprehensions are expressions, as opposed to statements, so they can be used anywhere a value is expected, and they are often faster than the equivalent for loop. Consider this article from 2004 on [efficient string concatenation](https://waymoot.org/home/python_string/), which compares the speed of several solutions. The trade-off is that you lose control flow abilities such as `break` and `continue`, but most of the use cases for these can also be included in a list comprehension using its `if` clause. Here is an example of a list comprehension computing all the even squares less than 100:

In [None]:
list_1 = [x**2
    for x in range(12)
    if (x**2 % 2) == 0
    if x**2 < 100
]

list_2 = []
for x in range(12):
    if (x**2 % 2 == 0) and (x**2 < 100):
        list_2.append(x**2)

The two methods above for constructing the lists are equivalent.

In [None]:
list_1 == list_2

# Functions

In Python, functions are objects which execute the same code, with possibly different arguments (also called inputs or parameters). In the same way that a mathematical function such as $f(x,y)$ can give you 5 or 9, depending on what `x` and `y` you give `f` as an input, functions in Python operate similarly.

_If you ever find yourself repeating code many times in your program then it is a good idea to put it in a function!_ In particular, if any changes need to be made, they can be made in one place which reduces the possibility of introducing errors.

To make a function, you use the `def` keyword, and specify its signature using `()`, and it may return a value using the `return` keyword. There are some, hopefully helpful, examples below.

Arguments passed to functions can also be given default values, by assigning the default value when you define the function. This means that when the function is called, that argument does not need to be included; if it is left out, it takes the default value. This can enhance readability and reduce development time if used judiciously.

In [None]:
def do_addition(x, y):
    return x + y

def open_file(file_path):
    with open(file_path, 'r') as f:  # a context manager using the `with` keyword
        opened_file = f.read()
    return opened_file

def tell_me_what_im_thinking_of(what_im_thinking_about, polite=False):
    oracles_response = f"You are thinking about {what_im_thinking_about}. "
    if polite:
        oracles_response += "Thanks!"
    return oracles_response

Above, we sneakily introduced a few new concepts!
* The [with statement](https://docs.python.org/3/reference/compound_stmts.html#with) is used to wrap a block of code in a special way. The context manager that generally comes after `with` has special behaviors. For example, `with open(...) as f` will first store the context manager object as `f`, but it will also define certain behaviors that automatically execute before and after the code block is run. In this case, the context manager opens the specified file, allowing you to do certain things with it (such as reading it via `f.read()`). Then, even though it's not explicitly stated, the context manager will actually close the file for you once the code block is done executing.
* Default arguments, such as `polite=False`, are exactly what they sound like. If the `polite` keyword argument is provided, it will override the default value of `False`, but if `polite` is not provided, it will default _to_ `False`.
* The `return` statement ends the function and specifies the value it hands back to the caller.

In [None]:
one_plus_two = do_addition(1, 2)
print(one_plus_two)
print(do_addition(6, 7))
print(tell_me_what_im_thinking_of("a nice lunch", polite=True))
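
The calls above don't exercise `open_file` or the default value of `polite`. The sketch below does both. It redefines compact versions of the two functions so that it runs on its own, and it writes a throwaway file into a temporary directory first so there is something to read. The file name `example.txt` and its contents are made up for illustration.

```python
import os
import tempfile


def open_file(file_path):
    # The context manager closes the file when the block ends.
    with open(file_path, "r") as f:
        return f.read()


def tell_me_what_im_thinking_of(what_im_thinking_about, polite=False):
    response = f"You are thinking about {what_im_thinking_about}. "
    if polite:
        response += "Thanks!"
    return response


# Create a small file, then read it back with open_file.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "example.txt")  # hypothetical file name
    with open(file_path, "w") as f:
        f.write("hello from a file")
    contents = open_file(file_path)
print(contents)  # hello from a file

# Leaving out `polite` falls back to its default value of False.
default_reply = tell_me_what_im_thinking_of("a nice lunch")
polite_reply = tell_me_what_im_thinking_of("a nice lunch", polite=True)
print(default_reply)
print(polite_reply)
```

The temporary directory, and the file inside it, is removed automatically when its `with` block ends, which is another example of a context manager cleaning up after itself.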

# Extra content

TK