# A Beginner's Guide to Programming in Python

Welcome to Python, the second language of QBIO490! This document will take you through the basics of Python before we jump into our analyses. Let's get started!

## Jupyter Notebook

This is Jupyter Notebook! It is a 'notebook'-style integrated development environment (IDE) for Python scripts. There are Markdown text blocks (like this).

In [None]:
# And there are code blocks (like this)
print('This is a code block!')

Run a code block by pressing the 'run this cell' arrow button to the right of the cell's execution counter (the 'In [ ]' at the top left of each cell). Every time you run a cell, the execution counter increases by 1 across the whole notebook. This lets you easily see the order in which cells were executed. <br />    
You can also run the currently selected cell by pressing the run button at the top or by pressing shift and enter.<br/>
<br/>
You can change the type of block in the top menu by switching between <u>Markdown</u> and <u>Code</u> <br/>
<br/>

### This is currently a markdown cell. Change it to code and run the code
print("I won't run unless I'm in a code block")

You can create a new code block, or <u>cell</u>, underneath the currently selected one with the plus icon at the top. You can move cells up and down with the arrows next to run. Delete a cell by selecting the entire cell and then pressing X.

## Python

Python has a lot of similarities to R with regard to syntax and execution, but some differences as well. Let's explore the basic syntax of Python.

## Setting up your working directory

Just like in R, if you want to use relative file paths, you need to know where you are in terms of your directory. Run the following code to set your working directory to the analysis_data folder.

In [None]:
import os

print("Current working directory: {0}".format(os.getcwd()))

os.chdir('PATH/TO/analysis_data')

print("New working directory: {0}".format(os.getcwd()))
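
If `os.chdir()` fails, the path is usually the culprit. Here is a minimal sketch of checking that the folder exists before moving into it. The path is still the placeholder from above, and the variable name `data_dir` is only for illustration:

```python
import os

data_dir = 'PATH/TO/analysis_data'  # placeholder: replace with your own path

if os.path.isdir(data_dir):
    # the folder exists, so it is safe to move into it
    os.chdir(data_dir)
    print(f"Now working in: {os.getcwd()}")
else:
    # the folder was not found, so we stay where we are
    print(f"Could not find the folder: {data_dir}")
```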

## (1) Indentation

In other programming languages you've used, such as R, code blocks are defined with curly braces. Python instead uses indentation to mark a code block. You'll see this in the looping, control flow, and function parts of the guide. 
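
As a quick preview, here is a small sketch of indentation in an `if` statement. The variable `temperature` and its value are made up for illustration:

```python
temperature = 39  # hypothetical value, just for illustration

if temperature > 37:
    # these indented lines belong to the if block
    status = 'fever'
    print('Temperature is above 37')
else:
    # this indented line belongs to the else block
    status = 'normal'

# this line is not indented, so it runs regardless of the condition
print(f'Status: {status}')
```

Where R would wrap each branch in `{ }`, Python relies only on the colon and the consistent indentation that follows it.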

## (2) Indexing

Python uses zero-based indexing, which means that the first element in a data-structure has the index of 0, and the second element has an index of 1, and so on. So to access the first thing in a list called fruits, you would do: `fruits[0]`.
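
A short sketch of what this looks like in practice, using a made-up `fruits` list (remember that R would call the first element index 1):

```python
fruits = ['apple', 'banana', 'cherry']  # example list, made up for illustration

first_fruit = fruits[0]                # index 0 is the first element
second_fruit = fruits[1]               # index 1 is the second element
last_fruit = fruits[len(fruits) - 1]   # the last valid index is length - 1

print(first_fruit, second_fruit, last_fruit)
```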

## (3) Variables

In Python, like in R, variables are dynamically typed, meaning you don't declare a variable as a specific type; the type comes from the value you assign. To assign a value to a variable in Python, you use the equals sign.

In [None]:
my_int = 4
my_float = 4.3
my_bool = True
my_char = '4'
my_string = "hello"
# Notice when using quotes it doesn't matter whether you use single or double quotes

### Accessing and Modifying Variables

There are two ways to modify variables. For example, to add 2 to some variable x, we can either do the traditional way: `x = x + 2`, or with a special operator `x += 2`. There are equivalent operators for subtraction (-=), multiplication (*=), division (/=), etc. <br/>
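
Here is a brief sketch of these operators in action, using a made-up variable called `count`:

```python
count = 10   # hypothetical starting value

count += 2   # same as count = count + 2
print(count)

count -= 5   # same as count = count - 5
print(count)

count *= 3   # same as count = count * 3
print(count)

count /= 7   # same as count = count / 7; division always gives a float
print(count)
```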

<br/>**Exercise 3.1**

Below, write the short version for the following variable assignments.

In [None]:
x = 4
y = 2

#1. y = y / x (example is filled in below)
y /= x

# 2. y = y * 3

# 3. x = x - y

print(x,y)

## (4) Printing

Printing is pretty straightforward in Python. To print, you use the print() function, where you put what you want to print in the parentheses. E.g. `print("This is the word: ", word)`
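
The sketch below shows a few details of `print()` that are easy to miss. The `sep` and `end` arguments are standard optional arguments of `print()`, and the variable `word` is just an example:

```python
word = 'python'  # example variable

# print() separates its arguments with a space by default
print("This is the word:", word)

# the optional sep argument changes the separator
print("a", "b", "c", sep="-")

# the optional end argument changes what is printed last (default: a newline)
print("same line,", end=" ")
print("still the same line")
```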

### Special Print Formatting

Sometimes, printing strings and variables together can get clunky and hard to read. If you put f in front of the string (i.e. single/double quotes) and put variables in curly braces, it automatically substitutes that variable in the string!

In [None]:
my_var1 = 'red'
my_var2 = 'blue'

print(f'My first variable is {my_var1}, my second variable is {my_var2}.')
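
The curly braces can hold more than plain variable names. Here is a small sketch, with made-up values, showing an expression inside the braces and a format specifier that rounds a number for display:

```python
gene = 'TP53'          # hypothetical values, for illustration only
expression = 12.34567

# any expression can go inside the curly braces
print(f'{gene} has {len(gene)} characters.')

# a format spec after a colon controls how numbers are displayed
formatted = f'Expression of {gene}: {expression:.2f}'
print(formatted)
```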

## (5) Functions
Functions are named, reusable bits of code that can be called with arguments to run a block of code or return a value. You can declare a function with `def`.

In [None]:
def my_function():
    print('I am a function!')

my_function()

You can implement parameters by adding local variables (variables that only exist within the scope of the function) and then give a return value with `return`.

In [None]:
def square_function(my_input):
    my_output = my_input**2 # **2 is equivalent to raising it to the second power
    return my_output

print(square_function(8))
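
To illustrate what "local" means, here is a sketch with a made-up function, `rectangle_area`, that takes two parameters. The `try`/`except` at the end is only there so the cell keeps running after the error:

```python
def rectangle_area(width, height):
    area = width * height  # area is a local variable
    return area

result = rectangle_area(3, 4)
print(result)

# area only exists inside the function, so using it out here raises a NameError
try:
    print(area)
except NameError:
    print('area is not defined outside the function')
```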

**Exercise 5.1**

Write a function, `print_args(a, b)` that prints two variables, a and b, using the string formatting trick and then call it on two variables. For example, `print_args("red", "blue")` will print "a is red, b is blue".

In [None]:
# write code here

## (6) Objects
In Python, everything is an object, including packages and functions. Very abstractly, an object is a specially-defined data type, and it has the following two kinds of attributes (i.e. it stores the following information):

+ Data attributes: these store variables.
+ Methods: these are functions.

To access data attributes, use object_name.attribute (note the lack of parentheses). To call a function from an object, use object_name.function() (note that these have parentheses).
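
As a small illustration, Python's built-in complex numbers have both kinds of attributes. This is only a sketch to show the notation; complex numbers are not otherwise used in this guide:

```python
z = complex(3, 4)   # a built-in complex number object: 3 + 4j

# data attributes: no parentheses
print(z.real)   # the real part
print(z.imag)   # the imaginary part

# method: parentheses, because we are calling a function
print(z.conjugate())
```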

We'll use this notation in the next section when we introduce lists (which are a great example of objects).

## (7) Data Structure: Lists
Lists are the standard array data structure in Python (being ordered and changeable). Lists will be the main built-in data structure we use in Python. You declare a list using square brackets: `my_list = [1, 2, 3]`. 

In [None]:
# Lists can be defined over multiple lines as well.
my_list = [8,
          'three',
          7]

print(my_list)

**Exercise 7.1**

Declare a list called `example_list` that contains your age, name, and a boolean value for if you are a first-year student.

Print the following: "Here is some info about me: \<`example_list` goes here\>"

In [None]:
# write code here

### Accessing Values in a List

Just like in R, we can use bracket notation [] to access value(s) within a list.

In [None]:
greetings_list = ["hola", "bonjour", "hallo", "ciao", "你好", "olá", "أهلا", "こんにちは", "안녕하세요", "привет"]

In [None]:
print(greetings_list[5]) # outputs the value at index 5 (the 6th value in the list)

To access a set of values, we can use a colon (:) and specify the first and last+1 indices of that set. Note that the range is inclusive of the first index, but not of the second (which is why we must specify the last+1 index as our second input). This is called slicing.

In [None]:
print(greetings_list[3:10]) # outputs the values from index 3 to index 9 (the 4th through 10th values)

If you don't specify an index when using the colon and bracket notation, Python will default to the beginning/end, depending on which index you omit (and if you omit both, it will give the entire list).

In [None]:
print(greetings_list[:5]) # outputs all values up to, but not including, index 5 (the 1st through 5th values)

print(greetings_list[5:]) # outputs all values starting at index 5 (the 6th through nth values)

print(greetings_list[:]) # outputs all values in the list

print(greetings_list[-1:]) # a negative index counts from the end, so this outputs a list containing only the last value
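
Negative indices work both for single values and in ranges, and the two behave slightly differently. A short sketch with a made-up list:

```python
letters = ['a', 'b', 'c', 'd', 'e']  # example list

last_item = letters[-1]       # a single value: the last element
last_slice = letters[-1:]     # a list containing only the last element
all_but_last = letters[:-1]   # everything except the last element

print(last_item)
print(last_slice)
print(all_but_last)
```

Note that `letters[-1]` gives back the element itself, while `letters[-1:]` gives back a list.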

In [None]:
test = [1, 2, 3, 4, 5, 6]

**Exercise 7.2** Access the following from `test`:

1. 5th value only (5)
2. First through 4th values (1, 2, 3, 4)
3. Last two values (5, 6)
4. Create a new list called `list2` which contains the last three values of `test`.

In [None]:
# write code here

There are many more ways to slice a list, but we won't go into them here. Feel free to look up Python list slicing to explore more on your own time!

### List Functions
There are many functions we can use on lists. Here are just a few particularly helpful ones:

`len()`: This function gives us the length of the list. Note that this is not a method, `object.method()`, it is just a regular function, `function(args)`.

`.append()`: This method allows you to add an element to the back of the list.

`.count()`: This method returns the number of elements with the specified value within your list.

`.index()`: This method returns the index of the first element with the specified value within your list.

`.sort()`: This method sorts your list.
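
Here is a sketch of these functions on a made-up list of colors. One detail worth noticing is that `.append()` and `.sort()` change the list in place and return `None`, so writing `colors = colors.sort()` would wipe out your list:

```python
colors = ['red', 'blue', 'red', 'green']  # example list

print(len(colors))            # number of elements
print(colors.count('red'))    # how many times 'red' appears
print(colors.index('blue'))   # position of the first 'blue'

colors.append('yellow')       # modifies the list in place
colors.sort()                 # also in place; it returns None
print(colors)
```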

In [None]:
test = [21, 1, 1, 2, 3, 5, 8]

**Exercise 7.3**

Do the following things to `test`.

1. Count the number of times the value "1" appears within our list.
2. Print the index of "8".
3. Append "13" to the back of our list.
4. Print the length of our list.
5. Sort the list.
6. Print the newly sorted list.

In [None]:
# write code here

### Side Note on Other Data Structures
There are a few other data structures in Python that we generally will not use, but it's still worth going over them.


<u>Tuples</u> are like lists except they are immutable (they can't be changed once defined). They are more memory efficient and can be computationally advantageous, but otherwise support the same indexing and slicing as lists. They are defined with parentheses instead of square brackets.

In [None]:
my_tuple = (1, 'test')
print(my_tuple)
print(type(my_tuple))
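
Two tuple behaviors are worth seeing once: unpacking and immutability. The sketch below uses a made-up tuple called `point`; the `try`/`except` is only there so the cell keeps running after the error:

```python
point = (3, 5)  # example tuple

# unpacking assigns each element to its own variable
x_coord, y_coord = point
print(x_coord, y_coord)

# tuples cannot be modified after creation
try:
    point[0] = 10
except TypeError as error:
    print(f'Cannot modify a tuple: {error}')
```

Unpacking is also how a function can appear to return several values at once: it returns a tuple, and you unpack it.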

<u>Sets</u> are unordered and cannot contain duplicate values. Unlike tuples, they are mutable (elements can be added or removed after creation). They are defined with curly braces.

In [None]:
my_set = {1, 3, 3, 'test', False}
print(my_set)

Notice how the order is not the same as originally defined.
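
A common use for sets is removing duplicates and checking membership. A minimal sketch, using a made-up list of sample labels:

```python
samples = ['A', 'B', 'A', 'C', 'B']  # hypothetical list with repeats

unique_samples = set(samples)   # duplicates are dropped
print(len(unique_samples))

# membership checks use the in keyword
print('A' in unique_samples)
print('Z' in unique_samples)
```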
<br></br>
<u>Dicts</u> (dictionaries) map keys to values. If you are familiar with hashmaps in other languages, they work pretty much the same way. In a dict, each element consists of a key and a value. They are defined with curly braces and colons.

In [None]:
my_dict = {
    'name' : 'SpongeBob',
    'species' : 'Spongiforma squarepantsii',
    'hobbies' : ['Jellyfishing', 'Frycooking', 'Blowing bubbles'],
    'square pants' : True
}

print(f'Hi my name is {my_dict["name"]}. I am a {my_dict["species"]}.')
print(f'My hobbies include {my_dict["hobbies"]}')
if my_dict['square pants']:
    print('I have square pants!')
else:
    print('My pants are not square :(')

As you can see, the values in a dictionary can be any type of variable (keys are more restricted: they must be immutable types, such as strings or numbers). Dictionaries are very powerful data structures and we will encounter them within Pandas dataframes, but for this course you will generally not need to know how to use them.
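
If you do end up needing a dictionary, the basics are adding, updating, and looping over entries. A short sketch with made-up gene names and lengths:

```python
gene_lengths = {'geneA': 1200, 'geneB': 850}  # made-up values

gene_lengths['geneC'] = 430     # add a new key-value pair
gene_lengths['geneA'] = 1250    # overwrite an existing value

print(gene_lengths['geneA'])
print(list(gene_lengths.keys()))

# .items() gives each key together with its value
for name, length in gene_lengths.items():
    print(f'{name}: {length}')
```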

## (9) Importing Packages

Like in R, we can do much more with our code by using packages. Importing packages in Python uses the `import` keyword (vs. `library()` in R). Let's import the first package we're going to use, numpy. We'll use the `as` keyword to give it the shorthand `np` to save typing; this is a standard abbreviation you will see practically everywhere. You'll see that other Python packages also have standard abbreviations.
<br></br>
Note: It's good practice to import packages only once. In scripts, they are generally put at the very top before everything else. Here, we will import the packages as we need them.

In [None]:
import numpy as np

Once numpy is imported this way, you have to prefix everything from numpy with `np.`. For example, numpy includes the constant pi and the sine function. Here's how you would compute the sine of pi/2 radians using np. 

In [None]:
np.sin(np.pi/2)

This line is the same as using `numpy.sin(numpy.pi/2)` but again, importing using a standard abbreviation saves us a lot of typing. 

**The two main takeaways of importing packages are:**
1. Always use the `import` statement. This is your library() function in R. 
2. Put the package name before the period in front of any function that is specific to the package. 

There are more complicated ways to import packages. 

In [None]:
import matplotlib.pyplot as plt

`pyplot` is the plotting functionality of `matplotlib`, so this import statement would only import pyplot and any of its dependencies in matplotlib. 

If you just want one or more specific functions from a package, another option is the `from` keyword. 

In [None]:
from numpy import pi
from numpy import sin

In this case, you would only get `pi` and `sin` from numpy. You wouldn't get something like cos, since we only imported pi and sin. Now, pi and sin are imported directly as a float and a function, so we don't need the numpy prefix to use them.

In [None]:
sin(pi/2) # this does not work unless we specifically import these two functions/variables
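
The sketch below shows both sides of this: the imported names work without a prefix, while a name that was not imported is simply undefined. The `try`/`except` is only there so the cell keeps running after the error:

```python
from numpy import pi, sin  # several names can share one import line

print(sin(pi / 2))

# cos was never imported by name, so the bare name is undefined
try:
    cos(0)
except NameError:
    print('cos is not defined; use np.cos or import it explicitly')
```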

## (10) Numpy Arrays

While numpy has a bunch of useful functions, the real meat of numpy is the (multidimensional) array it implements, called the `ndarray`. It has the following properties:

* A fixed size.
* A shape (dimension).
* Its contents must be the same data type.

First, let's look at a 1D array. You can declare one by passing a list into the function `np.array()`.

In [None]:
arr = np.array([1, 2, 3])
arr

Why is the ndarray (and the numpy package in general) important? For one, we can use vectorized functions on them. For example, you can quickly perform mathematical operations on the entire array:

In [None]:
print(arr + 1)
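
To see why this matters, compare how the same operator behaves on a plain list and on an array. A small sketch with made-up values:

```python
import numpy as np

values_list = [1, 2, 3]
values_arr = np.array(values_list)

# with a list, * repeats the list
print(values_list * 2)

# with an array, * multiplies every element
print(values_arr * 2)

# arrays of the same shape combine element by element
print(values_arr + values_arr)
```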

Another benefit is that you get extra methods that you can apply on the arrays. For example, you can quickly find the mean and variance of the values in your array without having to write those functions yourself.

In [None]:
arr = np.arange(0, 501, 10) # we can get an array of every 10th number from 0 to 500 using the arange function

print(arr.mean()) # np.mean(arr) is the equivalent function
print(arr.var()) # np.var(arr) is the equivalent function

Accessing values from a 1D array is the same as accessing values from a python list.

In [None]:
print(arr[2])
print(arr[:])
print(arr[0:2])

You can also create 2D arrays with numpy (not quite data frames, we'll cover that in the pandas section). The way you declare one is very similar to making the 1D array, except you pass it a list of lists.

In [None]:
arr2d = np.array([[1,2,3], [4,5,6], [7,8,9]])
arr2d

2D arrays support all of the functionality of 1D arrays (vectorized functions, `.mean()`, `.var()`, accessing values/slicing) and also have some additional attribute functionality.

* `.shape` returns the dimensions of our 2D array
* `.T` returns the transposed version of our 2D array (note that this is a capitalized T!)
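
Accessing values in a 2D array takes a row index and a column index. Here is a sketch on a made-up 2-by-3 array called `grid`; the `axis` argument to `.mean()` is a standard numpy option:

```python
import numpy as np

grid = np.array([[10, 20, 30],
                 [40, 50, 60]])  # example array with 2 rows and 3 columns

print(grid[0, 1])    # row 0, column 1
print(grid[1])       # the whole second row
print(grid[:, 0])    # the first column (every row, column 0)

# axis=0 averages down each column, axis=1 averages across each row
print(grid.mean(axis=0))
print(grid.mean(axis=1))
```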

**Exercise 10.1**

1. What are the dimensions of `arr2d`?
2. Create a new array called `t_array` with the transposed version of `arr2d`.

In [None]:
# write code here

There's not too much else you need to know about numpy arrays, since most of your data will be in a data frame. Let's move on to pandas!

## (13) Matplotlib

Matplotlib is the main plotting package in Python. Specifically, we will be using the `pyplot` module from matplotlib (the package is massive, so it's easier to import just the module you need). `plt` is the standard abbreviation for `matplotlib.pyplot`. Here's how you would typically import it.

In [None]:
import matplotlib.pyplot as plt

The workflow behind pyplot is somewhat similar to plotting with R: you create the plot, then show the plot (or alternatively, save it to a file). For example, let's plot a simple sine wave:

In [None]:
# generates values from 0 to 9.9 in 0.1 intervals to plot
x_vals = [i/10 for i in range(0, 100)]
y_vals = np.sin(x_vals)

# sets up the plot area
# note that one function can have 2 return values in Python
fig, ax = plt.subplots(1, 1)  # this controls the number of subplots and how they're placed

# use ax to plot the data
ax.plot(x_vals, y_vals)
plt.show() # show the most recent plot created

Let's break down all the objects we made:

* `fig` is the figure as a whole (the container holding every subplot). We rarely use it directly in this guide.
* `ax` controls the axes -- in short, it controls the variables you plot, the plot labels, etc.
* `plt`  is the plot module you imported, which you can think of as a "plot window." Basically, the plots you made get saved to plt, and from there you can see the plots you made.

You can look into the matplotlib.figure module on your own time to see all the options. Here's an example that uses subplots and sets the titles and axis labels (using the `set()` method):

In [None]:
# good to know: constrained_layout spaces the plots out so plot titles don't overlap

fig, ax = plt.subplots(2, 2, constrained_layout=True)

# ax is a 2D numpy array of axes objects -- you need two indices to access each one
ax[0][0].plot(x_vals, y_vals, color='red') 
ax[0][0].set(title = "sin(x)", ylabel='y', xlabel='x')

ax[0][1].hist(x_vals, color='green')  # a very boring histogram
ax[0][1].set(title = "just x values", ylabel='counts', xlabel='x values')

ax[1][0].scatter(x_vals, y_vals, color='purple', s=0.1)
ax[1][0].set(title = "scatter plot", ylabel='y', xlabel='x')

ax[1][1].plot(x_vals, y_vals, color='pink')
ax[1][1].plot(x_vals, np.cos(x_vals), color='black')
ax[1][1].set(title = "sin(x) and cos(x)", ylabel='y', xlabel='x')
ax[1][1].legend(['sin(x)','cos(x)'], loc='upper right')

plt.show()

To save a figure to your computer, you can either copy and paste it from this notebook or (the better way) use the `plt.savefig()` function.

For creating figures that will eventually be seen by others, you'll want to use the following arguments:
* `dpi=300`   - this ensures the saved pic will be clear even if blown up
* `bbox_inches='tight'`   - you probably won't need this unless you are playing with the axes positions or adding multiple subplots, but it's still useful as it prevents parts of the figure from being cut off
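
Putting these arguments together, here is a minimal sketch of saving a figure. The file name `sine_wave.png` is a placeholder, and the sketch uses `fig.savefig()`, the figure-level counterpart of `plt.savefig()`:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10, 0.1)
fig, ax = plt.subplots(1, 1)
ax.plot(x, np.sin(x))
ax.set(title='sin(x)', xlabel='x', ylabel='y')

# 'sine_wave.png' is a placeholder file name; it is saved in the working directory
fig.savefig('sine_wave.png', dpi=300, bbox_inches='tight')
```

If you also want to display the plot with `plt.show()`, it is safest to save first, since the figure may no longer be available for saving once it has been shown.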

## (14) Exercises
Only this section will be graded for credit!

### 1. Check for understanding (Functions, variables, and operations)

Define a function which takes in three arguments, a, b, and c, and prints any real roots of the polynomial function ax^2 + bx + c. Test it out on (x^2 + 5x + 6), (4x^2 - 1), and (5x^2 + 14x - 4). 

Hints:
> You can use np.sqrt( input ) to find the square root.

> Python follows PEMDAS order of operations.

> Raise something to the power using ** e.g. 4 raised to the power of 2 is 4**2

Use this if you are stuck:
https://www.cuemath.com/algebra/quadratic-equations/

In [None]:
# write code here

### 2. Guided exercise (Lists, numpy, matplotlib)
You're trying to build a house but unfortunately all of your coordinates got scrambled. There are 7 different lines which you must plot but you must fix the coordinate pairs first. Follow the directions below.

WITHOUT changing the code in the cell below, create code to change the lists to fix the house.

Part 1) Lists
1. x1 is missing its last x-coordinate. Add -8 to the end of it.
2. The value of y2 at index 3 has the incorrect sign. Replace the negative with positive.
3. Add the index of the first -4 in y3 to the end of x3.
4. The value at index 3 in x4 should be added to the end of x4.
5. Count the number of 6s in x5 and add that count to the end of y5.
6. Reverse the orders of x6 and y6
7. Change x7 to only be the first 5 values and y7 to only be the last 5 values.

Part 2) Numpy
1. Change y1 to a numpy array and divide it by 1.5.
2. Oh no! Someone accidentally took the arctan of y2. Use the corresponding numpy command to set y2 to the np.tan() of itself.
3. Set x6 equal to its floor (rounded down to an integer) using np.floor() and set y6 to its ceiling (rounded up to an integer) using np.ceil().

Part 3) Matplotlib
1. Create a new plot
2. Use the ax.plot(x, y) command to plot each x-coordinate array against its corresponding y-coordinate array
3. Add a title describing the image

In [None]:
# DO NOT CHANGE THIS
x1=[-8, 0, 8]
y1=[9, 21, 9, 9]

x2=[-8, -8, 8, 8]
y2=[1.40564765, -1.44644133, -1.44644133, -1.40564765]

x3=[-1, -1, 1]
y3=[-8, -4, -4, -8]

x4=[-6, -3, -3, -6]
y4=[3, 3, 0, 0, 3]

x5=[6, 3, 3, 6, 6]
y5=[3, 3, 0, 0]

x6=[-5.4, -5.1, -2.4, -2.8, -5.1]
y6=[-2.4, -5.8, -5.4, -2.3, -2.3]

x7=[6, 3, 3, 6, 6, 8, 4, 3, 9]
y7=[5, 3, 5, 1, -2, -2, -5, -5, -2]

In [None]:
# write code here

### 3. Challenge Exercise (BONUS)
This exercise is optional and is worth extra credit.

Your goal is to duplicate this image as best as you can. The more closely your final figure resembles it, the more points you will get (up to 5). The entire figure can be created using just the matplotlib package and you shouldn't need to use any other plotting libraries. The data that will be plotted is given in its final form.

This will be helpful: https://matplotlib.org/stable/users/index

![fractal%20immitation.png](attachment:fractal%20immitation.png)

In [None]:
# data to be plotted
x=0
y=0
x_list=[]
y_list=[]
for i in range(10000):
    x_new=-.8+.1*x+1.1*x**2-1.2*x*y+.3*y+1.1*y**2
    y_new=.3*x+.4*x**2+.2*x*y-1.1*y+.7*y**2
    x = x_new
    y = y_new
    x_list.append(x)
    y_list.append(y)

In [None]:
# write code here