# A Beginner's Guide to Programming in Python

Welcome to Python, the second language of QBIO490! This document will take you through the basics of Python before we jump right into our analyses. Let's get started!

## Setting up your working directory

Just like in R, if you want to use relative file paths, you need to know which directory you are currently working in. Run the following code to set your working directory to the analysis_data folder.

In [None]:
import os

print("Current working directory: {0}".format(os.getcwd()))

os.chdir('/PATH/TO/ANALYSIS_DATA')

print("New working directory: {0}".format(os.getcwd()))

## Deliverables
Before turning in the tutorial, do the following to make sure your code works properly:
1. Uncomment the `assert` statements at the end of the exercises.
2. Restart the kernel and run all of the exercises (Kernel > Restart & Run All).
3. Make sure that you pass the `assert` statements at the end of each exercise. 

As usual, turn in the tutorial by GitHub!

## Indentation

In the other programming languages you've used before, such as R, you have defined code blocks using curly braces. Python is different: it uses indentation to mark where a code block begins and ends. You'll see this in the looping, control flow, and function parts of the guide.
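As a small illustration of indentation marking a block, here is a sketch using made-up names (`steps` and `completed` are not part of the course materials). The indented lines belong to the loop; the line back at the left margin runs once, after the loop has finished.

```python
# Hypothetical example data, used only for illustration
steps = ["load", "clean", "plot"]
completed = []

for step in steps:
    # both indented lines are inside the loop and run once per element
    completed.append(step)
    print("finished:", step)

# back at the left margin, so this runs once, after the loop ends
print("all done:", len(completed), "steps")
```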

## Indexing

Python uses zero-based indexing, which means that the first element in a data structure has an index of 0, the second element has an index of 1, and so on. So to access the first thing in a list called fruits, you would do: `fruits[0]`.
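Here is a minimal sketch of zero-based indexing. The contents of `fruits` are assumed for the example, since the text only names the list.

```python
# fruits is a hypothetical list; its contents are made up for this example
fruits = ["apple", "banana", "cherry"]

first = fruits[0]                # index 0 is the first element
second = fruits[1]               # index 1 is the second element
last = fruits[len(fruits) - 1]   # the last index is always length - 1

print(first, second, last)
```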

## Defining Variables

In Python, like in R, variables are dynamically typed, meaning you don't declare a variable as a specific type. To assign a value to a variable in Python, you use the equals sign.

In [3]:
x = 10
words = "hello world!"

x

10
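To see what "no declared type" means in practice, the sketch below reuses one name for two different types. The name `value` is made up for the example.

```python
value = 10
first_type = type(value)    # int

value = "ten"               # the same name can later hold a different type
second_type = type(value)   # str

print(first_type, second_type)
```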

## Accessing and Modifying Variables

There are two ways to modify variables. For example, to add 2 to some variable x, we can either write it the traditional way, x = x + 2, or use the shorthand operator, x += 2. There are equivalent operators for subtraction (-=), multiplication (*=), division (/=), etc.

**Exercise:**

Below, write the short version for the following variable assignments.

In [None]:
x = 4
y = 2

# 1. y = y / x (example is filled in below)
y /= x

# 2. y = y * 3


# 3. x = x - y


print(x,y)

# assert((x, y) == (2.5, 1.5))

## Printing

Printing is pretty straightforward in Python. To print, you use the print() function, where you put what you want to print in the parentheses. For example: print("This is the word:", word)
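A short sketch of print() with more than one argument follows. The value of `word` is assumed for the example. By default print() places a single space between its arguments, and the `sep` argument changes that separator.

```python
word = "banana"  # hypothetical value, chosen only for illustration

# print() puts a single space between its arguments by default
print("This is the word:", word)

# sep changes what is placed between the arguments
print("This is the word", word, sep=": ")
```

Both calls print the same line, `This is the word: banana`.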

## Special formatting

Sometimes, printing strings and variables together can get clunky and hard to read. If you put f in front of the string (i.e. just before the opening single or double quote) and put variables in curly braces, Python automatically substitutes each variable's value into the string!

In [63]:
def count_letters_2(word):
    word_len = len(word)
    print(f"{word} has {word_len} letters in it!")
    # as opposed to print(word, "has", word_len, "letters in it!")
    
count_letters_2("bananas")

bananas has 7 letters in it!


**Exercise:** Write a function, print_args(a, b), that prints a and b using the string formatting trick. For example, print_args("red", "blue") will print "a is red, b is blue".

In [None]:
# write your function here

## Objects
In Python, everything is an object, including packages and functions. Very abstractly, an `object` is a specially-defined data type, and it has the following two kinds of attributes (i.e. it stores the following information):

* Data attributes: these store variables.
* Methods: these are functions.

To access data attributes, use object_name.attribute (note the lack of parentheses). To call a method on an object, use object_name.method() (note that these have parentheses).

We'll use this notation in the next section when we introduce lists (which are a great example of objects).
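For a quick look at the notation before lists are introduced, the sketch below uses a complex number, which is a built-in Python type. It was picked here only because it has both data attributes and a method; it is not something the course relies on.

```python
z = 3 + 4j  # a complex number, one of Python's built-in types

# data attributes: no parentheses
print(z.real)         # 3.0
print(z.imag)         # 4.0

# method: called with parentheses
print(z.conjugate())  # (3-4j)
```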

## Data Structure: Lists
Lists are Python's standard array-like data structure: they are ordered and changeable. Lists will be the main built-in data structure we use in Python. You declare a list using square brackets (my_list = [1, 2, 3]).

Declare a list called example_list that contains your age, name, and a boolean value for if you are a first-year student.

Print the following: "Here is some info about me: \<example_list goes here\>"

In [None]:
# write code here

### Accessing Values in a List

Just like in R, we can use bracket notation [] to access value(s) within a list.

In [None]:
# named greetings rather than list, so that Python's built-in list type is not overwritten
greetings = ["hola", "bonjour", "hallo", "ciao", "你好", "olá", "أهلا", "こんにちは", "안녕하세요", "привет"]

In [None]:
print(greetings[5]) # outputs the value at index 5 (the 6th value in the list)

To access a range of values, we can use a colon (:) and specify the first and last+1 indices of that range. Note that the range is inclusive of the first index, but not of the second (which is why we must specify the last+1 index as our second input).

In [None]:
print(greetings[3:10]) # outputs the values from index 3 to index 9 (the 4th through 10th values)

If you omit an index when using the [:] notation, Python will default to the beginning/end of the list, depending on which index you omit (and if you omit both, it will give the entire list).

In [None]:
print(greetings[:5]) # outputs all values before index 5 (the 1st through 5th values)

print(greetings[5:]) # outputs all values starting at index 5 (the 6th through nth values)

print(greetings[:]) # outputs all values in the list

In [None]:
test = [1, 2, 3, 4, 5, 6]

**Exercise:** Access the following from `test`:

1. 5th value only (5)
2. First through 4th values (1, 2, 3, 4)
3. Last two values (5, 6)
4. Create a new list called `list2` which contains the last three values of `test`.

In [None]:
# write code here, uncomment the assert statement to check your work

# assert list2 == [4, 5, 6]

There are many more ways to slice a list, but we won't go into them here. Feel free to look up Python list slicing to explore more on your own time!
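As a starting point for that exploration, here is a brief sketch of two common extras: negative indices, which count from the end, and a third "step" value. The list `letters` is made up for the example.

```python
letters = ["a", "b", "c", "d", "e", "f"]  # hypothetical example list

last_one = letters[-1]       # negative indices count from the end
last_three = letters[-3:]    # from the third-to-last element to the end
every_other = letters[::2]   # a third value sets the step size
backwards = letters[::-1]    # a step of -1 walks the list in reverse

print(last_one)      # f
print(last_three)    # ['d', 'e', 'f']
print(every_other)   # ['a', 'c', 'e']
print(backwards)     # ['f', 'e', 'd', 'c', 'b', 'a']
```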

### List Functions
There are many functions we can use on lists. Here are just a few particularly helpful ones:

len(): This function gives us the length of the list. Note that this is not a method, `object.method()`, it is just a regular function, `function(args)`.

.append(): This method allows you to add an element to the back of the list.

.count(): This method returns the number of elements with the specified value within your list.

.index(): This method returns the index of the first element with the specified value within your list.

.sort(): This method sorts your list in place (it changes the list itself and returns nothing).
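The sketch below runs each of these on a small made-up list (`scores` is not part of the exercises). Note that .sort() changes the list itself and returns None, so its result should not be assigned back to the list.

```python
scores = [3, 7, 3, 9]  # hypothetical example list

print(len(scores))       # 4
scores.append(5)         # scores is now [3, 7, 3, 9, 5]
print(scores.count(3))   # 2
print(scores.index(9))   # 3

result = scores.sort()   # sorts in place
print(scores)            # [3, 3, 5, 7, 9]
print(result)            # None
```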

In [None]:
test = [21, 1, 1, 2, 3, 5, 8]

**Exercise:** Do the following things to `test`.

1. Count the number of times the value `1` appears within our list.
2. Print the index of `8`.
3. Append `13` to the back of our list.
4. Print the length of our list.
5. Sort the list.

In [None]:
# write code here, uncomment the assert statement to check your work

# assert test == [1, 1, 2, 3, 5, 8, 13, 21]

## Control Flow

### If, Elif, Else

Python uses if statements like R, with three main differences:

* Instead of curly brackets, you have colons and indents.
* You don't need to put the condition in parentheses.
* Instead of else if, you have the abbreviated elif.

In [1]:
x = -10

if x > 0:
    print('x is positive!')
elif x == 0:
    print('x is 0!')
else:
    print('x is negative!')

x is negative!


Because there are no brackets like in R, Python relies on indentation to decide what goes in and out of an if/elif/else statement. If the lines within a block are not indented by the same number of spaces (let's say, 3 spaces vs. 4 spaces), Python raises an IndentationError and the code will not run.
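To see what mismatched indentation looks like to Python, the sketch below stores a badly indented snippet in a string and asks Python to compile it. The names `bad_code` and `error_name` are made up for the example. The snippet is kept in a string only so that this cell can still run and report the error.

```python
# The third line of the snippet is indented 3 spaces, which matches neither
# the 4-space block above it nor the unindented "if" line.
bad_code = "if True:\n    a = 1\n   b = 2\n"

try:
    compile(bad_code, "<example>", "exec")
    error_name = None
except IndentationError as err:
    error_name = type(err).__name__
    print("Python refused to run the snippet:", error_name)
```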

### Loops
#### For loops

Also like R, there are for and while loops. Like R, all for loops in Python are "for-each" loops, meaning you loop over the elements of a sequence such as a list. For example, the following chunk of code prints each element in a_new_list on a separate line. Like the if statements, you do not use parentheses around the for condition:

In [2]:
a_new_list = [1, 'fish', 2, 'fish']
for x in a_new_list:
    print(x) 

1
fish
2
fish


If you know exactly how many times you want to repeat something, use the range() function like so.

In [None]:
# this loop will print 10 times
for i in range(10):
    print("looping")

However, Python starts counting at 0, whereas R starts at 1. So if you run this loop:

In [None]:
for i in range(10):
    print(i)

You'll see it prints 0-9 instead of 1-10! Many other languages like C++ follow this zero-based indexing. 
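If you do want 1 through 10, range() also accepts a start value, and optionally a step. The sketch below wraps each range in list() only so the values can be printed all at once; the variable names are made up for the example.

```python
one_to_ten = list(range(1, 11))   # start at 1, stop before 11
evens = list(range(0, 10, 2))     # the third argument is the step size

print(one_to_ten)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(evens)        # [0, 2, 4, 6, 8]
```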

**Exercise:** Fill in the ellipses to calculate the mean of the elements in nums.

In [None]:
nums = [1,2,3,4,5,6]
total = 0

for i in ...:
    total += ...

mean_value = total / ... # DO NOT fill in 6 (use a function instead)

# assert(mean_value == 3.5)

**Exercise:** Use a for loop with append() and the range() function to add every element from a_new_list onto the end of num_list. Hint: for this to work, you'll have to get the length of a_new_list.

In [None]:
a_new_list = [1, 'fish', 2, 'fish']
num_list = [0,1,2,3,4,5,6]

## Put your code here

# assert(num_list == [0,1,2,3,4,5,6,1,'fish',2,'fish'])

**Exercise:** Given the following list of strings string_list, copy all strings that start with the letter "A" into starts_A_list using append(). Hint: you can get the first letter of a string by indexing it just like a list.

In [None]:
# example of string indexing
my_string = "Tree"
print(my_string[0])

In [None]:
string_list = ["Apple", "Banana", "Alligator", "Anteater", "Potato", "Water", "Aardvark"]
starts_A_list = []


# assert(starts_A_list == ["Apple", "Alligator", "Anteater", "Aardvark"])

#### While loops

While loops in Python are the same as in R, except again without curly brackets and with colons instead. Again, like if/elif/else statements and for loops, Python relies on indents to figure out what's in the loop and what isn't. 

In [None]:
i = 1
while i < 64:
    i *= 2  # note: this is equivalent to writing i = i * 2
    print(i)

## Importing Packages

Like in R, packages let us do much more advanced things with our code. Importing packages in Python uses the import keyword (vs. library() in R). Let's import the first package we're going to use, numpy. We'll use the "as" keyword to call it np, which is its standard abbreviation and saves typing. You'll see that other Python packages also have standard abbreviations.

In [None]:
import numpy as np

Once numpy is imported this way, you have to prefix everything from numpy with np. For example, numpy includes the constant pi and the sine function. Here's how you would call the sine of pi radians using np.

In [None]:
np.sin(np.pi)

This line is the same as using numpy.sin(numpy.pi) but again, importing using a standard abbreviation saves us a lot of typing. 

**The two main takeaways of importing packages are:**
1. Always use the import statement. This is the equivalent of the library() function in R.
2. Put the package name (or its abbreviation) and a period in front of any function that comes from the package.

There are also other ways to import packages. For example, you can import a single module from inside a package.

In [None]:
import matplotlib.pyplot as plt

pyplot is the plotting module of matplotlib, so this import statement makes only pyplot available, under the name plt.

If you just want one or more specific functions or constants from a package, you can use the "from" keyword.

In [None]:
from numpy import pi
from numpy import sin

In this case, you would only get pi and sin from numpy, and you would use them without the np prefix. You wouldn't get something like cos, since we only imported pi and sin.
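The three import styles can be compared side by side. The sketch below uses Python's built-in math module as a stand-in for numpy, and the alias `m` is an arbitrary choice for the example rather than a standard abbreviation.

```python
import math                 # use the full name as the prefix
import math as m            # use an alias as the prefix
from math import pi, sqrt   # use the imported names with no prefix

a = math.sqrt(16)
b = m.sqrt(16)
c = sqrt(16)

print(a, b, c)         # 4.0 4.0 4.0
print(pi == math.pi)   # True
```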

## Numpy Arrays

While numpy has a bunch of useful functions, the real meat of numpy is the (multidimensional) array it implements, called the ndarray. It has the following properties:

* A fixed size.
* A shape (dimension).
* Its contents must be the same data type.

First, let's look at a 1D array. You can declare one by passing a list into the function np.array().

In [33]:
arr = np.array([1, 2, 3])
arr

array([1, 2, 3])
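The properties listed earlier can be checked on an array like this one. The sketch below assumes numpy is installed; the array `mixed` is made up to show what happens when the input list contains more than one data type.

```python
import numpy as np

arr = np.array([1, 2, 3])
print(arr.shape)   # (3,) : one dimension holding 3 elements
print(arr.size)    # 3

# A list mixing integers and a decimal number is stored as a single type:
# every element is converted to a float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64
```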

Why is the ndarray (and the numpy package in general) important? For one, we can use vectorized functions on them. For example, you can quickly perform mathematical operations on the entire array:

In [42]:
print(arr + 1)

[2 3 4]


Another benefit is that you get extra methods that you can apply on the arrays. For example, you can quickly find the mean and variance of the values in your array without having to write those functions yourself.

In [44]:
print(arr.mean()) # np.mean(arr) is the equivalent function and gives the same result
print(arr.var()) # np.var(arr) is the equivalent function and gives the same result

2.0
0.6666666666666666
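If you compare this with R, note that var() in R returns 1 for the same values. That is because .var() in numpy divides by n by default, while R divides by n - 1. A minimal sketch of the difference follows (the variable names are made up); passing ddof=1 gives the n - 1 version.

```python
import numpy as np

arr = np.array([1, 2, 3])

population_var = arr.var()      # divides by n (numpy's default)
sample_var = arr.var(ddof=1)    # divides by n - 1, like var() in R

print(population_var, sample_var)
```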


Accessing values from a 1D array is the same as accessing values from a Python list.

In [46]:
print(arr[2])
print(arr[:])
print(arr[0:2])

3
[1 2 3]
[1 2]


You can also create 2D arrays with numpy (not quite data frames, we'll cover that in the pandas section). The way you declare one is very similar to making the 1D array, except you pass it a list of lists.

In [47]:
arr2d = np.array([[1,2,3], [4,5,6], [7,8,9]])
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

2D arrays support all of the functionality of 1D arrays (vectorized functions, .mean(), .var(), accessing values/slicing) and also have some additional attributes.

* .shape returns the dimensions of our 2D array
* .T returns the transposed version of our 2D array (note that this is a capitalized T!)
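Here is a small sketch of both attributes on a made-up 2-row, 3-column array. The names `grid` and `flipped` are only for illustration, and a non-square array is used so the change in shape is visible.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])  # hypothetical 2 x 3 array

print(grid.shape)      # (2, 3) : 2 rows, 3 columns

flipped = grid.T       # attribute, so no parentheses
print(flipped.shape)   # (3, 2) : rows and columns are swapped

print(grid[0, 1])      # 2 : row 0, column 1
print(grid.mean())     # 3.5 : mean of all six values
```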

**Exercise:**

1. What are the dimensions of `arr2d`?
2. Create a new array called `t_array` with the transposed version of `arr2d`.

In [61]:
# write code here, uncomment the assert statement to check your work

# assert np.all(t_array == [[1,4,7], [2,5,8], [3,6,9]])

There's not too much else you need to know about numpy arrays, since most of your data will be in a data frame. Let's move on to pandas!