# T02: Python Basics

## Why Use Python?

Python is a versatile high-level programming language. It is free, unlike MATLAB, and packages are open-source, which means that anybody can use and create Python tools. Python is ideal for manipulating data in dataframes, performing statistical analyses, and creating plots and graphs (including interactive ones!). It is also very useful in machine learning and image analysis. Because of its intuitive and easy-to-read syntax, Python is a good language for programming beginners.

## Python vs. R

Python and R are two of the most widely used languages in data science. Many scientists use R because it was designed for statistical analysis and data visualization. Python, on the other hand, is a general-purpose language with a broader range of applications, and its clear syntax makes it easy to customize. I prefer Python and have therefore written this bootcamp in it. There will be some mentions of features that are easy to use in R, such as plotting phylogenetic trees and running linear models.

## Arithmetic and Operators

Basic arithmetic can be easily performed in Python using the following operators:

| Operation | Python symbol |
|:---------:|:-------------:|
| addition | `+` |
| subtraction | `-` |
| multiplication | `*` |
| division | `/` |
| exponentiation | `**` |
| modulo | `%` |

The operations follow standard order of operations:

In [1]:
2 + 3 * 4 - 5**2

-11

In [2]:
(2 + 3) * 4 - 5**2

-5

When you run a single expression in a code cell, its value is displayed. If a cell contains several expressions, only the value of the last one is shown; to view them all, pass each one to the `print()` function. Functions are called with their arguments (inputs) inside parentheses. Compare the two cells below:

In [3]:
2 + 3 * 4 - 5**2
(2 + 3) * 4 - 5**2

-5

In [4]:
print(2 + 3 * 4 - 5**2)
print((2 + 3) * 4 - 5**2)

-11
-5
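
To make the operator table concrete, here is a small sketch that applies each operator to the same two numbers. The variable names are my own and are only for illustration. Note that `/` always returns a float, even when both inputs are integers.

```python
# Apply each arithmetic operator from the table to the same two numbers
a = 7
b = 2

total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
quotient = a / b     # division (always returns a float)
power = a ** b       # exponentiation (7 squared)
remainder = a % b    # modulo (remainder after dividing 7 by 2)

print(total, difference, product, quotient, power, remainder)
```

Modulo is handy for checking whether a number is even: `n % 2` is 0 for even numbers and 1 for odd ones.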


## Types of Python Objects

Python objects have different types. The main types are strings, integers (ints), floats, and booleans. 

<ul>
    <li><b>Strings</b> are sequences of characters and are surrounded by quotes. Characters can include not only letters, but also numbers and symbols such as punctuation and spaces. Examples of strings include 'cat', 'The dog is happy!', and 'asdf1234!! !@#^&'. </li>
    <li><b>Ints</b> are integers such as 1, 2, 3, 512, 87743, and -3432178. </li>
    <li><b>Floats</b> are decimals, also known as floating point numbers. Examples of floats include 1.5, 3.141592654, 112.0, and -32.018. </li>
    <li><b>Booleans</b> can be either True or False.</li>
</ul>

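If you convert a float to an integer with `int()`, the decimal part is dropped (the value is truncated toward zero), not rounded. To round to the nearest integer, use `round()`. For 4.3, both give the same result: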

In [5]:
int(4.3)

4

In [6]:
round(4.3)

4
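
The value 4.3 hides an important difference: `int()` drops the decimal part, while `round()` goes to the nearest integer. A quick sketch with values where the two disagree (the variable names are just for illustration):

```python
# int() truncates toward zero; round() goes to the nearest integer
truncated_pos = int(4.7)      # decimal part dropped
rounded_pos = round(4.7)      # nearest integer

truncated_neg = int(-4.7)     # truncation moves toward zero
rounded_neg = round(-4.7)     # nearest integer

print(truncated_pos, rounded_pos)   # 4 5
print(truncated_neg, rounded_neg)   # -4 -5
```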

We can see the type of an object using the `type()` function.

In [7]:
type('hi')

str

In [8]:
type(15.8)

float

In [9]:
type(10)

int

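Strings can be added together, which concatenates them. This will include all special characters and any whitespace in them. A string can also be multiplied by an integer, which repeats it. Other arithmetic operations, such as subtraction and division, cannot be performed on strings because they don't make sense.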

In [10]:
'Hello' + ' world'

'Hello world'
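
One exception worth knowing: multiplying a string by an integer repeats it. The sketch below shows concatenation, repetition, and what happens when you try an operation that strings do not support. It uses a `try`/`except` block, which we have not covered yet, only so that the cell keeps running after the error.

```python
# Concatenation joins strings end to end
greeting = 'Hello' + ' ' + 'world'

# Multiplying by an integer repeats the string
echo = 'ha' * 3

# Subtraction is not defined for strings and raises a TypeError
try:
    'Hello' - 'H'
except TypeError as err:
    error_message = str(err)
    print("Subtraction failed:", error_message)

print(greeting)   # Hello world
print(echo)       # hahaha
```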

Strings can be denoted by single, double, or triple quotes. This allows you to nest quotes inside of strings:

In [11]:
print('This is a sentence with single quotes')
print("This is a sentence with double quotes")
print('''This is a sentence with triple quotes''')
print("""This is also a sentence with triple quotes \n""")

# create a string with quotes inside it
print("I told my friend, 'learn Python!' \n")

This is a sentence with single quotes
This is a sentence with double quotes
This is a sentence with triple quotes
This is also a sentence with triple quotes 

I told my friend, 'learn Python!' 



In the above cell, I wrote a <b>comment</b>, which starts with the <b>#</b> character. Comments allow you to write notes or descriptions within code cells, but they are not run as code. Writing comments is a good practice so that others (or future you) know why you wrote certain code. 

I also included <b>newline characters</b> (`\n`) in the last two strings. As the name suggests, a newline character creates a line break. Here it adds a blank line after each of those strings, which makes the printed output easier to read.

## Lists, Arrays, and Dictionaries

When programming, we often work with many related pieces of data. To be organized and efficient, we store them within a single variable called a <b>data structure.</b> This way, we can efficiently perform operations on the data structure as a whole rather than having to operate on each piece of data as a separate variable.

We will primarily be using 3 types of data structures: lists, dictionaries, and arrays.
<ul>
<li><b>Lists</b> store pieces of data in a sequence, so that data can be referenced by its index in the list.</li>
  <li><b>Dictionaries</b> store pieces of data under labels called <i>keys</i>, so that data can be referenced by its key in the dictionary.</li>
<li><b>Arrays</b> store data in matrices and higher-dimensional matrices called <i>tensors</i>, so that data can be referenced by its index in the matrix or tensor.</li>
</ul>

A list can contain elements of different types, but an array can only have elements with the same type. Arrays are most commonly created using the `numpy` (pronounced num-pie) package. A <b>variable</b> is a symbolic name used to store an object. To create a variable, simply assign an object to a name.

Lists are set off by brackets and dictionaries with braces.

In [12]:
my_list = [1, 2, 'apple', False]

Here, I have created a list with multiple data types. We can call the `type()` function on it.

In [13]:
type(my_list)

list

Lists can be concatenated by adding them together:

In [14]:
my_other_list = ["where", 50, True]
big_list = my_list + my_other_list
big_list

[1, 2, 'apple', False, 'where', 50, True]

To create a numpy array, first import the numpy package, then create the array. Most people assign the `numpy` package to the short form `np`.

In [2]:
# first import the package
import numpy as np

# next, create an array and assign it to a variable
my_array = np.array([1, 2, 3])
type(my_array)

numpy.ndarray

If you create an array with objects of different types, then the array will force them to be of the same type.

In [5]:
my_array_2 = np.array([1, 2, "apple", False])
my_array_2

array(['1', '2', 'apple', 'False'], dtype='<U21')

See how all of the elements in `my_array_2` became strings, unlike in the list: the integers and the boolean are now strings.
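
The same coercion happens with other mixtures of types. As a small sketch (the array names are my own), mixing integers with a float gives an array of floats, and mixing integers with booleans gives an array of integers:

```python
import numpy as np

# Integers and a float: everything is converted to float
mixed_numbers = np.array([1, 2, 3.5])

# Integers and booleans: True becomes 1 and False becomes 0
mixed_bools = np.array([1, 2, True, False])

print(mixed_numbers, mixed_numbers.dtype)
print(mixed_bools, mixed_bools.dtype)
```

The `dtype` attribute tells you which single type the array settled on.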

Numpy arrays are optimized for efficient calculations. The calculations are implemented in C in the package, so they are more efficient than writing the same calculations in Python. Numpy performs arithmetic operations element-wise:

In [6]:
array_1 = np.array([1, 2, 3, 4])
array_2 = np.array([10, 9, 8, 7])

array_1 + array_2

array([11, 11, 11, 11])

In [8]:
array_1 * array_2

array([10, 18, 24, 28])

Lists cannot be multiplied in this way. You can multiply a list by an integer, but this repeats the list that many times instead of multiplying the individual elements.

In [9]:
# copies the given list 3 times and makes a new list
[1, 3, 4] * 3

[1, 3, 4, 1, 3, 4, 1, 3, 4]
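
If you do need element-wise multiplication with plain lists, one possible approach is to pair up the elements with the built-in `zip()` function inside a list comprehension. This is only a sketch to show what numpy is doing for you; the list names are hypothetical.

```python
list_1 = [1, 2, 3, 4]
list_2 = [10, 9, 8, 7]

# Multiplying two lists directly is not allowed
try:
    list_1 * list_2
except TypeError as err:
    print("Cannot multiply two lists:", err)

# Pair up the elements and multiply each pair
products = [a * b for a, b in zip(list_1, list_2)]
print(products)   # [10, 18, 24, 28]
```

The result matches `array_1 * array_2` above, but the numpy version is shorter and much faster on large data.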

Numpy has some other handy functions like making an array of only ones or zeros. The argument to these functions is the number of ones or zeros that should be in the array. 

In [11]:
# array of 5 zeroes
print(np.zeros(5))

# array of 3 ones
print(np.ones(3))

[0. 0. 0. 0. 0.]
[1. 1. 1.]
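
Notice that the outputs above are floats (`0.` and `1.`), which is the default. As a short sketch, both functions also accept a tuple to build a multi-dimensional array and a `dtype` argument to choose the element type:

```python
import numpy as np

# A 2-row, 3-column array of zeros (the shape is passed as a tuple)
grid = np.zeros((2, 3))

# Four ones stored as integers instead of the default floats
int_ones = np.ones(4, dtype=int)

print(grid.shape)   # (2, 3)
print(int_ones)     # [1 1 1 1]
```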


## Tuples

Tuples are another type of data structure. They are denoted by parentheses, and unlike lists and arrays, they are <b>immutable</b>. If you create a list or array, you can change its elements without a problem. Tuples, however, cannot be changed after they are created; you can only create a new tuple.

In [13]:
my_tuple = (3, 5, 10, 34)

# try to change the third element. Throws an error
my_tuple[2] = 13

TypeError: 'tuple' object does not support item assignment

Like lists, tuples can contain elements of different types. You can store strings, integers, and floats in the same tuple. Tuples are useful for storing grouped data, like the x, y, and z coordinates of a point. Often, you can pass a list of tuples into a plotting function.
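
Here is a minimal sketch of working with a coordinate tuple. The names are hypothetical. It shows how to unpack a tuple into separate variables and how to build a new tuple when you need a changed value.

```python
# A point stored as (x, y, z)
point = (1.5, -2.0, 4.0)

# Unpack the tuple into three variables in one line
x_coord, y_coord, z_coord = point

# Tuples cannot be edited, so build a new one with a different z value
moved_point = point[:2] + (10.0,)

print(x_coord, y_coord, z_coord)   # 1.5 -2.0 4.0
print(moved_point)                 # (1.5, -2.0, 10.0)
```

The trailing comma in `(10.0,)` is what makes it a one-element tuple rather than just a number in parentheses.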

## Indexing Data Structures

To get items from a list or array, we use <b>indexing</b>. Python uses <i>0-indexing</i>. This means that the first element in a list or array is at position 0. In a list of 10 elements, the last element is at position 9. We use indexing to extract elements at particular positions.

Indexing can also run in the reverse direction. The index -1 is the last element, -2 is the second to last element, and so on. So in a list of 3 elements, we can get the last element using index -1 or 2.

In [16]:
# prints the 4th element
print(big_list[3])

# prints the second to last element
print(big_list[-2])

False
50
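
As a quick sketch of how positive and negative indices line up, the list below has 5 elements, so index -1 and index 4 point to the same item. The list name is only for illustration.

```python
letters = ['a', 'b', 'c', 'd', 'e']

first = letters[0]                            # 0-indexing: the first element
last_by_negative = letters[-1]                # count from the end
last_by_positive = letters[len(letters) - 1]  # same element, counted from the start

print(first, last_by_negative, last_by_positive)   # a e e
```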


You can also get multiple elements at a time, instead of just one. This is done using <b>slicing</b>. To slice a data structure, separate the start and end indices with a `:` character. But in Python slicing, THE END POINT IS EXCLUSIVE. So if you slice `3:9`, you will get elements from the 3rd index to the 8th index (4th through 9th elements).

**Note: R uses 1-indexing, where the first element has an index of 1, and both start and end points are inclusive. Beware of this difference between Python and R.**

In [17]:
# 3rd through 5th elements
big_list[2:5]

['apple', False, 'where']
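
The `3:9` slice described above can be checked directly. In this sketch I use a list of the numbers 0 through 9, so each element equals its own index:

```python
numbers = list(range(10))   # [0, 1, 2, ..., 9]

# Indices 3 through 8; index 9 is excluded
middle = numbers[3:9]
print(middle)        # [3, 4, 5, 6, 7, 8]

# Because the end point is exclusive, the slice length is end - start
print(len(middle))   # 6
```

One benefit of the exclusive end point is that `numbers[:3] + numbers[3:]` gives back the whole list with nothing duplicated or lost.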

If you omit the start point, the slice begins at the start of the data structure; if you omit the end point, it runs through the end. You can also add a third value, the step, to get every nth element or to reverse lists and arrays.

In [18]:
# get all elements starting at the beginning and ending at the 3rd index
big_list[:4]

[1, 2, 'apple', False]

In [19]:
# reverse an array
my_array[::-1]

array([3, 2, 1])

In [20]:
# get every 3rd element starting with the first element
print(big_list[::3])

# get every 2nd element starting with the first element, then reverse it
print(big_list[::2][::-1])

[1, False, True]
[True, 'where', 'apple', 1]
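
The general form of a slice is `start:stop:step`, and all three parts can be combined. A small sketch, again using a hypothetical list where each element equals its index:

```python
digits = list(range(10))   # [0, 1, 2, ..., 9]

# Start at index 1, stop before index 8, take every 2nd element
odds = digits[1:8:2]
print(odds)        # [1, 3, 5, 7]

# A negative step walks backwards: start at index 8, stop before index 2
countdown = digits[8:2:-2]
print(countdown)   # [8, 6, 4]
```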


A dictionary is a set of pairs of data in a lookup table. Each pair is a <b>key, value</b> pair, and you access a value with its corresponding key.

In [21]:
my_dict = {1:"January", 2:"February", 3:"March", 4:"April", 5:"May", 6:"June"}

# get the value stored under the key 3
my_dict[3]

'March'

You add key, value pairs and modify existing values using similar syntax:

In [22]:
# add another key, value pair
my_dict[7] = "July"

# change an existing one
my_dict[1] = "December"

print(my_dict)

{1: 'December', 2: 'February', 3: 'March', 4: 'April', 5: 'May', 6: 'June', 7: 'July'}
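
Asking for a key that does not exist raises a `KeyError`. Two common ways to avoid this are checking with `in` and using the dictionary's `.get()` method with a fallback value. A minimal sketch with a hypothetical dictionary:

```python
months = {1: "January", 2: "February", 3: "March"}

# Check whether a key exists before using it
has_february = 2 in months
has_december = 12 in months

# .get() returns the fallback instead of raising an error
found = months.get(3, "unknown")
missing = months.get(12, "unknown")

print(has_february, has_december)   # True False
print(found, missing)               # March unknown
```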


## Logic and Conditional Statements

Conditional statements are important for deciding how to proceed when a certain condition is met. For example, you may want to compute the mean and standard deviation if your data are normally distributed, and compute the median and range if they are not.

To perform comparisons, there are several types of operators, which return <b>booleans</b>. The following table shows <b>comparison operators</b>, which compare two values:

| Operator| Meaning|
|:--------:|:------------:|
|== | equal to|
|!= | not equal to |
|>| greater than       |
|<  | less than      |
|>= | greater than or equal to |
|<= | less than or equal to |

In [23]:
x = 10
y = 5
x > 10

False

You can also combine comparisons using the logical operators <b>and</b>, <b>or</b>, and <b>not</b>.

In [24]:
x > 5 and y > 5

False

The statement above returns False because the two conditions are not both true: `y > 5` is False. The next one returns True because only one condition needs to be true for OR logic to be true.

In [25]:
x > 5 or y > 5

True

The `not` operator returns the negation of the statement. So we can convert the AND statement to `True` and the OR statement to `False`:

In [28]:
print(not(x > 5 and y > 5))
print(not(x > 5 or y > 5))

True
False
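
It often helps to store each condition in a variable and then combine them. The sketch below uses made-up variables for a weather check. It also shows that negating an AND statement is the same as OR-ing the negated parts.

```python
temperature = 22
is_raining = False

is_warm = temperature > 15

# Both conditions must hold for 'and'
nice_day = is_warm and not is_raining

# Only one condition must hold for 'or'
stay_home = temperature < 0 or is_raining

# not (A and B) gives the same answer as (not A) or (not B)
same_result = (not (is_warm and is_raining)) == ((not is_warm) or (not is_raining))

print(nice_day, stay_home, same_result)   # True False True
```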


## If Statements

Now that we can perform comparisons, we are ready to build conditional statements. In Python, this is usually done with <b>if statements</b>, which are structured as follows:

In [33]:
if y > x:
    print("y is larger than x!")

The statement above doesn't print anything because the condition `y > x` is False. If we also want an output when the condition is False, we combine the if statement with <b>else</b> and <b>elif</b> statements:

In [34]:
if y > x:
    print("y is larger than x!")
elif x > y:
    print("x is larger than y!")
else:
    print("x and y are equal!")

x is larger than y!


In an if-elif-else statement, <b>if</b> the first condition is met, the code in the indented block after it is executed. <b>elif</b> is short for <b>else-if</b>. If the condition in the preceding <b>if</b> statement is False, then the Python interpreter evaluates the condition in the <b>elif</b> statement. If neither condition is True, then the code in the <b>else</b> block is executed.

You can include many <b>elif</b> blocks in a single if-elif-else statement, but only a single else block. You can also have an if-elif or an if-else statement:

In [36]:
if y > x:
    print("y is larger than x!")
elif x > y:
    print("x is larger than y!")

x is larger than y!


In [37]:
if y > x:
    print("y is larger than x!")
else:
    print("y is less than or equal to x!")

y is less than or equal to x!
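
Here is a sketch with several <b>elif</b> blocks, using a made-up score and grade cutoffs. Python checks the conditions from top to bottom and runs only the first block whose condition is True, even if later conditions would also be True.

```python
score = 83

if score >= 90:
    grade = "A"
elif score >= 80:
    grade = "B"      # this block runs, so the rest are skipped
elif score >= 70:
    grade = "C"      # also true for 83, but never reached
else:
    grade = "F"

print(grade)   # B
```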


If you have multiple if statements, they are evaluated as separate conditionals:

In [39]:
# check if y is less than or equal to x
if y <= x:
    print("y is less than or equal to x!")
    
# see if y is an int
if type(y) == int:
    print("y is an integer")

y is less than or equal to x!
y is an integer
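
Another common way to check a type is the built-in `isinstance()` function. The two approaches usually agree, but not always. As a sketch with hypothetical variables: a boolean counts as an integer for `isinstance()` because `bool` is a subtype of `int` in Python, while the `type(...) == int` comparison requires an exact match.

```python
count = 5
flag = True

# Both checks agree for a plain integer
exact_match_int = type(count) == int
instance_int = isinstance(count, int)

# They disagree for a boolean
exact_match_bool = type(flag) == int
instance_bool = isinstance(flag, int)

print(exact_match_int, instance_int)     # True True
print(exact_match_bool, instance_bool)   # False True
```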


## Built-in Functions

We have already been using the `type()` and `print()` functions, which are <b>built-in</b> Python functions. `len()` is another example, which returns the length of an object.

In [40]:
# length of a string is the number of characters, including whitespace and special characters
print(len("hello, how are you?"))

# length of an array or list is the number of elements
print(len(big_list))

# length of a dictionary is the number of key, value pairs
print(len(my_dict))

19
7
7


`range()`, `enumerate()`, `zip()`, and `list()` are other common built-in functions. In the next tutorial, we will learn how to write your own functions.
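
As a brief preview, here is a sketch of what each of these functions does. `range()`, `enumerate()`, and `zip()` produce their values lazily, so I wrap them in `list()` to display the results. The example data are made up.

```python
# range() generates a sequence of integers; the end point is exclusive
counts = list(range(5))

fruits = ['apple', 'banana', 'cherry']
prices = [3, 5, 7]

# enumerate() pairs each element with its index
indexed = list(enumerate(fruits))

# zip() pairs up elements from two sequences
paired = list(zip(fruits, prices))

print(counts)    # [0, 1, 2, 3, 4]
print(indexed)   # [(0, 'apple'), (1, 'banana'), (2, 'cherry')]
print(paired)    # [('apple', 3), ('banana', 5), ('cherry', 7)]
```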

## Getting Help

Because Python is open source and is used by so many people, it is very easy to get help on performing different tasks. 

<a href="https://stackoverflow.com/" target="_blank">Stack Overflow</a> is a great resource, where developers ask and answer questions. Nearly all common questions have been answered, so if you search how to do a particular task, even if it's as simple as "how to switch the rows and columns of a dataframe?", you will get many answers. 

You will inevitably encounter errors while coding in Python. Python prints out reasonably good error messages that almost always identify the line of code that is causing the issue. Sometimes the true issue is in another line of code, but it only manifests when a different line is run. 
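
An error message has two useful parts: the error type and a short description. The sketch below triggers an error on purpose and catches it with `try`/`except` (not covered in this tutorial) so that both parts can be printed without stopping the cell.

```python
try:
    result = 10 / 0
except ZeroDivisionError as err:
    error_name = type(err).__name__   # the error type
    error_text = str(err)             # the description
    print(error_name + ": " + error_text)   # ZeroDivisionError: division by zero
```

Without the `try`/`except`, Python would print the same final line at the bottom of a traceback, along with the line of code where the error occurred.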

If, after debugging on your own, you can't determine the cause of a bug, searching the web with the exact error message is often helpful. The results can even point to a possible cause that you may not have thought of.