# T02: Python Basics

## Why Use Python?

Python is a versatile high-level programming language. It is free, unlike MATLAB, and packages are open-source, which means that anybody can use and create Python tools. Python is ideal for manipulating data in dataframes, performing statistical analyses, and creating plots and graphs (including interactive ones). It is also very useful in machine learning and image analysis. Because of its intuitive and easy-to-read syntax, Python is a good language for programming beginners.

## Python vs. R

Python and R are the languages of choice for data scientists. Many scientists use R for data cleaning and visualization because it has been optimized for statistical analysis and data visualization. Python, on the other hand, is a general-purpose language with more functionality and is easier to customize because of its clear syntax. I prefer to use Python and have therefore chosen to write this bootcamp in it. There will be some mentions of easy-to-use features in R, such as plotting phylogenetic trees and running linear models.

## Arithmetic and Operators

Basic arithmetic can be easily performed in Python using the following operators:

| Operation | Python symbol |
|:---------:|:-------------:|
| addition | `+` |
| subtraction | `-` |
| multiplication | `*` |
| division | `/` |
| exponentiation | `**` |
| modulo | `%` |
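
As a quick illustration of the table, the sketch below applies each operator to the same pair of numbers. The names `a` and `b` and their values are arbitrary choices for this example.

```python
# Illustrative values; the names a and b are arbitrary
a = 7
b = 2

print(a + b)   # addition: 9
print(a - b)   # subtraction: 5
print(a * b)   # multiplication: 14
print(a / b)   # division: 3.5 (division returns a float)
print(a ** b)  # exponentiation: 49
print(a % b)   # modulo (remainder of 7 divided by 2): 1
```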

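The operators follow the standard order of operations used in mathematics: exponentiation first, then multiplication and division, then addition and subtraction, with parentheses overriding that order. For example: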

In [1]:
2 + 3 * 4 - 5**2

-11

In [2]:
(2 + 3) * 4 - 5**2

-5

If you run either cell above, its value is displayed. If a cell contains multiple expressions, only the value of the last one is displayed; to view them all, you must pass each one to the `print()` function. Functions are called with their arguments (inputs) inside parentheses, like so:

In [3]:
2 + 3 * 4 - 5**2
(2 + 3) * 4 - 5**2

-5

In [4]:
print(2 + 3 * 4 - 5**2)
print((2 + 3) * 4 - 5**2)

-11
-5


## Types of Python Objects

Python objects have different types. The main types are strings, integers (ints), floats, and booleans. 

<ul>
    <li><b>Strings</b> are sequences of characters and are surrounded by quotes. Characters can include not only letters, but also numbers and symbols such as punctuation and spaces. Examples of strings include 'cat', 'The dog is happy!', and 'asdf1234!! !@#^&'. </li>
    <li><b>Ints</b> are integers such as 1, 2, 3, 512, 87743, and -3432178. </li>
    <li><b>Floats</b> are decimals, also known as floating point numbers. Examples of floats include 1.5, 3.141592654, 112.0, and -32.018. </li>
    <li><b>Booleans</b> can be either True or False.</li>
</ul>

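If you convert a float to an integer with `int()`, the decimal part is discarded (the value is truncated, not rounded). To round to the nearest integer, use `round()`. For 4.3, both give the same result: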

In [5]:
int(4.3)

4

In [7]:
round(4.3)

4
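
Both calls above return 4 for 4.3, but `int()` and `round()` are not equivalent: `int()` drops the decimal part, while `round()` goes to the nearest integer. A small sketch with values chosen to show the difference:

```python
# int() truncates toward zero; round() goes to the nearest integer
print(int(4.7))     # 4
print(round(4.7))   # 5
print(int(-4.7))    # -4
print(round(-4.7))  # -5
```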

We can see the type of an object using the `type()` function.

In [16]:
type('hi')

str

In [17]:
type(15.8)

float

In [18]:
type(10)

int
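
The same function works on booleans and on the results of arithmetic. The sketch below uses arbitrary example values and wraps each call in `print()` so that every result is shown; it also illustrates that mixing an int with a float, or dividing with `/`, produces a float.

```python
print(type(True))     # <class 'bool'>
print(type(3 + 2))    # <class 'int'>
print(type(3 + 2.0))  # <class 'float'> (int combined with float gives float)
print(type(6 / 3))    # <class 'float'> (division returns a float)
```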

Strings can be added together, which concatenates them. This will include all special characters and any whitespace in them. A string can also be multiplied by an integer, which repeats it. The other arithmetic operations cannot be performed on strings because they don't make sense.

In [9]:
'Hello' + ' world'

'Hello world'
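
Here is a slightly longer sketch of string arithmetic, with illustrative variable names. Note that multiplying a string by an integer is also allowed and repeats the string, and that a number must be converted with `str()` before it can be concatenated with a string.

```python
greeting = 'Hello' + ' world'
print(greeting)        # Hello world

# Multiplying a string by an integer repeats it
laugh = 'ha' * 3
print(laugh)           # hahaha

# 'I have ' + 3 would raise a TypeError, so convert the number first
count = 3
sentence = 'I have ' + str(count) + ' cats'
print(sentence)        # I have 3 cats
```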

Strings can be denoted by single, double, or triple quotes. This allows you to nest one kind of quote inside a string that is delimited by another.

In [26]:
print('This is a sentence with single quotes')
print("This is a sentence with double quotes")
print('''This is a sentence with triple quotes''')
print("""This is also a sentence with triple quotes \n hi""")

# create a string with quotes inside it
print("I told my friend, 'learn Python!' \n")

This is a sentence with single quotes
This is a sentence with double quotes
This is a sentence with triple quotes
This is also a sentence with triple quotes 
 hi
I told my friend, 'learn Python!' 



In the above cell, I wrote a <b>comment</b>, which starts with the <b>#</b> character. Comments allow you to write notes or descriptions within code cells, but they are not run as code. Writing comments is a good practice so that others (or future you) know why you wrote certain code. 

I also included <b>newline characters</b> (`\n`) in the last two strings. As the name suggests, a newline character creates a line break, which improves the readability of the printed output.

## Variables, Lists, Arrays, and Dictionaries

When programming, we often work with many related pieces of data. To be organized and efficient, we store them together in a single object called a <b>data structure.</b> This way, we can efficiently perform operations on the data structure as a whole rather than having to operate on each piece of data as a separate variable.

We will primarily be using 3 types of data structures: lists, dictionaries, and arrays.
<ul>
<li><b>Lists</b> store pieces of data in a sequence, so that data can be referenced by its index in the list.</li>
<li><b>Dictionaries</b> store pieces of data under labels called <i>keys</i>, so that data can be referenced by its key in the dictionary.</li>
<li><b>Arrays</b> store data in matrices and higher-dimensional matrices called <i>tensors</i>, so that data can be referenced by its index in the matrix or tensor.</li>
</ul>

A list can contain elements of different types, but all elements of an array must have the same type. Arrays are most commonly created using the `numpy` (pronounced num-pie) package.

A <b>variable</b> is a symbolic name that refers to an object. To create a variable, simply assign an object to a name using `=`.

Lists are written with square brackets (`[]`) and dictionaries with curly braces (`{}`).

In [19]:
my_list = [1, 2, 'apple', False]

Here, I have created a list with multiple data types. We can call the `type()` function on it.

In [20]:
type(my_list)

list
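
As a brief sketch of how list elements are referenced by index, the example below builds a list with the same contents under the illustrative name `example_list`. Python counts positions from 0, so the first element is at index 0.

```python
# Illustrative list; the name example_list is arbitrary
example_list = [1, 2, 'apple', False]

print(example_list[0])        # 1 (the first element is at index 0)
print(example_list[2])        # apple
print(len(example_list))      # 4 (number of elements)
print(type(example_list[2]))  # <class 'str'> (each element keeps its own type)
```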

To create a numpy array, first import the `numpy` package, then create the array. By convention, `numpy` is imported under the short alias `np`.

In [22]:
# first import the package
import numpy as np

# next, create an array and assign it to a variable
my_array = np.array([1, 2, 3])
type(my_array)

numpy.ndarray
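
To illustrate the difference between lists and arrays, here is a minimal sketch, assuming `numpy` is installed; the names `mixed_list` and `mixed_array` are made up for this example. When a list containing both ints and floats is turned into an array, every element is converted to a single type.

```python
import numpy as np

mixed_list = [1, 2.5, 3]            # a list keeps each element's own type
mixed_array = np.array(mixed_list)  # an array converts all elements to one type

print(type(mixed_list[0]))  # <class 'int'>
print(mixed_array.dtype)    # float64

# Arithmetic on an array is applied to every element
print(mixed_array * 2)      # [2. 5. 6.]

# Multiplying a list by an integer repeats the list instead
print(mixed_list * 2)       # [1, 2.5, 3, 1, 2.5, 3]
```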

A dictionary is a lookup table made up of key-value pairs: each key is a label that maps to a stored value.
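
A minimal sketch of creating a dictionary and looking up values by key follows. The variable name `fruit_counts` and its contents are made-up example data.

```python
# Hypothetical example data: fruit names (keys) mapped to counts (values)
fruit_counts = {'apple': 3, 'banana': 5}

print(type(fruit_counts))      # <class 'dict'>
print(fruit_counts['apple'])   # 3 (look up a value by its key)

# Assigning to a new key adds a new pair to the dictionary
fruit_counts['cherry'] = 12
print(len(fruit_counts))       # 3 (number of key-value pairs)
```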

## Indexing

## Loops and Functions