# <u> Tutorial 2: Python Basics </u>


<br> This series of tutorials is intended to teach the basics of Python for scientific programming. These tutorials were written by Sanjana Kulkarni, an intern in the High Throughput Analytics group during summer 2021. 

Python can be used as a calculator. The basic mathematical operations and their operators are:

1. `+`: addition
2. `-`: subtraction
3. `*`: multiplication
4. `/`: division
5. `**`: exponentiation
6. `%`: return the remainder of a division (a.k.a. modulo)
7. `==`: check whether two values are equal; returns `True` or `False`

These operations can be typed directly into a code cell. 

Comments can be written within code cells. These aren't code, but descriptions of what the code is doing, so they are not run in Python. To denote comments, use the <b>#</b> character before the comment.

In [8]:
# the remainder when 35 is divided by 6 is 5
35 % 6

5

In [9]:
# 8 is equivalent to 2^3
8 == 2**3

True
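As a quick sketch of the operators listed above, the cell below applies each one to the same pair of numbers. The values 17 and 5 and the variable names are arbitrary choices for illustration.

```python
# arbitrary example values, chosen only for illustration
x = 17
y = 5

total = x + y        # addition: 22
difference = x - y   # subtraction: 12
product = x * y      # multiplication: 85
quotient = x / y     # division always gives a float: 3.4
power = x ** 2       # exponentiation: 289
remainder = x % y    # modulo: 2
is_equal = (x == y)  # equality check: False

print(total, difference, product, quotient, power, remainder, is_equal)
```

Note that `/` returns a float even when both inputs are integers.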

There are several data types in Python, the most common of which are integer (int), float, and string (str). To determine the data type of an object, use the `type()` function. This is built into Python, so no packages need to be imported. 

In [10]:
# a sequence of characters in quotes is a string
type("hello, world")

str

In [11]:
# this is an integer
type(8)

int

In [12]:
# this is a floating point number (basically a decimal)
type(4.34)

float

Creating a variable in Python is very easy. To assign a value to a variable, simply use the `=` symbol. The numerical or string value is then stored under that name and can be reused instead of writing out the full value each time. 

In [13]:
# make a variable called a with the value 3
a = 3

# show that a * 3 is the same as 3 * 3
a * 3

9

By default, a Jupyter code cell displays the value of the last expression in the cell. If you don't want the value to be displayed, assign it to a new variable instead.

In [14]:
b = a * 3
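The `type()` function works on variables as well as on literal values. Here is a minimal sketch, using made-up variable names, showing that the type of a variable follows from the value assigned to it:

```python
# hypothetical variable names, chosen only for illustration
count = 3            # an integer
ratio = count / 2    # division produces a float (1.5)
label = "sample"     # a string

# print() shows the full class name, e.g. <class 'int'>
print(type(count))
print(type(ratio))
print(type(label))
```

When `type(count)` is the last line of a notebook cell, the cell shows the shorter form `int`; inside `print()` the same information appears as `<class 'int'>`.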

## Operation Assignments

In addition to the operations above, we can update an existing variable by combining an operation and an assignment in one step. The syntax is the mathematical operator followed by an equals sign. We have the variable `a`, which is equal to 3. We can change it as follows:

In [15]:
# make a = a * 4
a *= 4
a

12

Now, `a = 12`. We can do the same thing with the other arithmetic operations listed above (`+=`, `-=`, `/=`, `**=`, and `%=`). Try it for yourself!

The above code produces the same output as `a = a * 4`, but it is a bit cleaner with an operation assignment, which combines both steps.
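For reference, here is a small sketch that chains the other operation assignments on one variable. The starting value of 10 is arbitrary.

```python
value = 10    # arbitrary starting value

value += 5    # same as value = value + 5   -> 15
value -= 3    # same as value = value - 3   -> 12
value /= 4    # same as value = value / 4   -> 3.0 (division gives a float)
value **= 2   # same as value = value ** 2  -> 9.0
value %= 5    # same as value = value % 5   -> 4.0

print(value)
```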

## Type Coercion

Python allows objects of one type to be recast into another. This is called <b>type coercion</b> (also known as type conversion or casting). To do this, call the `int`, `str`, or `float` functions on an object. 

In [9]:
# converts the integer 4 to a string
str(4)

'4'

In [10]:
# converts the string 1285 to an integer
int("1285")

1285

In [11]:
# converts the string 1285 to a float
float("1285")

1285.0
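A few further cases are sketched below with arbitrary example values. They show that converting can change the value, not just the type, and that the type decides what `+` does.

```python
# int() on a float drops the decimal part rather than rounding
truncated = int(4.9)            # 4

# a string holding a decimal number must go through float() first
from_text = int(float("12.5"))  # 12

# + adds numbers but joins strings
numeric_sum = 4 + 5             # 9
joined_text = str(4) + str(5)   # '45'

print(truncated, from_text, numeric_sum, joined_text)
```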

It doesn't always work though. If you call the `int` or `float` functions on a string that does not represent a number, such as a word, Python will raise a `ValueError` because there is no numerical equivalent. 

In [12]:
# we cannot convert words to an integer
int("words!")

# the same will happen with float("words!")

ValueError: invalid literal for int() with base 10: 'words!'

## Print Statements

We can print out values easily in Python. Simply use the `print()` function with the objects to be printed inside the parentheses. Nearly every type of Python object can be passed to `print()`. In a notebook, the result looks much the same as evaluating the object on the last line of a cell, except that strings are printed without their quotation marks. 

Print statements are useful for showing messages, checking the value of a variable as it changes inside a loop, and reporting progress.

In [4]:
print(10)

10


In [5]:
print("This is a message!")

This is a message!
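`print()` also accepts several objects at once, separated by commas. The sketch below uses a made-up variable to show this, along with the optional `sep` argument that controls what is placed between the printed items.

```python
# hypothetical value, for illustration only
mean_length = 10.5

# several objects separated by commas are printed with a space between them
print("Mean length:", mean_length)

# sep changes the separator that print() puts between items
print("a", "b", "c", sep="-")

# the same message can be built by hand by converting the number to a string
message = "Mean length: " + str(mean_length)
print(message)
```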


## Naming Conventions

One thing to keep in mind about Python is that file names and labels are best WITHOUT spaces. On the command line and when slicing dictionaries and dataframes (discussed later), a name with spaces must be enclosed in quotation marks to keep the characters together. 

Variable names are conventionally written in <b>snake case</b> or <b>camel case</b>. In snake case, different words in a variable name are separated by underscores. In camel case, the second and subsequent words are capitalized to show where individual words start and end. Python's style guide (PEP 8) recommends snake case for variable and function names.

Variable names should be somewhat descriptive so that another user does not get lost in a bunch of confusing variable names like `a`, `b`, `c`, etc. I used these above because they were dummy variables, but if you want to store values like means, ranges, titles, etc., it is best to choose something that signals to users (including your future self re-reading old code) what the variable is storing. Here are some examples:

In [1]:
# good variable name -- snake case
# we can tell that the variable is storing the average length of some type of flower
mean_flower_length = 10

# bad variable name
length = 10

# also good variable name -- camel case
meanFlowerLength = 10

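There are some built-in Python names that might look like they make good variable names. Some common ones are

1. `len` = returns the length of a data structure
2. `str` = string type
3. `int` = integer type
4. `lambda` = keyword used to define functions without a name
5. `iter` = returns an iterator over a data structure

When you type these in a cell, the editor colors the text green, signaling a built-in name or keyword. DO NOT MAKE VARIABLES WITH THESE NAMES. Assigning a value to a built-in name such as `len` hides the original function for the rest of the session, and `lambda` is a reserved keyword, so using it as a variable name is a syntax error.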
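If the editor coloring is not available, one way to check a candidate name is to ask Python directly. This is a small sketch using the standard-library `keyword` and `builtins` modules; the candidate names are just examples.

```python
import builtins
import keyword

# reserved keywords can never be used as variable names
lambda_is_keyword = keyword.iskeyword("lambda")

# built-in names can be overwritten, but doing so hides the original
len_is_builtin = hasattr(builtins, "len")

# a descriptive name of our own is neither a keyword nor a built-in
custom_is_builtin = hasattr(builtins, "mean_flower_length")

print(lambda_is_keyword, len_is_builtin, custom_is_builtin)
```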

## Functions

A function is a named group of commands that can be run repeatedly without copying and pasting the same code. If you find yourself doing the same steps on different inputs, it is a perfect time to write a function. Functions in Python are defined with the following syntax:

In [1]:
def my_function(x):
    
    # do some steps
    y = x + 10
    
    z = y - 34
    
    # return the output
    return z

The definition begins with the `def` keyword, followed by the name of the function. Inside the parentheses are the <b>arguments</b>. A function can take anywhere from 0 to many arguments. It can also return nothing, a single value, a list/tuple of many things, a dataframe, a plot, or any other Python object. <b>All of the code of a function, including the return statement, must be indented.</b>
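To illustrate the range of possibilities, here is a minimal sketch with two made-up functions: one takes no arguments and returns nothing, and the other returns two values at once as a tuple.

```python
def greet():
    # no arguments and no return statement, so the function returns None
    print("hello")


def smallest_and_largest(values):
    # returning two values separated by a comma packs them into a tuple
    return min(values), max(values)


result = greet()                              # prints hello; result is None
low, high = smallest_and_largest([3, 9, 4])   # unpack the returned tuple

print(result, low, high)
```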

It is good to write <b>docstrings</b> for functions that you write. A docstring is the first statement in the body of a function and is set off by triple quotes. Docstrings provide information on how to use a function, similar to Python documentation. If your function has a docstring, then running `my_function?` in a Jupyter Notebook will show it. Outside of Jupyter, `help(my_function)` does the same job. 

In [3]:
my_function(4)

-20

In [5]:
def function_with_docstring(x, y):
    '''
    This function sums the two arguments and then returns modulo 2 of the sum. 
    
    Arguments:
        2 integers or floats
    Returns:
        sum modulo 2
    '''
    z = x + y
    
    return z % 2

In [6]:
# show the documentation
function_with_docstring?
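The `?` syntax is specific to Jupyter and IPython. As a sketch of how to reach the same text in plain Python, the docstring is stored on the function itself. The function below is a shortened stand-in written for this example.

```python
import inspect


def add_then_mod2(x, y):
    """
    Return the sum of the two arguments, modulo 2.
    """
    return (x + y) % 2


# the raw docstring, including its original indentation and line breaks
raw_doc = add_then_mod2.__doc__

# inspect.getdoc returns the same text with the indentation cleaned up
clean_doc = inspect.getdoc(add_then_mod2)

print(clean_doc)
```

Calling `help(add_then_mod2)` displays the same docstring together with the function signature.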

## If / Elif / Else Statements

These blocks of code run different code when different conditions are met. The syntax is 

```
if condition1:
    do thing1
    
elif condition2:
    do thing2
    
else:
    do thing3
```

Elif stands for <b><u>el</u></b>se <b><u>if</u></b>, meaning that you drop down to the elif block only if the previous `if` statement was not true. Some helpful hints:

1. You can have multiple elif statements in a block of code, and else statements don't need to be included if cases that don't meet the `if` or `elif` conditions should be ignored.
2. Multiple conditions can be included in a single line with AND or OR logic, using the `and` and `or` keywords, like so:

```
if condition1 and condition2:
    do thing1
elif condition3 or condition4:
    do thing2
```

3. If you follow an `if` statement with another `if` statement, they will be treated as independent conditions. Combining `elif` and `else` with `if` creates a single code block whose statements are dependent on each other.
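The hints above can be sketched as a runnable example. The function name, the labels, and the threshold of 100 are all made up for illustration.

```python
def describe_number(n):
    """Hypothetical helper: return a short label for the integer n."""
    if n < 0:
        label = "negative"
    elif n == 0:
        label = "zero"
    elif n % 2 == 0 and n > 100:
        # both conditions must be true for this branch to run
        label = "large even"
    else:
        label = "other positive"
    return label


print(describe_number(-3))
print(describe_number(102))
print(describe_number(7))

# Independent if statements: each one is checked, so both can run
n = 6
labels = []
if n % 2 == 0:
    labels.append("even")
if n % 3 == 0:
    labels.append("multiple of 3")

print(labels)
```

In the first part only one branch ever runs per call, while in the second part the number 6 passes both checks and collects both labels.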

Often, if/elif/else statements are combined with for loops to iterate through a list. 

In [21]:
for num in range(10):
    
    if num % 2 == 0:
        if num % 3 == 0:
            print(f'{num}: multiple of 6')
        else:
            print(f'{num}: even')
    else:
        print(f'{num} odd')

0: multiple of 6
1 odd
2: even
3 odd
4: even
5 odd
6: multiple of 6
7 odd
8: even
9 odd


Here, we also nested `if` statements within each other, so the inner block runs only when both conditions are met, one after another. You can do this by indenting an `if` statement inside another one so that it is checked only if the outer condition is true.

## Loops

Loops are blocks of code that get repeated. The two types of loops in Python are <b>for</b> and <b>while</b> loops. 

For loops iterate through a defined range or sequence, and while loops continue for as long as a certain condition is met. 

The `range` function in Python returns numbers from 0 to N-1. So `range(10)` gives numbers from 0 to 9, as printed in the for loop below.

In [1]:
# iterate through the numbers 0 to 9
for num in range(10):
    print(num)

0
1
2
3
4
5
6
7
8
9


During the loop iteration, the value of the variable `num` changes, and at the end of the execution, `num = 9`.

In [3]:
num

9
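The `range` function also accepts an optional start and step, and a for loop can run over a list directly. A brief sketch with arbitrary numbers and made-up list contents:

```python
# range(stop): starts at 0 and ends before stop
first_five = list(range(5))         # [0, 1, 2, 3, 4]

# range(start, stop, step): start at 2, go up in steps of 3, end before 10
every_third = list(range(2, 10, 3)) # [2, 5, 8]

# a for loop can also iterate over the items of a list
running_total = 0
for length in [4, 6, 10]:
    running_total += length

print(first_five, every_third, running_total)
```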

We typically run while loops with a counter or some other value that changes on each pass, so that the condition eventually becomes false and the loop ends. 

In [6]:
counter = 0

while counter < 10:
    
    # print a progress message
    print(f"This is iteration {counter}")
    
    # increase the counter variable by 1
    counter += 1

This is iteration 0
This is iteration 1
This is iteration 2
This is iteration 3
This is iteration 4
This is iteration 5
This is iteration 6
This is iteration 7
This is iteration 8
This is iteration 9


I used an <b>f-string</b> (a string with the letter `f` in front of the opening quote) to pass a variable into a string. Similar string formatting is found in many programming languages. We set off the variable using curly braces so that the message updates as the number changes.
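Here is a small sketch of this string formatting syntax, which Python calls an f-string. The variable names and values are made up. Anything inside the curly braces is evaluated, and an optional format such as `:.2f` controls how a number is shown.

```python
# hypothetical values, for illustration only
flower = "iris"
mean_length = 5.8433

# variables inside curly braces are replaced by their values;
# :.2f rounds the displayed number to two decimal places
message = f"The mean {flower} length is {mean_length:.2f} cm"

# expressions are allowed inside the braces as well
squared = f"{3} squared is {3 ** 2}"

print(message)
print(squared)
```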

## Packages

At this point, we have accessed functions in 2 open source packages, `numpy` and `pandas`. Packages are collections of functions that have been created for particular use cases. People write packages to make it easier to perform common functions.

Instead of each user writing code for routine analyses, other users have written these functions into a neat, efficient package. All other Python users can import the package and use the functions. 

This is the beauty of Python being an open source language. Anyone can write and contribute packages, and anyone can access and use them. For more information about how to create your own package, check out the Python <a href="https://packaging.python.org/tutorials/packaging-projects/" target="_blank">documentation</a>. 

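Before importing a package, it must be installed in your Python environment. Packages are installed through the command prompt using <a href="https://pypi.org/project/pip/" target="_blank">pip</a>, Python's package installer. Pip comes bundled with recent versions of Python and with Anaconda, so it normally does not need to be installed separately. If needed, pip itself can be upgraded with:

`python -m pip install --upgrade pip`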

After that, installing packages is straightforward:

`pip install package_name`

The installation will print out a very long log of the progress, and it often takes several seconds to a few minutes to install. If a package is not available, the solutions are usually package-specific, so searching online for solutions is the best way forward. 

Anaconda has many common packages pre-installed, so you don't have to manually install packages like numpy and pandas. If you try to import a package that hasn't yet been installed, Python raises a `ModuleNotFoundError` that names the missing package. If you use Python for a while, you will notice that packages get updated with bug fixes and new features. Upgrade existing packages with

`pip install --upgrade package_name`

Remove packages with 

`pip uninstall package_name`
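Before installing anything, it can be handy to check from within Python whether a package is already available. The sketch below is one possible approach using the standard-library `importlib` module. The helper name `is_installed` and the second package name are made up for this example.

```python
import importlib.util


def is_installed(package_name):
    """Hypothetical helper: True if package_name can be imported here."""
    return importlib.util.find_spec(package_name) is not None


# json ships with Python, so it should always be found
json_available = is_installed("json")

# a made-up name that should not exist in any environment
fake_available = is_installed("some_package_that_does_not_exist")

print(json_available, fake_available)
```

Keep in mind that the name used in an import statement is not always the same as the name used with pip.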

Some of these pip commands may encounter difficulties on Merck laptops because of the restrictions in place on downloadable and modifiable packages. 

## Import Statements

It is generally a good practice to keep all import statements in a separate cell at the top of a script or Jupyter Notebook. This way, you can check that everything you need is imported before running your analyses. 

You may have noticed that I imported the packages abbreviated as `np` for `numpy` and `pd` for `pandas`. These are standard abbreviations that nearly everyone uses, and although it is not necessary to use them, they keep the code shorter: `np.array` vs `numpy.array`.
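The same import syntax works for any package. To keep this sketch runnable without installing anything, it uses two modules from Python's standard library instead of `numpy` and `pandas`. The alias `stats` is my own choice, not a standard abbreviation.

```python
# import a whole module under its own name
import math

# import a module under a shorter alias (stats is a made-up alias)
import statistics as stats

# import a single function from a module
from math import sqrt

root = math.sqrt(16)              # access through the module name
average = stats.mean([1, 2, 3, 4])  # access through the alias
same_root = sqrt(16)              # access directly by function name

print(root, average, same_root)
```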

## Getting Help

Because Python is open source and is used by so many people, it is very easy to get help on performing different tasks. 

As mentioned above, <a href="https://stackoverflow.com/" target="_blank">Stack Overflow</a> is a great resource, where developers ask and answer questions. Nearly all common questions have been answered, so if you search how to do a particular task, even if it's as simple as "how to switch the rows and columns of a dataframe?", you will get many answers. 

(the answer to this one is to use the transpose method like so: `my_df.T`)

You will inevitably encounter errors while coding in Python. Python prints out reasonably good error messages that almost always identify the line of code that is causing the issue. Sometimes the true issue is in another line of code, but it only manifests when a different line is run. 

If, after debugging on your own, you can't determine the cause of a bug, searching the web with the exact error message is often helpful. Sometimes the results even suggest a cause that you may not have thought of. 