# <center>**Introduction to Python Workshop**</center>
###### <center>[Harvard Chan Bioinformatics Core](https://bioinformatics.sph.harvard.edu/)</center>
###### <center>2020-07-29</center>


---

# Pre-class preparation 
Welcome to the **Introduction to Python** workshop! Before we start, let's prepare a few things and get familiar with the platform (Google Colab) that we will use during this workshop. Colab provides a free cloud service based on the [Jupyter notebook](https://jupyter.org/) environment. It does not require users to install Python locally; everything runs in the cloud. 

Please follow the instructions below to set up for this workshop:
1. **Create your own copy of this Python notebook (which also serves as the workshop material)**: Click "File" at the top -> Click "Save a copy in Drive". This notebook will be automatically named "Copy of Intro_to_Python_in_class_version" and saved to your Google Drive. You can rename it if you want.
2. **Check where the material is located**: Click "File" again -> click "Locate in Drive". A new window should pop up, showing you its location in your Google Drive.
3. **Get familiar with Colab interface and terminology**: 
*   code cell
*   text cell
*   creating and deleting cell
*   adding comments
*   saving the document

4. **Run the code cell below**. You should see an output console under the code cell.
> ***remember, to 'run' means to hit the play button to the left of the cell or to hit ```'shift + enter'``` or ```'command(mac)/ctrl(pc) + enter'``` on your keyboard***


In [None]:
# Prepare for the lesson
_4_2 = 0.314
ans_final = "ATGAACGCATCGATATATATGTATGATAGCAAATACTATACGTAATCGATCAGT"
print("Done! You are all set for pre-class preparation!")

---

# Section I: Introduction to Python 

## What is Python?
Python is a powerful, open-source, general-purpose programming language with a wide variety of applications. Many [websites and apps](https://codeinstitute.net/blog/7-popular-software-programs-written-in-python/) that we are familiar with, including YouTube, Instagram, and Spotify, are built at least partly with Python. In the field of bioinformatics, Python is also widely used for scientific computing, data analysis, and pipeline development. 

Below we have listed some examples where Python-based tools are used:

| **Example use case** | **Tool** |
| :---: | :---: |
| Pipeline development | [bcbio](https://bcbio-nextgen.readthedocs.io/en/latest/), [snakemake](https://snakemake.readthedocs.io/en/stable/) |
| Image analysis | [CellProfiler](https://cellprofiler.org/) |
| Molecular visualization | [PyMOL](https://pymol.org/2/) |
| Machine learning | [scikit-learn](https://scikit-learn.org/stable/) |

> Note: Many bioinformatics tools are written in Python, and you may have encountered/used them without knowing how they work. This is okay, but if you want to customize them or modify something, you will need to understand the scripts.

Python was first released in 1991 and has been under continuous development ever since. The current major version is Python 3. Features of the Python programming language include:
- Easy-to-understand syntax
- Large number of built-in libraries for common tasks
- Succinct and readable format, friendly to both the programming amateur and the professional
- Widely used programming language, with good documentation and lots of tutorials

## What will you learn in this workshop?
To take this workshop, you don't need to have prior experience in Python (or any other programming language(s)). We will start with basic Python syntax, progress through various concepts and end with learning about writing conditional statements. The workshop is designed to set a foundation for advanced topics in Python, including data wrangling/visualization, pipeline development, and machine learning applications.

## Additional tips
- `#` at the beginning of the line denotes that the following statement is a comment or annotation for the code. Essentially, any line starting with a `#` won't be executed by Python. 
- Indentation is very important in Python syntax. It is not just a matter of style: Python uses indentation to decide which lines belong together, so it is required for the code to run. A side benefit is that it makes code easier to read. We will be talking more about this throughout the workshop.
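
As a minimal sketch of both tips (the variable names and values here are made up for illustration), the cell below mixes comments with an indented block:

```python
# This whole line is a comment, so Python skips it.
temperature = 39.2  # a comment can also follow code on the same line

# The indented lines belong to the 'if' and 'else' lines above them.
if temperature > 38:
    status = "fever"
else:
    status = "normal"

print(status)
```

Removing the indentation in front of `status = "fever"` would make Python stop with an `IndentationError`, which is how the language enforces the layout.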

---

# Section II: Getting Started

## Step 1: Import packages 
We can take advantage of pre-packaged code for many common functions in Python. But first, we need to tell Python to import it. This is a really common step for most Python code.

We'll import the "numpy" package under the nickname "np", and "matplotlib.pyplot" under the nickname "plt". <b> Below, add "as np" after "import numpy" to instruct Python to import the numpy package but nickname it np. </b> When you see "np" in our script, it refers to code from the numpy package.

The last line of the cell prints a message once the imports have run. Printed messages like this are a really nice way to check that your cell actually ran.
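
The same `import ... as ...` pattern works for any package. As a small sketch, the example below uses `statistics` from Python's standard library instead of numpy, and the nickname `st` is an arbitrary choice made for this illustration:

```python
import statistics as st  # 'st' is a nickname chosen just for this sketch

values = [1, 2, 3, 4]

# Because of the nickname, we write 'st.' instead of 'statistics.'
average = st.mean(values)
print(average)
```

Nicknames such as `np` and `plt` are widely used conventions, so following them makes your code easier for others to read.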

In [None]:
import numpy 
import matplotlib.pyplot as plt
print('Imported packages')

---

# Section III: Basic Python syntax

## Step 1: Variables
A "variable" is a temporary container that stores some information, and is given a name. We can use the `=` operator to assign some value to a variable name/variable. Variable naming in Python has to fulfill the following rules:
- must start with a letter or underscore character
- can contain only alpha-numeric characters or the underscore character (A-Z, a-z, 0-9, _)
- cannot be a reserved keyword in Python. A complete list can be found [here](https://docs.python.org/3.8/reference/lexical_analysis.html#keywords).
- is case-sensitive (e.g. `Year` and `year` are two different variable names)  

In [None]:
# Assign two variables: 2 to x, and 5 to y


The "x" and "y" variables are now stored in the current Python computing environment. 

We can use the `print()` function to print out the value of variables to the console. <br>
A function is a collection of reusable code that performs a particular task. Python has a set of built-in [functions](https://docs.python.org/3/library/functions.html). 
> In Python, functions are the workhorses and are written as follows with the open/close parentheses: `function_name()` 
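
To illustrate the `function_name()` pattern, here is a short sketch that calls a few built-in functions. The inputs are arbitrary example values:

```python
# The value inside the parentheses is the input (the "argument")
length = len("ATGAAC")         # number of characters in the string
magnitude = abs(-7)            # absolute value
rounded = round(3.14159, 2)    # round to 2 decimal places

# print() accepts several values separated by commas
print(length, magnitude, rounded)
```

Each function returns a result that can be stored in a variable, exactly like any other value.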

In [None]:
# Print out the value of variable 'x' or 'y'


## Step 2: Lists

A frequently used "data structure" in Python is called a `list`. A Python list is a collection of data stored within square brackets `[]`.  <br>
This will be something you use all the time in Python.

Lists have the following features:
- order of its elements matters
- can store mixed data types (for example, numbers and text in the same list)
- can even contain a sublist

> Note: There are other Python data structures, including `tuple`, `dict` (dictionary), and `set`. We are not going to cover those in this workshop, but they can be very useful in some situations. If you are interested in learning more about them, this [website](https://thomas-cokelaer.info/tutorials/python/data_structures.html) has good introductory information.
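
The three list features above can be sketched as follows. The list names and contents are made up for illustration:

```python
numbers = [1, 2, 3]
reordered = [3, 2, 1]

# Same elements, different order: Python treats these as different lists
same_lists = (numbers == reordered)

# A list can mix numbers and text
mixed = [7, 0.5, "seven"]

# A list can contain another list; the sublist counts as one element
nested = [1, 2, ["a", "b"]]
nested_length = len(nested)

print(same_lists, mixed, nested_length)
```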

In [None]:
# Create a list called 'my_list' that contains three numbers separated by commas


## Step 3: Strings

Another commonly used data type is the string. A string stores a sequence of characters, and is created by enclosing the characters in single quotation marks `''` or double quotation marks `""`.
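
As a small sketch with made-up values, the two kinds of quotation marks produce the same string, and using one kind lets you include the other inside the text:

```python
single = 'ATGC'
double = "ATGC"

# The quotes are not part of the string, so these two are equal
same_text = (single == double)

# Double quotes on the outside allow an apostrophe on the inside
phrase = "it's a string"
phrase_length = len(phrase)

print(same_text, phrase, phrase_length)
```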


In [None]:
# Generate a string variable called 'text' with the value 'hello world!'. 


In [None]:
# Use the print() function to print the contents of the variable 'text'.


## Step 4: Data types
Data comes in many different types.<br>
Whole numbers are `int` for "integer"<br>
Numbers with decimal places are `float`<br>
Strings are `str`<br>
Lists are `list`<br>

We can use the `type()` function to check what data type a given variable has.
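
For illustration, here is `type()` applied to one value of each kind listed above. The variable names are invented for this sketch:

```python
count = 7        # whole number
ratio = 7.0      # number with a decimal place
label = "7"      # text, because of the quotes
items = [7]      # list containing one number

print(type(count))
print(type(ratio))
print(type(label))
print(type(items))
```

Note that `7`, `7.0`, and `"7"` look similar but have three different data types, which affects what you can do with them.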

In [None]:
# Check the data type of your list by using type(my_list)


In [None]:
# Check the data type of your string variable


The last data type we will introduce here is called `bool`. This boolean data type can be either `True` or `False`. It is usually used to indicate whether an expression is true or false. We will cover this data type in the conditional statement section.
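
Comparison operators such as `==`, `<`, and `>` are a common source of boolean values. A minimal sketch with made-up variable names:

```python
is_equal = (3 == 3.0)           # '==' asks "are these equal?"
is_longer = len("ATG") > 5      # '>' asks "is the left side larger?"

print(is_equal, is_longer)
print(type(is_equal))
```

Note the difference between `=` (assign a value to a variable) and `==` (compare two values).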

In [None]:
# The code here generates a boolean variable called 'test', 
# which judges whether 10 is smaller than 8. 
test = 10 < 8


In [None]:
# print the value of 'test'



## Recap
In this short section, we introduced some very basic terms in Python. We learned **how to assign variables**, and what rules to follow. <br> 
We also described several important **data types** - `int`, `float`, `str`, `list`, and `bool`. 

| **Data type** | **Examples** |
| :---: | :---: |
| int (numeric) | 2 |
| float (numeric) | 3.5 |
| str | 'hello world!' |
| list | [2,3,6,8] |
| bool | True, False|



---

# Section IV: Doing Computations

We can perform mathematical calculation(s) on numbers, strings, variables, and lists. <br> 
For example, we can calculate the mean of the numbers in the list you just created by using the function ```np.mean()``` <br>
The name of the list is written in the parentheses : ```np.mean(my_list)```. 
> Note that the `mean()` function is part of the numpy package, which we imported under the nickname **'np'**, so it must be called with **'np.'** in front of it.
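
To see what a mean calculation does, the sketch below computes it two ways without numpy: by hand with the built-in `sum()` and `len()` functions, and with `statistics.mean()` from the standard library. The list `measurements` is made up for illustration; `np.mean(measurements)` is expected to give the same value.

```python
import statistics

measurements = [2, 4, 9]

# Mean = total of the values divided by how many values there are
by_hand = sum(measurements) / len(measurements)

# A library function does the same calculation in one call
with_library = statistics.mean(measurements)

print(by_hand, with_library)
```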

In [None]:
# Calculate the mean of the numbers in your list following the example above 


In [None]:
# Now assign the output to a variable called 'my_mean'


In [None]:
# Print the value of the 'my_mean' variable using the print() function


In [None]:
# Check the data type for 'my_mean'


---

# Section V: Accessing data from a list


Now that we know how to create a list, how do we access the data from it? 

We can do so by specifying the "index" number, which is the location of the data within the list. As in many other programming languages, **Python indexing starts from 0**.

The first element of a list called `my_list` is `my_list[0]`, that is, the element at index 0.<br>
Alternatively, we can use negative indices to count from the end of the list. The last element is `my_list[-1]`. 
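
A minimal sketch of indexing from both ends, using a made-up list of DNA bases:

```python
bases = ["A", "C", "G", "T"]

first = bases[0]              # index 0 is the first element
third = bases[2]              # index 2 is the third element
last = bases[-1]              # -1 counts from the end
second_to_last = bases[-2]

print(first, third, last, second_to_last)
```

Asking for an index that does not exist, such as `bases[4]` here, stops the code with an `IndexError`.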


In [None]:
# Create a list called 'animals', containing eight strings, each for a different animal.
# don't forget to put each string in quotes.


In [None]:
# Get the item at index 2 of your list 'animals' (remember that indexing starts at 0)


## Getting multiple elements from a list
Now, what if we want to access multiple elements in a list? 

Here we introduce the slicing `:` operator. The syntax of "slicing" is `[start:stop:step]`. *start* is the index of the first element in the slice. *stop* is the index of the element just **after** the end of the slice, so the element at *stop* itself is not included. *step* is the spacing between the selected elements.
> Note: You don't have to specify every part of the slice. When a part is omitted, Python uses a default value: **it starts from the first element, continues through the last element, and uses a step of 1**.
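
The slicing rules can be sketched with a made-up list of letters:

```python
letters = ["a", "b", "c", "d", "e", "f"]

middle = letters[1:4]       # indices 1, 2, 3 (index 4 is excluded)
first_two = letters[:2]     # start omitted: begins at index 0
from_third = letters[2:]    # stop omitted: runs through the last element
every_other = letters[::2]  # step of 2: indices 0, 2, 4

print(middle)
print(first_two)
print(from_third)
print(every_other)
```

A slice always returns a new list and leaves the original list unchanged.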


In [None]:
# For example, to get the first two items from the list 'animals' you would do:
animals[0:2]


In [None]:
# Then to get every third item from the list 'animals' you would do:
animals[::3]


In [None]:
# Get the fourth item from your list 'animals'


In [None]:
# Get every other item from the list 'animals' (every second item)


## Modifying a list
Next, we will learn several ways to modify elements in an existing list. 

* To change the value of an element, we can simply reassign the element to a new value. 
* To add new elements to a list, we can use the `+` operator to join it with another list, e.g. `my_list + [new_item]`, and assign the result back to the variable. 
* Lastly, to delete an existing element from a list, we can use the `del` statement, e.g. `del my_list[0]`.

> Note: `del` modifies the original list. Be mindful of this: if you run the same cell more than once, you will remove more values than you initially set out to.
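
The three kinds of modification can be sketched on a made-up list as follows:

```python
fruits = ["apple", "banana", "cherry"]

# 1. Change an element by reassigning its index
fruits[1] = "blueberry"

# 2. Add an element: '+' joins two lists, so the new item goes in its own list
fruits = fruits + ["date"]

# 3. Delete an element by index
del fruits[0]

print(fruits)
```

Because every step changes `fruits`, re-running this cell from the `del` line alone would remove a different element each time, which is the behavior the note above warns about.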

In [None]:
# Change the animal at index 2 of your list 'animals' to a different animal


In [None]:
# Add another animal, 'fly', to the end of your list 'animals'


In [None]:
# Delete one animal from your list 'animals'


## Recap
We have covered a lot of content in this section! 

We first introduced **what a list is** and **how to create a list in Python**. We then learned **how to access and manipulate one or more elements in a list**. Sometimes there are multiple ways to achieve the same goal. 

## Future learning
If you are interested in learning more about the basics of Python programming, we have listed a few additional resources below:
- [Python course on kaggle](https://www.kaggle.com/learn/python)
- [Python course on codecademy](https://www.codecademy.com/learn/learn-python)
- [Python course on software carpentry](https://swcarpentry.github.io/python-novice-inflammation/)
- [A Byte of Python](https://python.swaroopch.com/)
- [Python for Biologists](http://userpages.fu-berlin.de/digga/p4b.pdf)


---
*This lesson has been modified from the original by:*

**Authors**: Jihe Liu, Radhika Khetani

*members of the teaching team at the [Harvard Chan Bioinformatics Core (HBC)](http://bioinformatics.sph.harvard.edu/). These are open access materials distributed under the terms of the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.*