Contents covered:
- Python basics
- Plotting
- Reading FASTA files

# Python Basics

## 0. Before we get started...
A big part of becoming a good programmer is knowing when and how to look up documentation and help
on commands and expressions you may not have used before - so when you're unsure about something,
search online for help or, better yet, just try it! For example:
- What is the difference between an int and a float?
- What is the difference between = and == (that's one equals sign and a double-equals sign)?
- How would you go about reversing a string?

Many questions you can think of can best be answered with: "Try it and find out!" That is not to say you
shouldn't ask questions - by all means, do! - but if it's something you might be able to learn the answer to
by trying it, give it a go!
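As an illustration of "try it and find out," here is a minimal sketch that answers the three example questions above by experiment. The variable names are made up for this example, and slicing is only one of several ways to reverse a string.

```python
# int vs. float: whole numbers vs. numbers with a decimal point
whole = 7
decimal = 7.0
print(type(whole), type(decimal))

# = assigns a value to a name; == compares two values
answer = 42
is_match = (answer == 42)
print(is_match)

# one way to reverse a string: a slice with a step of -1
word = "Python"
reversed_word = word[::-1]
print(reversed_word)
```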

If you don't have a Python book, look up some websites with command references and bookmark them. You can refer to these when you're unsure of the syntax of a command or you forgot exactly how to write a for loop or something. Here are a few:
- [Official Python Documentation](https://docs.python.org/)
- [Codecademy course](https://www.codecademy.com/learn/learn-python-3)

## 1. Some quick basics
Before we get started writing a full program, we'll go over a few quick basics to make sure we're on the
same page. We'll cover these technical topics at a pretty bare-bones level, and you'll need to seek out
additional resources and practice to make sure you have a good handle on them.


We strongly recommend going through the first three tracks of the interactive Codecademy course on
Python (Python Syntax, Strings & Console Output, and Conditionals & Control Flow), especially if
you have little prior programming experience. You will learn many basic concepts of programming,
how they are implemented in Python, and you will get to do it as you learn it.

### Running commands

In [None]:
4+4

In [None]:
# a double asterisk is the operator that raises a number to a power
7**2

In [None]:
total = 3
total

In [None]:
total = total + 4
total

In [None]:
total += 2
total

In [None]:
total -= 5
total

### Strings and methods
You'll be working a lot with **strings**. As discussed in class, strings are a class of object that is
processed in a particular way. Very roughly, strings are sequences of text characters.

Classes can also contain methods, or functions associated with objects of that class, and strings have
many methods that will be very useful. Here's one example, and you should look for references on all
of the wonderful methods available for strings. We'll look at the **upper** method, which returns an upper-case copy of a string. None of
these methods will actually change the stored string. To change the stored string, you have to reassign
it, as below.

In [None]:
name = "Herman"
name

In [None]:
name.upper()

In [None]:
name

In [None]:
name = name.upper()
name
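A few other string methods behave the same way: each returns a new value and leaves the stored string alone. This is a small sketch with an arbitrarily chosen example string, not a complete tour of the string methods.

```python
gene = "Cyp12a5"

lowered = gene.lower()                # lower-case copy
swapped = gene.replace("Cyp", "CYP")  # copy with a substring replaced
starts = gene.startswith("Cyp")       # True or False
length = len(gene)                    # number of characters

print(lowered, swapped, starts, length)
print(gene)  # the stored string has not changed
```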

### Lists
Another useful type of object is the **list**. We will discuss lists in greater depth later, but this is a brief
introduction/review.

A list is an ordered collection of objects that are stored together. Lists can contain anything - strings,
integers, whatever. Two examples of lists:

In [None]:
["Groucho","Harpo","Chico","Gummo","Zeppo"]

In [None]:
[1,1,2,3,5,8,13]

Note that:
- Lists are written in square brackets
- Entries are separated by commas
- You can have repeated items

Lists can be indexed by position. Python indices begin at 0, not 1. Thus, if you saved the second
list above as a variable called `fib`:

In [None]:
fib = [1,1,2,3,5,8,13]
fib

In [None]:
fib[0]

In [None]:
fib[1]

In [None]:
fib[4]

In [None]:
fib[2:4]

In [None]:
5 in fib

In [None]:
6 in fib
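Unlike strings, lists can be changed in place. The sketch below uses a shortened, made-up version of the first list above to show appending, negative indexing, and replacing an entry.

```python
marx = ["Groucho", "Harpo", "Chico"]

marx.append("Zeppo")   # add an item to the end of the list
last = marx[-1]        # negative indices count from the end
count = len(marx)      # number of items in the list

marx[0] = "Gummo"      # replace the item at position 0
print(marx, last, count)
```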

### Dictionaries
Now that we've played around with lists, let's move on to dictionaries. As mentioned before,
dictionaries are pretty similar to lists, except that instead of using integers to access them, you use
"keys." Every entry in a dictionary consists of a "key" and a "value" - think of a real-life dictionary
with the "keys" being words and the "values" being their definitions.

In [None]:
comp = {'Cyp12a5':'Mitochondrion',
        'MRG15':'Nucleus',
        'Cop':'Golgi',
        'bor':'Cytoplasm',
        'Bx42':'Nucleus'}
comp

In [None]:
comp["bor"]

In [None]:
comp.keys()

In [None]:
comp.values()

In [None]:
for key, value in comp.items():
    print(key, "->", value)
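Dictionaries can also be extended and queried after they are created. This is a minimal sketch: the name `locations` and the key `"notagene"` are invented for the example so that `comp` above is left untouched.

```python
locations = {"bor": "Cytoplasm", "MRG15": "Nucleus"}

# add a new key/value pair
locations["Cop"] = "Golgi"

# check whether a key is present
has_bor = "bor" in locations

# .get() returns a fallback value instead of raising a KeyError
missing = locations.get("notagene", "unknown")

print(len(locations), has_bor, missing)
```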

### Control structures
This will be an overview of the primary types of control structures. These are for your own review - so
if you feel comfortable with these from class and/or other tutorials you've gone through, feel
completely free to skim quickly past them!

In [None]:
temp = 68

if temp > 80:
    print("Boy, it's hot!")
elif temp < 50:
    print("Brrr...it's cold!")
else:
    print("Nice and temperate!")

In [None]:
for i in range(3):
    print("Counted number", i)

In [None]:
drink = "water"
for i in range(3):
    print("Letter", i+1, "is", drink[i])

In [None]:
cowTypes = ["brown","white","mooing","corn on the cob"]
for cow in cowTypes:
    print("I see a", cow, "cow!")

In [None]:
professor = "Ian Holmes"
for letter in professor:
    print("Letter: ", letter)

Note that the indexing variable (e.g. above: `i`, `cow`, `letter`, etc.) can be whatever you want. So you
could also technically have written `for i in professor` or `for turnip in professor` in
the final example and used `i` or `turnip` as your variable instead if you so felt like it. Stylistically, it is
good practice to use `i`, `j`, `k`, etc. when indexing through numbers or positions (i.e., a range of some
sort) and an obvious name when indexing by item rather than by index position.
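To make the naming convention concrete, here is a short sketch that walks through the same string both ways. The list names `by_position` and `by_item` are just for illustration.

```python
drink = "water"

# indexing by position: i is the conventional name for a number
by_position = []
for i in range(len(drink)):
    by_position.append(drink[i])

# iterating by item: use a name that says what each item is
by_item = []
for letter in drink:
    by_item.append(letter)

print(by_position == by_item)
```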

In [None]:
bonks = 0
while bonks < 3:
    print("Bonk times",bonks,"!")
    bonks += 1

### Modules
While Python has decent native functionality, much of Python's power comes from external "modules."
Modules are simply Python scripts containing functions; these can be collectively imported into your
Python program so that you can use those functions in your program. Python comes with many
modules built-in and natively available for import; other modules can be downloaded separately and
loaded in.

A basic example of a built-in module is the `math` module. Python's built-in math capabilities are fairly
basic; should you want to, for instance, take the logarithm of a number or the sine of a number, you will
need the `math` module. Fortunately, this is rather simple (as noted by the floating guy in the comic
above); all you need to do is import the module and begin using the function that you want from that
module. Let's try a quick example.

Suppose you wanted to take the base-10 logarithm of a number.

In [None]:
# you will get an error as Python does not have this function
try:
    log10(100)
except NameError as e:
    print("See! An Error")
    print(e)

In [None]:
# However - let's import the math module!
# You will get no feedback, but it has been imported.
import math

In [None]:
# I will tell you a secret: the function log10() is a
# function included in the math module.
# So now try taking the base-10 logarithm the same way:
try:
    log10(100)
except NameError as e:
    print("See! An Error")
    print(e)

In [None]:
# Oops! Another error! How come?
# When you use a function that comes from a module, you must call it
# as a function of that module, as follows:
math.log10(100)

In [None]:
# How can you know what functions a module contains? Modules have documentation.
help(math)
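If you only need one or two functions, Python also lets you import them by name, after which they can be called without the `math.` prefix. A short sketch of both styles side by side:

```python
import math
from math import log10

# called as a function of the module
print(math.log10(100))

# called directly, because the name log10 itself was imported
print(log10(100))

# other functions in the module still need the prefix
print(math.sqrt(16))
```

Which style to use is largely a matter of taste; keeping the `math.` prefix makes it obvious where a function came from.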

### Functions
Some of you have figured out functions already. Good use of functions will be expected in future code
you write - but don't worry, you'll want to use them because they're so useful!

Functions are important to have a handle on. Fortunately, they're fundamentally not anything very new.
A function is just a sort of sub-program, a set of commands that can be invoked by calling that
function. Methods are an example of functions - each one has some underlying functionality that you
access by calling its name. Functions are used to reduce code duplication and increase the modularity
of your program. A function can take inputs (though it need not) and can be invoked repeatedly.

Functions can be designed to return something upon being invoked. For instance, a function can
manipulate numbers and return the result of the manipulation, or it can run a comparison and
return True or False. Output can in fact happen without "returning," though - a `print` call inside a function
will still print to the screen. Let's look at an example.


In [None]:
def square(num):
    return num*num

In [None]:
square(3)

In [None]:
square(10)
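The difference between returning a value and printing one is easiest to see side by side. In this sketch, `print_square` is a hypothetical companion to `square` that prints its result instead of returning it.

```python
def square(num):
    # hands the result back to the caller
    return num * num

def print_square(num):
    # shows the result on screen but returns nothing
    print(num * num)

result = square(4)         # result now holds the returned value
nothing = print_square(4)  # prints, but there is no value to store

print(result, nothing)
```

A function without a `return` statement gives back `None`, which is why a printed result cannot be reused in later calculations.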

# Plotting with pandas and seaborn

In [None]:
# When importing matplotlib.pyplot, the convention is to rename it "plt", as in:
import matplotlib.pyplot as plt

# To get IPython/Jupyter to display plots in-line, this magic command needs to go somewhere:
%matplotlib inline

In [None]:
import pandas as pd
import seaborn as sns

# loading an example dataset
penguins = sns.load_dataset("penguins")
penguins

In [None]:
# scatter plot
sns.scatterplot(data=penguins, x="bill_length_mm", y="flipper_length_mm", hue="species");

In [None]:
# box plot
sns.boxplot(data=penguins, x="species", y="body_mass_g");

In [None]:
# histogram
sns.histplot(data=penguins, x="flipper_length_mm", hue="species");

In [None]:
# stacked bar plot
penguins.groupby("island").sex.value_counts(normalize=True).unstack().plot.bar(stacked=True);

In [None]:
# another stacked bar plot
penguins.groupby("species").island.value_counts(normalize=True).unstack().plot.bar(stacked=True);

In [None]:
# Creating a pandas DataFrame from regular lists
data = [[1,2],[3,4],[5,6]]
index = ["index1", "index2", "index3"]
columns = ["col1", "col2"]
df = pd.DataFrame(data=data, index=index, columns=columns)
df

# Reading FASTA files with BioPython
FASTA is one of the most common plain-text formats for nucleotide or amino acid sequences.
More information [here](https://en.wikipedia.org/wiki/FASTA_format).

In [None]:
# Let's look at an example file
# cat command prints the contents of the file
!cat example.fa

In [None]:
# Installing BioPython
!conda install -y -q -c conda-forge biopython

In [None]:
# Reading a fasta file in Python
# reference: https://biopython.org/wiki/SeqIO

from Bio import SeqIO

for seq_rec in SeqIO.parse("example.fa", "fasta"):
    print(seq_rec.name)
    print(seq_rec.seq)
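To get a feel for what the parser is doing, here is a minimal plain-Python sketch that reads FASTA-formatted text without BioPython. The helper name `parse_fasta` and the two sample records are invented for illustration; it only handles the simple case of `>` header lines followed by sequence lines, and for real work `SeqIO.parse` remains the better choice.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # a new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            # sequence lines may be wrapped, so collect and join them
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# made-up sample data standing in for the contents of a FASTA file
example = """>seq1 made-up record
ACGTAC
GGTA
>seq2 another made-up record
TTGACA
"""

records = list(parse_fasta(example.splitlines()))
for name, seq in records:
    print(name, len(seq))
```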