# Bioinformatics Introduction to Coding

## Programming Basics 2

### Last lesson recap:

- operators
- variable assignment
- data types
- type casting

### Coming up this lesson:

- functions vs. methods
- guanine/cytosine (GC) percentage exercise
- complementary nucleotide exercise
- package management

## Functions
As a reminder, **functions** take input, go through a series of steps, and give a result. Lots of functions are already defined in Python, like the `print()` or `str()` functions you've already seen. You can also write your own functions if you find yourself doing the same thing over and over again. For example, let's say we've found a straight line that fits our data. We might want a function where we specify an X value and it gives us back the corresponding Y value on that line. Let's write it up.

In [None]:
# def is the keyword we use to DEFINE a new function we want to make
# we could name our function anything we want, but picking a descriptive name is good
# any arguments we want our function to accept are placed in parentheses after the name
# multiple arguments can be separated with commas, and they can also be named anything
# lastly, a colon : is used to end the line.
def linear_slope(x_value):
    # Python is a little funky and requires us to INDENT everything in a block of code after the :
    # our line's formula is y = m*x + b, where m = slope (0.4 here), x = position on the x axis, and b = y-intercept (2.2 here)
    y = 0.4 * x_value + 2.2
    # once we've calculated something, we get to specify what the function gives back using return
    return y

Just like when you assigned a value to a variable, running this cell should look like nothing happened. That's because you've only defined what the function does; you haven't told Python to use it yet. Running the code defined in a function is referred to as **calling** the function.

In [None]:
# Here, we call the function we just defined
# Try changing the value we pass into the function as an argument to see the return value changing
linear_slope(10)
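
The comments above mention that a function can accept several comma-separated arguments. As a sketch, here is a hypothetical variant (`line_y` is a name made up for this example) that also takes the slope and intercept as arguments, so one function works for any straight line:

```python
# hypothetical helper: same formula as linear_slope, but slope and intercept are arguments too
def line_y(x_value, slope, intercept):
    # y = m*x + b
    return slope * x_value + intercept

# same line as linear_slope: m = 0.4, b = 2.2
print(line_y(10, 0.4, 2.2))

# arguments can also be passed by name, in any order
print(line_y(x_value=10, intercept=2.2, slope=0.4))
```

Passing the slope and intercept in makes the function reusable for other fits, at the cost of typing a little more each time you call it.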

## Methods
Okay, so now you know what a function is and how to make one. Sometimes a function is attached to a specific **object**, like the strings stored in some of the variables you've defined (more on objects later). Once you're talking about a function that belongs to a specific thing in your code, it becomes...a **method**!

Methods are accessed by adding a dot `.` and the method name after the object.

Well-written methods are cool because they store instructions for procedures you're likely to do often on a specific data type. Check out some examples below.
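
Here's a small sketch contrasting the two on a short made-up sequence (`seq` is just an illustrative name). The last line shows that a method really is a function that belongs to the object's type, here `str`:

```python
seq = "acgt"

# len() is a general FUNCTION: the object goes inside the parentheses
print(len(seq))           # 4

# upper() is a METHOD of strings: it is accessed with a dot after the object
print(seq.upper())        # ACGT

# calling the method through its type and passing the object explicitly gives the same result
print(str.upper(seq))     # ACGT
```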

In [None]:
# first we're going to create a string which represents real biological data: DNA bases!
DNA="AATGTATACGACAGAGTCCGTGCACCTACCAAACCTCTTTAGTCTAAGTTCAGACTAGTTGGAAGTTTGTCTAGATCTCAGATTTGTCACTAGAGGACGAAaaaatggggaaaa"

# there is a general FUNCTION called len() that tells us how long strings are...
len(DNA)

In [None]:
# but there are also METHODS associated with every string that help us do useful things!

# for example, did you notice something odd about the data? hint: look at the end

# having such inconsistent capitalization could be a problem if we're searching through the data
# what we should do is standardize everything one way or the other
# let's look at what the str.lower() method does
print(DNA.lower())
# and then look at the data still stored in the variable
print(DNA)

In [None]:
# Why was DNA still mixed-case after we called the lower method?
# Because the method did not actually change the contents of the variable,
# it just took its contents as input, and returned the contents in lower case.
# but we didn't actually save the output to a variable.
# let's give it a shot now; take the info currently stored in DNA, make it lower case, save output to variable
DNA=DNA.lower()
print(DNA)
# we can go the other direction too if we want, which I like for aesthetic reasons
DNA=DNA.upper()
print(DNA)

In [None]:
# let's say we want to know how many adenines there are
# this would give a wrong result if some adenines were lower case, since we're matching 'A' but not 'a'
DNA.count('A')
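
A quick sketch of why the case matters, using a short made-up string:

```python
# a short made-up sequence with mixed case
mixed = "AAaa"

# count() is case-sensitive: 'A' and 'a' are different characters
print(mixed.count("A"))          # 2

# standardizing the case first counts every adenine
print(mixed.upper().count("A"))  # 4
```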

### Mutation

Some methods we will talk about later **do** actually change the contents of the object they are attached to when called. When this happens, the object is said to have **mutated**. Knowing which methods mutate their objects and which return a new object with the changes is kind of something you just have to know from experience. ¯\\_(ツ)_/¯
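
Strings are one type you never have to worry about here: in Python they can't be mutated, so string methods always hand back a new string. Lists, which haven't been introduced yet, are a common example of a type whose methods *do* mutate. A small sketch contrasting the two (the variable names are just for illustration):

```python
# string methods return a NEW string and leave the original alone
bases = "acgt"
shouted = bases.upper()
print(bases)      # acgt
print(shouted)    # ACGT

# list.append() changes the list itself (it mutates it)
base_list = ["a", "c"]
result = base_list.append("g")
print(base_list)  # ['a', 'c', 'g']

# append() returns None rather than a new list, a common beginner surprise
print(result)     # None
```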

## GC Exercise
Being able to do something as simple as counting types of bases has very practical implications. For example, if you're designing a DNA primer for PCR and the primer has too many G's and C's, the DNA strands bond too strongly and the reaction won't work at the temperatures you'd planned!

It's time for you all to use your programming knowledge to solve a problem: you're going to calculate the GC% of a new DNA sequence!
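
For reference, the GC% calculation can be written as a formula, where $N_G$ and $N_C$ are the counts of G and C bases and $N_{\text{total}}$ is the sequence length (symbols chosen here just for illustration). Multiplying by 100 turns the fraction into a percentage:

$$
\text{GC}\% = \frac{N_G + N_C}{N_{\text{total}}} \times 100
$$

For a tiny made-up sequence `GGCAT`, there are 2 G's, 1 C, and 5 bases in total:

$$
\frac{2 + 1}{5} \times 100 = 60\%
$$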

I've written most of the code you'll need below, but some pieces are missing. Think of this kind of like a Mad Lib: the framework is already provided for you, but there are some blanks you need to fill in.

In [None]:
# here's the new data
DNA2 = "GCGCAtatcTCGCATAATAACccCTGAATATATCGGCATTTGATgttACCCAGGTTGAGTTAGTGTTGAGCT"

# it's gonna be hard to count things in different cases; finish the line of code with a method to standardize case
DNA2 = DNA2.


# first it's good to plan for what values you'll need
# to calculate GC% you'll need the number of G bases plus C bases, divided by the total number of bases, times 100 to turn the fraction into a percentage

# put in the method to count your guanine bases here
G_count = DNA2

# next, do cytosine
C_count = DNA2

# the last number will be the total number of bases, i.e. the sequence LENGTH, which you know how to get
total_count =

# now put it all together; use context from other lines of code to figure out your variable name here
= (G_count + C_count) / total_count * 100

print("GC% is " + str(GC_percent))

## Reverse Complement Exercise
DNA is really useful biological data, but depending on the context, we might want to work with related sequences instead, like the complementary DNA strand, RNA, or proteins. For this exercise, you're going to take a string of DNA and start turning it into its reverse complement: the sequence of the opposite strand, read in reverse.

In [None]:
# the str.replace() method is going to do the heavy lifting here
# first, let's look at how the method works
temp = "test string"
temp.replace("t", "b")

Pretty straightforward, right? The first argument is the character(s) we want to replace, and the second argument is what we want to substitute in their place.

In [None]:
# some methods or functions have optional arguments after the mandatory ones
# here, an optional third argument sets the maximum number of replacements, counting from the left
temp.replace("t", "b", 1)
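
Here are both calls side by side, plus a check that `replace()` returns a new string and leaves `temp` itself alone:

```python
temp = "test string"

# replace every "t" with "b"
print(temp.replace("t", "b"))     # besb sbring

# the optional third argument caps the number of replacements, counting from the left
print(temp.replace("t", "b", 1))  # best string

# temp is unchanged because we never saved the output back to it
print(temp)                       # test string
```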

So...you know how the string replace method works. We haven't gone over how to flip a string around yet (stay tuned next lesson), but we can at least find the bases that complement those in our current DNA data.

In [None]:
# base complement time 
# first, which DNA bases pair with which?
# A -> T ; T -> A ; C -> G ; G -> C

# let's try doing a replacement here
DNA = DNA.replace("A", "T")
print(DNA)


Since we're swapping only one character at a time, we can't distinguish between which T's we just swapped and which ones still need to be swapped. There are lots of different ways to solve problems like this in your programs, but for now let's do some creative thinking and use the tools you already know.
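
To see the problem concretely, here is a sketch that chains two replacements on a short made-up sequence:

```python
seq = "AATT"

# naive approach: swap A -> T, then T -> A
step1 = seq.replace("A", "T")    # "TTTT": the new T's look just like the original ones
step2 = step1.replace("T", "A")  # "AAAA": every T gets swapped back
print(step2)

# the correct complement of AATT would be TTAA
```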

In [None]:
# first things first: we wrote over our data in the previous code cell, which is a problem
# go back and re-run the cell where we first defined DNA so you've got a DNA string with all 4 bases in it
############
# next up, use the string lower method so the whole string is lower case

# remember that string methods are case-sensitive, so if you say count('a') they won't count any A's
# using this, we can deliberately use case to keep track of which complementary nucleotides we've already swapped in
# I'll do the first one for you
DNA = DNA.replace('a','T')
# do the rest of the bases and then print out your string at the end



Now you might be thinking, "This is dumb. Writing code is dumb. Someone has to have written code to do this already." Well, you're right. But you're here to learn to code, and sometimes that means reinventing the wheel because building wheels is good for you. However, most of the time you'll want to take advantage of code other people have already written, which leads us to our next topic.
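
For the curious, here is one way the "someone already wrote it" route can look using only built-in string methods; third-party packages such as Biopython also include ready-made reverse-complement tools. This sketch assumes an uppercase DNA string (characters not listed in the table, like lowercase letters, pass through unchanged), and it previews string slicing from next lesson:

```python
dna = "AATGCCGT"

# str.maketrans builds a lookup table mapping each base to its partner;
# translate() then swaps every character in ONE pass, so bases that were
# just swapped in are never swapped back
complement_table = str.maketrans("ATCG", "TAGC")
complement = dna.translate(complement_table)
print(complement)          # TTACGGCA

# slicing with a step of -1 reverses a string
reverse_complement = complement[::-1]
print(reverse_complement)  # ACGGCATT
```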

## Extending Basic Python
Have you stopped to wonder how your computer's Python interpreter already knows the keywords and functions and stuff you've used? You can think of Python as your local library building. It doesn't have every book ever written because that would take up too much space and managing all that information costs time and money...but it does have commonly used books like how-to manuals or your kid's favorite picture book. If you need something really specific, you can even request a book and find it elsewhere.

You can think of collections of Python code as the books in our analogy. Well-written books focus on a topic, and code usually focuses on completing certain tasks. Lots of code already comes with Python itself (its **standard library**), but sometimes you need to request a book from the wider library system. These chunks of Python code, or **modules**, are shared through online repositories so users like you can use them. Lots of modules put together make a **package**. Lots of packages put together are a **library**. When looking for stuff online, you might see some of these terms mixed up, but don't worry: it's mostly just a question of scale. At the end of the day, they're all just Python books you want.

Let's go through some examples and wrap up this lesson.

In [None]:
# meet the keyword import
# random is the module name; unsurprisingly, it deals with generating (pseudo)random numbers, and it ships with Python's standard library
import random

In [None]:
# importing the module means Python now recognizes that the word "random" means something
# in fact, "random" is now an object in your NAMESPACE, containing all of the functions, classes, and other definitions in the module.

# random.random() returns a float from 0 up to (but not including) 1, very useful
print(random.random())
# there are a bunch of other options, like drawing from a uniform distribution between two numbers you specify
print(random.uniform(0,10))

# if you run this code multiple times, you can see how the output varies non-deterministically
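
Because these numbers are *pseudo*random, you can make them repeatable by seeding the generator first, which is handy when someone needs to reproduce your analysis. A minimal sketch with `random.seed()` (the seed value 42 is arbitrary):

```python
import random

# seeding the generator makes the "random" sequence repeatable
random.seed(42)
first_run = [random.random(), random.uniform(0, 10)]

# re-seeding with the same value restarts the exact same sequence
random.seed(42)
second_run = [random.random(), random.uniform(0, 10)]

print(first_run == second_run)  # True: same seed, same numbers
```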

In [None]:
# numpy is a very common Python package for working with numbers; pronounced num-pie
import numpy
print(numpy.sin(2))
# you can even change the names of things you import to make them more convenient
import numpy as np
print(np.sin(2))

In [None]:
# last, you can import specific functions from a package to make things easy on yourself
# very useful if you're wanting to do one particular thing a lot
from numpy import sin
print(sin(2))

# no matter how we did it, we still calculated the sine of 2 (radians)
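
To convince yourself that all three import styles give you the same thing, here is a sketch using the standard-library `math` module instead of numpy (so it runs without installing anything); the idea is the same for any module:

```python
import math
import math as m
from math import sin

# all three names point to the very same function object
print(math.sin is m.sin)  # True
print(sin is math.sin)    # True

# angles are in radians, just like numpy.sin
print(round(sin(2), 4))   # 0.9093
```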

There's more than one way to skin a cat, as they say, so use whatever works for you when you import packages for your research! If you find a package online that you don't have installed yet, you can often use the **conda** package manager to get what you need (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html)
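
Before installing something, it can help to check whether Python can already find it. Here is a small sketch using the standard-library `importlib.util.find_spec()` function; `is_installed` is a hypothetical helper name made up for this example, and it only checks top-level module names:

```python
import importlib.util

def is_installed(package_name):
    # find_spec returns None when Python cannot locate a module with that name
    return importlib.util.find_spec(package_name) is not None

print(is_installed("random"))  # True: part of the standard library
print(is_installed("numpy"))   # True only if numpy is installed in this environment
```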

## Homework (again)
So you've done a couple of exercises that have hopefully taught you some basic coding principles, but I imagine not everything has been smooth sailing. Please leave some feedback here at the end of the notebook before you send it back to me with everything completed!