# Intro to Python Part 2

This tutorial will cover the following topics:

* text strings
* control statements
* functions
* list comprehension
* imports

## Text Strings

Text strings behave much like lists in Python. They are sequences of characters, so they obey many of the same rules (indexing, slicing, len), and they have some additional special functions. Unlike lists, strings cannot be modified in place. To create a text string, make a variable where the assigned value is some combination of characters within quotes (single or double quotes both work fine **with some caveats**).

In [None]:
my_string='Python is fun, fun, fun!'
print(my_string)

In [None]:
my_string2 = "Python is fun, fun, fun!"
print(my_string2)

In [None]:
my_string == my_string2

What if we want to include quotes in our string text? We need to either use the other quote type or escape the quotes with a backslash:

In [None]:
my_string3 = 'Andrew said "Python is fun, fun, fun!"'
print(my_string3)
my_string4 = "Andrew said \"Python is fun, fun, fun!\""
print(my_string4)
my_string3 == my_string4

In [None]:
#try the list tricks (len, indexing, slicing, and + all work)
print(len(my_string))
print(type(my_string))
print(my_string[5:15])
print(my_string[-5:])
print(my_string+' (of course)')

There are a couple of reserved string characters that do something special when preceded by the escape character ('\'). \t inserts a tab, while \n starts a new line:

In [None]:
print('Hello \t world')
print('Hello \nworld')

One of the more powerful string functions is split. It divides a string at every place a separator substring appears. String functions are called by adding them after the variable name, separated by a period. The result is a list of strings:

In [None]:
#We could split on spaces:
print(my_string.split(' '))
#or on commas
print(my_string.split(','))

Find searches for a substring in a string. The index of the first matching instance is reported:

In [None]:
print(my_string.find('fun'))

You can force find to look further on in the string by giving it a starting point:

In [None]:
print(my_string.find('fun',12))
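
One detail worth knowing is that find returns -1, rather than raising an error, when the substring isn't present. Below is a minimal sketch of that behavior; demo_string is just a copy of the example string so the cell runs on its own.

```python
demo_string = 'Python is fun, fun, fun!'

# index of the first 'fun'
first_fun = demo_string.find('fun')
# start searching just past the first match to reach the second one
second_fun = demo_string.find('fun', first_fun + 1)
# this substring is not in the string at all
missing = demo_string.find('boring')

print(first_fun)
print(second_fun)
print(missing)
```

Checking for -1 before using the result is a good habit, since -1 is also a valid index (the last character) and will not cause an error on its own.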

Startswith and endswith are great functions to check for prefixes and extensions on file names:

In [None]:
print(my_string.startswith('Python'))
print(my_string.startswith('python')) #see if it's case sensitive
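
Endswith works the same way from the other end of the string. Here is a short sketch using a made-up file name (sample_01.csv is hypothetical):

```python
file_name = 'sample_01.csv'  # hypothetical file name for illustration

is_csv = file_name.endswith('.csv')
is_txt = file_name.endswith('.txt')
# endswith also accepts a tuple, which checks several extensions at once
is_table = file_name.endswith(('.csv', '.tsv'))

print(is_csv)
print(is_txt)
print(is_table)
```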

There are lots of other special string functions. Upper and lower convert case. Strip gets rid of leading and trailing whitespace. Isnumeric tells whether every character in the string is numeric.
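
A quick sketch of these functions in action (the raw_name value is made up for illustration):

```python
raw_name = '  Sample_42.TXT  '  # hypothetical messy string

# strip removes the spaces at both ends
clean_name = raw_name.strip()
upper_name = clean_name.upper()
lower_name = clean_name.lower()

print(clean_name)
print(upper_name)
print(lower_name)

# isnumeric is only True when every character is numeric
print('42'.isnumeric())
print('4.2'.isnumeric())  # the decimal point is not a numeric character
```

Note that none of these functions change the original string. Each one returns a new string, which is why the results are saved to new variables.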

Note that data types can be applied as functions to convert values to and from different data types.  This is important for finding numeric values in strings (again useful for file names).

In [None]:
print(str(11))
print(type(str(11)))
print(int('11'))
print(type(int('11')))
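
As a sketch of the file name use case, the cell below pulls a year out of a made-up file name by combining split with int. The file name and its name_year.extension layout are assumptions for illustration only.

```python
file_name = 'expression_2023.csv'  # hypothetical file name

# drop the extension, then take the piece after the underscore
stem = file_name.split('.')[0]
year_text = stem.split('_')[1]

# year_text is still a string, so convert it before doing math
year = int(year_text)

print(year_text)
print(type(year_text))
print(year + 1)
```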

One somewhat annoying thing is shown below. You can print strs and ints individually, but you cannot join the two different data types with + in a single print statement (the last line raises a TypeError).

In [None]:
print('The year is: ')
print(2023)
print('The year is' + 2023) #TypeError: a str can only be joined to another str

To get around this, we can use the str() function from above to convert 2023 from an int into a string.

In [None]:
print('The year is: ' + str(2023))

## Exercise 0

How many sentences are in the string below? Hint: is there anything we can split the string on to separate the sentences?

In [None]:
test_string = 'Hello. My name is Inigo Montoya. You killed my father. Prepare to die'

## Control Statements

Every programming language has control statements.  In Python, the most common are "for x in y", "if/elif/else", and "while".  The for statement is typically used to iterate through some sort of data structure, most commonly lists. It will go through the elements of the list one at a time:

In [None]:
my_list=[1,2,3,4,5]
for i in range(len(my_list)):
    my_list[i]+=2
print(my_list)

Note that range is a special Python function that counts from zero up to its value minus 1.  That makes it perfect for iterating through lists.  You can optionally give it the starting point as well:

In [None]:
my_list=[1,2,3,4,5]
for i in range(2,len(my_list)):
    my_list[i]+=2
print(my_list)

In [None]:
#list(range) lets you see what the range function is providing
list(range(2,len(my_list)))

You can also iterate through a list without indexes, but changing the loop variable doesn't change the original list (the variable is just a temporary name for each value):

In [None]:
for val in my_list:
    val+=2
    print(val)
print(my_list)

Another helpful application of for-loops is to iterate through a dictionary. This can be accomplished a couple of different ways. The first is to iterate through the items (key-value pairs) that make up the dictionary:

In [None]:
teacher_dict = {'Andrew':'CompBio', 'Jay':'BigDataAI','Jonathon':'BigDataAI',"Chris":'Micro',"Sean":'Micro'}
for xi in teacher_dict.items():
    print('Person: ' + xi[0])
    print('Department: ' + xi[1])
    print('----')
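
Each item is a (key, value) pair, so a common variation is to unpack the pair into two named variables directly in the for statement. This is a sketch using a shortened copy of the dictionary; the names dept_dict, person, and department are just illustrative choices.

```python
dept_dict = {'Andrew': 'CompBio', 'Jay': 'BigDataAI', 'Chris': 'Micro'}

people = []
# each item is split into its key and its value as the loop runs
for person, department in dept_dict.items():
    print('Person: ' + person)
    print('Department: ' + department)
    print('----')
    people.append(person)
```

This does the same thing as indexing with [0] and [1], but the variable names make it clearer which piece is which.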

Another similar method is to iterate through the keys that make up the dictionary. Remember from last week that running dict[key] will return the value for that key found in the dictionary. This works because keys are unique!

In [None]:
for xi in teacher_dict.keys():
    print('Person: ' + xi)
    print('Department: ' + teacher_dict[xi])
    print('----')

If/elif/else statements in Python are relatively straightforward. Python checks the condition provided within each if/elif statement and, if it is true, runs the code below it. You can have as many elif statements as your heart desires. But it is important to know that Python checks the if/elif/else statements sequentially and stops after it finds a true condition. So if multiple conditions are true, only the first matching block of code will run. This is seen in the second chunk of code: 5 is larger than both 2 and 3, but Python checks the if statement first and, since it's true, stops there.

In [None]:
my_list = [3.5,3,4]
if(my_list[0]>2 and my_list[0]<3):
    print('greater than 2, less than 3')
elif(my_list[0]>3):
    print('greater than 3')
else:
    print('2 or less, or exactly 3')

In [None]:
my_list = [5,2.5]
if(my_list[0]>2):
    print('greater than 2')
elif(my_list[0]>3):
    print('greater than 3')
else:
    print('less than 3')
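
Because only the first true condition runs, the order of the checks matters. One way to handle overlapping conditions, sketched below with an illustrative variable called value, is to put the most restrictive check first:

```python
value = 5

# check the stricter condition first so it is not hidden by the looser one
if value > 3:
    label = 'greater than 3'
elif value > 2:
    label = 'greater than 2 (but not greater than 3)'
else:
    label = '2 or less'

print(label)
```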

While loops are straightforward as well. The block repeats for as long as the condition is true:

In [None]:
i=0
while i < len(my_list):
    print(my_list[i])
    i+=1
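
While loops are most useful when you don't know in advance how many repetitions you will need. As a small sketch (the starting value of 100 is arbitrary), this cell counts how many times a number must be halved before it is no longer greater than 1:

```python
value = 100
steps = 0

# keep halving until value is no longer greater than 1
while value > 1:
    value = value / 2
    steps += 1

print(steps)
print(value)
```

Make sure something inside the loop eventually makes the condition false. Otherwise the loop will run forever.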

## Exercise 1

Write code that iterates through the below list and prints whether the number is positive or negative:

In [None]:
sign_list = [-6,4,3,-3,-1]


## Functions

Functions form the basis for every modern programming language.  Writing good functions is the key to going from a novice programmer to one who can put together complex workflows.  If a software package is a house, functions are the bricks, and well-written functions can be layered together to build at a much larger scale.

I think it's helpful to view coding functions just like a function in math. They share the same fundamental ideas:  
1: take in input(s)    
2: do something with the input(s)    
3: return an output  

For example, the math function y = x + 2 becomes:

In [None]:
def plus2(input_value):
    print('My input is:' + str(input_value))
    add2 = input_value + 2
    return add2

In [None]:
plus2(5)

An important note is that these inputs become local variables inside the function (as does any other variable created within the function). You can do anything to the input variable within the function that you would normally do to a variable. However, these local variables are not accessible outside of the function. In the above function, we were able to use input_value as a variable, but if we try to print input_value outside of the function, that variable no longer exists. If we care about those variables, we must return them.

In [None]:
print(input_value) #raises a NameError: input_value only exists inside plus2

Now that our function is defined, we can run it as often as we want and save the output as a new variable, which can then be used for whatever else we would like to do.

In [None]:
plus2_result = plus2(input_value = 100)
print(plus2_result)
print(plus2_result / 2)

Functions can also take multiple inputs. By default these inputs are assigned based on the order you provide the inputs.

In [None]:
def my_function(val1,val2,val3):
    print('Input1:' + str(val1))
    print('Input2:' + str(val2))
    print('Input3:' + str(val3))
    return val1+val2+val3

In [None]:
my_function(1,2,3)

You can also manually set the values for specific function inputs, regardless of the order provided:

In [None]:
test1 = my_function(val2=100,val1=200,val3=300)

If you are interested in returning multiple results/variables, making a dictionary of results to return could be a good idea:

In [None]:
def my_function_dict(val1,val2,val3):
    my_dict = {}
    my_dict['input1'] = val1
    my_dict['input2'] = val2
    my_dict['input3'] = val3
    my_dict['result'] = val1 + val2 + val3
    return my_dict

In [None]:
res_dict = my_function_dict(5,10,2)
print(res_dict)
print(res_dict['input2'])
print(res_dict['result'])
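
A dictionary is one option. Another common Python pattern, shown here only as an alternative sketch, is to return several values separated by commas and unpack them when calling the function. The function name sum_and_mean is made up for this example.

```python
def sum_and_mean(val1, val2, val3):
    # returns two results at once; Python packs them into a tuple
    total = val1 + val2 + val3
    mean = total / 3
    return total, mean

# unpack the two returned values into two variables
total_result, mean_result = sum_and_mean(5, 10, 3)
print(total_result)
print(mean_result)
```

The dictionary approach has the advantage that each result is labeled, which helps when a function returns more than two or three things.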

Function variables can be given default values to make life easier on the function user.  Here we find the nth substring in a string but default the value to the 2nd.  Note that default value function arguments need to come after all of the arguments without default values.

In [None]:
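def findnthsubstring(mystr,substr,n=2):
    #it's a good idea to describe what your function does with comments
    #this function finds the nth instance of substr in mystr (-1 if there is none)
    pos=-1 #start at -1 so the first search begins at index 0
    i=0
    while(i<n):
        pos=mystr.find(substr,pos+1)
        if pos==-1:
            break #fewer than n instances, so stop searching
        i+=1
    return pos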

In [None]:
#run the function like normal
findnthsubstring(my_string,'fun',1)

In [None]:
#if we omit the last variable, the default value is used
findnthsubstring(my_string,'fun')

If functions are bricks in a house we are building, it's nice to be able to save them in a separate file and import them into our code.  Python can import any text file with extension .py in the same folder as your notebook.  Let's save our findnthsubstring function to a file called myimport.py.

On the command line you can quickly create a new .py file and just copy the function over into it. This gets a little more complex in a Jupyter notebook session, but I think doing so introduces a couple of helpful new things:  

1: To create a multi-line string, use three quotes in a row on both ends of your string, as seen below. 

2: Writing to a file:  
    - Three common modes for opening a file: a (append), r (read), w (write)   
    - write: method that writes a string to the file (writelines does the same for a list of strings)  
    - Make sure the file gets closed! Opening it in a with block closes it automatically when the block ends.  

In [None]:
function_str = """
def findnthsubstring(mystr,substr,n=2):
    #it's a good idea to describe what your function does with comments
    #this function finds the nth instance of substr in mystr (-1 if there is none)
    pos=-1 #start at -1 so the first search begins at index 0
    i=0
    while(i<n):
        pos=mystr.find(substr,pos+1)
        if pos==-1:
            break #fewer than n instances, so stop searching
        i+=1
    return pos
"""

#the with block closes the file for us, so no f.close() is needed
with open("myimport.py","w") as f:
    f.write(function_str)
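
To see the a, r, and w modes side by side, here is a small sketch that writes a file, appends to it, and reads it back. The file name example_notes.txt is hypothetical and is created in the current folder purely for this demonstration.

```python
# 'example_notes.txt' is a made-up file name used only for this demo

# w: create the file, or overwrite it if it already exists
with open('example_notes.txt', 'w') as f:
    f.write('first line\n')

# a: add to the end of the existing file
with open('example_notes.txt', 'a') as f:
    f.write('second line\n')

# r: read the contents back as one string
with open('example_notes.txt', 'r') as f:
    contents = f.read()

print(contents)
```

Be careful with w mode on files you care about, since it erases whatever was there before.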

## Exercise 2

Write a function that for a given input, returns whether it starts with the letter 'A' or not.

## List Comprehension

List comprehension allows us to generate a new list based on values of a previous list. It combines the list data structure from last week, for loops, and if statements into a single helpful format.

Let's say we have a list of genes and want to pull out all the mitochondrial genes (with the "mt-" pattern). If you were to achieve this without list comprehension it would look like this:

In [None]:
#make a list of genes
gene_list = ['mt-gene1','SOX4','HOX4','mt-gene2','Sept7']

#make an empty output list to append our mt genes to
mt_gene_list = []

for gene in gene_list:
    if "mt-" in gene:
        mt_gene_list.append(gene)
print(mt_gene_list)

Unfortunately, this requires a good number of nested structures to keep track of. List comprehension can help us simplify this greatly:


In [None]:
#List Comprehension
mt_gene_list = [gene for gene in gene_list if "mt-" in gene]
print(mt_gene_list)

We can even apply additional functions or methods if we wanted to do something like converting the mt genes to uppercase:

In [None]:
mt_upper_gene_list = [gene.upper() for gene in gene_list if "mt-" in gene]
print(mt_upper_gene_list)

So now we see how using list comprehension can help make our code simpler and shorter. How are list comprehensions constructed?

The syntax follows this pattern:

newlist = [**expression** for **item** in **iterable** if **condition**]  

Let's break this down:  
expression: gene.upper()  
item in iterable: gene in gene_list  
condition: if "mt-" in gene  

For each item (gene) in our iterable (gene_list), check the condition (see if it contains 'mt-'). If so, run the expression (gene.upper()) and place the result in the newlist (mt_upper_gene_list).
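
The same pattern works for numbers. As a sketch with a made-up list, the cell below builds the squares of the even numbers twice, once with a list comprehension and once with the equivalent for loop, to show that the two produce the same result.

```python
numbers = [1, 2, 3, 4, 5, 6]

# expression: n**2, item: n, iterable: numbers, condition: n is even
# (n % 2 is the remainder after dividing by 2, which is 0 for even numbers)
even_squares = [n**2 for n in numbers if n % 2 == 0]

# the same logic written as a regular loop
even_squares_loop = []
for n in numbers:
    if n % 2 == 0:
        even_squares_loop.append(n**2)

print(even_squares)
print(even_squares == even_squares_loop)
```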

## Exercise 3

Using list comprehension with the num_list provided below and the plus2 function from earlier, generate a list that adds 2 to any value in num_list greater than or equal to 6.  

In [None]:
num_list = [0,1,2,6,7,8]
num_list_p2 = None ##replace None with your code here
print(num_list_p2)

## Imports

In [None]:
import myimport #note that you can create text files and save them as .py from jupyter

To save typing and keep your code tidy, you can also choose specific functions to import from a package instead of referring to everything through the package name. Python still loads the whole file behind the scenes, so this doesn't really change anything here, but it can make code that uses giant machine learning packages noticeably easier to read.

In [None]:
from myimport import findnthsubstring

In [None]:
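#now call it from our import, either through the module name...
myimport.findnthsubstring(my_string,'fun')
#...or directly, thanks to the "from myimport import" line
findnthsubstring(my_string,'fun')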
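
The same two import styles work for packages that ship with Python. Here is a sketch using the built-in math module, chosen only because it needs no installation:

```python
# style 1: import the whole module and call functions through its name
import math
root_a = math.sqrt(16)

# style 2: import one function and call it directly
from math import sqrt
root_b = sqrt(16)

print(root_a)
print(root_b)
```

Both styles call exactly the same function. The difference is only in how you refer to it in your code.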

As we move into more advanced python concepts, we will use imports to bring in coding tools like numpy, matplotlib, and pandas.