# Introduction to Python I

## Why python?

There are numerous programming languages to choose from, many of them specialized for specific types of tasks. Python, like several others, is a general-purpose, high-level programming language that can be used for a variety of applications. There are a few reasons, however, why you might choose to work in python:

### Code readability and maintenance
Python syntax and formatting emphasize code readability. In many cases, python code reads closer to English sentences than to quirky syntax. This makes it a useful first language to learn, and it also helps others quickly interpret what is actually happening in your code. Readable code is also easier to maintain, update, and share.

### Compatibility
Python supports many operating systems natively, meaning your python code can be readily ported across systems with little to no effort.

### Robust built-in libraries
Python has a HUGE developer community supporting it.  This means that many functions, methods, and utilities ALREADY exist and you can stand on the shoulders of those who have already spent a decent amount of time figuring out how to do something in python.

### Large number of statistics and bioinformatics modules
Aside from the built-in modules, there is also a large community of bioinformatics, general biology, and statistics developers who have contributed packages for python. This includes a number of large frameworks including BioPython that help ease entry and streamline more complex workflows.

## Using Python

### The python shell
The python shell, similar to the bash shell, allows us to use python in an interactive manner. You enter one command at a time and the result is immediately returned. The python shell can be called directly from the terminal with the command `python`:

```
$ python
```

This is a very convenient tool for testing python commands and for getting started with the language, but it is not very useful for creating actual programs or scripts to run. To make an actual program, you will need to put your code in a text file and save it with a `.py` extension (more on that later today!).

### Python scripts
To create a python script, we can click on File > New File in the top menu above. This will open a text editor in which we are going to create our very first python program. ["Hello World" demonstration]
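
As a minimal sketch of what such a script might contain, the file below prints a message when it is run from the terminal with `python hello.py`. The file name `hello.py` and the function name `greeting` are hypothetical choices made for illustration, and functions themselves are not covered until later.

```python
# hello.py (hypothetical file name used for illustration)

def greeting():
    """Return the message this script prints."""
    return "Hello World"


# This block only runs when the file is executed as a script,
# not when it is imported from another python file.
if __name__ == "__main__":
    print(greeting())
```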

### IPython Notebooks
Another way in which we can use python is the way we are doing so here: in interactive IPython Notebooks. (Brief re-intro to navigating notebooks as needed.)

In [1]:
print("Hello World")

Hello World


## The Basics
If you ever get lost or need more information about how a function or object works in python, you can use the `help()` function. 

In [1]:
help()


Welcome to Python 3.8's help utility!

If this is your first time using Python, you should definitely check out
the tutorial on the Internet at https://docs.python.org/3.8/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules.  To quit this help utility and
return to the interpreter, just type "quit".

To get a list of available modules, keywords, symbols, or topics, type
"modules", "keywords", "symbols", or "topics".  Each module also comes
with a one-line summary of what it does; to list the modules whose name
or summary contain a given string such as "spam", type "modules spam".



In [1]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



In [2]:
help(open)

Help on built-in function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise OSError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position

## Variables

### Data Types
Let's review a few of the data types that were detailed in the prereq materials. 

When you store data in variables, you need to be aware of _how_ that data is being stored. This is referred to as a data type. Different data types can do different things and can interact with different 'operators' (e.g. `+`, `-`, `*`, `&`, etc.) in distinct ways. For example, the `+` operator will sum the values of two variables that contain numbers:

In [8]:
a = 2
b = 3
print(a+b)

5


but it will concatenate two variables that contain strings:

In [9]:
a = '2'
b = '3'
print(a+b)

23


#### 'Built-in' data types available by default in python:

* Text Type: `str`
* Numeric Types: `int`, `float`
* Sequence Types: `list`, `tuple`, `range`
* Mapping Type: `dict`
* Set Types: `set`
* Boolean Type: `bool`


You can always check what 'type' a variable is by using the `type()` function.

In [11]:
x = '5'
type(x)

str

Python automatically sets the data type when you create a variable, based on the value you assign to it.

In [None]:
x = "This is a string" # str
x = 10         # int
x = 10.0       # float
x = ['pizza','apple','hotdog']  # list
x = ('pizza','apple','hotdog')  # tuple
x = {'pizza','apple','hotdog'}  # set
x = range(10)  # range
x = {"name": "Loyal", 
      "age": 42, 
     "department" : "neuroscience"
}  # dict
x = True       # bool

In [15]:
x = 5
type(x)

type(str(x))

str

You can also explicitly set the data type that you want by using standard functions. This is known as 'casting': forcing or changing a variable to be a specific type.

In [14]:
x = float(4)
x

4.0
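
The short sketch below illustrates casting between `str`, `int`, and `float`. The variable names are made up for illustration, and the `try`/`except` construct, which has not been introduced in this notebook, is used only to show that an invalid cast raises a `ValueError`.

```python
# Cast between basic types with the built-in int(), float(), and str() functions.
count_text = "42"           # a number stored as text (str)
count = int(count_text)     # str -> int
ratio = float("0.25")       # str -> float
label = str(count)          # int -> str

print(type(count), type(ratio), type(label))

# Casting fails with a ValueError when the text is not a valid number.
try:
    int("forty-two")
    cast_failed = False
except ValueError:
    cast_failed = True

print(cast_failed)
```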

### Number types

`int` (integer) numbers are whole numbers (positive or negative) without decimals.  There is no limit to the size of an integer in python.

In [None]:
a = 3
b = 29398752398757573292375982737575
c = -42

The integer type should be used for numbers that will always be whole for their operations. Think of counting the number of bases in a DNA sequence, or counting the number of times something happens. You will almost never have a 'partial' quantity for these values. But be careful about how `int` types are handled when you do certain types of operations.

#### Exercise
Create three different integer variables and calculate the mean (designated here as `mu`). What data type is `mu` after this calculation? Why?

In [17]:
a = 5
b = 10
c = 13

#Calculate the mean of a,b,c
mu = (a+b+c)/3

print(mu)

9.333333333333334
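
The result above is a `float` because the `/` operator always performs true division in python 3. As a small sketch (variable names chosen for illustration), the cell below contrasts `/` with floor division `//` and the modulus `%`, which keep integer operands as integers.

```python
total = 5 + 10 + 13

mean = total / 3          # true division always returns a float
whole = total // 3        # floor division returns an int for int operands
remainder = total % 3     # modulus returns what is left over

print(mean, type(mean))
print(whole, type(whole))
print(remainder)

even = 10 / 5             # still a float, even though the result is whole
print(even, type(even))
```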


The `float` type is for 'floating point numbers'. These can be positive or negative and can contain one or more decimal places. A `float` can be specified by including a decimal point in a number when it is assigned to a variable:

In [None]:
a = 5.0
b = 1.467283
c = -14.22

A `float` can also be written in scientific notation by adding an `e` to indicate the power of 10:

In [3]:
x = 2e4
print(x)

y = -63.5e100
print(y)

20000.0
-6.35e+101


You can **cast** from one type to another with the `int()` and `float()` functions.

#### Exercise
What happens to a float when it's cast to an integer? To a string?

In [5]:
a = 1.1415926
b = 453.8452

print(int(a))
print(str(b))

print(str(b)+str(a))

1
453.8452
453.84521.1415926
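
Note that `int()` simply drops the decimal portion rather than rounding. The sketch below, using illustrative variable names, compares `int()` with the built-in `round()` function.

```python
a = 453.8452

truncated = int(a)        # drops everything after the decimal point
rounded = round(a)        # rounds to the nearest whole number
two_places = round(a, 2)  # rounds to two decimal places

print(truncated, rounded, two_places)

# int() truncates toward zero, so negative values move up, not down.
negative = int(-2.9)
print(negative)
```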


### Strings
Strings or string literals are generally denoted by enclosing them in quotes; either single (`'`) or double (`"`) quotes:

In [6]:
print("Welcome to 'Neurogenomics'!")

Welcome to 'Neurogenomics'!


You can assign a string to a variable in the same way.

In [7]:
a = "Welcome to Neurogenomics!"
print(a)

Welcome to Neurogenomics!


Sometimes strings will span multiple lines. If this is the case, by convention they are enclosed in triple quotes:

In [25]:
b = """Welcome to BCMB!
You've chosen the most exciting graduate program at JHU."""

print(b)

Welcome to BCMB!
You've chosen the most exciting graduate program at JHU.


### Strings, substrings, and slicing
Strings in python are stored as sequences of characters, which basically means that "Hello" is stored in something akin to a list: `["H","e","l","l","o"]`. So we can access different parts of a string (or any other sequence) by 'slicing' with square brackets. Unlike lists, strings are immutable, so slicing returns a new string rather than editing the original. It's important to recognize when doing this that python is a *'0-indexed'* language, which means that any time you count positions in python, the _first_ element is at index 0. Let's see how this works in practice:

In [26]:
# Create a string literal and store in a variable
a = "Genomics is fun"

In [27]:
# To retrieve the second character in this string we will slice as so
print(a[1])

# To retrieve a range of positions, we will separate the start and end using ':'
print(a[2:6])

# You can use negative values to start the slice from the end of the string
print(a[-6:-1])

# You can leave either side of the ':' blank to represent the beginning or end of a string respectively
print(a[:8])

print(a[9:])

# You can also use a 'step' indicator to skip over elements in the string (notice the second colon)
print(a[0::2])

# And you can also step backwards from the end of the string using a negative step indicator (e.g. reverse)
print(a[::-1])

e
nomi
is fu
Genomics
is fun
Gnmc sfn
nuf si scimoneG


_**Note: Slicing is a fundamental concept in python for accessing any set of elements in an array (not just strings). You will use this often.**_
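
To illustrate that the same slicing syntax carries over to other sequences, here is a small sketch that slices both a string and a list. The DNA sequence is a made-up example.

```python
seq = "ATGGCCTTTAAG"            # hypothetical DNA sequence

first_codon = seq[0:3]          # characters at index 0, 1, 2
last_codon = seq[-3:]           # the final three characters
reversed_seq = seq[::-1]        # negative step walks backwards

# Slicing works the same way on a list.
genes = ['Gapdh', 'Mef2c', 'Pax6', 'Cxcl1', 'Msi1']
middle = genes[1:4]             # index 1 up to, but not including, 4
every_other = genes[::2]        # every second item, starting at index 0

print(first_codon, last_codon, reversed_seq)
print(middle)
print(every_other)
```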

Strings have several 'built-in' functions (methods) that are available for some common queries and manipulations:

In [29]:
# Get the length of the string
print(len(a))

# Convert the string to lower case
print(a.lower())

# Convert the string to upper case
print(a.upper())

# You can strip off excess white space
b = "My favorite gene is Pantr2.   "
print(b.strip())

# You can replace portions of the string
print(a.replace("fun","hard"))

# or split a string into substrings based on a 'separator'
print(a.split(" "))

print(a)
a = a.replace("fun","hard")
print(a)


15
genomics is fun
GENOMICS IS FUN
My favorite gene is Pantr2.
Genomics is hard
['Genomics', 'is', 'fun']
Genomics is fun
Genomics is hard
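
As a further sketch of how string methods can be combined, the cell below computes the GC content of a made-up sequence using `upper()`, `count()`, `startswith()`, and `find()`. Because strings are immutable, each method returns a new value instead of changing the original, which is why the result of `upper()` is assigned back to `seq`.

```python
seq = "atgcGGCTAagc"   # hypothetical sequence with mixed case

seq = seq.upper()                      # strings are immutable, so reassign the result
gc_count = seq.count("G") + seq.count("C")
gc_percent = 100 * gc_count / len(seq)

print(seq)
print(gc_count, gc_percent)
print(seq.startswith("ATG"))           # does the sequence begin with ATG?
print(seq.find("GGC"))                 # index of the first match, or -1 if absent
```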


You can also test for instances of a substring within a string. (Note the use of a new syntax, _in_, which we will explore in more detail later.)

In [31]:
a = "She sells seashells by the seashore."
x = "sea" in a
print(x)

y = "sho" in a
print(y)

z = "sho" not in a
print(z)
print(type(z))

True
True
False
<class 'bool'>


You can combine strings (concatenate) using the `+` operator as we discussed above:

In [32]:
a = "Toad"
b = "the"
c = "wet"
d = "Sprocket"

print(a + b + c + d)

ToadthewetSprocket


Whoops...we may want to format this a bit better to add a separator. Fortunately, python provides a convenient way to format strings called '_f-strings_'. Simply prefix a new string with `f`, place the variables inside it, and enclose each one in curly braces `{}`.

In [33]:
text = f"The best band ever is {a} {b} {c} {d}!!"
print(text)

The best band ever is Toad the wet Sprocket!!


This is a _very_ useful tool for formatting output strings that contain useful pieces of information from your code/scripts.

In [34]:
gc = 56
name = 'Pantr2'
chromosome = 'chr4'

summary = f"The {name} gene is located on chromosome '{chromosome}' and has a GC content of {gc}%"

print(summary)

The Pantr2 gene is located on chromosome 'chr4' and has a GC content of 56%
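
f-strings also accept an optional format specifier after a colon inside the braces. The sketch below uses made-up values to show two common specifiers: `.1f` for a fixed number of decimal places and `,` for thousands separators.

```python
name = 'Pantr2'
gc = 56.28571        # hypothetical value for illustration
length = 1234567     # hypothetical value for illustration

# :.1f keeps one decimal place; :, adds thousands separators.
summary = f"{name}: GC content {gc:.1f}%, length {length:,} bp"
print(summary)

# Expressions can also be evaluated inside the braces.
doubled = f"{length * 2}"
print(doubled)
```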


There are a number of methods for manipulating/searching/testing strings that are built in to python. Feel free to [check them out](https://docs.python.org/3/library/stdtypes.html#string-methods) and test them on your own.

## Boolean type
The Boolean type refers to logical tests, and ultimately, `type: bool` can only have two values: `True` or `False` (case sensitive). There are often times when you need to test a value or an expression. In python the value returned from these tests is a `bool`:

In [35]:
print(14 > 3)
print(14 == 3)
print(14 < 3)

True
False
False


We _very_ often use `bool` values and variables to help control the *flow of our code*. For example, we could print a message based on whether or not a condition is `True`.

In [36]:
a = 50
b = 100

if a < b:
    print("a is the smaller value")
else:
    print("b is the smaller value")

a is the smaller value


#### Exercise
Create variables containing your first name, middle initial, last name, _age_, and one favorite thing. Construct and print an output string that creates a short sentence describing you that contains your full name, _calculated birth year_, and something you like!

## Collection Data types
Collection data types store groups of `items`. Items can be named variables or objects of other data types, including other collections (nested). There are four main collection data types, each with its own properties:

1. A *List* is an _ordered_ collection and is _mutable_. It can also hold duplicate items.
2. A *Tuple* is an _ordered_ collection and is _immutable_. It also allows for duplicate items.
3. A *Set* is an _unordered_, _unindexed_, and _mutable_ collection. It does _not_ allow for duplicate items.
4. A *Dictionary* is a collection of key:value pairs which is _mutable_ and _indexed_ by key. It does not allow for duplicate keys. (Since python 3.7, dictionaries also preserve the order in which items were inserted.)
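
The properties listed above can be sketched with the same three values stored in each collection type. The variable names and values here are made up for illustration.

```python
items = ['Pax6', 'Sox2', 'Pax6']     # list: ordered, mutable, duplicates kept
record = ('Pax6', 'Sox2', 'Pax6')    # tuple: ordered, immutable, duplicates kept
unique = {'Pax6', 'Sox2', 'Pax6'}    # set: duplicates are dropped
counts = {'Pax6': 2, 'Sox2': 1}      # dict: values are reached by a unique key

print(len(items), len(record), len(unique), len(counts))

items[0] = 'Gapdh'                   # allowed, because lists are mutable
print(items)
print(counts['Pax6'])                # look up a value by its key
```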


### Lists
A list is instantiated using square brackets `[]`.

In [38]:
genes = ['Gapdh','Mef2c','Pax6','Cxcl1','Msi1']

genes

['Gapdh', 'Mef2c', 'Pax6', 'Cxcl1', 'Msi1']

You can access list items by referring to the index number (remember that python is zero-indexed).

In [39]:
print(genes[1])

print(genes[-2]) # negative indexing to select items from the end of the list. (-1 refers to the last item)

print(genes[1:3]) # you select a range using ":" (returns a new list)

Mef2c
Cxcl1
['Mef2c', 'Pax6']


How could you determine the length of the `genes` list?

In [40]:
# Return the length of list `genes`
len(genes)

5

Since `list` items are mutable, you can change any specific item by referring to its index number.

In [41]:
genes[2] = 'Sox10'
print(genes)

['Gapdh', 'Mef2c', 'Sox10', 'Cxcl1', 'Msi1']


`list` collections (like all collection items) are _iterable_, meaning you can loop through elements.

In [42]:
for x in genes:
    print(x.upper())

GAPDH
MEF2C
SOX10
CXCL1
MSI1


To add items to a list (at the end) you can use the `append()` method.

In [43]:
genes.append('Foxp1')

print(genes)

['Gapdh', 'Mef2c', 'Sox10', 'Cxcl1', 'Msi1', 'Foxp1']


Conversely, you can remove items using several methods:

In [44]:
genes.remove('Mef2c') # removes a 'specific' item
print(genes)

genes.pop() # removes a specified index position or the last item in the list if index is not specified.
print(genes)

genes.sort() # sorts the list in place
print(genes)

['Gapdh', 'Sox10', 'Cxcl1', 'Msi1', 'Foxp1']
['Gapdh', 'Sox10', 'Cxcl1', 'Msi1']
['Cxcl1', 'Gapdh', 'Msi1', 'Sox10']
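
A few related variations are sketched below, under the assumption that you want to remove items by position or sort without changing the original list: `pop()` with an index, the `del` statement, and the built-in `sorted()` function.

```python
genes = ['Gapdh', 'Sox10', 'Cxcl1', 'Msi1', 'Foxp1']

removed = genes.pop(1)      # removes and returns the item at index 1
print(removed)
print(genes)

del genes[0]                # deletes by index without returning the item
print(genes)

ordered = sorted(genes)     # returns a new sorted list; genes is unchanged
print(ordered)
print(genes)
```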


We can clear the contents of the entire list by using the `clear()` method

In [45]:
genes.clear() # empties the entire list
print(genes)

[]


To join two lists, you can use the `+` operator

In [46]:
fruit = ['apple', 'banana','pear']
veg = ['carrot','celery','potato']

food = fruit + veg

print(food)

['apple', 'banana', 'pear', 'carrot', 'celery', 'potato']


### Tuples

Tuples operate very similarly to lists, but once instantiated, the items in a tuple cannot be changed. Tuples are created with round brackets `()`. This is a useful data type to hold values associated with a single 'record', for example if you wanted to record specific information about a single gene like its name, chromosome, and start position:

In [47]:
a = ('Sox2','chr4',1589182)
b = ('Xist','chrX',23564335)

You access individual elements of a tuple in the same way as a list

In [48]:
print(a[0])

print(b[1])

Sox2
chrX


You can also loop through a tuple since it is iterable.

In [49]:
for val in a:
    print(val)

Sox2
chr4
1589182


Once you create a tuple, you cannot change the values, and you cannot add items to it.
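
The sketch below shows what immutability looks like in practice: assigning to an element raises a `TypeError`, while 'unpacking' a tuple into separate variables and building a new tuple are both allowed. The `try`/`except` construct is used here only to catch the error for illustration.

```python
a = ('Sox2', 'chr4', 1589182)

# Unpacking assigns each element of the tuple to its own variable.
name, chromosome, start = a
print(name, chromosome, start)

# Attempting to change an element raises a TypeError.
try:
    a[1] = 'chr3'
    changed = True
except TypeError:
    changed = False
print(changed)

# To "modify" a tuple, build a new one instead.
b = a + ('+',)
print(b)
```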

### Dictionaries
Dictionaries are 'indexed' collections, meaning the _values_ within the collection each must have a unique identifying _key_. You can create a dictionary using curly braces `{}`.

In [50]:
myGene = {
    'name': 'Sox2',
    'entrezID': 6657,
    'Ensembl': 'ENSG00000181449',
    'chromosome': 'chr3',
    'start': 181711925,
    'end': 181714436,
    'strand': '-'
}
print(myGene)

{'name': 'Sox2', 'entrezID': 6657, 'Ensembl': 'ENSG00000181449', 'chromosome': 'chr3', 'start': 181711925, 'end': 181714436, 'strand': '-'}


To access elements of a dictionary, you do so in a manner similar to other collection items (`[]`), but you must specify a 'key' instead of a positional index to return the value.

In [54]:
print(myGene['chromosome'])

myGene['chr3']

chr3


KeyError: 'chr3'
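
Asking for a key that does not exist, as in `myGene['chr3']` above, raises a `KeyError` because `'chr3'` is a value, not a key. When you are not sure whether a key is present, one option is the dictionary's `get()` method, sketched below with a shortened dictionary; the `'biotype'` key is a made-up example of a missing key.

```python
myGene = {'name': 'Sox2', 'chromosome': 'chr3', 'strand': '-'}

chrom = myGene.get('chromosome')                     # key exists: returns its value
biotype = myGene.get('biotype')                      # key missing: returns None
biotype_default = myGene.get('biotype', 'unknown')   # key missing: returns the fallback

print(chrom)
print(biotype)
print(biotype_default)
```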

Dictionaries **are** mutable, so you can change/assign values in the same way.

In [52]:
myGene['strand'] = '+'

print(myGene['strand'])

+


When you loop through a dictionary, the values returned are the keys.

In [55]:
for key in myGene:
    print(key)

name
entrezID
Ensembl
chromosome
start
end
strand


You can also iterate over the values, or key:value pairs

In [59]:
myGene.items()

dict_items([('name', 'Sox2'), ('entrezID', 6657), ('Ensembl', 'ENSG00000181449'), ('chromosome', 'chr3'), ('start', 181711925), ('end', 181714436), ('strand', '+')])

In [58]:
for val in myGene.values():
    print(val)

Sox2
6657
ENSG00000181449
chr3
181711925
181714436
+


In [60]:
for k, v in myGene.items():
    print(f'key: {k}  value:{v}')

key: name  value:Sox2
key: entrezID  value:6657
key: Ensembl  value:ENSG00000181449
key: chromosome  value:chr3
key: start  value:181711925
key: end  value:181714436
key: strand  value:+


Sometimes it may be useful to check if a key exists in a dictionary.

In [62]:
lookup = "Ensembl"
if lookup in myGene:
    print(f"Found {lookup} in myGene dictionary keys.")

#### Exercise
1. Given the dictionary below, write a few lines of python code to return the keys in (sorted) alphanumeric order.

2. Do the same but for the values in the dictionary instead of the keys.


In [77]:
myDict = {'sydf': 124,
          'javd': 8927,
          'sfoj': 258,
          'ihes': 753,
          'agsj': 682,
          'bhds': 257
         }

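keys = list(myDict.keys())

keys.sort() # sort() sorts the list in place and returns None, so print the list afterwards
print(keys)


['agsj', 'bhds', 'ihes', 'javd', 'sfoj', 'sydf']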
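
One possible approach to this exercise, offered as a sketch rather than the only solution, uses the built-in `sorted()` function, which returns a new sorted list. This differs from the list method `sort()`, which sorts in place and returns `None`.

```python
myDict = {'sydf': 124, 'javd': 8927, 'sfoj': 258,
          'ihes': 753, 'agsj': 682, 'bhds': 257}

sorted_keys = sorted(myDict)               # iterating a dict yields its keys
sorted_values = sorted(myDict.values())

print(sorted_keys)
print(sorted_values)

# By contrast, sort() changes the list itself and returns None.
keys = list(myDict)
result = keys.sort()
print(result)
print(keys)
```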


## Control Flow
An important aspect of programming is controlling the order in which a program executes commands, functions, and operations. This is how programs and scripts make decisions and do different things in different situations. The structure of most control flow elements in python is fairly similar: a statement evaluates certain conditions and ends with a colon (`:`). The subsequent *code block* sits below this statement and is always indented. The block ends when the indentation ends. There are three main types of control flow statements in python: `if`, `for`, and `while`. Each operates a bit differently.

### If...Else
The `if` statement first evaluates whether a given expression is `True`. If so, the associated code block is executed. We have seen a few examples above, but let's make sure we understand how it's organized.

In [78]:
a = 5
if a < 10:
    print(f'{a} is less than 10.')
    
if a >= 10:
    print(f'{a} is greater than or equal to 10.')

5 is less than 10.


Here we've constructed two `if` statements to test the variable `a`. Notice that the code block under the first statement (which evaluates to `True`) is executed but the block under the second (`False`) is not. We can also use the `else` and `elif` (read: 'else if') statements to further refine how python responds to our conditional test.

In [8]:
a = 50
if a < 10:
    print(f'{a} is less than 10.')
elif a == 10:
    print(f'{a} is equal to 10.')
else:
    print(f'{a} is greater than 10.')

50 is greater than 10.


There are a few things to note here. First, with the `elif` statement we are providing _another_ conditional test for the variable `a`. If the first `if` returns `False`, then the next `elif` in the program is tested. If all of the `if` and `elif` statements return `False`, then the code block under the `else` statement is executed. In this way, `else` acts as a 'catch all' if none of the other statements are `True`.

*Only one* of the code blocks above will be executed: the one under the first statement to evaluate to `True`. Once this happens, python steps out of the `if...else` statement and continues on with the rest of the program.

The second point to make from the above is the use of the double `=` in the `elif`.  This is the 'comparison operator'. A single `=` is used as the 'assignment operator' as we have been using to assign values to variables.  If we want to test that two values are in fact equal, then we _must_ use `==`.
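
Because only the first `True` branch runs, the order of the tests matters. The sketch below uses made-up thresholds to show a branch that can never be reached because an earlier test already matches.

```python
gc = 50   # hypothetical GC percentage

if gc < 40:
    label = "low"
elif gc < 60:
    label = "moderate"        # gc = 50 matches here, so the chain stops
elif gc == 50:
    label = "exactly half"    # never reached when gc is 50: the test above wins
else:
    label = "high"

print(label)
```

Placing the more specific test (`gc == 50`) before the broader one (`gc < 60`) would let it take effect.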

### Operators
The construction of boolean logical tests is an important part of how you control the flow of your python program. Often we want to test whether a variable has a certain value, or even exists at all, or to perform some mathematical transformation on a value. To do this, we use different 'operators'. Operators are the constructs which can manipulate the values of individual items (or operands). You are familiar with many of these; for example, in the expression 5 + 3, the values 5 and 3 are operands and `+` is the operator. Python has several types of operators; here we will distinguish between a few of them.

#### Arithmetic operators
You should already be familiar with most of these. Assume that a = 10 and b = 20:

* `+` Addition: adds the values on either side of the operator. `a + b` gives 30
* `-` Subtraction: subtracts the right-hand operand from the left-hand operand. `a - b` gives -10
* `*` Multiplication: multiplies the values on either side of the operator. `a * b` gives 200
* `/` Division: divides the left-hand operand by the right-hand operand and always returns a `float`. `b / a` gives 2.0
* `%` Modulus: divides the left-hand operand by the right-hand operand and returns the remainder. `b % a` gives 0
* `**` Exponent: raises the left-hand operand to the power of the right-hand operand. `a ** b` gives 10 to the power 20
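
As a brief sketch of how these operators might be used together, the cell below checks whether a made-up sequence length divides evenly into codons. It also uses floor division (`//`), which is not in the list above, and shows how parentheses change the order of evaluation.

```python
seq_length = 14                     # hypothetical sequence length in bases

full_codons = seq_length // 3       # how many complete codons fit
leftover = seq_length % 3           # bases left over after the last full codon
in_frame = (seq_length % 3 == 0)    # True only when the length is a multiple of 3

print(full_codons, leftover, in_frame)

# Parentheses control the order of evaluation.
without_parens = 2 + 3 * 4          # multiplication happens first
with_parens = (2 + 3) * 4           # addition happens first

print(without_parens, with_parens)
```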


#### Comparison operators
These are the operators that allow you to compare two items/variables. Each of these operators returns a `bool` value (`True` or `False`).

In [80]:
a = 10
b = 20

print(a == b) # evaluates whether two operands are equal
print(a != b) # evaluates whether two operands are _not_ equal
print(a < b) # less than
print(a > b) # greater than
print(a <= b) # less than or equal to
print(a >= b) # greater than or equal to

False
True
True
False
True
False


#### Assignment operators
These _assign_ values to a variable:

In [9]:
a = 10 # assigns the value 10 to the variable a
a += 5 # Adds the value on the right to the value in a and then assigns the new value to a
print(a)

a -= 10 # subtracts the right value from the value in a and then assigns the new value to a
print(a)

a *= 5 # multiplies the right value with the value of a and then assigns to a
print(a)

a /= 5 # divides the value of a by the right value and then assigns to a
print(a)

15
5
25
5.0


#### Logical and Membership operators
Logical operators help you compare different expressions

In [81]:
a = True
b = True
c = False

# and: if both values are True then the condition is True
print((a and b))

# or: if _either_ value is True then the condition is True
print((a or b))

# not: reverses the logical state of the condition
print(not(a or b))


True
True
False


Membership operators test whether a value (operand) is a member of a collection (as in a string, list or tuple).

In [82]:
fruits = ['apple','banana','pineapple']
a = "orange"

print(a in fruits)

print(a not in fruits)

print("o" in a)


False
True
True
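
Comparison, logical, and membership operators are often combined in a single test. The sketch below uses made-up values and thresholds to show a few such combinations, including python's chained comparison syntax.

```python
known_genes = ['Gapdh', 'Mef2c', 'Pax6']
gene = 'Pax6'
gc = 56                                # hypothetical GC percentage

is_known = gene in known_genes         # membership test
in_range = gc >= 40 and gc <= 60       # both comparisons must be True
also_in_range = 40 <= gc <= 60         # chained comparison, same meaning
keep = is_known and not gc > 60        # 'not' applies to the comparison gc > 60

print(is_known, in_range, also_in_range, keep)
```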


These are the basic operators that you will need to know to create conditional expressions to guide your control flow.

### Looping/Iteration

Looping is a powerful way to perform an operation repeatedly. Python has two main looping constructs depending on when you want to stop iterating.

### While loops
The `while` statement executes commands as long as an evaluated conditional expression remains `True`. It loops through the code block until the expression is no longer `True`.

In [85]:
i = 1

while i < 10:
    print(i)
    i += 1


1
2
3
4
5
6
7
8
9
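
A `while` loop is most useful when you do not know in advance how many iterations are needed. In the sketch below (made-up numbers), a value is halved until it drops below a threshold. Note that the loop body must change something the condition depends on; otherwise the loop would never end.

```python
reads = 1000000       # hypothetical starting number of reads
rounds = 0

# Halve the number of reads until it drops below 100000.
while reads >= 100000:
    reads = reads // 2
    rounds += 1

print(rounds, reads)
```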


### For loops
The `for..in` statement also performs loops. In this case, however, the loop _iterates_ over a sequence or collection, and in doing so it assigns each element of the collection to a specific variable. Any object that is _iterable_ can be used to construct a for loop; this includes strings, lists, tuples, dictionaries, and other python data types. In the example below, `range(10)` creates an iterable sequence of the numbers 0 through 9. Each pass through the loop places one of these values (in order) into the newly created variable `i`, and then executes the code block associated with this loop:

In [86]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9
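
Since strings and lists are iterable too, the same construct can walk over them directly. The sketch below counts the bases in a made-up sequence, and uses the built-in `enumerate()` function to get both the position and the item while looping over a list.

```python
seq = "ATGGCA"   # hypothetical sequence
counts = {'A': 0, 'C': 0, 'G': 0, 'T': 0}

# Each pass through the loop places one character of seq into `base`.
for base in seq:
    counts[base] += 1

print(counts)

# enumerate() yields (index, item) pairs.
genes = ['Gapdh', 'Mef2c', 'Pax6']
numbered = []
for i, gene in enumerate(genes):
    numbered.append(f"{i}: {gene}")

print(numbered)
```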


An optional `else` statement can be used to execute a code block after the iterations have completed. (The `else` block is skipped if the loop is exited early with `break`.)

In [87]:
genes = ['Gapdh','Mef2c','Pax6','Cxcl1','Msi1']

for gene in genes:
    print(gene)
else:
    print("No more genes!")

Gapdh
Mef2c
Pax6
Cxcl1
Msi1
No more genes!


_*Next steps*_: Look into the `break` and `continue` statements, which can be used in conjunction with the above statements to further control the flow of a program.
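
As a starting point, here is a minimal sketch of both statements using a list of genes. The filtering rule (skipping names that start with 'M') is arbitrary and chosen only for illustration.

```python
genes = ['Gapdh', 'Mef2c', 'Pax6', 'Cxcl1', 'Msi1']

# continue: skip the rest of this iteration and move on to the next item.
kept = []
for gene in genes:
    if gene.startswith('M'):
        continue
    kept.append(gene)
print(kept)

# break: leave the loop early; the else block is skipped when this happens.
found = None
for gene in genes:
    if gene == 'Pax6':
        found = gene
        break
else:
    found = 'not found'
print(found)
```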

Take a brief break to absorb some of this information.  We will continue with more advanced topics in the next notebook.