# BMI565: Bioinformatics Programming & Scripting

#### (C) Michael Mooney (mooneymi@ohsu.edu)

## Week 1: Documentation and Data Types

1. Introduction and Why Python?
2. Help, Documentation, Code Organization
    - Help
    - Comments and Docstrings
3. None and Logical Values
4. Numeric Data Types
    - Numeric Operators
5. Sequence Data Types
    - Lists
    - Tuples
    - Sets
    - Strings
    - String Formatting
6. Mapping Data Types
    - Dictionaries
7. A Quick Note on Python 2.7 vs. Python 3

#### Requirements
- Python 2.7 or 3.x
- Miscellaneous Files
    - `./images/sequencing_vs_compute.jpg`
    - `./images/EBI.png`

## References for Jupyter Notebooks

[https://jupyter-notebook.readthedocs.io/en/latest/user-documentation.html](https://jupyter-notebook.readthedocs.io/en/latest/user-documentation.html)

Notebook Extensions: [https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/index.html](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/index.html)

## Introduction and Why Python?

- <b>Bioinformatics:</b> The application of computer science in the fields of biology and medicine
- <b>Programming:</b> The process of designing, writing, testing, debugging, and maintaining a list of computer instructions
- <b>Scripting:</b> Creating a list of computer instructions that call one or more stand-alone applications

### Computational demands in molecular biology are increasing
<img src="images/sequencing_vs_compute.jpg" width="500" height="500" align="left" />

<img src="images/EBI.png" width="500" height="500" align="left" />

### Compiled vs. Interpreted Languages

<table align="left">
<tr><td style="text-align:center"><b>Compiled Languages (e.g. C++)</b></td><td style="text-align:center"><b>Interpreted Languages (e.g. Python)</b></td></tr>
<tr><td style="text-align:center">Code is converted from a high-level language<br /> and runs as machine code</td><td style="text-align:center">Code runs on an interpreter and is converted <br />to machine code one instruction at a time</td></tr>
<tr><td style="text-align:center">Code is machine specific</td><td style="text-align:center">Code is portable; can run on multiple platforms</td></tr>
<tr><td style="text-align:center">Tends to run faster</td><td style="text-align:center">Smaller executable (program) size</td></tr>
<tr><td style="text-align:center">You can distribute standalone executables</td><td style="text-align:center">An interpreter must be installed</td></tr>
<tr><td style="text-align:center">You need to recompile code after making changes</td><td style="text-align:center">No need to recompile</td></tr>
</table>

### Python for Scientific Computing

#### Some Python History
- Created by Guido van Rossum (1989)
- Originally created as a means to automate system admin tasks
    - C programs took too long to write
- van Rossum was a Monty Python fan!
- Python became open source in 1991 (version 0.9)

#### Scientific Computing

There is a large community of Python developers creating tools for scientific applications. Here are a few:

- Numpy and Scipy
    * Tools for working with large data arrays, Optimization, Linear Algebra, etc.
    * [http://www.scipy.org](http://www.scipy.org)
- BioPython
    * Bioinformatics and Computational Biology
    * [http://biopython.org/wiki/Biopython](http://biopython.org/wiki/Biopython)
- Matplotlib
    * Data Presentation, Plotting
    * [http://matplotlib.sourceforge.net/](http://matplotlib.sourceforge.net/)
- Scikit-learn
    * Statistics and Machine Learning
    * [http://scikit-learn.org/stable/](http://scikit-learn.org/stable/)

### Writing and Running a Python Program

A Python program is simply a text file containing commands that can be interpreted by the Python interpreter. It is convention to use the `.py` file extension. A Python program can be created with any text editor, but integrated development environments (IDEs) can make coding easier. IDEs have features such as syntax coloring, automatic indentation, code completion, etc. that can save you time and reduce errors. IDLE is the "official" Python IDE that is packaged with Python installations. Eclipse, Spyder, PyCharm, and Xcode are other examples of IDEs.

Below is an example of a very simple Python program that prints the message "Hello, world!" to the screen. The first line is called a "shebang" or "hashbang"; it tells the operating system which interpreter to use to run the file (here, the `python` that `env` finds on your path).

    #!/usr/bin/env python
    
    print("Hello, world!")

<u>Try it out</u>: Copy and paste the above two lines into a text file and save it as `hello_world.py`. Then run the program by opening a terminal, changing to the file's location, and typing the following at the command-line:

    python hello_world.py

Alternatively, you can call the program directly, without explicitly calling the Python interpreter (this requires that the shebang be present on the first line). First you will need to make the file executable (using the `chmod` command), then you can call the program directly:

    chmod 755 hello_world.py
    ./hello_world.py

### Running a Python Program from a Notebook

Jupyter "magics" provide additional functionality to the notebooks (more info at the link below). One such magic is the `%run` magic, which allows you to execute a script as if it were run on the command-line. Another option is to use the special command `!`, which gives you access to the underlying shell. With `!` you can run any shell command, not just a Python script (see examples below). It's important to note that neither of these options is valid Python syntax, so they will produce errors if run in a standard Python script or interactive session.

*Results may vary on Windows machines.

[https://ipython.readthedocs.io/en/stable/interactive/magics.html](https://ipython.readthedocs.io/en/stable/interactive/magics.html)

In [1]:
%run hello_world.py

Hello, world!


In [2]:
!python hello_world.py

Hello, world!


In [3]:
!pwd

/Users/mooneymi/Documents/github/bioinformatics_programming


## Python Coding Conventions

[https://www.python.org/dev/peps/pep-0008/](https://www.python.org/dev/peps/pep-0008/)

Python's Style Guide is a great resource to help you improve the consistency and readability of your code. However, there is no single set of rules accepted by the entire coding community. Often there are project-specific style guides for large development projects. For instance, Google and Numpy have their own style guides.

## Python Help and Code Documentation

Python's `help()` function can provide documentation about functions, modules, methods, etc. 

<a href="https://docs.python.org/2/">Python's Online Documentation</a> is a great resource. 

And, of course, Google will provide plenty of information, examples, etc.

In [4]:
## For help on any function simply call help() with the function name 
## as the parameter. For example, what does the len() function do?
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.



In [5]:
print("Hello")

Hello


### Comments:  Please use them!

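Any text following a `#` is ignored by the Python interpreter. Text surrounded by `"""` is technically a string literal rather than a comment, but when it is not assigned to anything it has no effect, so it is commonly used to create multi-line comments.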

Comments can be purely descriptive (e.g. used to explain how code works), but can also serve a purpose when testing or debugging code. For instance, commenting out sections of your code can help you narrow down where an error is occurring.

It is a good idea to write descriptive comments as you code, rather than going back after code is already written. Please use comments thoroughly so it is easy for others to interpret your code. 

All code submitted for homework assignments should include a block of comments at the top of the file that includes the following information:

    """
    Your Name
    Assignment Number
    Date Submitted/Written
    A short description of what the code does
    
    Usage:
    An example of how the script should be run (e.g. python myscript.py input_data.txt)
    """

In [6]:
print("Hello")
#print("Good-bye")

Hello


### Docstrings

Docstrings are a good way to document your code. A docstring is a string literal that occurs as the first statement in a module, function, class or method definition. The info in a docstring will be shown when `help()` is called on the function, module, etc.

In [7]:
def hello_world(n=1):
    """
    This function prints the message "Hello, world!" a specified number of times.
    """
    for i in range(n):
        print("Hello, world!")


In [8]:
hello_world(3)

Hello, world!
Hello, world!
Hello, world!


In [9]:
help(hello_world)

Help on function hello_world in module __main__:

hello_world(n=1)
    This function prints the message "Hello, world!" a specified number of times.
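
The text that `help()` displays is stored on the object's `__doc__` attribute, so it can also be read programmatically. The sketch below is only an illustration: `gc_content()` is a hypothetical function that is not part of the course materials.

```python
import inspect

def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / float(len(seq))

# The docstring is stored on the function object itself
raw_doc = gc_content.__doc__

# inspect.getdoc() returns the docstring with its indentation cleaned up
clean_doc = inspect.getdoc(gc_content)

print(clean_doc)
print(gc_content("ATGC"))
```

Because the docstring travels with the function, documentation tools and `help()` can find it without reading your source file.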



There are a number of very useful style guides for creating docstrings. Automated documentation programs, such as [Sphinx](http://sphinx-doc.org/), can parse docstrings that follow these guidelines, and allow you to quickly create web documentation of your programs. I highly recommend you get used to writing thorough docstrings in a standard format, so that your code is well documented.

[Google Style Guide](http://sphinxcontrib-napoleon.readthedocs.org/en/latest/example_google.html#example-google)

[Numpy Style Guide](http://sphinxcontrib-napoleon.readthedocs.org/en/latest/example_numpy.html#example-numpy)

A quick example of a Google-style docstring for a function:

    def hello_world(n=1):
        """
        This function prints the message "Hello, world!" a specified number of times.
        
        Args:
            n (int): The number of times to print the message.
            
        Returns:
            None
        
        """


## Python Data Types

## None and Logical Values

`None` is Python's null value. The name is case-sensitive: `NONE` or `none` will not be interpreted as the `None` value.

`None` is the default return value for Python functions (e.g. when no return value is specified).

In [10]:
x = hello_world()

Hello, world!


In [11]:
if x == None:
    ## This usually works
    print("x == None")
if x is None:
    ## But this is the preferred way to compare to None
    print("x is None")


x == None
x is None
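
One reason `is None` is preferred: the `==` operator can be redefined by a class, while `is` always compares object identity. A minimal sketch, using a hypothetical class (`AlwaysEqual`) invented purely for illustration:

```python
class AlwaysEqual(object):
    """Hypothetical class whose instances claim to be equal to everything."""
    def __eq__(self, other):
        return True

obj = AlwaysEqual()

# == calls obj.__eq__(None), which this class answers with True
equal_to_none = (obj == None)

# 'is' compares object identity and cannot be overridden
is_none = (obj is None)

print(equal_to_none)
print(is_none)
```

Here `obj == None` is misleadingly `True`, while `obj is None` correctly reports `False`.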


Logical values are specified with `True` or `False`. Again, these values must be specified exactly as `True` or `False` (e.g. not TRUE or true).

In [12]:
x = True
if x:
    print("True")

if not x:
    print("False")

True


### Logical Operators

<table align="left">
<tr><td style="text-align:center"><b>Operation</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center"><code>x and y</code></td><td><code>True</code> if both <code>x</code> and <code>y</code> are <code>True</code></td></tr>
<tr><td style="text-align:center"><code>x or y</code></td><td><code>True</code> if <code>x</code> or <code>y</code> is <code>True</code></td></tr>
<tr><td style="text-align:center"><code>not x</code></td><td><code>True</code> if <code>x</code> is <code>False</code></td></tr>
<tr><td style="text-align:center"><code>all(s)</code></td><td><code>True</code> only if all elements of <code>s</code> are <code>True</code></td></tr>
<tr><td style="text-align:center"><code>any(s)</code></td><td><code>True</code> if any elements of <code>s</code> are <code>True</code></td></tr>
</table>

In [13]:
(True and False) or (True and not False)

True

In [14]:
## The all() function returns True if all elements are True
all([True, True, False])

False

In [15]:
## The any() function returns True if any elements are True
any([False, False, True])

True

## Numeric Data Types

- `int` - Integer<br />
- `float` - Floating point<br />
- `complex` - Complex numbers: these have `.real` and `.imag` components (each is a floating point number)

### Numeric Operators

<table align="left">
<tr><td style="text-align:center"><b>Operator</b></td><td style="text-align:center"><b>Example</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center"><code>*</code></td><td style="text-align:center"><code>x * y</code></td><td>Multiplication of <code>x</code> and <code>y</code></td></tr>
<tr><td style="text-align:center"><code>/</code></td><td style="text-align:center"><code>x / y</code></td><td>Quotient of <code>x</code> and <code>y</code></td></tr>
<tr><td style="text-align:center"><code>//</code></td><td style="text-align:center"><code>x // y</code></td><td>Floored quotient of <code>x</code> and <code>y</code></td></tr>
<tr><td style="text-align:center"><code>%</code></td><td style="text-align:center"><code>x % y</code></td><td>Remainder of <code>x / y</code></td></tr>
<tr><td style="text-align:center"><code>**</code></td><td style="text-align:center"><code>x ** y</code></td><td><code>x</code> to the power <code>y</code></td></tr>
<tr><td style="text-align:center"><code>+</code></td><td style="text-align:center"><code>x + y</code></td><td>Addition of <code>x</code> and <code>y</code></td></tr>
<tr><td style="text-align:center"><code>-</code></td><td style="text-align:center"><code>x - y</code></td><td>Difference of <code>x</code> and <code>y</code></td></tr>
<tr><td style="text-align:center"><code>-</code></td><td style="text-align:center"><code>-x</code></td><td>Negation of <code>x</code></td></tr>
</table>

### Other Numeric Operations
<table align="left">
<tr><td style="text-align:center"><b>Operation</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center"><code>abs(x)</code></td><td>Absolute value of <code>x</code></td></tr>
<tr><td style="text-align:center"><code>pow(x, y)</code></td><td><code>x</code> to the power <code>y</code></td></tr>
<tr><td style="text-align:center"><code>divmod(x, y)</code></td><td>The pair <code>(x // y, x % y)</code></td></tr>
<tr><td style="text-align:center"><code>int(x)</code></td><td><code>x</code> converted to integer</td></tr>
<tr><td style="text-align:center"><code>float(x)</code></td><td><code>x</code> converted to float</td></tr>
<tr><td style="text-align:center"><code>complex(re, im)</code></td><td>A complex number with real part <code>re</code> and imaginary part <code>im</code> (<code>im</code> defaults to 0)</td></tr>
<tr><td style="text-align:center"><code>c.conjugate()</code></td><td>The conjugate of the complex number <code>c</code></td></tr>
</table>
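
A short sketch of a few of these built-in operations in use (the values are arbitrary; output comments assume Python 3). Note that `divmod()` takes two arguments and `conjugate()` is called on the complex number itself:

```python
x, y = 17, 5

# divmod() returns the floored quotient and the remainder together
quotient, remainder = divmod(x, y)
print(quotient, remainder)      # 3 2

absolute = abs(-3.5)            # 3.5
power = pow(2, 10)              # 1024
truncated = int(3.9)            # 3 (truncates toward zero, does not round)
as_float = float(3)             # 3.0

# Complex numbers store their parts as floats
c = complex(3, 4)
print(c.real, c.imag)           # 3.0 4.0
conj = c.conjugate()            # (3-4j)
magnitude = abs(c)              # 5.0
print(conj, magnitude)
```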

In [16]:
## Division (see section below about differences 
## between Python 2.7 and Python 3)
5/6

0.8333333333333334

In [17]:
## For Python 2.7, convert one of the operands 
## to a float for floating point division
float(5)/6

0.8333333333333334

In [18]:
5.0/6

0.8333333333333334

### Operator Precedence

1. `()`
2. `**`
3. `+/-` (negation)
4. `*,/,//,%`
5. `+,- ` (addition and subtraction)
6. `not`
7. `and`
8. `or`
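
The precedence rules above can be checked with a few small expressions. This is only an illustrative sketch; when in doubt, parentheses make the intended order explicit:

```python
# ** binds more tightly than negation, so this is -(2 ** 2)
a = -2 ** 2          # -4
b = (-2) ** 2        # 4

# Multiplication is evaluated before addition
c = 2 + 3 * 4        # 14
d = (2 + 3) * 4      # 20

# 'not' is evaluated before 'and', which is evaluated before 'or'
e = not True or True             # (not True) or True -> True
f = True or False and False      # True or (False and False) -> True

print(a, b, c, d, e, f)
```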

### Comparison Operators
<table align="left">
<tr><td style="text-align:center"><b>Operator</b></td><td style="text-align:center"><b>Example</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center"><code><</code></td><td style="text-align:center"><code>x < y</code></td><td><code>x</code> less than <code>y</code></td></tr>
<tr><td style="text-align:center"><code>></code></td><td style="text-align:center"><code>x > y</code></td><td><code>x</code> greater than <code>y</code></td></tr>
<tr><td style="text-align:center"><code>==</code></td><td style="text-align:center"><code>x == y</code></td><td><code>x</code> equals <code>y</code></td></tr>
<tr><td style="text-align:center"><code><=</code></td><td style="text-align:center"><code>x <= y</code></td><td><code>x</code> less than or equal to <code>y</code></td></tr>
<tr><td style="text-align:center"><code>>=</code></td><td style="text-align:center"><code>x >= y</code></td><td><code>x</code> greater than or equal to <code>y</code></td></tr>
<tr><td style="text-align:center"><code>!=</code></td><td style="text-align:center"><code>x != y</code></td><td><code>x</code> not equal to <code>y</code></td></tr>
</table>
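
Comparison operators return logical values, so they can be combined with `and`, `or`, and `not`. A brief sketch with arbitrary values:

```python
x, y = 3, 7

less = x < y                  # True
equal = x == y                # False
not_equal = x != y            # True

# Comparisons can be combined with logical operators
in_range = (x >= 1) and (x <= 5)    # True

# Strings are compared character by character
string_less = "ACG" < "ACT"   # True ('G' sorts before 'T')

print(less, equal, not_equal, in_range, string_less)
```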

## Sequence Data Types

### Lists

Lists are ordered collections of data. Python lists are mutable data types, which means the individual elements of a list can be modified.

#### List methods
<table align="left">
<tr><td style="text-align:center"><b>Method</b></td><td><b>Description</b></td></tr>
    <tr><td style="text-align:center"><code>l.append(x)</code></td><td>Appends <code>x</code> to the end of the list <code>l</code></td></tr>
<tr><td style="text-align:center"><code>l.extend(y)</code></td><td>Appends each element of the list <code>y</code> to the end of <code>l</code>.</td></tr>
<tr><td style="text-align:center"><code>l.insert(i, x)</code></td><td>Inserts <code>x</code> at index <code>i</code></td></tr>
<tr><td style="text-align:center"><code>l.sort()</code></td><td>Sorts the list <code>l</code></td></tr>
<tr><td style="text-align:center"><code>l.pop([i])</code></td><td>Removes the element at position <code>i</code> and returns it. If <code>i</code> is not given, the last element will be removed.</td></tr>
<tr><td style="text-align:center"><code>l.reverse()</code></td><td>Reverses the list <code>l</code></td></tr>
<tr><td style="text-align:center"><code>l.index(x)</code></td><td>Returns the index of <code>x</code></td></tr>
<tr><td style="text-align:center"><code>l.remove(x)</code></td><td>Removes the first instance of <code>x</code> in the list</td></tr>
</table>

\*\* Note: Methods such as `list.append()`, `list.extend()`, etc. perform in-place operations, and do not return the updated list. They return `None`. So be careful about assigning the results of these methods to a variable and expecting a list.

see [https://docs.python.org/2/tutorial/datastructures.html](https://docs.python.org/2/tutorial/datastructures.html) for more details
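
A short sketch of the list methods above applied to a made-up list of gene names. It also contrasts the in-place `sort()` method with the built-in `sorted()` function, which returns a new list:

```python
genes = ['TP53', 'BRCA1']

genes.append('EGFR')             # ['TP53', 'BRCA1', 'EGFR']
genes.extend(['MYC', 'KRAS'])    # adds each element of the second list
genes.insert(0, 'APC')           # 'APC' is now the first element
last = genes.pop()               # removes and returns 'KRAS'
genes.remove('BRCA1')            # removes the first 'BRCA1'
position = genes.index('EGFR')   # 2

# sort() modifies the list in place and returns None
result = genes.sort()
print(result)                    # None
print(genes)                     # ['APC', 'EGFR', 'MYC', 'TP53']

# sorted() leaves the original untouched and returns a new list
unsorted = ['b', 'c', 'a']
ordered = sorted(unsorted)
print(unsorted, ordered)
```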

In [19]:
help(list.index)

Help on method_descriptor:

index(self, value, start=0, stop=9223372036854775807, /)
    Return first index of value.
    
    Raises ValueError if the value is not present.



In [20]:
help(list.sort)

Help on method_descriptor:

sort(self, /, *, key=None, reverse=False)
    Sort the list in ascending order and return None.
    
    The sort is in-place (i.e. the list itself is modified) and stable (i.e. the
    order of two equal elements is maintained).
    
    If a key function is given, apply it once to each list item and sort them,
    ascending or descending, according to their function values.
    
    The reverse flag can be set to sort in descending order.



In [21]:
## Lists can contain multiple different data types
list1 = [1,2,3.5,'A','B','C']
list1

[1, 2, 3.5, 'A', 'B', 'C']

In [22]:
x = list1.append('D')
if x is None:
    print("x is None")

x is None


In [23]:
list1

[1, 2, 3.5, 'A', 'B', 'C', 'D']

In [24]:
## Use the len() function to get the length of any 
## sequence variable (list, string, tuple, etc.)
len(list1)

7

In [25]:
## You can access list elements with an index (starting with 0)
list1[0]

1

In [26]:
## You can access a subset of a list using slice notation
## The first index is inclusive, and the second is exclusive, 
## so [0:3] will give you the first, second and third elements
list1[0:3]

[1, 2, 3.5]

In [27]:
## You can also specify a 'step' to get, for instance,
## every second element in the list. [0:5:2] will give you
## the elements at indices 0, 2, and 4 (the first, third,
## and fifth elements).
list1[0:5:2]

[1, 3.5, 'B']

In [28]:
list1

[1, 2, 3.5, 'A', 'B', 'C', 'D']

In [29]:
## A negative index starts from the end of the list 
## (-1 being the last element)
list1[-1]

'D'

In [30]:
## To specify the entire list don't enter start or end indices
list1[:]

[1, 2, 3.5, 'A', 'B', 'C', 'D']

### Copying Lists

When you assign a list variable to another variable, you are simply creating a second reference to the same list. This means that if the elements of the list are changed through one name, the change is visible through the other name as well. To create a new list that is a copy of another list, you should use the `list()` function or slice notation (see below). Both create shallow copies: the new list still refers to the same element objects. The `copy.deepcopy()` function can be used to also copy the objects a list contains (e.g. nested lists).

In [31]:
## Create a new list
A = [1,2,3,4]
## Assign list A to a new variable name
B = A
B

[1, 2, 3, 4]

In [32]:
## Create a copy of list A
C = list(A)
C

[1, 2, 3, 4]

In [33]:
## Another way to create a copy of list A
D = A[:]
D

[1, 2, 3, 4]

In [34]:
## Change the value of the third element in list A
A[2] = 16
A

[1, 2, 16, 4]

In [35]:
## The value of list B changes also! List B is simply a reference 
## to list A (a new name for the same entity).
B

[1, 2, 16, 4]

In [36]:
## List C is a new list (a copy of A) and its values did not change
C

[1, 2, 3, 4]

In [37]:
## Same for list D
D

[1, 2, 3, 4]

In [38]:
## Make a list of lists
E = [A, C]
E

[[1, 2, 16, 4], [1, 2, 3, 4]]

In [39]:
## Use multiple indices to indicate, for example, the third element 
## of the first list
E[0][2]

16

In [40]:
## When using list() to create a copy of a list that contains objects 
## a new list will be created, but the elements of the list will
## still be references to the original objects
F = list(E)

In [41]:
## Using the deepcopy() method will create a copy of a list and any 
## objects (e.g. other lists) that are elements of that list
import copy
G = copy.deepcopy(E)
G

[[1, 2, 16, 4], [1, 2, 3, 4]]

In [42]:
## Change the last element of list C
C[-1] = 20
C

[1, 2, 3, 20]

In [43]:
## The value of F changes (the second list is still a reference to C)
F

[[1, 2, 16, 4], [1, 2, 3, 20]]

In [44]:
## But the value of G does not change, since all elements were 
## copied using the deepcopy() method
G

[[1, 2, 16, 4], [1, 2, 3, 4]]
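
The `is` operator offers a quick way to check whether two names refer to the same list object or to separate copies. A minimal sketch that rebuilds small lists like the ones above:

```python
import copy

A = [1, 2, 3, 4]
B = A            # a second name for the same list
C = list(A)      # a new list with the same values

same_object = B is A       # True
copy_is_same = C is A      # False
copy_is_equal = C == A     # True: equal values, different objects

# Shallow vs. deep copies of a list of lists
nested = [A, C]
shallow = list(nested)
deep = copy.deepcopy(nested)

shallow_shares = shallow[0] is A   # True: inner lists are shared
deep_shares = deep[0] is A         # False: inner lists were copied

print(same_object, copy_is_same, copy_is_equal, shallow_shares, deep_shares)
```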

### Sets

Sets are unordered collections of unique (non-duplicate) items. Sets are mutable, and are very useful for comparing collections of items.

#### Set Methods

<table align="left">
<tr><td style="text-align:center"><b>Method</b></td><td style="text-align:center"><b>Equivalent To</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center"><code>s.union(x)</code></td><td style="text-align:center"><code>s | x</code></td><td>Returns a new set containing the elements of <code>s</code> and the elements of <code>x</code></td></tr>
<tr><td style="text-align:center"><code>s.intersection(x)</code></td><td style="text-align:center"><code>s & x</code></td><td>Returns a new set containing only the elements in both <code>s</code> and <code>x</code></td></tr>
<tr><td style="text-align:center"><code>s.difference(x)</code></td><td style="text-align:center"><code>s - x</code></td><td>Returns a new set containing elements in <code>s</code> but not in <code>x</code></td></tr>
<tr><td style="text-align:center"><code>s.symmetric_difference(x)</code></td><td style="text-align:center"><code>s ^ x</code></td><td>Returns a set containing elements in either <code>s</code> or <code>x</code>, but not both</td></tr>
<tr><td style="text-align:center"><code>s.issubset(x)</code></td><td style="text-align:center"><code>s <= x</code></td><td>Returns <code>True</code> if <code>x</code> contains all elements of <code>s</code></td></tr>
<tr><td style="text-align:center"><code>s.issuperset(x)</code></td><td style="text-align:center"><code>s >= x</code></td><td>Returns <code>True</code> if <code>s</code> contains all elements of <code>x</code></td></tr>
</table>

#### Other Set Operations
<table align="left">
<tr><td style="text-align:center"><b>Operation</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center"><code>s.add(y)</code></td><td>Adds the value <code>y</code> to the set <code>s</code></td></tr>
<tr><td style="text-align:center"><code>s.remove(y)</code></td><td>Removes the value <code>y</code> from <code>s</code>. Raises a <code>KeyError</code> if <code>y</code> is not in <code>s</code>.</td></tr>
<tr><td style="text-align:center"><code>s.discard(y)</code></td><td>Removes the value <code>y</code> from <code>s</code> if present</td></tr>
<tr><td style="text-align:center"><code>s.update(z)</code></td><td>Updates the set <code>s</code> by adding all elements from set <code>z</code></td></tr>
<tr><td style="text-align:center"><code>s.clear()</code></td><td>Removes all elements from <code>s</code></td></tr>
</table>

see <a href="https://docs.python.org/2/library/stdtypes.html#set">https://docs.python.org/2/library/stdtypes.html#set</a> for more details

In [45]:
## Create a set using the set() function
s1 = set(['blue', 'yellow', 'red'])
s2 = set(['orange', 'green', 'blue'])

## Curly braces can also be used
s3 = {'pink', 'purple'}

In [46]:
## Get the intersection of two sets
s1 & s2

{'blue'}
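
Sets are handy for comparing lists of identifiers. The sketch below uses two made-up gene sets; `sorted()` is used when printing only because sets have no defined order:

```python
expressed = {'TP53', 'BRCA1', 'EGFR'}
mutated = {'EGFR', 'KRAS'}

union = expressed | mutated             # genes in either set
only_expressed = expressed - mutated    # genes in the first set only
either_not_both = expressed ^ mutated   # genes in exactly one set
is_subset = {'EGFR'} <= expressed       # True

# Converting a list to a set removes duplicates
unique_bases = set(['A', 'C', 'A', 'G', 'C'])

# discard() silently ignores missing values; remove() raises KeyError
mutated.discard('MYC')
try:
    mutated.remove('MYC')
    removed = True
except KeyError:
    removed = False

print(sorted(union))
print(sorted(only_expressed))
print(sorted(either_not_both))
print(is_subset, sorted(unique_bases), removed)
```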

### Tuples
Tuples are immutable sequences. Tuples are similar to lists, but you can't change individual elements.

In [47]:
## Tuples are created by placing parentheses around 
## a sequence of values separated by commas
t1 = (1,2,3,4)
t1

(1, 2, 3, 4)

In [48]:
## To create a tuple with a single value, place a comma at the end
t2 = ('A',)
t2

('A',)

In [49]:
## You can't change the elements of a tuple
t1[0] = 3

TypeError: 'tuple' object does not support item assignment

### Strings

Strings are immutable sequences of characters, and can be accessed much like lists. There are, of course, numerous methods that are specific to strings. Below are a few useful string methods.

#### String Methods
<table align="left">
<tr><td style="text-align:center"><b>Operation</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center"><code>str.find(sub)</code></td><td>Returns the lowest index where substring <code>sub</code> is found in <code>str</code> (or <code>-1</code> if it is not found)</td></tr>
<tr><td style="text-align:center"><code>str.replace(old, new)</code></td><td>Returns a copy of <code>str</code> with all occurrences of substring <code>old</code> replaced by <code>new</code>.</td></tr>
<tr><td style="text-align:center"><code>str.split([sep])</code></td><td>Returns a list containing substrings of <code>str</code> using <code>sep</code> as the delimiter.</td></tr>
<tr><td style="text-align:center"><code>str.join(iterable)</code></td><td>Returns a string that is a concatenation of the elements in<br /><code>iterable</code>. The separator between elements is <code>str</code>.</td></tr>
<tr><td style="text-align:center"><code>str.strip([chars])</code></td><td>Returns a copy of <code>str</code> with leading and trailing <code>chars</code> removed. <br />If <code>chars</code> is not specified, whitespace will be removed.</td></tr>
</table>

see [https://docs.python.org/2/library/stdtypes.html#string-methods](https://docs.python.org/2/library/stdtypes.html#string-methods) for more details

In [50]:
## Define a string by placing quotes around a sequence of characters
s1 = "hello"
s1

'hello'

In [51]:
## Use the str() function to convert values to string
str(5)

'5'

In [52]:
## Modify the string by replacing a substring
s2 = s1.replace('h', 'H')

In [53]:
s2

'Hello'

In [54]:
## Create a list containing a string's characters
list(s1)

['h', 'e', 'l', 'l', 'o']
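
The string methods in the table above are often chained together when parsing text files. A small sketch using a made-up tab-delimited line and a made-up DNA sequence:

```python
line = "  TP53\tBRCA1\tEGFR \n"

stripped = line.strip()           # removes leading/trailing whitespace
names = stripped.split('\t')      # ['TP53', 'BRCA1', 'EGFR']
joined = ", ".join(names)         # 'TP53, BRCA1, EGFR'

seq = "ATGCGTATG"
found = seq.find("CGT")           # 3 (index of the first match)
missing = seq.find("AAA")         # -1 (substring not present)
rna = seq.replace("T", "U")       # 'AUGCGUAUG'

# Strings are immutable, so the original sequence is unchanged
print(names, joined)
print(found, missing, rna, seq)
```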

### Operations for Sequence Data Types

<table align="left">
<tr><td style="text-align:center"><b>Operation</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center"><code>s + y</code></td><td>Concatenates <code>s</code> and <code>y</code></td></tr>
<tr><td style="text-align:center"><code>s * n</code></td><td>Returns <code>n</code> copies of <code>s</code></td></tr>
<tr><td style="text-align:center"><code>v1, v2, v3 = s</code></td><td>Variable unpacking</td></tr>
<tr><td style="text-align:center"><code>x in s</code> <br /> <code>x not in s</code></td><td>Determine membership. Returns <code>True</code> or <code>False</code>.</td></tr>
<tr><td style="text-align:center"><code>len(s)</code></td><td>Returns the length of <code>s</code>.</td></tr>
<tr><td style="text-align:center"><code>min(s)</code></td><td>Returns the minimum value in <code>s</code>.</td></tr>
<tr><td style="text-align:center"><code>max(s)</code></td><td>Returns the maximum value in <code>s</code>.</td></tr>
<tr><td style="text-align:center"><code>sum(s)</code></td><td>Returns the sum of <code>s</code> (elements of <code>s</code> must be numeric).</td></tr>
</table>

In [55]:
## Concatenate multiple strings together
s1 + ' ' + s2

'hello Hello'

In [56]:
## Does s1 contain the letter e?
'e' in s1

True
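
The remaining operations in the table work the same way on lists, tuples, and strings. A brief sketch with arbitrary values:

```python
codon = ('A', 'T', 'G')

# Variable unpacking: one name per element
first, second, third = codon

repeated = "AT" * 3               # 'ATATAT'
combined = [1, 2] + [3]           # [1, 2, 3]
absent = 'N' not in codon         # True

quality = [30, 38, 12, 40]
lowest = min(quality)             # 12
highest = max(quality)            # 40
total = sum(quality)              # 120

print(first, second, third, repeated, combined, absent)
print(lowest, highest, total, len(quality))
```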

### String Formatting

The <code>%</code> operator can be used to control the formatting of strings. This is useful for making output more human-readable. The format is as follows:

    "<some-string> %<modifier><conversion-specifier>" % (<a tuple of values>)

#### Conversion Specifiers
<table align="left">
<tr><td style="text-align:center"><b>Character</b></td><td><b>Output Format</b></td></tr>
<tr><td style="text-align:center"><code>d</code> or <code>i</code></td><td>Integer decimal</td></tr>
<tr><td style="text-align:center"><code>o</code></td><td>Octal</td></tr>
<tr><td style="text-align:center"><code>x</code></td><td>Hexadecimal</td></tr>
<tr><td style="text-align:center"><code>f</code></td><td>Floating point decimal</td></tr>
<tr><td style="text-align:center"><code>e</code></td><td>Floating point exponential format</td></tr>
<tr><td style="text-align:center"><code>s</code></td><td>String</td></tr>
</table>

#### Formatting Modifiers
1. A number specifying minimum field width
2. A `.` separating the field width from the precision number
3. A precision number specifying the number of characters to be printed from a string (or the number of digits after the decimal point for a floating point number)
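
A few small illustrations of how the conversion specifiers and modifiers combine (the values are arbitrary):

```python
width_only = "%5d" % 42                 # '   42' (padded to 5 characters)
precision_only = "%.2f" % 3.14159       # '3.14'
width_and_precision = "%8.3f" % 3.14159 # '   3.142'
truncated = "%.3s" % "sequence"         # 'seq' (first 3 characters)
hexadecimal = "%x" % 255                # 'ff'
octal = "%o" % 8                        # '10'
exponential = "%e" % 12345.678          # '1.234568e+04'

# Multiple values are supplied as a tuple
summary = "%s has %d bases" % ("chr1", 100)

print(repr(width_only), repr(width_and_precision), truncated)
print(hexadecimal, octal, exponential, summary)
```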

In [57]:
## Create a string using the % operator
s2 = "My favorite number is %d" % (5,)
s2

'My favorite number is 5'

In [58]:
## More complicated formatting
s3 = "This is %s displayed to 5 decimal points: '%10.5f'" % ("pi", 3.14159265359)
s3

"This is pi displayed to 5 decimal points: '   3.14159'"

#### Another Way to Format Strings

The `str.format()` method can also be used to format strings, using similar specifiers and modifiers as described above. In this case, however, placeholders in your string are denoted by curly braces (`{}`), and `'%'` is replaced with `':'`. The general format for the placeholder (or replacement field) is as follows:

    { [field_name] [! conversion] [: format_spec] }

[https://docs.python.org/3/library/string.html#formatstrings](https://docs.python.org/3/library/string.html#formatstrings)

[https://docs.python.org/3/library/string.html#formatspec](https://docs.python.org/3/library/string.html#formatspec)
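
A short sketch of the three parts of a replacement field (field name, conversion, and format spec), using arbitrary values:

```python
# Empty braces are filled in order
auto = "{} has {} exons".format("TP53", 11)      # 'TP53 has 11 exons'

# Numbered fields can be reused
numbered = "{0}{1}{0}".format("ab", "-")         # 'ab-ab'

# The !r conversion applies repr() to the value
converted = "{!r}".format("pi")                  # "'pi'"

# The format spec follows the colon: right-align, width 8, 2 decimals
aligned = "{:>8.2f}".format(3.14159)             # '    3.14'

# A comma in the format spec adds a thousands separator
grouped = "{:,}".format(1234567)                 # '1,234,567'

print(auto, numbered, converted, repr(aligned), grouped)
```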

In [59]:
## Format a string using the format() method
pi = 3.14159265359
s4 = "This is {name} displayed to 5 decimal points: '{pi:10.5f}'"
s4.format(name="pi", pi=pi)

"This is pi displayed to 5 decimal points: '   3.14159'"

#### The newest way to format strings: f-strings

[https://peps.python.org/pep-0498/](https://peps.python.org/pep-0498/)

[https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes)

In [60]:
import datetime
name="Bob"
birthday = datetime.date(1990, 1, 2)
msg = f"My name is {name} and my birthdate is {birthday:%A, %B %d, %Y}"
msg

'My name is Bob and my birthdate is Tuesday, January 02, 1990'
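
f-strings accept the same format specs as `str.format()` and can contain arbitrary expressions. A small sketch with made-up values; note that f-strings require Python 3.6 or later and are a syntax error in Python 2.7:

```python
gene = "TP53"
gc = 0.53421

# The % format type multiplies by 100 and adds a percent sign
percent = f"{gene}: GC content = {gc:.1%}"   # 'TP53: GC content = 53.4%'

# Right-align in a field 8 characters wide
padded = f"{gene:>8}|"                       # '    TP53|'

# Any expression can appear inside the braces
computed = f"{2 + 3}"                        # '5'
method_call = f"{gene.lower()!r}"            # "'tp53'"

print(percent, padded, computed, method_call)
```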

#### Escape Characters
<table align="left">
<tr><td style="text-align:center"><b>Character</b></td><td><b>Output</b></td></tr>
<tr><td style="text-align:center"><code>\\</code></td><td>Backslash</td></tr>
<tr><td style="text-align:center"><code>\'</code></td><td>Single quote</td></tr>
<tr><td style="text-align:center"><code>\"</code></td><td>Double quote</td></tr>
<tr><td style="text-align:center"><code>\n</code></td><td>Newline</td></tr>
<tr><td style="text-align:center"><code>\t</code></td><td>Tab</td></tr>
</table>

In [61]:
print("Line1\nLine2")

Line1
Line2
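
Escape characters are common when writing delimited text. A brief sketch with made-up values; the file path is purely illustrative:

```python
# Two tab-delimited rows separated by a newline
row = "TP53\t12\nBRCA1\t7"
print(row)

fields = row.split("\n")[0].split("\t")   # ['TP53', '12']

# An escape sequence counts as a single character
tab_length = len("\t")                    # 1

# A literal backslash must itself be escaped
windows_path = "C:\\data\\reads.txt"
print(windows_path)                       # C:\data\reads.txt

# Escaping a quote inside a string delimited by the same quote
quote = 'It\'s a "quoted" word'
print(quote)
```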


## Mapping Data Types

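Dictionaries are collections of <code>key:value</code> pairs. They are unordered in Python 2.7; since Python 3.7 they preserve insertion order. Keys must be hashable values (e.g. immutable data types such as strings and numbers).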

#### Dictionary Methods
<table align="left">
<tr><td style="text-align:center"><b>Method</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center"><code>d.get(key[, default])</code></td><td>Returns the value associated with <code>key</code> in <code>d</code>. If <code>key</code> doesn't exist, returns <code>default</code>, or <code>None</code> if <code>default</code> is not specified.</td></tr>
<tr><td style="text-align:center"><code>d.keys()</code></td><td>Returns the keys of <code>d</code> (a list in Python 2.7; a view object in Python 3)</td></tr>
<tr><td style="text-align:center"><code>d.values()</code></td><td>Returns the values of <code>d</code> (a list in Python 2.7; a view object in Python 3)</td></tr>
<tr><td style="text-align:center"><code>d.items()</code></td><td>Returns the key:value pairs as tuples (a list in Python 2.7; a view object in Python 3).</td></tr>
<tr><td style="text-align:center"><code>d.iteritems()</code></td><td>Python 2.7 only. Returns an iterator over the dictionary's <code>(key, value)</code> pairs. In Python 3, use <code>d.items()</code> instead.</td></tr>
<tr><td style="text-align:center"><code>d.update(y)</code></td><td>Updates <code>d</code> with key:value pairs of <code>y</code>, overwriting existing keys</td></tr>
<tr><td style="text-align:center"><code>d.pop(key[,default])</code></td><td>If <code>key</code> is in <code>d</code>, removes it and returns its value. If the key does not exist, returns <code>default</code> if provided; otherwise raises a <code>KeyError</code>.</td></tr>
</table>

See [https://docs.python.org/2/library/stdtypes.html#mapping-types-dict](https://docs.python.org/2/library/stdtypes.html#mapping-types-dict) for more details

In [62]:
## Create a dictionary by placing curly brackets {}
## around a comma delimited list of key:value pairs
d1 = {'A':'blue', 'B':'red'}
d1

{'A': 'blue', 'B': 'red'}

In [63]:
print(d1)

{'A': 'blue', 'B': 'red'}


In [64]:
## Or call the dict() function on a list of tuples
d2 = dict([('A', 'blue'), ('B', 'red')])
d2 == d1

True

In [65]:
## Get all key:value pairs as tuples (a list in Python 2.7, a view object in Python 3)
d1.items()

dict_items([('A', 'blue'), ('B', 'red')])

In [66]:
## Access the dictionary's values
d1['A']

'blue'

In [67]:
d1.get('A')

'blue'
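
A short sketch of the dictionary methods above, using a tiny made-up codon table. It ends with a common idiom, counting items with `get()` and a default value:

```python
codons = {'ATG': 'M', 'TGG': 'W'}

# get() returns a default instead of raising an error for a missing key
stop = codons.get('TAA', '*')        # '*'
nothing = codons.get('TAA')          # None

# update() adds new pairs and overwrites existing keys
codons.update({'TAA': '*', 'ATG': 'Met'})

# pop() removes a key and returns its value
removed = codons.pop('TGG')          # 'W'
not_there = codons.pop('XXX', None)  # None (no error, default given)

# Loop over key:value pairs (sorted for a predictable order)
for codon, amino_acid in sorted(codons.items()):
    print(codon, amino_acid)

# Count the occurrences of each base in a sequence
counts = {}
for base in "ATGCAT":
    counts[base] = counts.get(base, 0) + 1
print(counts)
```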

## Python 2.7 vs. Python 3

For the most part, the code you see in the course materials should be compatible with both Python 2.7 and Python 3. Python 2.7 reached its official end of life on January 1, 2020, but it is still installed on some systems and some older code depends on it. Python 3 is now the standard version, so it's important to understand the differences between the two.

A good way to start getting your code ready for Python 3, while still using 2.7, is to import Python 3 behavior from the standard library's `__future__` module (the third-party [future compatibility package](http://python-future.org/index.html) offers further help). For example, the following code imports two features that make Python 3 different from Python 2.7: the print function (no longer a print statement), and the division operator (`/` no longer performs integer division on integers).

In [68]:
from __future__ import print_function, division
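
The division difference can also be handled without the import by choosing the operator explicitly. The sketch below gives the same results in Python 2.7 and Python 3; it deliberately omits the `__future__` import, which must be the first statement in a file:

```python
import sys

# The major version of the running interpreter (2 or 3)
major_version = sys.version_info[0]
print(major_version)

# // is always floor division, in both versions
floor_quotient = 5 // 6        # 0

# Floor division rounds toward negative infinity
negative_floor = -5 // 6       # -1

# Making one operand a float gives true division in both versions
true_quotient = 5 / 6.0        # 0.8333333333333334

print(floor_quotient, negative_floor, true_quotient)
```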

#### ** I suggest that you use the `__future__` module, as in the example above, to make your code compatible with both Python 2.7 and Python 3. And please make it clear (use comments to document your code) what version of Python you are using for your assignments.

## References

- <u>Programming Languages</u>, Ravi Sethi, 2nd Edition, Addison-Wesley (1996)
- <u>Problem Solving, Abstraction and Design Using C++</u>, Frank Friedman, Elliot Koffman, 4th Edition, Addison-Wesley (2004)
- <u>Python for Bioinformatics</u>, Sebastian Bassi, CRC Press (2010)

#### Last Updated: 14-Sep-2022