<small><small><i>
All of these python notebooks are available at https://github.com/kipkurui/Python4Bioinformatics
</i></small></small>

# Getting started

Python can be used as a calculator. Simply type in expressions to get them evaluated.

## Basic syntax for statements 
The basic rules for writing simple statements and expressions in Python are:

* No spaces or tab characters allowed at the start of a statement: Indentation plays a special role in Python (see the section on control statements). For now, simply ensure that all statements start at the beginning of the line.
* The '#' character indicates that the rest of the line is a comment
* Statements finish at the end of the line:
* Except when there is an open bracket or parenthesis:

```python
1+2
+3 # does NOT continue the sum: Python treats this line as a new expression

(1+2
        + 3) # perfectly OK even with spaces
```
* A single backslash at the end of the line can also be used to indicate that a statement is still incomplete 
```python
1 + \
2 + 3 # this is also OK
```
The Jupyter notebook system for writing Python intersperses text (like this) with Python statements. Try typing something into the cell (box) below and press the 'run cell' button above (triangle+line symbol) to execute it.


In [7]:
(1+3 
+4)

8

In [2]:
1+3\
+4

8

In [10]:
1+2+3 #doing math

6

### Your First Code

In [11]:
print("Hello World!")

Hello World!


In [15]:
print("My name's Caleb")

My name's Caleb


Notice the syntax. `print` is a built-in function, and 'Hello World!' is a string. In bioinformatics, a typical string is a DNA or amino acid sequence.

For example:

In [14]:
print('ACGTACTAG')

ACGTACTAG


In [17]:
input(print('What is your name?'))

What is your name?


None caleb


'caleb'

In [18]:
print(input('What is your name?'))

What is your name? Caleb


Caleb
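
The two cells above differ only in the order of nesting. A minimal sketch of why the first one showed `None` in front of the answer: `print` displays its argument but returns `None`, so `input` received `None` as its prompt. The variable names `result` and `prompt_shown` are only for illustration.

```python
# print() writes its argument to the screen and returns None
result = print('What is your name?')
print(result is None)        # True

# input() turns its prompt into text, so a prompt of None is displayed as 'None'
prompt_shown = str(result)
print(prompt_shown)
```

Passing the prompt text directly to `input`, as in `print(input('What is your name?'))`, avoids the stray `None`.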


You can write a program that asks for a DNA sequence, and prints it out. This is as simple as:

In [19]:
print(input("Please Enter a DNA sequence: "))

Please Enter a DNA sequence:  ACGTAGATACGAT


ACGTAGATACGAT


In [20]:
print(input("Please Enter aa sequence: "))

Please Enter aa sequence:  thyysewqhrlfjdskfhds


thyysewqhrlfjdskfhds


As we go along, we'll learn how to check if the user has entered a valid DNA or amino acid sequence. For now, let's learn some basics.

### Getting help
Python has extensive help built in. You can execute **help()** for an interactive help session or **help(x)** for any library, object or type **x** to get more information. For example:

In [21]:
help()


Welcome to Python 3.7's help utility!

If this is your first time using Python, you should definitely check out
the tutorial on the Internet at https://docs.python.org/3.7/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules.  To quit this help utility and
return to the interpreter, just type "quit".

To get a list of available modules, keywords, symbols, or topics, type
"modules", "keywords", "symbols", or "topics".  Each module also comes
with a one-line summary of what it does; to list the modules whose name
or summary contain a given string such as "spam", type "modules spam".



help>  print


Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



help>  q



You are now leaving help and returning to the Python interpreter.
If you want to ask for help on a particular object directly from the
interpreter, you can type "help(object)".  Executing "help('string')"
has the same effect as typing a particular string at the help> prompt.


In the interactive session above, we entered **print** at the `help>` prompt and then **q** to quit. Alternatively, you can obtain the same information by typing:

In [29]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



In [22]:
?input(prompt='Enter your name: ')

Signature: input(prompt='')
Docstring:
Forward raw_input to frontends

Raises
------
StdinNotImplentedError if active frontend doesn't support stdin.
File:      ~/miniconda3/envs/bioinf/lib/python3.7/site-packages/ipykernel/kernelbase.py
Type:      method



You can also print several items at once and specify the separator (delimiter) placed between them. Below, we use a tab:

In [13]:
print('Name', 'ID', 'Age', 'Gender', sep='\t')

Name	ID	Age	Gender
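
As a small sketch of the other keyword arguments listed in the help text, the example below uses `sep`, `end` and `file` together. Writing to an in-memory `io.StringIO` stream instead of the screen is an assumption made here so the exact characters can be inspected; the values `'chr1'`, `100` and `200` are made up for illustration.

```python
import io

buffer = io.StringIO()   # an in-memory text stream, used here instead of the screen

# sep goes between the values, end goes after the last one
print('chr1', 100, 200, sep='\t', end='\n', file=buffer)

# an empty separator joins the values with nothing in between
print('A', 'C', 'G', 'T', sep='', file=buffer)

text = buffer.getvalue()  # everything that was "printed" into the buffer
print(text)
```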


# Variables & Values

A variable is a name that refers to a value. In Python, a variable is created by assigning a value to it, as follows:

In [14]:
a =  2+3

In [15]:
a

5

In [33]:
x = 2          # anything after a '#' is a comment
y = 5
xy = 'Hey'
print(x+y, xy, sep="\t")

7	Hey


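The `end` argument sets what is printed after the last value. Below, a tab replaces the default newline, so both outputs appear on one line:

In [41]:
print('test', 'one', 'two', end='\t')
print('test', 'one', 'two', end='\t', flush=True)

test one two	test one two	

Multiple variables can be assigned the same value.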
In [17]:
x = y = 1
print(x,y)

1 1


To understand how Python assigns variables we will use http://www.pythontutor.com/visualize.html, which visualizes what goes on behind the scenes as you assign a value to a variable.
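
A minimal example that could be stepped through to see assignment in action: after `x = y = 1` both names refer to the same value, and assigning a new value to `x` afterwards leaves `y` unchanged.

```python
x = y = 1      # both names now refer to the value 1
x = 2          # x is re-assigned; y still refers to 1
print(x, y)
```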

### Datatypes
The basic types built into Python include `float` (floating point numbers), `int` (integers), `str` (unicode character strings) and `bool` (boolean). Some examples of each:

#### Integers

Their type is `int`, and they can have as many digits as you want.

In [50]:
type(1) == type(1.0) # an int and a float are different types

False

In [19]:
-12 #a negative integer

-12

In [20]:
+123 # a positive integer

123

In [52]:
True, False

(True, False)

#### String

A string is enclosed in a pair of single or double quotes.

In [21]:
dna="ATCGTAGTACGGTA"
type(dna)

str

When you have long strings, you can enclose them in triple double quotes. This allows the string to span several lines.

In [49]:
aa = """MKQLNFYKKN SLNNVQEVFS YFMETMISTN RTWEYFINWD KVFNGADKYR NELMKLNSLC GS
LFPGEELK SLLKKTPDVV KAFPLLLAVR DESISLLD"""

print(aa)

type(aa)

MKQLNFYKKN SLNNVQEVFS YFMETMISTN RTWEYFINWD KVFNGADKYR NELMKLNSLC GS
LFPGEELK SLLKKTPDVV KAFPLLLAVR DESISLLD


str
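
A short sketch of what a triple-quoted string actually stores: the line break typed inside the quotes becomes a newline character (`'\n'`) in the string. The shortened sequence `peptide` is only an illustrative fragment of the one above.

```python
peptide = """MKQLNFYKKN
SLNNVQEVFS"""

# the line break inside the quotes is part of the string
has_newline = '\n' in peptide
print(has_newline)

# the same string written on one line with an explicit newline character
same = peptide == 'MKQLNFYKKN\nSLNNVQEVFS'
print(same)
```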

#### Float

Used to represent floating point numbers, which are written with a decimal point (for example `2.0`) or in scientific notation (for example `1e3`).

In [23]:
2.0           # a simple floating point number

2.0

#### Booleans

There are only two Boolean values: True and False.

In [24]:
True or False # 'or' combines the two boolean values; the result is True

True

In [25]:
'AT' in dna

True
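
To tie the four basic types together, here is a small sketch that asks Python for the type of one value of each kind. The variable names and values (`count`, `gc_fraction`, `has_motif`) are hypothetical and chosen only for illustration.

```python
count = 14                    # int
gc_fraction = 0.5             # float
dna = "ATCGTAGTACGGTA"        # str
has_motif = 'AT' in dna       # bool, the result of a membership test

types = (type(count), type(gc_fraction), type(dna), type(has_motif))
print(types)
```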

# Operators

## Arithmetic Operators

| Symbol | Task Performed |
|----|---|
| +  | Addition |
| -  | Subtraction |
| /  | Division |
| %  | Modulo (remainder) |
| *  | Multiplication |
| //  | Floor division |
| **  | Exponentiation (to the power of) |

When one of the numbers in the operation is a float, the result is also a float. 

In [53]:
2.0 + 1

3.0

In [27]:
1+2

3

In [28]:
2-1

1

In [54]:
1*2.0

2.0

In [30]:
3/4

0.75
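
A brief sketch of how the result type depends on the operands and the operator: mixing an `int` with a `float` gives a `float`, and in Python 3 the `/` operator gives a `float` even when both operands are integers and the division is exact.

```python
mixed = 2.0 + 1        # int and float mixed: the result is a float
whole = 1 + 2          # two ints: the result stays an int
quotient = 4 / 2       # '/' returns a float even for an exact division

print(type(mixed), type(whole), type(quotient))
```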

In many languages (and in older versions of Python), dividing two integers truncates the result, so 1/2 = 0. In Python 3, `/` always performs true division, and integer-style division is provided by a separate operator that rounds down (i.e. `a // b` $= \lfloor \frac{a}{b} \rfloor$):

In [31]:
3//4.0

0.0
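
Rounding down is not the same as truncating once negative numbers are involved. The sketch below compares `//` with truncation toward zero; using `int()` to truncate is a choice made here for illustration.

```python
floor_pos = 7 // 2          # 3: rounds down
floor_neg = -7 // 2         # -4: rounds down, away from zero
truncated = int(-7 / 2)     # -3: int() drops the fractional part instead

print(floor_pos, floor_neg, truncated)
```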

The modulo operator `%` returns the remainder after division.

In [32]:
15%10

5
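
Floor division and modulo fit together: the quotient times the divisor, plus the remainder, gives back the original number. As a hedged illustration, the sketch below applies this to a hypothetical sequence length of 14 bases split into codons of 3.

```python
seq_length = 14                     # hypothetical number of bases
codon_size = 3

full_codons = seq_length // codon_size   # complete codons
leftover = seq_length % codon_size       # bases that do not fill a codon

# quotient * divisor + remainder reconstructs the original number
rebuilt = full_codons * codon_size + leftover
print(full_codons, leftover, rebuilt)
```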

Python natively supports integers of arbitrary size (limited only by available memory), while floating point numbers are double precision and have a finite range:

In [33]:
11**300

2617010996188399907017032528972038342491649416953000260240805955827972056685382434497090341496787032585738884786745286700473999847280664191731008874811751310888591786111994678208920175143911761181424495660877950654145066969036252669735483098936884016471326487403792787648506879212630637101259246005701084327338001

In [34]:
11.0**300

OverflowError: (34, 'Numerical result out of range')
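
The error above can be explained by the upper limit of double precision floats. A minimal sketch, assuming the standard `sys.float_info.max` attribute is used to look up that limit, and using `try`/`except` (covered later) only to catch the error so the example runs to completion:

```python
import sys

big_int = 11**300                     # ints grow as needed, so this is exact
largest_float = sys.float_info.max    # largest finite double, about 1.8e308

try:
    11.0**300                         # the same calculation with a float
    overflowed = False
except OverflowError:
    overflowed = True

print(big_int > largest_float)
print(overflowed)
```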

## Relational Operators

| Symbol | Task Performed |
|----|---|
| == | Equal to |
| != | Not equal to |
| < | Less than |
| > | Greater than |
| <= | Less than or equal to |
| >= | Greater than or equal to |

Note the difference between `==` (equality test) and `=` (assignment).

In [35]:
z = 2
z == 2

True

In [36]:
z > 2

False

Comparisons can also be chained in the mathematically obvious way. The following will work as expected in Python (but not in other languages like C/C++):

In [37]:
0.5 < z <= 1

False
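
A chained comparison behaves like the individual comparisons joined with `and`. The sketch below spells this out for the cell above and adds a second range that `z` does fall into.

```python
z = 2

chained = 0.5 < z <= 1
expanded = (0.5 < z) and (z <= 1)    # the same test written out in full
in_range = 1 < z <= 2                # a range that does contain z

print(chained, expanded, in_range)
```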

### String Operations

Four binary operators act on strings: **in**, **not in**, **+**, and **\***

Let's use the mitochondrial tRNA (NCBI Reference Sequence: NC_012920.1) to practice with string operations:

In [55]:
trna='AAGGGCTTAGCTTAATTAAAGTGGCTGATTTGCGTTCAGTTGATGCAGAGTGGGGTTTTGCAGTCCTTA'

In [56]:
# we can check if a given motif is in sequence

'ATTAA' in trna

True

In [40]:
# We can also check if a given motif is absent

'GGCTGTT' not in trna

True

In [41]:
# we can concatenate two strings

'ATTAA' + 'GGCTGTT'

'ATTAAGGCTGTT'

In [42]:
# Create a long string from a substring by multiplying with an integer
'GGCTGTT' * 4

'GGCTGTTGGCTGTTGGCTGTTGGCTGTT'
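
The four operators can also be combined and their results stored in variables. A minimal sketch using the same tRNA sequence; the names `motif`, `repeat` and `extended` and the `'AT'` repeat unit are hypothetical and chosen only for illustration.

```python
trna = 'AAGGGCTTAGCTTAATTAAAGTGGCTGATTTGCGTTCAGTTGATGCAGAGTGGGGTTTTGCAGTCCTTA'
motif = 'ATTAA'

found = motif in trna               # membership test
absent = 'GGCTGTT' not in trna      # absence test
repeat = 'AT' * 3                   # repetition
extended = trna + repeat            # concatenation

print(found, absent, repeat)
print(extended)
```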

We'll continue with string formatting in the next lecture. 