# <h1><b>Python for Life Sciences</b></h1>

***

# *Introduction*

Python is a flexible programming language that is widely used for research in the life sciences, especially to organise and make sense of the large volumes of data that already exist and that are generated daily.

The main reasons for the popularity of Python are:

 - it can process, analyse and visualise data,
 - it is easy to learn, and
 - it lets users accomplish many tasks with few lines of code.
    
In this notebook you'll first get introduced to basic concepts for programming with Python. 

**At the end of this session, you'll know:**
 - Basic data types in Python
 - How to operate upon these data types

The interactive nature of this notebook lets you follow along with your own code and try out new things with Python.  You will often hear Python referred to as Python 3.  Python 3 is the current major version of the language and the one we are using.


# *Required Basic Ideas in Python*
   - Identifiers
   - Numbers
   - Strings
   - Lists
   - Tuples
   - Sets
   - Dictionaries
   - Classes
   - Functions



In [1]:
# A simple command: the length of a string
len('MNKMDLVADVAEKTDLSKAKATEVIDAVFA')

30

## **Identifiers**

A Python identifier is a name used to identify an object such as a variable, function or class. A valid identifier begins with a letter (A to Z or a to z) or an underscore (_), followed by zero or more letters, digits (0 to 9) or underscores.

Punctuation symbols such as @, $, and % are not allowed within Python identifiers. Python is a **case-sensitive** programming language; hence, **Standard** and **standard** are two separate identifiers in Python.
The following naming conventions are widely used for Python identifiers:
- Class names begin with an upper-case letter; most other identifiers begin with a lower-case letter.
- An identifier that begins with a single underscore is private by convention.
- An identifier that begins with two underscores is strongly private; inside a class, Python mangles such names to make accidental access harder.
- An identifier that also ends with two underscores is a language-defined special name.
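
The syntax rules for identifiers can be checked interactively. The sketch below uses the built-in `str.isidentifier()` method and the standard `keyword` module; the candidate names are made up purely for illustration.

```python
import keyword

# Candidate names (hypothetical, chosen only to illustrate the rules)
candidates = ["gene_count", "_private", "2nd_sample", "protein$",
              "Standard", "standard", "class"]

results = {}
for name in candidates:
    # A usable name must follow the identifier syntax
    # and must not be a reserved keyword such as 'class'
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", results[name])
```

Names that start with a digit, contain punctuation, or collide with a keyword are rejected, while `Standard` and `standard` are both accepted as two distinct names.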

## **Numbers**
Numeric data types hold the values used in mathematical calculations. Python 3 has three built-in numeric types:
- *Integer* (`int`): A whole number, optionally preceded by a + or - sign. Python 3 integers can be arbitrarily large, so the separate *long* type of Python 2 no longer exists.
- *Float* (`float`): A real number with a decimal point or an exponent.
- *Complex* (`complex`): A number with a real and an imaginary part, where the imaginary unit is the square root of -1 (written `j` in Python).
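
As a quick illustration, the built-in `type()` function reports which numeric type a value has. The values below are arbitrary examples; note that in Python 3 a single `int` type covers whole numbers of any size.

```python
n = 2 ** 100          # integers grow as needed in Python 3
ratio = 7 / 2         # true division always returns a float
z = 3 + 4j            # complex literal: real part 3, imaginary part 4

for value in (n, ratio, z):
    print(value, type(value).__name__)
```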

### **Basic operations**
Here are a few examples of basic operations on real numbers, with their outputs:

In [2]:
x, y = 3, 4   ## Python allows multiple assignments!
a, b = 5, 2

print("Sum of x and y: ",x+y)  # Addition
print("Difference of x and y: ", x-y)  # Subtraction
print("Product of a and b: ", a*b)  # Product
print("Integer quotient of a and b: ", a//b) # Note the // operator for integer division
print("Floating point quotient of a and b: ", a/b)  # Floating point division
print("--------------------------")

# Floating point numbers
e = 4e-2; f = 1e-5 # scientific notation: 4e-2 means 4 times 10 raised to the power -2
print("Floating point quotient of e and f: ", e/f)  # Floating point division


Sum of x and y:  7
Difference of x and y:  -1
Product of a and b:  10
Integer quotient of a and b:  2
Floating point quotient of a and b:  2.5
--------------------------
Floating point quotient of e and f:  3999.9999999999995


You might have expected 4000 as the answer (and correctly so), but floating point calculations carry small rounding errors that cannot always be avoided.  Python offers the **_round()_** function to round a number to a given precision.  For instance, the following code snippet rounds our quotient to two decimal places.

In [3]:
print("Floating point quotient of e and f: ", round(e/f,2))  # Floating point division

Floating point quotient of e and f:  4000.0
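
Rounding is useful for display. When two floating point values need to be *compared*, a common alternative is a tolerance-based check. The sketch below uses `math.isclose()` from the standard library with its default tolerance; whether that tolerance suits a given analysis is an assumption you should check for your own data.

```python
import math

e, f = 4e-2, 1e-5
quotient = e / f

print(quotient == 4000)               # exact comparison can fail because of rounding error
print(math.isclose(quotient, 4000))   # tolerant comparison (default relative tolerance 1e-09)
```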


#### **Handling Division by Zero**
Division by 0 is undefined, but Python gives us a way to handle it.

In [4]:
x = 5; y = 0   ## Note multiple statements on a line
# Let us divide by 0 and see what happens
x/y

ZeroDivisionError: division by zero

As expected, we got an error.  In Python lingo this is called an `exception`, and Python has techniques to handle such errors, which would otherwise cause your program to exit unexpectedly.

Let us try the same division by 0 but this time with more graceful handling of the error.

In [None]:
try:
    print(x/y)
except ZeroDivisionError:   # catch only the error we expect, not every possible error
    print("Division by 0 is not allowed!")

This was basic _exception handling_. The division by zero was the 'exception'.
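
The same idea can be wrapped in a small reusable function. This is only a sketch: the name `safe_divide` and the choice to return `None` for a zero denominator are illustrative assumptions, not a standard Python facility.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Only this specific error is handled; any other error still surfaces
        return None

print(safe_divide(5, 2))
print(safe_divide(5, 0))
```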

#### **Complex numbers**
Complex numbers have a real and an imaginary part and find use in certain research disciplines.

In [24]:
z1 = complex(1,2)
z2 = complex(3,4)
print("Real and Imaginary parts: ",z1.real, z2.imag)
print("Complex conjugate: ", z1.conjugate())
print("Product: ", z1*z2)
print("Quotient: ", z1/z2)

Real and Imaginary parts:  1.0 4.0
Complex conjugate:  (1-2j)
Product:  (-5+10j)
Quotient:  (0.44+0.08j)


#### **Logical Operations**
- Python expressions can be evaluated for their 'truth value', which is either **True** or **False**.
- Truth values are combined according to the laws of _Boolean Logic_.
- The standard _Boolean operators_ are **not**, **and**, and **or**.
- These operators are written as words; Python does not accept symbols such as `&&` or `||` in their place.

The examples below show how each operator behaves in _Boolean Expressions_.

In [25]:
#not: inverts the truth value
print(not True)  # False
print(not False) # True

#and: True only when both operands are True
print(True and True) # True
print(True and False) # False
print(False and False) # False

#or: True when at least one operand is True
print(True or True) # True                 
print(True or False) # True
print(False or True) # True
print(False or False) # False

False
True
True
False
False
True
True
True
False


### **Comparison Operators**
In addition to the Boolean operators, Python has six _Comparison Operators_ that return Boolean results:
[==, >, >=, !=, <=, <].
Logical and comparison operations take some practice; experimenting in the Python interpreter is a good way to become familiar with them.

These operators are used very often. Here are a few examples:

In [26]:
3 == 7//2  #is 3 equal to the integer quotient of 7 and 2? (True)
4 > 14%5   #is 4 greater than the remainder of 14 divided by 5? (False, since 14%5 is 4)

#the != operator checks for inequality
'five' != 'three'   #True
'two' != 'two'      #False

#A notebook cell displays only the value of its last expression.

False
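
Comparison and Boolean operators are often combined. The sketch below classifies a short DNA sequence by its GC fraction; the thresholds (0.4, 0.6 and 20) are arbitrary values chosen for illustration, not biological standards.

```python
seq = 'ACGCATAGTCATG'
gc_fraction = (seq.count('G') + seq.count('C')) / len(seq)

# Chained comparison: equivalent to (0.4 <= gc_fraction) and (gc_fraction <= 0.6)
is_balanced = 0.4 <= gc_fraction <= 0.6
is_short = len(seq) < 20

print(gc_fraction)
print(is_balanced, is_short)
print(is_balanced and not is_short)
```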

### Your turn now!

Do this short quiz to test your understanding.

1. Add 1 and 2.5.  What do you get?

2. Subtract 2 from 4. What is the difference?

3. Multiply 3 with 5.2. What is the product?

4. Divide 9.1 by 4. What is the quotient?


In [30]:
# Your code here!
print(1+2.5)         #(1.)
print(4-2)           #(2.)
print(3*5.2)         #(3.)
print(9.1/4)         #(4.)

3.5
2
15.600000000000001
2.275


## **Strings**
Strings are among the most commonly used data types in programming languages, including Python. Put simply, a string is a sequence of characters enclosed in quotes (single or double).

In [31]:
# A string-syntax sample in Python:
example = "Sample string"
print(example)

Sample string


## **String Operations**
Four binary operators act on strings: *in*, *not in*, *+*, and _*_.
The first three operators require both operands to be strings. The fourth operator takes one string and one integer.
A single-character substring (part of a string) can be extracted by subscription and a longer substring by slicing. Both operations use square brackets.

### **String Operators**
The _in_ and _not in_ operators check whether the first string occurs anywhere within the second. The result of these operators is either _True_ or _False_.

Here are some examples of these string operators:


In [32]:
#Checks whether the first string (a codon) is present in the second (a genetic sequence); the cell displays only the last result
'AGT' in 'ACGCATAGTCATG'   #True
'ATT' in 'ACGCATAGTCATG'   #False

False

In [33]:
#Checks whether the first string (a codon) is absent from the second (a genetic sequence); the cell displays only the last result
'AGT' not in 'ACGCATAGTCATG'   #False
'ATT' not in 'ACGCATAGTCATG'   #True

True
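
The _in_ operator only reports whether a match exists. To learn *where* a match occurs, or how many times, the string methods `find()` and `count()` can be used, as in this small sketch on the same sequence:

```python
sequence = 'ACGCATAGTCATG'
codon = 'AGT'

if codon in sequence:
    position = sequence.find(codon)   # index of the first match (find returns -1 if absent)
    print(codon, "found at index", position)
else:
    print(codon, "not found")

print(sequence.count('CAT'))          # number of non-overlapping occurrences
```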

#### **Symbolic Operators that work with Strings:**
These are examples of operators and methods that we can apply to strings:

In [34]:
str1 = "Hello"; str2 = "World"
print("Concatenation: ", str1+" "+str2) # Note the insertion of a blank space between words
print("Case conversion: ", str1.upper()+" "+str2.upper())
print("String reversal: ", str1[::-1] + " " + str2[::-1]) # Reverse and then concatenate
print("String reversal: ", (str1 + " " + str2)[::-1]) # Concatenate and then reverse

# The [::-1] notation is explained in the section on slicing

Concatenation:  Hello World
Case conversion:  HELLO WORLD
String reversal:  olleH dlroW
String reversal:  dlroW olleH


##### **Concatenation**
_Concatenation_ is the process of joining sequences end to end.

In biology, _concatemer_ (or _concatenate_) originally referred to an intermediate structure formed during the biosynthesis of certain DNA viruses, in which many genomes are connected end to end; it is treated as the immediate precursor to the mature viral genome.

The term is now generalized to include any linear or circular DNA structure composed of viral genomes joined end to end. It is also used for circular DNAs that are physically inseparable because one is threaded through the other.
Examples of concatenation (along with other operations) on ordinary strings are given above.
Examples of concatenation of biological sequences (written as Python strings) are given below.

In [35]:
'TC'+'AG'     #Short Sequence pair Concatenation

'TCAG'

In [36]:
'ttt' + 'ccc' + 'aaa' + 'ggg'  #Lower Case Longer Sequence Concatenation 

'tttcccaaaggg'

##### **Multiplication**
A string can be repeated a specific number of times by multiplying it by an integer:

**Repetition is commutative: the string and the integer can be given in either order**

In [37]:
'TC' * 6    #same result as the line below

6 * 'TC'

'TCTCTCTCTCTC'

### **Subscription**
_Subscription_ extracts a one-component substring of a string. It is written as a pair of square brackets enclosing an integer-valued expression called an _index_. The first component in the sequence is at position 0 (not 1).

In [38]:
#Subscription syntax (Python 3, applied to a sequence of amino acids)
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[0]

'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[1]

'N'

The index can also be a negative number, in which case the index is counted from the end of the string. The last component is at index −1.

In [39]:
#'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[7 // 2]
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-1]
#'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-5]

'A'
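
A negative index `-k` refers to the same component as the positive index `len(s) - k`. A short check on the same sequence (the variable names are chosen here for illustration):

```python
protein = 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'
n = len(protein)

print(protein[-1] == protein[n - 1])   # last component
print(protein[-5] == protein[n - 5])   # fifth component from the end
```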

![image.png](attachment:6e2e8f66-7de0-45cc-9c57-7bd1c09e2631.png)

### **Slicing**
_Slicing_ extracts a series of consecutive components from a string. It is a clear and concise way to refer to parts of strings.
The boundaries of a slice are given by integers within square brackets, separated by colons. The first index is the position of the first component to be extracted. The second index is the end-point of the slice, but the component at that position is excluded. A slice [m:n] is therefore read as “starting from position m up to but excluding position n.”

In [40]:
#Sequence Slicing Syntax (Python3 with same sequence) 
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[1:8]

'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[7:-1]

#Final command line in a cell that displays result:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-4:-3]   

'A'

#### **What if (m = n)?**
When both indexes are equal, the result of the slice is always an empty string:

In [41]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[7:7] #An example: m = n

''

#### **What if (m > n)?**
When the start index lies beyond the stop index (i.e. m > n), slicing with the default step again gives an empty string:

In [42]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-7:-9] #An example: m > n

''

The rules for slicing are flexible in other ways too.

If the slice starts at the beginning or runs to the end of the string, that part of the slice notation may be omitted. Note that omitting the second index is not the same as providing −1 as the second index.
Omitting the second index means going up to the end of the string, including the last component, whereas a second index of −1 means going up to but excluding the last component:

In [43]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[:-1]
#Result:Entire sequence,excluding the last n component(s).

'MNKMDLVADVAEKTDLSKAKATEVIDAVF'

In [44]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[:8]
#Result:First n component(s) of given sequence.

'MNKMDLVA'

In [45]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[7:]
#Result:Excluding First m component(s) of given sequence.

'ADVAEKTDLSKAKATEVIDAVFA'

In [46]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[7:-1]
#Result:Excluding First m and last n component(s) of given sequence.

'ADVAEKTDLSKAKATEVIDAVF'

In [47]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[:]
#Result: Entire string (for no index)

'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'
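
A typical use of slicing in sequence analysis is to cut a DNA string into codons. The sketch below assumes a made-up 21-base sequence whose length is a multiple of three, and relies on the built-in `range()` to produce the start positions 0, 3, 6 and so on.

```python
dna = 'ATGGCCATTGTAATGGGCCGC'   # hypothetical sequence, 21 bases

# Each slice dna[i:i + 3] takes three consecutive bases starting at position i
codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
print(codons)
```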

#### **Three indexes in String Slicing Syntax (m:n:o)**
With three indexes, the slice starts at component m (start index), stops before component n (stop index) and takes every o-th component in between. The third index (o) is known as the _step_: after each component that is taken, the slice moves o positions forward, skipping o − 1 components. When the third index is absent, the default step is 1, meaning nothing is skipped.
An example:

In [48]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[0:9:2]
#Result: Every second component from m(0) up to, but excluding, n(9).

'MKDVD'

##### **What for a Negative Step(o)?**
The step (o) may be a negative integer. When the step is negative, the slice takes components from the sequence in reverse order. To get a non-empty result with a negative step, the start index should be greater than the stop index (i.e. m > n). The result excludes the component at the stop index (n).

###### **What if the Stop Index(n) is not given?**
When the stop index is absent and the step is negative, the slice continues all the way to the first component of the string, which is included in the result.

###### **What if the First Index(m) is absent?**
When the start index is absent and the step is negative, slicing begins from the last component.

###### **Entire Sequence Reversal**
A fully reversed sequence is produced by omitting both indexes (m and n) and giving a step of −1.

Examples with a negative step:

In [49]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[15:0:-3]
#Result:Every third, backward component between m(15) and n(0)

'LKVVM'

In [50]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[15::-3] # No Stop Index
#Result: Same as before, but the slice continues to index 0, which is now included

'LKVVMM'

In [51]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[:26:-1] # No First Index
#Result: Slicing backward from the end, stopping before index 26

'AFV'

In [52]:
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[::-1] # Only Third Index
#Result:Entirely reversed sequence

'AFVADIVETAKAKSLDTKEAVDAVLDMKNM'
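
The full-reversal slice `[::-1]` is handy for computing the reverse complement of a DNA strand. The sketch below pairs it with the built-in `str.maketrans()` and `str.translate()` methods and assumes the sequence contains only the upper-case letters A, C, G and T.

```python
dna = 'ACGCATAGTCATG'

# Translation table that maps each base to its complement (A<->T, C<->G)
complement_table = str.maketrans('ACGT', 'TGCA')

# Complement every base, then reverse the whole string
reverse_complement = dna.translate(complement_table)[::-1]
print(reverse_complement)
```

Applying the same two steps to the result returns the original sequence, which is a quick way to sanity-check the table.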

![image.png](attachment:d8d97acf-c9af-4e0e-9765-1563f5296d0b.png)

### **Lexical Comparison in Python**
Strings are compared lexically, that is, character by character according to their character codes; for letters of the same case this corresponds to alphabetical order. Sorting a list of strings therefore arranges them in ascending alphabetical order.

In [22]:
arr = ['one', 'two', 'three','four','five','six','seven','eight']
arr.sort()
print(arr)     #final output based on ascending alphabetical lexical sort of given word array.

'three' > 'eight'   #comparison based on lexical properties of both words in sorted array.

['eight', 'five', 'four', 'one', 'seven', 'six', 'three', 'two']


True
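
Because the comparison is based on character codes, upper-case letters sort before lower-case ones. The sketch below shows this with the built-in `ord()` function and a case-insensitive sort using `key=str.lower`; the protein names are just sample strings.

```python
print(ord('A'), ord('a'))            # code points: upper-case letters come first
print('Zebra' < 'apple')             # compares 'Z' with 'a'

names = ['actin', 'Tubulin', 'Myosin', 'keratin']   # sample strings for illustration
case_sensitive = sorted(names)
case_insensitive = sorted(names, key=str.lower)
print(case_sensitive)
print(case_insensitive)
```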

In [23]:
help(str)  ## getting help in Python

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

In the string examples above, a string is assigned to a variable; the variable then acts as a name through which the string value can be retrieved.
The remaining four types described below (lists, tuples, sets and dictionaries) are usually treated as data structures rather than simple data types. Data structures are containers that organise data so that a program or application can use it efficiently.
Although Python programmers rely heavily on these data structures, they still need to be familiar with the primary data types (e.g. numbers and strings).


## **Lists**
Lists are ordered, changeable sequences of data.
To make a list, the components are separated by commas and enclosed in square brackets.
Every component of a list automatically receives an index number (starting from 0), and each component can be retrieved using its index.
Lists are well suited to collecting and reviewing long data sets, because storing data in a list makes it simple to compute summary statistics.
An example of a list:

In [20]:
example2= ["Rollno", "Name", "score", "average"] #string list

rollno= [20,40,130,243,513]                                #roll-number list
score= [90,80,89,75,89]                                    #score list
rollno[4]              #Access by index (not displayed, since it is not the last expression)

expr= example2[3]      #Access of a specific component within the given list
print(expr)
L= len(rollno)         #Number of components in the list
avg= sum(score)/L      #Calculation using the list
print(avg)

average
84.6
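
Unlike strings, lists can be changed after they are created. A brief sketch using the same scores (the new values 95 and 82 are made up for illustration):

```python
score = [90, 80, 89, 75, 89]

score.append(95)        # add a new value at the end
score[1] = 82           # replace the value at index 1
top_three = sorted(score, reverse=True)[:3]   # sorted() returns a new list; score is unchanged

print(score)
print(top_three)
print(min(score), max(score), sum(score) / len(score))
```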


## **Tuples**
Tuples are similar to lists, but their content cannot be altered after creation. A tuple is created with parentheses instead of square brackets. If data needs to be kept read-only, a tuple can be used to represent it.
Given below is an example of a tuple:


In [53]:
rollno= (20,40,130,243,513)                                #roll-number tuple
print(rollno[3])                                               #reading a component by index is allowed

243


In [54]:
rollno[3] = 244 # gives an error!! tuples cannot be changed

TypeError: 'tuple' object does not support item assignment
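
Two related idioms are worth knowing: *unpacking* a tuple into separate names, and building a *new* tuple when a changed value is needed. The sketch below also catches the `TypeError` shown above so that the cell keeps running.

```python
rollno = (20, 40, 130, 243, 513)

# Unpacking: the starred name collects the remaining items as a list
first, *middle, last = rollno
print(first, middle, last)

try:
    rollno[3] = 244
except TypeError as error:
    print("Tuples are immutable:", error)

# Build a new tuple from slices of the old one instead of modifying it
updated = rollno[:3] + (244,) + rollno[4:]
print(updated)
```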

## **Sets**
A set is another data collection like a list, but it is unordered and cannot have _duplicate_ items. Unlike lists, no index number can be used to refer to an item of a set. You can, however, check whether an item is a member of a set and combine sets with operations such as union, intersection and difference.

_A Set display_: Comma-separated values enclosed in curly brackets ({}) indicate a set; the brackets enclose the elements of the set.

In [5]:
set('TCAGTTAT')           #The set of unique characters; sets are unordered, so the display order may vary

{'A', 'C', 'G', 'T'}

Converting the string 'TCAGTTAT' to a set removed all the duplicates and gave us the unique items(characters) in the  string. You will note that sets are represented as comma-separated items, enclosed in curly brackets.  You can also create a set by enclosing a comma-separated list of items in curly brackets as follows:

In [6]:
another_set = {'A','A','C','G','T'}
another_set

{'A', 'C', 'G', 'T'}

Note how duplicate items show up only once.  To create an empty set, do the following:

In [7]:
empty = set()
empty

set()

Empty braces do not create an empty set, but calling set without arguments does, as we saw above.  Empty braces instead create an empty dictionary.

Let's observe a few more diverse examples:

In [8]:
DNApues = {'G','T','A','C'}          #DNA bases: purines (A, G) and pyrimidines (C, T)
DNApues                              #Output(Result): the set (display order may vary)

{'A', 'C', 'G', 'T'}

In [9]:
RNApyrms = {'G','U','A','C'}          #RNA bases: purines (A, G) and pyrimidines (C, U)
RNApyrms                              #Output(Result): the set (display order may vary)

{'A', 'C', 'G', 'U'}

In [10]:
#When a group of strings, held in braces makes a set, no string is broken into components
{'UCAG', 'TCAG'}     

{'TCAG', 'UCAG'}

In [13]:
A = DNApues
B = RNApyrms

In [14]:
diffrA = A - B
print(diffrA)                     #The output is the unique data present only in set A

{'T'}


In [15]:
diffrB = B - A
print(diffrB)                     #The output is the unique data present only in set B

{'U'}


The output in this case is (A $\cap$ B):

In [18]:
print(A.intersection(B))       #The output is the common data present in both sets (A & B)

{'G', 'A', 'C'}


In [19]:
print(A.union(B))       #The output is a set with all data present in A or B (in any order)

{'T', 'U', 'A', 'C', 'G'}
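
The same operations are also available as operators: `&` for intersection, `|` for union and `^` for the symmetric difference (items found in exactly one of the two sets). A short sketch with the DNA and RNA base sets, redefined here so that the cell runs on its own:

```python
A = {'G', 'T', 'A', 'C'}   # DNA bases
B = {'G', 'U', 'A', 'C'}   # RNA bases

print(A & B == A.intersection(B))
print(A | B == A.union(B))
print(sorted(A ^ B))        # sorted() turns the unordered set into an ordered list
print('U' in A, 'U' in B)   # membership tests
```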


## **Dictionaries**
A dictionary is a data structure made of key/value pairs: every value corresponds to a particular key, and the value is accessed using that key. The following example shows the key/value syntax:

In [55]:
example4={"Rollno":513, "Name":"Shobha", "score":75, "total":93}
print(example2[2])          #example2 is the list from the Lists section: access by position
print(example2[3])
print(example4["Name"])     #dictionary: access by key
print(example4["score"])
print(example4["total"])


score
average
Shobha
75
93


In [56]:
M = {'fruits': 2, 'cake': 1, 'snax': 3}
M['fruits']                              #Code to get a value by key-mention.

2

In [57]:
M                                        #Dictionaries keep insertion order (Python 3.7+).

{'fruits': 2, 'cake': 1, 'snax': 3}
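
In the life sciences, dictionaries are a natural fit for lookup tables. The sketch below uses a deliberately partial codon table (four entries from the standard genetic code, with `*` marking a stop codon) to show `get()` with a default value, adding a new pair, and looping over `items()`.

```python
# Partial codon table for illustration only; a real table has 64 entries
codon_table = {'ATG': 'M', 'TGG': 'W', 'TAA': '*', 'GCC': 'A'}

print(codon_table.get('ATG'))          # value for an existing key
print(codon_table.get('CCC', '?'))     # default returned for a missing key

codon_table['AAA'] = 'K'               # add a new key/value pair
for codon, amino_acid in codon_table.items():
    print(codon, '->', amino_acid)
```

Using `get()` avoids the `KeyError` that `codon_table['CCC']` would raise for a key that is not in the table.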

## **Conclusion**

With its intuitive data structures, and ease of learning, Python gives us researchers the
ability to view, manipulate and process our data in unique ways.

What we have learned so far are the fundamental concepts and programming constructs that
can help us build more advanced techniques to process research data in Life Sciences or for that matter, any field of scientific endeavour.


Python 3 offers systematic ways to organise the data gathered in research. It is an object-oriented programming language with a well-defined syntax, and it continues to evolve through new releases; Python 3 is the version in current use.
Many subject areas depend on large and varied data repositories, and Python is a practical tool for working with them. The life sciences are a good example: they span many research areas in which Python 3 is used to organise all kinds of data.
Data organisation in the life sciences is rarely a linear process. Because these subjects are so varied, they do not follow a single fixed protocol.

## Quiz

### Your turn again: Lists, Tuples, Sets and Dictionaries
Run this short quiz to check your understanding.

1. List all even integers between 1 and 10.  What do you get?

2. Give an example of fixed data in the form of a tuple. (Do not try to change it.)

3. Convert any string into a set of letters.

4. Give a dataset (keys and values) in curly brackets (a dictionary) that represents partly
   organized data.

5. Can you show that (A $\cup$ B)' = A' $\cap$ B'?

In [6]:
#Example answers to the quiz.

evens = [2,4,6,8,10]                        #named 'evens' so that the built-in name 'list' is not shadowed
print(' ')
print('(1.)A list of even numbers (1-10):', evens)
#-------------------------------

pri_col = ('Blue','Red','Green')            #Tuple with primary colours from a rainbow
print(' ')
print('(2.)A single member of the tuple with a specific index no.:', pri_col[1])     
#------------------------------

aaseq = '''MWNSNLPKPN AIYVYGVANA NITFFKGSDI LSYETREVLL KYFDILDKDE RSLKNALKD LEN PFGFAPYI
RKAYEHKRNF LTTTRLKASF RPTTF'''   #Bacterial Restriction enzyme (Amino acid sequence)
len(aaseq)                       #Length of entire string within quotes (including spaces)
print(' ')
print('(3.)The entire set of unique components within a sequence:', set(aaseq))              
#------------------------------     #Output within curly brackets

aspects = {'name':'mouse', 'group':'mammal', 'genes':30000}
print(' ')
print('(4.) A specific Value as output by Key-mention:', aspects['group'])          

                #-------------------x-------------------


 
(1.)A list of even numbers (1-10): [2, 4, 6, 8, 10]
 
(2.)A single member of the tuple with a specific index no.: Red
 
(3.)The entire set of unique components within a sequence: {'N', 'W', 'K', 'P', 'V', 'H', 'L', 'I', '\n', 'S', 'Y', 'A', 'M', 'R', 'G', 'E', 'T', ' ', 'F', 'D'}
 
(4.) A specific Value as output by Key-mention: mammal
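
For question 5, one way to check the identity on a concrete case is sketched below. It assumes a universal set `U` of the integers 1 to 10 and two arbitrary subsets; the complement of a set is then its difference from `U`. A single example illustrates the law but does not prove it in general.

```python
U = set(range(1, 11))        # assumed universal set: integers 1 to 10
A = {2, 4, 6, 8, 10}         # even integers
B = {1, 2, 3}                # an arbitrary second set

lhs = U - (A | B)            # (A ∪ B)'
rhs = (U - A) & (U - B)      # A' ∩ B'

print(sorted(lhs), sorted(rhs))
print(lhs == rhs)
```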
