
# Practical 12

In this practical we will see how we can use the object oriented programming paradigm in Python and the functional paradigm through lambda functions.

## Slides

The slides of the introduction can be found here: [Intro](docs/Practical12.pdf)

## Object Oriented Programming

As seen in the lecture, Python is a multi-paradigm language. It supports the **imperative/procedural** paradigm (programs are sequences of statements that change the state of the system) and the **functional** paradigm (programs are seen as compositions of mathematical functions, e.g. list comprehensions); some libraries are **declarative** (they define the logic without specifying the control flow, e.g. Matplotlib); and it is also **Object Oriented**. In fact, **everything in Python is an object.** Moreover, as we will see, new data types can be defined in Python.
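
To make the difference between the paradigms more concrete, here is a minimal sketch that solves the same small task (the squares of the even numbers in a list) in three styles. The class name `NumberBag` and all variable names are made up for this illustration.

```python
# The same task (squares of the even numbers) written in three styles.
numbers = [1, 2, 3, 4, 5, 6]

# Imperative/procedural: a sequence of statements that update some state.
squares_imperative = []
for n in numbers:
    if n % 2 == 0:
        squares_imperative.append(n ** 2)

# Functional: the result is described by a single expression.
squares_functional = [n ** 2 for n in numbers if n % 2 == 0]

# Object oriented: the data and the functions working on it live together.
class NumberBag:  # hypothetical class, for illustration only
    def __init__(self, values):
        self.values = list(values)

    def even_squares(self):
        return [n ** 2 for n in self.values if n % 2 == 0]

squares_oop = NumberBag(numbers).even_squares()

print(squares_imperative, squares_functional, squares_oop)
```

All three variants compute the same list; what changes is how the computation is organized.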

In Object Oriented Programming (OOP), objects are **data structures that contain data (attributes) and the functions to work with them (methods)**. In OOP, programs are made of a set of objects that interact with each other.

OOP allows us to create a distinction (*abstraction*) between the way objects are **implemented** and the way objects are **used** (i.e. what we can do with them).

### Classes, Methods and Objects

The three key players of OOP are: **classes**, **objects** and **methods**.  

**Classes** (types) are an abstraction that captures: 

1. the internal data representation (i.e. data attributes that are called **fields**)
2. the interface to interact with the class (i.e. functions that can be used to manipulate the objects, called **methods**). 


**Objects** are **instances** of classes. Classes define the structure and are used to create objects. Objects are a *concrete* realization of the class, a real instance based on the footprint of the class. Programs are interactions among different objects. 


**Methods** are functions that can be applied to manipulate objects.

Attributes and methods within an instantiated object can be accessed by using the ```. ``` (dot) operator.


### Self
Within a method of a class, we can refer to the very instance on which the method is being called (or, in the initializer, the instance being created) by using a special argument that is called ```self```. **self** is always the first argument of each method.


<div class="alert alert-info">

**Important note:** All data types seen so far are in fact classes and every time that we used a data type (e.g. defining a list, a string etc.) we were in fact instantiating an object of that type (class).

</div>
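
As a quick check of the note above, the following sketch inspects a list and a string with `type` and calls some of their methods through the dot operator. The variable names are arbitrary.

```python
my_list = [3, 1, 2]
my_string = "acgt"

# Built-in data types are classes, and the values are their instances.
print(type(my_list))
print(type(my_string))

# Methods are reached through the dot operator, exactly as for our own classes.
my_list.sort()              # modifies the list in place
upper = my_string.upper()   # strings are immutable: a new string is returned

print(my_list, upper)
print(isinstance(my_list, object), isinstance(42, object))
```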


### Definition of a class

The syntax to define a class is the following:
```
class class_name:
        #the initializer method
        def __init__(self, val1,...,valn):
            self.att1 = val1
            ...
            self.attn = valn
        
        #definition of a method returning something
        def method1(self, par1,...,parn):
            ...
            return value
        
        #definition of a method returning None
        def method2(self, par1,...,parn):
            ...
```        
In this case we defined a **class** ```class_name```  that has ```att1,..., attn``` (attributes) **fields** and two methods ```method1``` with parameters ```par1,...,parn``` returning a **value** ```value``` and a method ```method2``` with parameters ```par1,...,parn``` that does not return anything. 

The values of the fields are **initialized when the object is instantiated** at the beginning by calling the **\_\_init\_\_** method, which does not return anything. Note also the use of ```self``` in the **initializer**, which is used to specify that each field **of this instance** has to be given the corresponding value and must always be the first argument of the initializer.  

The object is instantiated with:
```
my_class = class_name(p1,...,pn)
```
which assigns the values ```p1,...,pn``` to the fields ```att1,...,attn```.
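
To see what ```self``` stands for, consider the small sketch below (the class `Counter` is hypothetical and only used for illustration). Calling a method on an instance with the dot operator is equivalent to calling the function stored in the class and passing the instance explicitly as the first argument.

```python
class Counter:  # hypothetical class, for illustration only
    def __init__(self, start):
        self.count = start          # field of *this* instance

    def increment(self, step):
        self.count += step
        return self.count

c = Counter(10)

# Usual form: Python passes c as self automatically.
first = c.increment(5)

# Equivalent explicit form: the instance is given as the first argument.
second = Counter.increment(c, 5)

print(first, second)
```

The second form is rarely written in practice, but it shows why every method needs ```self``` as its first parameter.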


**Example:**
Let's define a simple class Rectangle with two fields (length and width) and three methods (perimeter, area and diagonal).

In [1]:
import math

class Rectangle:
    def __init__(self, l,w):
        self.length = l
        self.width = w
        
    def perimeter(self):
        return 2*(self.length + self.width)
    
    def area(self):
        return self.length * self.width
    
    def diagonal(self):
        return math.sqrt(self.length**2 + self.width**2)

R = Rectangle(5,10)
print(type(R))
R1 = Rectangle(5,10)
print(type(R1))
print("R == R1? {} id R:{} id R1:{}".format(R == R1, 
                                            id(R),
                                            id(R1)))
p = R.perimeter()
a = R.area()
d = R.diagonal()
print("\nR:\nLength: {} Width: {}\nPerimeter: {}\nArea:{}".format(R.length,
                                                                  R.width,
                                                                  p,
                                                                  a))
print("R's diagonal: {:.2f}".format(d))
R2 = Rectangle(72,13)
p = R2.perimeter()
a = R2.area()
d = R2.diagonal()
print("\nR2:\nLength: {} Width: {}\nPerimeter: {}\nArea:{}".format(R2.length,
                                                                   R2.width,
                                                                   p,
                                                                   a))
print("R2's diagonal: {:.2f}".format(d))

<class '__main__.Rectangle'>
<class '__main__.Rectangle'>
R == R1? False id R:140619190253720 id R1:140619190253776

R:
Length: 5 Width: 10
Perimeter: 30
Area:50
R's diagonal: 11.18

R2:
Length: 72 Width: 13
Perimeter: 170
Area:936
R2's diagonal: 73.16


Note that the two objects ```R``` and ```R1``` are both of type ```Rectangle``` and that they have different identifiers: they are two distinct objects, which is why ```R == R1``` is ```False``` even though they have the same length and width (by default ```==``` compares the identity of the objects). Instantiating an object automatically calls the initializer method (```__init__```), passing the given parameters to it. The dot ```.``` operator is used to access the methods of an object. Through the dot operator we can also access the fields of an object, even though this is normally not the best practice: implementing methods to ```get``` and ```set``` the values of fields is recommended.


### The life-cycle of classes and objects in a program:

The usual life-cycle of classes and objects is the following:

1. Classes are defined with the specification of class attributes (fields) and methods;
2. Objects are instantiated based on the definition of the corresponding classes;
3. Objects interact with each other to implement the logic of the program and modify their state;
4. Objects are destroyed explicitly (with ```del```) or implicitly, when there are no more references to them.
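
The last two steps of the life-cycle can be observed with a small sketch. It assumes the standard CPython interpreter, where an object is reclaimed as soon as its last reference disappears; the class `Sample` is hypothetical. A weak reference is used as a probe, because it does not keep the object alive.

```python
import gc
import weakref

class Sample:  # hypothetical class, for illustration only
    def __init__(self, name):
        self.name = name

s = Sample("leaf")        # the object is instantiated
alias = s                 # a second reference to the same object
probe = weakref.ref(s)    # weak reference: it does not keep the object alive

del s                     # removes one name; alias still refers to the object
still_alive = probe() is not None

del alias                 # no references left: the object can be destroyed
gc.collect()              # not needed in CPython here, added for other interpreters
gone = probe() is None

print(still_alive, gone)
```

Note that ```del``` removes a name: the object itself is destroyed only when no other reference to it remains.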


## Encapsulation

When defining classes, it is possible to hide some of the details that must be kept private to the object itself and not accessed directly. This can be done by setting **methods** and **attributes** (fields) as **private** to the object (i.e. accessible only internally to the object itself).

Private attributes and methods can be defined using the ```__``` notation (i.e. the name of the attribute or method is **preceded** by two underscores ```__```). 
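
Under the hood, Python implements the ```__``` notation by renaming the attribute (this is known as *name mangling*): inside a class `C`, a name like ```__x``` is stored as ```_C__x```. The sketch below uses a hypothetical `Account` class to show that the original name is not reachable from outside, while the mangled one still is (so privacy is a convention that should be respected, not a hard barrier).

```python
class Account:  # hypothetical class, for illustration only
    def __init__(self, balance):
        self.__balance = balance       # stored as _Account__balance

    def getBalance(self):
        return self.__balance          # inside the class the short name works

acc = Account(100)

# From outside the class the name __balance does not exist on the object.
try:
    acc.__balance
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The mangled name is still there. Accessing it is possible but discouraged.
mangled = acc._Account__balance

print(direct_access_failed, acc.getBalance(), mangled)
```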

**Example** 
Let's see what happens to the rectangle class with encapsulation.

In [4]:
%reset -f 

import math

class Rectangle:
    def __init__(self, l,w):
        self.__length = l
        self.__width = w
        
    def perimeter(self):
        return 2*(self.length + self.width)
    
    def area(self):
        return self.length * self.width
    
    def diagonal(self):
        return math.sqrt(self.length**2 + self.width**2)

R = Rectangle(10,6)
p = R.perimeter()
a = R.area()
d = R.diagonal()
print("\nR:\nLength: {} Width: {}\nPerimeter: {}\nArea:{}".format(R.length,
                                                                  R.width,
                                                                  p,
                                                                  a))

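Since the attributes are now stored as the private ```__length``` and ```__width```, the names ```length``` and ```width``` no longer exist on the object: they cannot be used from the outside and, as written, not even by the methods of the class. The code above will fail with the following error message: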

![](img/pract12/attribute_error.png)

To work around this, a specific interface must be defined to get the values (these are normally called **getter** methods as they get and return the value).

**Example** 
Let's see what happens to the rectangle class with encapsulation and getter methods.

In [9]:
%reset -f 

import math

class Rectangle:
    def __init__(self, l,w):
        self.__length = l
        self.__width = w
        
    def getLength(self):
        return self.__length
    
    def getWidth(self):
        return self.__width
        
    def perimeter(self):
        return 2*(self.__length + self.__width)
    
    def area(self):
        return self.__length * self.__width
    
    def diagonal(self):
        return math.sqrt(self.__length**2 + self.__width**2)

R = Rectangle(10,6)
p = R.perimeter()
a = R.area()
d = R.diagonal()
print("\nR:\nLength: {} Width: {}\nPerimeter: {}\nArea:{}".format(R.getLength(),
                                                                  R.getWidth(),
                                                                  p,
                                                                  a))


R:
Length: 10 Width: 6
Perimeter: 32
Area:60


**Setter** methods can be used to change the values of attributes after initialization. 

**Example:**
Let's define a Person class with the following attributes: name, surname, date of birth, telephone number and address. All attributes are private. The address and phone number might change, so we need methods to change them.   

In [21]:
%reset -f 

class Person:
    def __init__(self, name, surname, birthdate):
        self.__n = name
        self.__s = surname
        self.__dob = birthdate
        self.__a = "unknown"
        self.__t = "unknown"
    
    def setAddress(self, address):
        self.__a = address
        
    def setTelephone(self, telephone):
        self.__t = telephone
    
    def getName(self):
        return self.__n
    
    def getSurname(self):
        return self.__s
    
    def getDoB(self):
        return self.__dob
    
    def getAddress(self):
        return self.__a
    
    def getTel(self):
        return self.__t

Joe = Person("Joe", "Page", "20/5/1980")
Joe.setAddress("Somerset Rd.,Los Angeles, CA 90016")
print("{} {}\nDate of Birth: {}\nPhone: {}\nAddress: {}".format(Joe.getName(),
                                               Joe.getSurname(),
                                               Joe.getDoB(),
                                               Joe.getTel(),
                                               Joe.getAddress()
                                              ))
#Joe moves to Trento
Joe.setAddress("via Sommarive, Povo, Trento")
print("\nNew address: {}".format(Joe.getAddress()))

Joe Page
Date of Birth: 20/5/1980
Phone: unknown
Address: Somerset Rd.,Los Angeles, CA 90016

New address: via Sommarive, Povo, Trento


Note that the ```setAddress``` method is actually a **setter method** that is used to change the value of the attribute ```__a```. 
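
One advantage of going through a setter instead of writing to the field directly is that the setter can check the new value before storing it. The following is a minimal sketch of this idea; the class `CheckedRectangle` and the choice of raising a `ValueError` are assumptions made for the illustration and are not part of the examples above.

```python
class CheckedRectangle:  # hypothetical class, for illustration only
    def __init__(self, l, w):
        self.__length = l
        self.__width = w

    def getLength(self):
        return self.__length

    def setLength(self, l):
        # The setter is the only way in, so invalid values can be rejected here.
        if l <= 0:
            raise ValueError("length must be positive")
        self.__length = l

    def area(self):
        return self.__length * self.__width

r = CheckedRectangle(10, 6)
r.setLength(20)
print(r.getLength(), r.area())

try:
    r.setLength(-3)
    rejected = False
except ValueError:
    rejected = True

# The invalid value was refused, so the object is still consistent.
print(rejected, r.getLength())
```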

## Special methods

As seen in the lecture, it is possible to redefine some operators by redefining the corresponding **special methods** through a process called **overriding**. 

Some of these are reported below:

![](img/pract12/special_methods.png)

These are useful to define things like the sum of two objects (```__add__```), the equality (```__eq__```), which one is the smallest (```__lt__``` that is less than) or the way the object should be translated into a string (```__str__```), for example for printing purposes.

More information on these special methods can be found [here](https://docs.python.org/3/reference/datamodel.html?#special-method-names).


**Example**
A number $T$ can be expressed in base $X$ (for simplicity we will consider $X \in [2,10]$) as: $T = a_{N}X^{N} + a_{N-1}X^{N-1} + \dots + a_{0}X^{0}$, where each digit satisfies $0 \le a_i < X$. Such a number can be represented as two elements: $(X, (a_{N},a_{N-1},\dots,a_{0}))$ (the base and the tuple of all the digits, most significant first). 
Let's define a class MyNumber that can represent numbers in any of these bases. The class has two attributes: the ```base``` (a number representing the base) and the ```values``` (a tuple of numbers holding the digits). We also redefine some special methods (i.e. ```__str__```, ```__add__``` and ```__lt__```).


In [114]:
class MyNumber:
    
    def __init__(self, base, values):
        self.base = base
        warn = False
        for v in values:
            if(v >= base):
                print("Error.Values must be lower than base")
                print("Can't create n. with base {} and values {}".format(base,values))
                warn = True
        if(not warn):
            self.values = values
        else:
            self.values = None
    
    def toDecimal(self):
        res = 0 
        L = len(self.values)
        for i in range(L):
                res += self.values[i] * self.base**(L-1 - i) 
        return res

    
    def __str__(self):
        return "Base: {}\nValues:{}".format(self.base, self.values)
    
    def __add__(self, other):
        return self.toDecimal() + other.toDecimal()
    
    def __lt__(self, other):
        return self.toDecimal() < other.toDecimal()
    
    def toDecimalString(self):
        L = len(self.values)
        res = str(self.values[0]) + "*" +str(self.base ** (L-1))
        for i in range(1,L):
                res += " + " + str(self.values[i]) + "*" + str(self.base**(L-1 - i))
        return res    

mn = MyNumber(10,(1,2,3))
print(mn)
print("{} = {}".format(mn.toDecimal(), mn.toDecimalString()))
mn2 = MyNumber(4, (1,2,3))
print("\n{}".format(mn2))
print("{} = {}".format(mn2.toDecimal(), mn2.toDecimalString()))
mn3 = mn + mn2
print("\nmn+mn2:{}".format(mn3))
print("\n")
mn4 = MyNumber(3,(7,1,1))
print("\n")
print("{} < {}? {}".format(mn.toDecimal(),mn2.toDecimal(),mn < mn2))
print("{} == {}? {}".format(mn.toDecimal(),mn2.toDecimal(),mn == mn2))
print("{} > {}? {}".format(mn.toDecimal(),mn2.toDecimal(),mn > mn2))


Base: 10
Values:(1, 2, 3)
123 = 1*100 + 2*10 + 3*1

Base: 4
Values:(1, 2, 3)
27 = 1*16 + 2*4 + 3*1

mn+mn2:150


Error.Values must be lower than base
Can't create n. with base 3 and values (7, 1, 1)


123 < 27? False
123 == 27? False
123 > 27? True
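
In the output above, ```mn == mn2``` is evaluated even though ```MyNumber``` does not define ```__eq__```: in that case Python falls back to comparing the identity of the two objects, so two different objects representing the same number would still be reported as different. Below is a minimal sketch of how ```__eq__``` could be added; `BaseNumber` is a hypothetical, reduced version of the class written only for this illustration.

```python
class BaseNumber:  # hypothetical reduced version of MyNumber
    def __init__(self, base, values):
        self.base = base
        self.values = tuple(values)

    def toDecimal(self):
        res = 0
        for v in self.values:
            res = res * self.base + v   # Horner's rule, most significant digit first
        return res

    def __eq__(self, other):
        # Two numbers are equal when they have the same decimal value.
        if not isinstance(other, BaseNumber):
            return NotImplemented
        return self.toDecimal() == other.toDecimal()

    def __hash__(self):
        # Defining __eq__ disables the default hash, so we provide a consistent one.
        return hash(self.toDecimal())

a = BaseNumber(10, (2, 7))      # 27 in base 10
b = BaseNumber(4, (1, 2, 3))    # 1*16 + 2*4 + 3*1 = 27
c = BaseNumber(10, (1, 2, 3))   # 123

print(a == b, a == c, a is b)
```

Here `a` and `b` are different objects (`a is b` is `False`) but they compare as equal, because the comparison is now based on the value they represent.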


## Inheritance and overriding




## Lambda functions 

**Example:** Let's align the first 100 bases of the first entry of the file [contigs82.fasta](file_samples/contigs82.fasta) to the Malus Domestica genome.

**NOTE: this can take several minutes.**

In [None]:
from Bio.PDB import *

pdbl = PDBList()
structures = ["3C2K", "3C2L"]
el = pdbl.download_pdb_files(structures, 
                             file_format = "mmCif", 
                             pdir = "file_samples/")


## Exercises


1. Create a Sequence class that can contain DNA sequences assuming "A,G,C,T" as the only characters allowed. Implement a ```complement``` method that complements the sequence, a ```computeGC``` method that returns the GC content (i.e. number of G+C/total length of sequence), redefine the "+" operator (```__add__```) to concatenate the sequence with another sequence in input and the ```__str__``` so that given a DNA sequence "ACTCG" will print it as:
```
5'-ACTCG-3'
3'-TGAGC-5'
```

<div class="tggle" onclick="toggleVisibility('ex1');">Show/Hide Solution</div>
<div id="ex1" style="display:none;">

In [68]:
%reset -f

class Sequence:
    def __init__(self, s):
        self.sequence = s
        
    def __add__(self, other): 
        return Sequence(self.sequence + other.sequence)
    
    def complement(self):
        out = ""
        for el in self.sequence:
            if el == "A":
                out += "T" 
            elif el == "T":
                out += "A" 
            elif el == "C":
                out += "G" 
            else:
                out += "C"
        return out
    
    def computeGC(self):
        gc = self.sequence.count("G") + self.sequence.count("C") 
        return gc/len(self.sequence)
    
    def __str__(self):
        return "5'-" + self.sequence + "-3'\n3'-" + self.complement() + "-5'"

    
s = Sequence("TACATGCC")
s1 = Sequence("ACTCG")
print("S (gc:{}):\n{}".format(s.computeGC(),s))
print("\nS1 (gc:{}):\n{}".format(s1.computeGC(),s1))
sm = s + s1
print("\nS+S1 (gc:{}):\n{}".format(sm.computeGC(),sm))

S (gc:0.5):
5'-TACATGCC-3'
3'-ATGTACGG-5'

S1 (gc:0.6):
5'-ACTCG-3'
3'-TGAGC-5'

S+S1 (gc:0.5384615384615384):
5'-TACATGCCACTCG-3'
3'-ATGTACGGTGAGC-5'


</div>

2. Write a Person class with the following attributes: name, surname and mailbox. A mailbox is a list that contains string messages sent to the person. Each message entry should be a tuple ```(name, surname, message)``` where ```name``` and ```surname``` are the name and surname of the sender of the message, while ```message``` is a string with the text of the message.  Implement the following methods:

a. ```getName``` :  gets the name of the Person;

b. ```getSurname``` :  gets the surname of the Person;

c. ```sendMessage``` :  that has a ```Person``` and the string with the message in input and sends the message to the specified Person;

d. ```checkMailbox``` :  returns the number of messages in the mailbox;

e. ```readMessages``` :  returns all the messages in the mailbox as a list;

f. ```checkMessagesFrom``` :  checks and prints any messages coming from a specific Person (in input);

g. ```clearMailbox``` :  clears the mailbox (does not return anything);

Test the class with the following conversation:
```
luca = Person("Luca", "Bianco")
alberto = Person("Alberto", "Montresor")
david = Person("David", "Leoni")
stranger = ""

luca.sendMessage(alberto, "Hi Alberto, hope things are fine.")
alberto.sendMessage(luca, "I am fine, thanks. Yourself?")
luca.sendMessage(alberto, "Great. Cheers. How about David?")
alberto.sendMessage(david, "You OK?")
david.sendMessage(alberto, "Yep. Thanks")
alberto.sendMessage(luca, "All OK")
luca.sendMessage(stranger, "Who are you?")
```
and check the mailbox of all the Persons in ```[luca,alberto,david]```.

<div class="tggle" onclick="toggleVisibility('ex2');">Show/Hide Solution</div>
<div id="ex2" style="display:none;">

In [37]:
%reset -f

class Person:
    def __init__(self, name, surname):
        self.__n = name
        self.__s = surname
        self.mailbox = []
    
    
    def getName(self):
        return self.__n
    
    def getSurname(self):
        return self.__s
    
    def sendMessage(self, otherPerson, msg):
        if(type(otherPerson) == Person):
            otherPerson.mailbox.append((self.__n, self.__s, msg))
        else:
            print("Unable to send message. Person not found")
    
    def checkMailbox(self):
        return len(self.mailbox)
        
    def readMessages(self):
        return self.mailbox
    
    def checkMessagesFrom(self, otherPerson):
        if(type(otherPerson) == Person):
            msgs = [x for x in self.mailbox 
                        if x[0] == otherPerson.getName() 
                        and x[1] == otherPerson.getSurname()
                   ]
            return msgs
        else:
            return []
    
    def clearMailbox(self):
        self.mailbox.clear()
        
    def __str__(self):
        return str(self.__n) + " " + str(self.__s)
    

luca = Person("Luca", "Bianco")
alberto = Person("Alberto", "Montresor")
david = Person("David", "Leoni")
stranger = ""

luca.sendMessage(alberto, "Hi Alberto, hope things are fine.")
alberto.sendMessage(luca, "I am fine, thanks. Yourself?")
luca.sendMessage(alberto, "Great. Cheers. How about David?")
alberto.sendMessage(david, "You OK?")
david.sendMessage(alberto, "Yep. Thanks")
alberto.sendMessage(luca, "All OK")
luca.sendMessage(stranger, "Who are you?")

for p in [luca, alberto,david]:
    print("\n{} has {} new message(s):".format(p,p.checkMailbox()))
    for m in p.readMessages():
        print(" - from {} {}: {}".format(m[0],
                                        m[1],
                                        m[2]))

for p in [luca, alberto, david]:
    for p1 in [luca, alberto, david]:
        if(p != p1):
            msgs = p.checkMessagesFrom(p1)
            if(len(msgs) > 0):
                print("\n")
                for m in msgs:
                    print("{} --> {}: {}".format(p1,p,m[2]))
            else:
                print("\n{} does not have messages from {}".format(p, p1))

luca.clearMailbox()
print("\n{} has {} new message(s).".format(luca,luca.checkMailbox()))

Unable to send message. Person not found

Luca Bianco has 2 new message(s):
 - from Alberto Montresor: I am fine, thanks. Yourself?
 - from Alberto Montresor: All OK

Alberto Montresor has 3 new message(s):
 - from Luca Bianco: Hi Alberto, hope things are fine.
 - from Luca Bianco: Great. Cheers. How about David?
 - from David Leoni: Yep. Thanks

David Leoni has 1 new message(s):
 - from Alberto Montresor: You OK?


Alberto Montresor --> Luca Bianco: I am fine, thanks. Yourself?
Alberto Montresor --> Luca Bianco: All OK

Luca Bianco does not have messages from David Leoni


Luca Bianco --> Alberto Montresor: Hi Alberto, hope things are fine.
Luca Bianco --> Alberto Montresor: Great. Cheers. How about David?


David Leoni --> Alberto Montresor: Yep. Thanks

David Leoni does not have messages from Luca Bianco


Alberto Montresor --> David Leoni: You OK?

Luca Bianco has 0 new message(s).


</div>

3. Create a SNP class...

```
Alignments of MDC020656.85
	MDC020656.85: 1939-2593
	gi|125995253|dbj|AB270792.1|: 201263-201917
	Score:820.917 AlignLen:579 Id/Len:0.8812785388127854
	MDC020656.85: 1446-1935
	gi|125995253|dbj|AB270792.1|: 306490-306017
	Score:582.873 AlignLen:428 Id/Len:0.8629032258064516
    ....
    ....
```

that is reporting the HSP with query start-end position, subject start-end position, score, alignment length and number of identities / alignment length. 

<div class="tggle" onclick="toggleVisibility('ex0');">Show/Hide Solution</div>
<div id="ex0" style="display:none;">

In [None]:
%reset -f

from Bio.Blast import NCBIXML


def filterHSPs(align, minBitscore = 0, minAlignLen = 0, minPercIdent = 0.1):
    ret = []

    for h in align.hsps:
            b = h.bits
            i = h.identities 
            al = h.align_length
            toOut = ((b > minBitscore) and 
                    (al > minAlignLen) and
                    (i/al > minPercIdent))

            
            
            if(toOut):
                qs = h.query_start
                ss = h.sbjct_start
                qe = h.query_end
                se = h.sbjct_end
                ret.append([qs,qe, ss,se, b, i, al])
             
    return ret        

result_handle = open("file_samples/blast_res_apple.xml")


for res in NCBIXML.parse(result_handle):
    print("Alignments of {}".format(res.query))
    for align in res.alignments:
        filtered = filterHSPs(align, 300, 50, 0.8)
        if(len(filtered) > 0):
            for h in filtered:
                title = align.title.split( )[0]
                print("\t{}: {}-{}".format(res.query,h[0],h[1]))
                print("\t{}: {}-{}".format(title,h[2],h[3]))
                print("\tScore:{} AlignLen:{} Id/Len:{}".format(
                                                                h[4],
                                                                h[5],
                                                                h[5]/h[6]
                                                               ))
            
    

            

result_handle.close()

</div>


4. Write a Python function that aligns the sequences in the file created in exercise 3 ([here](file_samples/starch_sequences.fasta) you can find mine) against the NCBI nr database, limiting the hits to the Malus Domestica organism (parameter ```entrez_query='"Malus Domestica" [Organism]'``` in qblast), and prints to screen the following info for each HSP: 
    1. The title;
    2. Score and e-value;
    3. The number of alignments on the same subject, the number of identities and positives and the alignment length;
    4. The number of mismatches and the list of their positions (hint: you can use the match string and look for " ").
   

    
<div class="tggle" onclick="toggleVisibility('ex2');">Show/Hide Solution</div>
<div id="ex2" style="display:none;">

In [None]:
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

fasta_string = open("file_samples/starch_sequences.fasta").read()

res_handle = NCBIWWW.qblast("blastn", "nt", fasta_string, 
                            entrez_query='"Malus Domestica" [Organism]'
                           )

for align in NCBIXML.parse(res_handle):
    
    for a in align.alignments:
            print("Align Title:{}".format(a.title))
            
            for h in a.hsps:
                s = h.score
                e = h.expect
                n = h.num_alignments
                i = h.identities
                p = h.positives
                m = h.match 
                al = h.align_length
                misM = [str(x) for x in range(len(m)) if m[x] == " "]
                print("Score: {} E-val: {}".format(s,e))
                print("N.aligns:{} Ident:{} Pos.:{} Align len:{}".format(
                    n,i,p,al))
                if(len(misM)):
                    print("Num mismatches:",len(misM))
                    print("Mismatch pos:", ",".join(misM))
                else:
                    print("No mismatches")
                print("")
            

res_handle.close()



</div>

5. Write a Python function ```getPublicationInfo(title_term,other_term)``` that retrieves the first 20 PubMed publications having ```title_term``` in the title and ```other_term``` somewhere else in the text (hint: use the field tags "[Title]" and "[Other Term]" in the esearch parameter ```term```). For each publication print: 
    1. the title
    2. authors 
    3. journal 
    4. year of publication (hint: get and split properly the "PubDate" entry)
    5. a link to the pubmed entry (hint: it is the string "https://www.ncbi.nlm.nih.gov/pubmed/" followed by the pubmed id, i.e. the "eid" entry of the dictionary "ArticleIds"), e.g. https://www.ncbi.nlm.nih.gov/pubmed/26919684

Hint: to see how to combine search terms test them here: [https://www.ncbi.nlm.nih.gov/pubmed/advanced](https://www.ncbi.nlm.nih.gov/pubmed/advanced).

Test your code calling ```getPublicationInfo("apple","drought")```

<div class="tggle" onclick="toggleVisibility('ex3');">Show/Hide Solution</div>
<div id="ex3" style="display:none;">

In [None]:

from Bio import Entrez

def getPublicationInfo(title_term,other_term): 
    Entrez.email = "my_email"
    s_term = title_term + " [Title] AND " + other_term + " [Other Term]"
    handle = Entrez.esearch(db="pubmed", term=s_term)
    res = Entrez.read(handle)
#uncomment to see all info
#    for el in res.keys():
#        print(el , " : ", res[el])
#
#    print("")
    for ids in res["IdList"]:    
        handle = Entrez.esummary(db="pubmed",  id = ids)
        res = Entrez.read(handle)
        #uncomment to see all info
        #print(res)
        for r in res:
            print(r["Title"])
            print(",".join(r["AuthorList"]))
            print(r["Source"])
            print(r["PubDate"].split()[0])
            print("https://www.ncbi.nlm.nih.gov/pubmed/" + r["ArticleIds"]["eid"])
            print("")
            
getPublicationInfo("apple","drought")

</div>

6. Write some python code to retrieve the structure of two forms of the aspartate transcarbamoylase (PDB ids: 4FYW and 1D09). If you are interested, read more about the Aspartate Transcarbamoylase [here](http://pdb101.rcsb.org/motm/215). Write a function that gets the .cif file name and prints:

    1. the number of chains, residues and atoms present in the file;
    2. a histogram of the residues (plotting it with matplotlib) that are not water (encoded as "HOH");
    3. a link to an online tool to visualize the 3D structure. The link will be "http://www.rcsb.org/pdb/ngl/ngl.do?pdbid=" followed by the PDB id of the protein (e.g. 1d09).

<div class="tggle" onclick="toggleVisibility('ex4');">Show/Hide Solution</div>
<div id="ex4" style="display:none;">

In [None]:
from Bio.PDB import *
import matplotlib.pyplot as plt

def printCifInfo(filename):
    
    parser = MMCIFParser(QUIET=True) #To disable warnings
    id = filename.split("/")[1].split(".")[0]

    structure = parser.get_structure(id, filename)
    chains = structure.get_chains()
    residues = structure.get_residues()
    
    atoms = structure.get_atoms()
    res_histo = {}
    resCnt = 0 #need this because while reading the residues 
               #I am pulling stuff out of the iterator
    for res in residues:
        rname = res.get_resname()
        if(rname != "HOH"):
            if( rname not in res_histo):
                res_histo[rname] = 1
            else:
                res_histo[rname] += 1
        resCnt += 1    
    plt.figure(figsize=(15,5))
    plt.bar(res_histo.keys(), res_histo.values())
    plt.show()
    print("Number of chains: {}".format(len(list(chains))))
    print("Number of residues: {}".format(resCnt))
    print("Number of atoms: {}".format(len(list(atoms))))
    print("http://www.rcsb.org/pdb/ngl/ngl.do?pdbid=" + id) 

pdbl = PDBList()
structures = ["1D09", "4FYW"]
el = pdbl.download_pdb_files(structures, file_format = "mmCif", pdir = "file_samples/")

printCifInfo("file_samples/1d09.cif")
printCifInfo("file_samples/4fyw.cif")

</div>