# CONSENSUS MOTIF

We can form a consensus string, denoted Consensus(Motifs), from the most popular nucleotide in each column of the motif matrix (ties are broken arbitrarily). If we select Motifs correctly from the collection of upstream regions, then Consensus(Motifs) provides a candidate regulatory motif for these regions. For example, as shown below, the consensus string for the NF-κB binding sites is "TCGGGGATTTCC".

We can implement Consensus(Motifs) using Count(Motifs) as a subroutine. To do so, note that the j-th symbol of this consensus string is equal to the symbol corresponding to a maximum element in column j of Count(Motifs).

To implement this idea in Python, first set k equal to the length of Motifs[0] (as we did before) and count equal to the count matrix of Motifs.


```python
k = len(Motifs[0])
count = Count(Motifs)
```

Then, initialize an empty consensus string and range through the columns of the count matrix, appending the most frequent symbol in column j at step j.

```python
consensus = ""
for j in range(k):
    m = 0
    frequentSymbol = ""
    for symbol in "ACGT":
        if count[symbol][j] > m:
            m = count[symbol][j]
            frequentSymbol = symbol
    consensus += frequentSymbol
```

- Sample Input:
```
AACGTA
CCCGTT
CACCTT
GGATTA
TTCCGG
```
- Sample Output:

CACCTA
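
To see why the sample output is CACCTA, it can help to list which symbols share the top count in each column. The sketch below is only illustrative: the helper name `most_frequent_symbols` and the use of `collections.Counter` are assumptions, not part of the original exercise.

```python
from collections import Counter

def most_frequent_symbols(motifs):
    """For each column, return the sorted symbols that share the highest count."""
    result = []
    # zip(*motifs) yields one tuple of symbols per column of the motif matrix.
    for column in zip(*motifs):
        counts = Counter(column)
        best = max(counts.values())
        result.append(sorted(s for s in counts if counts[s] == best))
    return result

motifs = ["AACGTA", "CCCGTT", "CACCTT", "GGATTA", "TTCCGG"]
print(most_frequent_symbols(motifs))
# [['C'], ['A'], ['C'], ['C', 'G'], ['T'], ['A', 'T']]
```

Columns 3 and 5 (counting from 0) are ties. The loop shown earlier scans "ACGT" with a strict `>` comparison, so the first tied symbol in that order wins, which gives C and A for those columns.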

In [2]:
# Insert your Count(Motifs) function here.
def Count(Motifs):
    count = {}
    k = len(Motifs[0])
    for symbol in "ATGC":
        count[symbol] = []
        for j in range(k):
            count[symbol].append(0)
    t = len(Motifs)
    for i in range(t):
        for j in range(k):
            symbol = Motifs[i][j]
            count[symbol][j] += 1
    return count
# Input:  A set of kmers Motifs
# Output: A consensus string of Motifs.
def Consensus(Motifs):
    k = len(Motifs[0])
    count = Count(Motifs)
    consensus = ""
    for j in range(k):
        m = 0
        frequentSymbol = ""
        for symbol in "ACGT":
            if count[symbol][j] > m:
                m = count[symbol][j]
                frequentSymbol = symbol
        consensus += frequentSymbol
    return consensus

Consensus(["AACGTA","CCCGTT","CACCTT","GGATTA","TTCCGG"])

'CACCTA'
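
As a cross-check, the same result can be sketched more compactly by walking the columns directly. This is an alternative formulation, not the course's reference solution, and the function name `consensus_by_columns` is hypothetical. It relies on Python's built-in `max` returning the first maximal item, so ties resolve in "ACGT" order just as in the loop above.

```python
def consensus_by_columns(motifs):
    # zip(*motifs) yields one tuple per column; column.count(symbol) is the
    # number of times that symbol occurs in the column.
    return "".join(max("ACGT", key=column.count) for column in zip(*motifs))

motifs = ["AACGTA", "CCCGTT", "CACCTT", "GGATTA", "TTCCGG"]
print(consensus_by_columns(motifs))  # CACCTA
```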

### In simpler terms, the consensus string takes the nucleotide that appears most frequently at each position in the motif matrix.

In [1]:
def count(motif):
    k = len(motif[0])
    # Use a local name that does not shadow the function name count.
    counts = {}
    for symbol in "ATGC":
        counts[symbol] = []
        for j in range(k):
            counts[symbol].append(0)
    t = len(motif)
    for i in range(t):
        for j in range(k):
            symbol = motif[i][j]
            counts[symbol][j] += 1
    return counts


In [None]:
def consensus(motif):
    # k is the length of each k-mer, not the number of k-mers.
    k = len(motif[0])
    # A distinct local name avoids shadowing the function count.
    counts = count(motif)
    result = ""
    for j in range(k):
        m = 0
        frequentsymbol = ""
        for symbol in "ATGC":
            if counts[symbol][j] > m:
                m = counts[symbol][j]
                frequentsymbol = symbol
        result = result + frequentsymbol
    return result

In [None]:
def score(motif):
    k = len(motif[0])
    t = len(motif)
    total = 0
    # Call the lowercase consensus defined in this cell.
    cons = consensus(motif)
    for i in range(t):
        for j in range(k):
            # Each symbol that differs from the consensus adds one to the score.
            if motif[i][j] != cons[j]:
                total += 1
    return total

def consensus(motif):
    k = len(motif[0])
    result = ""
    # A distinct local name avoids shadowing the function count.
    counts = count(motif)
    for j in range(k):
        m = 0
        frequentsymbol = ""
        for symbol in "ATGC":
            if counts[symbol][j] > m:
                # Store the count at column j, not the whole row.
                m = counts[symbol][j]
                frequentsymbol = symbol
        result = result + frequentsymbol
    return result

def count(motif):
    k = len(motif[0])
    counts = {}
    for symbol in "ATGC":
        counts[symbol] = []
        for j in range(k):
            # Append to the row itself; the row is still empty at this point.
            counts[symbol].append(0)
    t = len(motif)
    for i in range(t):
        for j in range(k):
            symbol = motif[i][j]
            counts[symbol][j] += 1
    return counts
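
The score counts the symbols that disagree with the consensus string. An equivalent way to compute it, sketched here under the assumption that all k-mers have the same length, is to sum over the columns the number of symbols that are not the most frequent one. The name `score_by_columns` is hypothetical and `collections.Counter` is a convenience that the cells above do not use.

```python
from collections import Counter

def score_by_columns(motifs):
    t = len(motifs)
    total = 0
    for column in zip(*motifs):
        # Every symbol other than the most frequent one is a mismatch
        # against the consensus, whichever tied symbol the consensus picks.
        total += t - max(Counter(column).values())
    return total

motifs = ["AACGTA", "CCCGTT", "CACCTT", "GGATTA", "TTCCGG"]
print(score_by_columns(motifs))  # 14
```

Because only the size of the largest count in each column matters, the score does not depend on how ties are broken when building the consensus string.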

In [2]:
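# Input:  A string genome (a k-mer) and a profile matrix profile
# Output: The probability of genome under profile
def pr(genome, profile):
    n = len(genome)
    p = 1
    for i in range(n):
        # Multiply by the probability of symbol genome[i] at position i.
        p = p * profile[genome[i]][i]
    return p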
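
A small usage sketch for the cell above. The profile values here are made up for illustration (they are not derived from any motif set in this notebook), and the function is repeated only so the example runs on its own.

```python
def pr(genome, profile):
    # Same logic as the cell above: multiply the per-position probabilities.
    p = 1
    for i, symbol in enumerate(genome):
        p *= profile[symbol][i]
    return p

# Hypothetical profile for 2-mers; each column sums to 1.
profile = {
    "A": [0.5, 0.25],
    "C": [0.25, 0.25],
    "G": [0.25, 0.25],
    "T": [0.0, 0.25],
}

print(pr("AC", profile))  # 0.125
print(pr("TA", profile))  # 0.0
```

Note that a single zero entry in the profile makes the probability of any string using that symbol at that position zero.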