# DNA Motif Analysis Using the `motif` Module

This notebook provides a basic implementation and analysis of DNA motifs using Python.

### Objectives:
- Build a **count matrix** to track nucleotide occurrences at each motif position.
- Generate a **profile matrix** to compute nucleotide frequencies.
- Determine the **consensus sequence**, which takes the most common nucleotide at each position across the motifs.

### Contents:
- Loading motif data
- Computing the count matrix
- Calculating the profile matrix
- Extracting the consensus motif
- (Optional) Visualizing the matrices for better interpretation

This notebook is intended as a foundation for motif discovery tasks in bioinformatics and can be extended to include scoring functions, pseudocounts, motif search in sequences, and more.

The `Count`, `Profile`, `Consensus`, `Score`, and `Entropy` functions used below are imported from the `motif.py` file in the `bin` folder.


In [1]:
import bin.motif as mt

In [2]:
# create list of DNA motifs
Motifs = [
    "TCGGGGGTTTTT",
    "CCGGTGACTTAC",
    "ACGGGGATTTTC",
    "TTGGGGACTTTT",
    "AAGGGGACTTCC",
    "TTGGGGACTTCC",
    "TCGGGGATTCAT",
    "TCGGGGATTCCT",
    "TAGGGGAACTAC",
    "TCGGGTATAACC"
]

In [4]:
motif_count = mt.Count(Motifs)

In [5]:
motif_count

{'A': [2, 2, 0, 0, 0, 0, 9, 1, 1, 1, 3, 0],
 'C': [1, 6, 0, 0, 0, 0, 0, 4, 1, 2, 4, 6],
 'G': [0, 0, 10, 10, 9, 9, 1, 0, 0, 0, 0, 0],
 'T': [7, 2, 0, 0, 1, 1, 0, 5, 8, 7, 3, 4]}
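
A count matrix like the one above can be built by walking each motif once and incrementing a per-base counter at every position. The sketch below is only an illustration: `count_matrix` is a hypothetical helper and may differ from how `mt.Count` is written in `motif.py`. It assumes all motifs have the same length and contain only A, C, G, and T.

```python
def count_matrix(motifs):
    """Count each nucleotide at each position (hypothetical stand-in for mt.Count)."""
    k = len(motifs[0])  # motif length; assumes all motifs are equally long
    counts = {base: [0] * k for base in "ACGT"}
    for motif in motifs:
        for i, base in enumerate(motif):
            counts[base][i] += 1
    return counts


# Small toy input, separate from the notebook's Motifs list
example = ["TCGG", "CCGG", "ACGT"]
counts = count_matrix(example)
print(counts)
```

Every column of the result sums to the number of motifs, which is a quick sanity check on any count matrix.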

#### Visualization of the count matrix
To make the count matrix easier to read, we can use the pandas module to convert the count matrix dictionary into a DataFrame.

In [8]:
import pandas as pd

In [9]:
motif_df = pd.DataFrame(motif_count).T

In [10]:
motif_df

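|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|----|----|
| A | 2 | 2 | 0 | 0 | 0 | 0 | 9 | 1 | 1 | 1 | 3 | 0 |
| C | 1 | 6 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 2 | 4 | 6 |
| G | 0 | 0 | 10 | 10 | 9 | 9 | 1 | 0 | 0 | 0 | 0 | 0 |
| T | 7 | 2 | 0 | 0 | 1 | 1 | 0 | 5 | 8 | 7 | 3 | 4 |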


In [7]:
mt.Profile(Motifs)

{'A': [0.2, 0.2, 0.0, 0.0, 0.0, 0.0, 0.9, 0.1, 0.1, 0.1, 0.3, 0.0],
 'C': [0.1, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.1, 0.2, 0.4, 0.6],
 'G': [0.0, 0.0, 1.0, 1.0, 0.9, 0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
 'T': [0.7, 0.2, 0.0, 0.0, 0.1, 0.1, 0.0, 0.5, 0.8, 0.7, 0.3, 0.4]}
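
The profile values above are the counts divided by the number of motifs (10 here), so each column sums to 1. A minimal sketch of that step follows, assuming equal-length motifs over A, C, G, and T. `profile_matrix` is a hypothetical name and is not necessarily how `mt.Profile` is implemented.

```python
def profile_matrix(motifs):
    """Per-position nucleotide frequencies (hypothetical stand-in for mt.Profile)."""
    t = len(motifs)       # number of motifs
    k = len(motifs[0])    # motif length; assumes all motifs are equally long
    profile = {base: [0.0] * k for base in "ACGT"}
    for motif in motifs:
        for i, base in enumerate(motif):
            profile[base][i] += 1 / t  # each occurrence contributes 1/t
    return profile


# Toy input with four motifs so that every frequency is a multiple of 0.25
example = ["TCGG", "CCGG", "ACGT", "TCGA"]
profile = profile_matrix(example)
print(profile)
```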

In [8]:
mt.Consensus(Motifs)

'TCGGGGATTTCC'
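
The consensus string picks the most frequent nucleotide in each column. One way to sketch this is shown below. `consensus` is a hypothetical helper, and its tie-breaking rule (the alphabetically first base wins) is an assumption that may not match `mt.Consensus`. The ten motifs from this notebook contain no ties, so the rule does not affect the result here.

```python
def consensus(motifs):
    """Most common base per column (hypothetical stand-in for mt.Consensus)."""
    # zip(*motifs) yields one tuple of bases per column;
    # max() returns the first of "ACGT" with the highest count, so ties go to the earlier letter
    return "".join(max("ACGT", key=column.count) for column in zip(*motifs))


motifs = [
    "TCGGGGGTTTTT", "CCGGTGACTTAC", "ACGGGGATTTTC", "TTGGGGACTTTT",
    "AAGGGGACTTCC", "TTGGGGACTTCC", "TCGGGGATTCAT", "TCGGGGATTCCT",
    "TAGGGGAACTAC", "TCGGGTATAACC",
]
result = consensus(motifs)
print(result)  # TCGGGGATTTCC
```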

In [9]:
mt.Score(Motifs)

30
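
The value 30 is consistent with a common definition of motif score: the total number of positions where a motif differs from the consensus, or equivalently, for each column, the number of motifs minus the count of the most frequent base. The sketch below uses that definition. `score` is a hypothetical helper, and whether `mt.Score` computes it the same way is an assumption.

```python
def score(motifs):
    """Total mismatches against the column-wise most common base (assumed definition)."""
    t = len(motifs)
    total = 0
    for column in zip(*motifs):
        most_common = max(column.count(base) for base in "ACGT")
        total += t - most_common  # bases in this column that differ from the consensus base
    return total


motifs = [
    "TCGGGGGTTTTT", "CCGGTGACTTAC", "ACGGGGATTTTC", "TTGGGGACTTTT",
    "AAGGGGACTTCC", "TTGGGGACTTCC", "TCGGGGATTCAT", "TCGGGGATTCCT",
    "TAGGGGAACTAC", "TCGGGTATAACC",
]
result = score(motifs)
print(result)  # 30
```

A lower score means the motifs agree more closely; a set of identical motifs scores 0.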

In [10]:
mt.Entropy(Motifs)

9.916290005356972
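
The entropy reported above matches the usual definition for a motif profile: for each column, sum `-p * log2(p)` over the four nucleotide frequencies `p` (treating `0 * log2(0)` as 0), then add the column entropies together. The sketch below follows that definition. `entropy` is a hypothetical helper and may differ in detail from `mt.Entropy`.

```python
import math


def entropy(motifs):
    """Sum of per-column Shannon entropies in bits (assumed definition)."""
    t = len(motifs)
    total = 0.0
    for column in zip(*motifs):
        for base in "ACGT":
            p = column.count(base) / t
            if p > 0:  # skip zero frequencies, since 0 * log2(0) is taken as 0
                total -= p * math.log2(p)
    return total


motifs = [
    "TCGGGGGTTTTT", "CCGGTGACTTAC", "ACGGGGATTTTC", "TTGGGGACTTTT",
    "AAGGGGACTTCC", "TTGGGGACTTCC", "TCGGGGATTCAT", "TCGGGGATTCCT",
    "TAGGGGAACTAC", "TCGGGTATAACC",
]
result = entropy(motifs)
print(round(result, 4))  # approximately 9.9163
```

A fully conserved column contributes 0 bits, and a column with all four bases equally frequent contributes the maximum of 2 bits, so lower total entropy indicates a more conserved motif.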