# **Exercise 1: Counting DNA Nucleotides**

**Link: http://rosalind.info/problems/dna/**

**Intro:**

Making up all living material, the cell is considered to be the building block of life. The nucleus, a component of most eukaryotic cells, was identified as the hub of cellular activity 150 years ago. Viewed under a light microscope, the nucleus appears only as a darker region of the cell, but as we increase magnification, we find that the nucleus is densely filled with a stew of macromolecules called chromatin. During mitosis (eukaryotic cell division), most of the chromatin condenses into long, thin strings called chromosomes. See Figure 1 for a figure of cells in different stages of mitosis.

One class of the macromolecules contained in chromatin is the nucleic acids. Early 20th century research into the chemical identity of nucleic acids culminated with the conclusion that nucleic acids are polymers, or repeating chains of smaller, similarly structured molecules known as monomers. Because of their tendency to be long and thin, nucleic acid polymers are commonly called strands.

The nucleic acid monomer is called a nucleotide and is used as a unit of strand length (abbreviated to nt). Each nucleotide is formed of three parts: a sugar molecule, a negatively charged ion called a phosphate, and a compound called a nucleobase ("base" for short). Polymerization is achieved as the sugar of one nucleotide bonds to the phosphate of the next nucleotide in the chain, which forms a sugar-phosphate backbone for the nucleic acid strand. A key point is that the nucleotides of a specific type of nucleic acid always contain the same sugar and phosphate molecules, and they differ only in their choice of base. Thus, one strand of a nucleic acid can be differentiated from another based solely on the order of its bases; this ordering of bases defines a nucleic acid's primary structure.

For example, Figure 2 shows a strand of deoxyribose nucleic acid (DNA), in which the sugar is called deoxyribose, and the only four choices for nucleobases are molecules called adenine (A), cytosine (C), guanine (G), and thymine (T).

For reasons we will soon see, DNA is found in all living organisms on Earth, including bacteria; it is even found in many viruses (which are often considered to be nonliving). Because of its importance, we reserve the term genome to refer to the sum total of the DNA contained in an organism's chromosomes.

**Problem:**

A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains.

An example of a length 21 DNA string (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is "ATGCTTCAGAAAGGTCTTACG."

**Given:** 
- A DNA string s of length at most 1000 nt.

**Return:** 
- Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in s.

**Sample Dataset:**

AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC

**Sample Output:**

20 12 17 21
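As a cross-check on the sample, the counting task can be sketched with `collections.Counter` from the standard library. The helper name `count_nucleotides` is an assumption made for illustration; it is not part of the Rosalind problem statement.

```python
from collections import Counter

def count_nucleotides(s):
    """Return the counts of 'A', 'C', 'G', 'T' in s, in that order."""
    counts = Counter(s)  # a single pass over the string
    # Counter returns 0 for symbols that never occur
    return tuple(counts[base] for base in "ACGT")

sample = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
print(*count_nucleotides(sample))  # matches the sample output: 20 12 17 21
```

A `Counter` scans the string once, whereas four separate `count()` calls scan it four times; for strings of at most 1000 nt the difference is negligible.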

## **Manual and using BioPython**

- Both approaches (plain Python strings and the Biopython library) use the same method: `count()`.

- **From the sample:**

**String Formatting**

```
%s - String (or any object with a string representation, like numbers)

%d - Integers

%f - Floating point numbers

%.<number of digits>f - Floating point numbers with a fixed amount of digits to the right of the dot.

%x/%X - Integers in hex representation (lowercase/uppercase)

```
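A short sketch of these specifiers in use, assuming the counts from the sample output. The GC fraction is computed here only to have a convenient floating-point value to format; it is not part of the exercise.

```python
a, c, g, t = 20, 12, 17, 21      # counts from the sample output
total = a + c + g + t            # 70 nucleotides
gc_fraction = (g + c) / total    # 29 / 70

print("%s has %d nt" % ("sample", total))  # %s for the label, %d for the integer
print("GC fraction: %f" % gc_fraction)     # six digits after the dot by default
print("GC fraction: %.2f" % gc_fraction)   # GC fraction: 0.41
print("%x / %X" % (255, 255))              # ff / FF
```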

In [None]:
# Creating the variable...
dna = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"

# Counting the number of "A", "C", "G" and "T" in the given seq...
a = dna.count("A")
c = dna.count("C")
g = dna.count("G")
t = dna.count("T")
print(a, c, g, t)
print(f"A = {a} C = {c} G = {g} T = {t}")
print("A = %d" % a, "C = %d" % c, "G = %d" % g, "T = %d" % t)

20 12 17 21
A = 20 C = 12 G = 17 T = 21
A = 20 C = 12 G = 17 T = 21


- **From the file:**

In [None]:
dna = "TGGTACGGCTTGGGCCATCGGGCATCACAGCGCCTCAATGTCCTGTTCGTTCCTGGAATAGATCCTCCCTAGTGGCCTGATATGCAGTAGTGAACTAGTTCTCACGTTGCGTTCGAAGATCTCGTTACAGTACGTCTTAGGGGGATCACGAAGCTGGTTAGCCCGATACGGTAGGTGGAGACTTACACACGTCTATACAGTCACTTCGTTCTAGGGACGCTCACGTATGCTGACCATGCAGACCGCGTGTCAGGCTGGTATCGACACGGTTGTGTGAGGTCGTACAGCACATCCAGAGAGTATTGTGACTAATAGCTAATGAGGTGGTGCTCCCCCTACACTTCTTTACGGTACTATCCTGCTTTATACGTAGGGCTACACTACTGTTAAGATATCAGTGCGCAGCTGATTGAATCCTCGATCAGGCTATACTTTCTGAGATCCCTATTGCCCCGCATGATGCGGTGCTGGACCTGTGTCCTCCTCCATCGAAGGGATTAACTTAGTGATCGTACGTGAAAAAAATTCCGACCAATTAATGCAAAACACGGTGTGTAAAAGCAAGAGTATAATCCTCGGGTTGTTAAGGTTTCCTACAATATACAGGCCTCTAAATAATGTCTGAGTGGCACTGAGTTCACAAGGAAGTGGGGAATCGAGGCCGCCAAGTGTCACGCATCGCCAAATGGCTACTAGGGCTCCTGCTGTATTACGGCAATTATATCTGGTCCATTGGAGGCACGGACACTATTCGACATCTCGGAACATAGCGAAGCGCAGCTACATGTGACACCCCGAAGTCAGAGAGATTTTGGTGAGTTGTGGTATTTGAGTCGTCCCTCCAAGGCAGGGTAAGAGGCAATCCAGATAGAAGTTACTTATTCGTTGCCACCCCCAGTATCGCGA"
dna

'TGGTACGGCTTGGGCCATCGGGCATCACAGCGCCTCAATGTCCTGTTCGTTCCTGGAATAGATCCTCCCTAGTGGCCTGATATGCAGTAGTGAACTAGTTCTCACGTTGCGTTCGAAGATCTCGTTACAGTACGTCTTAGGGGGATCACGAAGCTGGTTAGCCCGATACGGTAGGTGGAGACTTACACACGTCTATACAGTCACTTCGTTCTAGGGACGCTCACGTATGCTGACCATGCAGACCGCGTGTCAGGCTGGTATCGACACGGTTGTGTGAGGTCGTACAGCACATCCAGAGAGTATTGTGACTAATAGCTAATGAGGTGGTGCTCCCCCTACACTTCTTTACGGTACTATCCTGCTTTATACGTAGGGCTACACTACTGTTAAGATATCAGTGCGCAGCTGATTGAATCCTCGATCAGGCTATACTTTCTGAGATCCCTATTGCCCCGCATGATGCGGTGCTGGACCTGTGTCCTCCTCCATCGAAGGGATTAACTTAGTGATCGTACGTGAAAAAAATTCCGACCAATTAATGCAAAACACGGTGTGTAAAAGCAAGAGTATAATCCTCGGGTTGTTAAGGTTTCCTACAATATACAGGCCTCTAAATAATGTCTGAGTGGCACTGAGTTCACAAGGAAGTGGGGAATCGAGGCCGCCAAGTGTCACGCATCGCCAAATGGCTACTAGGGCTCCTGCTGTATTACGGCAATTATATCTGGTCCATTGGAGGCACGGACACTATTCGACATCTCGGAACATAGCGAAGCGCAGCTACATGTGACACCCCGAAGTCAGAGAGATTTTGGTGAGTTGTGGTATTTGAGTCGTCCCTCCAAGGCAGGGTAAGAGGCAATCCAGATAGAAGTTACTTATTCGTTGCCACCCCCAGTATCGCGA'

In [None]:
# Counting the number of "A", "C", "G" and "T" in the given seq...
a = dna.count("A")
c = dna.count("C")
g = dna.count("G")
t = dna.count("T")
print(a, c, g, t)

224 215 231 236
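The cell above pastes the downloaded dataset inline. A minimal sketch of reading it from disk instead follows, assuming the dataset is a plain-text file holding a single DNA string. The file name `rosalind_dna.txt` and the helper `count_from_file` are hypothetical, and the demo writes a throwaway file so that the block runs without a real download.

```python
import tempfile
from pathlib import Path

def count_from_file(path):
    """Read a DNA string from a text file and count A, C, G, T."""
    # strip() drops the trailing newline that text files usually end with
    dna = Path(path).read_text().strip()
    return tuple(dna.count(base) for base in "ACGT")

# Demo on a temporary file containing the sample dataset
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "rosalind_dna.txt"  # hypothetical file name
    demo.write_text("AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC\n")
    print(*count_from_file(demo))  # 20 12 17 21
```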


# **Exercise 2: Transcribing DNA into RNA**

**Link: http://rosalind.info/problems/rna/**

**Intro:**

In “Counting DNA Nucleotides”, we described the primary structure of a nucleic acid as a polymer of nucleotide units, and we mentioned that the omnipresent nucleic acid DNA is composed of a varied sequence of four bases.

Yet a second nucleic acid exists alongside DNA in the chromatin; this molecule, which possesses a different sugar called ribose, came to be known as ribose nucleic acid, or RNA. RNA differs further from DNA in that it contains a base called uracil in place of thymine; structural differences between DNA and RNA are shown in Figure 1. Biologists initially believed that RNA was only contained in plant cells, whereas DNA was restricted to animal cells. However, this hypothesis dissipated as improved chemical methods discovered both nucleic acids in the cells of all life forms on Earth.

The primary structure of DNA and RNA is so similar because the former serves as a blueprint for the creation of a special kind of RNA molecule called messenger RNA, or mRNA. mRNA is created during RNA transcription, during which a strand of DNA is used as a template for constructing a strand of RNA by copying nucleotides one at a time, where uracil is used in place of thymine.

In eukaryotes, DNA remains in the nucleus, while RNA can enter the far reaches of the cell to carry out DNA's instructions. In future problems, we will examine the process and ramifications of RNA transcription in more detail.

**Problem:**

An RNA string is a string formed from the alphabet containing 'A', 'C', 'G', and 'U'.

Given a DNA string t corresponding to a coding strand, its transcribed RNA string u is formed by replacing all occurrences of 'T' in t with 'U' in u.

**Given:** 
- A DNA string t having length at most 1000 nt.

**Return:** 
- The transcribed RNA string of t.

**Sample Dataset:**

GATGGAACTTGACTACGTAAATT

**Sample Output:**

GAUGGAACUUGACUACGUAAAUU
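The replacement rule in the problem statement can be sketched in one line with `str.replace`. The function name `transcribe` is an assumption chosen for illustration, and the sketch assumes the input is already an uppercase coding-strand DNA string.

```python
def transcribe(t):
    """Return the RNA string for the coding-strand DNA string t."""
    # Every 'T' becomes 'U'; all other symbols are copied unchanged
    return t.replace("T", "U")

print(transcribe("GATGGAACTTGACTACGTAAATT"))  # GAUGGAACUUGACUACGUAAAUU
```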


## **Manual**

**From the sample:**

In [None]:
dna = "GATGGAACTTGACTACGTAAATT"

while True:
  # Accept the input only if it is non-empty and contains nothing but A, C, G, T (any case)
  if dna and set(dna.upper()) <= set("ACGT"):
    # Transcription of the coding strand: replace every T with U
    rna = dna.upper().replace("T", "U")
    print(f"The RNA sequence for the given DNA is: {rna}")
    break
  else:
    print("There is something wrong in your input")
    dna = input("Please, try again... enter a true DNA sequence: ")

The RNA sequence for the given DNA is: GAUGGAACUUGACUACGUAAAUU


- **From the file:**

In [None]:
dna = "AATTGGCACTGGCGTCCATGCGGCCCCTTTCAACGGATAGCCGTCCTATCGCAAATGTTTCATTGAGTCCTCCATTCTCCGCGGGCTGCGCCCCTGGGCATCTCCCTTACCCTACGTGGCGTAGAAAGCTGTAAAGGACCAGCGGTGATTTCAATAAAAGGGCCCGGGTGTAGCAAAGCAATACCCGTCTCGTACGCTATCAAATCATTGTAATTTGATATTTCATCGGGACCCGAAGGGCTCACGACACAAACTGCCGCTGCCACAATTGTGGTGGCGACCTCCGCCCCGTCTTCAGACACCAGTTATGCTAGTCGAAAGTACCGCCAGGTATTGCGCCATAATGCGCCAGTAGTTCCATTTAAGTGATACGAGCTTACAACTAATCCCAAGCACGAGTGCATCTGTCATCCATACTAAGCTACTCCGTCAAGGGTGCTGACAACGTTGTTGATTACAGGTGTCCTAGGAACGGCCTATGGCCGGGAGGCCTACTCAGGGGAGACCTTATCAGTGACGCAATGCGAACAAGTTGTAAATACTGTCTTTCATCTCTCACAATCCCAAGAGTAAAATTCCGCTGAGCATGGTTCCATTTGACGCTAATCGGACAAGAACTGCCGTACTTCCACCACGTTTGTCTTGTGAGACTTTCGTCTCTGCAACGACATGACAGCTTCCACGGTGCATATGAACAAATATCGTTCAACGCTGACCCATCGCCAGATAGCCTCCTACCTATGCGCATGGAAGGAACTGCTCCTTCAACCGCCTACAGATATATACGACGTGCCCATCGCTGAGAGCAGTCCGCGACCACTACAATACCCCGTTCCGATTATTAACAATTAGATGTGTTGTCAGGTTCGCCCGTCCTAACGTCTGCGTTACCCCTCTTTTCAGGCCAGTATCCAGGGTGGTATGAAACCAGATGTATTATTCCTCTGGGATTAACGGGTAACGG"
dna

'AATTGGCACTGGCGTCCATGCGGCCCCTTTCAACGGATAGCCGTCCTATCGCAAATGTTTCATTGAGTCCTCCATTCTCCGCGGGCTGCGCCCCTGGGCATCTCCCTTACCCTACGTGGCGTAGAAAGCTGTAAAGGACCAGCGGTGATTTCAATAAAAGGGCCCGGGTGTAGCAAAGCAATACCCGTCTCGTACGCTATCAAATCATTGTAATTTGATATTTCATCGGGACCCGAAGGGCTCACGACACAAACTGCCGCTGCCACAATTGTGGTGGCGACCTCCGCCCCGTCTTCAGACACCAGTTATGCTAGTCGAAAGTACCGCCAGGTATTGCGCCATAATGCGCCAGTAGTTCCATTTAAGTGATACGAGCTTACAACTAATCCCAAGCACGAGTGCATCTGTCATCCATACTAAGCTACTCCGTCAAGGGTGCTGACAACGTTGTTGATTACAGGTGTCCTAGGAACGGCCTATGGCCGGGAGGCCTACTCAGGGGAGACCTTATCAGTGACGCAATGCGAACAAGTTGTAAATACTGTCTTTCATCTCTCACAATCCCAAGAGTAAAATTCCGCTGAGCATGGTTCCATTTGACGCTAATCGGACAAGAACTGCCGTACTTCCACCACGTTTGTCTTGTGAGACTTTCGTCTCTGCAACGACATGACAGCTTCCACGGTGCATATGAACAAATATCGTTCAACGCTGACCCATCGCCAGATAGCCTCCTACCTATGCGCATGGAAGGAACTGCTCCTTCAACCGCCTACAGATATATACGACGTGCCCATCGCTGAGAGCAGTCCGCGACCACTACAATACCCCGTTCCGATTATTAACAATTAGATGTGTTGTCAGGTTCGCCCGTCCTAACGTCTGCGTTACCCCTCTTTTCAGGCCAGTATCCAGGGTGGTATGAAACCAGATGTATTATTCCTCTGGGATTAACGGGTAACGG'

In [None]:
while True:
  # Accept the input only if it is non-empty and contains nothing but A, C, G, T (any case)
  if dna and set(dna.upper()) <= set("ACGT"):
    # Transcription of the coding strand: replace every T with U
    rna = dna.upper().replace("T", "U")
    print(rna)
    break
  else:
    print("There is something wrong in your input")
    dna = input("Please, try again... enter a true DNA sequence: ")

AAUUGGCACUGGCGUCCAUGCGGCCCCUUUCAACGGAUAGCCGUCCUAUCGCAAAUGUUUCAUUGAGUCCUCCAUUCUCCGCGGGCUGCGCCCCUGGGCAUCUCCCUUACCCUACGUGGCGUAGAAAGCUGUAAAGGACCAGCGGUGAUUUCAAUAAAAGGGCCCGGGUGUAGCAAAGCAAUACCCGUCUCGUACGCUAUCAAAUCAUUGUAAUUUGAUAUUUCAUCGGGACCCGAAGGGCUCACGACACAAACUGCCGCUGCCACAAUUGUGGUGGCGACCUCCGCCCCGUCUUCAGACACCAGUUAUGCUAGUCGAAAGUACCGCCAGGUAUUGCGCCAUAAUGCGCCAGUAGUUCCAUUUAAGUGAUACGAGCUUACAACUAAUCCCAAGCACGAGUGCAUCUGUCAUCCAUACUAAGCUACUCCGUCAAGGGUGCUGACAACGUUGUUGAUUACAGGUGUCCUAGGAACGGCCUAUGGCCGGGAGGCCUACUCAGGGGAGACCUUAUCAGUGACGCAAUGCGAACAAGUUGUAAAUACUGUCUUUCAUCUCUCACAAUCCCAAGAGUAAAAUUCCGCUGAGCAUGGUUCCAUUUGACGCUAAUCGGACAAGAACUGCCGUACUUCCACCACGUUUGUCUUGUGAGACUUUCGUCUCUGCAACGACAUGACAGCUUCCACGGUGCAUAUGAACAAAUAUCGUUCAACGCUGACCCAUCGCCAGAUAGCCUCCUACCUAUGCGCAUGGAAGGAACUGCUCCUUCAACCGCCUACAGAUAUAUACGACGUGCCCAUCGCUGAGAGCAGUCCGCGACCACUACAAUACCCCGUUCCGAUUAUUAACAAUUAGAUGUGUUGUCAGGUUCGCCCGUCCUAACGUCUGCGUUACCCCUCUUUUCAGGCCAGUAUCCAGGGUGGUAUGAAACCAGAUGUAUUAUUCCUCUGGGAUUAACGGGUAACGG


## **Using Biopython**

In [None]:
# Installing the required packages...
!pip install biopython
from Bio.Seq import Seq



- **From the sample**

In [None]:
# Creating the variable...
dna = Seq("GATGGAACTTGACTACGTAAATT")
dna

Seq('GATGGAACTTGACTACGTAAATT')

In [None]:
# Transcribing in a single step...
rna = dna.transcribe()
print(rna)

GAUGGAACUUGACUACGUAAAUU


- **From the file**

In [None]:
# Creating the variable...
dna = Seq("AATTGGCACTGGCGTCCATGCGGCCCCTTTCAACGGATAGCCGTCCTATCGCAAATGTTTCATTGAGTCCTCCATTCTCCGCGGGCTGCGCCCCTGGGCATCTCCCTTACCCTACGTGGCGTAGAAAGCTGTAAAGGACCAGCGGTGATTTCAATAAAAGGGCCCGGGTGTAGCAAAGCAATACCCGTCTCGTACGCTATCAAATCATTGTAATTTGATATTTCATCGGGACCCGAAGGGCTCACGACACAAACTGCCGCTGCCACAATTGTGGTGGCGACCTCCGCCCCGTCTTCAGACACCAGTTATGCTAGTCGAAAGTACCGCCAGGTATTGCGCCATAATGCGCCAGTAGTTCCATTTAAGTGATACGAGCTTACAACTAATCCCAAGCACGAGTGCATCTGTCATCCATACTAAGCTACTCCGTCAAGGGTGCTGACAACGTTGTTGATTACAGGTGTCCTAGGAACGGCCTATGGCCGGGAGGCCTACTCAGGGGAGACCTTATCAGTGACGCAATGCGAACAAGTTGTAAATACTGTCTTTCATCTCTCACAATCCCAAGAGTAAAATTCCGCTGAGCATGGTTCCATTTGACGCTAATCGGACAAGAACTGCCGTACTTCCACCACGTTTGTCTTGTGAGACTTTCGTCTCTGCAACGACATGACAGCTTCCACGGTGCATATGAACAAATATCGTTCAACGCTGACCCATCGCCAGATAGCCTCCTACCTATGCGCATGGAAGGAACTGCTCCTTCAACCGCCTACAGATATATACGACGTGCCCATCGCTGAGAGCAGTCCGCGACCACTACAATACCCCGTTCCGATTATTAACAATTAGATGTGTTGTCAGGTTCGCCCGTCCTAACGTCTGCGTTACCCCTCTTTTCAGGCCAGTATCCAGGGTGGTATGAAACCAGATGTATTATTCCTCTGGGATTAACGGGTAACGG")
dna

Seq('AATTGGCACTGGCGTCCATGCGGCCCCTTTCAACGGATAGCCGTCCTATCGCAA...CGG')

In [None]:
# Transcribing in a single step...
rna = dna.transcribe()
print(rna)

AAUUGGCACUGGCGUCCAUGCGGCCCCUUUCAACGGAUAGCCGUCCUAUCGCAAAUGUUUCAUUGAGUCCUCCAUUCUCCGCGGGCUGCGCCCCUGGGCAUCUCCCUUACCCUACGUGGCGUAGAAAGCUGUAAAGGACCAGCGGUGAUUUCAAUAAAAGGGCCCGGGUGUAGCAAAGCAAUACCCGUCUCGUACGCUAUCAAAUCAUUGUAAUUUGAUAUUUCAUCGGGACCCGAAGGGCUCACGACACAAACUGCCGCUGCCACAAUUGUGGUGGCGACCUCCGCCCCGUCUUCAGACACCAGUUAUGCUAGUCGAAAGUACCGCCAGGUAUUGCGCCAUAAUGCGCCAGUAGUUCCAUUUAAGUGAUACGAGCUUACAACUAAUCCCAAGCACGAGUGCAUCUGUCAUCCAUACUAAGCUACUCCGUCAAGGGUGCUGACAACGUUGUUGAUUACAGGUGUCCUAGGAACGGCCUAUGGCCGGGAGGCCUACUCAGGGGAGACCUUAUCAGUGACGCAAUGCGAACAAGUUGUAAAUACUGUCUUUCAUCUCUCACAAUCCCAAGAGUAAAAUUCCGCUGAGCAUGGUUCCAUUUGACGCUAAUCGGACAAGAACUGCCGUACUUCCACCACGUUUGUCUUGUGAGACUUUCGUCUCUGCAACGACAUGACAGCUUCCACGGUGCAUAUGAACAAAUAUCGUUCAACGCUGACCCAUCGCCAGAUAGCCUCCUACCUAUGCGCAUGGAAGGAACUGCUCCUUCAACCGCCUACAGAUAUAUACGACGUGCCCAUCGCUGAGAGCAGUCCGCGACCACUACAAUACCCCGUUCCGAUUAUUAACAAUUAGAUGUGUUGUCAGGUUCGCCCGUCCUAACGUCUGCGUUACCCCUCUUUUCAGGCCAGUAUCCAGGGUGGUAUGAAACCAGAUGUAUUAUUCCUCUGGGAUUAACGGGUAACGG
