<a href="https://colab.research.google.com/github/nitrozyna/Rosalind/blob/master/17_mrna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Problem description:
[Introduction to Random Strings](http://rosalind.info/problems/prob/)

An **array** is a structure containing an ordered collection of objects (numbers, strings, other arrays, etc.). We let **A[k]** denote the **k-th** value in array A. You may like to think of an array as simply a **matrix** having only one row.

A **random string** is constructed so that the probability of choosing each subsequent symbol is based on a fixed underlying symbol frequency.

**GC-content** offers us natural symbol frequencies for constructing random **DNA strings**. If the GC-content is x, then we set the symbol frequencies of C and G equal to x/2 and the symbol frequencies of A and T equal to (1−x)/2. For example, if the GC-content is 40%, then as we construct the string, the next symbol is 'G'/'C' with probability 0.2, and the next symbol is 'A'/'T' with probability 0.3.
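The rule above can be sketched in a few lines of Python. The function name `symbol_probabilities` is a hypothetical helper introduced only for illustration; it is not part of the Rosalind problem statement.

```python
def symbol_probabilities(gc_content):
    """Return per-symbol probabilities for a GC-content given as a fraction in [0, 1]."""
    gc_each = gc_content / 2        # probability of 'G', and separately of 'C'
    at_each = (1 - gc_content) / 2  # probability of 'A', and separately of 'T'
    return {"A": at_each, "C": gc_each, "G": gc_each, "T": at_each}

# GC-content of 40%, as in the example in the text
probs = symbol_probabilities(0.4)
print(probs)
```

The four probabilities always sum to 1, which is a quick sanity check on any GC-content value.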

In practice, many probabilities wind up being very small. In order to work with small probabilities, we may plug them into a function that "blows them up" for the sake of comparison. Specifically, the **common logarithm** of x (defined for x>0 and denoted log10(x)) is the exponent to which we must raise 10 to obtain x.
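A small illustration of why logarithms help, assuming a hypothetical string of 1000 symbols (far longer than this problem allows) so that the effect is visible: multiplying the probabilities directly underflows to zero in double precision, while summing their logarithms does not.

```python
import math

p_symbol = 0.25  # probability of each symbol when the GC-content is 50%
length = 1000    # hypothetical string length, chosen only to force underflow

# Direct product of 1000 probabilities: too small for a 64-bit float
direct = p_symbol ** length

# log10 of the same product: a sum of logs, here length * log10(p_symbol)
log_prob = length * math.log10(p_symbol)

print(direct)
print(round(log_prob, 3))
```

Comparing two such probabilities is still possible on the log scale, because log10 is increasing: the larger logarithm belongs to the larger probability.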

---

### Given: A DNA string s of length at most 100 bp and an array A containing at most 20 numbers between 0 and 1.

### Return: An array B having the same length as A in which B[k] represents the common logarithm of the probability that a random string constructed with the GC-content found in A[k] will match s exactly.
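Under the definition of a random string given above, each symbol is drawn independently, so the probability of matching s is a product of per-symbol probabilities and its logarithm is a sum. Writing x for A[k], and using the labels n_GC and n_AT (introduced here for convenience) for the number of G/C and A/T symbols in s, this works out to:

$$
B[k] = \log_{10} \Pr(s \mid x) = n_{GC}\,\log_{10}\frac{x}{2} + n_{AT}\,\log_{10}\frac{1-x}{2}
$$

Only the two counts matter, not the order of the symbols in s.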

Sample Dataset

>ACGATACAA

>0.129 0.287 0.423 0.476 0.641 0.742 0.783

Sample Output

>-5.737 -5.217 -5.263 -5.360 -5.958 -6.628 -7.009
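As a check, the sample can be reproduced with a short self-contained sketch. The function name `log10_match_probability` is an illustrative assumption; it counts the G/C and A/T symbols in s and weights the two per-symbol log-probabilities by those counts.

```python
import math

def log10_match_probability(s, gc_content):
    """log10 of the probability that a random string with this GC-content equals s."""
    gc_count = s.count("G") + s.count("C")
    at_count = len(s) - gc_count
    return (gc_count * math.log10(gc_content / 2)
            + at_count * math.log10((1 - gc_content) / 2))

s = "ACGATACAA"
A = [0.129, 0.287, 0.423, 0.476, 0.641, 0.742, 0.783]
B = [log10_match_probability(s, x) for x in A]

# Fixed three-decimal formatting keeps trailing zeros, e.g. -5.360
print(" ".join(f"{b:.3f}" for b in B))
```

This sketch assumes every value in A lies strictly between 0 and 1; a GC-content of exactly 0 or 1 would make one of the logarithms undefined.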

In [None]:
#@title Importing some modules to make a connection between Colab and Drive to download the current dataset
!pip install PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


In [54]:
#@title Loading test dataset
fileID = "1i8i8d0Jqo5PxwRR7rXc--qE_D3pht23Q" #@param {type:"string"}
downloaded = drive.CreateFile({'id':fileID})
downloaded.GetContentFile('rosalind_prob.txt')  # replace the file name with your file

In [81]:
#@title Functions to calculate the final probabilities

import math

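def dna_prob(num, gc_count, at_count):
    """Return log10 of the match probability for GC-content `num`, rounded to 3 places."""
    x = float(num)
    gc = math.log10(x / 2)        # log10 probability of each 'G' or 'C'
    at = math.log10((1 - x) / 2)  # log10 probability of each 'A' or 'T'
    res = gc_count * gc + at_count * at
    return str(round(res, 3))

def final_prob(dna, gc_prob):
    """Return one formatted log10 probability per GC-content value in `gc_prob`."""
    gc_count = dna.count("G") + dna.count("C")
    at_count = len(dna) - gc_count
    # gc_prob is the whitespace-separated line of GC-content values
    return [dna_prob(num, gc_count, at_count) for num in gc_prob.split()]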

In [82]:
#@title Preprocessing
with open('rosalind_prob.txt','r') as f:
    lines = f.readlines()  # line 1: DNA string, line 2: GC-content values
    result = final_prob(lines[0].strip(), lines[1].strip())

print(" ".join(result))


-80.453 -71.618 -62.748 -59.829 -57.704 -56.478 -54.63 -54.376 -54.031 -54.673 -55.713 -56.959 -61.422 -67.818 -74.041
