# Finding a Motif in DNA

**Problem**

Given two strings s and t, t is a substring of s if t is contained as a contiguous collection of symbols in s (as a result, t must be no longer than s).

The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position i of s is denoted by s[i].

A substring of s can be represented as s[j:k], where j and k represent the starting and ending positions of the substring in s; for example, if s = "AUGCUUCAGAAAGGUCUUACG", then s[2:5] = "UGCU".

The location of a substring s[j:k] is its beginning position j; note that t will have multiple locations in s if it occurs more than once as a substring of s (see the Sample below).
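The problem's notation is 1-based and includes both endpoints, whereas Python indexing is 0-based and slices exclude the end. The sketch below shows one way to translate between the two; the helper names `symbol_at`, `substring`, and `positions_of` are made up for illustration and are not part of the problem statement.

```python
def symbol_at(s, i):
    """Return s[i] in the problem's 1-based notation."""
    return s[i - 1]


def substring(s, j, k):
    """Return s[j:k] in the problem's notation (1-based, both ends inclusive)."""
    # Shifting the start by one converts to 0-based; the end needs no shift
    # because Python slices already exclude the end index.
    return s[j - 1:k]


def positions_of(s, symbol):
    """Return the 1-based positions of every occurrence of a single symbol."""
    return [i for i, c in enumerate(s, start=1) if c == symbol]


s = "AUGCUUCAGAAAGGUCUUACG"
print(positions_of(s, "U"))  # [2, 5, 6, 15, 17, 18]
print(substring(s, 2, 5))    # UGCU
```

Both printed values match the examples given in the problem text.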

**Given:** Two DNA strings s and t (each of length at most 1 kbp).

**Return:** All locations of t as a substring of s.

**Sample Dataset**

GATATATGCATATACTT

ATAT

**Sample Output**

2 4 10
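As a quick check on the sample, here is a minimal sliding-window sketch that tests every possible start position. The function name `motif_locations` is hypothetical, and this is only one straightforward way to reproduce the sample output, not necessarily the intended reference solution.

```python
def motif_locations(s, t):
    """Return the 1-based start positions of every occurrence of t in s."""
    locations = []
    # Only starts that leave room for all of t can match.
    for start in range(len(s) - len(t) + 1):
        if s.startswith(t, start):
            locations.append(start + 1)  # convert 0-based index to 1-based
    return locations


print(*motif_locations("GATATATGCATATACTT", "ATAT"))  # 2 4 10
```

Note that the occurrences at positions 2 and 4 overlap, so a search that skips past each whole match would miss the second one.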

_____________

In [35]:
with open("rosalind_subs.txt", "r") as f: 
    seq = f.readline().strip()
    motif = f.readline().strip()
    
print("Sequence: %s" % seq)
print("Motif: %s" % motif)


Sequence: CCGTAATGTCGTAATGTGTAATGTCAGGTAATGTGTAATGTGACCTCGTAATGTGTCCTCGCCGTAATGTTGTAATGTGTAATGTGTGTGCGATGTGACGGTAATGTCTGTAATGTCTGCGTAATGTGTAATGTGCGTAATGTGGTAATGTCGTAATGTCGTAATGTTAATGTAATGTCAATGTAATGTTAGGTAATGTCTTCACGAGACTTGTAATGTCTGTAATGTTTCGTAATGTGACCAAGTAATGTCCCCAAATTAGTAATGTTTGGGCAAGTCGTAATGTGTAATGTGGTAATGTGTAATGTACGTCCTCAGCTGTAATGTGTAATGTTTAGTAATGTCTCAGAACGGTGTAATGTGGGCGTAATGTCCGTAATGTGTAATGTCTCCCGGTAATGTCACTGGTAATGTGCTGACATCGATGTAATGTGGTAATGTGTAATGTAGTAATGTGTAATGTGTAATGTGGTCGACCTTCGGATGTAATGTGCGTAATGTTCTTCGTAATGTTCAAGTAATGTTAGTAATGTATTTATGGGTAATGTTCAATCGTAATGTGTAATGTGGGCCTGAGAAGGCTGTAATGTGTAATGTTTCACTGGGGTAATGTAGTAATGTCGGTGTAATGTTATCGTAATGTGTAATGTATAGTAATGTGGTAGTAATGTACGGTGCTGGTAATGTGTAATGTTGTAATGTGGTAATGTAGTAATGTGGTAATGTTCTGTAATGTACCTGTAATGTCTACGTAATGTCGTAATGTTCCGTAATGTCCATTTAACGTAATGTACTATTACAGTAATGTGTAATGTTTTGTAATGTAACAGTAATGTGACTGTAATGTCCACGTAATGTGTAATGTATGTAATGTAGAGTAATGTCGTAGTAATGTTGTGGTAATGTTGCATGGTAATGTGTAATGT
Motif: GTAATGTGT


In [36]:
pos = []
cursor = 0
while cursor < len(seq):
    # find the first occurrence of the motif in seq at or after the cursor
    p = seq.find(motif, cursor)

    # str.find returns -1 when there are no further matches, so stop searching
    if p == -1:
        break

    # otherwise, record the match and move the cursor one symbol past its start,
    # so that overlapping occurrences are also found
    # Note: we store p + 1 because Rosalind expects 1-based (not 0-based) positions
    pos.append(p + 1)
    cursor = p + 1
    

print(' '.join(map(str, pos)))
    

11 28 48 72 79 121 280 295 321 376 435 450 457 555 584 637 681 802 852 913
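The loop above advances the cursor by one symbol rather than by the motif length, which is what lets it report overlapping occurrences. An alternative sketch of the same idea uses a regular-expression lookahead, which matches at a position without consuming any characters. The function names here are hypothetical and are shown only for comparison with the `find`-based loop.

```python
import re


def motif_locations_regex(s, t):
    """Return 1-based starts of all occurrences of t in s, overlaps included."""
    # (?=...) is a zero-width lookahead: it succeeds wherever t begins but
    # consumes nothing, so the scan resumes at the very next position.
    pattern = re.compile("(?=" + re.escape(t) + ")")
    return [m.start() + 1 for m in pattern.finditer(s)]


def motif_locations_non_overlapping(s, t):
    """For contrast: a plain pattern resumes after each match and skips overlaps."""
    return [m.start() + 1 for m in re.finditer(re.escape(t), s)]


print(motif_locations_regex("GATATATGCATATACTT", "ATAT"))            # [2, 4, 10]
print(motif_locations_non_overlapping("GATATATGCATATACTT", "ATAT"))  # [2, 10]
```

On the sample dataset, the plain pattern misses the occurrence at position 4 because it overlaps the one at position 2.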
