# Finding a Shared Motif

## Problem
A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".

Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".
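
Both examples can be checked directly from the definition. The snippet below is a minimal sketch; the helper names `is_common_substring` and `has_common_substring_of_length` are illustrative and not part of the problem statement.

```python
def is_common_substring(candidate, strings):
    """Return True if `candidate` occurs in every string of the collection."""
    return all(candidate in s for s in strings)


def has_common_substring_of_length(length, strings):
    """Return True if some substring of exactly `length` characters is common."""
    first = strings[0]
    return any(
        is_common_substring(first[i:i + length], strings)
        for i in range(len(first) - length + 1)
    )


pair = ["ACGTACGT", "AACCGTATA"]
print(is_common_substring("CG", pair))           # True
print(is_common_substring("CGTA", pair))         # True
print(has_common_substring_of_length(5, pair))   # False, so "CGTA" is a longest one

swapped = ["AACC", "CCAA"]
print(is_common_substring("AA", swapped))        # True
print(is_common_substring("CC", swapped))        # True
print(has_common_substring_of_length(3, swapped))  # False, so both are longest
```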

**Given:** A collection of k (k≤100) DNA strings of length at most 1 kbp each in FASTA format.

**Return:** A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)

_____

**Sample Input**
```
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
```


**Sample Output** <br>
`AC`
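
Because any longest common substring is accepted, `AC` is not the only valid answer for the sample. A brute-force check, sketched below with the three sample sequences hard-coded for illustration, lists every candidate of maximal length.

```python
sample = ["GATTACA", "TAGACCA", "ATACA"]

# A common substring cannot be longer than the shortest sequence,
# so candidates are drawn from it, longest first.
shortest = min(sample, key=len)

best = []
for length in range(len(shortest), 0, -1):
    candidates = {
        shortest[i:i + length] for i in range(len(shortest) - length + 1)
    }
    best = sorted(c for c in candidates if all(c in s for s in sample))
    if best:
        break  # the first length with any hit is the maximum

print(best)  # ['AC', 'CA', 'TA']
```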

## Sandbox

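```python
def process_fasta(fasta_lines):
    """Parse FASTA lines into a dict mapping each header line to its sequence."""
    dna_dict = {}
    current_seq = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # blank lines would otherwise raise IndexError on line[0]
        if line.startswith(">"):
            current_seq = line
            dna_dict[current_seq] = ""
        elif current_seq is not None:
            # A sequence may wrap across several lines, so concatenate them.
            dna_dict[current_seq] += line
    return dna_dict


def all_substrings(string):
    """Return the set of all non-empty substrings of `string`."""
    n = len(string)
    return {string[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def max_motif(fasta_path):
    """Print a longest substring shared by every record in the FASTA file."""
    with open(fasta_path) as in_file:
        dna_dict = process_fasta(in_file)

    # Intersect one sequence at a time so that only one full substring set
    # is held in memory alongside the running set of common substrings.
    common = None
    for seq in dna_dict.values():
        substrings = all_substrings(seq)
        common = substrings if common is None else common & substrings

    # default="" avoids a ValueError when the sequences share nothing
    # or the file contains no records.
    print(max(common or (), key=len, default=""))
```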

```python
# Only the second assignment takes effect; the first path is kept for reference.
# my_file = "/Users/jmarks/Desktop/GitHub/jaamarks_notebooks/General/sandbox/ros_file"
my_file = "/Users/jmarks/Downloads/rosalind_lcsm.txt"

max_motif(my_file)
```

Output:

```
GTAGTTTTTACAAACGCACGATAATCTGTTGGTTGTGCCTACGCTGCGCATTTCAGGTAGGCATGGTCGTGCATTCCGGAGC
```
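
The set-based approach above materializes every substring of every sequence, which for a 1 kbp input means up to about half a million substrings per sequence. As an alternative sketch (not the method used in this notebook), one can binary-search on the motif length: if a common substring of some length exists, then a common substring of every shorter length exists as well, so the feasible lengths form a contiguous range starting at 1. The function names `common_of_length` and `longest_common_substring` are hypothetical.

```python
def common_of_length(seqs, length):
    """Return a substring of exactly `length` shared by all sequences, or None."""
    shortest = min(seqs, key=len)
    seen = set()
    for i in range(len(shortest) - length + 1):
        candidate = shortest[i:i + length]
        if candidate in seen:
            continue  # already tested this window's text
        seen.add(candidate)
        if all(candidate in s for s in seqs):
            return candidate
    return None


def longest_common_substring(seqs):
    """Binary-search the largest length that admits a common substring."""
    if not seqs:
        return ""
    best = ""
    lo, hi = 1, min(len(s) for s in seqs)
    while lo <= hi:
        mid = (lo + hi) // 2
        found = common_of_length(seqs, mid)
        if found is not None:
            best = found   # length `mid` works, try longer
            lo = mid + 1
        else:
            hi = mid - 1   # length `mid` fails, try shorter
    return best


print(longest_common_substring(["ACGTACGT", "AACCGTATA"]))        # CGTA
print(longest_common_substring(["GATTACA", "TAGACCA", "ATACA"]))  # TA
```

On the sample input this returns `TA` rather than `AC`; both have length 2, and the problem accepts either.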
