# 433. Minimum Genetic Mutation
A gene string can be represented by an 8-character string in which each character is one of 'A', 'C', 'G', or 'T'.

Suppose we need to investigate a mutation from a gene string `start` to a gene string `end`, where one mutation is defined as a change to a single character in the gene string.

    For example, "AACCGGTT" --> "AACCGGTA" is one mutation.
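
The definition of a single mutation can be checked directly. The following is a minimal sketch, assuming two strings of different lengths never count as one mutation apart; the helper name `is_one_mutation` is hypothetical and does not appear in the solution further down.

```python
def is_one_mutation(a: str, b: str) -> bool:
    """Return True if a and b have equal length and differ at exactly one position."""
    if len(a) != len(b):
        return False
    # Count the positions where the two strings disagree
    return sum(x != y for x, y in zip(a, b)) == 1


print(is_one_mutation("AACCGGTT", "AACCGGTA"))  # True: only the last character differs
print(is_one_mutation("AACCGGTT", "AACCGGTT"))  # False: nothing changed
```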

There is also a gene bank, `bank`, that records all the valid gene mutations. A gene must be in `bank` to be a valid gene string.

Given the two gene strings `start` and `end` and the gene bank `bank`, return the minimum number of mutations needed to mutate from `start` to `end`. If there is no such sequence of mutations, return -1.

Note that the starting point is assumed to be valid, so it might not be included in `bank`.

### Idea for Solution
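Treat each gene string (`start` plus every entry in `bank`) as a node in a graph, and connect two nodes if and only if they differ in exactly one character.

Then run a breadth-first search from `start`. Because every edge counts as one mutation, the first time the search reaches `end` gives the minimum number of mutations.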
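
The idea can be sketched literally, with an explicit adjacency list followed by a breadth-first search. This is an illustration under stated assumptions, not the solution given in the next section: the function names `build_graph` and `shortest_path_length` are hypothetical, and all gene strings are assumed to have the same length.

```python
from collections import deque


def build_graph(genes):
    """Map each gene to the genes that differ from it at exactly one position."""
    graph = {g: [] for g in genes}
    for a in graph:
        for b in graph:
            # Assumes equal-length strings; zip would silently truncate otherwise
            if sum(x != y for x, y in zip(a, b)) == 1:
                graph[a].append(b)
    return graph


def shortest_path_length(graph, start, end):
    """Breadth-first search; returns the number of edges, or -1 if unreachable."""
    if start not in graph or end not in graph:
        return -1
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            return distance[node]
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return -1


bank = ["AACCGGTA", "AACCGCTA", "AAACGGTA"]
graph = build_graph(["AACCGGTT"] + bank)  # the start gene is added as a node
print(shortest_path_length(graph, "AACCGGTT", "AAACGGTA"))  # 2
```

Building the full graph compares every pair of genes, which is acceptable for a small bank; a search that looks up neighbors on demand avoids that up-front cost.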

### Solution

In [1]:
import re

In [2]:
# This problem maps naturally onto Single Nucleotide Polymorphisms (SNPs):
# how many single-nucleotide substitutions does it take to get from the
# starting sequence to the end sequence?
# It is also essentially the word ladder game, so it is closely related to
# this LeetCode question: https://leetcode.com/problems/word-ladder/

from collections import deque


class Solution:
    def minMutation(self, start: str, end: str, bank: list) -> int:
        # Stop early in the trivial cases
        if start == end:
            return 0
        if end not in bank:
            return -1
        # The graph is never built explicitly; neighbors are found on
        # demand during the breadth-first search
        return self.breadth_first_add_and_search(bank, start, end)

    def single_sub(self, seq: str, bank: list) -> list:
        """Return the bank entries that differ from seq at exactly one position."""
        neighbors = []
        for nucleotide_index in range(len(seq)):
            # Put a single-nucleotide wildcard at this position
            option = seq[:nucleotide_index] + "[ACGT]" + seq[nucleotide_index + 1:]
            for key in bank:
                # fullmatch requires the whole key to match, not a substring
                if key != seq and re.fullmatch(option, key):
                    neighbors.append(key)
        return neighbors

    def breadth_first_add_and_search(self, bank: list, start: str, end: str) -> int:
        # The bank is left unmodified; visited sequences are tracked separately
        visited = {start}
        # Each queue entry is (sequence, number of mutations from start)
        queue = deque([(start, 0)])
        print("BFAS")
        while queue:
            node, depth = queue.popleft()
            neighbors = [key for key in self.single_sub(node, bank) if key not in visited]
            # Trace: the unvisited neighbors and their distance from start
            print(neighbors)
            print(depth + 1)
            if end in neighbors:
                return depth + 1
            for neighbor in neighbors:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append((neighbor, depth + 1))
        return -1


In [4]:
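# Case 1 (overwritten by the later assignments)
start = "AACCGGTT"
end = "AAACGGTA"
bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]

# Case 2 (also overwritten)
start = "CGT"
end = "AGA"
bank = ["CGA","CCA","AGA"]

# Case 3: the last assignment, so this is the case that runs
start = "AAA"
end = "CCC"
bank = ["AAC","ACC","CCC"]

# Case 4 (never assigned to start, end, or bank, so it is not run)
# "AACCTTGG"
# "AATTCCGG"
# ["AATTCCGG","AACCTGGG","AACCCCGG","AACCTACC"]

Solution().minMutation(start, end, bank)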

BFAS
['AAC']
1
['ACC']
2
['CCC']
3


3
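
Because each assignment to `start`, `end`, and `bank` overwrites the previous one, only the `"AAA"` to `"CCC"` case produces the output above. To check all four listed cases at once, here is a hedged sketch built around a standalone function, `min_mutation`, which is hypothetical and is not the `Solution` class above. It also swaps in a different neighbor search: instead of scanning the bank with a pattern, it generates every single-character substitution and keeps those found in a set built from the bank. The expected values in `cases` were worked out by hand from the problem definition, not taken from the notebook output.

```python
from collections import deque


def min_mutation(start: str, end: str, bank: list) -> int:
    """BFS over valid genes; returns the mutation count, or -1 if unreachable."""
    if start == end:
        return 0
    valid = set(bank)  # set membership replaces the scan over the bank
    if end not in valid:
        return -1
    visited = {start}
    queue = deque([(start, 0)])  # (gene, mutations from start)
    while queue:
        gene, steps = queue.popleft()
        for i in range(len(gene)):
            for letter in "ACGT":
                if letter == gene[i]:
                    continue
                candidate = gene[:i] + letter + gene[i + 1:]
                if candidate not in valid or candidate in visited:
                    continue
                if candidate == end:
                    return steps + 1
                visited.add(candidate)
                queue.append((candidate, steps + 1))
    return -1


# (start, end, bank, expected); expected values derived by hand
cases = [
    ("AACCGGTT", "AAACGGTA", ["AACCGGTA", "AACCGCTA", "AAACGGTA"], 2),
    ("CGT", "AGA", ["CGA", "CCA", "AGA"], 2),
    ("AAA", "CCC", ["AAC", "ACC", "CCC"], 3),
    ("AACCTTGG", "AATTCCGG", ["AATTCCGG", "AACCTGGG", "AACCCCGG", "AACCTACC"], -1),
]
for case_start, case_end, case_bank, expected in cases:
    assert min_mutation(case_start, case_end, case_bank) == expected
```

Generating substitutions costs a fixed amount of work per gene (three alternatives per position), so it does not grow with the size of the bank the way a scan does.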