# 433. Minimum Genetic Mutation
A gene string can be represented by an 8-character long string, with choices from 'A', 'C', 'G', and 'T'.

Suppose we need to investigate a mutation from a gene string `start` to a gene string `end`, where one mutation is defined as a single character changed in the gene string.

- For example, "AACCGGTT" --> "AACCGGTA" is one mutation.

There is also a gene bank `bank` that records all the valid gene mutations. A gene must be in `bank` to be a valid gene string.

Given the two gene strings `start` and `end` and the gene bank `bank`, return the minimum number of mutations needed to mutate from `start` to `end`. If there is no such mutation, return -1.

Note that the starting point is assumed to be valid, so it might not be included in the bank.
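Under this definition, one mutation means the two genes are at Hamming distance 1: equal length, differing in exactly one position. A minimal check, using the hypothetical helper name `is_one_mutation` (it is not part of the problem statement):

```python
def is_one_mutation(a: str, b: str) -> bool:
    """Return True if a and b have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    # Count the positions where the two genes disagree
    return sum(x != y for x, y in zip(a, b)) == 1


print(is_one_mutation("AACCGGTT", "AACCGGTA"))  # True
print(is_one_mutation("AACCGGTT", "AAACGGTA"))  # False (two positions differ)
```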

### Idea for Solution
Build a graph whose nodes are the start gene plus every gene in the bank, with two nodes connected if and only if they differ by a single letter (exactly one mutation).
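A minimal sketch of that graph as an adjacency list. The helper name `build_graph` is hypothetical, and comparing every pair of genes costs O(n²·L) for n genes of length L; the solution in this notebook instead discovers neighbours on the fly during the search.

```python
from itertools import combinations


def build_graph(start: str, bank: list[str]) -> dict[str, set[str]]:
    """Adjacency list linking genes that differ in exactly one position."""
    nodes = set(bank) | {start}  # start is valid even if it is not in the bank
    graph = {gene: set() for gene in nodes}
    for a, b in combinations(nodes, 2):
        # Edge if and only if the two genes differ at exactly one index
        if sum(x != y for x, y in zip(a, b)) == 1:
            graph[a].add(b)
            graph[b].add(a)
    return graph


graph = build_graph("AACCGGTT", ["AACCGGTA", "AACCGCTA", "AAACGGTA"])
print(sorted(graph["AACCGGTA"]))  # ['AAACGGTA', 'AACCGCTA', 'AACCGGTT']
```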

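Then do a breadth-first search (BFS) to find the shortest path between the start and end nodes. Every edge represents one mutation, so the graph is unweighted, and the depth at which BFS first reaches the end node is the minimum number of mutations.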
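A minimal BFS sketch over an adjacency list like the one described above. `shortest_path_length` is a hypothetical name, and the graph in the usage lines is written out by hand for the example used later in this notebook.

```python
from collections import deque


def shortest_path_length(graph: dict[str, set[str]], start: str, end: str) -> int:
    """Number of edges on a shortest start -> end path, or -1 if end is unreachable."""
    if start == end:
        return 0
    dist = {start: 0}       # node -> distance from start
    queue = deque([start])  # FIFO queue explores nodes in order of distance
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                if neighbour == end:
                    return dist[neighbour]
                queue.append(neighbour)
    return -1


# Hand-written graph for the example in this notebook
graph = {
    "AACCGGTT": {"AACCGGTA"},
    "AACCGGTA": {"AACCGGTT", "AACCGCTA", "AAACGGTA"},
    "AACCGCTA": {"AACCGGTA"},
    "AAACGGTA": {"AACCGGTA"},
}
print(shortest_path_length(graph, "AACCGGTT", "AAACGGTA"))  # 2
```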

### Solution

import re

# Each mutation here is a single-nucleotide substitution, as in Single Nucleotide Polymorphisms (SNPs)
# So the question is: how many SNPs does it take to get from the start sequence to the end sequence?
# This is essentially the word ladder game, i.e. LeetCode 127: https://leetcode.com/problems/word-ladder/
# (Word Ladder returns the number of words in the sequence rather than the number of changes.)

from collections import deque


class Solution:
    def minMutation(self, start: str, end: str, bank: list) -> int:
        seq_len = len(start)
        bank = set(bank)
        # Stop early for trivial cases
        if start == end:
            return 0
        if end not in bank:
            return -1

        # Otherwise, run a breadth-first search from start
        queue = deque([start])
        visited = {start: 0}  # gene -> number of mutations needed to reach it
        while queue:
            # Take the oldest node off the queue (FIFO order keeps the search breadth-first)
            node = queue.popleft()
            # Find the unvisited children of node: bank genes that match node
            # everywhere except possibly at nucleotide_index (node itself is already visited)
            for nucleotide_index in range(seq_len):
                # Pattern equal to node, with a wildcard at nucleotide_index
                option = node[:nucleotide_index] + r"\w" + node[nucleotide_index + 1:]
                for seq in bank:
                    if seq not in visited and re.fullmatch(option, seq):
                        if seq == end:
                            return visited[node] + 1
                        visited[seq] = visited[node] + 1
                        queue.append(seq)
        # The queue emptied without ever reaching end
        return -1


start = "AACCGGTT"
end = "AAACGGTA"
bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]

Solution().minMutation(start, end, bank)

2
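The solution above scans the whole bank for every wildcard pattern, so expanding one node costs O(L·|bank|) pattern matches. A common alternative, which is a swapped-in technique and not the approach used above, is to generate the 3·L single-letter mutations of a node directly and keep only those found in the bank set. A sketch under that assumption, with the hypothetical function name `min_mutation_by_generation`:

```python
from collections import deque


def min_mutation_by_generation(start: str, end: str, bank: list[str]) -> int:
    """BFS that enumerates every single-letter mutation instead of scanning the bank."""
    valid = set(bank)
    if start == end:
        return 0
    if end not in valid:
        return -1
    queue = deque([(start, 0)])  # (gene, mutations used so far)
    seen = {start}
    while queue:
        gene, steps = queue.popleft()
        for i in range(len(gene)):
            for base in "ACGT":
                if base == gene[i]:
                    continue
                # Candidate gene with position i changed to base
                candidate = gene[:i] + base + gene[i + 1:]
                if candidate in valid and candidate not in seen:
                    if candidate == end:
                        return steps + 1
                    seen.add(candidate)
                    queue.append((candidate, steps + 1))
    return -1


print(min_mutation_by_generation("AACCGGTT", "AAACGGTA",
                                 ["AACCGGTA", "AACCGCTA", "AAACGGTA"]))  # 2
```

Each expansion then costs 3·L set lookups regardless of the bank size, which tends to pay off when the bank is large relative to the gene length.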