# Question 349

## Description

Soundex is an algorithm used to categorize names phonetically, so that two names that sound alike but are spelled differently have the same representation.

Soundex maps every name to a string consisting of one letter and three digits, like M460.

One version of the algorithm is as follows:

* Remove consecutive consonants with the same sound (for example, change ck → c).
* Keep the first letter. The remaining steps only apply to the rest of the string.
* Remove all vowels, including y, w, and h.
* Replace all consonants with the following digits:
  * b, f, p, v → 1
  * c, g, j, k, q, s, x, z → 2
  * d, t → 3
  * l → 4
  * m, n → 5
  * r → 6
* If you don't have three numbers yet, append zeros until you do. Keep the first three numbers.

Using this scheme, Jackson and Jaxen both map to J250.
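The steps above can be traced on a single name to see where each intermediate string comes from. The following is a minimal sketch, not a reference implementation: the names `DIGIT_GROUPS`, `CODES`, and `trace_soundex` are hypothetical, and the helper assumes a non-empty, purely alphabetic name.

```python
# Hypothetical helper names; the digit groups are copied from the list above.
DIGIT_GROUPS = {
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}
# Invert the groups into a letter -> digit lookup.
CODES = {letter: digit for digit, letters in DIGIT_GROUPS.items() for letter in letters}


def trace_soundex(name: str) -> list[str]:
    """Return the intermediate strings produced by each step for `name`."""
    name = name.lower()
    # Collapse adjacent consonants that share a digit (e.g. "ck" -> "c").
    collapsed = name[0]
    for prev, cur in zip(name, name[1:]):
        if cur not in CODES or CODES.get(prev) != CODES[cur]:
            collapsed += cur
    # Keep the first letter; the remaining steps touch only the tail.
    first, tail = collapsed[0].upper(), collapsed[1:]
    # Drop vowels plus y, w, h (every letter without a digit in this sketch).
    consonants = "".join(ch for ch in tail if ch in CODES)
    # Map to digits, pad with zeros, keep the first three.
    digits = "".join(CODES[ch] for ch in consonants)
    padded = (digits + "000")[:3]
    return [collapsed, consonants, digits, first + padded]


print(trace_soundex("Jackson"))  # ['jacon', 'cn', '25', 'J250']
```

For Jackson, the run "cks" collapses to "c", the tail "acon" loses its vowels to become "cn", and the digits "25" are padded to "250".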

Implement Soundex.

def soundex(name: str) -> str:
    # Step 1: reject empty input, then convert letters to uppercase
    if not name:
        raise ValueError("name must be a non-empty string")
    name = name.upper()

    # Step 2: keep the first letter
    first_letter = name[0]

    # mapping letters to digits
    mapping: dict[str, str] = {
        "B": "1",
        "F": "1",
        "P": "1",
        "V": "1",
        "C": "2",
        "G": "2",
        "J": "2",
        "K": "2",
        "Q": "2",
        "S": "2",
        "X": "2",
        "Z": "2",
        "D": "3",
        "T": "3",
        "L": "4",
        "M": "5",
        "N": "5",
        "R": "6",
    }

    # Step 3: Remove consecutive consonants with the same sound
    encoded_name = name[0]
    for i in range(1, len(name)):
        if mapping.get(name[i], None) != mapping.get(name[i - 1], None):
            encoded_name += name[i]

    # Step 4: Remove all vowels, including y, w, and h, from everything after
    # the first letter. The first letter is set aside here so that a leading
    # vowel (as in "Ellen") does not shift the digits by one position.
    rest = encoded_name[1:]
    for char in ["A", "E", "I", "O", "U", "Y", "W", "H"]:
        rest = rest.replace(char, "")

    # Step 5: Replace consonants with corresponding digits; characters that
    # are not in the mapping (hyphens, apostrophes, digits) are dropped
    code = "".join(mapping[char] for char in rest if char in mapping)

    # Step 6: If there are fewer than three digits, append zeros
    code += "000"

    # Step 7: Return the first letter followed by the first three digits
    return first_letter + code[:3]


soundex("Jackson"), soundex("Jaxen")

('J250', 'J250')
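
Since the point of Soundex is to give similar-sounding names the same representation, one natural use is bucketing a list of names by code. The sketch below is one possible way to do that, under stated assumptions: `soundex_compact` and `group_by_sound` are hypothetical names, the compact variant uses `itertools.groupby` in place of the explicit loop above, and it assumes non-empty alphabetic names.

```python
from collections import defaultdict
from itertools import groupby

# Letter -> digit table from the problem statement; letters that are absent
# (vowels, y, w, h) are treated as having no code.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex_compact(name: str) -> str:
    """Compact variant of the scheme above; assumes a non-empty alphabetic name."""
    name = name.upper()
    # groupby collapses adjacent letters that share a code; "" marks uncoded letters.
    runs = [key for key, _ in groupby(name, key=lambda ch: CODES.get(ch, ""))]
    # The first run belongs to the kept first letter, so skip it.
    digits = "".join(runs[1:])
    return name[0] + (digits + "000")[:3]


def group_by_sound(names: list[str]) -> dict[str, list[str]]:
    """Bucket names by their code, preserving first-seen order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for n in names:
        groups[soundex_compact(n)].append(n)
    return dict(groups)


print(group_by_sound(["Jackson", "Jaxen", "Ellen", "Lee"]))
# {'J250': ['Jackson', 'Jaxen'], 'E450': ['Ellen'], 'L000': ['Lee']}
```

Names that begin with a vowel ("Ellen") or that have no codable consonants after the first letter ("Lee") are useful checks, because they exercise the first-letter handling and the zero padding.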