# Find and Replace Deluxe:
## How to use a Python dict for find/replace functions 

We're going to go through several examples of find-and-replace tasks of varying complexity, run into some potential hiccups along the way, and see how to avoid them. To check whether our efforts are working, we'll use a simple string that reads as a coherent message if the code works and is hard to read if it doesn't.

### Simple character-character substitution

Replacing one character with another is pretty straightforward using a function. This function takes the string and the dict as arguments, iterates through the characters of the string, and makes the replacements. You'll notice it has an "if" statement. This makes sure that each character exists as a key in our dict, or else we get a KeyError. Since we iterate over every letter of the string, most letters won't be in the dict, and we can't replace something with its pair from the dict if it isn't in the dict to begin with.
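To see why the membership check matters, here is a small sketch that looks up every character without it. The helper names `naive_lookup` and `safe_lookup` are hypothetical. The second one swaps in `dict.get(key, default)`, a built-in alternative to the "if" check that returns the character unchanged when it isn't a key.

```python
d = {"X": "-"}


def naive_lookup(string, dictionary):
    # hypothetical helper: looks up every character with no membership check
    return "".join(dictionary[ch] for ch in string)


def safe_lookup(string, dictionary):
    # hypothetical helper: dict.get(ch, ch) falls back to the character itself
    return "".join(dictionary.get(ch, ch) for ch in string)


try:
    naive_lookup("HEREXIS", d)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'H'

print(safe_lookup("HEREXISXAXSAMPLEXSTRING", d))  # HERE-IS-A-SAMPLE-STRING
```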

Note: I'm going to comment every line of the first function below to make sure we're on the same page, since it serves as the basis for all the ones that come after it. After that, I'll just highlight what is different about each function.


```python
# here is our target string, contaminated with "X"s
s = "HEREXISXAXSAMPLEXSTRING"

# dictionary pairs "X" with "-"
d = {"X": "-"}

# define a function that takes a string and a dict
def find_replace(string, dictionary):
    # iterate over each character (the loop keeps walking the original string)
    for item in string:
        # is the character a key in the dict? (avoids a KeyError)
        if item in dictionary.keys():
            # look up its paired value and replace every occurrence
            string = string.replace(item, dictionary[item])
    # return the updated string
    return string

# call the function
find_replace(s, d)
```
```
'HERE-IS-A-SAMPLE-STRING'
```

We can see that all the "X"s have been replaced by "-", their paired value.
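As an aside, Python has a built-in that handles purely single-character mappings in one pass. `str.maketrans` accepts a dict whose keys are single characters, and `str.translate` applies it. A one-pass approach also sidesteps a subtle issue with the loop above, where one replacement can feed into the next. The sketch below shows this with a swap mapping; the `swap` dict is only an illustrative example.

```python
s = "HEREXISXAXSAMPLEXSTRING"
d = {"X": "-"}

# str.maketrans builds a translation table from a dict of single-character keys
table = str.maketrans(d)
print(s.translate(table))  # HERE-IS-A-SAMPLE-STRING


def find_replace(string, dictionary):
    # the loop-and-replace function from above
    for item in string:
        if item in dictionary:
            string = string.replace(item, dictionary[item])
    return string


# swapping "A" and "B": the loop turns "A" into "B", then every "B" into "A"
swap = {"A": "B", "B": "A"}
print(find_replace("AB", swap))             # AA
print("AB".translate(str.maketrans(swap)))  # BA
```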

### Replacing multi-character values

This follows the same basic pattern, but iterates over the dict's keys instead of the string's characters and uses a simple regex substitution, `re.sub`. We will get rid of "ABC"s instead of "X"s. We import the re library and modify our string and dict for the test.


```python
import re

s = "HEREABCISABCAABCSAMPLEABCSTRING"

d = {"ABC": "-"}

def find_replace_multi(string, dictionary):
    # iterate over the dict's keys rather than the string's characters
    for item in dictionary.keys():
        # sub item for item's paired value in string (item is used as a regex pattern)
        string = re.sub(item, dictionary[item], string)
    return string

find_replace_multi(s, d)
```
```
'HERE-IS-A-SAMPLE-STRING'
```
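One caveat: `re.sub` treats each key as a regular expression. A key containing a metacharacter such as "." or "+" therefore won't be matched literally, and passing the key through `re.escape` first fixes that. Here is a minimal sketch with a hypothetical decimal-point key:

```python
import re

s = "3.14"
d = {".": ","}

for key, value in d.items():
    # unescaped: "." is a regex wildcard, so every character matches
    print(re.sub(key, value, s))             # ,,,,
    # escaped: "." is matched literally
    print(re.sub(re.escape(key), value, s))  # 3,14
```

`re.sub` also interprets backslashes in the replacement value. If your values might contain them, plain `str.replace` avoids regex entirely.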

### Replacing single- and multi-character patterns... oh wait, crap...

There is one final kink, however. What happens if one of the single characters we want to replace also appears inside one of the multi-character strings? Let's walk through the problem from just before we discover it to how we solve it, so we understand why it occurs and not just how to get rid of it.
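The risk shows up whenever one key appears inside another, and we can check for that directly. Here is a quick sketch with a hypothetical helper, `overlapping_keys` (not part of the original code), that lists those pairs:

```python
def overlapping_keys(dictionary):
    # hypothetical helper: (inner, outer) pairs where one key is a substring of another
    return [
        (inner, outer)
        for inner in dictionary
        for outer in dictionary
        if inner != outer and inner in outer
    ]


print(overlapping_keys({"C": "-", "ABC": "-"}))  # [('C', 'ABC')]
print(overlapping_keys({"X": "-", "ABC": "-"}))  # []
```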


```python
# one "ABC" has been replaced with just a "C"
s = "HEREABCISABCACSAMPLEABCSTRING"

# ...which we see in the output (d is still {"ABC": "-"} from the previous cell)
find_replace_multi(s, d)
```
```
'HERE-IS-ACSAMPLE-STRING'
```

Alas! It seems we have a variation. One of the "ABC"s is just a "C". No problem, right? We just update the dictionary (spoiler alert: this doesn't work).


```python
# the troublesome string
s = "HEREABCISABCACSAMPLEABCSTRING"

# new dictionary to get rid of "C"
d = {"C": "-", "ABC": "-"}

find_replace_multi(s, d)
```
```
'HEREAB-ISAB-A-SAMPLEAB-STRING'
```

We got rid of the "C", but now have these "AB"s everywhere. 

What's happened here is that the function went through the keys in order, reached "C" first, and replaced all the "C"s just like we told it to. The side effect, however, is that the pattern "ABC" doesn't exist any more. Every "ABC" has been turned into "AB-", so nothing left in the string matches the "ABC" key.
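Printing the string after each key is processed makes the side effect visible. This sketch uses `str.replace` instead of `re.sub`, which behaves the same for these plain-letter keys:

```python
s = "HEREABCISABCACSAMPLEABCSTRING"
d = {"C": "-", "ABC": "-"}

for key in d:
    # same order as find_replace_multi: the dict's own key order
    s = s.replace(key, d[key])
    print(f"after {key!r}: {s}")
# after 'C': HEREAB-ISAB-A-SAMPLEAB-STRING
# after 'ABC': HEREAB-ISAB-A-SAMPLEAB-STRING
```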

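Can we score an easy solve by manually changing the order of the entries in the dict? Let's try putting "ABC" first to see if it gets replaced first.

```python
# the troublesome string
s = "HEREABCISABCACSAMPLEABCSTRING"

# simply switching the order in the dict declaration itself
d = {"ABC": "-", "C": "-"}

find_replace_multi(s, d)
```
```
'HERE-IS-A-SAMPLE-STRING'
```

It works, but don't celebrate yet. Remember that the function's `for` loop iterates through the dictionary's keys, so the replacements happen in whatever order the keys come out.

Let's look under the hood. We can use a print statement to see what order the keys come out in.

```python
print(d.keys())
```
```
dict_keys(['ABC', 'C'])
```


And there we have it! "ABC" comes first in the iteration, and that is the only reason this version worked. Since Python 3.7, dicts are guaranteed to iterate in insertion order. The result therefore depends entirely on the order in which someone happened to type the keys, which is exactly why the previous cell failed. In Python 3.5 and earlier the order wasn't guaranteed at all, so "C" could come first even when it was typed second.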
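Two dicts can compare equal and still iterate in different orders, and that order is what `find_replace_multi` follows. Here is a short sketch, assuming Python 3.7 or later, that repeats the function from earlier so it runs on its own:

```python
import re


def find_replace_multi(string, dictionary):
    for item in dictionary.keys():
        string = re.sub(item, dictionary[item], string)
    return string


s = "HEREABCISABCACSAMPLEABCSTRING"
d1 = {"ABC": "-", "C": "-"}
d2 = {"C": "-", "ABC": "-"}

print(d1 == d2)                   # True: same key/value pairs
print(list(d1), list(d2))         # ['ABC', 'C'] ['C', 'ABC']
print(find_replace_multi(s, d1))  # HERE-IS-A-SAMPLE-STRING
print(find_replace_multi(s, d2))  # HEREAB-ISAB-A-SAMPLEAB-STRING
```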

### So what do we do?
My solution to this problem is to iterate through the dictionary keys from longest to shortest, so that we never replace a piece of a longer key by accident. We can do this with the sorted() function. Passing key=len sorts by length, and reverse=True puts the longest first (by default it sorts shortest to longest). Check it.


```python
s = "HEREABCISABCACSAMPLEABCSTRING"

d = {"C": "-", "ABC": "-"}

def find_replace_multi_ordered(string, dictionary):
    # sort keys by length, longest first
    for item in sorted(dictionary.keys(), key=len, reverse=True):
        string = re.sub(item, dictionary[item], string)
    return string

find_replace_multi_ordered(s, d)
```
```
'HERE-IS-A-SAMPLE-STRING'
```
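`sorted(..., key=len, reverse=True)` only looks at length. Python's sort is stable, so keys of equal length keep their original dict order. If you want the order to be fully deterministic, a tuple key can break ties alphabetically. This is an optional tweak, not part of the function above:

```python
d = {"BC": "2", "AB": "1", "C": "-", "ABC": "-"}

# longest first; equal-length keys keep their dict order
print(sorted(d, key=len, reverse=True))       # ['ABC', 'BC', 'AB', 'C']

# longest first; equal-length keys sorted alphabetically
print(sorted(d, key=lambda k: (-len(k), k)))  # ['ABC', 'AB', 'BC', 'C']
```

No sort order settles which of two overlapping, equal-length keys (such as "AB" and "BC" on the text "ABC") should win. That choice depends on your data.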

Looks as if we are on the right track, but I'll write one more test to be sure. We want to confirm that the longest key really goes first, and not simply that "ABC" goes before "C". To do that, we'll add another entry to the dict (the rest of the code stays the same). I'll use "CSAMPLEABC", because it contains BOTH "C" and "ABC", so the only way it can be replaced is if it goes before both of them.

```python
s = "HEREABCISABCACSAMPLEABCSTRING"

d = {"C": "-", "ABC": "-", "CSAMPLEABC": "- :) -"}

def find_replace_multi_ordered(string, dictionary):
    # sort keys by length, longest first
    for item in sorted(dictionary.keys(), key=len, reverse=True):
        string = re.sub(item, dictionary[item], string)
    return string

find_replace_multi_ordered(s, d)
```
```
'HERE-IS-A- :) -STRING'
```

And there we have it! Our final function. I wrote the initial character-to-character function to work us up to this one, but the final function works for simple character-to-character substitutions too. That means we only need this one version for all the tasks here. I'd advise using it because it avoids both pitfalls. Iterating over the dict's keys (instead of the string's characters) means we never look up something that isn't there, so there's no KeyError. Sorting by length means the result doesn't depend on the order the keys were declared.
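For comparison, here is a different technique (not the one used above): build a single regex that tries the keys longest-first, then replace every match in one pass. The sketch below uses the hypothetical name `find_replace_one_pass`. Each match is replaced exactly once, so replacement text is never re-scanned by a later key. That is a real behavior difference from the sequential loop.

```python
import re


def find_replace_one_pass(string, dictionary):
    # an empty pattern would match everywhere, so bail out early
    if not dictionary:
        return string
    # alternation tries keys left to right, so list the longest first;
    # re.escape makes each key match literally
    keys = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # look up the matched text to get its replacement
    return pattern.sub(lambda m: dictionary[m.group(0)], string)


s = "HEREABCISABCACSAMPLEABCSTRING"
d = {"C": "-", "ABC": "-", "CSAMPLEABC": "- :) -"}
print(find_replace_one_pass(s, d))  # HERE-IS-A- :) -STRING

# chained keys: the sequential loop turns "AB" into "CC"; one pass gives "BC"
print(find_replace_one_pass("AB", {"A": "B", "B": "C"}))  # BC
```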

### If you're the type to stick around after the credits...

...here is a bonus and final note of caution:

What goes wrong if we don't use the new version?

This version of the function avoids situations that seem fine but could cause problems. Dictionaries are built for looking things up by key, not by position like an array, so code that depends on the order of a dict's keys is fragile. In Python 3.5 and earlier, dicts had no guaranteed order at all. Since Python 3.7 they iterate in insertion order, which just means the order is whatever sequence the keys happened to be added in.

If you're relying on a dictionary's order, then, it's possible to "accidentally" find and replace in the "right" order. This produces exactly the results you're looking for, but might not in the future if the dictionary is ever updated or built differently. For example, if "ABC" happened to be replaced before "A", things would look fine. The next time someone adds a key or builds the dict in a different order, they might not be. On Python 3.3 through 3.5, where string hashing is randomized per process, the order could even change from one run to the next.
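One way to guard against this is a small check that builds the dict in every possible key order and confirms the output never changes. The sketch below uses `itertools.permutations` and repeats both functions from above so it runs on its own:

```python
import itertools
import re


def find_replace_multi(string, dictionary):
    # follows the dict's own key order
    for item in dictionary.keys():
        string = re.sub(item, dictionary[item], string)
    return string


def find_replace_multi_ordered(string, dictionary):
    # longest key first, regardless of dict order
    for item in sorted(dictionary.keys(), key=len, reverse=True):
        string = re.sub(item, dictionary[item], string)
    return string


s = "HEREABCISABCACSAMPLEABCSTRING"
pairs = [("C", "-"), ("ABC", "-"), ("CSAMPLEABC", "- :) -")]

# build the same dict in all 6 insertion orders and collect the distinct outputs
naive_results = {find_replace_multi(s, dict(p)) for p in itertools.permutations(pairs)}
ordered_results = {find_replace_multi_ordered(s, dict(p)) for p in itertools.permutations(pairs)}

print(len(naive_results))  # 4
print(ordered_results)     # {'HERE-IS-A- :) -STRING'}
```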