# Match and replace text.

Let's say someone gave you a repo where a class has the property `A.bb`, but
someone also decided to put in `A.b` together with `A.bbb`.

You need to refactor some names and rename `bb` to `c`.

`str.replace(old, new, count)` doesn't really play well with code like this:

In [1]:
text = """
A.b
A.bb
A.bbb
"""

In [2]:
print(text.replace('bb', 'c'))


A.b
A.c
A.cb



As you can see, the property `A.bbb` became `A.cb`, which defeats the purpose.

You could of course include `A.bb` in the text, but that's just playing on the specific example I've been able to come up with.
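
For identifier-like names, one common workaround is a word-boundary regex from the standard `re` module. The sketch below is an aside, not the approach developed in this post: it assumes every name is delimited by non-word characters (such as `.` or a newline), and it would rename any standalone `bb`, not only the one on `A`.

```python
import re

text = """
A.b
A.bb
A.bbb
"""

# \b only matches at the edge of a run of word characters,
# so "bb" must be a complete identifier to be replaced.
renamed = re.sub(r"\bbb\b", "c", text)
print(renamed)
```

This leaves `A.b` and `A.bbb` alone, but it is of no help when the names are not separated by word boundaries.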

A better option: include the **name space**, i.e. the list of names that the
target shouldn't be confused with.

Let's start with a very generic test case:

In [3]:
text = '111011010110111'

We want to replace `11` with `2`, but only where it stands on its own and is not part of `111`, so the text becomes:

In [4]:
text = '1110201020111'

We could expand the test case to include replacements which are shorter than, equal to, and longer than the original `11` we are looking for:

In [5]:
def test_1():
    name_space = ['111', '11', '1']
    text = '111011010110111'
    old = '11'

    expected = "1110{n}010{n}0111"
    for new in ['2', '22', '222']:
        output = replace(text, old, new, name_space)
        assert output == expected.format(n=new)

To satisfy the test case, we only need the `replace` function:

In [6]:
def replace(text, old, new, name_space):
    if name_space and old not in name_space:
        raise ValueError(f"{old} is not in the namespace")

    # eliminate all irrelevant names from the name_space
    reduced_name_space = [n for n in name_space if old in n]
    # and create a bytemap that marks longer names which contain the target
    # and therefore shouldn't be overwritten (0 = free, 1 = protected, 2 = target):
    bytemap = [0] * len(text)
    names = sorted(reduced_name_space, key=len, reverse=True)
    for name in names:
        value = 2 if name == old else 1

        index = 0
        for i in range(text.count(name)):
            index = text.index(name, index)
            if bytemap[index] == 0:
                for j, letter in enumerate(name):
                    bytemap[j+index] = value
            index += len(name)
    # at this point the only matches marked with 2 in the bytemap are
    # occurrences of the target (`old`)
    if 2 not in bytemap:
        raise ValueError(f"{old} not found")

    new_text = []

    start, end = 0, 0
    while end < len(bytemap):
        try:
            end = bytemap.index(2, start)
        except ValueError:
            end = len(bytemap)
        new_text.append( text[start:end] )

        if end < len(bytemap):
            new_text.append(new)
            start = end + len(old)

    return "".join(new_text)
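
To see what the marking step produces, here is a minimal sketch that isolates it. The helper name `build_bytemap` is hypothetical (it is not part of the code above); its loop mirrors the marking loop in `replace` and returns the map instead of building the new text.

```python
def build_bytemap(text, old, name_space):
    """Return one value per character: 0 = free, 1 = protected, 2 = target."""
    # Only names that contain `old` matter; the longest are marked first.
    names = sorted((n for n in name_space if old in n), key=len, reverse=True)
    bytemap = [0] * len(text)
    for name in names:
        value = 2 if name == old else 1
        index = 0
        for _ in range(text.count(name)):
            index = text.index(name, index)
            # Skip matches that start inside an already marked region.
            if bytemap[index] == 0:
                for j in range(len(name)):
                    bytemap[index + j] = value
            index += len(name)
    return bytemap


bytemap = build_bytemap("111011010110111", "11", ["111", "11", "1"])
print("111011010110111")
print("".join(str(v) for v in bytemap))  # prints 111022000220111
```

The two `111` blocks are protected with `1`, the two standalone `11` are marked with `2`, and the lone `1` in the middle stays `0` because the name `1` does not contain `11` and is dropped from the reduced name space.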

Now the test passes:

In [7]:
name_space = ['111', '11', '1']
text = '111011010110111'
old = '11'

expected = "1110{n}010{n}0111"
for new in ['2', '22', '222']:
    output = replace(text, old, new, name_space)
    assert output == expected.format(n=new)
    print(old, "-->", new, "=", output)

11 --> 2 = 1110201020111
11 --> 22 = 111022010220111
11 --> 222 = 11102220102220111
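
As a cross-check, the "longest name wins" idea can also be sketched with a regex alternation from the standard `re` module. This is a different technique from the bytemap, and the function name `replace_with_regex` is hypothetical. It scans left to right and tries the longest name first at each position, so it can give different results from the bytemap version when names overlap each other in the text.

```python
import re


def replace_with_regex(text, old, new, name_space):
    """Hypothetical regex-based variant; not the bytemap implementation."""
    # Only names that contain `old` can shadow it; try the longest first.
    names = sorted({n for n in name_space if old in n} | {old},
                   key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))
    # Swap exact matches of `old`; put every longer name back unchanged.
    return pattern.sub(
        lambda m: new if m.group(0) == old else m.group(0), text
    )


code = "\nA.b\nA.bb\nA.bbb\n"
print(replace_with_regex(code, "bb", "c", ["b", "bb", "bbb"]))
print(replace_with_regex("111011010110111", "11", "2", ["111", "11", "1"]))
```

On both the opening `A.bb` example and the digit test case, this sketch renames only the standalone target and leaves the longer names intact.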


Some comments:

- The code is reasonably performant, as the name_space is first reduced to the names which contain the `old` string.
- As the search starts with the longest name that contains the substring, every occurrence of `old` that starts inside a longer name is already marked as protected and is skipped.
- Updating the bytemap (not a bitmap, as it holds three values) is also cheap, as we only write to it when we have a match.
- The search for replacements operates on "chunks": each slice copies only the stretch of text between two targets, so no character is sliced more than once.
- Finally, we use `.join` to generate the output string.

For my tooling needs, I'm satisfied.