# A small experiment with "Replace text in Python"

Use case 1: replace the first "ab" with "" (```count=1```)

- ```re.sub```
- ```pattern = re.compile``` then ```pattern.sub```
- ```str.replace```

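**Conclusion: ```str.replace``` is fastest (about 119 ms per 1 million strings, versus about 309 ms for a compiled pattern and about 513 ms for ```re.sub```)**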

==========

Use case 2: replace every "a" and "b" with ""

- ```re.sub```
- ```pattern = re.compile``` then ```pattern.sub```
- ```str.replace```

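**Conclusion: ```str.replace``` is fastest (about 230 ms per 1 million strings, versus about 360 ms for a compiled pattern and about 656 ms for ```re.sub```)**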



In [36]:
import re
import random
import string
import datetime

random.seed(1)

## Generate 1 million strings for this experiment

In [175]:
def random_string(string_length=8):
    letters = string.ascii_lowercase
    return "".join(random.choice(letters) for i in range(string_length))

In [177]:
def gen_str_list(number_of_str):
    return [random_string() for _ in range(number_of_str)]
        
one_million_str = gen_str_list(1000000)

In [186]:
# how many strings in one_million_str contain "ab"
print(sum(["ab" in s for s in one_million_str]))

10448


# USE CASE 1

### Experiment 1: Time spent substituting text in a large list of strings using ```re.sub```

In [233]:
# behavior checking
sample_string = "abda"
re.sub("a",  "", sample_string, count=1)

'bda'

In [227]:
def using_re_sub_case_1(str_list):
    output_list = []
    for s in str_list:
        output_list.append(re.sub("ab", "", s, count=1))
    return output_list

In [228]:
def using_re_sub_case_2(str_list):
    return [re.sub("ab", "", s, count=1) for s in str_list]

In [229]:
def using_re_sub_case_3(str_list):
    return list(map(lambda s:re.sub("ab", "", s, count=1), str_list))

In [243]:
%timeit using_re_sub_case_1(one_million_str)

554 ms ± 19.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [244]:
%timeit using_re_sub_case_2(one_million_str)


513 ms ± 5.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [245]:
%timeit using_re_sub_case_3(one_million_str)


557 ms ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [246]:
assert using_re_sub_case_1(one_million_str) == using_re_sub_case_2(one_million_str) == using_re_sub_case_3(one_million_str)

### Experiment 2: Time spent substituting text in a large list of strings using ```str.replace```

In [251]:
# behavior checking
sample_string = "abda"
sample_string.replace("a", "", 1)

'bda'

In [235]:
def using_str_replace_case_1(str_list):
    output_list = []
    for s in str_list:
        output_list.append(s.replace("ab", "", 1))
    return output_list

In [236]:
def using_str_replace_case_2(str_list):
    return [s.replace("ab", "", 1) for s in str_list]

In [237]:
%timeit using_str_replace_case_1(one_million_str)

149 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [238]:
%timeit using_str_replace_case_2(one_million_str)

119 ms ± 1.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [247]:
assert using_str_replace_case_1(one_million_str) == using_str_replace_case_2(one_million_str)

### Experiment 3: Time spent substituting text in a large list of strings using ```re.compile``` then ```pattern.sub``` (COMPILE ON EVERY CALL)

In [254]:
# behavior checking
sample_string = "abda"
pattern = re.compile(r"a")
pattern.sub("", sample_string, count=1)

'bda'

In [261]:
def using_re_compile_and_sub_case_1(str_list):
    output_list = []
    pattern = re.compile(r"ab")
    for s in str_list:
        output_list.append(pattern.sub("", s, count=1))
    return output_list

In [267]:
%timeit using_re_compile_and_sub_case_1(one_million_str)

309 ms ± 4.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Experiment 4: Time spent substituting text in a large list of strings using ```re.compile``` then ```pattern.sub``` (COMPILE ONCE)

In [272]:
pattern = re.compile(r"ab")
def using_re_compile_and_sub_case_2(str_list):
    output_list = []
    for s in str_list:
        output_list.append(pattern.sub("", s, count=1))
    return output_list

In [273]:
%timeit using_re_compile_and_sub_case_2(one_million_str)

315 ms ± 2.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
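
The ```%timeit``` cells above only run inside IPython or Jupyter. As a rough sketch, the same comparison for use case 1 could be reproduced in a plain script with the standard ```timeit``` module. The helper names (```make_strings```, ```with_re_sub```, ```with_compiled```, ```with_str_replace```, ```benchmark```) and the smaller sample size are assumptions made for illustration, not part of the original notebook, and absolute timings will differ by machine.

```python
import random
import re
import string
import timeit


def make_strings(n, length=8, seed=1):
    """Return n random lowercase strings (hypothetical stand-in for gen_str_list)."""
    rng = random.Random(seed)
    letters = string.ascii_lowercase
    return ["".join(rng.choice(letters) for _ in range(length)) for _ in range(n)]


AB = re.compile(r"ab")  # compiled once, at import time


def with_re_sub(strs):
    # Module-level re.sub: the pattern is looked up on every call.
    return [re.sub("ab", "", s, count=1) for s in strs]


def with_compiled(strs):
    # Pre-compiled pattern object.
    return [AB.sub("", s, count=1) for s in strs]


def with_str_replace(strs):
    # Plain string method, no regex engine involved.
    return [s.replace("ab", "", 1) for s in strs]


def benchmark(strs, repeat=3):
    """Best-of-`repeat` wall time in seconds for each approach."""
    candidates = (with_re_sub, with_compiled, with_str_replace)
    return {
        f.__name__: min(timeit.repeat(lambda: f(strs), number=1, repeat=repeat))
        for f in candidates
    }


if __name__ == "__main__":
    data = make_strings(100_000)  # smaller than the notebook's 1 million
    for name, seconds in benchmark(data).items():
        print(f"{name}: {seconds * 1000:.1f} ms")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; ```%timeit``` reports mean and standard deviation instead, so the numbers are not directly comparable.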


# USE CASE 2

### Experiment 1: Time spent substituting text in a large list of strings using ```re.sub```

In [274]:
# behavior checking
sample_string = "abda"
re.sub(r"[ab]",  "", sample_string)

'd'

In [275]:
def using_re_sub_case_1(str_list):
    output_list = []
    for s in str_list:
        output_list.append(re.sub(r"[ab]", "", s))
    return output_list

In [276]:
def using_re_sub_case_2(str_list):
    return [re.sub(r"[ab]", "", s) for s in str_list]

In [277]:
def using_re_sub_case_3(str_list):
    return list(map(lambda s:re.sub(r"[ab]", "", s), str_list))

In [278]:
%timeit using_re_sub_case_1(one_million_str)

702 ms ± 8.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [279]:
%timeit using_re_sub_case_2(one_million_str)


656 ms ± 4.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [280]:
%timeit using_re_sub_case_3(one_million_str)


712 ms ± 7.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [282]:
assert using_re_sub_case_1(one_million_str) == using_re_sub_case_2(one_million_str) == using_re_sub_case_3(one_million_str)

### Experiment 2: Time spent substituting text in a large list of strings using ```str.replace```

In [283]:
# behavior checking
sample_string = "abda"
sample_string.replace("a", "").replace("b", "")

'd'

In [284]:
def using_str_replace_case_1(str_list):
    output_list = []
    for s in str_list:
        output_list.append(s.replace("a", "").replace("b", ""))
    return output_list

In [285]:
def using_str_replace_case_2(str_list):
    return [s.replace("a", "").replace("b", "") for s in str_list]

In [286]:
%timeit using_str_replace_case_1(one_million_str)

264 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [287]:
%timeit using_str_replace_case_2(one_million_str)

230 ms ± 2.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [288]:
assert using_str_replace_case_1(one_million_str) == using_str_replace_case_2(one_million_str)

### Experiment 3: Using ```re.compile``` then ```pattern.sub``` (COMPILE ON EVERY CALL)

In [290]:
# behavior checking
sample_string = "abda"
pattern = re.compile(r"[ab]")
pattern.sub("", sample_string)

'd'

In [291]:
def using_re_compile_and_sub_case_1(str_list):
    output_list = []
    pattern = re.compile(r"[ab]")
    for s in str_list:
        output_list.append(pattern.sub("", s))
    return output_list

In [292]:
%timeit using_re_compile_and_sub_case_1(one_million_str)

360 ms ± 2.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Experiment 4: Using ```re.compile``` then ```pattern.sub``` (COMPILE ONCE)

In [293]:
pattern = re.compile(r"[ab]")
def using_re_compile_and_sub_case_2(str_list):
    output_list = []
    for s in str_list:
        output_list.append(pattern.sub("", s))
    return output_list

In [294]:
%timeit using_re_compile_and_sub_case_2(one_million_str)

369 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
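
The use case 2 timings only mean something if the chained ```str.replace``` calls and the ```[ab]``` character class produce the same output. The sketch below checks that on random samples and also shows a caveat: the equivalence holds for deleting single characters, but chaining can diverge from a single regex pass once a target is longer than one character. The function names and the ```"acb"``` counter-example are illustrative assumptions, not taken from the notebook.

```python
import random
import re

AB_CLASS = re.compile(r"[ab]")


def strip_with_regex(s):
    # One pass: delete every "a" and every "b".
    return AB_CLASS.sub("", s)


def strip_with_replace(s):
    # Two passes: delete "a", then delete "b".
    return s.replace("a", "").replace("b", "")


# Small alphabet so that "a" and "b" appear often in the samples.
rng = random.Random(1)
samples = ["".join(rng.choice("abcd") for _ in range(8)) for _ in range(10_000)]
all_equal = all(strip_with_regex(s) == strip_with_replace(s) for s in samples)
print(all_equal)

# Caveat: with a multi-character target, an earlier replace can create a new
# match for a later one, which a single regex pass never sees.
one_pass = re.sub(r"ab|c", "", "acb")                # only "c" matches -> "ab"
two_pass = "acb".replace("c", "").replace("ab", "")  # "ab" forms after "c" is removed -> ""
print(repr(one_pass), repr(two_pass))
```

For single-character deletions like use case 2, removing one character can never create an occurrence of the other, which is why the two approaches agree there.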


Further reading:
- https://stackoverflow.com/questions/3411771/best-way-to-replace-multiple-characters-in-a-string