# NLP Basics: Regular expression replacements

### Using regular expressions in Python

Python's built-in `re` module is the standard tool for working with regular expressions. See the [official documentation](https://docs.python.org/3/library/re.html) for details.

### Replacing a specific string

In [1]:
import re

pep8_test = 'I try to follow PEP8 guidelines'
pep7_test = 'I try to follow PEP7 guidelines'
peep8_test = 'I try to follow PEEP8 guidelines'

In [2]:
re.findall('[a-z]+', pep8_test)

['try', 'to', 'follow', 'guidelines']

Because the pattern uses the lowercase range `[a-z]`, the result is missing `PEP8` and `I`. This shows that the regex is case sensitive: it only matches lowercase a through lowercase z. Let's switch to the uppercase range and see what changes.

In [3]:
re.findall('[A-Z]+', pep8_test)

['I', 'PEP']

This captures 'I' and 'PEP', but it leaves out the 8. For PEP8, we want the pattern to capture uppercase letters followed by digits.

In [4]:
re.findall('[A-Z]+[0-9]+', pep8_test)

['PEP8']

With this pattern, we can find tokens made of uppercase letters followed by digits. Now let's apply the same pattern to the other sentences.

In [5]:
re.findall('[A-Z]+[0-9]+', pep7_test)

['PEP7']

In [6]:
re.findall('[A-Z]+[0-9]+', peep8_test)

['PEEP8']

Now that we have a working regex, the second step is to replace the tokens it captures with 'PEP8 Python Styleguide'. For this we use the `re.sub()` function, which searches for a regex pattern and replaces every match with a given string.

In [7]:
re.sub('[A-Z]+[0-9]+', 'PEP8 Python Styleguide', pep8_test)

'I try to follow PEP8 Python Styleguide guidelines'

In [8]:
re.sub('[A-Z]+[0-9]+', 'PEP8 Python Styleguide', pep7_test)

'I try to follow PEP8 Python Styleguide guidelines'

In [9]:
re.sub('[A-Z]+[0-9]+', 'PEP8 Python Styleguide', peep8_test)

'I try to follow PEP8 Python Styleguide guidelines'
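Since the same pattern is applied to several strings, one option is to compile it once and reuse it. The following is a minimal sketch of that idea; the names `token_pattern`, `replacement`, `sentences`, and `cleaned` are introduced here for illustration and are not part of the cells above.

```python
import re

# Compile the pattern once so it can be reused across sentences.
token_pattern = re.compile('[A-Z]+[0-9]+')
replacement = 'PEP8 Python Styleguide'  # illustrative name

sentences = [
    'I try to follow PEP8 guidelines',
    'I try to follow PEP7 guidelines',
    'I try to follow PEEP8 guidelines',
]

# Pattern.sub() replaces every match of the compiled pattern in each sentence.
cleaned = [token_pattern.sub(replacement, s) for s in sentences]
for sentence in cleaned:
    print(sentence)
```

All three sentences should come out identical, since each contains exactly one token of uppercase letters followed by digits.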

This regex certainly isn't perfect. You can imagine cases it would miss: for instance, a space between PEP and 8, or a lowercase token. So the regex needs some further refinement.
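One possible refinement, sketched here under assumptions rather than as a definitive fix, is to allow optional whitespace before the digits and to ignore case. The pattern below and the names `refined_pattern`, `variants`, and `refined_results` are hypothetical and chosen for illustration only.

```python
import re

# Assumed refinement: "P", one or more "E", "P", optional whitespace, digits.
# re.IGNORECASE lets the same pattern match "pep8" as well as "PEP8".
refined_pattern = re.compile(r'\bpe+p\s*[0-9]+\b', flags=re.IGNORECASE)

variants = [
    'I try to follow PEP8 guidelines',
    'I try to follow pep8 guidelines',
    'I try to follow PEP 8 guidelines',
    'I try to follow PEEP8 guidelines',
]

# Replace each variant spelling with the same target string.
refined_results = [
    refined_pattern.sub('PEP8 Python Styleguide', v) for v in variants
]
for result in refined_results:
    print(result)
```

This version trades generality for precision: unlike `[A-Z]+[0-9]+`, it no longer matches arbitrary tokens such as `ABC123`, only PEP-like spellings.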

### Other examples of regex methods

- `re.search()`
- `re.match()`
- `re.fullmatch()`
- `re.finditer()`
- `re.escape()`
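The functions listed above can be tried on the same kind of sentence used earlier. This is a brief sketch; the sample strings and variable names (`text`, `first`, `at_start`, `whole`, `positions`, `literal`) are made up for illustration.

```python
import re

text = 'I try to follow PEP8 guidelines'
pattern = '[A-Z]+[0-9]+'

# re.search() scans the whole string and returns the first match, or None.
first = re.search(pattern, text)
print(first.group(), first.span())

# re.match() only tries to match at the very start of the string.
at_start = re.match(pattern, text)  # None here: the text starts with 'I '
print(at_start)

# re.fullmatch() succeeds only if the entire string matches the pattern.
whole = re.fullmatch(pattern, 'PEP8')
print(whole is not None)

# re.finditer() yields a match object for each non-overlapping match.
positions = [(m.group(), m.start())
             for m in re.finditer(pattern, 'PEP8 replaced PEP7')]
print(positions)

# re.escape() escapes regex metacharacters so the text is matched literally.
literal = re.findall(re.escape('C++'), 'I use C++ and C')
print(literal)
```

Note the difference between `re.search()` and `re.match()`: the former finds `PEP8` in the middle of the sentence, while the latter returns `None` because the sentence does not begin with a matching token.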