# Absolute Basics of Regex

In [16]:
import re

#### re.match - Checks whether the beginning of a string matches a pattern

In [17]:
result = re.match(r'hello', 'hello world') 
print(result) 

<re.Match object; span=(0, 5), match='hello'>
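Because re.match only looks at the beginning of the string, a pattern that occurs later does not match and the call returns None. A minimal sketch of that behaviour (the variable names here are illustrative, not from the original notebook):

```python
import re

# 'world' is in the string, but not at position 0, so re.match fails.
no_match = re.match(r'world', 'hello world')
print(no_match)  # None

# Check for None before calling methods on the result.
at_start = re.match(r'hello', 'hello world')
if at_start is not None:
    print(at_start.group())  # hello
```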


#### re.search - Searches the entire string for the first location where the pattern matches

In [25]:
result = re.search(r'world', 'hello world')
print(result)

<re.Match object; span=(6, 11), match='world'>


#### re.findall - Finds all non-overlapping occurrences of the pattern in the string

In [19]:
result = re.findall(r'\d+', 'abc123def456ghi789')
print(result)  # Outputs: ['123', '456', '789']

['123', '456', '789']


#### re.split - Splits the string at each occurrence of the pattern

In [20]:
result = re.split(r'\s+', 'this is a test')
print(result)  # Outputs: ['this', 'is', 'a', 'test']

['this', 'is', 'a', 'test']


#### re.sub - Replaces occurrences of the pattern with a replacement string

In [21]:
result = re.sub(r'\d+', '#', 'abc123def456ghi789')
print(result)  # Outputs: 'abc#def#ghi#'

abc#def#ghi#


##### re.compile - Compiles a pattern once so it can be reused, which can be faster when the same pattern is applied many times

In [29]:
pattern = re.compile(r'\d+')  # \d+ means one or more digits
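A compiled pattern exposes the same operations as the module-level functions, as methods on the pattern object. A short sketch, assuming a pattern named `digits` and made-up sample strings:

```python
import re

# Compile once, then call the pattern's own methods repeatedly.
digits = re.compile(r'\d+')  # one or more consecutive digits

found = digits.findall('abc123def456')
print(found)  # ['123', '456']

masked = digits.sub('#', 'room 42, floor 7')
print(masked)  # room #, floor #

missing = digits.search('no numbers here')
print(missing)  # None
```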

In [24]:
##### .: Any character except a newline.
##### ^: Start of the string.
##### $: End of the string.
##### *: 0 or more repetitions.
##### +: 1 or more repetitions.
##### ?: 0 or 1 repetition.
##### {n}: Exactly n repetitions.
##### {n,}: n or more repetitions.
##### {n,m}: Between n and m repetitions.
##### []: Matches any one of the characters inside the brackets.
##### \d: Any digit.
##### \D: Any non-digit.
##### \s: Any whitespace.
##### \S: Any non-whitespace.
##### \w: Any word character (alphanumeric and underscore).
##### \W: Any non-word character.
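A few of the symbols above in action. This is only a small sketch with made-up sample strings, not a complete tour:

```python
import re

# ^ and $ anchor the pattern; {n} fixes the repetition count.
three_digits = re.compile(r'^\d{3}$')
print(bool(three_digits.search('123')))    # True
print(bool(three_digits.search('1234')))   # False

# ? makes the preceding character optional.
spellings = re.findall(r'colou?r', 'color colour')
print(spellings)   # ['color', 'colour']

# [] matches any one listed character; + repeats it one or more times.
runs = re.findall(r'[abc]+', 'aabbcc xyz cab')
print(runs)        # ['aabbcc', 'cab']

# \w+ collects runs of word characters, skipping punctuation and spaces.
tokens = re.findall(r'\w+', 'hi, there!')
print(tokens)      # ['hi', 'there']
```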

#### When a match is found using re.match() or re.search(), a match object is returned. This object has several useful methods and attributes:

In [28]:
match = re.search(r'world', 'hello world')
print(match.group())  # Outputs: world
print(match.start())  # Outputs: 6
print(match.end())    # Outputs: 11
print(match.span())   # Outputs: (6, 11)

# Splitting the Text

#### The split() method of string objects is meant for very simple cases: it does not allow for multiple delimiters or account for possible whitespace around the delimiters.

In [4]:
text = 'asdf fjdk; afed, fjek,asdf, foo'

In [6]:
re.split(r'[;,\s]\s*', text)

['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']

#### r'[;,\s]\s*': This is a raw string (denoted by the prefix r), so Python passes the backslashes through to the regex engine unchanged instead of treating them as string escapes.

#### [;,\s]: This is a character class that matches any one of the characters inside the brackets. It matches a semicolon (;), a comma (,), or a single whitespace character (\s).

#### \s*: This matches zero or more whitespace characters following the character matched by [;,\s].
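To see why the regex version is needed, compare it with a plain str.split() on the same text. A minimal sketch reusing the sample string from this section:

```python
import re

text = 'asdf fjdk; afed, fjek,asdf, foo'

# str.split with one delimiter leaves the other delimiters and spaces in place.
by_comma = text.split(',')
print(by_comma)
# ['asdf fjdk; afed', ' fjek', 'asdf', ' foo']

# re.split handles all three delimiters plus any whitespace after them.
fields = re.split(r'[;,\s]\s*', text)
print(fields)
# ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
```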

## Matching Text at the Start or End of a String

In [7]:
filename = 'spam.txt'
filename.endswith('.txt')

True

#### If you need to check against multiple choices, simply provide a tuple of possibilities to startswith() or endswith()

In [10]:
filenames = [ 'Makefile', 'foo.c', 'bar.py', 'spam.c', 'spam.h' ]
[name for name in filenames if name.endswith(('.c', '.h')) ]

['foo.c', 'spam.c', 'spam.h']
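Two details worth knowing: the check combines well with any(), and the choices really must be a tuple, since a list raises TypeError. A small sketch (the `choices` list and `url` value are made up for illustration):

```python
filenames = ['Makefile', 'foo.c', 'bar.py', 'spam.c', 'spam.h']

# any() answers "does at least one name match?"
has_python = any(name.endswith('.py') for name in filenames)
print(has_python)  # True

# The choices must be a tuple; a list is rejected with TypeError.
choices = ['http:', 'ftp:']      # hypothetical prefixes
url = 'http://example.com'       # hypothetical URL
try:
    url.startswith(choices)
    list_accepted = True
except TypeError as exc:
    list_accepted = False
    print('TypeError:', exc)

# Converting the list to a tuple first works.
matches = url.startswith(tuple(choices))
print(matches)  # True
```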

## Searching String

In [13]:
text = 'yeah, but no, but yeah, but no, but yeah'
# Search for the location of the first occurrence
text.find('no')

10
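find() has a few close relatives. The sketch below, using the same sample text, shows rfind() for the last occurrence, the -1 result when nothing is found, and the `in` operator for a plain yes/no check:

```python
text = 'yeah, but no, but yeah, but no, but yeah'

first = text.find('no')        # index of the first occurrence
last = text.rfind('no')        # index of the last occurrence
absent = text.find('maybe')    # -1 when the substring is not present
print(first, last, absent)     # 10 28 -1

# Use `in` when only presence matters, not the position.
print('no' in text)            # True
```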

## String Alignment

#### For basic alignment of strings, the ljust(), rjust(), and center() methods of strings can be used.

In [None]:
# ljust()  --> left-justifies the string, padding on the right
# rjust()  --> right-justifies the string, padding on the left
# center() --> centers the string, padding on both sides

In [30]:
text = 'Hello World'
text.ljust(20)

'Hello World         '

In [31]:
text.rjust(20)

'         Hello World'

In [32]:
text.center(20)

'    Hello World     '
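All three methods also accept an optional fill character, and the built-in format() function can do the same alignment with the `<`, `>` and `^` specifiers. A brief sketch using the same text:

```python
text = 'Hello World'

# An optional second argument sets the fill character.
right_filled = text.rjust(20, '=')
print(right_filled)    # =========Hello World

center_filled = text.center(20, '*')
print(center_filled)   # ****Hello World*****

# format() with '>' right-aligns and '^' centers to the given width.
right_fmt = format(text, '>20')
center_fmt = format(text, '^20')
print(repr(right_fmt))
print(repr(center_fmt))
```

One reason to prefer format() is that it also works on non-string values such as numbers, whereas ljust(), rjust() and center() exist only on strings.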

## Combining Strings

#### One related (and pretty neat) trick is the conversion of data to strings and concatenation at the same time using a generator expression

In [41]:
data = ['ACME', 50, 91.1]
','.join(str(d) for d in data)  # converts each item to str, then joins them with commas

'ACME,50,91.1'

In [38]:
words = ["Hello", "world", "this", "is", "a", "test"]

In [39]:
''.join(words)  # join() accepts only strings; an empty separator concatenates them directly

'Helloworldthisisatest'
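The string that join() is called on is the separator, so a space or comma produces readable output. The sketch below also shows what happens when non-string items are passed without the str() conversion:

```python
words = ['Hello', 'world', 'this', 'is', 'a', 'test']

# The string before .join() is placed between the items.
sentence = ' '.join(words)
print(sentence)   # Hello world this is a test

# join() requires strings; mixed data raises TypeError without str().
data = ['ACME', 50, 91.1]
try:
    ','.join(data)
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print('TypeError:', exc)
```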