# Story of backslash:
## Escape Character in Strings
* Within strings, the **backslash( \\ )** is used as an ***escape character***. 
* It allows you to include **special characters** in a string.

Common escape sequences:  
1. ` \n ` : Newline  
2. ` \t ` : Tab  
3. ` \\ ` : Backslash  
4. ` \' ` : Single quote  
5. ` \" ` : Double quote  
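A minimal sketch of the escape sequences listed above. Each one is typed as two characters in the source code but stored as a single character in the string. The names `escapes` and `quoted` are made up for this illustration.

```python
# Each escape sequence is written with two characters but stored as one.
escapes = {
    'newline': '\n',
    'tab': '\t',
    'backslash': '\\',
    'single quote': '\'',
    'double quote': '\"',
}

for name, char in escapes.items():
    # repr() shows the escaped form, len() shows it is one character
    print(name, len(char), repr(char))

quoted = 'It\'s a \"quoted\" word'
print(quoted)   # It's a "quoted" word
```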

## Raw Strings:
* Raw strings are another important concept.
* When a string literal is prefixed with ***r*** or ***R***, it becomes a raw string.
* In a raw string, backslashes are treated as ***literal characters*** and not as ***escape characters***. This is why regex patterns that contain backslashes are usually written as raw strings.
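The difference can be sketched by comparing a normal string with its raw counterpart. The variable names `normal` and `raw` are only illustrative.

```python
normal = 'a\tb'    # \t is an escape sequence: a, TAB, b
raw = r'a\tb'      # backslash and t are kept as two literal characters

print(len(normal))      # 3
print(len(raw))         # 4
print(raw == 'a\\tb')   # True: the raw string holds a real backslash
```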

## 1. re.search(regex, string):
* **regex** : The ***pattern*** you are searching for inside ***string***.
* **string** : The ***text*** in which the ***pattern*** is searched for.
   
* Before using it, you first have to import it:
   1. `import re`, then use it like : `re.search()`
   2. `from re import search`, then use it directly : `search()`  
   
* It scans ***string*** for the first location where ***regex*** matches and returns a **match object**. If there is no match, it returns **None**.  
* This ***match object*** contains a wealth of information:


### Let's see what actually is inside 'match object'
 


In [1]:
from re import search

text = 'abc123abc'

print( search('123',text))

<re.Match object; span=(3, 6), match='123'>


#### `<re.Match object; span=(3, 6), match='123'>`
* `span=(3, 6)` indicates the portion of ***text*** in which the match was found: it starts at index 3 and ends just before index 6.   
* `match='123'` indicates which characters from ***text*** matched.
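The same information can also be read from the match object through its methods. A small sketch, reusing the `text` from the cell above (the variable name `m` is just for illustration):

```python
import re

text = 'abc123abc'
m = re.search('123', text)

print(m.group())   # '123'  -> the matched characters
print(m.span())    # (3, 6) -> (start, end) as a tuple
print(m.start())   # 3      -> index where the match begins
print(m.end())     # 6      -> index just past the end of the match
```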

In [3]:
import re
text = 'abcad4235abc'

print( re.search('ad[1-9]',text) )

<re.Match object; span=(3, 6), match='ad4'>


* Here : 
   * **pattern** = 'ad[1-9]'
   *  **text** = 'abcad4235abc'

* Now In match object: `<re.Match object; span=(3, 6), match='ad4'>`
   * span = (3,6), i.e. the match starts at index 3 and ends just before index 6 (the end index is exclusive)
   * match='ad4'.
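Because the end index of the span is exclusive, slicing the text with the span gives back exactly the matched characters. The sketch below also shows the usual check for **None** when nothing matches; the pattern `'xyz'` is a made-up example of a pattern that is not in the text.

```python
import re

text = 'abcad4235abc'
m = re.search('ad[1-9]', text)

start, end = m.span()
print(text[start:end])   # 'ad4', same as m.group()

no_match = re.search('xyz', text)   # 'xyz' does not occur in text
if no_match is None:
    print('pattern not found')
```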

# Metacharacters

## 1. [ ] -> Square brackets
* A set of characters specified in square brackets ([]) makes up a character class.
    * Example 1 : [0-9]  = [0,1,2,3,4,5,6,7,8,9]   
    * Example 2 : [a-z]  = [a,b,c,d......z]   
    * Example 3 : [a1TpO23] : This is also a character class. It matches a single character : 'a','1','T','p','O','2' or '3'
    * Example 4 : [0-9A-Fa-f] : This character class matches any one character from '0' to '9', 'A' to 'F', or 'a' to 'f'

* A character class matches a single character.

In [5]:
import re
print( re.search('ba[artz]', 'foobarqux'))
print( re.search('ba[artz]', 'foobazqux'))

<re.Match object; span=(3, 6), match='bar'>
<re.Match object; span=(3, 6), match='baz'>


In [6]:
import re
print( re.search('[a-z]', 'FOObar'))

<re.Match object; span=(3, 4), match='b'>


In [8]:
import re
print( re.search('[0-9a-fA-F]', '--- e0 ---'))

<re.Match object; span=(4, 5), match='e'>


### How to *complement* a character class:
* What is the meaning of **complement**?
    *  It simply means exclusion, negation, or the opposite.
* So when you *complement* a character class, you are saying: *match any character except the ones listed inside the character class.*

* We can complement a character class by specifying ` ^ ` **as the first character**
* NOTE : If this ` ^ ` *character isn’t the first character in a character class, then it has no special meaning* and matches a literal ` ^ ` character.

In [9]:
import re
print( re.search('[^0-9]', '12345foo'))

<re.Match object; span=(5, 6), match='f'>


It's clear that the characters listed inside [] aren't matched, because we used the complement: the digits are skipped and the first non-digit, 'f', is matched.  


### What if you want to use ` ^ ` as a literal in a regex?

In [10]:
import re
print( re.search('[#:^]', 'foo^bar:b#q') )  #  Here [#:^] is a character class. ^ is not the first character, so it is a literal ^.

<re.Match object; span=(3, 4), match='^'>


We can specify a range of characters in a character class by separating characters with a **hyphen**, like [a-zA-Z0-9].   
So now the question is:
### How can we use hyphen(-) as a literal?
* There are various ways of achieving that: 
    1. Place it as the first character.
    2. Or place it as the last character.
    3. Or escape it with a backslash ` \ `


In [11]:
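import re

print(re.search('[-abc]', '123-456'))   # hyphen as the first character

print(re.search('[abc-]', '123-456'))   # hyphen as the last character

print(re.search(r'[ab\-c]', '123-456')) # escaped hyphen; raw string keeps the backslash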

<re.Match object; span=(3, 4), match='-'>
<re.Match object; span=(3, 4), match='-'>
<re.Match object; span=(3, 4), match='-'>


Similarly, if you want to include '[' or ']' as a literal in a character class, you can escape it with a backslash (\). A ']' is also treated as a literal when it is placed as the first character in the character class.
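A short sketch of both options for square brackets. The sample strings `'foo[1]'` and `'x]y'` and the variable names are made up for illustration.

```python
import re

# Option 1: escape the brackets with a backslash (raw string pattern)
escaped = re.search(r'[\[\]]', 'foo[1]')
print(escaped)   # <re.Match object; span=(3, 4), match='['>

# Option 2: a ']' placed first in the class is taken literally
first = re.search(r'[]ab]', 'x]y')
print(first)     # <re.Match object; span=(1, 2), match=']'>
```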

## 2. dot(.):
* It also matches a single character.
* It matches ***any character*** except a ***newline***.
* For example, let's consider this regex : **'foo.bar'**
   * It means the characters `foo`, then any character except newline, then the characters `bar`

In [1]:
import re

print( re.search('foo.bar', 'fooxbar') )

print(re.search('foo.bar', 'foobar'))

print(re.search('foo.bar', 'foo\nbar'))

<re.Match object; span=(0, 7), match='fooxbar'>
None
None
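If the dot should match a newline too, Python's `re` module offers the `re.DOTALL` flag. This flag is not used in the cell above; the sketch below only shows how it changes the third result.

```python
import re

# Default behaviour: the dot does not match '\n'
default = re.search('foo.bar', 'foo\nbar')
print(default)          # None

# With re.DOTALL the dot matches any character, including '\n'
dotall = re.search('foo.bar', 'foo\nbar', re.DOTALL)
print(dotall.span())    # (0, 7)
```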


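## 3. `\w` and `\W` : 

* ` \w ` : 
  * It matches any word character: `uppercase and lowercase letters`, `digits`, and `the underscore ( _ )`.
  * So for ASCII text ` \w ` is essentially shorthand for `[a-zA-Z0-9_]`. (For `str` patterns, Python's ` \w ` also matches word characters from other alphabets unless the `re.ASCII` flag is used.)
  * Hence, `re.search` with ` \w ` returns the first word character.

* ` \W ` : 
  * It is basically the opposite of ` \w `.
  * For ASCII text it is equivalent to `[^a-zA-Z0-9_]`
  * `re.search` with ` \W ` returns the first non-word character.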


* ***`Trick to remember : W = Words ( alphanumeric + _ )`***

In [6]:
import re
print(re.search(r'\w', '#(.a$@&'))
print(re.search('[a-zA-Z0-9_]', '#(.a$@&'))

print("-----------------------------------------------")
print(r"Demonstration of \W :")

print(re.search(r'\W', 'a_1*3Qb'))
print(re.search('[^a-zA-Z0-9_]', 'a_1*3Qb')) # Remember here we are complementing it using ^ inside []


<re.Match object; span=(3, 4), match='a'>
<re.Match object; span=(3, 4), match='a'>
-----------------------------------------------
Demonstration of \W :
<re.Match object; span=(3, 4), match='*'>
<re.Match object; span=(3, 4), match='*'>
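The two forms in the cell above agree for ASCII text. As a hedged sketch of where they differ: for `str` patterns, `\w` also accepts non-ASCII letters unless `re.ASCII` is passed. The accented sample character is chosen only for illustration.

```python
import re

accented = '\u00e9'   # the letter e with an acute accent

unicode_match = re.search(r'\w', accented)             # Unicode-aware by default
class_match = re.search(r'[a-zA-Z0-9_]', accented)     # explicit ASCII class
ascii_match = re.search(r'\w', accented, re.ASCII)     # \w limited to ASCII

print(unicode_match is not None)   # True
print(class_match)                 # None
print(ascii_match)                 # None
```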


## 4. ` \d ` and  ` \D `:
* d stands for digit.
* ` \d ` matches any decimal digit character. 

* ` \D ` is the opposite of ` \d `. It matches any character that isn’t a decimal digit.
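A minimal sketch of both, in the same style as the earlier cells. The sample strings are made up for illustration.

```python
import re

digit = re.search(r'\d', 'abc4def')       # first decimal digit
print(digit)       # <re.Match object; span=(3, 4), match='4'>

non_digit = re.search(r'\D', '234Q678')   # first character that is not a digit
print(non_digit)   # <re.Match object; span=(3, 4), match='Q'>
```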

## 5. ` \s ` and ` \S ` : 
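As a minimal sketch of this pair: in Python's `re`, ` \s ` matches a single whitespace character (space, tab, newline and so on), and ` \S ` matches any character that is not whitespace. The sample strings below are made up for illustration.

```python
import re

space = re.search(r'\s', 'foo\nbar baz')   # first whitespace character
print(space.span())       # (3, 4): the newline comes before the space

non_space = re.search(r'\S', '  \n foo')   # first non-whitespace character
print(non_space)          # <re.Match object; span=(4, 5), match='f'>
```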

# difflib

In [3]:
import difflib

closest_match = difflib.get_close_matches('ot', ['male','female','other'], n=1, cutoff=0.5)
print(closest_match)

['other']
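To see why `'other'` passes the `cutoff=0.5` filter, the similarity scores can be inspected directly. This sketch assumes the scores come from `difflib.SequenceMatcher.ratio()`, which is what the `difflib` documentation describes for `get_close_matches`; the names `options`, `ratios` and `strict` are illustrative.

```python
import difflib

options = ['male', 'female', 'other']

# Similarity of each option to 'ot' (0.0 = nothing in common, 1.0 = identical)
ratios = {
    option: difflib.SequenceMatcher(None, option, 'ot').ratio()
    for option in options
}
for option, ratio in ratios.items():
    print(option, round(ratio, 3))

# 'other' scores 4/7 (about 0.571), so a stricter cutoff of 0.6 rejects it
strict = difflib.get_close_matches('ot', options, n=1, cutoff=0.6)
print(strict)   # []
```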
