# Regular Expressions "*cheat sheet*" for understanding
    
[Cool internet site to try out regular expressions](https://regex101.com/)

Regular expressions have a variety of use cases, including:

- validating user input in HTML forms
- verifying and parsing text in files, code and applications
- examining test results
- finding keywords in emails and web pages

##### Regular expressions operate by moving character by character, from left to right, through a piece of text. When the regular expression finds a character that matches the first piece of the expression, it looks to find a continuous sequence of matching characters.


MENU: 
- 1 Literals
- 2 Alternation OR
- 3 Character Sets
- 4 Wild for Wildcards
- 5 Ranges
- 6 Shorthand Character Classes
- 7 Grouping
- 8 Quantifiers - Fixed
- 9 Anchors
- 10 Review


# 1) Literals

* The regex "a", for example, will match the text "a",   
  and the regex "bananas" will match the text "bananas".
  
  
* The regex "3" will match the "3" in the piece of text "34",   
  and the regex "5 gibbons" will completely match the text "5 gibbons"!
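
The literal matches above can be tried out in code. This is a minimal sketch that assumes Python and its standard `re` module (the cheat sheet itself is language-neutral); the variable names are only illustrative. It also shows the difference between finding a match somewhere in a text and matching the text completely.

```python
import re

# re.search scans the text left to right and returns the first match (or None).
first_a = re.search(r"a", "a")
partial = re.search(r"3", "34")          # matches only the "3" inside "34"

# re.fullmatch succeeds only when the whole text matches the pattern.
whole = re.fullmatch(r"5 gibbons", "5 gibbons")
not_whole = re.fullmatch(r"3", "34")     # None: the "4" is left over

print(first_a.group())   # a
print(partial.group())   # 3
print(whole.group())     # 5 gibbons
print(not_whole)         # None
```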

# 2) Alternation OR

* '|' symbol = OR   
  Allows us to match either the characters preceding the | OR the characters after the |.  
 >ex:  The regex "baboons|gorillas"   
 will match  
 "**baboons**" in the text "I love baboons" and  "**gorillas**" in the text "I love gorillas".  
 
 >ex: "cat|dog"  
 will match  
 "**cat**" and "**dog**"

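As a rough illustration of alternation, here is a small Python sketch using the standard `re` module. The helper `first_match` is a hypothetical name introduced for this example only.

```python
import re

apes = re.compile(r"baboons|gorillas")

def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = pattern.search(text)
    return m.group() if m else None

print(first_match(apes, "I love baboons"))    # baboons
print(first_match(apes, "I love gorillas"))   # gorillas
print(first_match(apes, "I love lemurs"))     # None

# findall collects every non-overlapping match, left to right.
pets = re.findall(r"cat|dog", "one cat, one dog")
print(pets)                                   # ['cat', 'dog']
```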
# 3) Character Sets

* Denoted by a pair of brackets "**[ ]**" :  
  Lets us match one character from a series of characters, allowing for matches with incorrect or different spellings
  
> ex : The regex _con[sc]en[sc]us_  
will match:  
consensus (correct), _concensus_, _consencus_, and _concencus_ (three incorrect spellings)

> ex: regex "_[cat]_"  
will match the characters:  
"*c*", "*a*", or "*t*", but NOT the text "_cat_". 

> ex: regex "**[chr]at**"  
will match:   
"cat", "hat", "rat" 
     

* Caret "**^**" symbol: called *negated character sets*  
  Placed at the front of a character set, the "^" negates the set, matching any character that is not listed. 
>ex: the regex "[^cat]"  
will match:  
any character that is not "c", "a", or "t", and would completely match each character "d", "o" or "g"
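
The character-set examples can be checked with a short Python sketch (standard `re` module assumed; the word list, including the extra misspelling "conzensus", is made up for the demonstration).

```python
import re

spellings = ["consensus", "concensus", "consencus", "concencus", "conzensus"]
accepted = [s for s in spellings if re.fullmatch(r"con[sc]en[sc]us", s)]
print(accepted)   # the first four spellings; "conzensus" is rejected

# [cat] matches one character at a time, never the word "cat" as a unit.
single_chars = re.findall(r"[cat]", "cat")
whole_word = re.fullmatch(r"[cat]", "cat")
print(single_chars)   # ['c', 'a', 't']
print(whole_word)     # None

# A leading ^ negates the set: everything except c, a and t.
negated = re.findall(r"[^cat]", "cat dog")
print(negated)        # [' ', 'd', 'o', 'g']
```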

# 4) Wild for Wildcards

They are useful when we do not care about the specific value of a character, but only that a character exists!  
Wildcards will match any single character (letter, number, symbol or whitespace) in a piece of text

* **Wildcards** = "**.**" (a period, or dot)

> ex : Let’s say we want to match any 9-character piece of text.   
The regex "**.........**" (len = 9)  will completely match:  
"**orangutan**" and "**marsupial**" (both len = 9).    

> ex:  
Regex "I ate . bananas"  
will completely match both:  
"I ate **3** bananas" and "I ate **8** bananas"!

* **Escape** the wildcard with a backslash "**\\**" to match a literal period
>ex: "**....\\.**"  
will match:  
"*bear.*", "*lion.*", "*orca.*"  
but not:  
"*mouse*", "*koala*", "*snail*" (the fifth character must be an actual period)

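A small Python sketch of the wildcard and its escaped form, assuming the standard `re` module; the names are illustrative. One Python-specific detail: `.` does not match a newline unless the `re.DOTALL` flag is set.

```python
import re

nine_chars = re.compile(r".........")           # nine wildcards
nine_ok = [w for w in ["orangutan", "marsupial", "gibbon"]
           if nine_chars.fullmatch(w)]
print(nine_ok)        # ['orangutan', 'marsupial']

# An escaped dot matches only a literal period.
four_then_period = re.compile(r"....\.")
period_ok = [w for w in ["bear.", "lion.", "mouse", "koala"]
             if four_then_period.fullmatch(w)]
print(period_ok)      # ['bear.', 'lion.']

# By default the wildcard stops at a newline.
newline_default = re.fullmatch(r".", "\n")
newline_dotall = re.fullmatch(r".", "\n", flags=re.DOTALL)
print(newline_default is None, newline_dotall is not None)   # True True
```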
# 5) Ranges

Allow us to specify a range of characters in which we can make a match without having to type out each individual character
* The **-** character lets us specify a range of characters inside a character set. We can match any:
   * single capital letter with the regex **[A-Z]**, 
   * lowercase letter with the regex **[a-z]**, 
   * any digit with the regex **[0-9]**
   * capital or lowercase alphabetical character, we can use the regex **[A-Za-z]**  
   
   
* with **[ ]** we only match one character.

> ex: **[a-c]**  
will match to:  
"**a**", "**b**" or "**c**"

> ex: The regex "**I adopted [2-9] [b-h]ats**"  
will match the text:  
"**I adopted 4 bats**" as well as "**I adopted 8 cats**" and even "**I adopted 5 hats**"

> ex: **[c-k][l-u][b-k]**  
will match to  text:  
**cub**, **dog**, **elk**
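
The range examples can be verified with a quick Python sketch (standard `re` module assumed). The non-matching inputs "I adopted 1 rats" and "ape" are invented here to show where a range stops.

```python
import re

adopted = re.compile(r"I adopted [2-9] [b-h]ats")
texts = ["I adopted 4 bats", "I adopted 8 cats",
         "I adopted 5 hats", "I adopted 1 rats"]
adopted_ok = [t for t in texts if adopted.fullmatch(t)]
print(adopted_ok)     # the first three; "1" and "r" fall outside the ranges

three_letters = re.compile(r"[c-k][l-u][b-k]")
animals = [w for w in ["cub", "dog", "elk", "ape"] if three_letters.fullmatch(w)]
print(animals)        # ['cub', 'dog', 'elk']
```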

# 6) Shorthand Character Classes

While character ranges are extremely useful, they can be cumbersome to write out every single time you want to match common ranges such as those that designate alphabetical characters or digits.  
**->** there are shorthand character classes that represent common ranges, and they make writing regular expressions much simpler.  
These shorthand classes include:

* **shorthand character classes**
   * **\w** = **“word character”** class represents : regex range **[A-Za-z0-9_]**,    
       it matches a single uppercase character, lowercase character, digit or underscore
   * **\d** =  **“digit character”** class represents: regex range **[0-9]**,  
       it matches a single digit character
   * **\s** = **“whitespace character”** class represents : regex range **[ \t\r\n\f\v]**,  
       it matches a single space, tab, carriage return, line break, form feed, or vertical tab
       
> ex: the regex **\d\s\w\w\w\w\w\w\w**  
matches:  
a digit character, followed by a whitespace character,  followed by 7 word characters = matches the text: "**3 monkeys**"

* **Negated Shorthand Character** classes.  
    These shorthands will match any character that is NOT in the regular shorthand classes.
    * **\W** = **“non-word character”** class represents: regex range **[^A-Za-z0-9_]**,   
      it matches any character that is not included in the range represented by "\w"
    * **\D** = **“non-digit character”** class represents the regex range **[^0-9]**,   
      it matches any character that is not included in the range represented by "\d"
    * **\S** = **“non-whitespace character”** class represents the regex range **[^ \t\r\n\f\v]**,    
      it matches any character that is not included in the range represented by "\s"
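
A sketch of the shorthand classes in Python, assuming the standard `re` module. In Python 3, `\w`, `\d` and `\s` also accept Unicode letters, digits and spaces by default, so the example passes `re.ASCII` to keep them to the ranges listed above. The sample strings are made up.

```python
import re

# re.ASCII restricts \w, \d and \s to the ASCII ranges shown in the list.
monkeys = re.fullmatch(r"\d\s\w\w\w\w\w\w\w", "3 monkeys", flags=re.ASCII)
print(monkeys is not None)                                 # True

sample = "id_7 = 42!"
digits = re.findall(r"\d", sample, flags=re.ASCII)
non_word = re.findall(r"\W", sample, flags=re.ASCII)
non_space = re.findall(r"\S", "a b\tc", flags=re.ASCII)
non_digit = re.findall(r"\D", "a1b2", flags=re.ASCII)

print(digits)      # ['7', '4', '2']
print(non_word)    # [' ', '=', ' ', '!']
print(non_space)   # ['a', 'b', 'c']
print(non_digit)   # ['a', 'b']
```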


# 7) Grouping

Grouping, denoted with the open parenthesis and the closing parenthesis **( )**, lets us group parts of a regular expression together, and allows us to limit alternation to part of the regex.  
For example: **I love baboons|gorillas**,  would completely match the string **I love baboons**, but would not match **I love gorillas**, and would instead match **gorillas**.  
**=> so we need grouping:**

> ex: "**I love (baboons|gorillas)**"   
will match the text:  
"**I love**" + either "**baboons**" OR "**gorillas**"

> ex: "**(puppies|kitty cats) are my favorite!**"  
will match the text:    
"**puppies are my favorite!**" and "**kitty cats are my favorite!**"

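To see how grouping limits alternation, here is a minimal Python sketch (standard `re` module assumed, names illustrative). It compares the ungrouped and grouped patterns on the same text.

```python
import re

ungrouped = re.compile(r"I love baboons|gorillas")
grouped = re.compile(r"I love (baboons|gorillas)")

text = "I love gorillas"
ungrouped_hit = ungrouped.search(text).group()
grouped_hit = grouped.search(text).group()
captured = grouped.search(text).group(1)   # text matched inside the parentheses

print(ungrouped_hit)   # gorillas
print(grouped_hit)     # I love gorillas
print(captured)        # gorillas
```

In Python, parentheses also capture what they match, which is why `group(1)` returns only the animal name.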
# 8) Quantifiers - Fixed

Denote the quantity of characters we want to match
*  **Fixed quantifiers** "**{ }**": indicate the exact *quantity of a character* we wish to match, or allow us to provide a *quantity range* to match on
    * **\w{3}** will match exactly 3 word characters
    * **\w{4,7}** will match at minimum 4 word characters and at maximum 7 word characters
    
Note : quantifiers are considered to be greedy: they will match the greatest quantity of characters they possibly can. For example, the regex **mo{2,4}** will match the text **moooo** in the string "**moooo**", and not return a match of **moo** or **mooo**.
>ex : regex **roa{3}r**  
will match   
**roaaar** (the characters **ro** + **followed by 3 as** + **r**)  

>ex : regex **roa{3,7}r**  
will match:  
the strings **roaaar**, **roaaaaar** and **roaaaaaaar** (the characters **ro** + **between 3 and 7 as** + **r**)

>ex: **squea{3,5}k**  
will match:    
*squeaaak*, *squeaaaak*, *squeaaaaak*
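
The fixed quantifiers and their greedy behaviour can be sketched in Python as follows (standard `re` module assumed). The helper `a_counts` is a hypothetical function written for this example; it builds test words instead of typing them out, to avoid miscounting letters.

```python
import re

def a_counts(pattern, counts=range(1, 10)):
    """Return each n for which "ro" + n a's + "r" fully matches the pattern."""
    return [n for n in counts if re.fullmatch(pattern, "ro" + "a" * n + "r")]

exact = a_counts(r"roa{3}r")
ranged = a_counts(r"roa{3,7}r")
print(exact)    # [3]
print(ranged)   # [3, 4, 5, 6, 7]

# Greedy: with four o's available, o{2,4} takes all four.
greedy = re.search(r"mo{2,4}", "moooo").group()
print(greedy)   # moooo
```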


* **Optional quantifiers**: use **"?"**   
Allow us to indicate a character in a regex is optional, or can appear either 0 times or 1 time.

>ex: the regex "**humou?r**"   
matches the characters:  
*humo* +  either 0 occurrences or 1 occurrence of the letter u + r  
(only applies to the character directly before it).

>ex: The regex **The monkey ate a (rotten )?banana**   
will completely match:  
both "*The monkey ate a rotten banana*" and "*The monkey ate a banana*" (the space sits inside the group, so it is optional together with the word)

>ex: in order to match a question mark *?* in a piece of text we need to use the escape character in our regex.   
The regex:
**Aren't owl monkeys beautiful\?**  
will completely match:    
the text:  _Aren't owl monkeys beautiful?_ (_?_ is included)

>ex: **ho{2,}t** (a range with no upper bound: 2 or more _os_) will match:  
_hoot_, _hoooooot_, _hooooooooooot_
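
A short Python sketch of the optional quantifier, assuming the standard `re` module; the names and the extra test words are illustrative. In the grouped pattern, the space is placed inside the parentheses so that it disappears together with the optional word.

```python
import re

humour = re.compile(r"humou?r")
spellings = [w for w in ["humor", "humour", "humouur"] if humour.fullmatch(w)]
print(spellings)   # ['humor', 'humour']

# Parentheses make a whole group optional.
banana = re.compile(r"The monkey ate a (rotten )?banana")
with_word = banana.fullmatch("The monkey ate a rotten banana")
without_word = banana.fullmatch("The monkey ate a banana")
print(with_word is not None, without_word is not None)   # True True

# An escaped \? is a literal question mark, not a quantifier.
owl = re.fullmatch(r"Aren't owl monkeys beautiful\?",
                   "Aren't owl monkeys beautiful?")
print(owl is not None)   # True

# {2,} has no upper bound: two or more o's.
hoots = [w for w in ["hot", "hoot", "hoooooot"] if re.fullmatch(r"ho{2,}t", w)]
print(hoots)       # ['hoot', 'hoooooot']
```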




* **Kleene star** : "**\***"   
  matches the preceding character 0 or more times.   
  This means that the character _doesn’t need_ to appear, can appear once, or can appear several times.

>ex: The regex **meo*w**    
will match the characters: *me* + followed by 0 or more _os_ + _w_.  
thus match the text:  *mew*, *meow*, *meooow*, and *meoooooooooooow*.


* **Kleene plus**  **+**:   
    which matches the preceding character 1 or more times

>ex: The regex **meo+w**   
will match the characters _me_ + followed by 1 or more _os_ + _w_.   
Thus the regex will match  
_meow_, _meooow_, and _meoooooooooooow_, but NOT match _mew_

>ex: Like all the other metacharacters, in order to match the symbols: "* " and  "+ "  
you need to use the **escape character** "**\\**" in your regex.   
The regex:  "_My cat is a \\*_" will completely match the text "_My cat is a *_".
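
The Kleene star and plus can be compared side by side in a small Python sketch (standard `re` module assumed, names illustrative). The last lines use `re.escape`, a standard-library helper that adds the backslashes for us; the string `raw` is invented for the demonstration.

```python
import re

words = ["mew", "meow", "meooow", "meoooooooooooow"]
star = [w for w in words if re.fullmatch(r"meo*w", w)]
plus = [w for w in words if re.fullmatch(r"meo+w", w)]
print(star == words)   # True: zero o's is allowed
print(plus)            # every word except 'mew'

# Escaping turns * into an ordinary character.
literal_star = re.fullmatch(r"My cat is a \*", "My cat is a *")
print(literal_star is not None)   # True

# re.escape builds a pattern that matches the given text literally.
raw = "2 * 3 + 4"
escaped_ok = re.fullmatch(re.escape(raw), raw) is not None
print(escaped_ok)      # True
```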

# 9) Anchors

Anchors are delimiters that mark the beginning ( ^ ) and the end ( \$ ) of a text.   
The pattern : ^ bla bla bla \$


ex : **"\^Monkeys: my mortal enemy\$"**  (with no space between the anchor signs and the text)

will completely match the text:  
_Monkeys: my mortal enemy_  
but NOT match:  _Spider Monkeys: my mortal enemy in the wild_ or _Squirrel Monkeys: my mortal enemy in the wild_


( WITHOUT Anchors :   
**Monkeys: my mortal enemy**   
will match the text in both:  
_Spider Monkeys: my mortal enemy in the wild_ and _Squirrel Monkeys: my mortal enemy in the wild_)
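
The effect of the anchors can be sketched in Python (standard `re` module assumed; the variable names are illustrative). Both patterns are run with `search`, so any difference in the results comes from the anchors alone.

```python
import re

anchored = re.compile(r"^Monkeys: my mortal enemy$")
unanchored = re.compile(r"Monkeys: my mortal enemy")

texts = [
    "Monkeys: my mortal enemy",
    "Spider Monkeys: my mortal enemy in the wild",
    "Squirrel Monkeys: my mortal enemy in the wild",
]

anchored_hits = [t for t in texts if anchored.search(t)]
unanchored_hits = [t for t in texts if unanchored.search(t)]

print(anchored_hits)          # ['Monkeys: my mortal enemy']
print(len(unanchored_hits))   # 3
```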

# 10) Review


* Regular expressions are special sequences of characters that describe a pattern of text that is to be matched


* We can use literals to match the exact characters that we desire


* Alternation, using the pipe symbol |, allows us to match the text preceding or following the |


* Character sets, denoted by a pair of brackets [], let us match one character from a series of characters


* Wildcards, represented by the period or dot ., will match any single character (letter, number, symbol or whitespace)


* Ranges allow us to specify a range of characters in which we can make a match


* Shorthand character classes like \w, \d and \s represent the ranges for word characters, digit characters, and whitespace characters, respectively


* Groupings, denoted with parentheses (), group parts of a regular expression together, and allow us to limit alternation to part of a regex


* Fixed quantifiers, represented with curly braces {}, let us indicate the exact quantity or a range of quantity of a character we wish to match


* Optional quantifiers, indicated by the question mark ?, allow us to indicate a character in a regex is optional, or can appear either 0 times or 1 time


* The Kleene star, denoted with the asterisk *, is a quantifier that matches the preceding character 0 or more times


* The Kleene plus, denoted by the plus +, matches the preceding character 1 or more times


* The anchor symbols hat ^ and dollar sign $ are used to match text at the start and end of a string, respectively