# Regular Expressions

| Metacharacter |  Meaning                            |
|:--------------|:------------------------------------|
| .             | Any Character                       |
| \w            | A Word Character (letter, digit, _) |
| \W            | Not a Word Character                |
| \d            | A Digit                             |
| \D            | Not a Digit                         |
| \s            | Whitespace                          |
| \S            | Not Whitespace                      |
| [xyz]         | A Set of Characters                 |
| [^xyz]        | Negation of Set                     |
| [a-z]         | Range of Characters                 |
| ^             | Beginning of String                 |
| $             | End of String                       |
| \n            | Newline                             |
| +             | One or More                         |
| *             | Zero or More                        |
| ?             | Zero or One                         |
| {2}           | Exactly 2                           |
| {2,5}         | Between 2 and 5                     |
| {2,}          | 2 or More                           |

## Using the grepl() function
- grepl(regex, string)
- Returns TRUE if string contains a match for the specified regex

In [1]:
example <- "Maryland"
re <- "a"
grepl(re, example)

grepl("land", "Maryland")

In [2]:
# Case matters 
grepl("Land", "Maryland")

In [3]:
# Can ignore the case
grepl("Land", "Maryland", ignore.case = TRUE)
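grepl() also accepts a vector of strings and returns one logical value per element, which makes it convenient for counting or subsetting. A minimal sketch, assuming a made-up vector `places` that is not part of the original notebook:

```r
# Hypothetical vector used only for illustration
places <- c("Maryland", "Portland", "Texas")

# One TRUE/FALSE per element of the input vector
has_land <- grepl("land", places)
has_land

# Count the matching elements
sum(has_land)

# Keep only the matching elements
places[has_land]
```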

## Metacharacters: 
- "." represents any character other than a new line
- "+": one or more of the preceding expression should be present
- "*": zero or more of the preceding expression should be present
- "?": zero or one of the preceding expression should be present

In [4]:
grepl('.', "Maryland")

In [5]:
# Searching for a pattern in a vector
x <- c("abc", "aab", "abb", "acadb")
grepl("a.b", x)

#### Difference between "+", "*", and "?"

In [6]:
grepl('a+', "Maryland")
grepl('x+', "Maryland")

In [7]:
grepl('a*', "Maryland")
grepl('x*', "Maryland")

In [8]:
grepl('a?', "Maryland")
grepl('x?', "Maryland")
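Because "*" and "?" both allow zero occurrences, an unanchored pattern such as 'x*' or 'x?' matches any string, so the cells above cannot tell the three quantifiers apart. The sketch below anchors the pattern with "^" and "$" so the difference becomes visible. The vector `abc_strings` is a hypothetical example, not from the original notebook.

```r
# Hypothetical strings with zero, one, and two b's between a and c
abc_strings <- c("ac", "abc", "abbc")

# "+" requires at least one b
plus_match <- grepl("^ab+c$", abc_strings)
plus_match   # FALSE TRUE TRUE

# "*" allows any number of b's, including none
star_match <- grepl("^ab*c$", abc_strings)
star_match   # TRUE TRUE TRUE

# "?" allows at most one b
quest_match <- grepl("^ab?c$", abc_strings)
quest_match  # TRUE TRUE FALSE
```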

## Metacharacters:
- Specify the number of repetitions of the preceding expression using curly braces {}
- "a{5}" exactly 5 times
- "a{2,5}" between 2 and 5 times
- "a{2,}" at least 2 times

In [9]:
# Exactly 2 s
grepl("s{2}", "Mississippi") 

In [10]:
# Between 2 and 3 s
grepl("s{2,3}", "Mississippi")

In [11]:
# Between 2 and 3 consecutive i (the i's in Mississippi are never adjacent)
grepl("i{2,3}", "Mississippi")

In [12]:
# Exactly 2 iss (Adjacent)
grepl("(iss){2}", "Mississippi")

In [13]:
# Exactly 2 ss (Adjacent)
grepl("(ss){2}", "Mississippi")

In [14]:
# Pattern: i followed by any 2 characters, repeated 3 times
grepl("(i.{2}){3}", "Mississippi")
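Note that grepl() looks for the pattern anywhere in the string, so "s{2}" is also TRUE for a string containing three or more consecutive s. To test the length of a whole run, the pattern can be anchored. A minimal sketch, using a hypothetical vector `a_runs`:

```r
# Hypothetical strings made of 1, 2, 3, and 6 a's
a_runs <- c("a", "aa", "aaa", "aaaaaa")

# Anchored: the whole string must be exactly two a's
exactly_two <- grepl("^a{2}$", a_runs)
exactly_two   # FALSE TRUE FALSE FALSE

# Anchored: the whole string must be two to five a's
two_to_five <- grepl("^a{2,5}$", a_runs)
two_to_five   # FALSE TRUE TRUE FALSE

# Anchored: the whole string must be at least two a's
at_least_two <- grepl("^a{2,}$", a_runs)
at_least_two  # FALSE TRUE TRUE TRUE

# Unanchored: TRUE whenever two adjacent a's occur anywhere
unanchored <- grepl("a{2}", a_runs)
unanchored    # FALSE TRUE TRUE TRUE
```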

## Metacharacters: 
- \w and \W (\w matches letters, digits, and the underscore)
- \d and \D
- \s and \S (whitespace includes spaces, \t, and \n)

In [15]:
grepl("\\w", "String")
grepl("\\d", "String")
grepl("\\D", "String")

In [16]:
grepl("\\d", "0123456789")
grepl("\\w", "0123456789")

In [17]:
grepl("\\s", "abc     ")
grepl("\\s", "abc")
grepl("\\s", "\t")
grepl("\\s", "\n")
grepl("\\w", "\n")
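The uppercase classes are the negations of the lowercase ones: the pattern is TRUE as soon as the string contains at least one character outside the class. A small sketch with made-up inputs (the vectors below are assumptions for illustration only):

```r
# \W: TRUE only if some character is not a letter, digit, or underscore
non_word <- grepl("\\W", c("abc_123", "abc 123"))
non_word    # FALSE TRUE

# \D: TRUE only if some character is not a digit
non_digit <- grepl("\\D", c("123", "12a"))
non_digit   # FALSE TRUE

# \S: TRUE only if some character is not whitespace
non_space <- grepl("\\S", c("   ", " a "))
non_space   # FALSE TRUE
```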

## Metacharacters:
- Specify character sets using square brackets []
- Specify only vowels: [aeiou]
- Exclude vowels: [^aeiou] 
- Specify a range: [a-z] or [0-9]

In [18]:
grepl("[aeiou]", "rhythm")

In [19]:
# Exclusion
grepl("[^aeiou]", "rhythm")

In [20]:
# Case matters
grepl("[a-m]", "ABC")

In [21]:
grepl("[a-m]", "ABC", ignore.case = TRUE)
grepl("[a-mA-M]", "ABC")
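Character sets combine naturally with quantifiers and anchors. As an illustration, the sketch below checks whether a string consists of exactly five digits. The vector `codes` is a hypothetical example, not data from the original notebook.

```r
# Hypothetical inputs: valid, contains a letter, too long
codes <- c("12345", "1234a", "123456")

# Whole string must be exactly five characters from the range 0-9
five_digits <- grepl("^[0-9]{5}$", codes)
five_digits  # TRUE FALSE FALSE
```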

## Metacharacters:
- Matching the beginning of a string with "^"
- Matching the end of a string with "$"

In [22]:
s <- c("bab", "aab")

# Begins with a
grepl("^a", s)

In [23]:
# Ends with b
grepl("b$", s)
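Both anchors can appear in the same pattern to constrain the whole string. A minimal sketch, using a hypothetical vector `words`:

```r
# Hypothetical strings for illustration
words <- c("bab", "aab", "ab", "abc")

# Starts with a, then any characters (possibly none), then ends with b
a_to_b <- grepl("^a.*b$", words)
a_to_b    # FALSE TRUE TRUE FALSE

# With both anchors and no wildcard, the whole string must equal "aab"
is_aab <- grepl("^aab$", words)
is_aab    # FALSE TRUE FALSE FALSE
```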

## Metacharacter OR:
- Matches the expression on the left or on the right side of "|"

In [24]:
s <- c("abc", "bcd", "cde")
grepl("a|b", s)

In [25]:
grepl("North|South", c("South Dakota", "North Carolina", "West Virginia"))
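Without parentheses, "|" splits the entire pattern into alternatives. Wrapping the alternatives in parentheses limits the choice to one part of the pattern. A short sketch with made-up inputs (`colours` and `regions` are hypothetical names):

```r
# Only the middle letter varies: a or e
colours <- c("gray", "grey", "groy")
gray_or_grey <- grepl("gr(a|e)y", colours)
gray_or_grey  # TRUE TRUE FALSE

# The anchor and the trailing space apply to both alternatives
regions <- c("South Dakota", "North Carolina", "West Virginia")
north_or_south <- grepl("^(North|South) ", regions)
north_or_south  # TRUE TRUE FALSE
```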

## Searching for Symbols
- Use two backslashes to escape a symbol that is also a metacharacter

In [26]:
grepl("\\+", "1 + 2 = 5")

In [27]:
# Plus sign followed by some characters then an Equals sign
grepl("\\+.*=", "1 + 2 = 5")
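Two alternatives to the double backslash are worth knowing: placing the symbol inside square brackets, or passing `fixed = TRUE` so that grepl() treats the whole pattern as a literal string. A minimal sketch, using a hypothetical vector `expr_strings`:

```r
# Hypothetical strings for illustration
expr_strings <- c("1 + 2 = 5", "a.b", "acb")

# fixed = TRUE: the pattern is a literal string, so "+" needs no escaping
literal_plus <- grepl("+", expr_strings, fixed = TRUE)
literal_plus   # TRUE FALSE FALSE

# Unescaped dot matches any character between a and b
any_char <- grepl("a.b", expr_strings)
any_char       # FALSE TRUE TRUE

# Escaped dot matches only a literal "."
escaped_dot <- grepl("a\\.b", expr_strings)
escaped_dot    # FALSE TRUE FALSE

# A dot inside square brackets is also literal
bracket_dot <- grepl("a[.]b", expr_strings)
bracket_dot    # FALSE TRUE FALSE
```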

<hr>

## Regular expression to match all state names that start and end with a vowel
- Using the built-in R dataset state.name
- A character vector with the name of each state

In [28]:
head(state.name, 10)

In [29]:
# Starts with a vowel (either case), then any number of characters, then ends with a vowel (either case)
re <- "^[aeiouAEIOU].*[aeiouAEIOU]$"
states_lgl <- grepl(re, state.name)
head(states_lgl, 10)

In [30]:
state.name[states_lgl]
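An equivalent way to write this, sketched here as an alternative rather than a replacement, is to list only the lowercase vowels and pass `ignore.case = TRUE`. Using grep() with `value = TRUE` returns the matching names directly instead of a logical vector. The names `vowel_re` and `vowel_states` are introduced only for this example.

```r
# Lowercase vowels only; case is handled by ignore.case
vowel_re <- "^[aeiou].*[aeiou]$"

# grep(..., value = TRUE) returns the matching elements themselves
vowel_states <- grep(vowel_re, state.name, ignore.case = TRUE, value = TRUE)
vowel_states

# Number of states that start and end with a vowel
length(vowel_states)
```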