# Regular Expressions for strings
## What Is It
> A regular expression is a pattern used to match text. It can be made up of literal characters, operators, and other constructs.

\- from https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_regular_expressions

## What Is It Not
- wildcarding or globbing; `*.ps1` is a valid wildcard string, but an invalid regular expression pattern
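
To make the distinction concrete, here is a minimal sketch that compares the wildcard operator `-like` with the regex operator `-match`. The file name and variable names are made up for illustration.

```powershell
## hypothetical file name, for illustration only
$strFileName = "Get-Stuff.ps1"

## wildcard: "*" means "zero or more of any character"
$strFileName -like "*.ps1"        # True

## regex: "*" is a quantifier and needs something before it to quantify,
#    so "*.ps1" is not a valid pattern and the Regex constructor throws
$bIsValidRegex = try {$null = [regex]::new("*.ps1"); $true} catch {$false}
$bIsValidRegex                    # False

## a rough regex counterpart of the wildcard: a literal dot, then "ps1", anchored at the end
$strFileName -match '\.ps1$'      # True
```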

## When To Use
- pattern matching for name / custom string
    - ensure username / account name follows expected pattern
    - filtering filesystem object names
    - matching involving special characters (a newline, `` `n `` in PowerShell, for example)
- substitutions
    - string manipulation/replacement operations
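
A rough sketch of both use cases follows. The account-name convention (3 to 8 lowercase letters followed by 2 digits) and the sample values are assumptions invented for this example, not a rule from anywhere in particular.

```powershell
## hypothetical account-name convention: 3 to 8 lowercase letters, then exactly 2 digits
$strAccountNamePattern = '^[a-z]{3,8}\d{2}$'
Write-Output "bbatch01" "Bobby Batches" "bb1" | Foreach-Object {
    New-Object -Type PSObject -Property ([ordered]@{
        InputObject = $_
        ## -cmatch is the case-sensitive flavor of -match
        FollowsConvention = $_ -cmatch $strAccountNamePattern
    })
}

## substitution: swap "Last, First" into "First Last" using numbered capture groups
$strSwapped = "Batches, Bobby" -replace '^(\w+),\s+(\w+)$', '$2 $1'
$strSwapped    # Bobby Batches
```

Only `bbatch01` follows the made-up convention here. The replacement string is in single quotes so that PowerShell leaves `$1` and `$2` for the regex engine instead of expanding them as variables.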

## When Not To Use
- almost any other time; some examples:
    - IP address (use `[System.Net.IPAddress]`)
    - email address (use `[System.Net.Mail.MailAddress]`)
    - URL inspection (use `[System.Uri]`)
    - data lookup (existing service? https://www.abstractapi.com/api/phone-validation-api, etc)
    - rich object (non-string) comparison / management (native objects in the session, or created from serialized notations like JSON, YAML, XML, ...)

## Some Deets (How)
- Character
    - literals -- literal value of the character
    - groups -- vowels `[aeiou]`, Ps and Qs `[pq]`
    - ranges -- some digits `[3-9]`, some letters `[m-s]`
    - classes -- digit `\d`, word `\w`, non-word `\W`, whitespace `\s`, non-whitespace `\S`
    - the "any" character: `.`
- Quantifier
    - `*` Zero or more times
    - `+` One or more times
    - `?` Zero or one time
    - `{n}` Exactly `n` times
    - `{n,m}` At least `n`, but no more than `m` times
    - much more
- Groups, Captures, and Substitutions
    - like butterflies or Pokémon -- gotta capture 'em sometimes for later enjoyment
- more in the docs!
    - single line versus multiline matching
    - greedy versus lazy quantifiers
    - see .NET docs https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
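
A small sketch of the last two "more in the docs" items, using made-up input strings. The inline options `(?m)` and `(?s)` are the .NET regex ways to turn on multiline and single-line mode within a pattern.

```powershell
## greedy versus lazy: same input, same pattern apart from the "?" after the quantifier
$strTagText = "<b>bold</b> and <i>italic</i>"
[regex]::Match($strTagText, '<.+>').Value     # greedy: <b>bold</b> and <i>italic</i>
[regex]::Match($strTagText, '<.+?>').Value    # lazy:   <b>

## multiline option "(?m)": "^" and "$" match at every line, not just at the start/end of the whole string
$strTwoLines = "first line`nsecond line"
[regex]::Matches($strTwoLines, '^\w+').Count       # 1
[regex]::Matches($strTwoLines, '(?m)^\w+').Count   # 2

## single-line option "(?s)": "." also matches the newline character
$strTwoLines -match 'line.second'       # False
$strTwoLines -match '(?s)line.second'   # True
```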

## Examples
### Use Regular Expressions
#### Basics
Some pattern matching, showing use of characters, classes, quantifiers, etc.

In [38]:
## some tests
$arrInputText = Write-Output "Coolio" "187 Undercover Cop" "PowerShell for the win!" "Apt. 213" "Jupyter Notebook"
([ordered]@{
    "oo" = "Literal"
    "ll" = "Literal"
    "\s" = "Whitespace"
    "\d" = "Digit"
    "." = "Any"
    "\." = "Literal"
    "o{2}" = "Quantifier"
    "[IRL]" = "CharacterGroup"
    "[j-p]" = "Range"
    "[x-za-d]" = "Range"
    "\w\s" = "Class"
    "((\w+)\s){2}" = "Quantifiers"
    "((\w+)\s?){2}" = "Quantifiers"
    "((\w+)\s){2,}" = "Quantifiers"
    "^[a-c]" = "AnchorNGroup"
}).GetEnumerator() | Foreach-Object {
    New-Object -Type PSObject -Property ([ordered]@{
        ExampleKind = $_.Value
        InputObject = $strInputText = $arrInputText | Get-Random
        Pattern = $_.Name
        DoesMatch = $strInputText -match $_.Name
    })
}


ExampleKind    InputObject             Pattern       DoesMatch
-----------    -----------             -------       ---------
Literal        Jupyter Notebook        oo                 True
Literal        187 Undercover Cop      ll                False
Whitespace     Jupyter Notebook        \s                 True
Digit          Coolio                  \d                False
Any            Jupyter Notebook        .                  True
Literal        Coolio                  \.                False
Quantifier     Coolio                  o{2}               True
CharacterGroup PowerShell for the win! [IRL]              True
Range          Jupyter Notebook        [j-p]              True
Range          Coolio                  [x-za-d]           True
Class          Coolio                  \w\s              False
Quantifiers    Coolio                  ((\w+)\s){2}      False
Quantifiers    Apt. 213                ((\w+)\s?){2}      True
Quantifiers    187 Undercover Co

### Beyond Booleans
Some things beyond just returning booleans for matching.

In [28]:
## filtering filesystem object names:
#    normal wildcard
Get-ChildItem -Path PowerShell*


    Directory: C:\temp\GitThings\PowerShellSkills\docs

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---           3/24/2022  7:46 AM           2412 PowerShellFileTypes.md
-a---           3/16/2022  8:52 PM           5456 PowerShellModules.md
-a---           7/11/2022  2:25 PM          16546 PowerShellOutputStreamsAndTranscription.ipynb



In [30]:
## filtering filesystem object names:
#    with a wildcard character set _and_ the * wildcard; [fo] looks like a regular expression character group, but here it is part of PowerShell's wildcard syntax; so, "PowerShell" followed by one of the characters in the set
Get-ChildItem -Path PowerShell[fo]*


    Directory: C:\temp\GitThings\PowerShellSkills\docs

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---           3/24/2022  7:46 AM           2412 PowerShellFileTypes.md
-a---           7/11/2022  2:25 PM          16546 PowerShellOutputStreamsAndTranscription.ipynb
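
When a wildcard cannot express the filter, an actual regular expression can be applied to the names. A minimal sketch follows; it uses the file names from the listing above as plain strings so that it does not depend on any particular folder, and the variable names are illustrative.

```powershell
## file names copied from the listing above, as plain strings
$arrFileNames = Write-Output PowerShellFileTypes.md PowerShellModules.md PowerShellOutputStreamsAndTranscription.ipynb

## regex filter: starts with "PowerShell", then "f" or "o" (case-insensitive), and ends with the ".md" extension
$arrFiltered = $arrFileNames | Where-Object {$_ -match '^PowerShell[fo].*\.md$'}
$arrFiltered    # PowerShellFileTypes.md
```

Against real filesystem objects, the same pattern could be used along the lines of `Get-ChildItem | Where-Object Name -match '^PowerShell[fo].*\.md$'`.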



### Grouping and Captures
If our source of data is some paragraph of non-structured blah that someone transcribed from microfiche and we absolutely have to try to mine data out of it, we may employ some of the other constructs of Regular Expressions like [Grouping Captures](https://learn.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions).

The gist: we can use a Grouping construct to "capture" particular parts of the input text for later reference. Some examples:
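
As a warm-up, a minimal sketch of named captures on a single line, using the `-match` operator and the automatic `$Matches` variable. The input line and the group names `topic` and `answer` are only illustrative.

```powershell
## hypothetical single line of input
$strOneLine = "When hired? Jan 22, 1999"
$strOneLinePattern = '^(?<topic>[^?]+)\?\s+(?<answer>.+)$'
if ($strOneLine -match $strOneLinePattern) {
    ## after a successful -match, $Matches holds the captures, keyed by group name
    $Matches["topic"]     # When hired
    $Matches["answer"]    # Jan 22, 1999
}

## captures are also available to substitutions, by name
$strReordered = $strOneLine -replace $strOneLinePattern, '${answer} <- ${topic}'
$strReordered    # Jan 22, 1999 <- When hired
```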

In [43]:
## some unstructured data from which we'll try to mine some data, porcelain style (all fragile)
$strMyMessyInput = @"
Who is it? Bobby Batches
What they do? Cornstruction Worker 🌽
When hired? Jan 22, 1999
"@

$arrMatches = [System.Text.RegularExpressions.Regex]::Matches($strMyMessyInput, "^(?<topic>[^\?]+)\?\s+(?<answer>.+)$", [System.Text.RegularExpressions.RegexOptions]::Multiline)
## just return the match objects
$arrMatches
Write-Verbose -Verbose "Let's see just one named capture group"
($arrMatches | Select-Object -First 1).Groups["topic"]

## try to mine some data from the named capture groups
Write-Verbose -Verbose "And, trying to make a reusable object for later goodness"
$arrMatches | Foreach-Object -Begin {$hshTmpProperties = [ordered]@{}} -Process {
    $hshTmpProperties[$_.Groups["topic"].Value] = $_.Groups["answer"].Value
}
$hshTmpProperties | Format-Table -AutoSize



Groups    : {0, topic, answer}
Success   : True
Name      : 0
Captures  : {0}
Index     : 0
Length    : 24
Value     : Who is it? Bobby Batches
ValueSpan : 

Groups    : {0, topic, answer}
Success   : True
Name      : 0
Captures  : {0}
Index     : 25
Length    : 37
Value     : What they do? Cornstruction Worker 🌽
ValueSpan : 

Groups    : {0, topic, answer}
Success   : True
Name      : 0
Captures  : {0}
Index     : 63
Length    : 24
Value     : When hired? Jan 22, 1999
ValueSpan : 

VERBOSE: Let's see just one named capture group
Success   : True
Name      : topic
Captures  : {topic}
Index     : 0
Length    : 9
Value     : Who is it
ValueSpan : 

In [26]:
## or, see how we can solve in another way, w/o getting to party w RegEx mysteries
$strMyMessyInput | ConvertFrom-StringData -Delimiter ?


Name                           Value
----                           -----
Who is it                      Bobby Batches
What they do                   Cornstruction Worker 🌽
When hired                     Jan 22, 1999



### Use Other Solutions
Some other tasks that, while often tackled with regular expressions, may be better served by object models or standard APIs. The first few examples here leverage the existing .NET object model, and the methods of the corresponding types, to validate whether a value is legitimate for the given object type:

#### URI (Uniform Resource Identifier), of which URLs are a subset apparently

In [51]:
## check to see if a value is a legit URI
Write-Output https://coolstuff.com http:\\blahh.com https://github.com/vNugglets/?repoName=vDNetworking /blahh/moreGoodness gopher://old.timey.gopher.server.com/mySchtuff | Foreach-Object {
    New-Object -Type PSObject -Property ([ordered]@{
        InputObject = $_
        IsWellFormedURI = [System.Uri]::IsWellFormedUriString($_, [System.UriKind]::RelativeOrAbsolute)
    })
}


InputObject                                         IsWellFormedURI
-----------                                         ---------------
https://coolstuff.com                                          True
http:\\blahh.com                                              False
https://github.com/vNugglets/?repoName=vDNetworking            True
/blahh/moreGoodness                                            True
gopher://old.timey.gopher.server.com/mySchtuff                 True



In [67]:
## use the subsequent rich object for accessing particular properties of the given object
$oSomeUri = $null ## var into which to put new URI object if success
Write-Output https://blah.com https:metalmayhem.com/rock https://mylullz.com/😹 https://github.com/vNugglets/?repoName=vDNetworking | Foreach-Object {
    New-Object -Type PSObject -Property ([ordered]@{
        InputObject = $_
        Valid = [System.Uri]::TryCreate($_, [System.UriKind]::Absolute, [ref]$oSomeUri)
        DnsSafeHost = $oSomeUri.DnsSafeHost
        AbsolutePath = $oSomeUri.AbsolutePath
    })
}


InputObject                                         Valid DnsSafeHost AbsolutePath
-----------                                         ----- ----------- ------------
https://blah.com                                     True blah.com    /
https:metalmayhem.com/rock                          False             
https://mylullz.com/😹                               True mylullz.com /%F0%9F%98%B9
https://github.com/vNugglets/?repoName=vDNetworking  True github.com  /vNugglets/



In [65]:
## and, can use the objects to make other rich objects, vs. RegEx "fun"
([System.Uri]"https://github.com/vNugglets/?repoName=vDNetworking&lastCommit=20221225").Query.Trim("?").Split("&") | ConvertFrom-StringData


Name                           Value
----                           -----
repoName                       vDNetworking
lastCommit                     20221225



#### Email address
There's a class for that!

In [86]:
## see if these are of legit email address format
$oSomeEmailAddr = $null
Write-Output dickie@pants.com k.windstein@none@heavier.com onramp@pshellsummit.org | Foreach-Object {
    New-Object -Type PSObject -Property ([ordered]@{
        InputObject = $_
        Valid = [System.Net.Mail.MailAddress]::TryCreate($_, [ref]$oSomeEmailAddr)
        User = $oSomeEmailAddr.User
        Host = $oSomeEmailAddr.Host
        ValidHost = if ($oSomeEmailAddr) {$null -ne $(try {Resolve-DnsName -Name $oSomeEmailAddr.Host -ErrorAction:Stop} catch {})} else {$false}
    })
}


InputObject : dickie@pants.com
Valid       : True
User        : dickie
Host        : pants.com
ValidHost   : True

InputObject : k.windstein@none@heavier.com
Valid       : False
User        : 
Host        : 
ValidHost   : False

InputObject : onramp@pshellsummit.org
Valid       : True
User        : onramp
Host        : pshellsummit.org
ValidHost   : False




#### IP Address
And, we know, there's a class for that, too. No need to mess with `/([0-2]\d{1,2}\.)?(\d{1,3}\.?){3}/` and the "oh, wait, that allows 291.333.333.333, now what? Oh, wait, that's also only IPv4" hassle.

In [95]:
## see if these are of IP address format
$oSomeIPAddr = $null

Write-Output 10.2.3.4 (254..256 | Foreach-Object {"10.0.0.$_"}) 40.37 172.3.3 fec0:0:0:ffff::1 | Foreach-Object {$_} | Foreach-Object {
    New-Object -Type PSObject -Property ([ordered]@{
        InputObject = $_
        Valid = [System.Net.IpAddress]::TryParse($_, [ref]$oSomeIPAddr)
        AddressFamily = $oSomeIPAddr.AddressFamily
        IPAddressToString = $oSomeIPAddr.IPAddressToString
    })
}


InputObject      Valid  AddressFamily IPAddressToString
-----------      -----  ------------- -----------------
10.2.3.4          True   InterNetwork 10.2.3.4
10.0.0.254        True   InterNetwork 10.0.0.254
10.0.0.255        True   InterNetwork 10.0.0.255
10.0.0.256       False                
40.37             True   InterNetwork 40.0.0.37
172.3.3           True   InterNetwork 172.3.0.3
fec0:0:0:ffff::1  True InterNetworkV6 fec0:0:0:ffff::1
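
For contrast, a small sketch that runs the pattern quoted earlier (minus its `/` delimiters) next to `[System.Net.IPAddress]::TryParse()`. The variable names and the `RegexSaysValid` / `ClassSaysValid` property names are made up for this illustration.

```powershell
## the pattern quoted in the text above, without the "/" delimiters
$strNaiveIPv4Pattern = '([0-2]\d{1,2}\.)?(\d{1,3}\.?){3}'
$oParsedIPAddr = $null
Write-Output 10.2.3.4 291.333.333.333 fec0:0:0:ffff::1 | Foreach-Object {
    New-Object -Type PSObject -Property ([ordered]@{
        InputObject = $_
        RegexSaysValid = $_ -match $strNaiveIPv4Pattern
        ClassSaysValid = [System.Net.IPAddress]::TryParse($_, [ref]$oParsedIPAddr)
    })
}
```

The pattern accepts `291.333.333.333` and rejects the IPv6 address, while the class gets both right.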



#### Phone number
Leverage goodness already developed and supported, vs. trying to create it from scratch! How: use an API for it! This example uses the free API service from abstractapi.com.

In [4]:
$oInvokeRestMethodParams = @{Uri = "https://phonevalidation.abstractapi.com/v1/"; Method = "Get"; Body = @{api_key = "MyApiKeyHere"}}
Write-Output 13172776666 "1 (800) 266-8228" 1900TParkBoys | Foreach-Object {
    $oInvokeRestMethodParams.Body["phone"] = $_
    # Invoke-RestMethod @oInvokeRestMethodParams
    while (-not ($oPhoneInfo = try {Invoke-RestMethod @oInvokeRestMethodParams} catch {})) {Write-Verbose -Verbose "d'oh, exceeded API request rate; sleepy time"; Start-Sleep -Seconds 1}
    $oPhoneInfo
}


phone    : 13172776666
valid    : True
format   : @{international=+13172776666; local=(317) 277-6666}
country  : @{code=US; name=United States; prefix=+1}
location : Indiana
type     : unknown
carrier  : 

VERBOSE: d'oh, exceeded API request rate; sleepy time
phone    : 1 (800) 266-8228
valid    : True
format   : @{international=+18002668228; local=(800) 266-8228}
country  : @{code=US; name=United States; prefix=+1}
location : 
type     : toll free
carrier  : 

VERBOSE: d'oh, exceeded API request rate; sleepy time
phone    : 1900TParkBoys
valid    : False
format   : @{international=; local=}
country  : @{name=; code=; prefix=}
location : 
type     : 
carrier  : 


