### Disclaimer
Please note, the Vantage Functions via SQLAlchemy feature is a preview/beta code release with limited functionality (the “Code”). As such, you acknowledge that the Code is experimental in nature and that the Code is provided “AS IS” and may not be functional on any machine or in any environment. TERADATA DISCLAIMS ALL WARRANTIES RELATING TO THE CODE, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES AGAINST INFRINGEMENT OF THIRD-PARTY RIGHTS, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

TERADATA SHALL NOT BE RESPONSIBLE OR LIABLE WITH RESPECT TO ANY SUBJECT MATTER OF THE CODE UNDER ANY CONTRACT, NEGLIGENCE, STRICT LIABILITY OR OTHER THEORY 
    (A) FOR LOSS OR INACCURACY OF DATA OR COST OF PROCUREMENT OF SUBSTITUTE GOODS, SERVICES OR TECHNOLOGY, OR 
    (B) FOR ANY INDIRECT, INCIDENTAL OR CONSEQUENTIAL DAMAGES INCLUDING, BUT NOT LIMITED TO LOSS OF REVENUES AND LOSS OF PROFITS. TERADATA SHALL NOT BE RESPONSIBLE FOR ANY MATTER BEYOND ITS REASONABLE CONTROL.

Notwithstanding anything to the contrary: 
    (a) Teradata will have no obligation of any kind with respect to any Code-related comments, suggestions, design changes or improvements that you elect to provide to Teradata in either verbal or written form (collectively, “Feedback”), and 
    (b) Teradata and its affiliates are hereby free to use any ideas, concepts, know-how or techniques, in whole or in part, contained in Feedback: 
        (i) for any purpose whatsoever, including developing, manufacturing, and/or marketing products and/or services incorporating Feedback in whole or in part, and 
        (ii) without any restrictions or limitations, including requiring the payment of any license fees, royalties, or other consideration. 

In [1]:
# This notebook covers examples of the following regular expression functions:
#     1. REGEXP_SUBSTR
#     2. REGEXP_REPLACE
#     3. REGEXP_INSTR
#     4. REGEXP_SIMILAR
# SQL Documentation: https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/c2fX4dzxCcDJFKqXbyQtTA

In [2]:
# Connect to Vantage using create_context().
from teradataml import *
import getpass
td_context = create_context(host=getpass.getpass("Hostname: "), username=getpass.getpass("Username: "), password=getpass.getpass("Password: "))

# Load the example dataset.
load_example_data("GLM", ["admissions_train"])

Hostname: ········
Username: ········
Password: ········


In [3]:
# Create the DataFrame on 'admissions_train' table
admissions_train = DataFrame("admissions_train")
admissions_train

   masters   gpa     stats programming  admitted
id                                              
22     yes  3.46    Novice    Beginner         0
36      no  3.00  Advanced      Novice         0
15     yes  4.00  Advanced    Advanced         1
38     yes  2.65  Advanced    Beginner         1
5       no  3.44    Novice      Novice         0
17      no  3.83  Advanced    Advanced         1
34     yes  3.85  Advanced    Beginner         0
13      no  4.00  Advanced      Novice         1
26     yes  3.57  Advanced    Advanced         1
19     yes  1.98  Advanced    Advanced         0

In [4]:
import sqlalchemy

def print_variables(df, columns):
    """Print the equivalent SQL, the DataFrame, its dtypes, and the type of each given column."""
    print("Equivalent SQL: {}".format(df.show_query()))
    print("\n")
    print(" ************************* DataFrame ********************* ")
    print(df)
    print("\n\n")
    print(" ************************* DataFrame.dtypes ********************* ")
    print(df.dtypes)
    print("\n\n")
    # Accept a single column name as well as a list of names.
    if isinstance(columns, str):
        columns = [columns]
    for col in columns:
        coltype = getattr(df, col).type
        # Report columns whose type SQLAlchemy could not infer as "NullType".
        if isinstance(coltype, sqlalchemy.sql.sqltypes.NullType):
            coltype = "NullType"
        print(" '{}' Column Type: {}".format(col, coltype))

In [5]:
# Import func from SQLAlchemy; it is used to call Teradata built-in functions.
from sqlalchemy import func

In [6]:
# Before moving on to the examples, note how a teradataml DataFrame and
# its columns are used to create a SQLAlchemy ClauseElement/Expression.

# The examples below often use something like 'admissions_train.admitted.expression'.
# In this expression:
#    'admissions_train' is a teradataml DataFrame.
#    'admitted' is a column name in the teradataml DataFrame 'admissions_train'.
#    Thus,
#        'admissions_train.admitted' forms a ColumnExpression.
#    The 'expression' attribute lets a teradataml ColumnExpression be treated as a SQLAlchemy expression.
#    Thus,
#        'admissions_train.admitted.expression' gives an expression that can be used with SQLAlchemy ClauseElements.
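
The way `func` turns a column expression plus arguments into a SQL function call can be tried without a Vantage connection. The sketch below is only an approximation: it assumes a plain `sqlalchemy.column("stats")` as a stand-in for `admissions_train.stats.expression`, and it compiles with SQLAlchemy's default dialect rather than the Teradata one.

```python
from sqlalchemy import column, func

# Stand-in for admissions_train.stats.expression (assumption: a bare column clause).
stats = column("stats")

# func.<NAME>(...) builds a generic SQL function call with the given name.
substr_expr = func.REGEXP_SUBSTR(stats, "vice", 1, 1, "c")

# Render literal values inline so the generated SQL is easy to read.
sql_text = str(substr_expr.compile(compile_kwargs={"literal_binds": True}))
print(type(substr_expr))
print(sql_text)
```

The rendered text should resemble the function call that appears in the "Equivalent SQL" output of the cells that follow.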

## REGEXP_SUBSTR Function

In [7]:
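# Extracts a substring from source_string that matches a regular expression specified by regexp_string.
# Syntax:
#        REGEXP_SUBSTR(source_string/column, regexp_string, position_arg, occurrence_arg, match_arg)
#        where,
#              source_string/column - string or column to search.
#              regexp_string        - regular expression to match.
#              position_arg         - 1-based position in source_string at which the search starts.
#              occurrence_arg       - which occurrence of the match to return.
#              match_arg            - matching behavior, e.g. 'c' for case-sensitive or 'i' for case-insensitive matching.
#        Returns NULL (shown as None in the DataFrame) when there is no match.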

In [8]:
regex_substr_ = func.REGEXP_SUBSTR(admissions_train.stats.expression, "vice", 1, 1, 'c')
type(regex_substr_)

sqlalchemy.sql.functions.Function

In [9]:
regex_substr_df = admissions_train.assign(regex_substr_col=regex_substr_)
print_variables(regex_substr_df, "regex_substr_col")

Equivalent SQL: select id AS id, masters AS masters, gpa AS gpa, stats AS stats, programming AS programming, admitted AS admitted, REGEXP_SUBSTR(stats, 'vice', 1, 1, 'c') AS regex_substr_col from "admissions_train"


 ************************* DataFrame ********************* 
   masters   gpa     stats programming  admitted regex_substr_col
id                                                               
15     yes  4.00  Advanced    Advanced         1             None
7      yes  2.33    Novice      Novice         1             vice
22     yes  3.46    Novice    Beginner         0             vice
17      no  3.83  Advanced    Advanced         1             None
13      no  4.00  Advanced      Novice         1             None
38     yes  2.65  Advanced    Beginner         1             None
26     yes  3.57  Advanced    Advanced         1             None
5       no  3.44    Novice      Novice         0             vice
34     yes  3.85  Advanced    Beginner         0             None
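
To see why 'Novice' yields 'vice' and 'Advanced' yields None, the behavior can be approximated locally. This is a rough sketch that assumes Python's `re` module as a stand-in for the Vantage regex engine; `regexp_substr` is a hypothetical helper, not part of teradataml or SQLAlchemy, and only the 'c' and 'i' match arguments are modeled.

```python
import re

def regexp_substr(source, pattern, position=1, occurrence=1, match_arg="c"):
    """Approximate REGEXP_SUBSTR locally (hypothetical helper)."""
    if source is None:
        return None
    flags = re.IGNORECASE if "i" in match_arg else 0
    # SQL positions are 1-based, Python indexes are 0-based.
    matches = re.compile(pattern, flags).finditer(source, position - 1)
    for n, match in enumerate(matches, start=1):
        if n == occurrence:
            return match.group(0)
    return None  # no such occurrence: mirrors the NULL/None seen above

for value in ("Novice", "Advanced"):
    print(value, "->", regexp_substr(value, "vice", 1, 1, "c"))
# Novice -> vice
# Advanced -> None
```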

## REGEXP_REPLACE Function

In [10]:
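# Replaces portions of source_string that match regexp_string with the replace_string.
# Syntax:
#        REGEXP_REPLACE(source_string/column, regexp_string, replace_string, position_arg, occurrence_arg, match_arg)
#        where,
#              source_string/column - string or column to search.
#              regexp_string        - regular expression to match.
#              replace_string       - text that replaces each selected match.
#              position_arg         - 1-based position in source_string at which the search starts.
#              occurrence_arg       - which occurrence of the match to replace; 0 replaces all occurrences.
#              match_arg            - matching behavior, e.g. 'c' for case-sensitive or 'i' for case-insensitive matching.
#        Strings without a match are returned unchanged.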

In [11]:
regex_replace_ = func.REGEXP_REPLACE(admissions_train.stats.expression, "vice", "wUcMe", 1, 1, 'c')
type(regex_replace_)

sqlalchemy.sql.functions.Function

In [12]:
regex_replace_df = admissions_train.assign(regex_replace_col=regex_replace_)
print_variables(regex_replace_df, "regex_replace_col")

Equivalent SQL: select id AS id, masters AS masters, gpa AS gpa, stats AS stats, programming AS programming, admitted AS admitted, REGEXP_REPLACE(stats, 'vice', 'wUcMe', 1, 1, 'c') AS regex_replace_col from "admissions_train"


 ************************* DataFrame ********************* 
   masters   gpa     stats programming  admitted regex_replace_col
id                                                                
22     yes  3.46    Novice    Beginner         0           NowUcMe
36      no  3.00  Advanced      Novice         0          Advanced
15     yes  4.00  Advanced    Advanced         1          Advanced
38     yes  2.65  Advanced    Beginner         1          Advanced
5       no  3.44    Novice      Novice         0           NowUcMe
17      no  3.83  Advanced    Advanced         1          Advanced
34     yes  3.85  Advanced    Beginner         0          Advanced
13      no  4.00  Advanced      Novice         1          Advanced
26     yes  3.57  Advanced    Advanced         1          Advanced
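
The replacement above ('Novice' becomes 'NowUcMe', 'Advanced' stays unchanged) can be approximated locally. The sketch assumes Python's `re` module as a stand-in for the Vantage regex engine; `regexp_replace` is a hypothetical helper, it treats the replacement as literal text (no backreferences), and it models only the 'c' and 'i' match arguments.

```python
import re

def regexp_replace(source, pattern, replacement, position=1, occurrence=1, match_arg="c"):
    """Approximate REGEXP_REPLACE locally (hypothetical helper)."""
    if source is None:
        return None
    flags = re.IGNORECASE if "i" in match_arg else 0
    matches = list(re.compile(pattern, flags).finditer(source, position - 1))
    # occurrence 0 selects every match; otherwise select only the n-th one.
    targets = matches if occurrence == 0 else matches[occurrence - 1:occurrence]
    pieces, last = [], 0
    for match in targets:
        pieces.append(source[last:match.start()])
        pieces.append(replacement)
        last = match.end()
    pieces.append(source[last:])
    return "".join(pieces)

for value in ("Novice", "Advanced"):
    print(value, "->", regexp_replace(value, "vice", "wUcMe", 1, 1, "c"))
# Novice -> NowUcMe
# Advanced -> Advanced
```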

## REGEXP_INSTR Function

In [13]:
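# Searches source_string for a match to regexp_string and returns the position of the match.
# Syntax:
#        REGEXP_INSTR(source_string/column, regexp_string, position_arg, occurrence_arg, return_opt, match_arg)
#        where,
#              source_string/column - string or column to search.
#              regexp_string        - regular expression to match.
#              position_arg         - 1-based position in source_string at which the search starts.
#              occurrence_arg       - which occurrence of the match to report.
#              return_opt           - 0 returns the position where the match begins;
#                                     1 returns the position of the character following the match.
#              match_arg            - matching behavior, e.g. 'c' for case-sensitive or 'i' for case-insensitive matching.
#        Returns 0 when there is no match.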

In [14]:
regex_instr_ = func.REGEXP_INSTR(admissions_train.stats.expression, "vice", 1, 1, 1, 'c')
type(regex_instr_)

sqlalchemy.sql.functions.Function

In [15]:
regex_instr_df = admissions_train.assign(regex_instr_col=regex_instr_)
print_variables(regex_instr_df, "regex_instr_col")

Equivalent SQL: select id AS id, masters AS masters, gpa AS gpa, stats AS stats, programming AS programming, admitted AS admitted, REGEXP_INSTR(stats, 'vice', 1, 1, 1, 'c') AS regex_instr_col from "admissions_train"


 ************************* DataFrame ********************* 
   masters   gpa     stats programming  admitted  regex_instr_col
id                                                               
15     yes  4.00  Advanced    Advanced         1                0
7      yes  2.33    Novice      Novice         1                7
22     yes  3.46    Novice    Beginner         0                7
17      no  3.83  Advanced    Advanced         1                0
13      no  4.00  Advanced      Novice         1                0
38     yes  2.65  Advanced    Beginner         1                0
26     yes  3.57  Advanced    Advanced         1                0
5       no  3.44    Novice      Novice         0                7
34     yes  3.85  Advanced    Beginner         0                0
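
The value 7 for 'Novice' follows from return_opt being 1: 'vice' occupies positions 3 to 6, so the character after the match sits at position 7. A rough local approximation, assuming Python's `re` module as a stand-in for the Vantage regex engine (`regexp_instr` is a hypothetical helper that models only the 'c' and 'i' match arguments):

```python
import re

def regexp_instr(source, pattern, position=1, occurrence=1, return_opt=0, match_arg="c"):
    """Approximate REGEXP_INSTR locally (hypothetical helper)."""
    if source is None:
        return None
    flags = re.IGNORECASE if "i" in match_arg else 0
    matches = re.compile(pattern, flags).finditer(source, position - 1)
    for n, match in enumerate(matches, start=1):
        if n == occurrence:
            # Convert 0-based Python offsets to 1-based SQL positions.
            return match.end() + 1 if return_opt == 1 else match.start() + 1
    return 0  # no match

print(regexp_instr("Novice", "vice", 1, 1, 0, "c"))    # 3: where the match begins
print(regexp_instr("Novice", "vice", 1, 1, 1, "c"))    # 7: position after the match
print(regexp_instr("Advanced", "vice", 1, 1, 1, "c"))  # 0: no match
```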

## REGEXP_SIMILAR Function

In [16]:
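# Compares source_string to regexp_string and returns an integer value:
# 1 if the entire source_string matches regexp_string, 0 otherwise.
# Syntax:
#        REGEXP_SIMILAR(source_string/column, regexp_string, match_arg)
#        where,
#              source_string/column - string or column to compare.
#              regexp_string        - regular expression that the whole string must match.
#              match_arg            - matching behavior, e.g. 'c' for case-sensitive or 'i' for case-insensitive matching.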

In [17]:
regex_similar_ = func.REGEXP_SIMILAR(admissions_train.stats.expression, "ce", 'c')
type(regex_similar_)

sqlalchemy.sql.functions.Function

In [18]:
regex_similar_df = admissions_train.assign(regex_similar_col=regex_similar_)
print_variables(regex_similar_df, "regex_similar_col")

Equivalent SQL: select id AS id, masters AS masters, gpa AS gpa, stats AS stats, programming AS programming, admitted AS admitted, REGEXP_SIMILAR(stats, 'ce', 'c') AS regex_similar_col from "admissions_train"


 ************************* DataFrame ********************* 
   masters   gpa     stats programming  admitted  regex_similar_col
id                                                                 
22     yes  3.46    Novice    Beginner         0                  0
36      no  3.00  Advanced      Novice         0                  0
15     yes  4.00  Advanced    Advanced         1                  0
38     yes  2.65  Advanced    Beginner         1                  0
5       no  3.44    Novice      Novice         0                  0
17      no  3.83  Advanced    Advanced         1                  0
34     yes  3.85  Advanced    Beginner         0                  0
13      no  4.00  Advanced      Novice         1                  0
26     yes  3.57  Advanced    Advanced         1                  0
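
Every row returns 0 here, including the 'Novice' rows, because REGEXP_SIMILAR compares the whole string against the pattern, and 'ce' matches only the tail of 'Novice'. A rough local approximation, assuming Python's `re.fullmatch` as a stand-in for the Vantage regex engine (`regexp_similar` is a hypothetical helper that models only the 'c' and 'i' match arguments):

```python
import re

def regexp_similar(source, pattern, match_arg="c"):
    """Approximate REGEXP_SIMILAR locally (hypothetical helper)."""
    if source is None:
        return None
    flags = re.IGNORECASE if "i" in match_arg else 0
    # The entire string must match, not just a substring.
    return 1 if re.fullmatch(pattern, source, flags) else 0

print(regexp_similar("Novice", "ce", "c"))      # 0: 'ce' covers only part of the string
print(regexp_similar("Novice", ".*ce", "c"))    # 1: the pattern now spans the whole string
print(regexp_similar("Advanced", ".*ce", "c"))  # 0
```

A pattern such as ".*ce" would therefore be one way to flag values ending in "ce".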

In [19]:
## REGEXP_SPLIT_TO_TABLE Function - It is a table function, so it cannot be used in assign().
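
The difference between a table function and the scalar functions above can be illustrated locally: a split produces several rows per input value, whereas assign() adds one value per row. The sketch assumes Python's `re.split` as a stand-in for the Vantage regex engine; `regexp_split_to_rows` and its (key, token index, token) row layout are hypothetical and chosen only for illustration.

```python
import re

def regexp_split_to_rows(key, source, pattern, match_arg="c"):
    """Split source on pattern and return one row per token (hypothetical helper)."""
    flags = re.IGNORECASE if "i" in match_arg else 0
    tokens = re.split(pattern, source, flags=flags)
    # One output row per token, numbered from 1.
    return [(key, index, token) for index, token in enumerate(tokens, start=1)]

rows = regexp_split_to_rows(22, "Novice,Beginner", ",")
for row in rows:
    print(row)
# (22, 1, 'Novice')
# (22, 2, 'Beginner')
```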

In [20]:
# One must run remove_context() to close the connection and garbage collect internally generated objects.
remove_context()

True