### Disclaimer
Please note, the Vantage Functions via SQLAlchemy feature is a preview/beta code release with limited functionality (the “Code”). As such, you acknowledge that the Code is experimental in nature and that the Code is provided “AS IS” and may not be functional on any machine or in any environment. TERADATA DISCLAIMS ALL WARRANTIES RELATING TO THE CODE, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES AGAINST INFRINGEMENT OF THIRD-PARTY RIGHTS, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

TERADATA SHALL NOT BE RESPONSIBLE OR LIABLE WITH RESPECT TO ANY SUBJECT MATTER OF THE CODE UNDER ANY CONTRACT, NEGLIGENCE, STRICT LIABILITY OR OTHER THEORY 
    (A) FOR LOSS OR INACCURACY OF DATA OR COST OF PROCUREMENT OF SUBSTITUTE GOODS, SERVICES OR TECHNOLOGY, OR 
    (B) FOR ANY INDIRECT, INCIDENTAL OR CONSEQUENTIAL DAMAGES INCLUDING, BUT NOT LIMITED TO LOSS OF REVENUES AND LOSS OF PROFITS. TERADATA SHALL NOT BE RESPONSIBLE FOR ANY MATTER BEYOND ITS REASONABLE CONTROL.

Notwithstanding anything to the contrary: 
    (a) Teradata will have no obligation of any kind with respect to any Code-related comments, suggestions, design changes or improvements that you elect to provide to Teradata in either verbal or written form (collectively, “Feedback”), and 
    (b) Teradata and its affiliates are hereby free to use any ideas, concepts, know-how or techniques, in whole or in part, contained in Feedback: 
        (i) for any purpose whatsoever, including developing, manufacturing, and/or marketing products and/or services incorporating Feedback in whole or in part, and 
        (ii) without any restrictions or limitations, including requiring the payment of any license fees, royalties, or other consideration. 

In [1]:
# Connect to Vantage using create_context()
from teradataml import *
import getpass
td_context = create_context(host=getpass.getpass("Hostname: "), username=getpass.getpass("Username: "), password=getpass.getpass("Password: "))

# Load the example dataset.
load_example_data("GLM", ["admissions_train"])

Hostname: ········
Username: ········
Password: ········


In [2]:
# Create the DataFrame on 'admissions_train' table
admissions_train = DataFrame("admissions_train")
admissions_train

   masters   gpa     stats programming  admitted
id                                              
15     yes  4.00  Advanced    Advanced         1
7      yes  2.33    Novice      Novice         1
22     yes  3.46    Novice    Beginner         0
17      no  3.83  Advanced    Advanced         1
13      no  4.00  Advanced      Novice         1
38     yes  2.65  Advanced    Beginner         1
26     yes  3.57  Advanced    Advanced         1
5       no  3.44    Novice      Novice         0
34     yes  3.85  Advanced    Beginner         0
40     yes  3.95    Novice    Beginner         0

In [3]:
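# Helper function to print the equivalent SQL, contents, dtypes, and column types of a DataFrame.
import sqlalchemy  # needed for the NullType check below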
def print_variables(df, columns):
    print("Equivalent SQL: {}".format(df.show_query()))
    print("\n")
    print(" ************************* DataFrame ********************* ")
    print(df)
    print("\n\n")
    print(" ************************* DataFrame.dtypes ********************* ")
    print(df.dtypes)
    print("\n\n")
    if isinstance(columns, str):
        columns = [columns]
    for col in columns:
        coltype = df.__getattr__(col).type
        if isinstance(coltype, sqlalchemy.sql.sqltypes.NullType):
            coltype = "NullType"
        print(" '{}' Column Type: {}".format(col, coltype))

# Using SQLAlchemy ClauseElements in teradataml DataFrame.assign()

In [4]:
# Before moving on to the examples, note how a teradataml DataFrame and
# its columns are used to create a SQLAlchemy ClauseElement/Expression.

# The examples below often use something like 'admissions_train.admitted.expression'.
# In that expression,
#    'admissions_train' is a teradataml DataFrame
#    'admitted' is a column name in the teradataml DataFrame 'admissions_train'
#    Thus,
#        'admissions_train.admitted' forms a ColumnExpression.
#    The 'expression' attribute lets a teradataml ColumnExpression be treated as a SQLAlchemy expression.
#    Thus,
#        'admissions_train.admitted.expression' gives an expression that can be used with SQLAlchemy ClauseElements.
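
To make the idea of a SQLAlchemy expression concrete without a Vantage connection, here is a minimal sketch that uses plain SQLAlchemy only. The `column()` objects are stand-ins for `admissions_train.admitted.expression` and `admissions_train.gpa.expression`; it is an assumption that the teradataml expressions compose in the same way, and the helper `to_sql` is hypothetical, written only for this illustration.

```python
from sqlalchemy import column

# Stand-ins for the '.expression' attribute of teradataml columns (assumption).
admitted = column("admitted")
gpa = column("gpa")

# Python operators on column expressions build a ClauseElement tree;
# nothing is executed against a database at this point.
condition = (admitted == 1) & (gpa > 3.5)


def to_sql(expr):
    """Hypothetical helper: render an expression as SQL text with literal values inlined."""
    return str(expr.compile(compile_kwargs={"literal_binds": True}))


condition_sql = to_sql(condition)
print(type(condition).__name__)
print(condition_sql)
```

Rendering the expression as text is only a debugging aid here; in the cells that follow, the expression object itself is passed to `DataFrame.assign()`.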

## Using SQLAlchemy CASE expression

In [5]:
# Using SQLAlchemy case expression. 
# Return 1 for cases where masters value is "yes", otherwise return 0.
from sqlalchemy import case

# Create case object using SQLAlchemy case
caseobj = case(
    [
        (admissions_train.masters.expression == "yes", 1)
    ],
    else_=0
)
type(caseobj)

sqlalchemy.sql.elements.Case
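
The cell above passes the WHEN clauses to `case()` as a list. SQLAlchemy 1.4 deprecated that form and SQLAlchemy 2.0 expects each `(condition, value)` tuple as a separate positional argument. The sketch below shows the positional form with plain SQLAlchemy; `column("masters")` is a stand-in for `admissions_train.masters.expression`, and whether a given teradataml release supports a given SQLAlchemy version is not something this sketch verifies.

```python
from sqlalchemy import case, column

# Stand-in for admissions_train.masters.expression (assumption).
masters = column("masters")

# Positional form: each WHEN is its own (condition, value) tuple.
case_positional = case((masters == "yes", 1), else_=0)

# Inline the literal values so the rendered text is easy to compare.
case_sql = str(case_positional.compile(compile_kwargs={"literal_binds": True}))
print(case_sql)  # CASE WHEN (masters = 'yes') THEN 1 ELSE 0 END
```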

In [6]:
# Use the Case object in assign() to output the results.
# This example also shows the SQL query generated for this execution.
df = admissions_train.assign(masters_num=caseobj)
print_variables(df, "masters_num")

Equivalent SQL: select id AS id, masters AS masters, gpa AS gpa, stats AS stats, programming AS programming, admitted AS admitted, CASE WHEN (masters = 'yes') THEN 1 ELSE 0 END AS masters_num from "admissions_train"


 ************************* DataFrame ********************* 
   masters   gpa     stats programming  admitted masters_num
id                                                          
5       no  3.44    Novice      Novice         0           0
34     yes  3.85  Advanced    Beginner         0           1
13      no  4.00  Advanced      Novice         1           0
40     yes  3.95    Novice    Beginner         0           1
22     yes  3.46    Novice    Beginner         0           1
19     yes  1.98  Advanced    Advanced         0           1
36      no  3.00  Advanced      Novice         0           0
15     yes  4.00  Advanced    Advanced         1           1
7      yes  2.33    Novice      Novice         1           1
17      no  3.83  Advanced    Advanced         1   

## Using SQLAlchemy CAST, Numeric

In [7]:
# Using SQLAlchemy cast() with Numeric to cast the admitted column to a fixed-point NUMERIC(10, 4) value.
from sqlalchemy import cast, Numeric
admitted_num = cast(admissions_train.admitted.expression, Numeric(10, 4))
type(admitted_num)

sqlalchemy.sql.elements.Cast
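
`Numeric(10, 4)` produces a fixed-point type with 10 digits of precision and 4 decimal places, which is why the output below shows values such as 1.0000. As a small sketch in plain SQLAlchemy, the snippet compares it with `Float`, which would request a floating-point type instead. `column("admitted")` is a stand-in for `admissions_train.admitted.expression`, and the strings come from SQLAlchemy's default compiler rather than the Teradata dialect.

```python
from sqlalchemy import Float, Numeric, cast, column

# Stand-in for admissions_train.admitted.expression (assumption).
admitted = column("admitted")

# Fixed-point target type: precision 10, scale 4.
numeric_sql = str(cast(admitted, Numeric(10, 4)))

# Floating-point target type, shown only for comparison.
float_sql = str(cast(admitted, Float))

print(numeric_sql)  # CAST(admitted AS NUMERIC(10, 4))
print(float_sql)    # CAST(admitted AS FLOAT)
```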

In [8]:
# Use the Cast object in assign() to output the results.
# This example also shows the SQL query generated for this execution.
df = admissions_train.assign(casted_admitted=admitted_num)
print_variables(df, "casted_admitted")

Equivalent SQL: select id AS id, masters AS masters, gpa AS gpa, stats AS stats, programming AS programming, admitted AS admitted, CAST(admitted AS NUMERIC(10, 4)) AS casted_admitted from "admissions_train"


 ************************* DataFrame ********************* 
   masters   gpa     stats programming  admitted casted_admitted
id                                                              
15     yes  4.00  Advanced    Advanced         1          1.0000
7      yes  2.33    Novice      Novice         1          1.0000
22     yes  3.46    Novice    Beginner         0           .0000
17      no  3.83  Advanced    Advanced         1          1.0000
13      no  4.00  Advanced      Novice         1          1.0000
38     yes  2.65  Advanced    Beginner         1          1.0000
26     yes  3.57  Advanced    Advanced         1          1.0000
5       no  3.44    Novice      Novice         0           .0000
34     yes  3.85  Advanced    Beginner         0           .0000
40     yes  3.95

## Using DISTINCT

In [9]:
# Using DISTINCT on a single column.
# This example shows how to get the distinct values of a single column.
from sqlalchemy import distinct
admitted_dist = admissions_train.admitted.expression.distinct()
type(admitted_dist)

sqlalchemy.sql.elements.UnaryExpression

In [10]:
# Use the UnaryExpression for distinct in DataFrame.assign() to get the DISTINCT values for admitted column.
# Note: 
#      One must use 'drop_columns=True', so that assign executes without any error.
df = admissions_train.assign(drop_columns=True, distinct_admitted=admitted_dist)
print_variables(df, "distinct_admitted")

Equivalent SQL: select DISTINCT admitted AS distinct_admitted from "admissions_train"


 ************************* DataFrame ********************* 
   distinct_admitted
0                  0
1                  1



 ************************* DataFrame.dtypes ********************* 
distinct_admitted    int



 'distinct_admitted' Column Type: INTEGER


In [11]:
# Another example of using a UnaryExpression for distinct in DataFrame.assign()
# to get the DISTINCT values of the programming column.
# Note:
#      One must set 'drop_columns' to True (passed positionally here), so that assign executes without any error.
df = admissions_train.assign(True, distinct_programming=admissions_train.programming.expression.distinct())
print_variables(df, "distinct_programming")

Equivalent SQL: select DISTINCT programming AS distinct_programming from "admissions_train"


 ************************* DataFrame ********************* 
  distinct_programming
0             Beginner
1             Advanced
2               Novice



 ************************* DataFrame.dtypes ********************* 
distinct_programming    str



 'distinct_programming' Column Type: VARCHAR


In [12]:
# Example where drop_columns is left as False (the default) in DataFrame.assign() and DISTINCT is used with it.
# DISTINCT cannot be used when the drop_columns argument of assign() is set to False.
# The generated query has the form SELECT col1, col2, DISTINCT col3 .. FROM table, which is
# syntactically incorrect, so the database rejects it.
df = admissions_train.assign(distinct_programming=admissions_train.programming.expression.distinct())
print_variables(df, "distinct_programming")

Equivalent SQL: select id AS id, masters AS masters, gpa AS gpa, stats AS stats, programming AS programming, admitted AS admitted, DISTINCT programming AS distinct_programming from "admissions_train"


 ************************* DataFrame ********************* 


TeradataMlException: [Teradata][teradataml](TDML_2102) Failed to execute SQL: '[Version 17.0.0.2] [Session 3058] [Teradata Database] [Error 3706] Syntax error: expected something between ',' and the 'DISTINCT' keyword.
 at gosqldriver/teradatasql.(*teradataConnection).formatDatabaseError TeradataConnection.go:1138
 at gosqldriver/teradatasql.(*teradataConnection).makeChainedDatabaseError TeradataConnection.go:1154
 at gosqldriver/teradatasql.(*teradataConnection).processErrorParcel TeradataConnection.go:1217
 at gosqldriver/teradatasql.(*TeradataRows).processResponseBundle TeradataRows.go:1716
 at gosqldriver/teradatasql.(*TeradataRows).executeSQLRequest TeradataRows.go:552
 at gosqldriver/teradatasql.newTeradataRows TeradataRows.go:418
 at gosqldriver/teradatasql.(*teradataStatement).QueryContext TeradataStatement.go:122
 at gosqldriver/teradatasql.(*teradataConnection).QueryContext TeradataConnection.go:2083
 at database/sql.ctxDriverQuery ctxutil.go:48
 at database/sql.(*DB).queryDC.func1 sql.go:1464
 at database/sql.withLock sql.go:3032
 at database/sql.(*DB).queryDC sql.go:1459
 at database/sql.(*Conn).QueryContext sql.go:1701
 at main.goCreateRows goside.go:652
 at main._cgoexpwrap_212fad278f55_goCreateRows _cgo_gotypes.go:357
 at runtime.call64 asm_amd64.s:574
 at runtime.cgocallbackg1 cgocall.go:316
 at runtime.cgocallbackg cgocall.go:194
 at runtime.cgocallback_gofunc asm_amd64.s:826
 at runtime.goexit asm_amd64.s:2361'
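
The failure above is a property of SQL itself rather than of teradataml: DISTINCT applies to the whole select list and must directly follow SELECT. The sketch below reproduces the same rule with Python's built-in `sqlite3` module, used here only as a convenient stand-in for Vantage. The table name, its rows, and the helper `try_query` are made up for this illustration, and SQLite's error text differs from Teradata's.

```python
import sqlite3

# In-memory SQLite database with a tiny, made-up table (not the real dataset).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (id INTEGER, programming TEXT)")
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [(1, "Novice"), (2, "Advanced"), (3, "Novice"), (4, "Beginner")],
)


def try_query(sql):
    """Hypothetical helper: return the rows, or a message if the SQL is rejected."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return "rejected: {}".format(exc)


# DISTINCT right after SELECT: accepted (the drop_columns=True shape).
valid = try_query("SELECT DISTINCT programming FROM admissions ORDER BY programming")

# DISTINCT after another column: rejected (the drop_columns=False shape).
invalid = try_query("SELECT id, DISTINCT programming FROM admissions")

print(valid)
print(invalid)
```

This is why `drop_columns=True` is needed with DISTINCT: it leaves the DISTINCT expression as the only item in the select list.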

## SQLAlchemy EXTRACT expression

In [13]:
from sqlalchemy import extract
load_example_data("dwt", "ville_climatedata")
ville_climatedata = DataFrame("ville_climatedata")
ville_climatedata.columns



['city', 'period', 'temp_f', 'pressure_mbar', 'dewpoint_f']

In [14]:
ville_climatedata

                    period  temp_f  pressure_mbar  dewpoint_f
city                                                         
greenville  20100101 02:00    33.9         1020.1        28.4
greenville  20100101 04:00    33.1         1020.2        28.0
greenville  20100101 05:00    32.7         1020.0        27.9
greenville  20100101 06:00    32.5         1020.4        27.7
greenville  20100101 08:00    32.1         1021.3        27.5
greenville  20100101 09:00    33.8         1021.7        28.3
greenville  20100101 07:00    32.3         1020.8        27.6
greenville  20100101 03:00    33.5         1020.2        28.3
greenville  20100101 01:00    34.4         1020.2        28.7
greenville  20100101 00:00    34.9         1020.6        28.8

In [15]:
# Extract HOURS values from timestamp in period column
extobj = extract('hour', ville_climatedata.period.expression)
type(extobj)

sqlalchemy.sql.elements.Extract

In [16]:
df = ville_climatedata.assign(sqlalchemy_hours=extobj)
print_variables(df, "sqlalchemy_hours")

Equivalent SQL: select city AS city, period AS period, temp_f AS temp_f, pressure_mbar AS pressure_mbar, dewpoint_f AS dewpoint_f, EXTRACT(hour FROM period) AS sqlalchemy_hours from "ville_climatedata"


 ************************* DataFrame ********************* 
                   period  temp_f  pressure_mbar  dewpoint_f sqlalchemy_hours
city                                                                         
knoxville  20100101 02:00    33.7         1019.9        28.1                2
knoxville  20100101 04:00    32.8         1020.0        27.7                4
knoxville  20100101 05:00    32.4         1019.9        27.5                5
knoxville  20100101 06:00    32.2         1020.2        27.4                6
knoxville  20100101 08:00    31.8         1021.1        27.2                8
knoxville  20100101 09:00    33.5         1021.5        28.0                9
knoxville  20100101 07:00    32.0         1020.6        27.3                7
knoxville  20100101 03:00    33.2 

## Using SQLAlchemy aggregate functions

In [17]:
# Get the count of rows using func.count() in DataFrame.assign().
# Note:
#      1. One must use 'drop_columns=True', so that assign executes without any error.
#      2. Using groupby() before assign will not have any effect. No grouping is performed.
from sqlalchemy import func
df = ville_climatedata.assign(True, sqlalchemy_hours=func.count())
print_variables(df, "sqlalchemy_hours")

Equivalent SQL: select count(*) AS sqlalchemy_hours from "ville_climatedata"


 ************************* DataFrame ********************* 
   sqlalchemy_hours
0               120



 ************************* DataFrame.dtypes ********************* 
sqlalchemy_hours    int



 'sqlalchemy_hours' Column Type: INTEGER
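
SQLAlchemy's `func` namespace builds a function call from whatever attribute name is used, so other aggregates can be constructed the same way as `func.count()`. The sketch below renders two such expressions with plain SQLAlchemy; `column("temp_f")` is a stand-in for `ville_climatedata.temp_f.expression`, and passing these to `assign()` is assumed to need `drop_columns=True` for the same reason as the count example above.

```python
from sqlalchemy import column, func

# With no arguments, count() renders as count(*).
count_sql = str(func.count())

# Stand-in for ville_climatedata.temp_f.expression (assumption).
avg_sql = str(func.avg(column("temp_f")))

print(count_sql)  # count(*)
print(avg_sql)    # avg(temp_f)
```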


In [18]:
# One must run remove_context() to close the connection and garbage collect internally generated objects.
remove_context()

True