
Reading and Writing data between Python and Teradata

Local Python environment setup

For this run-through I am using:

  • Python 3.8.10
  • installed on Windows Subsystem for Linux

Create a new virtual environment, activate it, and install the requirements.

python3 -m venv tdsql
source tdsql/bin/activate
pip install -r requirements.txt
python3 -m ipykernel install --user --name tdsql --display-name tdsql

Teradata

I am using a Teradata Express 16.20 virtual machine running in VMware Workstation Player 16.2.5. My hosts file maps the virtual machine's IP address to tdvm:

cat /etc/hosts
# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateHosts = false
127.0.0.1       localhost
192.168.190.128 tdvm
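
Before going further, it is worth confirming that the driver can reach the VM. A minimal sketch of a connectivity check, assuming the teradatasql package from requirements.txt and the default dbc/dbc credentials of Teradata Express:

import teradatasql

# Connect via the tdvm hostname mapped in /etc/hosts and run a trivial query
# to prove the database is reachable (credentials are an assumption based on
# Teradata Express defaults)
with teradatasql.connect(host="tdvm", user="dbc", password="dbc") as con:
    with con.cursor() as cur:
        cur.execute("select current_timestamp")
        print(cur.fetchone())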

Set up the database structures

This step is obviously not required, but it is included for completeness. Run the teradata-setup.sql script to create the required databases and grant access to the appropriate user.

-- Create Database
-- Assumes you have a production analytical datastore (ADS)
CREATE DATABASE PRD_ADS_PYTHON FROM PRD_ADS AS
   PERM = 0
   NO FALLBACK
   NO BEFORE JOURNAL
   NO AFTER JOURNAL
;
COMMENT ON PRD_ADS_PYTHON AS 'Python development' ;
GRANT ALL ON PRD_ADS_PYTHON TO PRDDBA WITH GRANT OPTION ;

CREATE DATABASE PRD_ADS_PYTHON_NFL_DB FROM PRD_ADS_PYTHON AS
   PERM = 0
   NO FALLBACK
   NO BEFORE JOURNAL
   NO AFTER JOURNAL
;
COMMENT ON PRD_ADS_PYTHON_NFL_DB AS 'NFL Data loaded using Python' ;
GRANT ALL ON PRD_ADS_PYTHON_NFL_DB TO PRDDBA WITH GRANT OPTION ;

-- Give Space
-- Create a temporary database holding the perm space, GIVE it to the target
-- database, then DROP it so the space transfers to the new owner
-- Python
CREATE DATABASE GS_TEMP FROM PRD_ADS AS
PERM = 1E9
;
GIVE GS_TEMP TO PRD_ADS_PYTHON
;
DROP DATABASE GS_TEMP
;
-- Python NFL
CREATE DATABASE GS_TEMP FROM PRD_ADS_PYTHON AS
PERM = 5E6
;
GIVE GS_TEMP TO PRD_ADS_PYTHON_NFL_DB
;
DROP DATABASE GS_TEMP
;

-- Users
CREATE USER MORTCH FROM PRD_USERS AS
  PASSWORD = MORTCH
  PERM = 0
  NO FALLBACK
  NO BEFORE JOURNAL
  NO AFTER JOURNAL
  DEFAULT ROLE = ALL
;

-- Roles
CREATE ROLE R_PYTHON_PRD;
GRANT R_PYTHON_PRD TO MORTCH;

-- Python
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE TABLE, DROP TABLE ON PRD_ADS_PYTHON_NFL_DB TO R_PYTHON_PRD ;
GRANT SELECT, CREATE VIEW, DROP VIEW ON PRD_ADS_PYTHON_NFL_DB TO R_PYTHON_PRD ;

-- Test
CREATE TABLE PRD_ADS_PYTHON_NFL_DB.TEST (
  COL1 INTEGER
)
PRIMARY INDEX(COL1)
;
INSERT INTO PRD_ADS_PYTHON_NFL_DB.TEST VALUES(1);

Set up the table structures

First, test connecting to the database from the Teradata Studio IDE.

Then run the SQL script (python-nfl-ddl.sql) in your favourite SQL IDE (for example Teradata Studio) to create the table structures.

Python execution in VSCode

I'm going to use the Python Interactive window in VSCode to run this. In this mode we can run the cells (blocks of code between # %% markers) separately and view tabular results in a more visual way.

For example, run the python-teradata.py script up to this point, then type tables in the interactive window and hit Shift+Enter to display the result.

# %%
query = f"""
select
  TableName
  , count(*) as ColumnCount
from
  dbc.columnsv
where
  databasename='{database}'
group by
  TableName
order by
  TableName
"""
tables = pd.read_sql(query, con)
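
The database, con, and cur names used in these cells come from earlier cells that are not shown here; a minimal sketch of what they might look like, assuming the teradatasql driver and the MORTCH user created above:

# %%
import teradatasql
import pandas as pd

# Connection details are assumptions based on the setup above (tdvm host,
# MORTCH user, NFL database); adjust for your environment
database = "PRD_ADS_PYTHON_NFL_DB"
con = teradatasql.connect(host="tdvm", user="MORTCH", password="MORTCH", database=database)
cur = con.cursor()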

Loading data from CSV

The key function that loads data into a table from a CSV file is shown below:

def insert_rows_from_csv(
    tablename: str,
    columncount: int,
):
    """Insert rows from a CSV into a Teradat table

    Args:
        tablename (str): the name of the table
        columncount (int): the number of columns in the table
    """
    # Build a parameter marker list "?, ?, ..." to match the column count
    columncount_questionmark = ", ?" * (columncount - 1)
    # The teradatasql driver's teradata_read_csv escape function reads the
    # rows of the CSV file and binds them to the insert's parameter markers
    exec_string = f"{{fn teradata_read_csv(./data/{tablename.lower()}.csv)}}insert into {tablename} (?{columncount_questionmark})"
    cur.execute(exec_string)

The script then has a function that loops over all the tables and runs the insert_rows_from_csv function for each one. After the loads have completed, running a select with the pandas read_sql method demonstrates that the tables have been successfully loaded from the CSV files.
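
A minimal sketch of what that loop might look like, assuming the tables DataFrame from the metadata query above; the spot-check table is just whichever one comes back first:

# %%
# Hypothetical driver loop: load every table discovered in dbc.columnsv
# from its matching CSV file in ./data
for _, row in tables.iterrows():
    print(f"Loading {row['TableName']}")
    insert_rows_from_csv(row["TableName"], row["ColumnCount"])

# Spot check one of the loads
pd.read_sql(f"select count(*) as RowCount from {tables['TableName'].iloc[0]}", con)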

Cleaning up

We also have a script to truncate all the rows from the tables.

# Truncate all tables
# Delete from the tables in reverse load order
table_load = table_load.sort_values(by="order", ascending=False)
for index, row in table_load.iterrows():
    print(row["TableName"])
    cur.execute(f"delete from {row['TableName']}")
