Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semicolon after INSERT fails on DB2 for IBM i. #63

Closed
rosscoleman opened this issue Jul 26, 2023 · 21 comments
Closed

Semicolon after INSERT fails on DB2 for IBM i. #63

rosscoleman opened this issue Jul 26, 2023 · 21 comments

Comments

@rosscoleman
Copy link

I was originally trying to do bulk inserts with arrow-odbc-py, but the stack traces didn't show me the original database error code and message. So, I switched to Rust, for a simpler reproducible example.

My goal was to get insert_into_table working with DB2 for IBM i. I believe the root cause is that function insert_statement_text hard-codes a semicolon, and the ODBC drivers for DB2 for IBM i does not allow statements to end in a semicolon, nor does it allow a script of statements separated by a semicolon.

Code

For a table named TSTPANDAS (originally testing inserts from Pandas) with columns ROW_NUM and RAND_FLOAT.

Warning: I really don't know Rust, though it's reminding me of C++, Java, and Kotlin. I hacked this together from examples in your docs.

use std::env;
use std::sync::Arc;

use arrow::{
    array::{Float32Array, Int32Array},
    record_batch::RecordBatch,
};
use arrow_array::RecordBatchIterator;
use arrow_odbc::{
    insert_into_table,
    odbc_api::{ConnectionOptions, Environment, Error},
};

fn main() -> Result<(), Error> {
    // If you do not do anything fancy it is recommended to have only one Environment in the
    // entire process.
    let environment = Environment::new()?;

    let user = env::var("db2_user").unwrap_or_default();
    let password = env::var("db2_password").unwrap_or_default();

    // Connect using a DSN. Alternatively we could have used a connection string
    let connection = environment.connect(
        "scupddatawhse_dev",
        &user,
        &password,
        ConnectionOptions::default(),
    )?;

    // https://docs.rs/arrow/latest/arrow/#tabular-representation
    let col_1 = Arc::new(Int32Array::from_iter([1, 2, 3])) as _;
    let col_2 = Arc::new(Float32Array::from_iter([1., 6.3, 4.])) as _;

    let record_batch =
        RecordBatch::try_from_iter([("row_num", col_1), ("rand_float", col_2)]).unwrap();

    // https://docs.rs/arrow-array/44.0.0/arrow_array/trait.RecordBatchReader.html
    let batches: Vec<RecordBatch> = vec![record_batch.clone(), record_batch.clone()];

    let mut reader = RecordBatchIterator::new(batches.into_iter().map(Ok), record_batch.schema());
    let batch_size = 1000;
    insert_into_table(&connection, &mut reader, "TSTPANDAS", batch_size).expect("Inserted into TSTPANDAS");

    Ok(())
}

Output:

rcoleman12@ross-T480-linux:~/git/test_arrow_odbc$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/test_arrow_odbc`
thread 'main' panicked at 'Inserted into TSTPANDAS: PreparingInsertStatement { source: Diagnostics { record: State: 42000, Native error: -104, Message: [IBM][System i Access ODBC Driver][DB2 for i5/OS]SQL0104 - Token ; was not valid. Valid tokens: <END-OF-STATEMENT>., function: "SQLPrepare" }, sql: "INSERT INTO TSTPANDAS (row_num, rand_float) VALUES (?, ?);" }', src/main.rs:42:74
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
@pacman82
Copy link
Owner

Hello @rosscoleman ,

thanks for the great reproducing example. I wonder why you did not see the original database error with arrow-odbc-py. Fatal errors are supposed to be forwarded as exceptions into Python code. You can also see non-fatal errors by enable logging:

https://arrow-odbc.readthedocs.io/en/latest/arrow_odbc.html#arrow_odbc.log_to_stderr

With regards to your error diagnosis. arrow-odbc does indeed generate statments with a semicolon at the end:

format!("INSERT INTO {table} ({columns}) VALUES ({values});")

Easy enough to fix, though! Hardest part is probably getting a reproducing test, so the error does not regress. This would require testing against IBM db2 for the first time. In case you want to contribute: I would not care whether this is tested on level of arrow-odbc or arrow-odbc-py. Otherwise my spare time is spread a bit thin. I can see myself tackling this earliest this or next weekend.

Best, Markus

@rosscoleman
Copy link
Author

I should be able to help with testing from a Linux laptop. I can't give you access to DB2 for IBM i, because I'm using a system from my employer.

There are three databases with the marketing name DB2, but they each have separate code bases. The SQL dialects are very similar if not the same.

  • DB2 on IBM z (mainframe)
  • DB2 on IBM i (midrange, i.e. AS/400) - runs on IBM Power architecture CPUs
  • DB2 for Linux, Unix, Windows

Here's further evidence that it's the semicolon, using pyodbc.

$ python
Python 3.9.10 | packaged by conda-forge | (main, Feb  1 2022, 21:24:11) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyodbc
>>> cnxn = pyodbc.connect("DSN=scupddatawhse_dev")
>>> crsr = cnxn.cursor()
>>> crsr.execute("INSERT INTO tstpandas (row_num, rand_float) VALUES (?, ?);", (1, 0.1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyodbc.ProgrammingError: ('42000', '[42000] [IBM][System i Access ODBC Driver][DB2 for i5/OS]SQL0104 - Token ; was not valid. Valid tokens: <END-OF-STATEMENT>. (-104) (SQLPrepare)')
>>> crsr.execute("INSERT INTO tstpandas (row_num, rand_float) VALUES (?, ?)", (2, 0.2))
<pyodbc.Cursor object at 0x7fa4faa60f30>

@rosscoleman
Copy link
Author

I will see if I can reproduce the error with DB2 for Linux running on Docker. It's really a separate database from DB2 on IBM i, but it may be similar enough. If that works, I should be able to write a few integration tests in Python and create a pull request.

https://community.ibm.com/community/user/datamanagement/blogs/vijaya-katikireddy/2023/02/04/db2-community

@rosscoleman
Copy link
Author

I am trying this in docker-compose.yaml I think I may have too old of a Docker or docker-compose version, but I wanted to give you an update.

# Docker compose file used for local development

services:
  # Microsoft SQL database used for testing
  mssql:
    image: mcr.microsoft.com/mssql/server:2019-CU5-ubuntu-18.04
    ports:
      - 1433:1433

    environment:
      - MSSQL_SA_PASSWORD=My@Test@Password1
    command: ["/opt/mssql/bin/sqlservr", "--accept-eula", "--reset-sa-password"]
  # DB2 Community Edition
  # https://community.ibm.com/community/user/datamanagement/blogs/vijaya-katikireddy/2023/02/04/db2-community
  # Old location: https://hub.docker.com/r/ibmcom/db2
  # docker run -itd --name mydb2 --privileged=true -p 50000:50000 -e LICENSE=accept -e DB2INST1_PASSWORD=testpassword -e DBNAME=testdb -v $HOME/db2_community:/database icr.io/db2_community/db2:latest
  db2:
    image: icr.io/db2_community/db2:latest
    ports:
      - 50000:50000
    #privileged: true
    environment:
      - LICENSE=accept
      - DB2INST1_PASSWORD=testpassword
      - DBNAME=testdb
    volumes:
      - $HOME/db2_community:/database

@pacman82
Copy link
Owner

pacman82 commented Aug 8, 2023

Thanks @rosscoleman , I actually already managed to successfully start a db2 in a container, the part which is not straight forward for me is to install its ODBC driver in the testing environment. I would need a repeatable way which works in a GitHub action.

Best, Markus

@rosscoleman
Copy link
Author

I'm not finding DB2 (for Linux, Unix, Windows) ODBC drivers in an RPM or DEB repository. It's looking like you need an IBM ID to download drivers. That's an odd choice when the DB2 Community docker image is available without an IBM ID, so I'm wondering if I'm just not looking in the right spot. Even more odd is that the DB2 for IBM i drivers "ibm-iaccess" are in a DEB repository, but there's no open source or freeware server for DB2 on IBM i.

Anyway, I found these two links. If you only want the ODBC and CLI drivers, you can pick one of the smaller bundles.

https://www.ibm.com/docs/en/db2/11.5?topic=dsd-installing-data-server-driver-package-software-linux-unix-operating-systems
https://www.ibm.com/docs/en/db2/11.5?topic=drivers-obtaining-data-server-driver-odbc-cli-software

Actually, the easiest seems to be clicking on the latest fix pack here:
https://www.ibm.com/support/pages/download-fix-packs-version-ibm-data-server-client-packages
then getting to a page like
https://www.ibm.com/support/pages/node/6830885

@rosscoleman
Copy link
Author

rosscoleman commented Aug 23, 2023

I wanted to see how to configure an ODBC connection to DB2 LUW using unixODBC. If that's possible, there ought to be a way to do it in github actions. Might have to put the driver tarball in git LFS.

I'm doing this on a laptop running Ubuntu 22.04. I'm getting stuck with the unixODBC configuration.


DB2 ODBC/CLI driver install

Download from:
https://www.ibm.com/support/pages/download-fix-packs-version-ibm-data-server-client-packages

mkdir -p /opt/ibm/
cp v11.5.8_linuxx64_odbc_cli.tar.gz /opt/ibm/
cd /opt/ibm/
tar xvf v11.5.8_linuxx64_odbc_cli.tar.gz

export PATH=/opt/ibm/odbc_cli/clidriver/bin/:$PATH
export LD_LIBRARY_PATH=/opt/ibm/odbc_cli/clidriver/lib/

DB2 Community on Docker

Now start DB2 LUW in a docker container

docker run -itd --name mydb2 --privileged=true -p 50000:50000 -e LICENSE=accept -e DB2INST1_PASSWORD=testpassword -e DBNAME=testdb -v $HOME/db2_community:/database icr.io/db2_community/db2:latest

DB2 CLI/ODBC config

https://www.ibm.com/docs/en/db2-warehouse?topic=package-configuring

sudo chgrp ross /opt/ibm/odbc_cli/clidriver/cfg/db2dsdriver.cfg
sudo chmod 664 /opt/ibm/odbc_cli/clidriver/cfg/db2dsdriver.cfg

cd $HOME

db2cli writecfg add -dsn db2docker -database testdb -host localhost -port 50000

Now validate

ross@ross-T480-linux:~$ db2cli validate -dsn db2docker -connect -user db2inst1 -passwd testpassword

===============================================================================
Client information for the current copy:
===============================================================================

Client Package Type       : IBM Data Server Driver For ODBC and CLI
Client Version (level/bit): DB2 v11.5.8.0 (s2209201700/64-bit)
Client Platform           : Linux/X8664
Install/Instance Path     : /opt/ibm/odbc_cli/clidriver
DB2DSDRIVER_CFG_PATH value: <not-set>
db2dsdriver.cfg Path      : /opt/ibm/odbc_cli/clidriver/cfg/db2dsdriver.cfg
DB2CLIINIPATH value       : <not-set>
db2cli.ini Path           : /opt/ibm/odbc_cli/clidriver/cfg/db2cli.ini
db2diag.log Path          : /opt/ibm/odbc_cli/clidriver/db2dump/db2diag.log

===============================================================================
db2dsdriver.cfg schema validation for the entire file:
===============================================================================

Success: The system db2dsdriver.cfg schema validation completed successfully without any errors.

===============================================================================
db2cli.ini validation for data source name "db2docker":
===============================================================================

Note: The validation utility could not find the configuration file db2cli.ini. 
The file is searched at "/opt/ibm/odbc_cli/clidriver/cfg/db2cli.ini".

===============================================================================
db2dsdriver.cfg validation for data source name "db2docker":
===============================================================================

[ Parameters used for the connection ]

Keywords                  Valid For     Value
---------------------------------------------------------------------------
DATABASE                  CLI,.NET,ESQL testdb
HOSTNAME                  CLI,.NET,ESQL localhost
PORT                      CLI,.NET,ESQL 50000

===============================================================================
Connection attempt for data source name "db2docker":
===============================================================================

[SUCCESS]

===============================================================================
The validation is completed.
===============================================================================

Now, try a test script


ross@ross-T480-linux:~$ echo "VALUES current_schema;" > db2_test_script.sql
ross@ross-T480-linux:~$ db2cli execsql -dsn db2docker -user db2inst1 -passwd testpassword -inputsql db2_test_script.sql 
IBM DATABASE 2 Interactive CLI Sample Program
(C) COPYRIGHT International Business Machines Corp. 1993,1996
All Rights Reserved
Licensed Materials - Property of IBM
US Government Users Restricted Rights - Use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
> VALUES current_schema;
FetchAll:  Columns: 1
  1 
  DB2INST1
FetchAll: 1 rows fetched.
>

unixODBC config for DB2 LUW

Ok, now, here's where I get stuck

Edit /etc/odbcinst.ini:

[Db2]
Description = Db2 Driver
Driver = /opt/ibm/odbc_cli/clidriver/lib/libdb2o.so
fileusage=1
dontdlclose=1

https://www.ibm.com/docs/en/db2/11.5?topic=keywords-dbalias
https://www.ibm.com/docs/en/db2/11.5?topic=file-data-server-driver-configuration-example

$HOME/.odbc.ini

[db2dockerodbc]
DRIVER=DB2
DBAlias=db2docker

However, this fails

ross@ross-T480-linux:~$ isql -v db2dockerodbc db2inst1 testpassword
[     ][unixODBC][IBM][CLI Driver] SQL1531N  The connection failed because the name specified with the DSN connection string keyword could not be found in either the db2dsdriver.cfg configuration file or the db2cli.ini configuration file.  Data source name specified in the connection string: "DB2DOCKERODBC".

[ISQL]ERROR: Could not SQLConnect

@rosscoleman
Copy link
Author

@rosscoleman
Copy link
Author

rosscoleman commented Aug 24, 2023

I figured it out myself! Pasting here, in case my StackOverflow question gets deleted by zealous mods.


I found what I was doing wrong! DBAlias is a keyword for db2cli.ini, and it doesn't go in .odbc.ini

Instead, you need the DSN name in .odbc.ini to match what's in db2driver.cfg, which is db2docker in my example. (FYI, it looks like there are other DB2 config files where you can define a DSN, also.)

So, this works:
ODBC Driver config

ross@ross-T480-linux:~$ cat /etc/odbcinst.ini 
[DB2]
Description = DB2 Driver
Driver = /opt/ibm/odbc_cli/clidriver/lib/libdb2o.so
fileusage=1
dontdlclose=1

IBM driver config

ross@ross-T480-linux:~$ cat /opt/ibm/odbc_cli/clidriver/cfg/db2dsdriver.cfg
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<configuration>
  <dsncollection>
    <dsn alias="db2docker" host="localhost" name="testdb" port="50000"/>
  </dsncollection>

  <databases>
    <database host="localhost" name="testdb" port="50000"/>
  </databases>

</configuration>

ODBC connection config

ross@ross-T480-linux:~$ cat ~/.odbc.ini 
[db2docker]
Description = testdb on DB2 Community on Docker 
DRIVER=DB2

Then, this works

ross@ross-T480-linux:~$ isql -v db2docker db2inst1 testpassword
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> VALUES current_date
+-----------+
| 1         |
+-----------+
| 2023-08-24|
+-----------+
SQLRowCount returns -1
1 rows fetched
SQL> VALUES current_schema
+---------------------------------------------------------------------------------------------------------------------------------+
| 1                                                                                                                               |
+---------------------------------------------------------------------------------------------------------------------------------+
| DB2INST1                                                                                                                        |
+---------------------------------------------------------------------------------------------------------------------------------+
SQLRowCount returns -1
1 rows fetched
SQL> quit
ross@ross-T480-linux:~$

@rosscoleman
Copy link
Author

rosscoleman commented Aug 24, 2023

OK. Now that I figured out how to connect to DB2 LUW Community (LUW = Linux, Unix, Windows) using ODBC, I'm testing if the ODBC driver rejects semicolons.

import pyodbc

uid = 'db2inst1'
pwd = 'testpassword'

cnxn = pyodbc.connect(f"DSN=db2docker;UID={uid};PWD={pwd}")
crsr = cnxn.cursor()
crsr.execute("VALUES current_schema")
curr_schema = crsr.fetchone()[0]
print(curr_schema)

crsr.execute("DROP TABLE IF EXISTS mytable")

create_sql = """CREATE TABLE mytable (
  myint INTEGER,
  myfloat FLOAT
)"""
crsr.execute(create_sql)
crsr.commit()

print("INSERT 2, 0.2 with no semicolon")
crsr.execute("INSERT INTO mytable (myint, myfloat) VALUES (2, 0.2)")
crsr.commit()
print("INSERT 3, 0.3 with a semicolon")
crsr.execute("INSERT INTO mytable (myint, myfloat) VALUES (3, 0.3);")
crsr.commit()

crsr.execute("SELECT count(1) FROM mytable")
row_count = crsr.fetchone()[0]
print(f"mytable has {row_count} rows")

crsr.close()
cnxn.close()

There is no error in the output, using DB2 LUW on Docker:

(venv) ross@ross-T480-linux:~/git/db2_pyodbc$ python db2_pyodbc.py 
DB2INST1
INSERT 2, 0.2 with no semicolon
INSERT 3, 0.3 with a semicolon
mytable has 2 rows

However, if I modify that program above to use a DB2 on IBM i connection from a dev box at my employer:

(venv) ross@ross-T480-linux:~/git/db2_pyodbc$ python ibmi_pyodbc.py 
*LIBL
INSERT 2, 0.2 with no semicolon
INSERT 3, 0.3 with a semicolon
Traceback (most recent call last):
  File "/home/ross/git/db2_pyodbc/ibmi_pyodbc.py", line 22, in <module>
    crsr.execute("INSERT INTO raclibdw.mytable (myint, myfloat) VALUES (3, 0.3);")
pyodbc.ProgrammingError: ('42000', '[42000] [IBM][System i Access ODBC Driver][DB2 for i5/OS]SQL0104 - Token ; was not valid. Valid tokens: <END-OF-STATEMENT>. (-104) (SQLExecDirectW)')
(venv) ross@ross-T480-linux:~/git/db2_pyodbc$

@rosscoleman
Copy link
Author

I think I have figured out how you could start a DB2 LUW test suite, but it would not help reproduce this issue.

I tried this using arrow-odbc-py and DB2 LUW running on Docker, and this error doesn't happen. We would have to use DB2 on IBM i, and it requires proprietary hardware using IBM POWER architecture CPUs and expensive software licenses.

Therefore, I think the best option is just to write a unit test that fails if a generated SQL statement ends with a semicolon. Leave code comments that this is to emulate the behavior of DB2 on IBM i. That's the best idea I have for automated testing.

@pacman82
Copy link
Owner

Hello @rosscoleman ,

thanks for all the work you put into this. I'll explore if I can use your work to extend the test suite for odbc-api. I would have liked to explain my reasons for insistence on a test bit earlier to you. Yet my full time job increasingly leaks into my open source spare time. I have some concerns that the semicolon might just be the tip of the iceberg. Users of odbc2parquet discovered that the length of variadic strings (i.e. VARCHAR) is reported incorrectly in the string length indicator returned by the Db2 ODBC driver (See: pacman82/odbc-api#398). Causing odbc2parquet to cut these strings short or pad them with garbage. All I have of this behavior though is anecdotal evidence, no reproducing test.

As you laid out, reproducing the issue with the semicolon is to costly. In this case I'll now just adjust the formatting string with the semicolon and make a release. While I am a big fan of the designs TDD yields, I try to not make it dogma. I do not think I need a test that a constant formatting string does not end in a semicolon. It would not serve to validate any of my assumptions, or change its behavior if IBM updates their tech stack. A comment will suffice. After the release please look carefully at exported strings. If you discover an issue, I would welcome if you opened an issue here. Maybe this can be reproduced with the work you already put into it, and then worked around.

Sadly I have not found out how to report Bugs in their ODBC driver to IBM, yet. Maybe I can find a workaround.

Best, Markus

@pacman82
Copy link
Owner

arrow-odbc-py 1.2.3 has been released, featuring a missing semicolon at the end of the insert statement.

Just realized that inserting strings you won't be affected by the wrong string length indicator fetching data. Sorry for the confusion.

@rosscoleman
Copy link
Author

Thank you! I found a better (more scriptable) location to get the ODBC/CLI drivers for DB2 LUW (i.e. DB2 Community on Docker):

https://public.dhe.ibm.com/ibmdl/export/pub/software/data/db2/drivers/odbc_cli/

I found that link at: https://github.com/ibmdb/python-ibmdb/blob/master/README.md

@rosscoleman
Copy link
Author

arrow-odbc-py==1.2.3 does fix insert_into_table for me!

You said, "After the release please look carefully at exported strings." How do I do that? Is there a way to log them? I only see them when there's an error, right now.

@pacman82
Copy link
Owner

Yeah, I was confusing usecases though. What I meant is then you use arrow-odbc to fetch values from the database. In this case I would compare some strings manually with what is actually stored in the database table. It should be obivous if the strings are appended with garbled characters or cut short or invalid UTF-8 altogther, which is what I would expect. If they look fine you would be likely in the clear.

@rosscoleman
Copy link
Author

rosscoleman commented Sep 6, 2023

I'm not noticing problems with text fields, but they've all been CHAR(30) or shorter columns. Do I need to check longer VARCHAR columns?

I am noticing this with both Ubuntu 20.04 and 22.04. Thus, I think it's not the unixODBC version. Probably the ODBC version supported by ibm-iaccess drviers for IBM i. With, arrow_odbc.log_to_stderr(1):

WARN - State: 01000, Native error: 0, Message: [unixODBC][Driver Manager]Driver does not support the requested version

Did you upgrade the required version of ODBC between arrow-odbc-py 1.1.3 and 1.2.3?
Old versions of arrow-odbc-py had that warning, too. I just hadn't enabled arrow-odbc.log_to_stderr before. Do you know where this warning is coming from?

@pacman82
Copy link
Owner

pacman82 commented Sep 6, 2023

Yeah, at the beginning the program declares the version of the ODBC standard which it is going to use. Currently it declares version 3.8, yet I think unixODBC might only support 3.5. Maybe I should just switch to 3.5 if compiling for non-windows systems. 🤔

Given the experience of other users with IBM db2 I would expect these problems reading data, not writing it. If you have read it, and see no issues, this would be good news. As I said, could not reproduce these issues myself yet, just have second hand experience to go by.

@rosscoleman
Copy link
Author

I think this is resolved. The other things I asked about are separate issues.

@pacman82
Copy link
Owner

WARN - State: 01000, Native error: 0, Message: [unixODBC][Driver Manager]Driver does not support the requested version

I checked unixODBC does support API version 3.8. This message just means that the ODBC driver you are using does not, and is developed with an earlier version of ODBC in mind.

@rosscoleman
Copy link
Author

Thanks. That's helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants