Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oracle NUMBER(*,0) format causes column parsing to stop #45

Closed
tb0n3zz opened this issue Dec 5, 2019 · 2 comments · Fixed by #47
Closed

Oracle NUMBER(*,0) format causes column parsing to stop #45

tb0n3zz opened this issue Dec 5, 2019 · 2 comments · Fixed by #47
Assignees
Labels
bug Something isn't working

Comments

@tb0n3zz
Copy link

tb0n3zz commented Dec 5, 2019

Very excited to find this -- just what I need, so thank you very much for creating it! I created a test to directly call Oracle 12.2.0.1 and run dbms_metadata.get_ddl, which is the standard method for producing DDL from the Oracle database using the CLI. It produces CREATE TABLE statements like this, which I then ran through ddlparse to convert to BigQuery -- but I found that most of the columns were missing in the resulting sql. Here's a simple runnable example where I removed the (*,0) from the first NUMBER column (COL2_WORKS):

from ddlparse.ddlparse import DdlParse


def main():

    oracle_ct = """
  CREATE TABLE "MYSCHEMA"."MYTABLE"
   (	"COL1" VARCHAR2(30) NOT NULL ENABLE,
	"COL2_WORKS" NUMBER NOT NULL ENABLE,
	"COL3" VARCHAR2(30) NOT NULL ENABLE,
	"COL4_BROKEN_AFTER_THIS" NUMBER(*,0) NOT NULL ENABLE,
	"COL5" VARCHAR2(8) NOT NULL ENABLE,
	"COL6" VARCHAR2(3) NOT NULL ENABLE
   ) ;
"""

    table = DdlParse().parse(ddl=oracle_ct, source_database=DdlParse.DATABASE.oracle)
    print(table.to_bigquery_ddl())


if __name__ == "__main__":
    main()

Note the NUMBER(*,0 designation. I found in testing in on many tables that as soon as ddlparse hits any NUMBER(*,0) data type column, it stops processing any more columns, so I get BigQuery DDL like this:

#standardSQL
CREATE TABLE `project.MYSCHEMA.MYTABLE`
(
  COL1 STRING NOT NULL,
  COL2_WORKS FLOAT64 NOT NULL,
  COL3 STRING NOT NULL,
  COL4_BROKEN_AFTER_THIS FLOAT64
)

I stepped through the code/breakpoints enough to understand that the issue is happening in the original column parsing, not in the conversion to BQ DDL. The BQ conversion is only called for the 4 columns.

I suspect the * is breaking a regex in ddlparse or pyparse. I'm out of time to figure out where, and I'd imagine you will be a lot faster at it since you are familiar with how you're using them, but I wanted to share this much since I got this far. For now, easily fixed by just preprocessing my Oracle CREATE TABLE strings to remove anything with the *.

Additional note: it is turning NUMBER(*,0) into a FLOAT64 when NUMBER(*,0) is an integer - but perhaps that's because of the broken regex as well / it's being cut off.

Thanks again for the code!

@tb0n3zz
Copy link
Author

tb0n3zz commented Dec 5, 2019

Figured I'd share the other thing I ran into and workaround -- preprocessing input DDL from Oracle as follows. Replacing with (38,0) makes (*,0) correctly convert to an INT64.

    # remove (*,0) from NUMBER defs that are integers anyway (38 is the max in variable data type, gets replaced anyway)
    massaged_ddl = re.sub(r'NUMBER\(\*,0\)', r'NUMBER(38,0)', oraddl)
    # chokes on this, tries to process like a column
    massaged_ddl = re.sub(r'SUPPLEMENTAL LOG DATA \(.*\) COLUMNS.*\n', '', massaged_ddl)

@shinichi-takii
Copy link
Owner

@tb0n3zz
Thank you for using, and a issue report.
I fixed issue to v1.4.0 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants