Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setinputsizes(CLOB) is extremely slow, do we need to setinputsizes(CLOB)? #189

Closed
zzzeek opened this issue Jun 20, 2018 · 26 comments
Closed
Labels

Comments

@zzzeek
Copy link

zzzeek commented Jun 20, 2018

Hey Anthony -

today on setinputsizes(), apparently CLOB is incredibly slow. Omitting it, an INSERT seems to proceed without issue, and I've tried somewhat larger text sizes up to a few dozen megs in size and it still round trips correctly. The only thing setinputiszes(CLOB) seems to get me is that I can persist a blank string '' without it turning into NULL.

Can I get some clarity on this one? test program below produces:

0.05559845399693586
11.789099370995245

e.g., calling setinputsizes(CLOB) makes a modest insert of 1000 rows 200 times slower, with cx_Oracle 6.3.

import cx_Oracle
import decimal

conn = cx_Oracle.connect(
    user="scott",
    password="tiger",
    dsn=cx_Oracle.makedsn(
        "oracle1120", 1521,
    )
)

def setup():
    cursor = conn.cursor()
    try:
        cursor.execute("drop table t")
    except:
        pass

    cursor.execute("""
    CREATE TABLE t (
        x CLOB
    )
    """)
    return cursor



def go1(cursor):
    cursor.executemany(
        "insert into t (x) values (:x)",
        [
            {"x": "text %s" % i} for i in range(1000)
        ]
    )

def go2(cursor):
    cursor.setinputsizes(x=cx_Oracle.CLOB)
    cursor.executemany(
        "insert into t (x) values (:x)",
        [
            {"x": "text %s" % i} for i in range(1000)
        ]
    )

import timeit

cursor = setup()
print(timeit.timeit("go1(cursor)", "from __main__ import go1, cursor", number=5))

cursor = setup()
print(timeit.timeit("go2(cursor)", "from __main__ import go2, cursor", number=5))


@anthony-tuininga
Copy link
Member

This is expected behaviour. There is no reason to use setinputsizes(CLOB) for inserts. If you do, a temporary CLOB is created and multiple round-trips to the database are required. Depending on your database and client configuration there can be a significant reduction in speed, as you noticed! I have no reason to believe this has changed, but if you see different behaviour with different versions of cx_Oracle, please let me know.

Just for reference, I did this with my own installation and got 0.07 seconds for the first case and 2.3 seconds for the second case.

@zzzeek
Copy link
Author

zzzeek commented Jun 20, 2018

There is no reason to use setinputsizes(CLOB) for inserts.

this is so endlessly confusing. when would I want to use setinputsizes(CLOB)? why is it even there? how are users supposed to know all of these hidden gotchas?

@zzzeek
Copy link
Author

zzzeek commented Jun 20, 2018

additionally, how do I solve the empty string becomes NULL issue without setinputsizes(CLOB) ?

@cjbj
Copy link
Member

cjbj commented Jun 20, 2018

@anthony-tuininga do we need to tweak samples/ReturnLobsAsStrings.py ?

@zzzeek I just got off the phone from some doc writers who are just getting started on writing some user doc. It may take them a while, but it's moving forward.

Empty string == NULL is an 'Oracle thing', no? :)

@zzzeek
Copy link
Author

zzzeek commented Jun 20, 2018

@cjbj if you use setinputsizes(CLOB) for the INSERT, the empty string in the INSERT successfully round-trips as an empty string coming back. Not sure if perhaps setinputsizes(CHAR) or something would do the same thing.

@zzzeek
Copy link
Author

zzzeek commented Jun 20, 2018

nope...only setinputsizes(CLOB)

@naor2013
Copy link

Not setting setinputsizes seem to try to insert any text as a varchar since I get an error if I try to insert strings that has more than 4000 chars. Is there any workaround for that maybe? Can I specify in the query itself that I want to upload the string as a clob?

@anthony-tuininga
Copy link
Member

@naor2013, what error are you getting? Modifying @zzzeek's example to insert data greater than 4000 characters works as expected. Also include your Oracle client and database versions.

@cjbj, what did we need to tweak in samples/ReturnLobsAsStrings.py?

@zzzeek, for reasons that have are shrouded in mystery (mostly due to the fact that they were made so long ago!) the value "" and NULL are considered equivalent for strings, but not for CLOBs. There isn't anything that can be done about it at this point.

@anthony-tuininga
Copy link
Member

There is no reason to use setinputsizes(CLOB) for inserts.

this is so endlessly confusing. when would I want to use setinputsizes(CLOB)? why is it even there? how are users supposed to know all of these hidden gotchas?

The only reason you would want to use setinputsizes(CLOB) is because you have two PL/SQL procedures with an overload (one that accepts strings and one that accepts CLOBs) and you want to make sure you are calling the one that accepts CLOBs. I wouldn't consider that even remotely a common scenario -- so in general you don't need it. Oracle has a lot of baggage that has been carried along over the years, just because it has been around for so long and been so successful. I am trying to help you navigate those waters and perhaps not doing a good a job as you might like. :-) Feel free to keep asking questions, though! At some point perhaps I'll explain it right...

Note as well that you are writing a generic module for accessing and populating data. Most people just writing applications never run into this sort of thing. Things "just work" and if they don't in some situations they can make an adjustment and move on. You don't have that luxury, unfortunately!

@anthony-tuininga
Copy link
Member

FYI, using cx_Oracle 5.3 you need to use cursor.setinputsizes(cx_Oracle.LONG_STRING) or you get the error "ORA-01461: can bind a LONG value only for insert into a LONG column". This is not needed any longer in cx_Oracle 6.x but won't do any harm there either. I also note with interest that 5.3 requires almost 10x longer to process the CLOB version than cx_Oracle 6.x and is almost 50% faster for the non-CLOB case, too. Frankly, I'd drop support for cx_Oracle 5.x to make your life simpler. Almost all of the downloads for cx_Oracle are now for 6.x anyway.

@naor2013
Copy link

I'm actually getting this issue you're talking about with the LONG value thing but I'm using cx_oracle 6.3.1! Do you know what might cause this issue?
What does it mean to use setinputsizes as LONG_STRING and how is it any different than setting it as a CLOB?

@anthony-tuininga
Copy link
Member

Using type cx_Oracle.CLOB will create a temporary CLOB, which requires storage in the temporary tablespace and performance is poor (as noted). Setting LONG_STRING instead simply passes the string directly through to Oracle. There is a limit (currently) of about 1 GB but that's generally not a concern. :-)

Can you provide a test case that demonstrates the issue you are getting? It may be a configuration difference as I don't experience this problem at all. But using setinputsizes(cx_Oracle.LONG_STRING) doesn't hurt anything either in this case.

@naor2013
Copy link

My issue is with sqlalchemy's code so I don't think I can find a specific case on my end that causes this issue but would replacing setinputsizes(CLOB) with setinputsizes(LONG_STRING) in sqlalchemy fix the performance issue? (And the LONG issue too I guess)

@anthony-tuininga
Copy link
Member

Unless you plan to insert data larger than 1 GB, I think replacing CLOB with LONG_STRING would solve your problem and get good performance. If your data is larger than 1 GB you'll have other issues, I think! :-)

@naor2013
Copy link

What about other types like Varchar for examle? Do they need some type of setinputsizes before insert? Do they have an optional type you can declare like LONG_STRING? Or is CLOB the only one that potentially can have setinputsizes before you insert to it?

@anthony-tuininga
Copy link
Member

Varchar data is limited to 4000 bytes unless extended strings are enabled (12.2) in which case they can go up to 32767 bytes. They don't need to have any sort of setinputsizes() used. However, there are other string types in Oracle's world -- namely cx_Oracle.FIXED_CHAR, cx_Oracle.FIXED_NCHAR and cx_Oracle.NCHAR. So it depends on what you're inserting into and the configuration of your database and client environments as well as the data that you are inserting. There is no one answer to this question, unfortunately, much to @zzzeek's chagrin! I can provide additional details on the "rules" and "behaviours" of the different types if that would be helpful.

@zzzeek
Copy link
Author

zzzeek commented Jun 21, 2018

This issue thread has been a liberating moment. I'm actually traveling this week but at the same time formulating a plan to add an event hook to the setinputsizes step - so I'll maybe adjust this clob thing a bit, but after that, it becomes the end user implements event hooks to do whatever they need on a per statement basis. The existence of this issue and many others are what I can point towards in defense of somewhat pushing this out to the end users. But the cx oracle dialect has come a long way in 1.2 and by 1.3 I hope to be done with having to adjust for these typing issues because the answer is, "it depends".

@anthony-tuininga
Copy link
Member

Yes, indeed, "it depends"! We'd like to make sure that the default values are the most useful but in the end it all depends on the end user's needs and configuration. So I'm happy to be able to "liberate" you. :-)

@anthony-tuininga
Copy link
Member

Closing due to lack of activity.

@zzzeek
Copy link
Author

zzzeek commented Sep 21, 2018

hey anthony -

I'm also noticing the '' / None thing when I set an outputtype handler from a SELECT that specifies cursor.var(cx_Oracle.STRING) instead of cursor.var(cx_Oracle.CLOB) (e.g. the default_type passed in). So I'd rather use this:

                # default_type = cx_Oracle.CLOB ( or NCLOB maybe)
                return cursor.var(
                    default_type, size, cursor.arraysize,
                    outconverter=lambda value: value.read()
                )

Is there an unreasonable performance hit doing it that way?

@anthony-tuininga
Copy link
Member

anthony-tuininga commented Sep 21, 2018

Yes, there would be a definite performance hit using LOBs -- and it would be very noticeable -- so I would avoid it unless it is needed. I understand that the whole Oracle thing of "" == null can be annoying, but that's the way it is, unfortunately -- and I don't think its worth the performance penalty to differentiate.

@anthony-tuininga
Copy link
Member

To be clear, the performance difference is due to the extra round-trips that are required to perform the read() on the LOB that is returned. For local databases the difference might be negligible. For databases that are across a high latency network the difference will be very significant indeed!

@zzzeek
Copy link
Author

zzzeek commented Sep 21, 2018

OK but if the column in the DB is CLOB, and I'm just doing "SELECT * FROM table", CLOB is what cx_Oracle gets back, I should...use the STRING outputtypehandler on it ? Or is the performance hit baked in because it's a CLOB in the actual database?

@zzzeek
Copy link
Author

zzzeek commented Sep 21, 2018

that is, without doing a CAST first

@anthony-tuininga
Copy link
Member

It doesn't matter if the database has a CLOB. It matters if the a LOB descriptor is used on the client. The default, as you noted, is to return a LOB. If you use an output type handler to fetch the LOB directly as a string the extra round trips are avoided. Just be aware that there are limits on the size of the LOB -- but if you are doing a lob.read() on the result when you are finished you'll experience them then, too -- but with the extra round trips hampering your performance! Hope that's clear enough.

@zzzeek
Copy link
Author

zzzeek commented Sep 21, 2018

OK good enough, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants