
jdbc_fetch_size no effect? #103

Open
LxiaoGirl opened this issue Dec 18, 2015 · 16 comments

@LxiaoGirl

logstash 1.5.5
jdbc 1.0.0

error log:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid24312.hprof ...
Heap dump file created [1159466364 bytes in 21.477 secs]

I read https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#plugins-inputs-jdbc-jdbc_page_size
and added the jdbc_fetch_size parameter, but it has no effect. Do I just need to increase memory?

@jaykay153

This isn't working for me either with PostgreSQL. Looking at the PostgreSQL documentation, the code needs to explicitly set the connection's autocommit mode to false, e.g. conn.setAutoCommit(false);
https://jdbc.postgresql.org/documentation/91/query.html#fetchsize-example

For data sizes of 10 million plus rows, this is a must (jdbc_paging_enabled is too slow).
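For reference, a minimal sketch of the cursor-based fetching pattern the linked PostgreSQL documentation describes (the connection URL, credentials, table name, and fetch size here are placeholders):

import java.sql.*;

public class CursorFetch {
    public static void main(String[] args) throws SQLException {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
        // The PostgreSQL driver only uses a server-side cursor when autocommit is off.
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            // Stream rows in batches instead of buffering the whole result set.
            st.setFetchSize(10000);
            try (ResultSet rs = st.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    // process one row at a time here
                }
            }
        }
        conn.commit();
        conn.close();
    }
}

Without the setAutoCommit(false) call, the driver pulls the entire result set into memory regardless of the fetch size, which is exactly the OutOfMemoryError reported above.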

@jaykay153

Can this be prioritized please?

@suyograo
Contributor

Thanks for the issue, we'll get to this soon.

@LxiaoGirl
Author

fixed?

@edeak

edeak commented Jan 20, 2016

I guess pagination could be made faster if the following statement were executed:

select t.* from (select id from t where id > :LAST_ID limit :LIMIT) q join t on q.id = t.id

I use this trick to iterate through millions of records and it's quite fast; the only thing to be solved is getting the last id and using it as a variable inside the plugin.
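To make the trick concrete, here is roughly how the iteration would advance (a sketch with made-up ids and batch size; note the inner query also needs an order by id for the batches to be deterministic):

-- first batch: start below the smallest id
select t.* from (select id from t where id > 0 order by id limit 10000) q join t on q.id = t.id;
-- suppose the largest id in that batch was 10342; the next batch resumes from there
select t.* from (select id from t where id > 10342 order by id limit 10000) q join t on q.id = t.id;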

@untergeek
Contributor

> the only thing to be solved is getting the last id and using it as a variable inside the plugin

This was committed in #108 and is available by updating to version 3.0.0 of the plugin.

Some interim documentation: #57 (comment)
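Based on that interim documentation, the keyset approach looks roughly like this in 3.0.0, with the plugin tracking the last id itself (a sketch; connection details, table, and column names are placeholders):

input {
  jdbc {
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost/test"
    jdbc_user => "user"
    # :sql_last_value is maintained by the plugin from the tracking column
    statement => "SELECT * FROM t WHERE id > :sql_last_value ORDER BY id"
    use_column_value => true
    tracking_column => "id"
    schedule => "* * * * *"
  }
}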

@jaykay153

Many tables don't have a last-ID column. What about the jdbc_fetch_size fix, please?

@jjunqueira

+1, please fix!

Paging is slower, and because of the way the count query is implemented for paging (just by wrapping the defined statement) I am seeing Incorrect Key File errors in MySQL.

An alternative might be to allow a way to define the count query statement.

Let me know if I can do anything to help. Thank you for your hard work on this project!

@rspuler

rspuler commented Jul 6, 2016

This is a serious problem for me too. The plugin does not work with the PostgreSQL driver and a full import of big tables. It always results in an OutOfMemoryError, for the reason that jaykay153 has given.

@rspuler

rspuler commented Jul 6, 2016

The workaround is not to use the official PostgreSQL driver but this one instead:
https://github.com/impossibl/pgjdbc-ng, which seems to work (probably by setting autocommit to false when fetchSize is set).

@axhiao

axhiao commented Feb 22, 2017

It seems that an additional parameter, ?useCursorFetch=true, needs to be added to the connection string for MySQL 5.x.
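If that is right, the input config would look something like this (a sketch; host, credentials, and statement are placeholders, and older Connector/J versions may also need useServerPrepStmts=true for the cursor fetch to take effect):

input {
  jdbc {
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # useCursorFetch=true makes the driver honor the fetch size via a server-side cursor
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb?useCursorFetch=true"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_fetch_size => 10000
    statement => "SELECT * FROM big_table"
  }
}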

izhigalko added a commit to izhigalko/logstash-input-jdbc that referenced this issue Apr 7, 2017
izhigalko added a commit to izhigalko/logstash-input-jdbc that referenced this issue May 4, 2017
andrewvc pushed a commit to andrewvc/logstash-input-jdbc that referenced this issue May 18, 2017
@njvack

njvack commented Jun 29, 2017

It sounds as though the logstash jdbc plugin itself can't force the connection to use a cursor for fetching rows; you need to tell the JDBC driver to do it. That's okay. However! It would be very helpful if the plugin raised a big warning when you set jdbc_fetch_size and your connection doesn't support it, or at least mentioned this in the documentation.

For MySQL users, @axhiao is right; using useCursorFetch=true in your connect string lets you import all the rows without problems.

@ryanrozich

ryanrozich commented Nov 13, 2017

Trying to do a big load from Redshift (tens of millions of rows) into Elasticsearch. I gave logstash 12G of heap and it still ran out of memory. jdbc_fetch_size was set to 10000, so it looks like this bug is my problem.

It also looks like for Redshift there are no driver configuration options that let you turn autocommit off.

http://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-options.html

You get an error from your SQL if you try using set autocommit=off; the error says

ERROR: SET AUTOCOMMIT TO OFF is no longer supported

Is it possible to have the JDBC input call setAutoCommit(false) whenever the jdbc_fetch_size param is set? Especially since there doesn't seem to be any way to set this in the JDBC connection string or in the SQL statement.

Any other ideas for how to get large amounts of data out of Redshift and into ES using logstash without running out of memory?

@q3yi

q3yi commented Dec 4, 2017

@rspuler is pgjdbc-ng still working? I got an error message like:

No serializer found for class com.impossibl.postgres.datetime.ISODateFormat$Parser and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) (through reference chain: com.impossibl.postgres.jdbc.PGArray["resultSet"]->com.impossibl.postgres.jdbc.PGResultSet["statement"]->com.impossibl.postgres.jdbc.PGSimpleStatement["connection"]->com.impossibl.postgres.jdbc.PGConnectionImpl["dateFormatter"]->com.impossibl.postgres.datetime.ISODateFormat["parser"])

It seems that the logstash jdbc input plugin does not support pgjdbc-ng.

@1tayH

1tayH commented Nov 21, 2018

Is there any way to deal with this (Postgres)? Like @jaykay153 said, the official driver doesn't set autoCommit to false when using setFetchSize.
I've tried using pgjdbc-ng, but since my query takes too long to complete I'm getting an exception:

# my_pipeline.conf
...
jdbc_driver_library => "/usr/share/logstash/drivers/pgjdbc-ng-0.7-complete.jar"
jdbc_driver_class => "com.impossibl.postgres.jdbc.PGDriver"
statement => 'SELECT pg_sleep(600)'
...
logstash_1  | [2018-11-21T11:06:45,205][ERROR][logstash.inputs.jdbc     ] Java::JavaSql::SQLException: java.nio.channels.ClosedChannelException: SELECT pg_sleep(600)
logstash_1  | [2018-11-21T11:06:45,216][WARN ][logstash.inputs.jdbc     ] Exception when executing JDBC query {:exception=>#<Sequel::DatabaseError: Java::JavaSql::SQLException: java.nio.channels.ClosedChannelException>}
logstash_1  | Nov 21, 2018 11:06:48 AM com.impossibl.postgres.jdbc.ThreadedHousekeeper$HousekeeperReference cleanup
logstash_1  | WARNING: Cleaning up leaked connection ( jdbc:pgsql://sndbox-postgres-staging-large.c8zqbprwuvhi.us-east-2.rds.amazonaws.com/ebdb )
logstash_1  | Allocation occurred @
logstash_1  |   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
logstash_1  |   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
logstash_1  |   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
logstash_1  |   at java.lang.reflect.Method.invoke(Method.java:498)
logstash_1  |   at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:423)

That being said, a simple SELECT * FROM x works fine.

@zalseryani

What about Oracle? How do you change the fetch size from the JDBC connection string?

Any update on this issue?
