Use a prepared statement instead of string replacement when :sql_last_value is provided #233

frankschmitt · 2017-09-06T07:26:14Z

This might actually be not a bug in the code, but a poor documentation choice (if it's possible to provide the sql_last_value as part of the parameters hash with a ? placeholder instead of using :sql_last_value) - I'm not 100% sure.

According to the docs, something like this should be used to filter by sql_last_value:

input {

  jdbc {
    statement => "SELECT id, mycolumn1, mycolumn2 FROM my_table WHERE id > :sql_last_value""    
    use_column_value => true
    tracking_column => "id"
    tracking_column_type => "numeric"
    clean_run => true
 }

}

However, contrary to my expectations, this doesn't use a prepared statement with a bind variable for sql_last_value - it does a string replacement instead, as can be seen in the logstash logs, as well as in the v$sql database view (I'm using Oracle):

SELECT id, mycolumn1, mycolumn2 FROM my_table WHERE id > 107560.0

thus generating a new unique SQL statement every time.

That leads to a couple of problems:

it forces a hard parse of the SQL statement for every run (consuming DB resources)
it trashes the shared pool (the area where Oracle stores the parsed SQL statements)
it is prone to SQL injection attacks if an attacker manages to replace the stored metadata

Logstash Version: 5.3.0
Operating System: Dockerized Linux (FROM docker.elastic.co/logstash/logstash:5.3.0)

Suggested Fix
Use a prepared statement, and provide sql_last_value as a parameter. The underlying Sequel library provides the necessary infrastructure, as far as I can tell from the docs.
If necessary, I can provide further information and start working on a patch / pull request myself.

The text was updated successfully, but these errors were encountered:

raulk89 · 2019-03-06T06:42:03Z

Any idea when this will be fixed/changed..? Looking forward to it.
Bug or poor documentation choice, either way, this is bad, especially for production databases.

DUSHES2517 · 2019-03-28T06:26:34Z

This question is relevant for us. Now there is no solution to this problem?

guyboertje · 2019-04-18T13:22:59Z

Sequel being an ORM is mostly used to define DB interactions using objects that generate the SQL dialect statements that are then submitted for execution. For example:

DB[:mytable].select(:id, :name, :age).where(Sequel.lit('age > ?', 21)) # SELECT id, name, age FROM mytable WHERE age > 21
DB[:mytable].first(2) # SELECT * FROM mytable LIMIT 2
DB[:mytable].count(:id) # SELECT count(id) AS count FROM mytable LIMIT 1

This is usable if you are writing a web application that has a db schema design that the application itself enforces - the ORM bit because you will map a field on a html form to a column in a table or a value displayed in a web page to a column from a record in a catalogue etc.

The Sequel docs show how to use prepared statements thus:

ps = DB[:mytable].where(id: :$ident).prepare(:first, :select_by_id)

Now I don't yet know the implications of the "type" as set with :first (maybe we would use :select) and :select_by_idis the name/key used in Sequel and in the server side execution.
I presume to use the greater than N literal we would use:

ps = DB[:mytable].where(Sequel.lit('id > ?', :$ident)).prepare(:select, :select_next)

The example given in Sequel is shown using the SQL generation via the ORM. It is not easy to see whether prepare can be called on a raw statement without detailed reading of the Sequel code. In addition, the prepared statements are stored on the server and executed by name repeatedly, if multiple jdbc inputs are specified in the config then a "constant" name in the plugin code will cause overwrites, a user defined prepared statement name will need to be specified (each plugin can have an id but this is not often specified and if not a random one is generated, there would be a net buildup of unused prepared statements in the DB if the same statement with a new name was submitted on each LS restart).

In ETL/BI tools it is quite common to allow users to specify a SQL statement execute. This is useful because it is a kind of WYSIWYG mechanism - using some kind of interface to the DB one can build the statement until it yields the correct results or be given one by a DBA. Ironically, the statements that most benefit from being prepared are the most complex ones.

It may be possible to build the prepared statement using the raw SQL statement as supplied by the user, I have not tried it but I suspect we will be moving into a different (underused) part of the Sequel code base. I can do some experiments but the big challenge is to get confidence because the basic tests in the plugin don't really verify the more complex statements out in the wild. Another concern is when :sql_last_value occurs multiple times in the SQL statement. I can only really live test on Postgres.

guyboertje · 2019-04-30T15:26:32Z

I have had success in making this a reality. Testing on Postgres for now. Experimental plugin given out for testing. PR to follow.

raulk89 · 2020-05-08T17:14:21Z

Have I got this right, that this was fixed..?
If yes, then which version of logstash it was fixed.

Raul

guyboertje closed this as completed in c7c91c4 Sep 9, 2019

jsvd assigned guyboertje Sep 11, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use a prepared statement instead of string replacement when :sql_last_value is provided #233

Use a prepared statement instead of string replacement when :sql_last_value is provided #233

frankschmitt commented Sep 6, 2017

raulk89 commented Mar 6, 2019

DUSHES2517 commented Mar 28, 2019

guyboertje commented Apr 18, 2019

guyboertje commented Apr 30, 2019

raulk89 commented May 8, 2020

Use a prepared statement instead of string replacement when :sql_last_value is provided #233

Use a prepared statement instead of string replacement when :sql_last_value is provided #233

Comments

frankschmitt commented Sep 6, 2017

raulk89 commented Mar 6, 2019

DUSHES2517 commented Mar 28, 2019

guyboertje commented Apr 18, 2019

guyboertje commented Apr 30, 2019

raulk89 commented May 8, 2020