Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a prepared statement instead of string replacement when :sql_last_value is provided #233

Closed
frankschmitt opened this issue Sep 6, 2017 · 5 comments
Assignees

Comments

@frankschmitt
Copy link

This might actually be not a bug in the code, but a poor documentation choice (if it's possible to provide the sql_last_value as part of the parameters hash with a ? placeholder instead of using :sql_last_value) - I'm not 100% sure.

According to the docs, something like this should be used to filter by sql_last_value:

input {

  jdbc {
    statement => "SELECT id, mycolumn1, mycolumn2 FROM my_table WHERE id > :sql_last_value""    
    use_column_value => true
    tracking_column => "id"
    tracking_column_type => "numeric"
    clean_run => true
 }

}

However, contrary to my expectations, this doesn't use a prepared statement with a bind variable for sql_last_value - it does a string replacement instead, as can be seen in the logstash logs, as well as in the v$sql database view (I'm using Oracle):

SELECT id, mycolumn1, mycolumn2 FROM my_table WHERE id > 107560.0

thus generating a new unique SQL statement every time.

That leads to a couple of problems:

  • it forces a hard parse of the SQL statement for every run (consuming DB resources)
  • it trashes the shared pool (the area where Oracle stores the parsed SQL statements)
  • it is prone to SQL injection attacks if an attacker manages to replace the stored metadata

Logstash Version: 5.3.0
Operating System: Dockerized Linux (FROM docker.elastic.co/logstash/logstash:5.3.0)

Suggested Fix
Use a prepared statement, and provide sql_last_value as a parameter. The underlying Sequel library provides the necessary infrastructure, as far as I can tell from the docs.
If necessary, I can provide further information and start working on a patch / pull request myself.

@raulk89
Copy link

raulk89 commented Mar 6, 2019

Any idea when this will be fixed/changed..? Looking forward to it.
Bug or poor documentation choice, either way, this is bad, especially for production databases.

@DUSHES2517
Copy link

This question is relevant for us. Now there is no solution to this problem?

@guyboertje
Copy link
Contributor

Sequel being an ORM is mostly used to define DB interactions using objects that generate the SQL dialect statements that are then submitted for execution. For example:

DB[:mytable].select(:id, :name, :age).where(Sequel.lit('age > ?', 21)) # SELECT id, name, age FROM mytable WHERE age > 21
DB[:mytable].first(2) # SELECT * FROM mytable LIMIT 2
DB[:mytable].count(:id) # SELECT count(id) AS count FROM mytable LIMIT 1

This is usable if you are writing a web application that has a db schema design that the application itself enforces - the ORM bit because you will map a field on a html form to a column in a table or a value displayed in a web page to a column from a record in a catalogue etc.

The Sequel docs show how to use prepared statements thus:

ps = DB[:mytable].where(id: :$ident).prepare(:first, :select_by_id)

Now I don't yet know the implications of the "type" as set with :first (maybe we would use :select) and :select_by_idis the name/key used in Sequel and in the server side execution.
I presume to use the greater than N literal we would use:

ps = DB[:mytable].where(Sequel.lit('id > ?', :$ident)).prepare(:select, :select_next)

The example given in Sequel is shown using the SQL generation via the ORM. It is not easy to see whether prepare can be called on a raw statement without detailed reading of the Sequel code. In addition, the prepared statements are stored on the server and executed by name repeatedly, if multiple jdbc inputs are specified in the config then a "constant" name in the plugin code will cause overwrites, a user defined prepared statement name will need to be specified (each plugin can have an id but this is not often specified and if not a random one is generated, there would be a net buildup of unused prepared statements in the DB if the same statement with a new name was submitted on each LS restart).

In ETL/BI tools it is quite common to allow users to specify a SQL statement execute. This is useful because it is a kind of WYSIWYG mechanism - using some kind of interface to the DB one can build the statement until it yields the correct results or be given one by a DBA. Ironically, the statements that most benefit from being prepared are the most complex ones.

It may be possible to build the prepared statement using the raw SQL statement as supplied by the user, I have not tried it but I suspect we will be moving into a different (underused) part of the Sequel code base. I can do some experiments but the big challenge is to get confidence because the basic tests in the plugin don't really verify the more complex statements out in the wild. Another concern is when :sql_last_value occurs multiple times in the SQL statement. I can only really live test on Postgres.

@guyboertje
Copy link
Contributor

I have had success in making this a reality. Testing on Postgres for now. Experimental plugin given out for testing. PR to follow.

@raulk89
Copy link

raulk89 commented May 8, 2020

Have I got this right, that this was fixed..?
If yes, then which version of logstash it was fixed.

Raul

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants