Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escaping issues #63

Closed
moiristo opened this issue Jun 25, 2013 · 11 comments
Closed

Escaping issues #63

moiristo opened this issue Jun 25, 2013 · 11 comments

Comments

@moiristo
Copy link

Some of our indexed content occasionally contains gibberish that is not escaped properly by riddle. Consider the following example:

query = Riddle::Query.snippets("\\'", 'foo_core', 'foo')
=> "CALL SNIPPETS('\\\\'', 'foo_core', 'foo')"

ThinkingSphinx::Connection.new.query query
=> Mysql2::Error: sphinxql: syntax error, unexpected QUOTED_STRING, expecting ')' near '', 'foo_core', 'foo')'

I was wondering whether it is a good idea to use mysql2's escaping capabilities instead (Mysql2::Client#escape), or have you chosen not to do that for a particular reason? A working example:

ThinkingSphinx::Connection.new.client.escape("\\'")
=> "\\\\\\'"

query = "CALL SNIPPETS('\\\\\\'', 'foo_core', 'foo')"
ThinkingSphinx::Connection.new.query query
Mysql2::Error: unknown local index 'foo_core' in search request # Query ok!
@pat
Copy link
Owner

pat commented Jun 25, 2013

I don't auto-escape queries (in searches or excerpts) because sometimes quotes are deliberate, or @ symbols, etc - and they impact how Sphinx ranks search results. That said, there is Riddle::Query.escape, which performs in a manner quite similar (perhaps identical) to Mysql2::Client#escape, and is built to match the SphinxQL escaping rules.

@moiristo
Copy link
Author

I see.. but the thing is that I hit this behaviour when I request the excerpt of a search result ('result.excerpts[:content]'), yielding the following trace:

[GEM_ROOT]/gems/thinking-sphinx-3.0.3/lib/thinking_sphinx/connection.rb:74 in query
[GEM_ROOT]/gems/thinking-sphinx-3.0.3/lib/thinking_sphinx/connection.rb:74 in query
[GEM_ROOT]/gems/thinking-sphinx-3.0.3/lib/thinking_sphinx/excerpter.rb:16 in excerpt!
[GEM_ROOT]/gems/thinking-sphinx-3.0.3/lib/thinking_sphinx/panes/excerpts_pane.rb:34 in method_missing

Do you mean I should (Riddle-) escape the text in the 'content' accessor of the result?

@pat
Copy link
Owner

pat commented Jun 26, 2013

Sorry, I'm understanding now - it's not the query that's got characters needing escaping, it's the content you're passing through. I'll get a fix sorted.

@pat
Copy link
Owner

pat commented Jun 26, 2013

So, I'd forgotten that Riddle already escapes single quotes in both queries and content values for snippets calls. Currently trying to puzzle through where the escaping should happen, and to what extent.

@moiristo
Copy link
Author

I think that the quote escaping for snippets is just not sufficient, as it may return an invalid sphinxql query string. I don't know if this can also happen for other queries.

@epidemian
Copy link
Contributor

Hehe, we stumbled across this same issue when using the ThinkingSphinx excerpts on data that contains a backslash followed by a single quotation mark (i.e. a "\\'" string in Ruby).

This snippet fails:

ThinkingSphinx::Excerpter.new('sample_index', 'words').excerpt!("foo \\' bar")
# => Mysql2::Error: sphinxql: syntax error, unexpected IDENT, expecting ')' near 'bar', 'sample_index', 'words', '<span class="match">' AS before_match, '</span>' AS after_match, ' &#8230; ' AS chunk_separator)'

This is because the escaping mechanism in Riddle::Query.spnippets is not very thorough. For example (note the puts):

irb(main):115:0> puts Riddle::Query.snippets("foo\\'bar", 'index', 'words') # Generates invalid SQL.
CALL SNIPPETS('foo\\'bar', 'index', 'words')
=> nil
irb(main):116:0> puts Riddle::Query.snippets("foo\\nbar", 'index', 'words') # Generates a newline in the SQL while the original string didn't have one.
CALL SNIPPETS('foo\nbar', 'index', 'words')
=> nil

For now, we included this monkey patch in a Rails initializer:

module Riddle::Query
  def self.snippets(data, index, query, options = nil)
    data = quote_string(data)
    query = quote_string(query)

    options = ', ' + options.keys.collect { |key|
      value = translate_value options[key]
      value = "'#{quote_string(value)}'" if value.is_a?(String)

      "#{value} AS #{key}"
    }.join(', ') unless options.nil?

    "CALL SNIPPETS('#{data}', '#{index}', '#{query}'#{options})"
  end

  def self.quote_string(string)
    Mysql2::Client.escape(string)
  end
end

The Mysql2::Client.escape method takes care of everything, and i think it's reasonable to use it in Riddle::Query.snippets, but i'm hesitant about doing a pull request with that change as it wouldn't be very consistent with the rest of Riddle code. E.g. Mysql2::Client.escape doesn't seem to be used anywhere else, and i don't know if its functionality overlaps with the Riddle.escape and Riddle::Query.escape methods. I'd like to know @pat's opinion/insights on that 😃

I also noticed that there's a Riddle::Client#excerpts method. What's the difference of that and Riddle::Query.snippets and why does ThinkginSphinx use the later?

@pat
Copy link
Owner

pat commented Jul 12, 2013

Hi Demian

I'm open to switching the single quote escaping to Riddle::Query.escape (Riddle.escape is for non-SphinxQL Sphinx interactions) - seems that would fix this issue.

I wasn't familiar with Mysql2::Client.escape, which is why I've not been using it, but not sure if it's quite the same as what Sphinx considers as escaped. If you want to replace the inner workings of Riddle::Query.escape with Mysql2::Client.escape, I'm open to that provided the specs for the former still pass. Let's not remove the Riddle method though.

Pat

On 12/07/2013, at 6:02 AM, Demian Ferreiro wrote:

Hehe, we stumbled across this same issue when using the ThinkingSphinx excerpts on data that contains a backslash followed by a single quotation mark (i.e. a "'" string in Ruby).

This snippet fails:

ThinkingSphinx::Excerpter.new('sample_index', 'words').excerpt!("foo ' bar")

=> Mysql2::Error: sphinxql: syntax error, unexpected IDENT, expecting ')' near 'bar', 'sample_index', 'words', '' AS before_match, '' AS after_match, ' … ' AS chunk_separator)'

This is because the escaping mechanism in Riddle::Query.spnippets is not very thorough. For example (note the puts):

irb(main):115:0> puts Riddle::Query.snippets("foo'bar", 'index', 'words') # Generates invalid SQL.
CALL SNIPPETS('foo'bar', 'index', 'words')
=> nil
irb(main):116:0> puts Riddle::Query.snippets("foo\nbar", 'index', 'words') # Generates a newline in the SQL while the original string didn't have one.
CALL SNIPPETS('foo\nbar', 'index', 'words')
=> nil
For now, we included this monkey patch in a Rails initializer:

module Riddle::Query

def self.snippets(data, index, query, options = nil)

data = quote_string(data)

query = quote_string(query)

options = ', ' + options.keys.collect { |key|

value = translate_value options[key]

value = "'#{quote_string(value)}'" if value.is_a?(String)

"#{value} AS #{key}"

}.join(', ') unless options.nil?

"CALL SNIPPETS('#{data}', '#{index}', '#{query}'#{options})"

end

def self.quote_string(string)

Mysql2::Client.escape(string)

end
end
The Mysql2::Client.escape method takes care of everything, and i think it's reasonable to use it in Riddle::Query.snippets, but i'm hesitant about doing a pull request with that change as it wouldn't be very consistent with the rest of Riddle code. E.g. Mysql2::Client.escape doesn't seem to be used anywhere else, and i don't know if its functionality overlaps with the Riddle.escape and Riddle::Query.escape methods. I'd like to know @pat's opinion/insights on that

I also noticed that there's a Riddle::Client#excerpts method. What's the difference of that and Riddle::Query.snippets and why does ThinkginSphinx use the later?


Reply to this email directly or view it on GitHub.

@epidemian
Copy link
Contributor

@pat, i gave this a spin today and i'm not convinced on changing the implementation of Riddle::Query.escape. That function seems to be for escaping special SphinxQL characters, and is not used in the Riddle code (except for the specs) as far as i could see (so i guess Thinking Sphinx and maybe others use it).

The behaviour i'm looking for in the excerpts is for them to escape the special SQL characters so they are not interpreted as special characters by MySQL (see http://dev.mysql.com/doc/refman/5.0/en/string-literals.html#character-escape-sequences).

For example, if a field has a backslash followed by an "n", i'd like those two characters to be preserved in the excerpts. The current implementation does not escape the backslash, so the two characters are interpreted as a newline by MySQL:

irb(main):010:0> ThinkingSphinx::Excerpter.new('sample_index', 'nothing').excerpt!("foo \\n bar")
=> "foo \n bar"

Using the monkey patch i mentioned above, the same thing works as expected:

irb(main):002:0> ThinkingSphinx::Excerpter.new('sample_index', 'nothing').excerpt!("foo \\n bar")
=> "foo \\n bar"

Same thing goes for other MySQL escaped characters.


I think it'd de better to leave Riddle::Query.escape as it is right now and use Mysql2::Client.escape for these strings in the snippets, as they need to be escaped for MySQL to not break and don't need to escape special SphinxQL characters.

The same thing could be applied to the matching strings in SELECT statements (maybe here) so no random SyntaxError can occur if a user enters a query with a bad combination of characters:

irb(main):032:0> MyModel.search("goodbye \\( world")
ThinkingSphinx::SyntaxError: index my_model_core,my_model_delta: syntax error, unexpected $end near ''

irb(main):033:0> MyModel.search("goodbye \\' world")
ThinkingSphinx::SyntaxError: sphinxql: syntax error, unexpected IDENT, expecting ')' near 'world @sphinx_internal_class_name (MyModel)') AND sphinx_deleted = 0 LIMIT 0, 20; SHOW META'

Would you be interested in a pull request changing the behaviour of the Riddle::Query.snippets function to use Mysql2::Client.escape instead of just escaping the single quotes? Maybe changing the select queries too but that'll take me more work to make sure it works with TS. And are there any other cases that we should escape as SQL?

@pat
Copy link
Owner

pat commented Jul 18, 2013

What you're saying makes sense. A pull request for Riddle::Query.snippets to use Mysql2::Client.escape sounds wise.

@pat pat closed this as completed in fceffb9 Jul 21, 2013
@pat
Copy link
Owner

pat commented Jul 21, 2013

Ended up mixing Mysql2::Client.escape into the mix, along with some acceptance specs to confirm that edge cases always end up query-safe. If you find situations where that's not the case, let me know.

Thanks for all the discussion here @moiristo and @epidemian (and @mipearson in #66), certainly helped clarify the problem for me.

@epidemian
Copy link
Contributor

Thanks for the support, @pat. Sorry for not sending a PR before 😓

I think this implementation does not cover the use case of fully preserving the text of snippets when there are no matches, but i'll open a separate issue to discuss that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants