Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define quote escaping semantics. #1645

Closed
jordansissel opened this issue Aug 18, 2014 · 42 comments
Closed

Define quote escaping semantics. #1645

jordansissel opened this issue Aug 18, 2014 · 42 comments
Assignees

Comments

@jordansissel
Copy link
Contributor

jordansissel commented Aug 18, 2014

This is requested numerous places, but I was reminded from #895

Escaping quotes and control characters are impossible to do today in Logstash config. "\n" is literally backslash and lowercase n. """ is literally backslash and doublequote.

Logstash supports single and double-quoted text values, and we should support escapes in both the same exact way:

Proposal

Add a new setting, config.support_escapes that, when set, will enable the following behaviors on quoted strings in the Logstash config:

  • quote-backslash-quote-quote should be text with a single quote as value
  • doublequote-backslash-doublequote-doublequote should be text with a single double-quote as value
  • Because it is expected by users, "\t" and '\t' should be a single tab character as value (ASCII 09)
  • Because it is expected by users, "\n" and '\n' should be a single line feed character as value (ASCII 10)
  • Because of \t and \n, we should also support "\r" and '\r' as a single carriage return (ASCII 13)
  • "\\" and '\\' should be text with a single backslash (written as two backslashes in the config)
  • Unicode text is fully supported and does not need escaping

Workarounds today:

  • double-quote: Use single-quoted strings: setting => 'I said, "Hello!"'
  • single-quote: Use double-quoted strings: setting => "I said, 'Hello!'"
  • backslash: Use a single backslash: setting => "C:\some\path"
  • tab: use an actual tab character. In some terminal-based editors, you can insert an actual tab character by pressing Control+V then pressing Tab afterwards.
  • newline: use an actual newline character:
setting => "this
is
multiple
lines"
@wiibaa
Copy link
Contributor

wiibaa commented Aug 19, 2014

For reference see related issue on jira and linked use cases (csv quote, exec command, mutate string)

@jordansissel
Copy link
Contributor Author

@wiibaa Thanks for linking in that jira ticket ❤️

@nigelzor
Copy link

'\' should be text with a single backslash

Seems to be missing from the proposal. Without this, there's no way to encode the sequence backslash-n without getting a newline instead.

'' should be text with a single backslash

Will be problematic, since that looks exactly like the start of a legitimate multi-line string (single quote, followed by whitespace, then arbitrary text). If multi-line strings aren't legal, or have additional escaping requirements, then that should be documented.

nigelzor added a commit to nigelzor/logstash that referenced this issue Aug 20, 2014
@jordansissel
Copy link
Contributor Author

Ugh markdown ruined my escaping, I guess. I did not intend to propose that
quote backslash backslash quote would be omitted ands also did not intend
that quote backslash quote would be valid.

On Tuesday, August 19, 2014, Neil Gentleman notifications@github.com
wrote:

'' should be text with a single backslash

Seems to be missing from the proposal. Without this, there's no way to
encode the sequence backslash-n without getting a newline instead.

'' should be text with a single backslash

Will be problematic, since that looks exactly like the start of a
legitimate multi-line string (single quote, followed by whitespace, then
arbitrary text). If multi-line strings aren't legal, or have additional
escaping requirements, then that should be documented.


Reply to this email directly or view it on GitHub
#1645 (comment)
.

@wiibaa
Copy link
Contributor

wiibaa commented Aug 20, 2014

Just giving my 2 cents, I have the feeling that the "historical reason" is that a regex-string would requires escaping of all backslashes to be interpreted correctly by the filter using them a lot (multiline and GROK) and it would be silly as stated by Jordan in the past, so the escaping was built-in in the config parser and a TODO was added in logstash 1.0.x and 1.1.x to manage proper regex config element.

Maybe allowing real regex object where it is due and then allowing string to be string (as mostly expected in ruby/java/bash) would be an alternative.

@jordansissel
Copy link
Contributor Author

@nigelzor I fixed the rendering problem markdown had. It was turning quote-backslash-backslash-quote into a single backslash.

@jordansissel
Copy link
Contributor Author

@wiibaa in logstash 1.2 and beyond you can do if [somefield] =~ /someregex/

I always intended to interpret backslash-as-escapes but never actually got around to it. It burdens users infrequently, but as we have more users, probability makes that affect a larger population of users, so I wanted to file about fixing it now.

@wiibaa
Copy link
Contributor

wiibaa commented Aug 20, 2014

@jordansissel what I meant is that you cannot do it inside filter config to disambigue between a pure string and a regex. So internally, sometimes a string is interpreted as a regex, sometimes it is not.
I hope you see what I mean. For me the following config could be the clearer solution to users instead of having the backslash escaping by default + exceptions documented and explained (for example "\s" should be also an exception not yet listed)

mutate {
  replace =>["message", /\n+/, "\t"]
}

or

grok {
  match => { "message" => /%{DATETIME} %{LOG} \t blabla \n %{GREEDYDATA}/ }
}

@splashx
Copy link

splashx commented Aug 4, 2015

is this issue related to this issue?

@jordansissel
Copy link
Contributor Author

@splashx yep :)

@splashx
Copy link

splashx commented Aug 5, 2015

@jordansissel oh no 😆

@acchen97
Copy link
Contributor

+1 as I just hit this today too

@bcubk
Copy link

bcubk commented Oct 22, 2015

Hi @jordansissel, are there any progress about this issue? Is there a timetable when it's fixed?

@nellicus
Copy link

+1 hit today (same as #3239 6 months ago)

@jordansissel
Copy link
Contributor Author

This change is likely to be resolved eventually, but not right now. There's a lot of work going on for the new centralized configuration feature that's coming soon and will impact how we store and represent the configuration of a logstash agent. I'd rather fix this problem at that time than try to fix it now and break everyone who is using the current not-so-good behavior.

@bcubk
Copy link

bcubk commented Oct 30, 2015

Thank you for this information 👍

@jhftrifork
Copy link

I'm not sure what to make of this. Could someone confirm that this is the current behavior?

Upon encountering a quote character, the Logstash config parser interprets this as a string literal, which is terminated by the next quote character of the same type as the opening quote. This string literal then represents the string of characters between the start and end quote characters in the surface syntax.

This means that a character sequence seq represents a string str if either

  • str contains no single-quotes, and seq = "'" + str + "'", or
  • str contains no double-quotes, and seq = '"' + str + '"'.

If str contains single-quotes and double-quotes, it is unrepresentable as a string literal in Logstash syntax. (Deal with it.)

@jordansissel
Copy link
Contributor Author

I've updated the issue's description to include some possible workarounds for specific characters. See the section under "Workarounds today:"

@jordansissel
Copy link
Contributor Author

@suyograo @acchen97 I'm open to adding this as an off-by-default feature as soon as maybe Logstash 5.2 or 5.3 timeframe.

@acchen97
Copy link
Contributor

acchen97 commented Dec 9, 2016

@jordansissel +1 let's try to target 5.2 for this. Can you please own this feature?

@frittentheke
Copy link

frittentheke commented Mar 20, 2017

Glad this issue is somewhat on your radar. The existing workarounds are helpful, but one thing is apparently still impossible to do with the current configuration parser: Using mutate+gsub to replace something matched with a double-quote character . i.e.

mutate { gsub => [ 'message', '\"','"' ] }

just like mentioned in logstash-plugins/logstash-filter-mutate#40.
And this is not a missing feature or cleanup, but a plain bugfix that has currently no workaround.

@jakommo
Copy link

jakommo commented Mar 31, 2017

@jordansissel @acchen97 any news on this?

@jordansissel
Copy link
Contributor Author

jordansissel commented Mar 31, 2017 via email

@binary132
Copy link

This makes it impossible to insert e.g. control characters.

@jordansissel
Copy link
Contributor Author

I have a branch where I am working on this as time permits. No ETA. https://github.com/elastic/logstash/compare/feature/config-string-escapes

@binary132
Copy link

I was able to work around this by manually inserting the character literal. 😄

@jordansissel
Copy link
Contributor Author

PR is open for this: #7442

@ph ph assigned jordansissel and unassigned jsvd Jun 13, 2017
jordansissel added a commit that referenced this issue Jun 16, 2017
New boolean setting `config.support_escapes` which defaults to false
(the historical behavior). When set to true, the following escapes are
handled:

* backslash doublequote -> doublequote
* backslash quote -> quote
* backslash n -> newline (ascii 10)
* backslash r -> carriage return (ascii 13)
* backslash backslash -> backslash
* backslash t -> tab (ascii 9)

This will solve #1645.
jordansissel added a commit that referenced this issue Jun 16, 2017
New boolean setting `config.support_escapes` which defaults to false
(the historical behavior). When set to true, the following escapes are
handled:

* backslash doublequote -> doublequote
* backslash quote -> quote
* backslash n -> newline (ascii 10)
* backslash r -> carriage return (ascii 13)
* backslash backslash -> backslash
* backslash t -> tab (ascii 9)

This will solve #1645.
jordansissel added a commit that referenced this issue Jun 20, 2017
New boolean setting `config.support_escapes` which defaults to false
(the historical behavior). When set to true, the following escapes are
handled:

* backslash doublequote -> doublequote
* backslash quote -> quote
* backslash n -> newline (ascii 10)
* backslash r -> carriage return (ascii 13)
* backslash backslash -> backslash
* backslash t -> tab (ascii 9)

This will solve #1645.
jordansissel added a commit that referenced this issue Jun 20, 2017
New boolean setting `config.support_escapes` which defaults to false
(the historical behavior). When set to true, the following escapes are
handled:

* backslash doublequote -> doublequote
* backslash quote -> quote
* backslash n -> newline (ascii 10)
* backslash r -> carriage return (ascii 13)
* backslash backslash -> backslash
* backslash t -> tab (ascii 9)

This will solve #1645.
jordansissel added a commit that referenced this issue Jun 20, 2017
New boolean setting `config.support_escapes` which defaults to false
(the historical behavior). When set to true, the following escapes are
handled:

* backslash doublequote -> doublequote
* backslash quote -> quote
* backslash n -> newline (ascii 10)
* backslash r -> carriage return (ascii 13)
* backslash backslash -> backslash
* backslash t -> tab (ascii 9)

This will solve #1645.
jordansissel added a commit that referenced this issue Jun 20, 2017
New boolean setting `config.support_escapes` which defaults to false
(the historical behavior). When set to true, the following escapes are
handled:

* backslash doublequote -> doublequote
* backslash quote -> quote
* backslash n -> newline (ascii 10)
* backslash r -> carriage return (ascii 13)
* backslash backslash -> backslash
* backslash t -> tab (ascii 9)

This will solve #1645.
jordansissel added a commit that referenced this issue Jun 22, 2017
New boolean setting `config.support_escapes` which defaults to false
(the historical behavior). When set to true, the following escapes are
handled:

* backslash doublequote -> doublequote
* backslash quote -> quote
* backslash n -> newline (ascii 10)
* backslash r -> carriage return (ascii 13)
* backslash backslash -> backslash
* backslash t -> tab (ascii 9)

This will solve #1645.
jordansissel added a commit that referenced this issue Jun 22, 2017
New boolean setting `config.support_escapes` which defaults to false
(the historical behavior). When set to true, the following escapes are
handled:

* backslash doublequote -> doublequote
* backslash quote -> quote
* backslash n -> newline (ascii 10)
* backslash r -> carriage return (ascii 13)
* backslash backslash -> backslash
* backslash t -> tab (ascii 9)

This will solve #1645.
jordansissel added a commit that referenced this issue Jun 23, 2017
New boolean setting `config.support_escapes` which defaults to false
(the historical behavior). When set to true, the following escapes are
handled:

* backslash doublequote -> doublequote
* backslash quote -> quote
* backslash n -> newline (ascii 10)
* backslash r -> carriage return (ascii 13)
* backslash backslash -> backslash
* backslash t -> tab (ascii 9)

This will solve #1645.
elasticsearch-bot pushed a commit that referenced this issue Jun 23, 2017
New boolean setting `config.support_escapes` which defaults to false
(the historical behavior). When set to true, the following escapes are
handled:

* backslash doublequote -> doublequote
* backslash quote -> quote
* backslash n -> newline (ascii 10)
* backslash r -> carriage return (ascii 13)
* backslash backslash -> backslash
* backslash t -> tab (ascii 9)

This will solve #1645.

Fixes #7516
@suyograo
Copy link
Contributor

Fixed in #7442

@Mekk
Copy link

Mekk commented Mar 29, 2019

Hmm. I configure fresh ELK system and it took me few days to find why my text contain \n (lteral \ then literal n).

Well, I wrote

mutate {  join => "somefield" => "\n" }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests