New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape handling #133

Merged
merged 22 commits into from Mar 30, 2018

Conversation

Projects
3 participants
@chriseppstein
Copy link
Collaborator

chriseppstein commented Mar 26, 2018

Fix numerous escaping issues and syntax handling regressions in 3.0.

Note: this PR is a Work-in-Progress. Feedback is welcome, commits are still forthcoming. Details will be added here as they develop.

TL;DR

Strings and identifiers from CSS files are now exposed to the API after unescaping them and API consumers going forward will set property values without containing any escape sequences. This optimizes the majority use-case of using standard CSS syntax and expecting it to Just Work™.

There are special APIs to use for setting values when normal CSS escaping rules are being intentionally broken -- these must be used whenever values are manipulating non-standard syntax. See methods setPropertyWithoutEscape, setPropertyAndEscape, and appendToPropertyAndEscape on the Node class.

BREAKING CHANGES: This PR introduces breaking changes. Where possible and likely to add the most value, deprecations have been added. But because it's impossible to know if a string has or hasn't been escaped by the caller, full deprecation handling is hard to achieve without materially making the new APIs worse and then introducing a subsequent deprecation/remove cycle back to the original API but with the new semantics. It's a lot of churn for API consumers, most of whom probably don't need to worry about whether things are escaped at all because they don't have a single escape in their stylesheets. For those users, this change appears as a bug fix (which indeed, it is).

Gory Details

Lexer

The lexer no longer breaks word tokens apart when an escape sequence is
encountered and word tokens starting with an escape continue past the first escape.

The lexer does not unescape. This allows the parser to capture the originally authored escape syntax and preserve it during stringification if a selector built from that token is left untouched.

AST

Some AST nodes had support for a raws option that mirrors the primary values for the node but has the semantics that the value is already escaped and can be placed directly into the CSS file during stringification. Before this PR, the raws object was primarily used for capturing comments encountered mid-identifier and other corner-case issues. This PR normalizes this pattern and makes it available to all AST nodes so that the raws object is always considered during stringification. By default now, setters are used for node properties that can have values can contain characters that require escaping or that might change the escape behavior of other properties. Those setters keep the raws object up to date. Setting to the raws object does not affect the escaped value in any way, the methods mentioned above are provided to keep the two attributes in sync without having to set the raws object directly.

Constructors for all node types expect property values to be unescaped. If the raws value for a property containing characters requiring an escape is not provided, it will be initialized to the escaped value.

Parser

The parser now sets properties to the escaped value and tracks the unescaped property values in raws for all node types with identifiers and strings.

Notes for Specific Types of Nodes

Node

  • appendToPropertyAndEscape(name, value, valueEscaped) - Adds to the existing value for the property named name for both the unescaped and escaped value. This is used by the parser in when tokens are encountered that should be included as part of the previous value. Some users may find it helpful as well, so it's public.
  • setPropertyAndEscape(name, value, valueEscaped) - Set the property's value and the escaped value explicitly.
  • setPropertyWithoutEscape(name, value) - Sets the property and the escape to the same value.
  • stringifyProperty(name) - returns the string value for a property by reading from raws and falling back to the unescaped value.

Attribute

The biggest impact to this strategy is on the Attribute selector where
escapes are not uncommon in attribute names and quoted values are a type
of escape and in common use.

Because of this, there are some new APIs and conventions for Attribute specifically tailored around managing attribute values. Because these APIs are intended to remain, there are now deprecation warnings for ambiguous situations where it seems likely that the user is expecting the old semantics.

The value property now tracks the unescaped and unquoted (quotes are a type of escape sequence) value. This means that the selectors [foo=bar] and [foo="bar"] have the same value property. In this case, they will differ according to the value of quoteMark which can have the values of a single quote character, a double quote character and null for identifiers. Setting value property directly will continue to use the existing quoteMark for the attribute's value. Use the setValue() method to set the value while also managing the quoteMark. This method (and some others discussed later) now take options that allow for specifying a strategy for the quote mark depending on the options passed and the value being set. When smart: true is specified, the quoteMark will intelligently depend on the value being set. Depending on the preferCurrentQuoteMark option, the existing value for quoteMark can take precedence over the quoteMark provided in the options. Setting the quoteMark property directly will change the quote for an attribute's value and immediately synchronizes the raws value for the value property.

Additionally, the following new helper methods were added for Attribute nodes:

  • getQuotedValue(quoteMarkOpts) - returns an escaped value that is quoted and escaped according to the quote mark options provided. If no options are passed, the existing quote mark value is used.
  • smartQuoteMark(quoteMarkOpts) - returns the smart quote mark result for the current value, that can be assigned to quoteMark to convert it.
  • preferredQuoteMark(quoteMarkOpts) - returns the preferred quote mark without considering the current value but after considering the options against the current value of quoteMark.
ClassName

ClassName used to inherit from Namespace but there's no legal CSS syntax for this and the string output with a namespace was just a simple concatenation. Furthermore, no tests failed when I changed the base class to Node. My assumption is that this was added to support some sort of future plans for non-standard syntax but that those plans never came to fruition. Removing it seems like the best way to make users tell me what the heck is going on there if they care about it 😅.

chriseppstein added some commits Mar 25, 2018

Fix numerous escaping issues in Attributes.
The lexer no longer breaks word tokens apart when an escape is
encountered. This will allow the parser to be simplified in a lot of
place, but that's not been done yet. But it does fix some places where
the parser wasn't handling the old way that the tokenizer split up
tokens.

This is the first commit of many where the parser will change to setting
values to the unescaped value and handle escaping those values back into
css automatically. It used to be that the API consumer had to handle
escapes on their own and that's not cool.

The biggest impact to this strategy is on the Attribute selector where
escapes are not uncommon in attribute names and quoted values are a type
of escape.

The parser now sets namespace, name  to the unescaped value.
The parser does not yet set the value to an unescaped value, that change
will follow.

The `raws` value will track the escaped value. Originally, it tracks the
exact escape sequences that were in the source, but once a value
changes, the raws value is updated and may be different. In general our
escape strategy tries to build the smallest escapes for a given
character.

The Attribute selector is much easier to mutate now, and there are
helper methods to work with the quoteMark and update the value and quote
mark at the same time. Where necessary setters on attribute are used to
keep raws values in sync.

A number of properties and some inputs to the constructor are deprecated now.
Setting an attribute's value to an escaped value is also deprecated,
where ambiguous, use setValue().
@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 27, 2018

Coverage Status

Coverage decreased (-2.9%) to 92.615% when pulling 7524537 on escape-handling into 7d5bd5b on master.

1 similar comment
@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 27, 2018

Coverage Status

Coverage decreased (-2.9%) to 92.615% when pulling 7524537 on escape-handling into 7d5bd5b on master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Mar 27, 2018

Coverage Status

Coverage decreased (-0.5%) to 95.048% when pulling 9aa8646 on escape-handling into 7d5bd5b on master.

@chriseppstein chriseppstein added this to In progress in 4.0 Release Mar 27, 2018

* The case insensitivity flag or an empty string depending on whether this
* attribute is case insensitive.
*/
readonly insensitiveFlag : 'i' | '';

This comment has been minimized.

@amiller-gh

amiller-gh Mar 27, 2018

Contributor

Not sure if we care, (or if this is handled in code and normalized) but the case insensitive flag is technically... case insensitive :) It can be i or I.

This comment has been minimized.

@chriseppstein

chriseppstein Mar 28, 2018

Author Collaborator

This was a good thing to note. the property always returns lowercase, but there were some other small issues around using a capital I.

*
* You can also change the quotation used for the current value by setting quoteMark.
**/
quoteValue(options?: SmartQuoteMarkOptions): string;

This comment has been minimized.

@amiller-gh

amiller-gh Mar 27, 2018

Contributor

I would like to see parity with setValue and have this be getQuotedValue. This method reads to me like you're modifying the Attribute to add quotes, not getting the quoted value.

This comment has been minimized.

@chriseppstein

chriseppstein Mar 28, 2018

Author Collaborator

👍

* then the source quote mark (this is inverted if `preferCurrentQuoteMark` is
* true). If the quoteMark is unspecified, a double quote is used.
**/
smartQuoteMark(options: PreferredQuoteMarkOptions): QuoteMark;

This comment has been minimized.

@amiller-gh

amiller-gh Mar 27, 2018

Contributor

Option: this method can be merged with preferredQuoteMark() to simply be getQuoteMark() if the options hash is optional. If no options hash is provided, it will intelligently quote as needed. If an options hash is provided, it will use the preferred settings.

This comment has been minimized.

@chriseppstein

chriseppstein Mar 28, 2018

Author Collaborator

You're describing the behavior of a private function that I have called _determineQuoteMark(options) which calls one of these other two depending on the value of the smart option. The name getQuoteMark(options) seems confusing to me because there's also a property quoteMark. But if no (or empty) options are passed to getQuoteMark(), it always returns the value of quoteMark so maybe it's fine.

* be interpreted as part of the value and escaped accordingly.
* @param value
*/
setValue(value: string, options?: SmartQuoteMarkOptions): void;

This comment has been minimized.

@amiller-gh

amiller-gh Mar 27, 2018

Contributor

Should options be SmartQuoteMarkOptions | PreferredQuoteMarkOptions? What if I want to explicitly set the quote type when setting the value?

This comment has been minimized.

@amiller-gh

amiller-gh Mar 27, 2018

Contributor

In fact, you could unify the two quote mark options types to a single type with all three as optional properties. That interface would make sense with the quote mark getters defined down below as well (if the two methods are unified as suggested).

This comment has been minimized.

@chriseppstein

chriseppstein Mar 28, 2018

Author Collaborator

SmartQuoteMarkOptions extends PreferredQuoteMarkOptions and all values are optional. So the distinction between these two types is just nominal to make clear that for some methods, the smart option, if passed, is ignored.

* @param {any} value the unescaped value of the property
* @param {string} valueEscaped optional. the escaped value of the property.
*/
appendToPropertyAndEscape(name, value, valueEscaped);

This comment has been minimized.

@amiller-gh

amiller-gh Mar 27, 2018

Contributor

Where is appendToPropertyWithoutEscape()?

This comment has been minimized.

@amiller-gh

amiller-gh Mar 27, 2018

Contributor

Actually, doesn't setPropertyAndEscape() do relatively the same thing as setPropertyWithoutEscape() if no valueEscaped arg is passed? 🤔 Do we need two methods for those?

This comment has been minimized.

@chriseppstein

chriseppstein Mar 28, 2018

Author Collaborator

where is appendToPropertyWithoutEscape()?

I had a method like that at one point, but I felt it wasn't clear that it would preserve the escaped value if there was an existing escaped value.

Actually, doesn't setPropertyAndEscape() do relatively the same thing as setPropertyWithoutEscape() if no valueEscaped arg is passed? 🤔 Do we need two methods for those?

Yes, technically, but the implementations are different and the naming felt all wrong. But after thinking about it, I think just setProperty will do with those semantics. scratch that, setProperty confuses me vs node.property = "value" which has very different semantics of escaping the value automatically. I think it's generally expected that assignment to foo should behave like setFoo(value) and I don't want to violate that expectation. Gonna just leave these as is.

* @param value the unescaped value of the property
* @param valueEscaped optional. the escaped value of the property.
*/
setPropertyAndEscape(name: string, value: any, valueEscaped: string);

This comment has been minimized.

@amiller-gh

amiller-gh Mar 27, 2018

Contributor

valueEscaped is marked as optional in the comments, but is not optional in the type definition.

chriseppstein added some commits Mar 28, 2018

Handle Comments with whitespace around combinators.
Comments were causing extra descendant combinator nodes to be created.

@chriseppstein chriseppstein force-pushed the escape-handling branch from 2b263de to e20cfbc Mar 29, 2018

chriseppstein added some commits Mar 29, 2018

@chriseppstein chriseppstein changed the title [WIP] Escape handling Escape handling Mar 30, 2018

@chriseppstein chriseppstein merged commit 9aa8646 into master Mar 30, 2018

2 of 3 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.2%) to 95.048%
Details

@chriseppstein chriseppstein referenced this pull request Mar 30, 2018

Closed

Update API.md #136

@chriseppstein chriseppstein moved this from In progress to Done in 4.0 Release Mar 30, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment