
Added byte offsets to rule sets #85

Merged (19 commits) on Mar 13, 2017

Conversation

G0dwin
Contributor

@G0dwin G0dwin commented Mar 9, 2017

I have added some basic logic to the parser to capture byte offsets when scanning for rule sets and added a field to RuleSet to retain them. This will allow code coverage and lint utilities (for example) to report where in a file a particular rule was found (see the CSS code coverage utility I'm working on here: lingua-franca/marmara).

There is still more work that needs to be done, primarily when it comes to differentiating files (from @import statement or otherwise), and when urls are expanded.

@grosser
Contributor

grosser commented Mar 9, 2017

idk about this ... kind of a neat feature ... but more work/memory for the 99% use case
only thing striking me as a bit unfortunate is the addition of more optional arguments to a few methods ... would be nice to make them keyword args ... but that's not easy without breaking existing usage ...

@akzhan thoughts ?

@akzhan
Member

akzhan commented Mar 9, 2017

It's a nice contribution, but unfinished.

We need a use case bundled with it (maybe in an examples folder, the wiki, or the README).

@akzhan
Member

akzhan commented Mar 9, 2017

Also, it's not clear how line endings are detected on the Windows platform.

We may use binary read mode to keep the file's line endings.
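As a small standalone illustration of the binary-read suggestion (not css_parser code): opening a file in 'rb' mode returns its bytes untranslated, so CRLF endings survive even on Windows, where text mode would convert "\r\n" to "\n".

```ruby
require 'tempfile'

# Write a CSS-ish snippet with CRLF line endings, then read it back in
# binary mode; the "\r\n" bytes come back exactly as written.
binary = Tempfile.create('crlf') do |f|
  f.binmode                         # avoid newline translation on write
  f.write("a{}\r\nb{}")
  f.flush
  File.open(f.path, 'rb', &:read)   # 'rb' preserves the \r bytes
end
```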

@grosser
Contributor

grosser commented Mar 9, 2017 via email

@G0dwin
Contributor Author

G0dwin commented Mar 9, 2017

@akzhan I can add documentation before merging, however, I'm not seeing a wiki, only the README.md file.

@grosser I can refactor so that we only collect offsets if an option is passed to load_uri! and try to ensure the former code path remains as close as possible to the original but only add the additional overhead when the option is present.

To get around the additional optional arguments, we could subclass RuleSet to something like FileRuleSet where the additional code for storing offsets and anything else specific to relating back to the original CSS code could be kept. In order to support import statements and multiple files in general, we will also need to store a filename.

@akzhan
Member

akzhan commented Mar 9, 2017

@G0dwin ok, we need

a) a section in README.
b) separate position-aware code.
c) replace CRLF detection with byte-precise code (File.read('rb') etc.).

@akzhan akzhan mentioned this pull request Mar 10, 2017
@G0dwin
Contributor Author

G0dwin commented Mar 13, 2017

Alright:
a) I added a section in the README
b) I added the option :capture_offsets to load_uri!, load_file!, and load_string!; we only look for, process, and store offsets if this flag is set. I also created a subclass of RuleSet called OffsetAwareRuleSet, which takes additional constructor params and stores the offsets. We only create the subclass if the flag is set.
c) I fixed CRLF detection through fixing the encoding; I didn't change the IO.read(...) call. I tested on Windows and Ubuntu machines to ensure the offsets were identical.

In addition, I added a benchmark rake task (rake benchmark) to compare the old code and the new. I don't have an isolated machine to run the tests on, but the difference seemed negligible: in the last run, parsing import.css 50,000 times took 50.7 seconds with the old code and 47.1 seconds with the new, and parsing screen.css 5,000 times from dialect.ca took 65.9 seconds before and 63.7 seconds after my change. I would attribute the decrease either to random variance or to some minor fixes I made to the code base along the way.

I also added a filename attribute to the OffsetAwareRuleSet with which I was able to get imports working with offset capturing. I removed the code I added to tests and isolated offset capturing tests into one test file.

Let me know if there are any other changes that you would like to see before merging, big or small.

Cheers,
godwin
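A minimal sketch of the approach described above (hypothetical names, not the css_parser implementation): scan for rule sets and record Regexp.last_match.offset(0) only when a capture_offsets flag is passed, so the common path pays no extra bookkeeping cost.

```ruby
# Hypothetical simplified scanner; real CSS parsing is more involved.
RULE_RE = /([^{}]+)\{([^}]*)\}/

def scan_rule_sets(css, capture_offsets: false)
  rules = []
  css.scan(RULE_RE) do
    match = Regexp.last_match
    rule = { selector: match[1].strip, declarations: match[2].strip }
    # only do the offset bookkeeping when the caller asked for it
    rule[:offset] = match.offset(0) if capture_offsets
    rules << rule
  end
  rules
end
```

With capture_offsets: true, each rule carries the [start, end) character range of its match, which a coverage or lint tool can map back to the source file.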

@akzhan akzhan requested a review from grosser March 13, 2017 05:32
@akzhan
Member

akzhan commented Mar 13, 2017

LGTM

@@ -1,2 +1,3 @@
/pkg/
/Gemfile.lock

Contributor Author

This is the .gitignore; the change was made in 0ed2f74. I'm going to take a closer look, though. I'm not as familiar with the way GitHub shows diffs, and I'm surprised it's showing this and other changes when they currently match master.

@@ -53,6 +53,22 @@ parser.add_block!(css)
parser.to_s
=> #content { font-size: 13px; line-height: 1.2; }
body { margin: 0 1em; }

# capturing byte offsets within a file
parser.load_uri!('../style.css', {:base_uri => 'http://example.com/styles/inc/', :capture_offsets => true})
Contributor

prefer 1.9 hash syntax, e.g. base_uri: 'http://'

Contributor Author

I can make this change

Member
@akzhan Mar 14, 2017

just landed readme update.

token = matches[0]

# save the regex offset so that we know where in the file we are
offset = Regexp.last_match.offset(0) if options[:capture_offsets]
Contributor

might be worth extracting that into a local variable if this code is used a lot

Contributor Author

options[:capture_offsets]? Sure.
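For reference on the snippet under review (a standalone example, not the PR's code): MatchData#offset(0) returns the [start, end) character offsets of the entire match, which is what gets stored per rule set.

```ruby
# offset(0) covers the whole match; offset(1), offset(2), ... cover groups.
match  = /\{[^}]*\}/.match("h1 { color: blue }")
offset = match.offset(0)   # the braces span characters 3...18
```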

@@ -504,11 +577,11 @@ def read_remote_file(uri) # :nodoc:

res = http.get(uri.request_uri, {'User-Agent' => USER_AGENT, 'Accept-Encoding' => 'gzip'})
src = res.body
charset = fh.respond_to?(:charset) ? fh.charset : 'utf-8'
charset = res.respond_to?(:charset) ? res.encoding : 'utf-8'
Contributor

we should get rid of this charset detection and drop 1.8 ... but that's for another PR

@@ -1,3 +1,3 @@
module CssParser
VERSION = "1.4.9".freeze
VERSION = "1.4.10".freeze
Contributor

don't change version in the PR

Contributor Author

This again seems to be coming from rebasing, I'll take a closer look.

cp_with_exceptions.load_uri!("#{@uri_base}/no-exist.xyz")
end

uri_regex = Regexp.new(Regexp.escape("#{@uri_base}/no-exist.xyz"))
assert_match uri_regex, err.message
Contributor

this might be simpler and produce a more readable error
err.message.must_include "#{@uri_base}/no-exist.xyz"

Contributor Author

This came from ed148aa

Member

fixed by e2c831e


def test_content_with_data
rule = RuleSet.new('div', '{content: url(data:image/png;base64,LOTSOFSTUFF)}')
assert_match (/image\/png;base64,LOTSOFSTUFF/), rule.to_s
Contributor

rule.to_s.must_include

Member

fixed by e2c831e also

# Returns a string.
def ignore_pattern(css, regex, options)
# if we are capturing file offsets, replace the characters with spaces to retain the original positions
return css.gsub(regex) { |m| ' ' * m.length } if options[:capture_offsets]
Contributor

might be easier to read with `if ... else ...`

Contributor Author

Sure.
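The position-preserving trick under discussion can be seen in isolation (a simplified comment-stripping regex, not the parser's actual pattern): replacing each ignored span with spaces of equal length keeps every later character at its original offset.

```ruby
css     = "/* header */ body { margin: 0 }"
# blank out the comment instead of deleting it, so offsets don't shift
blanked = css.gsub(/\/\*.*?\*\//m) { |m| ' ' * m.length }
```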

Contributor
@grosser left a comment

look ok ... lots of commits, so squash would be nice

@akzhan
Member

akzhan commented Mar 13, 2017

Some commits related to rebase (Gemfile.lock, version up), don't worry.

@akzhan akzhan merged commit 4d0249b into premailer:master Mar 13, 2017
@akzhan
Member

akzhan commented Mar 13, 2017

Just note that it's released as 1.5.0.pre.

Other updates should be proposed by other pull requests.
