
Added byte offsets to rule sets #85

Merged (19 commits) on Mar 13, 2017

Conversation

G0dwin
Contributor

@G0dwin G0dwin commented Mar 9, 2017

I have added some basic logic to the parser to capture byte offsets when scanning for rule sets and added a field to RuleSet to retain them. This will allow code coverage and lint utilities (for example) to report where in a file a particular rule was found (see the CSS code coverage utility I'm working on here: lingua-franca/marmara).

There is still more work that needs to be done, primarily when it comes to differentiating files (from @import statement or otherwise), and when urls are expanded.

@grosser
Contributor

grosser commented Mar 9, 2017

idk about this ... kind of a neat feature ... but more work/memory for the 99% use case
only thing striking me as a bit unfortunate is the addition of more optional arguments to a few methods ... would be nice to make them keyword args ... but that's not easy without breaking existing usage ...

@akzhan thoughts ?

@akzhan
Member

akzhan commented Mar 9, 2017

It's a nice contribution, but unfinished.

We need a use case bundled with it (maybe in an examples folder, the wiki, or the README).

@akzhan
Member

akzhan commented Mar 9, 2017

Also, it's not clear how line endings are detected on the Windows platform.

We may use binary read mode to keep the file's line endings.
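As a small standalone illustration of the binary-read suggestion (not css_parser code): opening a file in 'rb' mode returns its bytes untranslated, so CRLF endings survive even on Windows, where text mode would convert "\r\n" to "\n".

```ruby
require 'tempfile'

# Write a CSS-ish snippet with CRLF line endings, then read it back in
# binary mode; the "\r\n" bytes come back exactly as written.
binary = Tempfile.create('crlf') do |f|
  f.binmode                         # avoid newline translation on write
  f.write("a{}\r\nb{}")
  f.flush
  File.open(f.path, 'rb', &:read)   # 'rb' preserves the \r bytes
end
```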

@grosser
Contributor

grosser commented Mar 9, 2017 via email

@G0dwin
Contributor Author

G0dwin commented Mar 9, 2017

@akzhan I can add documentation before merging, however, I'm not seeing a wiki, only the README.md file.

@grosser I can refactor so that we only collect offsets if an option is passed to load_uri! and try to ensure the former code path remains as close as possible to the original but only add the additional overhead when the option is present.

To get around the additional optional arguments, we could subclass RuleSet to something like FileRuleSet where the additional code for storing offsets and anything else specific to relating back to the original CSS code could be kept. In order to support import statements and multiple files in general, we will also need to store a filename.

@akzhan
Member

akzhan commented Mar 9, 2017

@G0dwin ok, we need

a) a section in README.
b) separate position-aware code.
c) replace CRLF detection with byte-precise code (File.read('rb') etc.).

@akzhan akzhan mentioned this pull request Mar 10, 2017
@G0dwin
Contributor Author

G0dwin commented Mar 13, 2017

Alright:
a) I added a section in the README
b) I added the option :capture_offsets to load_uri!, load_file!, and load_string!; we only look for, process, and store offsets if this flag is set. I also created a subclass of RuleSet called OffsetAwareRuleSet, which takes additional constructor params and stores the offsets. We only create the subclass if the flag is set.
c) I fixed CRLF detection through fixing the encoding; I didn't change the IO.read(...) call. I tested on Windows and Ubuntu machines to ensure the offsets were identical.

In addition, I added a benchmark rake task (rake benchmark) to compare the old code and the new. I don't have an isolated machine to run the tests on, but the difference seemed negligible: in the last run, parsing import.css 50,000 times took 50.7 seconds with the old code and 47.1 seconds with the new, and parsing screen.css 5,000 times from dialect.ca took 65.9 seconds before and 63.7 seconds after my change. I would attribute the decrease either to random variance or to some minor fixes I made to the code base along the way.

I also added a filename attribute to the OffsetAwareRuleSet with which I was able to get imports working with offset capturing. I removed the code I added to tests and isolated offset capturing tests into one test file.

Let me know if there are any other changes that you would like to see before merging, big or small.

Cheers,
godwin
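A minimal sketch of the approach described above (hypothetical names, not the css_parser implementation): scan for rule sets and record Regexp.last_match.offset(0) only when a capture_offsets flag is passed, so the common path pays no extra bookkeeping cost.

```ruby
# Hypothetical simplified scanner; real CSS parsing is more involved.
RULE_RE = /([^{}]+)\{([^}]*)\}/

def scan_rule_sets(css, capture_offsets: false)
  rules = []
  css.scan(RULE_RE) do
    match = Regexp.last_match
    rule = { selector: match[1].strip, declarations: match[2].strip }
    # only do the offset bookkeeping when the caller asked for it
    rule[:offset] = match.offset(0) if capture_offsets
    rules << rule
  end
  rules
end
```

With capture_offsets: true, each rule carries the [start, end) character range of its match, which a coverage or lint tool can map back to the source file.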

@akzhan akzhan requested a review from grosser March 13, 2017 05:32
@akzhan
Member

akzhan commented Mar 13, 2017

LGTM

@@ -1,2 +1,3 @@
/pkg/
/Gemfile.lock

Contributor Author

This is the .gitignore; the change was made in 0ed2f74. I'm going to take a closer look, though. I'm not as familiar with the way GitHub shows diffs, and I'm surprised it's showing this and other changes when they currently match master.

@@ -53,6 +53,22 @@ parser.add_block!(css)
parser.to_s
=> #content { font-size: 13px; line-height: 1.2; }
body { margin: 0 1em; }

# capturing byte offsets within a file
parser.load_uri!('../style.css', {:base_uri => 'http://example.com/styles/inc/', :capture_offsets => true})
Contributor

prefer 1.9 hash syntax, e.g. base_uri: 'http://'

Contributor Author

I can make this change

Member
@akzhan Mar 14, 2017

just landed readme update.

token = matches[0]

# save the regex offset so that we know where in the file we are
offset = Regexp.last_match.offset(0) if options[:capture_offsets]
Contributor

might be worth extracting that into a local variable if this code is used a lot

Contributor Author

options[:capture_offsets]? Sure.
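For reference on the snippet under review (a standalone example, not the PR's code): MatchData#offset(0) returns the [start, end) character offsets of the entire match, which is what gets stored per rule set.

```ruby
# offset(0) covers the whole match; offset(1), offset(2), ... cover groups.
match  = /\{[^}]*\}/.match("h1 { color: blue }")
offset = match.offset(0)   # the braces span characters 3...18
```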

@@ -504,11 +577,11 @@ def read_remote_file(uri) # :nodoc:

res = http.get(uri.request_uri, {'User-Agent' => USER_AGENT, 'Accept-Encoding' => 'gzip'})
src = res.body
charset = fh.respond_to?(:charset) ? fh.charset : 'utf-8'
charset = res.respond_to?(:charset) ? res.encoding : 'utf-8'
Contributor

we should get rid of this charset detection and drop 1.8 ... but that's for another PR

@@ -1,3 +1,3 @@
module CssParser
VERSION = "1.4.9".freeze
VERSION = "1.4.10".freeze
Contributor

don't change version in the PR

Contributor Author

This again seems to be coming from rebasing, I'll take a closer look.

cp_with_exceptions.load_uri!("#{@uri_base}/no-exist.xyz")
end

uri_regex = Regexp.new(Regexp.escape("#{@uri_base}/no-exist.xyz"))
assert_match uri_regex, err.message
Contributor

this might be simpler and produce a more readable error
err.message.must_include "#{@uri_base}/no-exist.xyz"

Contributor Author

This came from ed148aa

Member

fixed by e2c831e


def test_content_with_data
rule = RuleSet.new('div', '{content: url(data:image/png;base64,LOTSOFSTUFF)}')
assert_match (/image\/png;base64,LOTSOFSTUFF/), rule.to_s
Contributor

rule.to_s.must_include

Member

fixed by e2c831e also

# Returns a string.
def ignore_pattern(css, regex, options)
# if we are capturing file offsets, replace the characters with spaces to retain the original positions
return css.gsub(regex) { |m| ' ' * m.length } if options[:capture_offsets]
Contributor

might be easier to read with `if ... else ...`

Contributor Author

Sure.
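The position-preserving trick under discussion can be seen in isolation (a simplified comment-stripping regex, not the parser's actual pattern): replacing each ignored span with spaces of equal length keeps every later character at its original offset.

```ruby
css     = "/* header */ body { margin: 0 }"
# blank out the comment instead of deleting it, so offsets don't shift
blanked = css.gsub(/\/\*.*?\*\//m) { |m| ' ' * m.length }
```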

Contributor
@grosser left a comment

look ok ... lots of commits, so squash would be nice

@akzhan
Member

akzhan commented Mar 13, 2017

Some commits related to rebase (Gemfile.lock, version up), don't worry.

@akzhan akzhan merged commit 4d0249b into premailer:master Mar 13, 2017
@akzhan
Member

akzhan commented Mar 13, 2017

Just note that it's released as 1.5.0.pre.

Other updates should be proposed by other pull requests.
