New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibe to remove nokogiri dependency? #100

Closed
mperham opened this Issue Feb 3, 2016 · 8 comments

Comments

Projects
None yet
4 participants
@mperham

mperham commented Feb 3, 2016

Nokogiri is a huge native dependency and every Rails install pulls in nokogiri through loofah. Would it be tractable to move to Oga, which has a much smaller native component or Ox?

@flavorjones

This comment has been minimized.

Show comment
Hide comment
@flavorjones

flavorjones Feb 3, 2016

Owner

Hi Mike,

Thanks for asking this question. I think we can get to a decision point pretty quickly, but first I'd like to provide some context.

Nokogiri, through either libxml2 or xerces/nekohtml, does a decent job of validating and fixing broken markup. Building a valid DOM is actually quite valuable for the sanitization function that Loofah is performing for Rails.

Historically, most XSS (and other HTML/js) attack vectors have intentionally used invalid or broken markup to sneak through sanitization. Some historical examples are here:

https://github.com/flavorjones/loofah/blob/master/test/assets/testdata_sanitizer_tests1.dat

which are taken directly from the html5lib test suite, and which Loofah runs in its own test suite.

We run these tests in Loofah's test suite to ensure that we're adequately protecting sites; though you'll notice that Loofah itself contains no logic to actually detect or handle invalid markup -- Loofah relies entirely on Nokogiri (and therefore libxml2 and xerces) to catch and remove-or-fix these cases.

All of this is to say that, if you wanted to experiment with porting Loofah to use Oga or Ox, as long Loofah's test suite and Rails's sanitization tests all pass, then we can have a more serious conversation about whether to replace Nokogiri entirely, or else have it use a pluggable parser.

Does all that make sense?

Owner

flavorjones commented Feb 3, 2016

Hi Mike,

Thanks for asking this question. I think we can get to a decision point pretty quickly, but first I'd like to provide some context.

Nokogiri, through either libxml2 or xerces/nekohtml, does a decent job of validating and fixing broken markup. Building a valid DOM is actually quite valuable for the sanitization function that Loofah is performing for Rails.

Historically, most XSS (and other HTML/js) attack vectors have intentionally used invalid or broken markup to sneak through sanitization. Some historical examples are here:

https://github.com/flavorjones/loofah/blob/master/test/assets/testdata_sanitizer_tests1.dat

which are taken directly from the html5lib test suite, and which Loofah runs in its own test suite.

We run these tests in Loofah's test suite to ensure that we're adequately protecting sites; though you'll notice that Loofah itself contains no logic to actually detect or handle invalid markup -- Loofah relies entirely on Nokogiri (and therefore libxml2 and xerces) to catch and remove-or-fix these cases.

All of this is to say that, if you wanted to experiment with porting Loofah to use Oga or Ox, as long Loofah's test suite and Rails's sanitization tests all pass, then we can have a more serious conversation about whether to replace Nokogiri entirely, or else have it use a pluggable parser.

Does all that make sense?

@flavorjones

This comment has been minimized.

Show comment
Hide comment
@flavorjones

flavorjones Feb 3, 2016

Owner

(Note my previous comment has been slightly edited.)

Owner

flavorjones commented Feb 3, 2016

(Note my previous comment has been slightly edited.)

@mperham

This comment has been minimized.

Show comment
Hide comment
@mperham

mperham Feb 3, 2016

Yeah, you're totally reasonable and I had a feeling this was a Big Change. Removing nokogiri as a Rails dependency is on my wish list but I guess I'll have to ask Santa this year.

mperham commented Feb 3, 2016

Yeah, you're totally reasonable and I had a feeling this was a Big Change. Removing nokogiri as a Rails dependency is on my wish list but I guess I'll have to ask Santa this year.

@mperham mperham closed this Feb 3, 2016

@YorickPeterse

This comment has been minimized.

Show comment
Hide comment
@YorickPeterse

YorickPeterse Feb 9, 2016

Replacing Nokogiri with X depends on how closely Loofah is tied into the API of Nokogiri. Oga's API is quite different in a bunch of places so I doubt a simple find-replace would suffice. For those interested there's a guide for migrating from Nokogiri to Oga which can be found at http://code.yorickpeterse.com/oga/latest/file.migrating_from_nokogiri.html.

Replacing Nokogiri with X depends on how closely Loofah is tied into the API of Nokogiri. Oga's API is quite different in a bunch of places so I doubt a simple find-replace would suffice. For those interested there's a guide for migrating from Nokogiri to Oga which can be found at http://code.yorickpeterse.com/oga/latest/file.migrating_from_nokogiri.html.

@flavorjones

This comment has been minimized.

Show comment
Hide comment
@flavorjones

flavorjones Feb 10, 2016

Owner

@YorickPeterse the point I was trying to make above is that this may not be as simple as a mechanical transformation from one backend's API to another's.

The behavior of the parser is actually important to the function of sanitization, and so I'm happy to have a conversation about using other backends if we know the test suite will pass.

Owner

flavorjones commented Feb 10, 2016

@YorickPeterse the point I was trying to make above is that this may not be as simple as a mechanical transformation from one backend's API to another's.

The behavior of the parser is actually important to the function of sanitization, and so I'm happy to have a conversation about using other backends if we know the test suite will pass.

@YorickPeterse

This comment has been minimized.

Show comment
Hide comment
@YorickPeterse

YorickPeterse Feb 10, 2016

@flavorjones Sure, a working test suite is absolutely vital. Also, if the choice is ever made to switch from Nokogiri to X a set of benchmarks would be useful to ensure performance isn't suddenly 9000 times worse. Either way if any help/info is ever needed for Oga feel free to ping me on GitHub or the Twitters.

@flavorjones Sure, a working test suite is absolutely vital. Also, if the choice is ever made to switch from Nokogiri to X a set of benchmarks would be useful to ensure performance isn't suddenly 9000 times worse. Either way if any help/info is ever needed for Oga feel free to ping me on GitHub or the Twitters.

@mikbe

This comment has been minimized.

Show comment
Hide comment
@mikbe

mikbe Jul 9, 2016

No one should be using Nokogiri until they can make it work on OS X with a simple gem install nokogiri. It's an abysmal mess and they don't seem willing to fix it.

mikbe commented Jul 9, 2016

No one should be using Nokogiri until they can make it work on OS X with a simple gem install nokogiri. It's an abysmal mess and they don't seem willing to fix it.

@flavorjones

This comment has been minimized.

Show comment
Hide comment
@flavorjones

flavorjones Jul 10, 2016

Owner

@mikbe, thanks for your comment.

I've actually participated in dozens of threads and issues for literally hundreds of hours of my life over the last few years, trying to diagnose a wide variety of issues related to Nokogiri working well on OSX.

I don't remember seeing your name in any of those threads. Are you familiar with the underlying challenges that you seem to think are so simple to fix? If so, why haven't you shared your diagnoses and participating in fixing the issues?

If not, then I don't understand why you'd make a comment like this. I see ruby-talk email from you in my inbox from 2012 and 2011 that appear to be from a kind, thoughtful, empathetic individual. Why did you take time out of your day to make this rude, disrespectful, and (worst of all) contentless comment, exactly? You know there are humans who read these comments, yes? And that those people are not obligated to provide services to you?

Don't bother responding. I'm going lock this ticket. Though I would certainly welcome your help in the future, should you decide to participate constructively in the community.

Owner

flavorjones commented Jul 10, 2016

@mikbe, thanks for your comment.

I've actually participated in dozens of threads and issues for literally hundreds of hours of my life over the last few years, trying to diagnose a wide variety of issues related to Nokogiri working well on OSX.

I don't remember seeing your name in any of those threads. Are you familiar with the underlying challenges that you seem to think are so simple to fix? If so, why haven't you shared your diagnoses and participating in fixing the issues?

If not, then I don't understand why you'd make a comment like this. I see ruby-talk email from you in my inbox from 2012 and 2011 that appear to be from a kind, thoughtful, empathetic individual. Why did you take time out of your day to make this rude, disrespectful, and (worst of all) contentless comment, exactly? You know there are humans who read these comments, yes? And that those people are not obligated to provide services to you?

Don't bother responding. I'm going lock this ticket. Though I would certainly welcome your help in the future, should you decide to participate constructively in the community.

Repository owner locked and limited conversation to collaborators Jul 10, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.