Nokogiri 1.5.0 on libxml2 2.7.8 reading HTML line numbers as "0" #613

Open
kwhitaker opened this Issue · 3 comments

@kwhitaker

Attempting to parse HTML files on CentOS5, running ruby 1.9.2, nokogiri 1.5.0, and libxml2 2.7.8.

Parsing a file with syntax like this:

html = Nokogiri::HTML(File.read('index.html'))
html.css("a").each {|href| puts href.line}

results in "0" for every line number. If I instead parse it as xml:

html = Nokogiri::XML(File.read('index.html'))

the line numbers will be displayed correctly. I know there was a previous issue with libxml2 2.7.3, and I also know that CentOS comes with libxml2 2.6.2. However, I've followed the tutorial for installation on the site, and built Nokogiri against libxml2 2.7.8. Here's my nokogiri -v output:

# Nokogiri (1.5.0)
    --- 
    warnings: []

    nokogiri: 1.5.0
    ruby: 
      version: 1.9.2
      platform: x86_64-linux
      description: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
      engine: ruby
    libxml: 
      binding: extension
      compiled: 2.7.8
      loaded: 2.7.8

I do still technically have libxml 2.6.2 installed on the system via yum, but it doesn't look like it's affected the nokogiri build. Is there some other step I should be using?

As an aside, if I must end up using Nokogiri::XML to parse the html, will it work with HTML4 and HTML5 documents, as well as XHTML?

Thanks.

@flavorjones
Owner

Hello!

Thanks for reporting this. I'm unable to reproduce it with:


# Nokogiri (1.5.0)
    ---
    warnings: []
    nokogiri: 1.5.0
    ruby:
      version: 1.9.2
      platform: x86_64-linux
      description: ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-linux]
      engine: ruby
    libxml:
      binding: extension
      compiled: 2.7.8
      loaded: 2.7.8

So perhaps this is a problem specific either to your HTML file (can you provide it?) or to your patchlevel of 1.9.2 (you have p0, I have p290; can you upgrade it?).
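To compare the two environments side by side, something like the following could be run on each machine. It is a rough sketch: `environment_report` is a hypothetical helper name, and it only reads constants that Ruby and Nokogiri already expose (`RUBY_PATCHLEVEL`, `Nokogiri::VERSION_INFO`).

```ruby
# Collect the interpreter and library details that differ between reports.
def environment_report
  report = {
    'ruby_version'    => RUBY_VERSION,
    'ruby_patchlevel' => RUBY_PATCHLEVEL,
    'ruby_platform'   => RUBY_PLATFORM
  }
  begin
    require 'nokogiri'
    report['nokogiri'] = Nokogiri::VERSION
    # Hash with the compiled and loaded libxml2 versions (nil if absent).
    report['libxml'] = Nokogiri::VERSION_INFO['libxml']
  rescue LoadError
    report['nokogiri'] = 'not installed'
  end
  report
end

environment_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

The patchlevel line is the one that differs between the two `nokogiri -v` outputs in this thread.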

@kwhitaker

Thanks for the response! Unfortunately, upgrading our version of Ruby isn't really an option at this time: all of our code has been built against p0, and we probably won't be upgrading it for a while.

Here is the html file:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta name="viewport" content="width=device-width; height=device-height, initial-scale=1.0; maximum-scale=1.0; user-scalable=0;" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Dockers</title>
</head>
<body class="fullpage-vert" onunload="javascript:clearInterval(audioLoop);">
<div id="container">
    <div id="danceHolder">
        <img id="danceVid" src="1-1.jpg" width="320" height="480" alt="" />
    </div>
    <div id="introHolder">
        <img id="introVid" src="0-1.jpg" width="320" height="480" alt="" />
        <div id="ctabg"></div>
        <div id="cta1"></div>
        <div id="cta2"></div>
        <div id="cta3"></div>
        <div id="phone"></div>
        <div id="logo"></div>
    </div>
</div>
</body>
</html>

@flavorjones
Owner

Well, I didn't mean "upgrade your production servers", I meant "can you try this on your dev machine with a different version of ruby". I'm trying to isolate what the cause could be, and as I mentioned before, we differ on the patchlevel of ruby we're running.

The HTML you included above doesn't appear to match well with the ruby script you included in the original post, since there are no "a" elements in it. That said, if I change the script to search for "div", I see line numbers appropriately, so we're left with either:

a) it has something to do with the version of Ruby you're on, or
b) it is something else that we don't know about yet

Please let me know if you're able to reproduce with a newer version of 1.9.2!
