ENH: read-html fixes #3616
This was referenced May 15, 2013
|
let me know when you need merging on any PRs |
|
this closes #3606, right? |
|
yep, that is fixed already. i might be able to get to the rest of this today, i know the 0.11.1 rls is due today...the annoyances of the parsing may have to wait tho or i might just open up the flavor argument to allow one of |
|
the main issue is the import errors..... |
|
that is also fixed in this. |
|
yep... |
|
take your time...btw |
|
ok thanks. i'm working on a cmdline interface to store neurophys data and it's due tmrw so pandas may have to wait... |
|
@jreback @y-p what do u think about removing the pure |
|
@cpcloud I would rather see correct and slow than wrong but fast! let's see...premature optimization is evil. Can always add it back in 0.12 (or after) if you discover how to fix it. And you have the flavor option, so sort of 'easy' to add it. (of course have to edit stuff to take it out...docs, install docs, docstrings...) of course if there are cases where lxml can do better (and is correct), but bombs on other cases, then you could always raise on those (but that may be more trouble than it's worth) |
|
i think the xpath implementation of lxml might be broken... :( @jreback can i leave the code for |
|
ok |
|
@jreback this ready 2 go as soon as travis passes. |
|
that is odd. travis is not running arg |
|
ah there we go |
jreback
merged commit a8723a4
into pandas-dev:master
May 20, 2013
|
@cpcloud thanks...this is great... I edited the v0.11.1 a bit (as this is new, just announcing it). I think an example is warranted. Maybe take a df, do a separate PR |
|
see this: https://www.travis-ci.org/pydata/pandas/jobs/7320947 I don't think travis was actually testing html5lib stuff....(I just added it in) add in ci/install.sh (right after bs4)....and test |
|
going to put in a separate issue |
|
ok. |
cpcloud commented May 15, 2013
Some updates and bug fixes. See release notes for more details.
- vbench stuff (sort of pointless right now since we don't really have control over the speed of the parsing library)
- Figure out why lxml chooses to ignore things (reported a bug w/ example to lxml people)
- Figure out why bs4's thead.find_all(['th', 'td']) parses differently than lxml's thead.xpath('.//thead//th|.//thead//td') even when lxml is the bs4 backend (same as above)
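
For anyone comparing the two header-cell queries mentioned above, here is a minimal sketch of how a relative descendant path behaves depending on the context element. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml (an assumption made so the snippet runs without extra installs), on a tiny made-up, well-formed table. It does not reproduce or explain the reported lxml behaviour; it only shows that a path starting with `.//thead` matches descendants of the context node, not the context node itself. The helper name `header_cells` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample table (real-world HTML is usually messier).
HTML = """
<table>
  <thead>
    <tr><th>name</th><td>unit</td></tr>
  </thead>
  <tbody>
    <tr><td>a</td><td>1</td></tr>
  </tbody>
</table>
""".strip()

table = ET.fromstring(HTML)
thead = table.find("thead")  # direct child of <table>


def header_cells(context, prefix):
    """Return the text of th cells, then td cells, found under `prefix`.

    ElementTree has no XPath union operator ('|'), so the two
    queries are run separately and concatenated.
    """
    cells = context.findall(prefix + "th") + context.findall(prefix + "td")
    return [cell.text for cell in cells]


# From <table>, './/thead//' reaches the cells inside <thead>.
from_table = header_cells(table, ".//thead//")

# From <thead> itself, './/thead//' looks for a <thead> nested below it.
from_thead_nested = header_cells(thead, ".//thead//")

# From <thead>, './/' searches its own descendants directly.
from_thead = header_cells(thead, ".//")

print(from_table, from_thead_nested, from_thead)
```

With this sample, the query anchored at the table and the plain `.//` query anchored at thead return the same cells, while `.//thead//` anchored at thead returns nothing, so it may be worth checking which element each library call is actually run against before comparing results.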