BUG/TST: read_html should follow pandas conventions when creating empty data #6447

Merged
merged 1 commit into from
Feb 27, 2014

cpcloud
Member

@cpcloud cpcloud commented Feb 22, 2014

closes #5129
closes #6445

This is a very specific bug fix. The bug only shows up when using lxml, whereas
using bs4 raises an exception. lxml drops data, which allows it to parse the
MultiIndex header differently and succeed.

@cpcloud cpcloud self-assigned this Feb 23, 2014
@cpcloud
Member Author

cpcloud commented Feb 24, 2014

@jreback what do u think about this? this bug makes me think we should change the default flavor to bs4 because that forces you to say "i'm ok with dropping data" if you pick lxml, whereas bs4 will keep the data and raise b/c there's empty data in the header rows

@cpcloud
Member Author

cpcloud commented Feb 24, 2014

of course that is backwards incompatible .... but only in very few cases i think ... just not sure ... it's annoying that the two parsers will not parse the same given the same data

@cpcloud
Member Author

cpcloud commented Feb 24, 2014

regardless ... the nan should be changed to the empty string bc that's how the text parser detects empty multiindexes
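
A minimal sketch of the idea, assuming the fix amounts to padding short rows with empty strings instead of NaN. `pad_rows` is a hypothetical helper for illustration only, not a pandas function, and the sample rows are made up:

```python
def pad_rows(rows, fill=""):
    """Pad ragged rows to a common width with `fill` (hypothetical helper)."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


# Made-up header rows of unequal length, as a parser might hand them over.
header_rows = [["a", "b", "c"], ["x"]]
padded = pad_rows(header_rows)

# Empty cells are plain strings, so an equality check finds them.
empty_mask = [[cell == "" for cell in row] for row in padded]

# NaN never compares equal to itself, so the same equality check
# would not find cells that had been padded with NaN.
nan = float("nan")
nan_is_detectable = (nan == nan)
```

Padding with `""` keeps every cell a string, which is what makes the plain equality check above work.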

@jreback
Contributor

jreback commented Feb 24, 2014

no idea as I really don't use this

go with consistency if u can

@cpcloud
Member Author

cpcloud commented Feb 27, 2014

@jreback going to merge after travis passes ...

@cpcloud cpcloud closed this Feb 27, 2014
@cpcloud cpcloud deleted the read-html-float-iterable-5129 branch February 27, 2014 17:59
@jreback
Contributor

jreback commented Feb 27, 2014

sure

@cpcloud cpcloud restored the read-html-float-iterable-5129 branch February 27, 2014 18:08
@cpcloud
Member Author

cpcloud commented Feb 27, 2014

Alright I'm going to reopen this, I screwed something up with git.

@cpcloud cpcloud deleted the read-html-float-iterable-5129 branch February 27, 2014 18:10
@cpcloud cpcloud restored the read-html-float-iterable-5129 branch February 27, 2014 18:10
@cpcloud cpcloud reopened this Feb 27, 2014
cpcloud added a commit that referenced this pull request Feb 27, 2014
BUG/TST: read_html should follow pandas conventions when creating empty data
@cpcloud cpcloud merged commit fb6b803 into pandas-dev:master Feb 27, 2014
@cpcloud cpcloud deleted the read-html-float-iterable-5129 branch February 27, 2014 19:02
Successfully merging this pull request may close these issues.

read_html shouldn't fail when ISPs reroute nonexistent URLs
Exception with read_html and header
2 participants