BUG: read_html does not correctly parse the header of non-string columns #5048

Closed
alefnula opened this issue Sep 29, 2013 · 2 comments · Fixed by #4770


@alefnula
Contributor

commented Sep 29, 2013

I suspect the problem is that the data is parsed first and the header row is selected out afterwards. When a column has a numeric dtype, the cell that should become the column name is not a valid number, so it becomes NaN.
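A minimal pure-Python sketch of the suspected failure mode follows. It is not pandas' actual code: `coerce_column`, `parse_then_split` and `split_then_parse` are hypothetical names, and the type-inference rule (convert a column to float if any cell parses as a number, turning the rest into NaN) is a simplifying assumption. It only shows why the order "infer types, then pick the header" loses the `Year` label while "pick the header, then infer types" keeps it.

```python
import math

def coerce_column(values):
    """Hypothetical, simplified dtype inference for one column.

    If any cell parses as a number, the column becomes float and every
    unparseable cell becomes NaN; otherwise the strings are kept.
    """
    def to_float(v):
        try:
            return float(v)
        except ValueError:
            return math.nan

    converted = [to_float(v) for v in values]
    if all(math.isnan(x) for x in converted):
        return list(values)  # nothing numeric: keep as strings
    return converted

rows = [
    ["Country", "Municipality", "Year"],
    ["Ukraine", "Odessa", "1944"],
]

def parse_then_split(rows):
    # Suspected buggy order: infer types over ALL rows, then take row 0 as header.
    columns = [coerce_column(col) for col in zip(*rows)]
    table = [list(r) for r in zip(*columns)]
    return table[0], table[1:]

def split_then_parse(rows):
    # Safer order: take row 0 as header first, then infer types on the body only.
    header, body = rows[0], rows[1:]
    columns = [coerce_column(col) for col in zip(*body)]
    return header, [list(r) for r in zip(*columns)]

print(parse_then_split(rows)[0])  # ['Country', 'Municipality', nan]
print(split_then_parse(rows)[0])  # ['Country', 'Municipality', 'Year']
```

In the first ordering the `Year` column contains one numeric cell, so the header text `Year` is coerced along with the data and comes out as NaN, matching the output reported below.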

Sample data:

import io
import pandas as pd

data1 = io.StringIO(u'''<table>
    <thead>
        <tr>
            <th>Country</th>
            <th>Municipality</th>
            <th>Year</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Ukraine</td>
            <th>Odessa</th>
            <td>1944</td>
        </tr>
    </tbody>
</table>''')
data2 = io.StringIO(u'''
<table>
    <tbody>
        <tr>
            <th>Country</th>
            <th>Municipality</th>
            <th>Year</th>
        </tr>
        <tr>
            <td>Ukraine</td>
            <th>Odessa</th>
            <td>1944</td>
        </tr>
    </tbody>
</table>''')

Output:

>>> pd.read_html(data1)[0]
   Country Municipality  Year
0  Ukraine       Odessa  1944
>>> pd.read_html(data2, header=0)[0]
0  Country Municipality   NaN
1  Ukraine       Odessa  1944

@ghost assigned cpcloud Sep 29, 2013

@cpcloud

Member

commented Sep 29, 2013

@alefnula Excellent. You essentially wrote the test for me :)

@cpcloud

Member

commented Sep 29, 2013

Great, this is now fixed in my refactor ... didn't have to do anything :)
