Error when building index from tesla_2021_10k.htm #14

Open
shenghu opened this issue Nov 22, 2023 · 0 comments


shenghu commented Nov 22, 2023

I'm trying this on a Mac and get the following error:

ValueError: 3 columns passed, passed data had 5 columns
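The message comes from pandas, not from lxml: the `DataFrame` constructor raises it when the number of labels in `columns=` differs from the width of the row data. Here is a minimal sketch that reproduces it with pandas alone, using the header row and first data row values (`data[0]`, `data[1]`) that this issue reports further down:

```python
import pandas as pd

# Header row and first data row as reported in this issue (data[0], data[1]).
header = ["", "", ""]
first_row = ["", "☒", "", "ANNUAL REPORT PURSUA...CT OF 1934", ""]

message = ""
try:
    # columns= has 3 labels but the row has 5 cells, so pandas rejects it.
    pd.DataFrame([first_row], columns=header)
except ValueError as err:
    message = str(err)

print(message)  # 3 columns passed, passed data had 5 columns
```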

The error is thrown from the following function:

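import pandas as pd


def html_to_df(html_str: str) -> pd.DataFrame:
    """Convert HTML to dataframe."""
    from lxml import html

    tree = html.fromstring(html_str)
    # Only the first <table> in the string is used.
    table_element = tree.xpath("//table")[0]
    rows = table_element.xpath(".//tr")

    data = []
    for row in rows:
        cols = row.xpath(".//td")
        cols = [c.text.strip() if c.text is not None else "" for c in cols]
        data.append(cols)

    # The first row becomes the header. pandas raises
    # "ValueError: N columns passed, passed data had M columns"
    # when the header's cell count differs from the widest data row.
    return pd.DataFrame(data[1:], columns=data[0])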
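A side note on the function as quoted: it reads each cell with `c.text`, which in lxml is only the text that appears before the cell's first child element. If a cell's text is wrapped in a tag such as `<font>` or `<span>`, `c.text` is `None` and the cell is recorded as "". Whether that is why `data[0]` comes back as all empty strings here is an assumption, since the raw markup isn't shown. This snippet uses a hypothetical cell to contrast `.text` with lxml's `text_content()`:

```python
from lxml import html

# Hypothetical cell markup: the visible text sits inside a child <font> tag.
table = html.fromstring("<table><tr><td><font>Item 1</font></td></tr></table>")
cell = table.xpath(".//td")[0]

# .text is only the text before the first child element, so it is None here,
# and html_to_df would record this cell as "".
print(cell.text)            # None
# text_content() concatenates the text of the element and all its descendants.
print(cell.text_content())  # Item 1
```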

Where

  • html_str is an HTML table whose visible text is "ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934"
  • data[0] is ['', '', '']
  • data[1] is ['', '☒', '', 'ANNUAL REPORT PURSUA...CT OF 1934', '']
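The header row (`data[0]`) has 3 cells while the first data row (`data[1]`) has 5, so `columns=data[0]` cannot line up with the data. One possible workaround is sketched here as a hypothetical `html_to_df_tolerant`, not the library's actual fix. It expands `colspan` attributes (an assumption, since the issue doesn't show whether this table uses spanning cells) and right-pads every row to the widest width before building the DataFrame. Cell text is still read with `c.text`, as in the original.

```python
import pandas as pd
from lxml import html


def _colspan(cell) -> int:
    """Return the cell's colspan as a positive int, defaulting to 1."""
    try:
        return max(int(cell.get("colspan", "1")), 1)
    except ValueError:
        return 1


def html_to_df_tolerant(html_str: str) -> pd.DataFrame:
    """Hypothetical variant of html_to_df that tolerates ragged rows."""
    tree = html.fromstring(html_str)
    table_element = tree.xpath("//table")[0]
    rows = table_element.xpath(".//tr")

    data = []
    for row in rows:
        cols = []
        for c in row.xpath(".//td"):
            text = c.text.strip() if c.text is not None else ""
            # Assumption: a cell spanning n columns is repeated n times.
            cols.extend([text] * _colspan(c))
        data.append(cols)

    if not data:
        return pd.DataFrame()

    # Right-pad every row (header included) to the widest row.
    width = max(len(r) for r in data)
    data = [r + [""] * (width - len(r)) for r in data]
    return pd.DataFrame(data[1:], columns=data[0])


sample = """
<table>
  <tr><td>A</td><td colspan="2">B</td></tr>
  <tr><td>1</td><td>2</td><td>3</td><td>4</td></tr>
</table>
"""
df = html_to_df_tolerant(sample)
print(df.shape)          # (1, 4)
print(list(df.columns))  # ['A', 'B', 'B', '']
```

Padding only makes the shapes agree; it does not guarantee that each header label sits above the matching value.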