Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

以openpyxl为例, 不要太信任你所使用的库 #55

Closed
ferstar opened this issue Mar 16, 2022 · 0 comments
Closed

以openpyxl为例, 不要太信任你所使用的库 #55

ferstar opened this issue Mar 16, 2022 · 0 comments
Labels

Comments

@ferstar
Copy link
Owner

ferstar commented Mar 16, 2022

最近在使用openpyxl时踩了个小坑: 遍历 Excel 时openpyxl可能会由于对行数的误判而提前终止

测试 Excel: sample.xlsx 这个 Excel 实际只有 10 行

代码(分别使用了openpyxl/pylightxl/xlrd来计算示例文件的行数):

import openpyxl
import pylightxl
import xlrd


def openpyxl_count(path):
    """
    Counts the number of rows in an Excel file with openpyxl.
    """
    wb = openpyxl.load_workbook(path)
    sheet = wb.active
    return sheet.max_row


def pylightxl_count(path):
    """
    Counts the number of rows in an Excel file with pylightxl.
    """
    db = pylightxl.readxl(path)
    ws = db.ws(db.ws_names[0])
    return ws.maxrow


def xlrd_count(path):
    """
    Counts the number of rows in an Excel file with xlrd.
    """
    wb = xlrd.open_workbook(path)
    sheet = wb.sheet_by_index(0)
    return sheet.nrows


if __name__ == '__main__':
    excel_path = 'sample.xlsx'
    print(openpyxl_count(excel_path))
    print(pylightxl_count(excel_path))
    print(xlrd_count(excel_path))

结果很奇怪, openpyxl得出14行的结论, 其他两个工具结论正确10, 实际上使用 MS office 打开也确实显示只有10

哪里出问题了呢? 我拆开这个文档, 查了下xml文件, 发现后四行确实是存在的, 只不过没有内容

也就是说从视觉上看openpyxl是错的, 但从真实数据上看他又是对的

甲方客户可不管真实底层数据的情况, 他只关心他给你了 N 行数据, 结果你给人整了 M 行, 此为坑也

排坑无非两个方案: 要么换库, 要么把源文件改造一下

我选改造源文件, 即遍历 Excel 前先把xml中隐藏的空值行全干掉, 这样就欧了, 同时也提了个 issue 给openpyxl项目: https://foss.heptapod.net/openpyxl/openpyxl/-/issues/1806

但官方不认为这是个 bug, 因为 xml 文件确实是有内容的, 只不过是几个空行, 尴尬

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant