
Capability to return a BytesIO/filelike even if it isn't encrypted? #85

@CDWimmer

Hello,

I understand that this is potentially out of scope for the project, but considering the existence of OfficeFile.is_encrypted() I feel this would tie its usage up nicely.

I'll explain a use case via example:
I am using this library to load a set of usually-encrypted Excel files into pandas. This works great, except that a handful of these Excel files have, seemingly at random, not been password protected. I don't actually care whether or not they have a password; I just want to put them all into dataframes.

Right now, the argument I pass to pandas.read_excel() is either a non-protected Excel file's Path, or a BytesIO object retrieved using this library.

This is fine but it has resulted in this messy function:

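import io
from pathlib import Path
from typing import Optional, Union

import msoffcrypto
import pandas as pd


def decrypt_office_file(file: Path, password: Optional[str] = None) -> Union[io.BytesIO, Path]:
    decrypted_file = io.BytesIO()
    with open(file, 'rb') as f:
        office_file = msoffcrypto.OfficeFile(f)
        if office_file.is_encrypted():
            # Protected: decrypt into the in-memory buffer.
            office_file.load_key(password=password)
            office_file.decrypt(decrypted_file)
        else:
            # Not protected: hand back the original path instead.
            decrypted_file = file
    return decrypted_file


excel_file = decrypt_office_file(Path("my_file.xlsx"))
df = pd.read_excel(excel_file, ...)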

And then I just have to hope everything downstream is cool with taking either a BytesIO or a str/Path, which is okay for pandas but I imagine is less okay for other libraries/use cases.

I'm not sure how it would be best to insert the functionality, but something like OfficeFile.to_bytes() (I'm sure there are better ideas for function names available) would be great, then we can have consistent return types.
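To make the idea concrete, here is a rough sketch of the behaviour I have in mind, written as a standalone helper rather than as a method on OfficeFile. The names to_bytesio() and read_office_file() are made up for illustration and are not part of the library; the only library calls assumed are the ones already used above (OfficeFile, is_encrypted, load_key, decrypt).

```python
import io
from pathlib import Path
from typing import BinaryIO, Optional, Union


def to_bytesio(office_file, raw: BinaryIO, password: Optional[str] = None) -> io.BytesIO:
    """Return the file's contents as a BytesIO, decrypting only if needed.

    `office_file` is anything exposing is_encrypted(), load_key() and
    decrypt(), e.g. the object returned by msoffcrypto.OfficeFile(raw).
    `raw` is the underlying binary stream the office file was opened from.
    """
    out = io.BytesIO()
    if office_file.is_encrypted():
        office_file.load_key(password=password)
        office_file.decrypt(out)
    else:
        # Not protected: copy the original bytes through unchanged.
        raw.seek(0)
        out.write(raw.read())
    out.seek(0)  # rewind so callers can read from the start
    return out


def read_office_file(file: Union[str, Path], password: Optional[str] = None) -> io.BytesIO:
    """Hypothetical convenience wrapper around to_bytesio() for a path on disk."""
    # Third-party dependency, imported lazily so to_bytesio() itself does not need it.
    import msoffcrypto

    with open(file, "rb") as f:
        return to_bytesio(msoffcrypto.OfficeFile(f), f, password)
```

With something like this, the caller always gets a BytesIO back, so the call site becomes pd.read_excel(read_office_file("my_file.xlsx", password)) whether or not the file was protected. The cost is that unprotected files are read fully into memory instead of being passed along as a path.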

I also find it really odd that .decrypt() takes the object you want to inject the file into as an argument, rather than returning a BytesIO object. It makes following the code flow feel awkward to me, but that's an issue for another day!
