Hello,
I understand that this is potentially out of scope for the project, but considering the existence of OfficeFile.is_encrypted() I feel this would tie its usage up nicely.
I'll explain a use case via example:
I am using this library to load a set of usually-encrypted Excel files into pandas. This works great, except that a handful of these files happen not to be password protected. I don't actually care whether or not they have a password; I just want to put them all into dataframes.
Right now, the argument I pass to pandas.read_excel() is either a non-protected Excel file's Path or a BytesIO object retrieved using this library.
This is fine but it has resulted in this messy function:

    import io
    from pathlib import Path
    from typing import Optional, Union

    import msoffcrypto
    import pandas as pd

    def decrypt_office_file(file: Path, password: Optional[str] = None) -> Union[io.BytesIO, Path]:
        decrypted_file = io.BytesIO()
        with open(file, 'rb') as f:
            office_file = msoffcrypto.OfficeFile(f)
            if office_file.is_encrypted():
                office_file.load_key(password=password)
                office_file.decrypt(decrypted_file)
                decrypted_file.seek(0)  # rewind so readers start at the beginning
            else:
                # Not encrypted: hand back the original path instead of a buffer
                decrypted_file = file
        return decrypted_file

    excel_file = decrypt_office_file(Path("my_file.xlsx"))
    df = pd.read_excel(excel_file, ...)

And then I just have to hope everything downstream is cool with taking either a BytesIO or a str/Path, which is okay for pandas but I imagine is less okay for other libraries/use cases.
I'm not sure how it would be best to insert the functionality, but something like OfficeFile.to_bytes() (I'm sure there are better ideas for function names available) would be great, then we can have consistent return types.
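To make the request concrete, here is a rough sketch of the behaviour I have in mind, written as a standalone helper rather than a method. The name `office_file_to_bytes` and its parameters are my own invention and not part of msoffcrypto; the only library calls it assumes are `is_encrypted()`, `load_key()` and `decrypt()`, as used in my function above.

```python
import io
from typing import BinaryIO, Optional


def office_file_to_bytes(office_file, source: BinaryIO,
                         password: Optional[str] = None) -> io.BytesIO:
    """Hypothetical helper: always return a BytesIO, encrypted or not.

    `office_file` is assumed to behave like msoffcrypto.OfficeFile(source),
    and `source` is the open binary handle it was created from.
    """
    buffer = io.BytesIO()
    if office_file.is_encrypted():
        # Encrypted: let the library write the decrypted bytes into the buffer.
        office_file.load_key(password=password)
        office_file.decrypt(buffer)
    else:
        # Not encrypted: copy the raw bytes through unchanged.
        source.seek(0)
        buffer.write(source.read())
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer


# Intended usage (assumes msoffcrypto is installed):
#
#     import msoffcrypto
#     with open("my_file.xlsx", "rb") as f:
#         excel_file = office_file_to_bytes(msoffcrypto.OfficeFile(f), f, password)
```

With something like this, the caller gets a BytesIO in both cases and no longer has to branch on the return type.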
I also find it really odd that .decrypt() takes the object you want to write the file into as an argument, rather than returning a BytesIO object. It makes the code flow feel awkward to follow, but that's an issue for another day!