Parse Excel from in-memory file object #1529
We came across a situation where we had a file object representing Excel data (it came from an HTTP POST, but I'm thinking it could also come from MongoDB, for example), and would have liked to pass it directly to pandas to parse, rather than saving it to disk and passing the path to pandas.
Could this be possible?
I saw that xlrd had
Don't know if that could work for openpyxl also.
Thanks for the quick reply!
It seems that
They all have the same logic which basically is:
if file_contents:
    filestr = file_contents
else:
    f = open(filename, 'rb')
    filestr = f.read()
    f.close()
So they don't check a single variable to see what type it is (
I looked at openpyxl, and they do the "file object vs path" check themselves (https://github.com/chronossc/openpyxl/blob/master/openpyxl/reader/excel.py#L43). So that would mean only doing it for xlrd.
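To make the "file object vs path" check concrete, here is a minimal sketch of doing it once before calling xlrd. The helper name `read_excel_bytes` is hypothetical (it is not part of pandas or xlrd); the idea is only that its result could then be handed to `xlrd.open_workbook(file_contents=...)`.

```python
import io
import os


def read_excel_bytes(path_or_buffer):
    """Return the raw bytes of a workbook given a path or a file-like object.

    Hypothetical helper, for illustration only.
    """
    if isinstance(path_or_buffer, (str, os.PathLike)):
        # A path: open it in binary mode and read everything.
        with open(path_or_buffer, "rb") as f:
            return f.read()
    if hasattr(path_or_buffer, "read"):
        # A file-like object (HTTP upload, BytesIO, ...): just read it.
        return path_or_buffer.read()
    raise TypeError("expected a file path or a file-like object")


# Usage with an in-memory buffer; the payload here is a placeholder.
data = read_excel_bytes(io.BytesIO(b"placeholder workbook bytes"))
```

The caller never needs to know which kind of input it received, which is the behaviour openpyxl already provides on its own side.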
Tell you what, I feel energetic, so I'll try and look into it :)
Haha well done! I didn't know we were racing ;)
Yeah didn't get a chance to work on it as much as I wanted. But I did spot:
I also saw that xlrd is going to support Excel 2007 in future versions. I don't know what you want to do about that, i.e. keep using both, or switch to only xlrd.
I didn't know about
def _excel_type(filepath_or_buffer):
    # Thanks to xlrd for this
    peeksz = 4
    if isinstance(filepath_or_buffer, str):
        f = open(filepath_or_buffer, "rb")
        peek = f.read(peeksz)
        f.close()
    elif hasattr(filepath_or_buffer, 'read') \
            and hasattr(filepath_or_buffer, 'seek'):
        f = filepath_or_buffer
        peek = f.read(peeksz)
        f.seek(0)
    else:
        raise TypeError("You must provide the path to a file "
                        "or a file-like object")
    # Check if ZIP file
    if peek == "PK\x03\x04" \
            or peek == "PK\x03\x04".encode('latin1'):  # Python 3
        return 'xlsx'
    else:
        return 'xls'
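For illustration, the same peek-at-the-first-bytes idea can be tried on in-memory buffers. This is a Python 3 sketch, not the code from the thread: the name `sniff_excel_type` and the fake payloads are assumptions, and it restores the caller's stream position instead of always rewinding to 0.

```python
import io

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature of a ZIP archive


def sniff_excel_type(buffer):
    """Guess 'xlsx' or 'xls' from the first four bytes of a seekable binary stream."""
    pos = buffer.tell()
    peek = buffer.read(len(ZIP_MAGIC))
    buffer.seek(pos)  # leave the stream where we found it
    return "xlsx" if peek == ZIP_MAGIC else "xls"


# Fake payloads: only the leading bytes matter for this check.
xlsx_like = io.BytesIO(ZIP_MAGIC + b"rest of archive")
xls_like = io.BytesIO(b"\xd0\xcf\x11\xe0" + b"rest of OLE2 file")

print(sniff_excel_type(xlsx_like))  # xlsx
print(sniff_excel_type(xls_like))   # xls
```

As in the original function, anything that is not a ZIP archive is assumed to be 'xls', so a non-Excel upload would only be rejected later by the parser itself.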
Then I would've checked the type (and I like your way of checking for file-like, ie needs a
import xlrd

# Option 1: let xlrd open the file by path
wb = xlrd.open_workbook(filename=filename)

# Option 2: read the bytes yourself and pass them in
with open(filename, 'rb') as f:
    data = f.read()
wb = xlrd.open_workbook(file_contents=data)
I guess the only advantage there is it saves having to use a tempfile and an I/O trip to the disk. But your solution has the advantage that it just works, and also is compatible with openpyxl.
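To make the extra disk round trip concrete, the tempfile route mentioned above might look roughly like this. It is a sketch only: `spill_to_tempfile` is a hypothetical helper, and the payload is a placeholder rather than a real workbook. The returned path could be passed to any reader that accepts filenames.

```python
import io
import os
import shutil
import tempfile


def spill_to_tempfile(buffer, suffix=".xlsx"):
    """Copy a file-like object to a temporary file and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(buffer, tmp)
        return tmp.name


upload = io.BytesIO(b"PK\x03\x04 fake workbook bytes")
path = spill_to_tempfile(upload)
try:
    # A path-only reader would be called here; we just read the bytes back.
    with open(path, "rb") as f:
        round_tripped = f.read()
finally:
    os.remove(path)  # clean up the temporary copy
```

This costs a write and a read on disk, but it works with every reader regardless of whether it accepts file objects.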
Thanks for taking the time!
Just a couple small things: