Invalid Avro file causes very slow processing #66
I have a program that receives binary files that may or may not be Avro-encoded.
    import fastavro

    try:
        with open("maybe_an_avro_file", "rb") as fin:
            reader = fastavro.reader(fin)
    except Exception:
        # This file cannot be parsed as Avro; handle it differently.
        pass
However, there were a few files that caused fastavro to take a very long time trying to read the block count out of the (non-existent) schema header.
For example, a 2 MB file kept one CPU busy for nearly an hour before eventually raising a Python OverflowError.
This is not a huge file, but something in the read path seems to be worse than O(n): the first few bytes are read quickly, then performance drops sharply.
This results in a few questions:
Avro files have a "magic" header which you can check. I've added
Yes, will investigate more.
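For anyone landing here with the same problem, the magic-header check mentioned above can be sketched with the standard library alone. This is a minimal sketch, not fastavro's own implementation: `looks_like_avro` and `AVRO_MAGIC` are hypothetical names introduced for illustration. The only fact it relies on is that the Avro specification defines an object container file as starting with the four bytes `Obj` followed by `0x01`.

```python
# First four bytes of an Avro object container file, per the Avro
# specification: ASCII "Obj" followed by the byte 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(path):
    """Return True if the file at `path` starts with the Avro magic bytes.

    Hypothetical helper, not part of fastavro. A True result only means
    the four-byte prefix matches; the rest of the file may still be
    invalid, so the try/except around the reader is still needed.
    """
    with open(path, "rb") as fin:
        # Files shorter than four bytes yield a shorter read and compare unequal.
        return fin.read(len(AVRO_MAGIC)) == AVRO_MAGIC
```

Calling `looks_like_avro("maybe_an_avro_file")` before `fastavro.reader(fin)` lets files without the magic prefix go straight to the fallback path, so the parser never starts decoding arbitrary bytes as a header.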