Invalid Avro file causes very slow processing #66
I have a program that receives binary files that may or may not be Avro encoded.
    import fastavro

    try:
        with open("maybe_an_avro_file", "rb") as fin:
            reader = fastavro.reader(fin)
    except Exception:
        # This file cannot be parsed as Avro; handle it differently.
        pass
However, there were a few files that caused fastavro to take a very long time trying to read the block count out of the (non-existent) schema header.
For example, a 2 MB file kept one CPU busy for nearly an hour before eventually raising a Python OverflowError.
This is not a huge file, but something in the parsing does not appear to be O(n): the first few bytes are read quickly, then throughput drops sharply.
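The thread does not say where the slowdown comes from. One possible cause, stated here only as an assumption, is an unbounded variable-length integer decode: Avro longs (such as a block count) are zigzag varints, and a decoder that keeps reading while the continuation bit is set will build an ever larger Python integer on arbitrary binary input. A minimal sketch of a bounded decoder follows; `read_long_bounded` and `MAX_VARINT_BYTES` are hypothetical names for illustration and are not part of fastavro.

```python
import io

# A 64-bit value needs at most 10 groups of 7 bits, so a valid Avro long
# never spans more than 10 bytes. (Illustrative constant, not a fastavro name.)
MAX_VARINT_BYTES = 10


def read_long_bounded(stream):
    """Decode one zigzag varint from a binary stream, giving up after 10 bytes."""
    result = 0
    shift = 0
    for _ in range(MAX_VARINT_BYTES):
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside a varint")
        byte = raw[0]
        # The low 7 bits carry data; the high bit means "more bytes follow".
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            # Undo the zigzag encoding to recover the signed value.
            return (result >> 1) ^ -(result & 1)
        shift += 7
    raise ValueError("varint longer than 10 bytes; not a valid Avro long")


if __name__ == "__main__":
    print(read_long_bounded(io.BytesIO(b"\x02")))
    try:
        read_long_bounded(io.BytesIO(b"\xff" * 64))
    except ValueError as exc:
        print("rejected:", exc)
```

With a cap like this, input that is not Avro fails after at most ten bytes instead of being consumed as one enormous integer.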
This results in a few questions:
Added a commit on Dec 8, 2016.
Avro files have a "magic" header which you can check. I've added
Yes, will investigate more.
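The "magic" header check suggested above can be sketched without any library support. Avro object container files begin with the four bytes `Obj` followed by the byte 1, so comparing the first four bytes is enough to reject most non-Avro input before handing it to a reader. The helper name `looks_like_avro` is hypothetical and is not claimed to be what the commit added.

```python
import io

# First four bytes of every Avro object container file: 'O', 'b', 'j', 1.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(fileobj):
    """Return True if a seekable binary stream starts with the Avro magic bytes.

    The stream position is restored, so the same object can be passed to a
    reader afterwards.
    """
    start = fileobj.tell()
    try:
        header = fileobj.read(len(AVRO_MAGIC))
    finally:
        fileobj.seek(start)
    return header == AVRO_MAGIC


if __name__ == "__main__":
    print(looks_like_avro(io.BytesIO(b"Obj\x01...rest of file...")))
    print(looks_like_avro(io.BytesIO(b"\x89PNG\r\n\x1a\n")))
```

In the program from the original report, this check would run on `fin` before calling `fastavro.reader(fin)`, so that files failing it go straight to the non-Avro code path. A file that passes can still be corrupt further in, so the existing `except` block remains useful.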