Decoding messages with an unexpected bit_arr length silently fails returning None #46
The only fix I can see for this currently is to check the length of the bit array before each access, which would still cause a significant slowdown... Perhaps this is an issue the bitarray library needs to solve.
Figured out why the error is not being returned: the `silent` parameter for decoding defaults to `True`, hiding errors...

```python
def decode(self, silent: bool = True) -> Optional["AISMessage"]:
    """
    Decode the message content.
    @param silent: Boolean. If set to true errors are ignored and None is returned instead
    """
    msg = AISMessage(self)
    try:
        msg.decode()
    except Exception as e:
        if not silent:
            raise e
    return msg
```
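For anyone running into this, here is a minimal sketch of how that flag changes behaviour, assuming the v1 `NMEAMessage` API and the `decode` signature quoted above (the example sentence is the short Type 21 sentence discussed later in this thread):

```python
from pyais import NMEAMessage

# An AIVDM sentence as raw bytes (here: the truncated Type 21 payload from this issue).
raw = b"!AIVDM,1,1,,B,E>lt;KLab21@1bb@I@@@@@@@@@@D8k2tnmvs000003v0@,2*52"

msg = NMEAMessage(raw)

decoded = msg.decode()              # default: exceptions during decoding are swallowed
decoded = msg.decode(silent=False)  # re-raises the underlying exception instead
```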
I'm thinking the issue is that the message being decoded is part of a multipart message, so even though the first section of data is fine, the missing parts still result in an error being thrown.
Never mind, getting a lot of errors with single part messages too.
The fact that any exception that occurs during decoding is caught and thrown as
So the issue with this and others appears to be that it's missing the last bit for the
So looking over everything: testing with 6 million records, type 21 reports make up about 1% of the total, while roughly 40% of type 21 reports encounter this error, making type 21 reports around 50x more error-prone than other types. Not sure if this is just due to type 21 reports being unlucky or if there is something funky with that last bit causing valid messages to be dropped... On the one hand the bit is missing, so it makes sense for the library to drop the message; on the other hand the number of messages failing to decode because of it seems to indicate that perhaps that bit isn't required or strictly sent all the time.
Hey @Inrixia, thank you for bringing this up. Sadly, I am aware of this issue. This is why I started to refactor the whole project a while ago. I began to implement a more reliable and generous approach for encoding messages. You can see the current state by looking at encode.py. I still need to figure out how to refactor the existing decoding part without breaking too many things.
Ah thanks, it certainly is an interesting problem. Having had a brief look over the rewrite, it looks promising. Ideally it would be possible to optionally allow messages missing bits to be decoded, with the missing values set to None, especially if the checksums are valid. As for the refactor, I would think it's best to increment the major version number and just accept breaking changes. It would be a good opportunity to have things like the boolean types actually return booleans instead of 0 or 1. Currently I am having to cast boolean and enum types to their respective boolean and int values after decoding, and to combine NMEA message header information with the decoded information, as message decoding does not return information like the talker and fragment count etc. Curious on your thoughts and plans for where you want to go with it.
Part of the underlying issue is that the NMEA specification is not freely available. Therefore, many decoders and encoders rely on unofficial resources like https://gpsd.gitlab.io/gpsd/AIVDM.html. According to this specification, Type 21 messages should have between 272 and 360 bits:
So it seems that your example message "!AIVDM,1,1,,B,E>lt;KLab21@1bb@I@@@@@@@@@@D8k2tnmvs000003v0@,2*52" is actually invalid. In fact, other online decoders such as http://ais.tbsalling.dk/ fail to decode this message for this exact reason.
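As a quick sanity check, and assuming the standard six-bit payload armoring described in the AIVDM document linked above, the bit length of that payload can be computed directly (plain arithmetic, not pyais code):

```python
# Payload and fill-bit count copied from the example sentence above.
payload = "E>lt;KLab21@1bb@I@@@@@@@@@@D8k2tnmvs000003v0@"
fill_bits = 2  # the "2" field just before the checksum in "...,2*52"

bit_length = len(payload) * 6 - fill_bits   # 6 bits per armored character, minus padding
print(bit_length, bit_length >= 272)        # shorter than the 272-bit minimum for Type 21
```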
Yep, this is the conclusion I came to. As for why roughly 50% of type 21 messages encounter this I don't know; it's certainly weird, but I do believe that failing to decode due to the missing bit is expected behavior with the current state of the library. Though, as mentioned above, having the ability to decode messages with a valid checksum and None values where the bits end early could be beneficial. At the same time, allowing these messages to be decoded could let actually corrupt messages through, which is why it would need to be optional and not default behavior.
Generally I think that it is best to be very liberal in what to accept, as the standard seems to be mostly a rough guideline for most encoders. :-D To allow for more liberal decoding I am planning to slightly change the way of accessing parts of the message. If we look at the decoding now, we can see that I am currently slicing the bitarray explicitly:

```python
return {
    'type': get_int_from_data(0, 6),
    'repeat': get_int_from_data(6, 8),
    'mmsi': get_mmsi(bit_arr, 8, 38),
    'status': NavigationStatus(get_int_from_data(38, 42)),
    'turn': get_int_from_data(42, 50, signed=True),
    'speed': get_int_from_data(50, 60) / 10.0,
    'accuracy': bit_arr[60],
    'lon': get_int_from_data(61, 89, signed=True) / 600000.0,
    'lat': get_int_from_data(89, 116, signed=True) / 600000.0,
    'course': get_int_from_data(116, 128) * 0.1,
    'heading': get_int_from_data(128, 137),
    'second': get_int_from_data(137, 143),
    'maneuver': ManeuverIndicator(get_int_from_data(143, 145)),
    'raim': bit_arr[148],
    'radio': get_int_from_data(149, len(bit_arr)),
}
```

By doing so, it is obvious that the lib will run into an error if the bitarray is shorter than expected. Instead, I think that it would be better to decode each part of the message until the end of the bitarray is reached. If the message is too short for every field, the remaining fields would be set to some default value. One could also set a flag on the decoded message.
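A rough sketch of that idea (this is not the pyais implementation; the field names, widths and the `partial` flag below are purely illustrative): walk an ordered list of (name, width) field specs, stop once the bitarray runs out, fill the rest with None and record a flag:

```python
from typing import Any, Dict, List, Tuple

from bitarray import bitarray
from bitarray.util import ba2int

# Illustrative field layout only (names/widths loosely follow a Type 1 report);
# the real message definitions live in pyais itself.
FIELDS: List[Tuple[str, int]] = [
    ("type", 6), ("repeat", 2), ("mmsi", 30), ("status", 4),
    ("turn", 8), ("speed", 10), ("accuracy", 1),
]

def partial_decode(bit_arr: bitarray) -> Dict[str, Any]:
    result: Dict[str, Any] = {"partial": False}
    cursor = 0
    for name, width in FIELDS:
        if result["partial"] or cursor + width > len(bit_arr):
            # The bitarray ran out: this and every following field is missing.
            result["partial"] = True
            result[name] = None
            continue
        result[name] = ba2int(bit_arr[cursor:cursor + width])
        cursor += width
    return result
```

A consumer then only needs to check `result["partial"]` before trusting the trailing fields, which is exactly the flag-based check discussed above.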
My thoughts exactly. Though I would use None over a default to make it clear that no value was decoded; using a default would be misleading.
Using None has the disadvantage that a user of the library would need to perform a None check explicitly for every message. This may lead to bugs and a lot of boilerplate code.
Also, type signatures would become messy.
Assuming partial decoding is enabled by default, I would assume that a flag would be provided on each decoded object if it failed to completely decode (easy to do, just set/remove it at the end). So a user would only need to check the flag, or set an option to have default values returned. But returning defaults is worse imo, as it would allow a user to not check the flag and treat invalid data as valid. Not to mention that with defaults you have no way to identify which fields failed to decode.
I will need to think about this when the time comes. But this is a fairly small implementation detail, which can be changed easily later on. As my time is limited, I am planning the following steps:
Nevertheless, I am thinking about a quick fix for this issue: we could change the integer extraction helper so that it clamps its indices to the available bits:

```python
def get_int(data: bitarray, ix_low: int, ix_high: int, signed: bool = False) -> int:
    if len(data) < ix_low:
        return 0
    ix_high = min(ix_high, len(data) - 1)
    shift: int = (8 - ((ix_high - ix_low) % 8)) % 8
    data = data[ix_low:ix_high]
    i: int = from_bytes_signed(data) if signed else from_bytes(data)
    return i >> shift
```

Also, all explicit slices of the bitarray would be replaced with a call to this function, so a bitarray that is too short would no longer raise.
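For illustration only (not library code), the explicit accesses from the Type 1 snippet further up would then go through the helper instead of raw indexing. This sketch relies on the `get_int` defined above and a `bit_arr` already in scope:

```python
# Before: raw indexing that raises IndexError on a short bitarray.
# accuracy = bit_arr[60]
# raim = bit_arr[148]

# After: every access goes through the clamped helper.
accuracy = get_int(bit_arr, 60, 61)
raim = get_int(bit_arr, 148, 149)
lon = get_int(bit_arr, 61, 89, signed=True) / 600000.0
```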
I would be happy to assist in the rewrite, if you want help. That sounds like a good plan. But I don't think a quick fix of returning 0 would work, as 0 is a valid value for many properties, especially considering that boolean values are currently returned as ints, where 0 is itself meaningful. The best approach is definitely the new method of decoding mentioned above, accessing the array from the lowest index up, then catching any exception that occurs and optionally returning a partial object with an error flag set, or throwing it further up if not. Of course this would require some consideration of whether None should be explicitly set for invalid properties or if the key/value pair should just be missing, due to how Python treats those two cases differently.
The main thing is being able to identify bad/failed properties if partial decoding is enabled. For me at least, I can only allow either a fully decoded object or one where I know exactly which properties failed or are missing. Having a partial one where I don't know which properties could be invalid would make the object useless, as I couldn't trust the data.
Booleans and integers compare equal in Python (bool is a subclass of int), so I don't see any need to return a boolean:

```python
>>> 1 == True
True
>>> 0 == False
True
```
When accessed by other libraries or serialized out to JSON it is interpreted as an integer though, requiring casting beforehand.
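A small illustration of that point (plain Python with a stand-in enum, nothing pyais-specific): once the data is serialized, 1 and True are no longer interchangeable, and an IntEnum member comes out as its raw integer value:

```python
import json
from enum import IntEnum

class DemoManeuver(IntEnum):  # stand-in enum, not a pyais class
    NOT_AVAILABLE = 0

print(json.dumps({"raim": 1}))                               # {"raim": 1}
print(json.dumps({"raim": True}))                            # {"raim": true}
print(json.dumps({"maneuver": DemoManeuver.NOT_AVAILABLE}))  # {"maneuver": 0}
```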
I finished my work on adding support for encoding messages. So now there are classes for every message, which should make an iterative decoding approach quite feasible. I hope that I will find time to work on this soon.
Sounds good. I would be happy to implement partial decoding myself and open a pull request, but it looks like only the encoding is using the new classes, so I will hold off until I know what you intend for the decoding.
Hey @Inrixia, I added a very basic proof of concept on how we could reuse the existing classes used for encoding. See the following method: Line 245 in d643ebe
Actually, it would be fairly easy to reuse the classes without too many changes. I am thinking of the following:
What do you think?
Looks good
I implemented most of the logic. So the new structure of the project is basically done. The following things still need some love:
But most of the work should be done. Overall I am quite happy with the result. I really like the fact that the message classes are somewhat declarative and not procedural anymore. Instead of telling the decoder how to slice the bitarray for every field, each message class now simply declares its fields:

```python
class MessageType1:
    msg_type = bit_field(6, int, default=1)
    repeat = bit_field(2, int, default=0)
    mmsi = bit_field(30, int, from_converter=from_mmsi, to_converter=to_mmsi)
    ...
```
I also decided to publish the refactored library under a different name. Currently, there are at least 11 packages that depend on this lib, and because the interface changed significantly this could cause some irritation/confusion. I am thinking about the name
Awesome, is the refactor just going to be on a separate branch then?
Not sure yet. I am thinking about moving the current state to a different branch (e.g. version-1). Then main would be the place for the new, active development.
@Inrixia Sooo. Everything should be done. You can preview the newest version by installing: May I ask you to install this version and mess around with it? I would love to get some feedback on bugs or errors that I may have missed. :-)
Happy to test it and see how things go. Could I get a link to the repo/branch that contains the new code? Thanks :)
Yep: https://github.com/M0r13n/pyais/tree/iterative-decoding; you can also install it with pip:
Have tested using the new library on a few messages and looked into how it will integrate. Have not done an extensive test across message types to check for errors yet. The good:
The bad:
Overall it's a massive improvement, and other than the few kinks to sort out with enums it's looking great. Please let me know your thoughts.
Oh, I should also note that currently I am merging the NMEA header information into the decoded result like so:

```python
def resultFromNMEA(NMEAMessage):
    result = {}
    result["validChecksum"] = NMEAMessage.is_valid
    result["message_fragments"] = NMEAMessage.message_fragments
    result["fragment_number"] = NMEAMessage.fragment_number
    # Named ais_id header_type to remove confusion between ais_id and message_id
    # ais_id (header_type) is the ais message type
    # message_id is incremented for each new multi-fragment message sent by a vessel.
    # It allows a decoding program to match together fragments that belong to the same message.
    result["header_type"] = NMEAMessage.ais_id
    result["NMEATalker"] = NMEAMessage.talker.value
    result["NMEAType"] = NMEAMessage.type
    result["message_id"] = NMEAMessage.message_id
    result["channel"] = NMEAMessage.channel
    return result
```

and then combining that dict with the decoded message. Might be nice to have this merging supported directly by the library, as the values provided by the NMEA layer are not part of the decoded message.
@Inrixia Thanks for your detailed reply. :-)
That's not so easy to achieve, because passing
True. But on the other hand the classes themselves consume less space and attribute access is faster.
I would like to not do this. The beauty of the current solution is that all messages are self-contained in a single class and every field is defined through a single attribute. If we would define fields elsewhere, that would be lost. I did the test and benchmarked the following options:
I think
Not really. But I came up with a compromise:

```python
def asdict(self, enum_as_int: bool = False) -> typing.Dict[str, typing.Any]:
    """
    Convert the message to a dictionary.
    @param enum_as_int: If set to True all Enum values will be returned as raw ints.
    @return: The message as a dictionary.
    """
    if enum_as_int:
        d = {slt: int(getattr(self, slt)) if slt in ENUM_FIELDS else getattr(self, slt) for slt in self.__slots__}
    else:
        d = {slt: getattr(self, slt) for slt in self.__slots__}
    return d
```

Using this approach you can get the values as ints by passing enum_as_int=True to asdict().
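Usage would then look roughly like this (a sketch only; `nmea_msg` stands in for whatever NMEA message object is being decoded):

```python
decoded = nmea_msg.decode()                     # decoded message instance returned by pyais
with_enums = decoded.asdict()                   # enum fields stay IntEnum members
plain_ints = decoded.asdict(enum_as_int=True)   # enum fields flattened to plain ints
```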
Your example is somewhat application-specific. I added a more general implementation:

```python
def decode_and_merge(self, enum_as_int: bool = False) -> Dict[str, Any]:
    """
    Decodes the message and returns the result as a dict together with all attributes of
    the original NMEA message.
    @param enum_as_int: Set to True to treat IntEnums as pure integers
    @return: A dictionary that holds all fields, defined in __slots__ + the decoded msg
    """
    rlt = self.asdict()
    del rlt['bit_array']
    decoded = self.decode()
    rlt.update(decoded.asdict(enum_as_int))
    return rlt
```
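So a pipeline could flatten a sentence into one record roughly like this (a sketch, assuming `decode_and_merge` is exposed on the NMEA message class as quoted above):

```python
from typing import Any, Dict

from pyais import NMEAMessage

def to_record(raw: bytes) -> Dict[str, Any]:
    """Turn one raw AIVDM sentence into a flat dict of NMEA fields plus decoded payload fields."""
    msg = NMEAMessage(raw)
    return msg.decode_and_merge(enum_as_int=True)
```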
Can you try it again? :-)
Yes, the enum only exists in the context of Python, so as soon as you want to write the data out to anything outside of Python, i.e. a file or database, the enum is turned into an int when serializing; there is a big difference between a default int value and None. Also, just to be sure, when you state
I would argue that this warrants actually checking the performance difference, since presumably any system that is concerned with performance will be bulk processing or streaming data and accessing the entire object. But since using non-slotted classes would require casting the enums to ints after the fact anyway, it's potentially better to just go with the approach above. Also, to improve performance of
This is pretty good performance. Assuming those numbers are accurate, looking at my tests that would be roughly 4-5% of the processing time of my whole decoding process, which takes roughly ~20 min per ~20 mil records.
Yes, and I've just realized that I'm actually performing some logic between getting the NMEA message and actually decoding, so the above won't be useful for me, sorry. However, it's probably still nice to include it for people who want to have both.
Yep, I'll check out the new changes. Currently the only thing left is the issue with
If a field is None, the converter simply passes it through:

```python
@classmethod
def from_value(cls, v: typing.Optional[typing.Any]) -> typing.Optional["StationIntervals"]:
    return cls(v) if v is not None else None
```

So it will return None if None is passed as a value. Also, I think that we shouldn't focus on performance too much. We should definitely avoid obvious performance bottlenecks, but beyond that I wouldn't worry. If one needs really fast performance, there is the lib-ais project written in C++, which is ~3 times faster anyway.
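For illustration, with a throwaway IntEnum standing in for the real message enums (the class below is not part of pyais), the helper behaves like this:

```python
import typing
from enum import IntEnum

class DemoInterval(IntEnum):  # throwaway enum just for illustration
    AUTONOMOUS = 0
    MINUTES_10 = 1

    @classmethod
    def from_value(cls, v: typing.Optional[typing.Any]) -> typing.Optional["DemoInterval"]:
        return cls(v) if v is not None else None

print(DemoInterval.from_value(1))     # DemoInterval.MINUTES_10
print(DemoInterval.from_value(None))  # None
```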
That looks great. |
Yep. That should work.
Looks like there is an issue with
These are the current errors I'm seeing with a 10mil test dataset.
Also, just want to double-check: have any of the types of the returned values changed compared to v1, aside from the booleans and enums?
Here are the error metrics using the fix in #50 on a large test dataset of ~230M reports.
Only error that stands out now is
But it's only for 47/232008704 reports, so it's probably fine to ignore. Outside of non-throwable errors or issues with schema changes (see my previous question), I think with the implementation of #50 the iterative decoding is pretty much done.
Performance-wise, at least for my pipeline, which includes some advanced multi-part decoding, I got roughly 27k/s/core (reports per second per core), which is pretty good. Note this includes things like writing out data, so the actual performance of just the pyais library will be faster.
Thank you very much for your detailed tests - I really appreciate it. 👍
This is closed thanks to your PR. I also added a unittest for this.
This is also fixed and should not occur anymore. The reason was that the
This seems like an error in your code, right? I also added a new base exception:
I don't think so. May I ask where you got your large dataset from? It would be great if I could use such a large dataset as a kind of integration test. Again, thanks for your help. :-) I guess we're ready for a v2 release, don't you think?
Not technically an error; those are part 2 messages that were unable to be matched to a part 1 (I'm not using the built-in matching from pyais), so it's something you can ignore.
Unfortunately it's not publicly available and I don't have permission to share it, sorry <3
Definitely looks like it. If you want to wait a day or two, I will re-decode the ~203M test dataset with the new changes to see if anything else pops up. After it's ready for release I intend to re-decode the full historical dataset of 7 billion reports. With the new decoding method, errors for the test dataset are down to 0.01%, which is an order of magnitude improvement.
I guess I'll wait for you to decode the ~203M dataset then. The new version will then be published next weekend! :-)
Here is the result using the latest version:
Will proceed with decoding the full dataset now and update with the metrics for that. I reckon this is pretty much ready for release.
Here are the results using the full dataset: a few new edge cases, but overall extremely good.
Looks like
Thanks again for your valuable input. I looked at those errors.
These are obviously garbage, but should not cause any crashes nonetheless. Regarding your concerns about field names: you are right, I changed some names.
Version 2.0.0 is released. 🥳 I would suggest that we treat new errors in dedicated issues. This thread is already quite huge and covers lots of different topics.
Awesome! And yes, this did outgrow its initial cause. Thanks again for the help.
When decoding messages, if the `bit_arr` length is less than what is needed for every property expected to be decoded, `None` is returned with no error.

Test Code:

As you can see below, I have commented out several properties except for `assigned`, which attempts to access index 270 of the `bit_arr`. When this is uncommented the returned value is None; however, when it is commented out (so no properties access outside of the `bit_arr` length) you can see the message decodes as expected.

Ideally, properties that do not have data or fail to decode would result in None.

I appear to be getting this for most if not all of my type 21 messages.