Feature/#37 spark splitting #49
import scala.annotation.tailrec

object OSMDataFinder {
  val pattern = Array[Byte](0x0A, 0x07, 0x4F, 0x53, 0x4D, 0x44, 0x61, 0x74, 0x61)
Why do we pick these bytes here? They look like magic numbers, which does not seem very safe.
This is the static value found in the headers of the data blocks in osm.pbf files.
This is the protobuf definition:
message BlobHeader {
  required string type = 1;
  optional bytes indexdata = 2;
  required int32 datasize = 3;
}
And type, for OSM data blocks, is always OSMData in its protobuf string representation. It is part of the format spec.
https://wiki.openstreetmap.org/wiki/PBF_Format#File_format
https://wiki.openstreetmap.org/wiki/PBF_Format#Encoding_OSM_entities_into_fileblocks
https://wiki.openstreetmap.org/wiki/PBF_Format#Format_example
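To make the origin of the nine bytes explicit, here is a minimal sketch that derives them from the protobuf wire encoding of a short string field: one tag byte `(fieldNumber << 3) | wireType`, one length byte, then the UTF-8 payload. The names `BlobHeaderTypeBytes` and `encodeShortString` are hypothetical and not part of this PR. The helper only covers the single-byte tag and length case, which is all that `type = "OSMData"` needs.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of the PR: derives the pattern instead of hard-coding it.
object BlobHeaderTypeBytes {
  // Protobuf wire type 2: length-delimited (strings, bytes, embedded messages).
  val LengthDelimited = 2

  // Encodes a short string field as: tag byte, one-byte length, UTF-8 payload.
  // Assumes field numbers below 16 and payloads shorter than 128 bytes,
  // so that both the tag and the length fit in a single varint byte.
  def encodeShortString(fieldNumber: Int, value: String): Array[Byte] = {
    val payload = value.getBytes(StandardCharsets.UTF_8)
    require(fieldNumber > 0 && fieldNumber < 16, "tag must fit in one byte")
    require(payload.length < 128, "length must fit in one byte")
    val tag = ((fieldNumber << 3) | LengthDelimited).toByte
    Array(tag, payload.length.toByte) ++ payload
  }

  // Field 1 ("type") with the value "OSMData".
  val osmDataPattern: Array[Byte] = encodeShortString(1, "OSMData")
}
```

Building the array this way, or documenting the derivation in a comment next to `pattern`, would address the magic-number concern without changing behaviour.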
I see. The reference is here, right?
00000090 __ 0a - S 1 'type'
00000090 __ __ 07 - length 7 bytes
00000090 __ __ __ 4f 53 4d 44 61 74 61 "OSMData"
Yes. To double-check, I also opened a file in a hexadecimal editor and confirmed the bytes.
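The hex-editor check can also be sketched in code. The following is an illustration only, not the implementation in this PR: a tail-recursive scan that returns the first offset at which the pattern occurs in a byte array. `PatternSearchSketch` and `firstMatch` are hypothetical names. A hit should probably be treated as a candidate block header rather than a guaranteed one, since the same nine bytes could in principle also appear inside a blob payload.

```scala
import scala.annotation.tailrec

// Hypothetical sketch, not the PR's OSMDataFinder implementation.
object PatternSearchSketch {
  // Tag 0x0A, length 0x07, then "OSMData" in ASCII.
  val pattern: Array[Byte] =
    Array[Byte](0x0A, 0x07, 0x4F, 0x53, 0x4D, 0x44, 0x61, 0x74, 0x61)

  // Returns the offset of the first occurrence of `pattern` at or after `from`, if any.
  def firstMatch(data: Array[Byte], from: Int = 0): Option[Int] = {
    // Last offset at which a full pattern still fits in `data`.
    val lastStart = data.length - pattern.length

    // True when every pattern byte equals the data byte at the same relative position.
    def matchesAt(offset: Int): Boolean =
      pattern.indices.forall(i => data(offset + i) == pattern(i))

    @tailrec
    def loop(offset: Int): Option[Int] =
      if (offset > lastStart) None
      else if (matchesAt(offset)) Some(offset)
      else loop(offset + 1)

    loop(math.max(from, 0))
  }
}
```

For example, with four arbitrary bytes in front of the pattern, `PatternSearchSketch.firstMatch(data)` returns `Some(4)`.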