New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PBF parsing issue with Mapbox's PBF library #1

Closed
DenisCarriere opened this Issue Jan 5, 2018 · 20 comments

Comments

Projects
None yet
2 participants
@DenisCarriere
Collaborator

DenisCarriere commented Jan 5, 2018

PBF parsing issue with Mapbox's PBF library

CC: @kpwebb

The issue is happening right at the beginning, the first few bytes are being skipped because it doesn't know what they are.

Basic Example (breaks)

const pbf = require('pbf')

const buffer = fs.readFileSync('11-602-769.geometry.pbf')
new Pbf(buffer).readFields((tag, data, pbf) => {
  // Error Message
  // Unimplemented type: 4
})

Hack to make it work

When commenting out this line: https://github.com/mapbox/pbf/blob/master/index.js#L41

It prevents this.skip(val) from occuring which also checks if the value is valid.

Error Message

test.js ............................................... 1/3
  sharedstreets-pbf -- geometry
  not ok Unimplemented type: 4
    stack: |
      Pbf.skip (node_modules/pbf/index.js:191:20)
      Pbf.readFields (node_modules/pbf/index.js:41:45)
      Object.geometry (index.js:13:129)
      glob.sync.forEach.filepath (test.js:13:37)
      Array.forEach (<anonymous>)
      Test.t (test.js:10:67)
    at:
      line: 191
      column: 20
      file: node_modules/pbf/index.js
      function: Pbf.skip
    test: sharedstreets-pbf -- geometry
    source: |
      else throw new Error('Unimplemented type: ' + type);

  sharedstreets-pbf -- intersection
  not ok Unimplemented type: 7
    stack: |
      Pbf.skip (node_modules/pbf/index.js:191:20)
      Pbf.readFields (node_modules/pbf/index.js:41:45)
      Object.intersection (index.js:28:135)
      glob.sync.forEach.filepath (test.js:26:37)
      Array.forEach (<anonymous>)
      Test.t (test.js:23:71)
    at:
      line: 191
      column: 20
      file: node_modules/pbf/index.js
      function: Pbf.skip
    test: sharedstreets-pbf -- intersection
    source: |
      else throw new Error('Unimplemented type: ' + type);
@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 5, 2018

Collaborator

Would there be anything "special" on the first byte/position?

The skip(value) only gets fired if at the start position.

if (this.pos === startPos) this.skip(val);

Maybe a custom header?

Collaborator

DenisCarriere commented Jan 5, 2018

Would there be anything "special" on the first byte/position?

The skip(value) only gets fired if at the start position.

if (this.pos === startPos) this.skip(val);

Maybe a custom header?

@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 5, 2018

Collaborator

Got a new error on this,

 Expected varint not more than 10 bytes

@kpwebb Does that mean anything to you?

Collaborator

DenisCarriere commented Jan 5, 2018

Got a new error on this,

 Expected varint not more than 10 bytes

@kpwebb Does that mean anything to you?

DenisCarriere added a commit that referenced this issue Jan 5, 2018

@kpwebb

This comment has been minimized.

Show comment
Hide comment
@kpwebb

kpwebb Jan 5, 2018

Member

Oh this is a probably delimited encoding issue. Per google spec for large data sets each message is preceded by a length value:

https://developers.google.com/protocol-buffers/docs/techniques#streaming

We used the Google Java lib's writeDelimted to generate the messages. Is it possible to read the read the varint with the size and then read x bytes from the stream as message, followed by the next varint? Looks like this should be possible with the customer reading API:

https://github.com/mapbox/pbf#custom-reading

Member

kpwebb commented Jan 5, 2018

Oh this is a probably delimited encoding issue. Per google spec for large data sets each message is preceded by a length value:

https://developers.google.com/protocol-buffers/docs/techniques#streaming

We used the Google Java lib's writeDelimted to generate the messages. Is it possible to read the read the varint with the size and then read x bytes from the stream as message, followed by the next varint? Looks like this should be possible with the customer reading API:

https://github.com/mapbox/pbf#custom-reading

@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 5, 2018

Collaborator

Well the issue with the custom reading is I can't even get to that part, it breaks when it loads.

new Pbf(buffer).readFields((tag, data, pbf) => {
  // breaks before here (where custom reading would be)
})
Collaborator

DenisCarriere commented Jan 5, 2018

Well the issue with the custom reading is I can't even get to that part, it breaks when it loads.

new Pbf(buffer).readFields((tag, data, pbf) => {
  // breaks before here (where custom reading would be)
})
@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 5, 2018

Collaborator

@kpwebb

If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer. (If you want to avoid copying bytes to a separate buffer, check out the CodedInputStream class (in both C++ and Java) which can be told to limit reads to a certain number of bytes.)

That makes sense, pretty sure that's making the Mapbox PBF library break (I understand why they chose not to support that).

Any advantage to determine where each message begins or ends? The VectorTile schema doesn't do that and you can parse the PBF into a GeoJSON FeatureCollection in the same order.

Collaborator

DenisCarriere commented Jan 5, 2018

@kpwebb

If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer. (If you want to avoid copying bytes to a separate buffer, check out the CodedInputStream class (in both C++ and Java) which can be told to limit reads to a certain number of bytes.)

That makes sense, pretty sure that's making the Mapbox PBF library break (I understand why they chose not to support that).

Any advantage to determine where each message begins or ends? The VectorTile schema doesn't do that and you can parse the PBF into a GeoJSON FeatureCollection in the same order.

@kpwebb

This comment has been minimized.

Show comment
Hide comment
@kpwebb

kpwebb Jan 5, 2018

Member

It's a good question!

I'm following Google's recs for encoding large repeated data sets. They discourage this for messages > 1MB. Our SharedStreets tiles are well above that limit unlike vector tiles. At z=11 we have tiles in dense urban environments that exceed 5MB.

I'm not sure about when this tradeoff really becomes a problem, my understanding is biggest risk is memory overhead for decoding large messages.

Member

kpwebb commented Jan 5, 2018

It's a good question!

I'm following Google's recs for encoding large repeated data sets. They discourage this for messages > 1MB. Our SharedStreets tiles are well above that limit unlike vector tiles. At z=11 we have tiles in dense urban environments that exceed 5MB.

I'm not sure about when this tradeoff really becomes a problem, my understanding is biggest risk is memory overhead for decoding large messages.

@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 5, 2018

Collaborator

From what I've noticed when writing the parser, not having the delimited isn't a deal breaker, I can still figure out where the message starts/ends by just using the positions.

https://github.com/sharedstreets/sharedstreets-js-pbf/blob/master/index.mjs#L40-L76

Collaborator

DenisCarriere commented Jan 5, 2018

From what I've noticed when writing the parser, not having the delimited isn't a deal breaker, I can still figure out where the message starts/ends by just using the positions.

https://github.com/sharedstreets/sharedstreets-js-pbf/blob/master/index.mjs#L40-L76

@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 6, 2018

Collaborator

@kpwebb Having no luck with Mapbox's pbf library. Seems like the writeDelimted is causing some issues even when using the "compile a JavaScript module from a .proto file".

CLI

$ pbf sharedstreets.proto > sharedstreets.js

Node.js (Breaks)

const Pbf = require('pbf')
const {SharedStreetsGeometry} = require('./sharedstreets.js')
const buffer = fs.readFileSync('11-602-769.geometry.pbf')

// read
const pbf = new Pbf(buffer)
const obj = SharedStreetsGeometry.read(pbf)
// Error => Unimplemented type: 4

Would it be possible to drop the the delimited encoding? It seems more of a hassle then a benefit.

Unless we try to find a solution to add Delimiter to the Mapbox pbf library, doesn't seem like an easy one.

Collaborator

DenisCarriere commented Jan 6, 2018

@kpwebb Having no luck with Mapbox's pbf library. Seems like the writeDelimted is causing some issues even when using the "compile a JavaScript module from a .proto file".

CLI

$ pbf sharedstreets.proto > sharedstreets.js

Node.js (Breaks)

const Pbf = require('pbf')
const {SharedStreetsGeometry} = require('./sharedstreets.js')
const buffer = fs.readFileSync('11-602-769.geometry.pbf')

// read
const pbf = new Pbf(buffer)
const obj = SharedStreetsGeometry.read(pbf)
// Error => Unimplemented type: 4

Would it be possible to drop the the delimited encoding? It seems more of a hassle then a benefit.

Unless we try to find a solution to add Delimiter to the Mapbox pbf library, doesn't seem like an easy one.

@kpwebb

This comment has been minimized.

Show comment
Hide comment
@kpwebb

kpwebb Jan 15, 2018

Member

We need to use decodeDelimited as these tiles exceed Google's recommended size for a single protobuf message. I'm going to try and switch this over to using the decodeio/protobuf.js project.

Member

kpwebb commented Jan 15, 2018

We need to use decodeDelimited as these tiles exceed Google's recommended size for a single protobuf message. I'm going to try and switch this over to using the decodeio/protobuf.js project.

@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 15, 2018

Collaborator

👍 No worries, that works for me, I can tackle converting it to protobuf.js.

I just found it strange that Mapbox's pbf library didn't support decodeDelimited.

Collaborator

DenisCarriere commented Jan 15, 2018

👍 No worries, that works for me, I can tackle converting it to protobuf.js.

I just found it strange that Mapbox's pbf library didn't support decodeDelimited.

@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 16, 2018

Collaborator

@kpwebb Let me work on this one today, I'm going to switch it to Typescript that way we can keep the SharedStreets types consistent across all Javascript code bases.

Collaborator

DenisCarriere commented Jan 16, 2018

@kpwebb Let me work on this one today, I'm going to switch it to Typescript that way we can keep the SharedStreets types consistent across all Javascript code bases.

@kpwebb

This comment has been minimized.

Show comment
Hide comment
@kpwebb

kpwebb Jan 16, 2018

Member

@DenisCarriere I just checked in what I've done on a the protobufjs branch. I tried using the pbjs and pbts CLI tools to compile proto files to js modules and TypeScript definitions. See:

https://github.com/sharedstreets/sharedstreets-pbf/blob/protobufjs/package.json#L15-L16

Made a crude attempt at modifying the tests to confirm this works, but still much work needed!

As for SharedStreets types, I think we should just use the .proto definitions as the default internal type, and trasnmography to GeoJSON and other formats as needed.

Member

kpwebb commented Jan 16, 2018

@DenisCarriere I just checked in what I've done on a the protobufjs branch. I tried using the pbjs and pbts CLI tools to compile proto files to js modules and TypeScript definitions. See:

https://github.com/sharedstreets/sharedstreets-pbf/blob/protobufjs/package.json#L15-L16

Made a crude attempt at modifying the tests to confirm this works, but still much work needed!

As for SharedStreets types, I think we should just use the .proto definitions as the default internal type, and trasnmography to GeoJSON and other formats as needed.

@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 16, 2018

Collaborator

As for SharedStreets types, I think we should just use the .proto definitions as the default internal type, and trasnmography to GeoJSON and other formats as needed.

👍 Ok we can reserve this library to simply do the raw .proto conversions.

  • We can add option to convert to GeoJSON? Similar to Mapbox's vector-tile-js
Collaborator

DenisCarriere commented Jan 16, 2018

As for SharedStreets types, I think we should just use the .proto definitions as the default internal type, and trasnmography to GeoJSON and other formats as needed.

👍 Ok we can reserve this library to simply do the raw .proto conversions.

  • We can add option to convert to GeoJSON? Similar to Mapbox's vector-tile-js
@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 16, 2018

Collaborator

@kpwebb Did you ever get protobuf.js to work with the Delimited? I can't seem to receive an Array of Messages, I'm just getting 1 Message which is being overlapped by each other one.

Javascript

export function intersection (buffer) {
  return SharedStreetsIntersection.decodeDelimited(buffer).toJSON()
}

Only returns a single Object and not an Array of Objects. 🤔

# SharedStreets Geometry
{ id: 'HGdvAjtekfDrLFzPevNtf3',
  fromIntersectionId: 'CnvWEFX8or1vwdEsJ3uMeC',
  toIntersectionId: '6rmzoG63zZB4y9qTHKVFgk',
  forwardReferenceId: 'VgcQjjseNwVCuQAbJRjzyR',
  roadClass: 'Trunk',
  latlons:
   [ 40.75239562988281,
     -74.007568359375,
     40.753089904785156,
     -74.00729370117188 ] }
# SharedStreets Intersection
{ id: 'GqJezjdNjXCJTe2GF5FRsL',
  nodeId: '42436088',
  lat: 40.72600173950195,
  lon: -74.00909423828125,
  inboundReferenceIds:
   [ 'FC6bbhBJZ6rmGCpSRxMHfK',
     'ATpF4wynJAeRnQoSBi7QXv',
     '3XuKRafKzW1XRQzqEfCk7C',
     'M1TjUUwhyMiT7kY7qjPivF' ] }
Collaborator

DenisCarriere commented Jan 16, 2018

@kpwebb Did you ever get protobuf.js to work with the Delimited? I can't seem to receive an Array of Messages, I'm just getting 1 Message which is being overlapped by each other one.

Javascript

export function intersection (buffer) {
  return SharedStreetsIntersection.decodeDelimited(buffer).toJSON()
}

Only returns a single Object and not an Array of Objects. 🤔

# SharedStreets Geometry
{ id: 'HGdvAjtekfDrLFzPevNtf3',
  fromIntersectionId: 'CnvWEFX8or1vwdEsJ3uMeC',
  toIntersectionId: '6rmzoG63zZB4y9qTHKVFgk',
  forwardReferenceId: 'VgcQjjseNwVCuQAbJRjzyR',
  roadClass: 'Trunk',
  latlons:
   [ 40.75239562988281,
     -74.007568359375,
     40.753089904785156,
     -74.00729370117188 ] }
# SharedStreets Intersection
{ id: 'GqJezjdNjXCJTe2GF5FRsL',
  nodeId: '42436088',
  lat: 40.72600173950195,
  lon: -74.00909423828125,
  inboundReferenceIds:
   [ 'FC6bbhBJZ6rmGCpSRxMHfK',
     'ATpF4wynJAeRnQoSBi7QXv',
     '3XuKRafKzW1XRQzqEfCk7C',
     'M1TjUUwhyMiT7kY7qjPivF' ] }
@kpwebb

This comment has been minimized.

Show comment
Hide comment
@kpwebb

kpwebb Jan 16, 2018

Member

@DenisCarriere I did! I checked in the test.js file with an example of how to do this:

glob.sync(path.join(__dirname, 'sharedstreets-ref-system/sample_data/nyc', '*.geometry.pbf')).forEach(filepath => {
const {name, base} = path.parse(filepath)
const buffer = fs.readFileSync(filepath)
const reader = protobuf.Reader.create(buffer); // or: new protobuf.BufferReader(buffer);
const outfile = filepath.replace(path.join('sharedstreets-ref-system/sample_data/nyc', base), path.join('test', 'out', name + '.json'))
var outStreatm = fs.createWriteStream(outfile, {'flags': 'a'});
while(reader.pos < reader.len) {
var result = sharedstreets.SharedStreetsGeometry.decodeDelimited(reader);
outStreatm.write(JSON.stringify(result) + "\n")
}
outStreatm.end();
t.deepEqual("test", "test");
//t.deepEqual(result, load.sync(outfile), name)
});

Member

kpwebb commented Jan 16, 2018

@DenisCarriere I did! I checked in the test.js file with an example of how to do this:

glob.sync(path.join(__dirname, 'sharedstreets-ref-system/sample_data/nyc', '*.geometry.pbf')).forEach(filepath => {
const {name, base} = path.parse(filepath)
const buffer = fs.readFileSync(filepath)
const reader = protobuf.Reader.create(buffer); // or: new protobuf.BufferReader(buffer);
const outfile = filepath.replace(path.join('sharedstreets-ref-system/sample_data/nyc', base), path.join('test', 'out', name + '.json'))
var outStreatm = fs.createWriteStream(outfile, {'flags': 'a'});
while(reader.pos < reader.len) {
var result = sharedstreets.SharedStreetsGeometry.decodeDelimited(reader);
outStreatm.write(JSON.stringify(result) + "\n")
}
outStreatm.end();
t.deepEqual("test", "test");
//t.deepEqual(result, load.sync(outfile), name)
});

@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 16, 2018

Collaborator

Oh got it, I think I was missing this part:

while (reader.pos < reader.len) { 
  //..
} 
Collaborator

DenisCarriere commented Jan 16, 2018

Oh got it, I think I was missing this part:

while (reader.pos < reader.len) { 
  //..
} 
@kpwebb

This comment has been minimized.

Show comment
Hide comment
@kpwebb

kpwebb Jan 16, 2018

Member

The key bit is

const reader = protobuf.Reader.create(buffer);

The protobuf uses the reader to index the position as it pulls in the buffer. Took me a minute to figure this out too..

Member

kpwebb commented Jan 16, 2018

The key bit is

const reader = protobuf.Reader.create(buffer);

The protobuf uses the reader to index the position as it pulls in the buffer. Took me a minute to figure this out too..

@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 16, 2018

Collaborator

Oh got it! I don't know why they didn't build this into the .decodeDelimited() method, ah well! 🤷‍♂️ It should be a lot smoother now.

Collaborator

DenisCarriere commented Jan 16, 2018

Oh got it! I don't know why they didn't build this into the .decodeDelimited() method, ah well! 🤷‍♂️ It should be a lot smoother now.

@DenisCarriere

This comment has been minimized.

Show comment
Hide comment
@DenisCarriere

DenisCarriere Jan 16, 2018

Collaborator

👍 @kpwebb protobuf.js is the way to go, I've updated the pbf library & tests.

Feel free to check out the master branch or import the module via npm.

$ npm install sharedstreets-pbf
// Import library
const sharedstreetsPbf = require('sharedstreets-pbf')

// Read Buffer
const buffer = fs.readFileSync('z-x-y.geometry.pbf')

// Parse PBF
const geoms = sharedstreetsPbf.geometry(buffer)

// Read SharedStreets as an Array
geoms[0].id // => 'NxPFkg4CrzHeFhwV7Uiq7K'
Collaborator

DenisCarriere commented Jan 16, 2018

👍 @kpwebb protobuf.js is the way to go, I've updated the pbf library & tests.

Feel free to check out the master branch or import the module via npm.

$ npm install sharedstreets-pbf
// Import library
const sharedstreetsPbf = require('sharedstreets-pbf')

// Read Buffer
const buffer = fs.readFileSync('z-x-y.geometry.pbf')

// Parse PBF
const geoms = sharedstreetsPbf.geometry(buffer)

// Read SharedStreets as an Array
geoms[0].id // => 'NxPFkg4CrzHeFhwV7Uiq7K'
@kpwebb

This comment has been minimized.

Show comment
Hide comment
@kpwebb

kpwebb Jan 16, 2018

Member

Nice, thanks! At first glance looks good!

Member

kpwebb commented Jan 16, 2018

Nice, thanks! At first glance looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment