Using large file? #79

Closed
exebetche opened this issue May 15, 2017 · 2 comments

@exebetche

exebetche commented May 15, 2017

I'm trying to open a large pbf file (> 2 GB), but I get an error:

var file = 'test.pbf';
var pbf = new Pbf(fs.readFileSync(file));

buffer.js:23
  const ui8 = new Uint8Array(size);
              ^
RangeError: Invalid typed array length

It seems Node.js buffers can't be bigger than 2 GB:
nodejs/node#6560 (comment)

So my question:
Is there a way to pass a stream instead of a buffer to access the file? Or maybe another way to read a file bigger than 2 GB?

Thanks.

node version: 4.2.6
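
For reference, a minimal sketch of how to confirm that limit on a given Node build, using only core modules (the file name is the one from the snippet above):

var fs = require('fs');
var buffer = require('buffer');

var file = 'test.pbf';
var size = fs.statSync(file).size;

// buffer.kMaxLength is the largest Buffer Node can allocate (~2 GB on most builds);
// fs.readFileSync(file) throws "RangeError: Invalid typed array length" above it
if (size > buffer.kMaxLength) {
  console.log(file + ' is ' + size + ' bytes, larger than the ' +
    buffer.kMaxLength + '-byte Buffer limit');
}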

@kjvalencik
Collaborator

Correct. The Node.js heap is limited in size. You could implement your own sub-class of Buffer that reads incrementally; however, you would likely still have an issue with the resulting object.

If the proto file is that large, it's highly likely that the resulting object is also not going to fit on the heap. You probably want some kind of SAX/oboe-style interface for receiving the data as well.

For this to work, both ends of pbf would need to change. The Pbf class would need some changes to support a stream, and the generated code would need to support streams as well.

You could re-use many of the methods from Pbf for parsing the tags, but there are some fundamental issues that are very different from what this project does.

Do you know what makes your file so large? If it includes large binary data, you might still have this issue even with streaming, unless the streaming also supports streaming a single field.

@mourner
Member

mourner commented Jul 7, 2017

As far as I know, Protobuf is not designed to hold huge amounts of data. Instead, it's best suited for encoding many relatively small objects. If this 2 GB file consists of thousands of smaller PBF objects, the proper way to deal with this would be to use https://github.com/mafintosh/length-prefixed-stream to split a stream of PBF objects, and then use pbf to decode each one individually.
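
A minimal sketch of that approach, assuming the large file is a varint-length-prefixed sequence of messages and that MyMessage is a hypothetical decoder generated by pbf's proto compiler (neither detail is given in this issue):

var fs = require('fs');
var lps = require('length-prefixed-stream');
var Pbf = require('pbf');
// MyMessage is a hypothetical module generated with pbf's proto compiler
var MyMessage = require('./mymessage.js').MyMessage;

// decode() splits the byte stream back into individual length-prefixed messages
var decoder = lps.decode();
fs.createReadStream('test.pbf').pipe(decoder);

decoder.on('data', function (buf) {
  // each buf is one small message, so it fits comfortably in a regular Buffer
  var obj = MyMessage.read(new Pbf(buf));
  // process obj here
});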

@mourner mourner closed this as completed Jul 7, 2017