Apache Avro library failed to parse file - when uploading avro file #1379

Closed
DinaWork opened this issue Jun 13, 2016 · 8 comments
Labels
api: bigquery Issues related to the BigQuery API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments


DinaWork commented Jun 13, 2016

Here is a reference to the issue:
http://stackoverflow.com/questions/37790813/apache-avro-library-failed-to-parse-file-nodejs

I also had to add the AVRO format to the formats map in the gcloud/lib/bigquery files:

var formats = {
  csv: 'CSV',
  json: 'NEWLINE_DELIMITED_JSON',
  avro: 'AVRO' // added by me
};
@stephenplusplus added the type: bug and api: bigquery labels on Jun 13, 2016
@stephenplusplus
Contributor

Thanks for the bug catch; a PR is incoming. I think the problem you're having is that you're uploading the actual zip file. You'll have to extract the .avro file first before BigQuery can handle it.
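
For example, here is a minimal sketch of that extraction step (assuming the download is a standard .zip archive and the adm-zip npm package is installed; the file names are only placeholders):

var AdmZip = require('adm-zip');

// Pull the .avro file out of the downloaded archive (placeholder paths).
var zip = new AdmZip('./export.zip');
zip.extractAllTo('./extracted', true); // true = overwrite existing files

// Then hand the plain .avro file to the import, e.g.:
// table.import('./extracted/data.avro', function(err, job) {});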

@stephenplusplus
Contributor

Also, in case you find it useful, we have a built-in event emitter on the Job object:

job
  .on('error', console.log)
  .on('complete', function(metadata) {
    console.log('job completed', metadata);
  });

@DinaWork
Author

No, it isn't the zip file; I extracted it.
I attached it here.


@stephenplusplus
Contributor

Okay, sorry that I missed that. After making the fix you pointed out to add support for AVRO files, I was able to import the .avro file successfully:

var fs = require('fs');

// Stream the local Avro file into the table; 'complete' hands back the load Job.
fs.createReadStream('./local-avro-file.avro')
  .pipe(table.createWriteStream(metadata))
  .on('complete', function(job) {
    job
      .on('error', console.log)
      .on('complete', function(metadata) {
        console.log('job completed', metadata);
      });
  });

Can you try it this way to help track down where the problem is coming from?

@stephenplusplus
Contributor

PR sent with the AVRO fix: #1380.

@DinaWork
Author

Yes, it works now. Thanks!!
Though I still have some open issues:

a) This works only for a table that already has a matching schema; when trying to upload to an empty table with no schema, it doesn't work.

b) In general, how do I use gcloud to upload JSON files along with their schema to a new table? (The documentation isn't clear enough.)


@stephenplusplus
Contributor

A) Good question. The official API request we make when uploading to a new table is Jobs: insert, specifically a load operation. The request body is outlined here; this is the description for schema:

configuration.load.schema | nested object | [Optional] The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.

The UI handles detecting a schema from the AVRO file when dumping to a new table, so I'm not sure why we aren't able to reproduce the same behavior.
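
For reference, a rough sketch of the relevant portion of that load request body when a schema is supplied explicitly (the project, dataset, and table IDs below are placeholders, and the two fields are only illustrative):

var loadJobBody = {
  configuration: {
    load: {
      sourceFormat: 'AVRO',
      destinationTable: {
        projectId: 'my-project',   // placeholder IDs
        datasetId: 'my_dataset',
        tableId: 'my_new_table'
      },
      // Optional when the destination table already exists; required here
      // because the load job is creating the table.
      schema: {
        fields: [
          { name: 'name', type: 'STRING' },
          { name: 'servings', type: 'INTEGER' }
        ]
      }
    }
  }
};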

B) I'm not sure those can be done in a single operation. I believe the table has to be created before data can be loaded. If I took your question too literally, and you just want to know how to load JSON data into an empty table, it should look like this:

table.import('./data.json', {
  schema: 'name:string, servings:integer, cookingTime:float, quick:boolean'
}, function(err) {});

If you want the schema inferred from the JSON file itself, it seems like that's either not supported by the API or we're using it wrong.

@jgeewax is there anyone from BigQuery who can help with these questions?

@stephenplusplus
Contributor

@DinaWork - do you mind taking this question to StackOverflow? I think the right people from the BigQuery team will be able to see it over there; this channel must be off their radar. Please cross-link it in your post. I'll be sure to implement any changes that make how we're using BigQuery better for users.
