Apache Avro library failed to parse file - when uploading avro file #1379

Closed
DinaWork opened this issue Jun 13, 2016 · 8 comments
Labels
api: bigquery Issues related to the BigQuery API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments


DinaWork commented Jun 13, 2016

Here is a reference to the issue:
http://stackoverflow.com/questions/37790813/apache-avro-library-failed-to-parse-file-nodejs

I also had to add the AVRO format to the formats map in the gcloud/lib/bigquery files:

var formats = {
  csv: 'CSV',
  json: 'NEWLINE_DELIMITED_JSON',
  avro: 'AVRO' // added by me
};
@stephenplusplus added the type: bug and api: bigquery labels on Jun 13, 2016
@stephenplusplus
Contributor

Thanks for the bug catch; a PR is incoming. I think the problem you're having is that you're uploading the actual zip file. You'll have to extract the .avro file first before BigQuery can handle it.
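
For example, here is a minimal sketch of that extraction step (assuming the download is a standard .zip archive and the adm-zip npm package is installed; the file names are only placeholders):

var AdmZip = require('adm-zip');

// Pull the .avro file out of the downloaded archive (placeholder paths).
var zip = new AdmZip('./export.zip');
zip.extractAllTo('./extracted', true); // true = overwrite existing files

// Then hand the plain .avro file to the import, e.g.:
// table.import('./extracted/data.avro', function(err, job) {});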

@stephenplusplus
Contributor

Also, in case you find it useful, we have a built-in event emitter on the Job object:

job
  .on('error', console.log)
  .on('complete', function(metadata) {
    console.log('job completed', metadata);
  });

@DinaWork
Author

No, it isn't the zip file; I extracted it.
I attached it here.


@stephenplusplus
Contributor

Okay, sorry that I missed that. After making the fix you pointed out to add support for AVRO files, I was able to import the .avro file successfully:

var fs = require('fs');

// Stream the local Avro file into the table; 'complete' hands back the load Job.
fs.createReadStream('./local-avro-file.avro')
  .pipe(table.createWriteStream(metadata))
  .on('complete', function(job) {
    job
      .on('error', console.log)
      .on('complete', function(metadata) {
        console.log('job completed', metadata);
      });
  });

Can you try it this way to help track down where the problem is coming from?

@stephenplusplus
Contributor

PR sent with the AVRO fix: #1380.

@DinaWork
Author

Yes, it works now. Thanks!!
Though I still have some open issues:

a) This works only for a table that already has a matching schema; when trying to upload to an empty table with no schema, it doesn't work.

b) In general, how do I use gcloud to upload JSON files along with their schema to a new table? (The documentation isn't clear enough.)


@stephenplusplus
Contributor

A) Good question. The official API request we make when uploading to a new table is Jobs: insert, specifically a load operation. The request body is outlined here; this is the description for schema:

configuration.load.schema | nested object | [Optional] The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.

The UI handles detecting a schema from the AVRO file when dumping to a new table, so I'm not sure why we aren't able to reproduce the same behavior.
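
For reference, a rough sketch of the relevant portion of that load request body when a schema is supplied explicitly (the project, dataset, and table IDs below are placeholders, and the two fields are only illustrative):

var loadJobBody = {
  configuration: {
    load: {
      sourceFormat: 'AVRO',
      destinationTable: {
        projectId: 'my-project',   // placeholder IDs
        datasetId: 'my_dataset',
        tableId: 'my_new_table'
      },
      // Optional when the destination table already exists; required here
      // because the load job is creating the table.
      schema: {
        fields: [
          { name: 'name', type: 'STRING' },
          { name: 'servings', type: 'INTEGER' }
        ]
      }
    }
  }
};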

B) I'm not sure those can be done in a single operation. I believe the table has to be created before data can be loaded. If I took your question too literally, and you just want to know how to load JSON data into an empty table, it should look like this:

table.import('./data.json', {
  schema: 'name:string, servings:integer, cookingTime:float, quick:boolean'
}, function(err) {});

If you want the schema inferred from the JSON file itself, it seems like that's either not supported by the API or we're using it wrong.

@jgeewax is there anyone from BigQuery who can help with these questions?

@stephenplusplus
Contributor

@DinaWork - do you mind taking this question to StackOverflow? I think the right people from the BigQuery team will be able to see it over there; this channel must be off their radar. Please cross-link it in your post. I'll be sure to implement any changes that make how we're using BigQuery better for users.
