BigQuery Destination #115

Closed · 11 tasks
poundifdef opened this issue Mar 15, 2024 · 5 comments · Fixed by #123
Comments

poundifdef (Contributor) commented Mar 15, 2024

We want to be able to read and write data from BigQuery in bulk. Here are the steps to accomplish that:

Step 1: Create new BigQuery Destination interface

  • Add a new package in pkg/destinations called bigquery
  • Update destinations.go to be able to connect to the new bigquery destination.
  • Implement the functions in the Destination interface for the new bigquery package. For this first step, you may stub these functions (panic("not implemented")); a minimal sketch follows this list.
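As a rough sketch of the stubs (the struct name, method set, and signatures here are assumptions; match them to the Destination interface actually defined in pkg/destinations):

```go
package bigquery

import "io"

// BigQueryServer is a placeholder name; mirror the naming convention used
// by the other packages in pkg/destinations.
type BigQueryServer struct {
	// connection settings from the YAML config in Step 2 go here
}

// Stub every Destination method for now; later steps fill them in.
func (s *BigQueryServer) QueryJSON(query string, w io.Writer) error { panic("not implemented") }
func (s *BigQueryServer) QueryCSV(query string, w io.Writer) error  { panic("not implemented") }
func (s *BigQueryServer) CreateEmptyTable(name string) error        { panic("not implemented") }
func (s *BigQueryServer) CreateColumns(table, filePath string) error {
	panic("not implemented")
}
func (s *BigQueryServer) InsertFromNDJsonFile(table, filePath string) error {
	panic("not implemented")
}
func (s *BigQueryServer) Close() error { panic("not implemented") }
```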

Step 2: Implement Queries from BigQuery

Set up configuration parameters for BigQuery.

  • The YAML configuration should look like the following; add any other fields we need:

```yaml
databases:
  - type: bigquery
    api_keys:
      - test_api_key
    settings:
      json_credentials: >
        {
          "type": "service_account",
          "project_id": "example-123",
          "private_key_id": "...",
          "private_key": "-----BEGIN PRIVATE KEY-----...",
          ...
        }
```
  • Implement QueryJSON() and QueryCSV() functions. This should execute the input query and output the result in the right format. You may find it useful to create a generic private query() function, and then have the JSON/CSV functions format data while returning.
  • Data should be streamed to the writer, avoiding allocations as much as possible. Do not buffer all data into memory before returning to the client, and do not generate the entire JSON/CSV payload in memory. (See the streaming sketch after this list.)
  • The Close() function should clean up the BigQuery connection, and we should also close any open handles related to executing the query.
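A minimal streaming sketch using the official cloud.google.com/go/bigquery client. The s.client field, the private query() helper, and its callback shape are assumptions layered on the stub above; only the library calls (Query, Read, RowIterator.Next) are real API, and the exact JSON/CSV envelope should match what the other destinations produce:

```go
package bigquery

import (
	"context"
	"encoding/json"
	"io"

	bq "cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

// query runs sql and hands rows to emit one at a time, so nothing is
// buffered beyond the iterator's own paging. s.client is assumed to be a
// *bq.Client built from the json_credentials setting.
func (s *BigQueryServer) query(ctx context.Context, sql string, emit func(map[string]bq.Value) error) error {
	it, err := s.client.Query(sql).Read(ctx)
	if err != nil {
		return err
	}
	for {
		row := map[string]bq.Value{}
		err := it.Next(&row)
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return err
		}
		if err := emit(row); err != nil {
			return err
		}
	}
}

// QueryJSON streams one JSON object per row straight to w, never holding
// the full result set in memory.
func (s *BigQueryServer) QueryJSON(sql string, w io.Writer) error {
	enc := json.NewEncoder(w)
	return s.query(context.TODO(), sql, func(row map[string]bq.Value) error {
		return enc.Encode(row)
	})
}
```

QueryCSV would reuse the same query() helper with an encoding/csv writer instead of the JSON encoder.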

Step 3: Implement Table Creation

Next, when the user inserts new data into BigQuery, we want to create tables and columns based on the input data. Refer to the other packages in pkg/destinations for how to do this.

  • Implement the CreateEmptyTable() function. This should create a table with the given name along with an int64 column called __row_id. You may use this as a reference. This should only create the table if it doesn't already exist.
  • Implement the CreateColumns() function. This should read the input file and alter the BigQuery table to add columns if they do not exist. Here is a reference for how to do this. (A sketch of both functions follows this list.)
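A sketch of both functions, continuing the assumptions and imports above (s.client and s.datasetID are assumed fields; the addColumn helper is hypothetical and would be driven by the fields CreateColumns discovers in the input file):

```go
// CreateEmptyTable creates the table with a single __row_id INT64 column,
// and is a no-op when the table already exists. A production version
// should distinguish "not found" from other Metadata errors.
func (s *BigQueryServer) CreateEmptyTable(name string) error {
	ctx := context.TODO()
	table := s.client.Dataset(s.datasetID).Table(name)
	if _, err := table.Metadata(ctx); err == nil {
		return nil // table already exists
	}
	schema := bq.Schema{{Name: "__row_id", Type: bq.IntegerFieldType}}
	return table.Create(ctx, &bq.TableMetadata{Schema: schema})
}

// addColumn appends one new column via a schema update; CreateColumns
// would call it only for fields not already present in md.Schema.
func (s *BigQueryServer) addColumn(tableName, column string, typ bq.FieldType) error {
	ctx := context.TODO()
	table := s.client.Dataset(s.datasetID).Table(tableName)
	md, err := table.Metadata(ctx)
	if err != nil {
		return err
	}
	update := bq.TableMetadataToUpdate{
		Schema: append(md.Schema, &bq.FieldSchema{Name: column, Type: typ}),
	}
	_, err = table.Update(ctx, update, md.ETag)
	return err
}
```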

Step 4: Implement Data Insertion

We want to bulk upload data to BigQuery from the input file. This means implementing the InsertFromNDJsonFile() function.

  • Data should be streamed to BigQuery. Do not load the entire data set into memory when uploading to the database.
  • Implement this with the Load API. The input type is newline-delimited JSON. (A sketch follows this list.)
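A load-job sketch with the same assumed fields (and os imported in addition to the packages above); NewReaderSource streams the file into the load job, so the whole payload never sits in memory:

```go
// InsertFromNDJsonFile bulk-loads a newline-delimited JSON file into the
// given table via a BigQuery load job.
func (s *BigQueryServer) InsertFromNDJsonFile(table, filePath string) error {
	ctx := context.TODO()
	f, err := os.Open(filePath)
	if err != nil {
		return err
	}
	defer f.Close()

	source := bq.NewReaderSource(f)
	source.SourceFormat = bq.JSON // i.e. NEWLINE_DELIMITED_JSON

	loader := s.client.Dataset(s.datasetID).Table(table).LoaderFrom(source)
	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}
	return status.Err() // surfaces row-level load errors, if any
}
```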

algora-pbc bot commented Mar 15, 2024

💎 $500 bounty created by scratchdata
🙋 If you start working on this, comment /attempt #115 along with your implementation plan
👉 To claim this bounty, submit a pull request that includes the text /claim #115 somewhere in its body
📝 Before proceeding, please make sure you can receive payouts in your country
💵 Payment arrives in your account 2-5 days after the bounty is rewarded
💯 You keep 100% of the bounty award
🙏 Thank you for contributing to scratchdata/scratchdata!


Attempt: 🟢 @mohanish2504 · Solution: #123

abhishek818 commented Mar 15, 2024

@poundifdef Can I get this assigned? I'll give it a try over the weekend.

mohanish2504 (Contributor) commented:

@poundifdef I would like to add my name for an attempt.

poundifdef changed the title from "Implement BigQuery as a destination" to "BigQuery Destination" on Mar 16, 2024
abhishek818 added a commit to abhishek818/scratchdata that referenced this issue Mar 16, 2024

algora-pbc bot commented Mar 17, 2024

💡 @mohanish2504 submitted a pull request that claims the bounty. You can visit your bounty board to reward.


algora-pbc bot commented Mar 20, 2024

🎉🎈 @mohanish2504 has been awarded $500! 🎈🎊
