Bigquery timeouts and returns only job_id #24

Open
matreyes opened this issue Oct 16, 2023 · 2 comments

Comments

@matreyes
Contributor

When your query takes too long to run, BigQuery times out and the query returns only the job_id.
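
The truncated response can be recognized by its jobComplete field; roughly (a sketch based on the BigQuery jobs.query docs, where query_finished?/1 is just a made-up helper name, not part of req_bigquery):

    # Hypothetical helper over a decoded jobs.query response: complete results
    # include "jobComplete" => true; a timed-out call returns "jobComplete" => false
    # plus the job reference, and no rows yet.
    defp query_finished?(%{"jobComplete" => true}), do: true
    defp query_finished?(%{"jobComplete" => false, "jobReference" => %{"jobId" => _job_id}}), do: false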

In my own project (based on req_bigquery) I've replaced the first call so that it always creates a job (instead of a direct query), and then I poll the job recursively to get the query results:

    # Assumes `require Logger`, `alias Finch.Response`, and the module attributes
    # @bigquery_base_url, @service_name and @check_interval are defined elsewhere
    # in the module.
    def query(query, project_id, goth_name, finch_name, retries \\ 0)

    # Give up after 3 failed attempts to create the job.
    def query(_query, _project_id, _goth_name, _finch_name, 3), do: {:error, :too_many_retries}

    def query(query, project_id, goth_name, finch_name, retries) do
      token = Goth.fetch!(goth_name).token
      headers = [{"Authorization", "Bearer #{token}"}, {"Content-Type", "application/json"}]

      body =
        %{
          "configuration" => %{
            "query" => %{
              "query" => query,
              "useLegacySql" => false
            },
            "labels" => %{"service" => @service_name}
          }
        }
        |> Jason.encode!()

      # jobs.insert API: create a query job instead of calling jobs.query directly
      case Finch.build(:post, "#{@bigquery_base_url}/projects/#{project_id}/jobs", headers, body)
           |> Finch.request(finch_name) do
        {:ok, %{status: 200, body: body}} ->
          job = Jason.decode!(body)
          job_id = job["jobReference"]["jobId"]
          get_job(project_id, job_id, headers, finch_name)

        # Non-200 response from the API: log it and give up
        {:ok, %{status: error_code, body: body} = resp} ->
          Logger.error("Couldn't create a job. Error code: #{error_code}.")
          Logger.error(inspect(resp))
          {:error, body}

        # Transport-level error: back off briefly and retry (up to 3 times)
        {:error, e} ->
          Logger.error("Error creating bigquery job: #{inspect(e)}")
          :timer.sleep(1000)
          query(query, project_id, goth_name, finch_name, retries + 1)
      end
    end

    # Poll the jobs.get API until the job reports state DONE, then build the result.
    defp get_job(project_id, job_id, headers, finch_name) do
      Logger.debug("getting job")
      job_url = "#{@bigquery_base_url}/projects/#{project_id}/jobs/#{job_id}"
      req = Finch.build(:get, job_url, headers)

      with {:ok, %Response{status: 200, body: body}} <- Finch.request(req, finch_name),
           %{"status" => %{"state" => "DONE"}} = job_info <- Jason.decode!(body) do
        build_result(project_id, job_id, headers, job_info, finch_name)
      else
        _ ->
          # Not DONE yet (or a transient error): wait and poll again
          :timer.sleep(@check_interval)
          get_job(project_id, job_id, headers, finch_name)
      end
    end
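
The build_result/5 helper isn't shown above; a minimal sketch of how it might look, assuming it simply fetches the finished job's output from the jobs.getQueryResults endpoint (schema/row decoding and pagination are left out):

    # Hypothetical sketch of build_result/5: the job is already DONE, so a single
    # call to jobs.getQueryResults returns the rows.
    defp build_result(project_id, job_id, headers, job_info, finch_name) do
      results_url = "#{@bigquery_base_url}/projects/#{project_id}/queries/#{job_id}"

      case Finch.build(:get, results_url, headers) |> Finch.request(finch_name) do
        {:ok, %Response{status: 200, body: body}} ->
          {:ok, %{job: job_info, results: Jason.decode!(body)}}

        other ->
          {:error, other}
      end
    end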

I really don't know if something like that would work for req_bigquery / Livebook

Best,

@coryt
Contributor

coryt commented Nov 16, 2023

@matreyes I hit this issue as well. I added a timeoutMs parameter (PR #28), but this can still be a problem.

Q for others here: what is the best way to handle async queries?
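
For reference, timeoutMs is a field of the jobs.query request body. A bare-bones sketch with Finch, reusing the headers / finch_name / @bigquery_base_url setup from the snippet above (this is not necessarily how PR #28 wires it into req_bigquery):

    # Illustrative only: a raw jobs.query call with timeoutMs. Even with a large
    # timeout, the response can still come back with "jobComplete" => false.
    body =
      Jason.encode!(%{
        "query" => "SELECT 1",
        "useLegacySql" => false,
        "timeoutMs" => 120_000
      })

    url = "#{@bigquery_base_url}/projects/#{project_id}/queries"

    {:ok, %Finch.Response{body: resp}} =
      Finch.build(:post, url, headers, body)
      |> Finch.request(finch_name)

    Jason.decode!(resp)["jobComplete"]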

@matreyes
Contributor Author

Yeah, I tried with the timeout, but it still returned jobComplete: false. As the documentation says:

However, the call is not guaranteed to wait for the specified timeout; it typically returns after around 200 seconds (200,000 milliseconds), even if the query is not complete.
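
So the response has to be treated as potentially incomplete regardless of timeoutMs. One workaround, sketched here under the same assumptions as the snippets above, is to take the jobId from the partial response and keep calling jobs.getQueryResults until jobComplete flips to true:

    # Hypothetical polling helper: jobs.getQueryResults keeps returning
    # "jobComplete" => false until the job finishes, so ask again after a pause.
    # @check_interval and the other names come from the earlier snippet.
    defp await_results(project_id, job_id, headers, finch_name) do
      url = "#{@bigquery_base_url}/projects/#{project_id}/queries/#{job_id}"

      {:ok, %Finch.Response{status: 200, body: body}} =
        Finch.build(:get, url, headers)
        |> Finch.request(finch_name)

      case Jason.decode!(body) do
        %{"jobComplete" => true} = result ->
          {:ok, result}

        _still_running ->
          :timer.sleep(@check_interval)
          await_results(project_id, job_id, headers, finch_name)
      end
    end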
