lairtonmendes/databricks_sql

databricks_sql

A Ruby gem for the Databricks SQL Statement Execution API, with support for:

  • Personal Access Token (PAT) authentication
  • Synchronous and asynchronous execution (with polling)
  • format: JSON_ARRAY
  • disposition: INLINE
  • disposition: EXTERNAL_LINK with automatic file download and parsing
  • HTTP and SQL execution error handling

Installation

Add to your Gemfile:

gem "databricks_sql"

Or install directly:

gem install databricks_sql

Global Configuration (Recommended)

Configure connection settings once and reuse them across your application:

require "databricks_sql"

Databricks.configure do |config|
	config.host = "https://adb-1234567890123456.7.azuredatabricks.net"
	config.token = ENV.fetch("DATABRICKS_TOKEN")
	config.warehouse_id = ENV.fetch("DATABRICKS_WAREHOUSE_ID")
	config.timeout = 30
	config.open_timeout = 10
	config.external_link_require_https = true
	config.external_link_allowed_hosts = ["files.example.com", "s3.amazonaws.com"]
end

Security notes:

  • The API host must use HTTPS.
  • EXTERNAL_LINK URLs are HTTPS-only by default.
  • If external_link_allowed_hosts is set, downloads are allowed only from the listed hosts.
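
To make the two EXTERNAL_LINK rules above concrete, here is a minimal sketch of such a check in plain Ruby. The helper name `external_link_allowed?` and its keyword arguments are hypothetical and not part of the gem's API, and the subdomain matching rule is an assumption; the gem may compare hosts differently.

```ruby
require "uri"

# Hypothetical helper, not part of the gem's API: decides whether a
# download URL passes an HTTPS requirement and a host allow-list.
def external_link_allowed?(url, require_https: true, allowed_hosts: [])
  uri = URI.parse(url)
  return false if uri.host.nil?
  return false if require_https && uri.scheme != "https"
  return true if allowed_hosts.empty?

  host = uri.host.downcase
  # Assumption: an entry matches itself and any of its subdomains.
  allowed_hosts.any? do |entry|
    entry = entry.downcase
    host == entry || host.end_with?(".#{entry}")
  end
rescue URI::InvalidURIError
  false
end

hosts = ["files.example.com", "s3.amazonaws.com"]
puts external_link_allowed?("https://bucket.s3.amazonaws.com/result.csv", allowed_hosts: hosts) # true
puts external_link_allowed?("http://files.example.com/result.csv", allowed_hosts: hosts)        # false
puts external_link_allowed?("https://s3.amazonaws.com.example.org/x", allowed_hosts: hosts)     # false
```

Matching on a leading dot (".s3.amazonaws.com") rather than a bare suffix keeps look-alike hosts such as "evil-s3.amazonaws.com" from passing.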

Then initialize your client without passing credentials again:

client = DatabricksSql::Client.new

You can also configure through DatabricksSql.configure:

DatabricksSql.configure do |config|
	config.host = "https://adb-1234567890123456.7.azuredatabricks.net"
	config.token = ENV.fetch("DATABRICKS_TOKEN")
	config.warehouse_id = ENV.fetch("DATABRICKS_WAREHOUSE_ID")
end

To override the global settings for a single client, pass them to the constructor:

client = DatabricksSql::Client.new(
	host: "https://adb-1234567890123456.7.azuredatabricks.net",
	token: ENV.fetch("DATABRICKS_TOKEN"),
	warehouse_id: ENV.fetch("DATABRICKS_WAREHOUSE_ID")
)

Synchronous Usage

execute_statement submits the query and waits for a terminal status (SUCCEEDED, FAILED, CANCELED, or CLOSED).

result = client.execute_statement(
	statement: "SELECT id, name FROM analytics.users LIMIT 5",
	format: "JSON_ARRAY",
	disposition: "INLINE"
)

puts result.status
puts result.columns.inspect
puts result.rows.inspect

SQL Context (catalog/schema)

Pass catalog and schema to set the default catalog and schema for the statement:

result = client.execute_statement(
	statement: "SELECT current_catalog(), current_schema()",
	catalog: "main",
	schema: "analytics"
)

Type Mapping with column_schema

column_schema is optional. When given, it coerces the named columns to the declared types (:integer, :boolean, and :datetime in this example):

result = client.execute_statement(
	statement: "SELECT id, is_active, created_at FROM analytics.users LIMIT 2",
	column_schema: {
		"id" => :integer,
		"is_active" => :boolean,
		"created_at" => :datetime
	}
)

result.rows.each do |row|
	puts [row["id"].class, row["is_active"].class, row["created_at"].class].inspect
end
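
For intuition, per-column coercion of this kind can be sketched in plain Ruby as below. This is an illustration under stated assumptions, not the gem's implementation: `COERCERS` and `coerce_row` are hypothetical names, raw values are assumed to arrive as strings, and timestamps are assumed to be ISO 8601.

```ruby
require "time"

# Hypothetical coercers keyed by the type symbols used above.
COERCERS = {
  integer: ->(v) { Integer(v.to_s, 10) },
  boolean: ->(v) { v.to_s.strip.downcase == "true" },
  datetime: ->(v) { Time.iso8601(v.to_s) }
}.freeze

# Returns a new row hash with each declared column converted.
def coerce_row(row, column_schema)
  row.to_h do |name, value|
    coercer = COERCERS[column_schema[name]]
    # Leave NULLs and columns without a declared type untouched.
    [name, value.nil? || coercer.nil? ? value : coercer.call(value)]
  end
end

schema = { "id" => :integer, "is_active" => :boolean, "created_at" => :datetime }
raw = { "id" => "42", "is_active" => "true", "created_at" => "2024-01-15T10:30:00Z", "name" => "Ada" }

row = coerce_row(raw, schema)
puts [row["id"].class, row["is_active"].class, row["created_at"].class].inspect
# prints [Integer, TrueClass, Time]
```

Columns missing from the schema (here "name") pass through unchanged, which matches the idea that the mapping is optional per column.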

Asynchronous Usage (Polling)

1) Submit without blocking

submission = client.execute_statement_async(
	statement: "SELECT * FROM large_table",
	format: "JSON_ARRAY",
	disposition: "EXTERNAL_LINK",
	wait_timeout: "10s",
	on_wait_timeout: "CONTINUE"
)

statement_id = submission.fetch("statement_id")
puts "Statement ID: #{statement_id}"

2) Manual polling

loop do
	state = client.get_statement(statement_id: statement_id)
	puts "Current status: #{state["status"]}"
	break if %w[SUCCEEDED FAILED CANCELED CLOSED].include?(state["status"])
	sleep 1
end

3) Automatic polling with global timeout

result = client.wait_for_statement(
	statement_id: statement_id,
	disposition: "EXTERNAL_LINK",
	poll_interval: 1.0,
	max_wait: 120,
	cancel_on_timeout: true
)

puts result.rows.size
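
The idea behind polling with a global timeout can be sketched independently of the gem. In the sketch below, `poll_until_terminal` is a hypothetical helper, `fetch_status` stands in for a real status lookup, and `on_timeout` plays the role that cancel_on_timeout suggests; how the gem implements this internally is not shown in this README.

```ruby
TERMINAL_STATES = %w[SUCCEEDED FAILED CANCELED CLOSED].freeze

# Hypothetical helper: calls `fetch_status` until it reports a terminal
# state or `max_wait` seconds elapse. `fetch_status` is any callable
# that returns the current status string.
def poll_until_terminal(fetch_status, poll_interval: 1.0, max_wait: 120, on_timeout: nil)
  # A monotonic clock is unaffected by wall-clock adjustments.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + max_wait
  loop do
    status = fetch_status.call
    return status if TERMINAL_STATES.include?(status)

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      on_timeout&.call # e.g. cancel the statement
      return :timed_out
    end
    sleep poll_interval
  end
end

# Stand-in for a real status lookup: PENDING, RUNNING, then SUCCEEDED.
statuses = %w[PENDING RUNNING SUCCEEDED].each
puts poll_until_terminal(-> { statuses.next }, poll_interval: 0.01, max_wait: 5)
# prints SUCCEEDED
```

Checking the deadline after each status lookup guarantees at least one lookup even when max_wait is very small.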

INLINE vs EXTERNAL_LINK

  • INLINE returns results directly in the API payload.
  • EXTERNAL_LINK extracts the download URL, downloads the file, and returns parsed content.

In EXTERNAL_LINK mode, downloaded JSON and CSV files are parsed automatically.
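
As a rough picture of what parsing a downloaded file involves, the sketch below turns a JSON or CSV body into row hashes using only the standard `json` and `csv` libraries. `parse_external_chunk` is a hypothetical name, and the payload shapes are assumptions: JSON is taken to be an array of row arrays, and CSV is taken to start with a header row.

```ruby
require "json"
require "csv"

# Hypothetical parser for a downloaded result file.
def parse_external_chunk(body, format:, columns: nil)
  case format
  when "JSON_ARRAY"
    raise ArgumentError, "columns are required for JSON_ARRAY" if columns.nil?

    # Pair each value with its column name.
    JSON.parse(body).map { |values| columns.zip(values).to_h }
  when "CSV"
    # The header row supplies the column names.
    CSV.parse(body, headers: true).map(&:to_h)
  else
    raise ArgumentError, "unsupported format: #{format}"
  end
end

json_rows = parse_external_chunk('[["1","Ada"],["2","Grace"]]', format: "JSON_ARRAY", columns: %w[id name])
csv_rows = parse_external_chunk("id,name\n1,Ada\n2,Grace\n", format: "CSV")

p json_rows.first        # {"id"=>"1", "name"=>"Ada"} (exact spacing depends on the Ruby version)
p json_rows == csv_rows  # true
```

Both branches yield the same row shape, so calling code does not need to know which file type was downloaded.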

Error Handling

Main error classes:

  • DatabricksSql::AuthenticationError (401)
  • DatabricksSql::AuthorizationError (403)
  • DatabricksSql::NotFoundError (404)
  • DatabricksSql::RateLimitError (429)
  • DatabricksSql::ServerError (5xx)
  • DatabricksSql::TimeoutError
  • DatabricksSql::ConnectionError
  • DatabricksSql::ExecutionError (logical SQL execution failure)
  • DatabricksSql::ParseError

Example:

begin
	result = client.execute_statement(statement: "SELECT * FROM missing_table")
	p result.rows
rescue DatabricksSql::ExecutionError => e
	warn "SQL execution failed: #{e.message}"
rescue DatabricksSql::HTTPError => e
	warn "HTTP error #{e.status_code}: #{e.message}"
rescue DatabricksSql::Error => e
	warn "DatabricksSql error: #{e.message}"
end
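
The status codes listed above suggest a simple lookup from HTTP status to error class. The sketch below shows that idea with stand-in classes in a hypothetical `ErrorSketch` module. The hierarchy, with every status error inheriting from HTTPError, is an assumption inferred from the rescue order in the example, not confirmed by this README.

```ruby
module ErrorSketch
  class Error < StandardError; end

  # Carries the HTTP status alongside the message.
  class HTTPError < Error
    attr_reader :status_code

    def initialize(message, status_code:)
      super(message)
      @status_code = status_code
    end
  end

  class AuthenticationError < HTTPError; end
  class AuthorizationError < HTTPError; end
  class NotFoundError < HTTPError; end
  class RateLimitError < HTTPError; end
  class ServerError < HTTPError; end

  STATUS_ERRORS = {
    401 => AuthenticationError,
    403 => AuthorizationError,
    404 => NotFoundError,
    429 => RateLimitError
  }.freeze

  # Any 5xx maps to ServerError; unknown codes fall back to HTTPError.
  def self.error_class_for(status_code)
    return ServerError if (500..599).cover?(status_code)

    STATUS_ERRORS.fetch(status_code, HTTPError)
  end
end

begin
  raise ErrorSketch.error_class_for(429).new("Too Many Requests", status_code: 429)
rescue ErrorSketch::HTTPError => e
  puts "#{e.class.name.split("::").last} (#{e.status_code}): #{e.message}"
  # prints RateLimitError (429): Too Many Requests
end
```

With a hierarchy like this, rescuing the most specific class first and the base class last, as in the example above, lets callers handle rate limits or authentication failures separately while still catching everything else.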

Development

bin/setup
bundle exec rubocop
bundle exec rspec

Install locally:

bundle exec rake install

License

MIT. See LICENSE.txt.
