Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Teradata driver #6459

Closed
wants to merge 100 commits into from
Closed

Conversation

@rawyler
Copy link
Contributor

rawyler commented Nov 23, 2017

The driver has been used with a productive Teradata instance and works fine so far.
Tests were run as described in the wiki (ENGINES=h2,teradata lein test) using the Teradata Express image.
I managed to get down to 13 failures, 13 errors.
I sense that there isn't that much work left to reach 0 so I thought it's best to ask my remaining questions.

Questions:

1. Teradata: Duplicate definition of 'name' in NAMED phrase

For example: breakout_test.clj:118 results in the following query:
SELECT
TOP 10 "test-data"."venues"."id" AS "id",
"test-data"."venues"."name" AS "name",
"test-data"."venues"."category_id" AS "category_id",
"test-data"."venues"."latitude" AS "latitude",
"test-data"."venues"."longitude" AS "longitude",
"test-data"."venues"."price" AS "price",
"categories__via__category_id"."name" AS "name"
FROM "test-data"."venues"
LEFT JOIN "test-data"."categories" "categories__via__category_id" ON "test-data"."venues"."category_id" = "categories__via__category_id"."id"
ORDER BY "categories__via__category_id"."name" DESC

For Teradata this results in "[Error 3863] [SQLState 42000] Duplicate definitio
n of 'name' in NAMED phrase."
I've seen that the implementations for Oracle and BigQuery support a de-duplication of aliases. Oracle does it if a limit is applied and BigQuery if an aggregation is made.

I wanted to ask if it is sufficent if the de-duplication is performed in apply-limit only.

2. Database name: test_data does not exist

For example: nested_queries_test.clj:78 results in the following query:

SELECT "id", "longitude", "category_id", "price", "name", "latitude" FROM "test_data"."venues"

Somehow my Teradata implementation seems to generate wrong database identifiers when qualified using the identifier function of nested_queries_test.

Can you give me a hint?

3. Date bucketing tests, some queries issued against wrong database

Some of the tests result in querying checkins:4-per-minute instead of checkins:4-per-hour or checkins:1-per-day. If i change the resulting query to use the correct database, the results are as expected by the test.

For example date_bucketing_test.clj:898 produces the following query:

SELECT count(*) AS "count" FROM "a-checkin-every-15-seconds"."checkins" WHERE trunc("a-checkin-every-15-seconds"."checkins"."timestamp", 'day') = trunc(CURRENT_TIMESTAMP, 'day')

although checkins:1-per-day should be used:

SELECT count(*) AS "count" FROM "a-checkin-every-86400-seconds"."checkins" WHERE trunc("a-checkin-every-86400-seconds"."checkins"."timestamp", 'day') = trunc(CURRENT_TIMESTAMP, 'day')

Do you have any idea why the tests in date_bucketing_test:885-... seem to use a-checkin-every-15-seconds although they request it differently?

4. Order of the results in database_test.clj

Somehow the order of the returned result is not the same as the expected when running with ENGINES=h2,teradata. The data seems to be correct.

TODO (done)
@rawyler rawyler force-pushed the swisscom-bigdata:dev/teradata_driver branch from dd8b076 to de78f86 Nov 23, 2017
@salsakran salsakran requested a review from camsaul Dec 4, 2017
@salsakran

This comment has been minimized.

Copy link
Contributor

salsakran commented Dec 4, 2017

@camsaul mind answering the questions that @rawyler brings up?

@moukat moukat mentioned this pull request Dec 4, 2017
[helpers :as h]
types]
[metabase.models [field :as field]]
[clojure.tools.logging :as log]

This comment has been minimized.

Copy link
@camsaul

camsaul Dec 13, 2017

Member

We try to keep all our requires sorted. If you're using emacs you can have it do it for you automatically with the cljr-clean-ns command from the clj-refactor package (just make sure you (setq cljr-favor-prefix-notation t)). Otherwise take a look at other namespaces in the project for an example. We just sort them alphabetically, e.g. clojure.tools.logging should come before metabase.models.field.

@camsaul

This comment has been minimized.

Copy link
Member

camsaul commented Dec 13, 2017

I wanted to ask if it is sufficent if the de-duplication is performed in apply-limit only.

That query you posted looks like it would still need alias deduplication even if you didn't apply a TOP 10 limit. So you probably need to do it somewhere else besides apply-limit or you could still get into situations where a query with duplicate aliases could be generated.

[interface :refer [def-database-definition]]]
[toucan.db :as db]
[toucan.util.test :as tt])
(:import metabase.driver.teradata.TeradataDriver))

This comment has been minimized.

Copy link
@camsaul

camsaul Dec 13, 2017

Member

Did you just copy-paste all these required namespaces from somewhere else? It looks like you're using maybe two of them. If you're using Emacs you can just use cljr-clean-ns to clean up the ones that aren't used automatically

@camsaul

This comment has been minimized.

Copy link
Member

camsaul commented Dec 13, 2017

Somehow my Teradata implementation seems to generate wrong database identifiers when qualified using the identifier function of nested_queries_test.

It looks to me like the venues table has test_data recorded as its schema so it's trying to qualify that as schema.table-name, which is why you get test_data.venues. Which way does Teradata expect it to be qualified? database.schema.table-name?

@camsaul

This comment has been minimized.

Copy link
Member

camsaul commented Dec 13, 2017

Do you have any idea why the tests in date_bucketing_test:885-... seem to use a-checkin-every-15-seconds although they request it differently?

My best guess is there's something not working quite right in your test data code and it's causing it to reuse the same DB instead of creating new ones. So the unit tests think they're getting the checkin-every-day DB but it's fetching a different DB instead and thinks it has the right one.

This might be an issue with reusing names. Different test DBs need to have different names or the test code will skip DB creation because it thinks the test DBs have already been created. Let me poke through the code and see if I can figure out what's going on

@rawyler

This comment has been minimized.

Copy link
Contributor Author

rawyler commented Dec 13, 2017

Which way does Teradata expect it to be qualified? database.schema.table-name?

Teradata uses (user|database).table-name.column-name

@camsaul

This comment has been minimized.

Copy link
Member

camsaul commented Dec 13, 2017

So for tables it should be database.table, not schema.table?

@camsaul

This comment has been minimized.

Copy link
Member

camsaul commented Dec 13, 2017

Order of the results in database_test.clj
Somehow the order of the returned result is not the same as the expected when running with ENGINES=h2,teradata. The data seems to be correct.

Which test/line number is out of order? There's 40 or so tests in that file

(let [db-connection (sql/db->jdbc-connection-spec database)]
(run-query-without-timezone driver settings db-connection query))))))

(def TeradataISQLDriverMixin

This comment has been minimized.

Copy link
@camsaul

camsaul Dec 13, 2017

Member

There's no reason to have a separate "mixin" here with a partial implementation of some of the driver methods. We only do that for the Postgres driver because the Redshift driver uses the same implementations. Since no other drivers are going to be using these Teradata implementations there's no need to make them available like this

This comment has been minimized.

Copy link
@rawyler

rawyler Dec 13, 2017

Author Contributor

ok

@rawyler

This comment has been minimized.

Copy link
Contributor Author

rawyler commented Dec 13, 2017

It would be database.table (the jdbc driver's .getSchemas method returns all the databases)

@camsaul

This comment has been minimized.

Copy link
Member

camsaul commented Dec 13, 2017

@rawyler is the only way to run Teradata locally with the Teradata Express VM image? That's fine for local development but I'm not sure how we're going to get that working on CI. Is there a Docker image?

Alternatively it sounds like it's available on the AWS Marketplace, but then we'd have to pay for that in order to test things

@camsaul

This comment has been minimized.

Copy link
Member

camsaul commented Dec 13, 2017

So test_data isn't the correct name of the database, right? In the query above it has test-data. Is that the right name for the DB? e.g. "test-data"."venues" instead of "test_data"."venues"?

@rawyler

This comment has been minimized.

Copy link
Contributor Author

rawyler commented Dec 13, 2017

Yes, "test-data"."venues" would be the correct one.

@rawyler

This comment has been minimized.

Copy link
Contributor Author

rawyler commented Dec 13, 2017

@camsaul The only thing I could find regarding the dockerization was this: https://community.teradata.com/t5/Database/Teradata-Express-for-Docker/m-p/71980#M1067
Instead of trying to get a Teradata Express running within a VirtualBox within a Docker container, I used the official VMWare image. I can look into this and check if a Teradata Express on VirtualBox within Docker would run.

@rawyler

This comment has been minimized.

Copy link
Contributor Author

rawyler commented Jan 2, 2018

As this Teradata Express within VirtualBox within Docker requires a lot of effort, I would like to run the tests against an AWS instance. Makes more sense and shouldn't be too expensive.

@rawyler rawyler force-pushed the swisscom-bigdata:dev/teradata_driver branch 2 times, most recently from cb46ef3 to de78f86 Jan 8, 2018
@camsaul

This comment has been minimized.

Copy link
Member

camsaul commented Feb 9, 2018

@rawyler what steps are needed to run this in AWS?

@aedocw

This comment has been minimized.

Copy link

aedocw commented Apr 11, 2018

@camsaul there is a free instance of Teradata Database on AWS Marketplace (https://aws.amazon.com/marketplace/pp/B06Y4Z54R5?qid=1523485253250&sr=0-6&ref_=srh_res_product_title) which is launched with cloud formations. Keeping the instance running full time would be unreasonably expensive, and I'm not sure if it would launch fast enough for the test runner?

Alternately, a low-performance solution is available - I got the Teradata Express VM image running in a VM under qemu. It was pretty easy, and results in a DB running in a single instance (I think I used m2.large). Would that work for testing if the version in marketplace doesn't work?

camsaul and others added 22 commits Nov 28, 2018
Fix handling dimension parameters using a foreign key
Auto-bucket type/DateTime field literals where appropriate 📆
Open links in new tab if command/'meta' key is pressed
Disable tests that are failing on CI
…x-negative-currency-formatting
Fix currency formatting of negative numbers when currency_in_header = true
* check can_write when displaying header buttons

* check can_write when displaying header buttons

* don't hide duplicate
…LoadHelper-errors

Don't log info messages or async load Liquibase during compilation
…cting-to-pg-without-sslmode-require

Warn on Postgres conn str w/ ssl=true but not sslmode=require ⚠️
* ensure group and accordian lists scroll when the window is too small

* CR fixes
Update released languages' strings and add German (yay!)
Fix date formatting settings for SQL queries
@kdoh kdoh added the .CLA Signed label Jan 10, 2019
@tlrobinson

This comment has been minimized.

Copy link
Member

tlrobinson commented Mar 11, 2019

Hey all,

Thank you for this PR and apologies for the long delay responding. We've recently done some work to support drivers as plugins in Metabase and as of the next release (v0.32) we will be asking driver developers to first publish new drivers in their own repositories (typically named metabase-*-driver). If the driver fully passes the driver tests we will link to it from our documentation, and if a driver gains significant usage we will consider merging it into Metabase itself.

Here is some documentation about publishing a driver as a plugin in a separate repository: https://github.com/metabase/metabase/wiki/Writing-a-Driver:-Packaging-a-Driver-&-Metabase-Plugin-Basics#drivers-shipped-as-3rd-party-plugins

Here are two example drivers: https://github.com/metabase/sudoku-driver https://github.com/metabase/crate-driver

We still have some work to do on the documentation. You can follow progress and provide feedback on that here: #9348

Thank you.

@tlrobinson tlrobinson closed this Mar 11, 2019
@rawyler

This comment has been minimized.

Copy link
Contributor Author

rawyler commented Mar 11, 2019

Thanks for your feedback.
I will perform the transition to a separate repository soon.

@funsign

This comment has been minimized.

Copy link

funsign commented Apr 28, 2019

Hi,
I’m very interested by this teradata driver. When he will be merge?

Thx

@rawyler

This comment has been minimized.

Copy link
Contributor Author

rawyler commented Jun 25, 2019

The Teradata driver has been migrated to the plugin metabase-teradata-driver https://github.com/swisscom-bigdata/metabase-teradata-driver

@funsign

This comment has been minimized.

Copy link

funsign commented Jul 23, 2019

The Teradata driver has been migrated to the plugin metabase-teradata-driver https://github.com/swisscom-bigdata/metabase-teradata-driver

Hi,
thank you very much, tested and approved !!

Bye
Antoine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.