
Server: keep Postgres static schema #371

Closed
szareiangm opened this issue Jun 26, 2018 · 11 comments

@szareiangm
Contributor

Hello team,

We had this conversation in Gitter, but I want to make it official here. I was wondering if you folks agree to keep Postgres static schema DDL in the Iglu server?

I saw #193 and #194, but they seem to be a bit far off for now.

@szareiangm szareiangm changed the title Iglu server to keep Postgres static schema Server: keep Postgres static schema Jun 26, 2018
@alexanderdean
Member

Can you clarify what Postgres static schema DDL is?

@szareiangm
Contributor Author

Like the .sql files in Iglu Central (e.g. https://github.com/snowplow/iglu-central/tree/master/sql/com.amazon.aws.cloudfront), but in Iglu Server.

@alexanderdean
Member

alexanderdean commented Jun 26, 2018

We don't have a plan to store SQL static files inside Iglu server - the maintenance burden would be enormous. The linked tickets explain our plan around being able to generate SQL "files" on Iglu server endpoints based on schemas in the server.

@szareiangm
Contributor Author

Yes, but the links in the tickets are not working. Could I ask for working links so that I can review the plan details, please?

@chuwy
Contributor

chuwy commented Jun 27, 2018

Sorry @szareiangm, I'm afraid there are no working links for these tickets. And unfortunately we don't have any ETA on full Postgres support yet (though many open source users are interested in it). But PRs for schema-ddl can significantly speed up this process.

@szareiangm
Contributor Author

szareiangm commented Jun 27, 2018

Thanks @chuwy. After spending some time on this, I understood that Iglu Server cannot generate Redshift DDL, but Igluctl can.

Postgres and Redshift have similarities, and the schema-ddl code could be reused for Postgres, too. How about this plan below?

  1. In schema-ddl: move the shared code out to a separate package, called something like sql. This package might have some traits/abstractions to keep the structure the same for the higher-level code using it (Schema DDL: separate redshift-specific code from standard SQL #372); see the sketch after this list.
  2. In schema-ddl: add Postgres data types and syntax in the new package (Schema DDL: Add ability to generate Postgres DDL #194).
  3. In iglu-server: add the capability to call DdlGenerator for the schema.
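
To make step 1 concrete, here is a minimal sketch of the kind of abstraction the shared sql package could provide. Every name below is hypothetical; none of this exists in schema-ddl today, it only illustrates the idea of a common Column shape over dialect-specific sealed type hierarchies:

// Hypothetical sketch only; names are invented for illustration.
// The shared `sql` package would hold the dialect marker and the
// common shapes, while each dialect keeps its own sealed family
// of data types in its own package.
sealed trait Dialect
trait Redshift extends Dialect
trait Postgres extends Dialect

// Common interface; implementations live in dialect-specific packages
trait DataType[D <: Dialect] { def toDdl: String }

// Redshift-specific types (would stay in the redshift package)
sealed trait RedshiftType extends DataType[Redshift]
case object RedshiftTimestamp extends RedshiftType { def toDdl = "TIMESTAMP" }
final case class RedshiftVarchar(size: Int) extends RedshiftType {
  def toDdl = s"VARCHAR($size)"
}

// A column definition shared across dialects but pinned to exactly one
final case class Column[D <: Dialect](name: String, dataType: DataType[D]) {
  def toDdl: String = s"$name ${dataType.toDdl}"
}

A Postgres package would then add its own sealed hierarchy extending DataType[Postgres], reusing Column and any other shared machinery.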

I will do this in steps, with separate pull requests and separate issues.

cc @alexanderdean @Nafid

@chuwy
Contributor

chuwy commented Jun 29, 2018

Hey @szareiangm,

Thanks for your interest in this, and sorry for the slightly delayed response. I have a few points here:

I like the plan overall, but I'm a little hazy on the first step. Although abstracting over SQL DDLs sounds like a good idea in theory, I'm not 100% sure it is a good idea in practice. The problem is that SQL DDLs usually differ in the nuance that is most important to us: data type definitions. As an example, you can take a look at the Snowflake AST, which is quite different. And we definitely want to have data types as a sealed hierarchy. The same goes for column definitions: column options can be very different.

I'm not saying that I disagree with this step, but if we take this path we need to be sure that we don't lose any precision in the DDLs. E.g. we cannot have definitions like:

case class Column(name: String, sqlType: SqlType, sqlOptions: SqlOptions)

This is a very lossy type: we won't be able to inspect it, as pretty much anything will be able to extend SqlType and SqlOptions. In a slightly similar fashion, though, I can imagine the following definition:

case class Column[T <: SqlDdl](name: String, sql: SqlType[T], sqlOptions: SqlOptions[T])

In summary, this is a very hard trade-off between boilerplate (until now, I was leaning towards boilerplate) and possibly leaky abstractions.
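
For illustration, a fleshed-out version of that definition might look as follows. All names here are made up for the example, not actual schema-ddl code; the point is that the type parameter rejects mixed-dialect columns at compile time:

object DialectSafety {
  sealed trait SqlDdl
  trait RedshiftDdl extends SqlDdl
  trait PostgresDdl extends SqlDdl

  // Dialect-tagged types and options; invariance in T is what
  // forces both arguments of Column to agree on one dialect
  trait SqlType[T <: SqlDdl]
  trait SqlOptions[T <: SqlDdl]

  case object RsVarcharMax extends SqlType[RedshiftDdl]
  case object RsOptions extends SqlOptions[RedshiftDdl]
  case object PgOptions extends SqlOptions[PostgresDdl]

  final case class Column[T <: SqlDdl](
    name: String,
    sqlType: SqlType[T],
    sqlOptions: SqlOptions[T]
  )

  // Compiles: both arguments agree that T = RedshiftDdl
  val ok = Column("foo", RsVarcharMax, RsOptions)

  // Does not compile: no single T is both RedshiftDdl and PostgresDdl
  // val bad = Column("foo", RsVarcharMax, PgOptions)
}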

One more important point: even if we implement your roadmap, we'll still be missing one very important building block in Postgres support. Namely, we don't have Postgres support in RDB Shredder (snowplow/snowplow-rdb-loader#47).

@szareiangm
Contributor Author

@chuwy Good morning from Toronto!
What is your suggestion? Sealing the data types and column options inside the redshift package and having duplicate, similar ones for postgres?

About Postgres support in RDB Shredder, I need to spend some time to understand the dynamics of it. I am not sure that it blocks this plan.

@chuwy
Contributor

chuwy commented Jun 29, 2018

Hey @szareiangm!

> About Postgres support in RDB Shredder, I need to spend some time to understand the dynamics of it. I am not sure that it blocks this plan.

Sure, please feel free to ask questions. In short: enriched events are our canonical TSV, containing all common fields (e.g. event_id, collector_tstamp, etc.) as well as self-describing JSONs (contexts, unstructured events). And Shredder is a Spark transformer job that transforms this heterogeneous TSV+JSON into something more friendly for a particular DB (in our case Redshift), like plain TSV.
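
As a toy illustration of that idea (the real RDB Shredder is a Spark job and the enriched event format is far richer, so everything below is simplified and the names are invented):

object ShredSketch {
  // A drastically simplified enriched event: a couple of common
  // fields plus attached self-describing JSONs (contexts)
  final case class Enriched(
    eventId: String,
    collectorTstamp: String,
    contexts: List[(String, String)] // (schema key, JSON payload)
  )

  // "Shredding": fan one heterogeneous event out into homogeneous
  // per-schema rows that a relational DB can load as plain TSV
  def shred(e: Enriched): List[(String, String)] =
    e.contexts.map { case (schemaKey, json) =>
      schemaKey -> List(e.eventId, e.collectorTstamp, json).mkString("\t")
    }

  def main(args: Array[String]): Unit = {
    val event = Enriched(
      "e1",
      "2018-06-29 12:00:00",
      List("iglu:com.acme/link_click/jsonschema/1-0-0" -> """{"targetUrl":"https://example.com"}""")
    )
    shred(event).foreach { case (table, row) => println(s"$table: $row") }
  }
}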

> What is your suggestion?

Sorry, I'm not entirely sure yet. My most specific suggestion is in the previous comment. However, right now I'm totally fine with the boilerplate solution. It is super important for our use case to prevent constructing nonsense objects like the following:

Column("foo", redshiftType, postgresOptions)

And the current single hierarchy does not prevent it. That may not be a big deal for object construction, but it will be a huge problem for parsers.
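
To illustrate the parser concern with a made-up example (again, these names are not real schema-ddl code): with the dialect encoded in the type, a parser's signature itself documents and enforces which types it may produce.

object ParserSketch {
  sealed trait SqlDdl
  trait RedshiftDdl extends SqlDdl
  trait PostgresDdl extends SqlDdl

  trait SqlType[T <: SqlDdl]
  case object RsInteger extends SqlType[RedshiftDdl]
  case object PgJsonb extends SqlType[PostgresDdl]

  // The return type pins this parser to Redshift types only; with a
  // flat SqlType hierarchy nothing would stop it returning PgJsonb
  def parseRedshiftType(raw: String): Option[SqlType[RedshiftDdl]] =
    raw.trim.toUpperCase match {
      case "INTEGER" => Some(RsInteger)
      // case "JSONB" => Some(PgJsonb)  // would not compile here
      case _ => None
    }
}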

@szareiangm
Contributor Author

> Sure, please feel free to ask questions. In short: enriched events are our canonical TSV, containing all common fields (e.g. event_id, collector_tstamp, etc.) as well as self-describing JSONs (contexts, unstructured events). And Shredder is a Spark transformer job that transforms this heterogeneous TSV+JSON into something more friendly for a particular DB (in our case Redshift), like plain TSV.

Thanks for the kind explanation, invaluable and enlightening. I think that would be the next step in my plan. There is room for progress in every aspect of the codebase when we extend it to new data sinks. I will get back to you about shredding when I have read its code more thoroughly; for now, however, I need schemas generated out of iglu-server as the first step.

> Sorry, I'm not entirely sure yet. My most specific suggestion is in the previous comment. However, right now I'm totally fine with the boilerplate solution. It is super important for our use case to prevent constructing nonsense objects like the following:
> Column("foo", redshiftType, postgresOptions)
> And the current single hierarchy does not prevent it. That may not be a big deal for object construction, but it will be a huge problem for parsers.

I think that since Redshift is based on Postgres 8 with some commercial additions, the differences in types and options that we are worried about are potentially minimal. However, we can decide on solutions for the potential issues as we move forward. Let's move on and have a look at my PR for the first step, and then I can submit the PR for the second step. I appreciate your positive and open attitude.

@aldemirenes

Migrated to snowplow/iglu-server#13
