This repository has been archived by the owner on Jul 30, 2024. It is now read-only.

snowplow-archive/schema-ddl


Schema DDL


WARNING! THIS REPOSITORY HAS BEEN DEPRECATED. SCHEMA DDL IS NOW PART OF THE IGLU PROJECT

Schema DDL is a set of generators for producing various DDL formats from JSON Schemas. It is tightly coupled with other tools from the Snowplow Platform, such as Iglu and Self-describing JSON, and is used mostly in Schema Guru.

Schema DDL itself does not provide a CLI and exposes only a Scala API. All useful methods are placed inside the schemaddl.generators package. Currently the only useful generator is schemaddl.generators.redshift.

Current features

Flatten Schema

To process a JSON Schema in a type-safe manner, it is sometimes necessary to represent its nested structure as a map of paths to properties. schemaddl.generators.SchemaFlattener.flattenJsonSchema can be used for that. It accepts a JSON Schema as a json4s.JValue and returns a schemaddl.FlatSchema.
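The idea of flattening can be illustrated with a minimal, standalone sketch. It does not use the Schema DDL API: the Node, Obj and Leaf types and the flatten function are hypothetical stand-ins for json4s.JValue and schemaddl.FlatSchema, and the dotted path format is an assumption made for illustration.

```scala
object FlattenSketch {
  // Hypothetical, simplified model of a JSON Schema node.
  // The real library works on json4s.JValue instead.
  sealed trait Node
  final case class Obj(properties: Map[String, Node]) extends Node
  final case class Leaf(jsonType: String) extends Node

  // Walk the nested structure and collect "dotted.path" -> type for every leaf.
  def flatten(node: Node, prefix: String = ""): Map[String, String] = node match {
    case Leaf(jsonType) => Map(prefix -> jsonType)
    case Obj(properties) =>
      properties.flatMap { case (name, child) =>
        val path = if (prefix.isEmpty) name else s"$prefix.$name"
        flatten(child, path)
      }
  }
}
```

For example, an object with a nested user object holding id and name, plus a top-level active flag, flattens to the keys user.id, user.name and active.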

Redshift DDL

The main feature of Schema DDL is producing Redshift table DDL (with or without Snowplow-specific data). The schemaddl.generators.redshift.getTableDdl method can be used for that. It accepts a schemaddl.FlatSchema and produces a Redshift DDL file together with warnings, for example about product types (e.g. boolean, string), which cannot be correctly translated into DDL without some manual work.

There is also the schemaddl.generators.redshift.Ddl module, which provides AST-like structures for generating DDL in a flexible and type-safe manner.
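As a rough illustration of how flattened properties might become a table definition with warnings, here is a standalone sketch. It is not the library's implementation: the Column and Table case classes, the type mapping and the VARCHAR(4096) fallback are all assumptions chosen for the example, and the real Ddl module is richer.

```scala
object RedshiftDdlSketch {
  // Hypothetical AST-like structures, loosely inspired by the idea of a Ddl module.
  final case class Column(name: String, dataType: String)
  final case class Table(name: String, columns: List[Column], warnings: List[String])

  // Map a set of JSON Schema types to a Redshift type.
  // A property with several types (a product type) gets a fallback and a warning.
  def toColumn(path: String, jsonTypes: Set[String]): (Column, Option[String]) = {
    val single = if (jsonTypes.size == 1) jsonTypes.headOption else None
    single match {
      case Some("string")  => (Column(path, "VARCHAR(4096)"), None)
      case Some("integer") => (Column(path, "BIGINT"), None)
      case Some("number")  => (Column(path, "DOUBLE PRECISION"), None)
      case Some("boolean") => (Column(path, "BOOLEAN"), None)
      case _ =>
        val types = jsonTypes.toList.sorted.mkString(", ")
        (Column(path, "VARCHAR(4096)"), Some(s"$path: cannot map types [$types], review manually"))
    }
  }

  // Build the table, keeping the column order of the input.
  def table(name: String, flat: List[(String, Set[String])]): Table = {
    val results = flat.map { case (path, types) => toColumn(path, types) }
    Table(name, results.map(_._1), results.flatMap(_._2.toList))
  }

  // Render the table as a CREATE TABLE statement with quoted column names.
  def render(t: Table): String = {
    val cols = t.columns.map(c => "    \"" + c.name + "\" " + c.dataType).mkString(",\n")
    s"CREATE TABLE IF NOT EXISTS ${t.name} (\n$cols\n);"
  }
}
```

Keeping the table as a data structure and rendering it only at the end means columns can be inspected or rearranged before any DDL text is produced.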

JSON Paths

Amazon Redshift uses the COPY command to load data into a table. A JSONPaths file is used to map data into columns. It can be generated with the schemaddl.generators.redshift.JsonPathGenerator.getJsonPathsFile method, which accepts a list of schemaddl.generators.redshift.Ddl.Column objects (which can be taken from the Table DDL object) and returns the JSONPaths file as a string. It is coupled with the Table object to preserve the structure of the table. For example, you may want to modify the list of columns by rearranging it depending on some properties, but the JSONPaths file must always list its fields in the same order as the table's columns, so we cannot rely on the FlatSchema object.
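The following standalone sketch shows what generating a JSONPaths file from an ordered list of columns could look like. It is an illustration only: the Column case class and the jsonPathsFile function are hypothetical, not the library's JsonPathGenerator, and it assumes column names are already valid dotted paths that need no escaping.

```scala
object JsonPathsSketch {
  // Hypothetical column type; the real one is schemaddl.generators.redshift.Ddl.Column.
  final case class Column(name: String)

  // Emit one JSONPath expression per column, in exactly the order given,
  // wrapped in the {"jsonpaths": [...]} object that Redshift COPY expects.
  def jsonPathsFile(columns: List[Column]): String = {
    val paths = columns.map(c => "        \"$." + c.name + "\"")
    "{\n    \"jsonpaths\": [\n" + paths.mkString(",\n") + "\n    ]\n}"
  }
}
```

Because the function takes the table's column list rather than a map of properties, rearranging the columns of the table automatically rearranges the paths to match.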

Copyright and License

Schema DDL is copyright 2014-2016 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.