This repository has been archived by the owner on Jan 27, 2020. It is now read-only.

NoOp Transformations

Utz Westermann edited this page Apr 7, 2017 · 6 revisions

Summary

A NoOp transformation checks whether a _SUCCESS flag exists in the fullPath of the view it belongs to, i.e., the view's partition folder in HDFS. If the flag exists and the folder contains at least one other non-empty file, the view enters the materialized state. Otherwise, it enters the nodata state.
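The decision rule above can be sketched in a few lines of Scala. This is a minimal, hypothetical illustration and not Schedoscope's implementation: it inspects a local directory through `java.io.File` instead of an HDFS folder, and the names `NoOpCheck`, `stateOf`, `Materialized` and `NoData` are invented for the example.

```scala
import java.io.File

// Hypothetical sketch of the NoOp state decision. A local directory stands in
// for the view's HDFS partition folder; this is not Schedoscope's own code.
object NoOpCheck {
  sealed trait ViewState
  case object Materialized extends ViewState
  case object NoData extends ViewState

  def stateOf(partitionFolder: File): ViewState = {
    // listFiles() returns null if the folder is missing or not a directory.
    val files = Option(partitionFolder.listFiles()).map(_.toList).getOrElse(Nil)
    val hasSuccessFlag = files.exists(_.getName == "_SUCCESS")
    // Besides the flag, at least one further regular file must be non-empty.
    val hasNonEmptyDataFile =
      files.exists(f => f.getName != "_SUCCESS" && f.isFile && f.length() > 0)
    if (hasSuccessFlag && hasNonEmptyDataFile) Materialized else NoData
  }
}
```

Under this rule, a flag next to an empty data file still yields nodata, so a delivery that produced no rows does not count as materialized.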

NoOp is the default transformation: it is applied to every view that does not specify a transformVia() clause.
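The fallback behavior can be pictured with a simplified, hypothetical model. It is not Schedoscope's `View` class: `CustomTransformation` is invented here, and modeling `transformVia()` as taking a function that builds the transformation is an assumption of this sketch.

```scala
// Hypothetical, simplified model of the fallback to NoOp; not Schedoscope's
// View class. CustomTransformation is invented for illustration.
object DefaultTransformation {
  sealed trait Transformation
  case class NoOp() extends Transformation
  case class CustomTransformation(description: String) extends Transformation

  class View {
    // Until transformVia() is called, the view's transformation is NoOp.
    private var transformation: () => Transformation = () => NoOp()

    // Modeled as taking a function that builds the transformation (an assumption).
    def transformVia(t: () => Transformation): Unit = {
      transformation = t
    }

    def currentTransformation: Transformation = transformation()
  }
}
```

A view that never calls `transformVia()` reports `NoOp()` as its transformation; calling it replaces the default.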

Syntax

case class NoOp

Description

NoOp transformations are useful in the staging layers of a data warehouse. External ETL jobs can import raw data into HDFS beneath the partition folders of NoOp views and then signal that the data is available by creating _SUCCESS flags.
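The delivery side of this contract can be sketched as follows. This is a hypothetical external job, not part of Schedoscope: a local directory stands in for HDFS, and the data file name `part-00000` as well as the `year=/month=/day=` folder layout used in the usage example are assumptions. Only the `_SUCCESS` flag name comes from this page.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Hypothetical external ETL step delivering raw data to a NoOp view's
// partition folder. Local directory instead of HDFS; "part-00000" is assumed.
object StageDelivery {
  def deliver(partitionFolder: File, records: Seq[String]): Unit = {
    partitionFolder.mkdirs()
    // 1. Write the raw data first, one record per line.
    val dataFile = new File(partitionFolder, "part-00000")
    val content = records.map(_ + "\n").mkString
    Files.write(dataFile.toPath, content.getBytes(StandardCharsets.UTF_8))
    // 2. Only then create the empty _SUCCESS flag, so that a check running in
    //    between never sees the flag without the data next to it.
    new File(partitionFolder, "_SUCCESS").createNewFile()
  }
}
```

For example, `StageDelivery.deliver(new File(base, "year=2017/month=04/day=07"), Seq("p1|Widget|9.99"))` would write one record and then set the flag. Writing the flag last is the essential design choice; the rest is incidental.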

Consequently, the field and storage format declarations of a NoOp view must match the format of the raw data, so that dependent views can access the raw data via Hive and other transformations.
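To make "must match" concrete, here is a hypothetical pre-delivery check that an ETL team might run on raw lines. The field names and types mirror the Productfeed example in the Examples section; the validation logic and all names (`RawFormatCheck`, `matches`, `StringField`, `DoubleField`) are assumptions of this sketch, not a Schedoscope feature, and Hive's own parsing may be more lenient.

```scala
import java.util.regex.Pattern
import scala.util.Try

// Hypothetical check that a raw text line fits a NoOp view's field declarations.
object RawFormatCheck {
  sealed trait FieldType
  case object StringField extends FieldType
  case object DoubleField extends FieldType

  // Mirrors the fieldOf declarations of the Productfeed example.
  val productfeedFields: Seq[(String, FieldType)] = Seq(
    "productId" -> StringField,
    "productName" -> StringField,
    "productPrice" -> DoubleField)

  def matches(line: String, fields: Seq[(String, FieldType)], delimiter: Char): Boolean = {
    // Quote the delimiter because '|' is a regex metacharacter; limit -1 keeps
    // trailing empty columns so that they are still counted.
    val columns = line.split(Pattern.quote(delimiter.toString), -1)
    columns.length == fields.length && columns.zip(fields).forall {
      case (_, (_, StringField))     => true
      case (value, (_, DoubleField)) => Try(value.toDouble).isSuccess
    }
  }
}
```

A line such as `p1|Widget|9.99` passes, while a comma-separated line or one with a non-numeric price does not.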

Helpers

None

Examples

The following example shows a stage NoOp view that expects its data to be delivered by an external ETL process:

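```scala
// Imports assumed from schedoscope-core; package paths may differ between versions.
import org.schedoscope.dsl.{Parameter, View}
import org.schedoscope.dsl.storageformats.TextFile
import org.schedoscope.dsl.views.DailyParameterization

case class Productfeed(
  year: Parameter[String],
  month: Parameter[String],
  day: Parameter[String]) extends View
  with DailyParameterization {

  val productId = fieldOf[String]
  val productName = fieldOf[String]
  val productPrice = fieldOf[Double]

  comment("Raw product master data")

  // No transformVia() clause, so the view uses the default NoOp transformation.
  storedAs(TextFile(fieldTerminator = "|", lineTerminator = "\\n"))
}
```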

Note that it declares its storage format as a text file with | as the field delimiter and newline-terminated records.

Packaging and Deployment

NoOp transformations are self-contained, so there are no packaging and deployment aspects to consider.

Change detection

NoOp transformations have no changeable logic. As a consequence, Schedoscope will not detect changes to NoOp views and will not automatically schedule their rematerialization. If you want to rematerialize a NoOp view (for example, because an external ETL job copied corrected raw data files after the view had already been materialized), you must explicitly invalidate it.
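One way to picture this behavior: assume, purely for illustration, that change detection compares a fingerprint of a view's transformation logic with the fingerprint recorded at its last materialization. Because NoOp has no logic, its fingerprint never changes, so only an explicit invalidation triggers rematerialization. All names below (`ChangeDetection`, `Scripted`, `ViewRecord`, `fingerprint`) are hypothetical; this is not how Schedoscope is confirmed to implement change detection.

```scala
// Hypothetical model of logic-based change detection; not Schedoscope's code.
object ChangeDetection {
  sealed trait Transformation { def logic: String }
  case object NoOp extends Transformation { val logic = "" } // nothing that can change
  case class Scripted(logic: String) extends Transformation  // invented example

  // Hypothetical per-view record of what the last materialization used.
  case class ViewRecord(lastFingerprint: String, invalidated: Boolean = false)

  def fingerprint(t: Transformation): String = t.logic.hashCode.toString

  // Rematerialize if the logic changed or the view was explicitly invalidated.
  def needsRematerialization(record: ViewRecord, current: Transformation): Boolean =
    record.invalidated || record.lastFingerprint != fingerprint(current)
}
```

For a NoOp view, `needsRematerialization` stays false until the record is marked as invalidated, no matter how the raw files beneath the partition folder change.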
