## Legend model
We can load a Legend data model from the classpath or from a directory as follows:

In [0]:
%scala
import org.finos.legend.spark.LegendClasspathLoader
val legend = LegendClasspathLoader.loadResources()

## Legend entities
We can list every entity available in our Legend data model.

In [0]:
%scala
val entities = legend.getEntityNames
display(entities.toSeq.toDF("pure"))

pure
databricks::mapping::developer_delta
databricks::entity::person
databricks::entity::sme
databricks::mapping::employee_developer
databricks::entity::employee
databricks::entity::developer
databricks::table::employee
databricks::mapping::employee_delta
databricks::table::developer


## Legend schema
We can create the Spark schema for any Legend entity. This process recursively walks through each of its fields, enums, nested properties and supertypes.

In [0]:
%scala
val schema = legend.getSchema("databricks::entity::employee")
display(schema.fields.map(s => s.toDDL).toSeq.toDF("field"))

field
firstName STRING NOT NULL COMMENT 'Person first name'
lastName STRING NOT NULL COMMENT 'Person last name'
birthDate DATE NOT NULL COMMENT 'Person birth date'
gender STRING COMMENT 'Person gender'
id INT NOT NULL COMMENT 'Unique identifier of a databricks employee'
sme STRING COMMENT 'Programming skill that person truly masters'
joinedDate DATE NOT NULL COMMENT 'When did that person join Databricks'
highFives INT COMMENT 'How many high fives did that person get'


## Legend transformations
We can transform raw entities into their desired target tables. Note that relational transformations only support direct mappings and can therefore be enforced with simple `.withColumnRenamed` calls.

In [0]:
%scala
val transformations = legend.getTransformations("databricks::mapping::employee_delta")
display(transformations.toSeq.toDF("column", "columnRenamed"))

column,columnRenamed
highFives,high_fives
joinedDate,joined_date
lastName,last_name
firstName,first_name
birthDate,birth_date
id,id
sme,sme
gender,gender
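
As a rough illustration of how such a direct mapping can be enforced, the sketch below folds over the (source, target) pairs and renames one column at a time. The helper name `renameColumns`, the `Seq[(String, String)]` shape and the sample row are assumptions made for this example; they are not part of the Legend library, whose own `legendTransform` may work differently.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// On Databricks, getOrCreate() hands back the session already bound to `spark`;
// the local master only matters when running this sketch outside a notebook.
val spark = SparkSession.builder().master("local[1]").appName("legend-rename-sketch").getOrCreate()
import spark.implicits._

// Hypothetical helper: apply each (source, target) pair as a column rename.
def renameColumns(df: DataFrame, renames: Seq[(String, String)]): DataFrame =
  renames.foldLeft(df) { case (acc, (from, to)) => acc.withColumnRenamed(from, to) }

// Illustrative row and a subset of the mapping shown above.
val raw = Seq(("Maria", "O'Gorman", 299)).toDF("firstName", "lastName", "highFives")
val renamed = renameColumns(
  raw,
  Seq("firstName" -> "first_name", "lastName" -> "last_name", "highFives" -> "high_fives")
)
renamed.printSchema()
```

Because `withColumnRenamed` is a no-op when the source column is absent, pairs such as `id -> id` or columns missing from the input do not cause a failure in this sketch.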


## Legend expectations
Given the `multiplicity` properties, we can detect whether a field is optional and whether a list has the right number of elements. Given an `enumeration`, we can check that values are consistent with the allowed set. These are considered **technical expectations** and are converted into SQL constraints. In addition, we support the conversion of **business expectations** from the PURE language into SQL expressions. We generate a Legend execution plan against a Databricks runtime.

In [0]:
%scala
val expectations = legend.getExpectations("databricks::mapping::employee_delta")
display(expectations.toSeq.toDF("name", "expectation"))

name,expectation
[birthDate] is mandatory,birth_date IS NOT NULL
[sme] not allowed value,"(sme IS NULL OR sme IN ('Scala', 'Python', 'C', 'Java', 'R', 'SQL'))"
[id] is mandatory,id IS NOT NULL
[joinedDate] is mandatory,joined_date IS NOT NULL
[firstName] is mandatory,first_name IS NOT NULL
[high five] should be positive,(high_fives IS NOT NULL AND high_fives > 0)
[age] should be > 21,year(joined_date) - year(birth_date) > 21
[lastName] is mandatory,last_name IS NOT NULL
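
To make the use of these SQL expressions concrete, here is a minimal sketch that evaluates a few of the expectations listed above against a dataframe and collects the names of the violated ones into an array column. The helper `flagViolations`, the output column name `legend` and the sample rows are assumptions for illustration only. Treating a `NULL` outcome as a violation is a choice made in this sketch, not a documented behavior of the library.

```scala
import java.sql.Date
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, coalesce, expr, filter, lit, when}

// On Databricks, getOrCreate() hands back the session already bound to `spark`.
val spark = SparkSession.builder().master("local[1]").appName("legend-expectation-sketch").getOrCreate()
import spark.implicits._

// Hypothetical helper: one nullable string per expectation, set to the
// expectation name when the SQL expression does not evaluate to true.
def flagViolations(df: DataFrame, expectations: Seq[(String, String)]): DataFrame = {
  val checks = expectations.map { case (name, sql) =>
    // NULL (unknown) outcomes count as violations here: an assumption of this sketch.
    when(!coalesce(expr(sql), lit(false)), lit(name))
  }
  // Keep only the names of the expectations that were violated.
  df.withColumn("legend", filter(array(checks: _*), c => c.isNotNull))
}

// Two of the expectations shown above, applied to three illustrative rows.
val expectations = Seq(
  "[id] is mandatory" -> "id IS NOT NULL",
  "[age] should be > 21" -> "year(joined_date) - year(birth_date) > 21"
)
val people = Seq(
  ("Maria", Option(2), Date.valueOf("1987-08-14"), Date.valueOf("2017-03-03")),
  ("Levey", Option.empty[Int], Date.valueOf("1989-02-19"), Date.valueOf("2015-12-05")),
  ("Anthia", Option(10), Date.valueOf("1998-02-08"), Date.valueOf("2015-01-14"))
).toDF("first_name", "id", "birth_date", "joined_date")

val flagged = flagViolations(people, expectations)
flagged.show(truncate = false)
```

Keeping the violations as a column, rather than filtering rows out, leaves the decision to drop, quarantine or simply report invalid records to the caller.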


## Legend derivations
We can convert Legend derived properties into SQL expressions. In the example model, the field `age` is not physically stored but can be computed at runtime.

In [0]:
%scala
val derivations = legend.getDerivations("databricks::mapping::employee_delta")
display(derivations.toSeq.toDF("column", "expression"))

column,expression
age,year(joined_date) - year(birth_date) AS `age`
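
Since each derivation is a plain SQL expression that already carries its alias, it can be appended to a dataframe with `expr`. The sketch below assumes the derivations arrive as (column, expression) pairs, as in the output above; the helper `withDerivations` and the sample row are hypothetical and only serve this example.

```scala
import java.sql.Date
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr}

// On Databricks, getOrCreate() hands back the session already bound to `spark`.
val spark = SparkSession.builder().master("local[1]").appName("legend-derivation-sketch").getOrCreate()
import spark.implicits._

// Hypothetical helper: keep every existing column and append one computed
// column per derivation. The alias comes from the SQL expression itself.
def withDerivations(df: DataFrame, derivations: Seq[(String, String)]): DataFrame = {
  val derived = derivations.map { case (_, sql) => expr(sql) }
  df.select(col("*") +: derived: _*)
}

// The derivation shown above, applied to one illustrative row.
val derivations = Seq("age" -> "year(joined_date) - year(birth_date) AS `age`")
val person = Seq(
  ("Anthia", Date.valueOf("1998-02-08"), Date.valueOf("2015-01-14"))
).toDF("first_name", "birth_date", "joined_date")

val derivedDf = withDerivations(person, derivations)
derivedDf.show()
```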


## Legend tables
To query our validated entity from the Legend interface, we can create the target state table.

In [0]:
%scala
val tableName = legend.createTable("databricks::mapping::employee_delta", "/FileStore/antoine.amend@databricks.com/legend/employee")
display(sql(s"DESCRIBE EXTENDED $tableName"))

col_name,data_type,comment
first_name,string,Person first name
last_name,string,Person last name
birth_date,date,Person birth date
gender,string,Person gender
id,int,Unique identifier of a databricks employee
sme,string,Programming skill that person truly masters
joined_date,date,When did that person join Databricks
high_fives,int,How many high fives did that person get


# Example - write
In this scenario, we read raw JSON files, apply the Legend schema, transform the records and persist them to our target state Delta table.

In [0]:
%sh
head /dbfs/FileStore/antoine.amend@databricks.com/legend/employee.json

In [0]:
%scala
val schema = legend.getSchema("databricks::entity::employee")
val schemaDf = spark.read.format("json").schema(schema).load("/FileStore/antoine.amend@databricks.com/legend")
display(schemaDf.limit(10))

firstName,lastName,birthDate,gender,id,sme,joinedDate,highFives
Levey,Storck,1989-02-19,M,,C,2015-12-05,282
Maria,O'Gorman,1987-08-14,M,2.0,Python,2017-03-03,299
Evvy,Lepoidevin,1970-10-04,M,3.0,C,2020-11-02,182
Georges,Jotcham,1973-11-26,F,4.0,Scala,2020-09-14,229
Doroteya,Wadhams,1987-03-11,N,5.0,Scala,2019-02-11,78
Mia,Millgate,1988-08-01,F,6.0,Python,2017-04-13,146
Celene,Calverley,1979-07-15,N,7.0,Python,2021-06-03,69
Richie,Di Matteo,1980-05-18,F,8.0,Python,2014-08-23,167
Ignaz,Kurth,1987-01-10,F,,Python,2014-02-01,199
Anthia,Duck,1998-02-08,F,10.0,Python,2015-01-14,277


In [0]:
%scala
import org.finos.legend.spark._
val transformations = legend.getTransformations("databricks::mapping::employee_delta")
val transformedDf = schemaDf.legendTransform(transformations)
display(transformedDf.limit(10))

first_name,last_name,birth_date,gender,id,sme,joined_date,high_fives
Levey,Storck,1989-02-19,M,,C,2015-12-05,282
Maria,O'Gorman,1987-08-14,M,2.0,Python,2017-03-03,299
Evvy,Lepoidevin,1970-10-04,M,3.0,C,2020-11-02,182
Georges,Jotcham,1973-11-26,F,4.0,Scala,2020-09-14,229
Doroteya,Wadhams,1987-03-11,N,5.0,Scala,2019-02-11,78
Mia,Millgate,1988-08-01,F,6.0,Python,2017-04-13,146
Celene,Calverley,1979-07-15,N,7.0,Python,2021-06-03,69
Richie,Di Matteo,1980-05-18,F,8.0,Python,2014-08-23,167
Ignaz,Kurth,1987-01-10,F,,Python,2014-02-01,199
Anthia,Duck,1998-02-08,F,10.0,Python,2015-01-14,277


In [0]:
%scala
val tableName = legend.getTable("databricks::mapping::employee_delta")
transformedDf.write.format("delta").mode("append").saveAsTable(tableName)

# Example - read
From Delta, we read records and transform them back into a PURE entity, enriched with its derived properties and the list of violated constraints. New derivations can be added from Legend Studio and computed here without the engineering team having to write any code. The generated dataframe is evaluated against the business expectations and data quality rules defined in Legend Studio.

In [0]:
%scala
val legendDf = spark.read.legend("databricks::mapping::employee_delta")
display(legendDf.limit(10))

firstName,lastName,birthDate,gender,id,sme,joinedDate,highFives,age,legend
Levey,Storck,1989-02-19,M,,C,2015-12-05,282,26,List([id] is mandatory)
Maria,O'Gorman,1987-08-14,M,2.0,Python,2017-03-03,299,30,List()
Evvy,Lepoidevin,1970-10-04,M,3.0,C,2020-11-02,182,50,List()
Georges,Jotcham,1973-11-26,F,4.0,Scala,2020-09-14,229,47,List()
Doroteya,Wadhams,1987-03-11,N,5.0,Scala,2019-02-11,78,32,List()
Mia,Millgate,1988-08-01,F,6.0,Python,2017-04-13,146,29,List()
Celene,Calverley,1979-07-15,N,7.0,Python,2021-06-03,69,42,List()
Richie,Di Matteo,1980-05-18,F,8.0,Python,2014-08-23,167,34,List()
Ignaz,Kurth,1987-01-10,F,,Python,2014-02-01,199,27,List([id] is mandatory)
Anthia,Duck,1998-02-08,F,10.0,Python,2015-01-14,277,17,List([age] should be > 21)
