# Dressmaker - Samples

The diagram below shows the data model of the tables.

![](../src/img/dressmaker_str.svg)

In [1]:
import $ivy.`org.apache.spark::spark-sql:3.4.0`

import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.OFF)

import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window

val spark = {
    NotebookSparkSession.builder()
    .progress(false)
    .appName("app17-0")
    // .master("spark://192.168.31.31:7077")
    .master("local[*]")
    .config("spark.sql.warehouse.dir", 
            "hdfs://192.168.31.31:9000/user/hive/warehouse") 
    .config("spark.cores.max", "4") 
    .config("spark.executor.instances", "1") 
    .config("spark.executor.cores", "2") 
    .config("spark.executor.memory", "10g") 
    .config("spark.shuffle.service.enabled", "false") 
    .config("spark.dynamicAllocation.enabled", "false") 
    .config("spark.sql.catalogImplementation", "hive")
    .config("spark.sql.repl.eagerEval.enabled", "true")
    .config("spark.driver.allowMultipleContexts", "true")
    .getOrCreate()
}

Loading spark-stubs, spark-hive
Adding Hive conf dir /opt/hive/conf to classpath
Creating SparkSession


spark: SparkSession = org.apache.spark.sql.SparkSession@1a5364d1

In [2]:
import spark.implicits._
import spark.sqlContext.implicits._
def sc = spark.sparkContext
// HiveContext has been deprecated since Spark 2.0; spark.table(...) on a
// Hive-enabled SparkSession is the current equivalent. It is kept here
// because the following cells read their tables through hiveCxt.
val hiveCxt = new org.apache.spark.sql.hive.HiveContext(sc)

defined function sc
hiveCxt: sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@1b8cd7e1

In [3]:
// Credit to Aivean
implicit class RichDF(val ds:DataFrame) {
    def showHTML(limit: Int = 50, truncate: Int = 100) = {
        import xml.Utility.escape
        val data = ds.take(limit)
        val header = ds.schema.fieldNames.toSeq        
        val rows: Seq[Seq[String]] = data.map { row =>
          row.toSeq.map {cell =>
            val str = cell match {
              case null => "null"
              case binary: Array[Byte] => binary.map("%02X".format(_)).mkString("[", " ", "]")
              case array: Array[_] => array.mkString("[", ", ", "]")
              case seq: Seq[_] => seq.mkString("[", ", ", "]")
              case _ => cell.toString
            }
            if (truncate > 0 && str.length > truncate) {
              // do not show ellipses for strings shorter than 4 characters.
              if (truncate < 4) str.substring(0, truncate)
              else str.substring(0, truncate - 3) + "..."
            } else {
              str
            }
          }: Seq[String]
        }
        publish.html(s""" <table>
                <tr>
                 ${header.map(h => s"<th>${escape(h)}</th>").mkString}
                </tr>
                ${rows.map {row =>
                  s"<tr>${row.map{c => s"<td>${escape(c)}</td>" }.mkString}</tr>"
                }.mkString}
            </table>
        """)
    }
}

defined class RichDF

In [4]:
val jmcust = hiveCxt.table("sqlzoo.jmcust")
val dressmaker = hiveCxt.table("sqlzoo.dressmaker")
val dress_order = hiveCxt.table("sqlzoo.dress_order")
val construction = hiveCxt.table("sqlzoo.construction")
val quantities = hiveCxt.table("sqlzoo.quantities")
val order_line = hiveCxt.table("sqlzoo.order_line")
val garment = hiveCxt.table("sqlzoo.garment")
val material = hiveCxt.table("sqlzoo.material")

jmcust: DataFrame = [c_no: int, c_name: string ... 2 more fields]
dressmaker: DataFrame = [d_no: int, d_name: string ... 2 more fields]
dress_order: DataFrame = [order_no: int, cust_no: int ... 2 more fields]
construction: DataFrame = [maker: int, order_ref: int ... 3 more fields]
quantities: DataFrame = [style_q: int, size_q: int ... 1 more field]
order_line: DataFrame = [order_ref: int, line_no: int ... 3 more fields]
garment: DataFrame = [style_no: int, description: string ... 2 more fields]
material: DataFrame = [material_no: int, fabric: string ... 3 more fields]

## 1.
The central table in this database is order_line: every garment ordered takes one line in this table. Many of its fields are references to other tables. The fields have the following meanings:

- order_ref

This is a link to the dress_order table. We can join the dress_order table to find information such as the date of the order and the customer number for a given garment order.

- line_no

The line number distinguishes different items on the same order. For example, order number 5 has three lines: 1, 2 and 3.

- ol_style

Indicates the article of clothing ordered. For example, ol_style 1 indicates trousers. We can see this by joining to the garment table, where style_no 1 is trousers.

- ol_size

The size of the item ordered. This matters when working out how much material is required to make the item. The quantities table shows that trousers (style 1) in size 8 take 2.7 meters, whereas trousers in size 12 need 2.8 meters.

- ol_material

Each order line specifies the material to be used. We can join to the material table to find a description or the cost per meter. Material 1 is Silk, Black, Plain, costing £7 per meter.
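
To make the field meanings concrete, a single order line can be decoded by hand. The following is a minimal plain-Scala sketch and not part of the original notebook: the object `OrderLineDemo`, its case classes, and the lookup maps are hypothetical stand-ins for the Hive tables, filled only with the values quoted in the list above. The sample row is the first row of order_line, `[1,1,1,8,1]`, as returned by `order_line.head()` in the next cell. Treating the material cost as quantity times cost per meter is an assumption made for illustration.

```scala
// Hypothetical in-memory stand-ins for the Hive tables; they hold only the
// values quoted in the text (style 1, sizes 8 and 12, material 1).
object OrderLineDemo {
  final case class OrderLine(orderRef: Int, lineNo: Int, style: Int, size: Int, material: Int)
  final case class Material(fabric: String, colour: String, pattern: String, cost: BigDecimal)

  // garment: style_no -> description
  val garment: Map[Int, String] = Map(1 -> "Trousers")

  // quantities: (style_q, size_q) -> meters of material required
  val quantities: Map[(Int, Int), BigDecimal] =
    Map((1, 8) -> BigDecimal("2.7"), (1, 12) -> BigDecimal("2.8"))

  // material: material_no -> description and cost per meter
  val material: Map[Int, Material] =
    Map(1 -> Material("Silk", "Black", "Plain", BigDecimal("7")))

  // Assumption: material cost = meters required * cost per meter.
  def materialCost(ol: OrderLine): BigDecimal =
    quantities((ol.style, ol.size)) * material(ol.material).cost

  // Follow every reference of one order line and describe it in words.
  def describe(ol: OrderLine): String = {
    val m = material(ol.material)
    s"order ${ol.orderRef}, line ${ol.lineNo}: ${garment(ol.style)}, size ${ol.size}, " +
      s"${m.colour} ${m.pattern} ${m.fabric}, material cost ${materialCost(ol)}"
  }

  // The row returned by order_line.head()
  val first: OrderLine = OrderLine(1, 1, 1, 8, 1)
}
```

Calling `OrderLineDemo.describe(OrderLineDemo.first)` walks the same references that the joins in the following sections resolve for every row at once.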


In [5]:
order_line.head()

res4: Row = [1,1,1,8,1]

## 2.
A sample join:

To translate the numbers in order_line into meaningful values, we need to join a related table. For example, to access the descriptions of the materials we join the material table.

To achieve the join in SQL, we list the material table on the FROM line and give the join condition in a WHERE clause.

For each pair of linked tables there is a join condition. To find the join condition between order_line and material, look at the CREATE statement for order_line and note the line specifying that ol_material references the material table. Such a reference always points to the primary key of the referenced table, here material_no.

```sql
CREATE TABLE order_line (
  order_ref    INTEGER NOT NULL REFERENCES dress_order
 ,line_no      INTEGER NOT NULL
 ,ol_style     INTEGER REFERENCES garment
 ,ol_size      INTEGER NOT NULL
 ,ol_material  INTEGER REFERENCES material
 ,PRIMARY KEY (order_ref, line_no)
 ,FOREIGN KEY (ol_style, ol_size) REFERENCES quantities
 );

SELECT order_ref, line_no, fabric, colour, pattern, cost
  FROM order_line, material
 WHERE ol_material = material_no
```

In [6]:
(order_line
 .join(material, (order_line("ol_material") === material("material_no")))
 .select("order_ref", "line_no", "fabric", "colour", "pattern", "cost")
 .showHTML())

| order_ref | line_no | fabric | colour | pattern | cost |
|---|---|---|---|---|---|
| 12 | 3 | Silk | Black | Plain | 7.0 |
| 7 | 1 | Silk | Black | Plain | 7.0 |
| 1 | 1 | Silk | Black | Plain | 7.0 |
| 12 | 4 | Silk | Red Abstract | Printed | 10.0 |
| 7 | 2 | Silk | Red Abstract | Printed | 10.0 |
| 1 | 2 | Silk | Red Abstract | Printed | 10.0 |
| 12 | 5 | Cotton | Yellow Stripe | Woven | 3.0 |
| 7 | 3 | Cotton | Yellow Stripe | Woven | 3.0 |
| 2 | 1 | Cotton | Yellow Stripe | Woven | 3.0 |
| 8 | 1 | Cotton | Green Stripe | Woven | 3.0 |
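
One way to read the FROM/WHERE form of this join is: pair every row of order_line with every row of material, then keep only the pairs that satisfy the join condition. The plain-Scala sketch below illustrates that reading on a tiny hand-made sample. The object `CommaJoinDemo` and its case classes are hypothetical. The three order lines and the two fabrics come from the output above, and material 1 being black plain silk comes from section 1, but the material number 2 for the red printed silk is an assumption, since the output does not show material_no.

```scala
object CommaJoinDemo {
  final case class OrderLine(orderRef: Int, lineNo: Int, olMaterial: Int)
  final case class Material(materialNo: Int, fabric: String, colour: String,
                            pattern: String, cost: Double)

  // Three order lines taken from the output above.
  val orderLines: Seq[OrderLine] = Seq(
    OrderLine(1, 1, 1), OrderLine(1, 2, 2), OrderLine(7, 1, 1))

  val materials: Seq[Material] = Seq(
    Material(1, "Silk", "Black", "Plain", 7.0),
    Material(2, "Silk", "Red Abstract", "Printed", 10.0)) // material_no 2 is assumed

  // FROM order_line, material: every pairing of rows from the two tables.
  val allPairs: Seq[(OrderLine, Material)] =
    for (ol <- orderLines; m <- materials) yield (ol, m)

  // WHERE ol_material = material_no: keep only the matching pairs,
  // then SELECT the columns of interest.
  val joined: Seq[(Int, Int, String, String, String, Double)] =
    for ((ol, m) <- allPairs if ol.olMaterial == m.materialNo)
      yield (ol.orderRef, ol.lineNo, m.fabric, m.colour, m.pattern, m.cost)
}
```

Here `allPairs` holds six pairs and `joined` keeps three of them. A database or Spark normally does not build all pairs first; the optimizer picks a join strategy, but the rows returned are the same.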


## 3.
To get a description of the garment we need to join the garment table. The join condition is that the ol_style in order_line matches the style_no in garment.

```sql
SELECT order_ref, line_no, description
  FROM order_line, garment
 WHERE ol_style = style_no
```

In [7]:
(order_line
 .join(garment, (order_line("ol_style") === garment("style_no")))
 .select("order_ref", "line_no", "description")
 .showHTML())

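| order_ref | line_no | description |
|---|---|---|
| 1 | 1 | Trousers |
| 1 | 2 | Long Skirt |
| 2 | 1 | Shorts |
| 2 | 2 | Short Skirt |
| 2 | 3 | Sundress |
| 3 | 1 | Suntop |
| 4 | 1 | Trousers |
| 4 | 2 | Long Skirt |
| 5 | 1 | Shorts |
| 5 | 2 | Short Skirt |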


## 4.
If we need both the description and the fabric, we can join both material and garment to the order_line table. The join conditions are combined with "AND".

```sql
SELECT order_ref, line_no, fabric, description
  FROM order_line, material, garment
 WHERE ol_style = style_no
   AND ol_material = material_no
```

In [8]:
(order_line
 .join(material, (order_line("ol_material") === material("material_no")))
 .join(garment, (order_line("ol_style") === garment("style_no")))
 .select("order_ref", "line_no", "fabric", "description")
 .showHTML())

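| order_ref | line_no | fabric | description |
|---|---|---|---|
| 12 | 3 | Silk | Suntop |
| 7 | 1 | Silk | Short Skirt |
| 1 | 1 | Silk | Trousers |
| 12 | 4 | Silk | Trousers |
| 7 | 2 | Silk | Sundress |
| 1 | 2 | Silk | Long Skirt |
| 12 | 5 | Cotton | Long Skirt |
| 7 | 3 | Cotton | Suntop |
| 2 | 1 | Cotton | Shorts |
| 8 | 1 | Cotton | Suntop |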


## 5.
The quantities table tells us how much material is required for every garment in every available size. The join between order_line and quantities is unusual in that it involves two fields, because quantities has a composite key: a row is identified by style_q and size_q together.

```sql
SELECT order_ref, line_no, quantity
  FROM order_line, quantities
 WHERE ol_style = style_q
   AND ol_size  = size_q
```

In [9]:
(order_line
 .join(quantities, ((order_line("ol_style") === quantities("style_q")) && 
                    (order_line("ol_size") === quantities("size_q"))))
 .select("order_ref", "line_no", "quantity")
 .showHTML())

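| order_ref | line_no | quantity |
|---|---|---|
| 12 | 4 | 2.7 |
| 6 | 1 | 2.7 |
| 1 | 1 | 2.7 |
| 4 | 1 | 2.8 |
| 10 | 2 | 3.0 |
| 11 | 1 | 3.0 |
| 12 | 5 | 3.4 |
| 6 | 2 | 3.4 |
| 1 | 2 | 3.4 |
| 4 | 2 | 3.8 |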
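
Both conditions are needed because the key of quantities is the pair (style_q, size_q): matching on style alone would pair an order line with the quantity for every size of that garment. The plain-Scala sketch below shows the difference on a small sample. The object `CompositeKeyDemo` and its case classes are hypothetical. The two quantities rows are the trouser figures quoted in section 1, order line (1, 1) is the row shown by `order_line.head()`, and order line (4, 1) being trousers in size 12 is an assumption inferred from the outputs above (Trousers, 2.8).

```scala
object CompositeKeyDemo {
  final case class OrderLine(orderRef: Int, lineNo: Int, olStyle: Int, olSize: Int)
  final case class Quantity(styleQ: Int, sizeQ: Int, quantity: Double)

  val orderLines: Seq[OrderLine] = Seq(
    OrderLine(1, 1, 1, 8),
    OrderLine(4, 1, 1, 12)) // size 12 for order line (4, 1) is assumed

  val quantities: Seq[Quantity] = Seq(Quantity(1, 8, 2.7), Quantity(1, 12, 2.8))

  // Join on style only: each trouser line matches both trouser sizes.
  val styleOnly: Seq[(Int, Int, Double)] =
    for (ol <- orderLines; q <- quantities if ol.olStyle == q.styleQ)
      yield (ol.orderRef, ol.lineNo, q.quantity)

  // Join on the full composite key: exactly one quantity per order line.
  val styleAndSize: Seq[(Int, Int, Double)] =
    for (ol <- orderLines; q <- quantities
         if ol.olStyle == q.styleQ && ol.olSize == q.sizeQ)
      yield (ol.orderRef, ol.lineNo, q.quantity)
}
```

With only the style condition the sketch yields four rows for two order lines; with both conditions it yields one row per order line, matching the quantities 2.7 and 2.8 in the output above.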


## 6.
Customers place orders, each order contains many lines, and each line of the order refers to a garment:

```sql
SELECT c_name, order_date, order_no, line_no
   FROM jmcust, dress_order, order_line
  WHERE          jmcust.c_no = dress_order.cust_no
  AND   dress_order.order_no = order_line.order_ref
```

In [10]:
(jmcust.join(dress_order, (jmcust("c_no") === dress_order("cust_no")))
 .join(order_line, (dress_order("order_no") === order_line("order_ref")))
 .select("c_name", "order_date", "order_no", "line_no")
 .showHTML())

| c_name | order_date | order_no | line_no |
|---|---|---|---|
| Ms Black | 2002-02-27 | 8 | 3 |
| Ms Black | 2002-02-27 | 8 | 2 |
| Ms Black | 2002-02-27 | 8 | 1 |
| Ms Brown | 2002-02-27 | 9 | 1 |
| Ms Brown | 2002-02-21 | 7 | 3 |
| Ms Brown | 2002-02-21 | 7 | 2 |
| Ms Brown | 2002-02-21 | 7 | 1 |
| Ms Gray | 2002-02-28 | 10 | 2 |
| Ms Gray | 2002-02-28 | 10 | 1 |
| Ms Gray | 2002-02-20 | 6 | 3 |


## 7.
There is also a dressmaker table, and a table called construction that records which dressmaker made each order line and when:

```sql
SELECT d_no, d_name, construction.order_ref, construction.line_ref, start_date, finish_date
 FROM dressmaker, order_line, construction
 WHERE  d_no=maker  
 AND order_line.order_ref=construction.order_ref 
 AND order_line.line_no=construction.line_ref
```

In [11]:
(dressmaker
 .join(construction, (dressmaker("d_no") === construction("maker")))
 .join(order_line, ((construction("order_ref") === order_line("order_ref")) &&
                    (construction("line_ref") === order_line("line_no"))))
 .select(col("d_no"), col("d_name"), col("order_line.order_ref"), 
         col("line_ref"), col("start_date"), col("finish_date"))
 .showHTML())

| d_no | d_name | order_ref | line_ref | start_date | finish_date |
|---|---|---|---|---|---|
| 1 | Mrs Hem | 1 | 1 | 2002-01-10 | 2002-03-05 |
| 1 | Mrs Hem | 4 | 2 | 2002-02-02 | 2002-03-25 |
| 1 | Mrs Hem | 7 | 1 | 2002-02-21 | |
| 1 | Mrs Hem | 10 | 1 | 2002-02-28 | |
| 1 | Mrs Hem | 12 | 3 | 2002-03-03 | |
| 2 | Miss Stitch | 1 | 2 | 2002-01-10 | 2002-03-15 |
| 2 | Miss Stitch | 5 | 1 | 2002-02-03 | 2002-03-15 |
| 2 | Miss Stitch | 7 | 2 | 2002-02-21 | |
| 2 | Miss Stitch | 10 | 2 | 2002-03-28 | |
| 2 | Miss Stitch | 12 | 4 | 2002-03-03 | |


In [12]:
spark.stop()