# Help Desk - Easy

## Scenario
A software company has been successful in selling its products to a number of customer organisations, and there is now a high demand for technical support. There is already a system in place for logging support calls taken over the telephone and assigning them to engineers, but it is based on a series of spreadsheets. With the growing volume of data, using the spreadsheet system is becoming slow, and there is a significant risk that errors will be made.

![rel](https://sqlzoo.net/w/images/3/38/Helpdesk.png)

In [1]:
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.OFF)

import $ivy.`org.apache.spark::spark-sql:2.4.0`
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

val spark = {
    NotebookSparkSession.builder()
    .progress(false)
    .appName("app12-1")
    .master("local[*]")
    .config("spark.sql.warehouse.dir", "hdfs://quickstart.cloudera:8020/user/hive/warehouse")
    .config("hive.metastore.uris", "thrift://quickstart.cloudera:9083")
    .config("spark.sql.catalogImplementation", "hive")
    .config("spark.sql.repl.eagerEval.enabled", "True")
    .getOrCreate()
}

import spark.implicits._

Loading spark-stubs, spark-hive
Creating SparkSession


Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

import org.apache.log4j.{Level, Logger}
import $ivy.$
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
spark: SparkSession = org.apache.spark.sql.SparkSession@433c7231
import spark.implicits._

In [2]:
def sc = spark.sparkContext
// HiveContext has been deprecated since Spark 2.0. With the Hive catalog
// configured above, spark.table(...) should work as well; hiveCxt is kept
// because the cells below refer to it.
val hiveCxt = new org.apache.spark.sql.hive.HiveContext(sc)

defined function sc
hiveCxt: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@30165391

In [3]:
// extend the DataFrame class to prettify the output of show()
implicit class RichDF(val ds:DataFrame) {
    def showHTML(limit: Int = -1, truncate: Int = 0) = {
        import xml.Utility.escape
        
        val data = if (limit < 0) ds.take(255) else ds.take(limit)
        val header = ds.schema.fieldNames.toSeq        
        val rows: Seq[Seq[String]] = data.map { row =>
          row.toSeq.map { cell =>
            val str = cell match {
              case null => "null"
              case binary: Array[Byte] => binary.map("%02X".format(_)).mkString("[", " ", "]")
              case array: Array[_] => array.mkString("[", ", ", "]")
              case seq: Seq[_] => seq.mkString("[", ", ", "]")
              case _ => cell.toString
            }
            if (truncate > 0 && str.length > truncate) {
              // do not show ellipses for strings shorter than 4 characters.
              if (truncate < 4) str.substring(0, truncate)
              else str.substring(0, truncate - 3) + "..."
            } else {
              str
            }
          }: Seq[String]
        }

        publish.html(s""" <table>
                <tr>
                 ${header.map(h => s"<th>${escape(h)}</th>").mkString}
                </tr>
                ${rows.map { row =>
                  s"<tr>${row.map{c => s"<td>${escape(c)}</td>" }.mkString}</tr>"
                }.mkString}
            </table>
        """)        
    }
}

defined class RichDF
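
The truncation rule inside `showHTML` is easy to misread, so here is a minimal standalone sketch of it in plain Scala, with no Spark dependency. The name `truncateCell` is hypothetical and introduced only for illustration; the logic is copied from the cell above.

```scala
// Standalone copy of the truncation rule used inside showHTML.
// `truncateCell` is a hypothetical helper name, not part of Spark or Almond.
def truncateCell(str: String, truncate: Int): String =
  if (truncate > 0 && str.length > truncate) {
    // With fewer than 4 characters there is no room for "...", so cut hard.
    if (truncate < 4) str.substring(0, truncate)
    else str.substring(0, truncate - 3) + "..."
  } else {
    // truncate <= 0 disables truncation; short strings pass through unchanged.
    str
  }

println(truncateCell("Samantha", 5))
println(truncateCell("Samantha", 3))
println(truncateCell("Hall", 0))
```

With `truncate = 0`, the default in `showHTML`, every cell is shown in full.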

## 1.
There are three issues that include the words "index" and "Oracle". Find the call_date for each of them.

```
+---------------------+----------+
| call_date           | call_ref |
+---------------------+----------+
| 2017-08-12 16:00:00 |     1308 |
| 2017-08-16 14:54:00 |     1697 |
| 2017-08-16 19:12:00 |     1731 |
+---------------------+----------+
```

In [4]:
val issue = hiveCxt.table("sqlzoo.Issue")
(issue.filter($"Detail".contains("index") && $"Detail".contains("Oracle"))
 .select("Call_date", "Call_ref")
 .showHTML())

20/07/05 17:16:14 INFO metastore: Trying to connect to metastore with URI thrift://quickstart.cloudera:9083
20/07/05 17:16:14 INFO metastore: Connected to metastore.


Call_date,Call_ref
2017-08-12 16:00:00.0,1308
2017-08-16 14:54:00.0,1697
2017-08-16 19:12:00.0,1731


issue: DataFrame = [call_date: string, call_ref: int ... 5 more fields]

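`Column.contains` is a case-sensitive substring test, so the filter above matches "Oracle" but not "oracle". The sketch below restates that predicate on plain strings and adds a case-insensitive variant. Both function names and the sample sentences are made up for illustration; they are not taken from `sqlzoo.Issue`.

```scala
// Case-sensitive check, mirroring the filter used in the cell above.
// Function names are hypothetical.
def mentionsBoth(detail: String): Boolean =
  detail.contains("index") && detail.contains("Oracle")

// Case-insensitive variant: lower-case the text once, then compare.
def mentionsBothIgnoreCase(detail: String): Boolean = {
  val lower = detail.toLowerCase
  lower.contains("index") && lower.contains("oracle")
}

// Made-up sample sentences, for illustration only.
val exact = "Cannot rebuild index in Oracle"
val shouted = "cannot rebuild INDEX in oracle"

println(mentionsBoth(exact))
println(mentionsBoth(shouted))
println(mentionsBothIgnoreCase(shouted))
```

In Spark, the case-insensitive version would correspond to applying `lower(...)` to the column before calling `contains`.
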
## 2.
Samantha Hall made three calls on 2017-08-14. Show the date and time for each.

```
+---------------------+------------+-----------+
| call_date           | first_name | last_name |
+---------------------+------------+-----------+
| 2017-08-14 10:10:00 | Samantha   | Hall      |
| 2017-08-14 10:49:00 | Samantha   | Hall      |
| 2017-08-14 18:18:00 | Samantha   | Hall      |
+---------------------+------------+-----------+
```

In [5]:
val caller = hiveCxt.table("sqlzoo.Caller")
(issue.filter(to_date($"Call_date", "yyyy-MM-dd HH:mm:ss")==="2017-08-14")
 .join(caller.filter($"First_name"==="Samantha" && $"Last_name"==="Hall"), 
       issue("Caller_id")===caller("Caller_id"))
 .select("Call_date", "First_name", "Last_name")
 .showHTML())

Call_date,First_name,Last_name
2017-08-14 10:10:00.0,Samantha,Hall
2017-08-14 10:49:00.0,Samantha,Hall
2017-08-14 18:18:00.0,Samantha,Hall


caller: DataFrame = [caller_id: int, company_ref: int ... 2 more fields]

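`Call_date` is stored as a string, so the date comparison relies on parsing it first. As a rough sketch of the same idea outside Spark, the first ten characters of each value form an ISO date that `java.time.LocalDate` can parse. The helper name `callDay` is hypothetical; the sample values are copied from the outputs shown in this notebook.

```scala
import java.time.LocalDate

// Values look like "2017-08-14 10:10:00.0"; the first 10 characters are the date.
// `callDay` is a hypothetical helper name.
def callDay(callDate: String): LocalDate = LocalDate.parse(callDate.take(10))

val target = LocalDate.parse("2017-08-14")

// Sample timestamps taken from the outputs shown in this notebook.
val sample = Seq(
  "2017-08-14 10:10:00.0",
  "2017-08-14 18:18:00.0",
  "2017-08-16 19:12:00.0"
)

// Keep only the calls made on the target day.
val sameDay = sample.filter(s => callDay(s) == target)
sameDay.foreach(println)
```
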
## 3.
There are 500 calls in the system (roughly). Write a query that shows the number that have each status.

```
+--------+--------+
| status | Volume |
+--------+--------+
| Closed |    486 |
| Open   |     10 |
+--------+--------+
```

In [6]:
(issue.groupBy("Status")
 .agg(count("*").as("Volume"))  // name the count column as in the expected output
 .orderBy("Status")
 .showHTML())

Status,Volume
Closed,486
Open,10
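
For readers less familiar with `groupBy` followed by a count, here is a small sketch of the same aggregation on a plain Scala collection. The status values are a made-up sample, not data from `sqlzoo.Issue`.

```scala
// Made-up sample of status values, for illustration only.
val statuses = Seq("Closed", "Open", "Closed", "Closed")

// Group identical values, count each group, then sort by status name.
val volume: Seq[(String, Int)] =
  statuses
    .groupBy(identity)
    .map { case (status, group) => (status, group.size) }
    .toSeq
    .sortBy(_._1)

volume.foreach { case (status, n) => println(s"$status,$n") }
```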


## 4.
Calls are not normally assigned to a manager but it does happen. How many calls have been assigned to staff who are at Manager Level?

```
+------+
| mlcc |
+------+
|   51 |
+------+
```

In [7]:
val staff = hiveCxt.table("sqlzoo.Staff")
val level = hiveCxt.table("sqlzoo.Level")
Seq(issue.join(staff, issue("Assigned_to")===staff("Staff_code"))
 .join(level, staff("Level_code")===level("Level_code"))
 .filter($"Manager"==="Y")
 .select("Call_ref")
 .count()).toDF("mlcc").showHTML()

mlcc
51


staff: DataFrame = [staff_code: string, first_name: string ... 2 more fields]
level: DataFrame = [level_code: int, manager: string ... 2 more fields]

## 5.
Show the manager for each shift. Your output should include the shift date and type; also the first and last name of the manager.

```
+------------+------------+------------+-----------+
| Shift_date | Shift_type | first_name | last_name |
+------------+------------+------------+-----------+
| 2017-08-12 | Early      | Logan      | Butler    |
| 2017-08-12 | Late       | Ava        | Ellis     |
| 2017-08-13 | Early      | Ava        | Ellis     |
| 2017-08-13 | Late       | Ava        | Ellis     |
| 2017-08-14 | Early      | Logan      | Butler    |
| 2017-08-14 | Late       | Logan      | Butler    |
| 2017-08-15 | Early      | Logan      | Butler    |
| 2017-08-15 | Late       | Logan      | Butler    |
| 2017-08-16 | Early      | Logan      | Butler    |
| 2017-08-16 | Late       | Logan      | Butler    |
+------------+------------+------------+-----------+
```

In [8]:
val shift = hiveCxt.table("sqlzoo.Shift")
(shift.withColumn("shift_date", to_date($"Shift_date", "yyyy-MM-dd"))
 .join(staff, shift("Manager")===staff("Staff_code"))
 .select("shift_date", "Shift_type", "First_name", "Last_name")
 .dropDuplicates()                     // collapse repeated (date, type, manager) rows
 .orderBy("shift_date", "Shift_type")  // sort last: dropDuplicates does not guarantee order
 .showHTML())

shift_date,Shift_type,First_name,Last_name
2017-08-12,Early,Logan,Butler
2017-08-12,Late,Ava,Ellis
2017-08-13,Early,Ava,Ellis
2017-08-13,Late,Ava,Ellis
2017-08-14,Early,Logan,Butler
2017-08-14,Late,Logan,Butler
2017-08-15,Early,Logan,Butler
2017-08-15,Late,Logan,Butler
2017-08-16,Early,Logan,Butler
2017-08-16,Late,Logan,Butler


shift: DataFrame = [shift_date: string, shift_type: string ... 4 more fields]

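The query combines deduplication with sorting. A minimal sketch of that combination on plain Scala collections follows; sorting is done last so that the final order is the one requested. The case class `ShiftRow` is hypothetical, and the rows reuse values from the output above, with one row repeated on purpose.

```scala
// Hypothetical row type for illustration; not the real Shift schema.
case class ShiftRow(date: String, shiftType: String, first: String, last: String)

// Values taken from the output above, with one row repeated deliberately.
val rows = Seq(
  ShiftRow("2017-08-12", "Late", "Ava", "Ellis"),
  ShiftRow("2017-08-12", "Early", "Logan", "Butler"),
  ShiftRow("2017-08-12", "Late", "Ava", "Ellis")
)

// Remove exact duplicates first, then sort by date and shift type.
val result = rows.distinct.sortBy(r => (r.date, r.shiftType))
result.foreach(println)
```
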
In [9]:
spark.stop()