# Help Desk - Easy

## Scenario
A software company has been successful in selling its products to a number of customer organisations, and there is now a high demand for technical support. There is already a system in place for logging support calls taken over the telephone and assigning them to engineers, but it is based on a series of spreadsheets. With the growing volume of data, using the spreadsheet system is becoming slow, and there is a significant risk that errors will be made.

![rel](https://sqlzoo.net/w/images/3/38/Helpdesk.png)

In [1]:
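import findspark
import pandas as pd

findspark.init()  # make the local Spark installation importable as pyspark

from pyspark.sql import SparkSession
# Explicit imports avoid shadowing Python built-ins such as sum, min and max.
from pyspark.sql.functions import col, count, to_date

SVR = '192.168.31.31'  # host running the Spark master and HDFS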

sc = (SparkSession.builder.appName('app12-1') 
      .master(f'spark://{SVR}:7077') 
      .config('spark.sql.warehouse.dir', f'hdfs://{SVR}:9000/user/hive/warehouse') 
      .config('spark.cores.max', '4') 
      .config('spark.executor.instances', '1') 
      .config('spark.executor.cores', '2') 
      .config('spark.executor.memory', '10g') 
      .enableHiveSupport().getOrCreate())

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


In [2]:
shift = sc.read.table('sqlzoo.Shift')
staff = sc.read.table('sqlzoo.Staff')
issue = sc.read.table('sqlzoo.Issue')
shift_type = sc.read.table('sqlzoo.Shift_type')
level = sc.read.table('sqlzoo.Level')
customer = sc.read.table('sqlzoo.Customer')
caller = sc.read.table('sqlzoo.Caller')

## 1.
There are three issues that include the words "index" and "Oracle". Find the call_date for each of them.

```
+---------------------+----------+
| call_date           | call_ref |
+---------------------+----------+
| 2017-08-12 16:00:00 |     1308 |
| 2017-08-16 14:54:00 |     1697 |
| 2017-08-16 19:12:00 |     1731 |
+---------------------+----------+
```

In [3]:
(issue.filter((col('Detail').contains('index')) & 
              (col('Detail').contains('Oracle')))
 .select('Call_date', 'Call_ref')
 .toPandas())

                                                                                

|   | Call_date | Call_ref |
|---|---|---|
| 0 | 2017-08-12 16:00:00.0 | 1308 |
| 1 | 2017-08-16 14:54:00.0 | 1697 |
| 2 | 2017-08-16 19:12:00.0 | 1731 |
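
Since the exercise is phrased in SQL terms, the same filter can also be sketched as a plain SQL query. The snippet below is a minimal sketch that uses Python's built-in `sqlite3` module instead of Spark; the three sample rows and their `Detail` text are invented for illustration and are not the real `sqlzoo.Issue` data. It uses `instr`, which is case-sensitive like `Column.contains`, whereas SQLite's `LIKE` ignores ASCII case by default.

```python
import sqlite3

# In-memory database with a hypothetical, cut-down Issue table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Issue (Call_ref INTEGER, Call_date TEXT, Detail TEXT)")
conn.executemany(
    "INSERT INTO Issue VALUES (?, ?, ?)",
    [
        (1, "2017-08-12 16:00:00", "Rebuilding an index fails in Oracle"),
        (2, "2017-08-13 09:30:00", "Oracle install hangs"),
        # Capital "I": a case-sensitive match on 'index' skips this row.
        (3, "2017-08-14 11:15:00", "Index page missing in the Oracle manual"),
    ],
)

# instr() returns the 1-based position of the substring, or 0 if absent.
rows = conn.execute(
    """
    SELECT Call_date, Call_ref
    FROM Issue
    WHERE instr(Detail, 'index') > 0
      AND instr(Detail, 'Oracle') > 0
    ORDER BY Call_date
    """
).fetchall()
print(rows)
```

Only the first sample row matches both words with the exact casing, which is also how the PySpark filter above behaves.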


## 2.
Samantha Hall made three calls on 2017-08-14. Show the date and time for each.

```
+---------------------+------------+-----------+
| call_date           | first_name | last_name |
+---------------------+------------+-----------+
| 2017-08-14 10:10:00 | Samantha   | Hall      |
| 2017-08-14 10:49:00 | Samantha   | Hall      |
| 2017-08-14 18:18:00 | Samantha   | Hall      |
+---------------------+------------+-----------+
```

In [4]:
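# Keep issues logged on 2017-08-14, then join them to Samantha Hall's caller record.
(issue.filter(to_date(col('Call_date')) == '2017-08-14')
 .join(caller.filter((col('First_name') == 'Samantha') &
                     (col('Last_name') == 'Hall')), on='Caller_id')
 .select('Call_date', 'First_name', 'Last_name')
 .orderBy('Call_date')  # make the row order deterministic
 .toPandas())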

|   | Call_date | First_name | Last_name |
|---|---|---|---|
| 0 | 2017-08-14 10:10:00.0 | Samantha | Hall |
| 1 | 2017-08-14 10:49:00.0 | Samantha | Hall |
| 2 | 2017-08-14 18:18:00.0 | Samantha | Hall |
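
An alternative to converting every timestamp to a date is a half-open range on the timestamp itself: on or after midnight of the target day and strictly before midnight of the next. The following is a minimal sketch of that idea in SQL, run through Python's built-in `sqlite3` rather than Spark. The table contents and the second caller are hypothetical, and timestamps are stored as ISO-formatted text so that string comparison follows chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Caller (Caller_id INTEGER, First_name TEXT, Last_name TEXT);
CREATE TABLE Issue (Call_ref INTEGER, Call_date TEXT, Caller_id INTEGER);
-- Hypothetical sample rows, not the real sqlzoo data.
INSERT INTO Caller VALUES (1, 'Samantha', 'Hall'), (2, 'Jo', 'Example');
INSERT INTO Issue VALUES
  (10, '2017-08-13 23:59:00', 1),
  (11, '2017-08-14 10:10:00', 1),
  (12, '2017-08-14 18:18:00', 1),
  (13, '2017-08-14 12:00:00', 2),
  (14, '2017-08-15 00:00:00', 1);
""")

# Half-open interval [2017-08-14 00:00, 2017-08-15 00:00) selects one calendar day.
calls = conn.execute("""
SELECT i.Call_date, c.First_name, c.Last_name
FROM Issue i
JOIN Caller c ON c.Caller_id = i.Caller_id
WHERE c.First_name = 'Samantha'
  AND c.Last_name = 'Hall'
  AND i.Call_date >= '2017-08-14 00:00:00'
  AND i.Call_date <  '2017-08-15 00:00:00'
ORDER BY i.Call_date
""").fetchall()
print(calls)
```

The calls at 23:59 the day before and at exactly midnight the day after both fall outside the interval, so only the two calls on 2017-08-14 by the matching caller remain.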


## 3.
There are roughly 500 calls in the system. Write a query that shows the number of calls with each status.

```
+--------+--------+
| status | Volume |
+--------+--------+
| Closed |    486 |
| Open   |     10 |
+--------+--------+
```

In [5]:
(issue.groupBy('Status')
 .agg(count('Call_ref').alias('Volume'))
 .toPandas())

|   | Status | Volume |
|---|---|---|
| 0 | Open | 10 |
| 1 | Closed | 486 |
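
The rows here come back as Open then Closed, while the expected table lists Closed first: a grouped result has no guaranteed order unless one is requested. The sketch below shows the grouped count with an explicit sort, written as SQL and run through Python's built-in `sqlite3` on four invented rows (not the real data). In the PySpark version, the corresponding step would be an `.orderBy('Status')` before `.toPandas()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Issue (Call_ref INTEGER, Status TEXT)")
# Hypothetical sample: three closed calls and one open call.
conn.executemany(
    "INSERT INTO Issue VALUES (?, ?)",
    [(1, "Closed"), (2, "Open"), (3, "Closed"), (4, "Closed")],
)

# ORDER BY makes the output order deterministic.
volumes = conn.execute(
    """
    SELECT Status, COUNT(Call_ref) AS Volume
    FROM Issue
    GROUP BY Status
    ORDER BY Status
    """
).fetchall()
print(volumes)
```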


## 4.
Calls are not normally assigned to a manager, but it does happen. How many calls have been assigned to staff who are at Manager Level?

```
+------+
| mlcc |
+------+
|   51 |
+------+
```

In [6]:
# Attach each issue to its assigned staff member, keep manager-level staff only, then count.
(issue.join(staff, on=(issue['Assigned_to'] == staff['Staff_code']))
 .join(level.filter(col('Manager') == 'Y'), 'Level_code')
 .agg(count('Call_ref').alias('mlcc'))
 .toPandas())

|   | mlcc |
|---|---|
| 0 | 51 |
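
The two joins above follow the chain Issue → Staff → Level. As a rough illustration of that chain in SQL, here is a minimal sketch using Python's built-in `sqlite3`. The tables are reduced to the columns the query touches, and the staff codes, level codes and rows are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Level (Level_code INTEGER, Manager TEXT);
CREATE TABLE Staff (Staff_code TEXT, Level_code INTEGER);
CREATE TABLE Issue (Call_ref INTEGER, Assigned_to TEXT);
-- Hypothetical rows: level 7 is a manager level, level 1 is not.
INSERT INTO Level VALUES (1, 'N'), (7, 'Y');
INSERT INTO Staff VALUES ('AB01', 1), ('CD02', 7);
INSERT INTO Issue VALUES (1, 'AB01'), (2, 'CD02'), (3, 'CD02');
""")

# Inner joins drop issues whose assignee is not at a manager level.
(mlcc,) = conn.execute("""
SELECT COUNT(i.Call_ref) AS mlcc
FROM Issue i
JOIN Staff s ON s.Staff_code = i.Assigned_to
JOIN Level l ON l.Level_code = s.Level_code
WHERE l.Manager = 'Y'
""").fetchone()
print(mlcc)
```

In this sample, two of the three issues are assigned to the one manager-level staff member, so the count is 2.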


## 5.
Show the manager for each shift. Your output should include the shift date and type; also the first and last name of the manager.

```
+------------+------------+------------+-----------+
| Shift_date | Shift_type | first_name | last_name |
+------------+------------+------------+-----------+
| 2017-08-12 | Early      | Logan      | Butler    |
| 2017-08-12 | Late       | Ava        | Ellis     |
| 2017-08-13 | Early      | Ava        | Ellis     |
| 2017-08-13 | Late       | Ava        | Ellis     |
| 2017-08-14 | Early      | Logan      | Butler    |
| 2017-08-14 | Late       | Logan      | Butler    |
| 2017-08-15 | Early      | Logan      | Butler    |
| 2017-08-15 | Late       | Logan      | Butler    |
| 2017-08-16 | Early      | Logan      | Butler    |
| 2017-08-16 | Late       | Logan      | Butler    |
+------------+------------+------------+-----------+
```

In [7]:
(shift.withColumn('shift_date', to_date(col('Shift_date')))
 .join(staff, on=(shift['Manager']==staff['Staff_code']))
 .select('shift_date', 'Shift_type', 'First_name', 'Last_name')
 .dropDuplicates()
 .orderBy('shift_date', 'Shift_type')
 .toPandas())

|   | shift_date | Shift_type | First_name | Last_name |
|---|---|---|---|---|
| 0 | 2017-08-12 | Early | Logan | Butler |
| 1 | 2017-08-12 | Late | Ava | Ellis |
| 2 | 2017-08-13 | Early | Ava | Ellis |
| 3 | 2017-08-13 | Late | Ava | Ellis |
| 4 | 2017-08-14 | Early | Logan | Butler |
| 5 | 2017-08-14 | Late | Logan | Butler |
| 6 | 2017-08-15 | Early | Logan | Butler |
| 7 | 2017-08-15 | Late | Logan | Butler |
| 8 | 2017-08-16 | Early | Logan | Butler |
| 9 | 2017-08-16 | Late | Logan | Butler |


In [9]:
sc.stop()