# Help Desk - Hard

In [1]:
# Prerequisites
from pyhive import hive
%load_ext sql
%sql hive://cloudera@quickstart.cloudera:10000/sqlzoo
%config SqlMagic.displaylimit = 20

## 11.
**Show the manager and number of calls received for each hour of the day on 2017-08-12.**

```
+---------+---------------+----+
| Manager | Hr            | cc |
+---------+---------------+----+
| LB1     | 2017-08-12 08 |  6 |
| LB1     | 2017-08-12 09 | 16 |
| LB1     | 2017-08-12 10 | 11 |
| LB1     | 2017-08-12 11 |  6 |
| LB1     | 2017-08-12 12 |  8 |
| LB1     | 2017-08-12 13 |  4 |
| AE1     | 2017-08-12 14 | 12 |
| AE1     | 2017-08-12 15 |  8 |
| AE1     | 2017-08-12 16 |  8 |
| AE1     | 2017-08-12 17 |  7 |
| AE1     | 2017-08-12 19 |  5 |
+---------+---------------+----+
```

In [2]:
%%sql
SELECT Manager, DATE_FORMAT(Call_date, 'yyyy-MM-dd HH') hr, COUNT(Call_ref) cc
  FROM Issue JOIN Shift ON (Issue.Taken_by=Shift.Operator AND DATE_FORMAT(Issue.Call_date, 'yyyy-MM-dd')=Shift.Shift_date)
    WHERE DATE_FORMAT(Call_date, 'yyyy-MM-dd')='2017-08-12'
    GROUP BY DATE_FORMAT(Call_date, 'yyyy-MM-dd HH'), Manager

 * hive://cloudera@quickstart.cloudera:10000/sqlzoo
Done.


manager,hr,cc
LB1,2017-08-12 08,6
LB1,2017-08-12 09,16
LB1,2017-08-12 10,11
LB1,2017-08-12 11,6
LB1,2017-08-12 12,8
LB1,2017-08-12 13,4
AE1,2017-08-12 14,12
AE1,2017-08-12 15,8
AE1,2017-08-12 16,8
AE1,2017-08-12 17,7


## 12.
**80/20 rule. It is said that 80% of the calls are generated by 20% of the callers. Is this true? What percentage of calls are generated by the most active 20% of callers?**

Note: Andrew has not managed to do this in one query, but he believes it is possible.

```
+---------+
| t20pc   |
+---------+
| 32.2581 |
+---------+
```

In [3]:
%%sql
SELECT ROUND(100 * CAST(SUM(a.n) AS DOUBLE) / tot, 4) AS t20pc FROM
  (SELECT ROW_NUMBER() OVER (ORDER BY COUNT(1) DESC) rn,
   Caller_id, COUNT(1) n, SUM(COUNT(1)) OVER() tot, SUM(1) OVER() callers
   FROM Issue
   GROUP BY Caller_id
  ) AS a
WHERE a.rn <= 0.2*a.callers
GROUP BY tot, callers

 * hive://cloudera@quickstart.cloudera:10000/sqlzoo
Done.


t20pc
32.2581
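
The arithmetic behind the query can be checked outside Hive. The following is a minimal sketch in plain Python, not part of the original notebook: `top_share` and the `sample` data are hypothetical names and values chosen for illustration, and the rank filter is written to mirror `rn <= 0.2*a.callers`.

```python
from collections import Counter

def top_share(caller_ids, fraction=0.2):
    """Percentage of calls made by the most active `fraction` of callers."""
    if not caller_ids:
        return 0.0
    # Calls per caller, busiest first (the ROW_NUMBER() ordering in the query).
    counts = sorted(Counter(caller_ids).values(), reverse=True)
    # Keep the ranks that satisfy rn <= fraction * callers, as in the WHERE clause.
    top = [n for rank, n in enumerate(counts, start=1)
           if rank <= fraction * len(counts)]
    return round(100 * sum(top) / len(caller_ids), 4)

# Hypothetical data: 10 callers and 16 calls in total.
sample = [1] * 5 + [2] * 3 + list(range(3, 11))
print(top_share(sample))  # 50.0
```

In this made-up sample the two busiest callers account for 8 of the 16 calls, so the function reports 50.0.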


## 13.
**Annoying customers. Customers who call in the last five minutes of a shift are annoying. Find the most active customer who has never been annoying.**

```
+--------------+------+
| Company_name | abna |
+--------------+------+
| High and Co. |   20 |
+--------------+------+
```

In [4]:
%%sql
WITH annoy AS (
 SELECT  Customer.Company_ref
 FROM Issue JOIN Shift ON (Issue.Taken_by=Shift.Operator AND 
                           DATE_FORMAT(Call_date, 'yyyy-MM-dd')=Shift.Shift_date)
   JOIN Shift_type ON (Shift.Shift_type=Shift_type.Shift_type)
   LEFT JOIN Caller ON (Issue.Caller_id=Caller.Caller_id)
   LEFT JOIN Customer ON (Caller.Company_ref=Customer.Company_ref)
 WHERE UNIX_TIMESTAMP(CONCAT_WS(' ', Shift_date, End_time), 'yyyy-MM-dd HH:mm') -
    UNIX_TIMESTAMP( Issue.Call_date)<=300
)
SELECT Company_name, Customer.Company_ref, COUNT(*) abna
FROM Issue JOIN Caller ON (Issue.Caller_id=Caller.Caller_id)
   JOIN Customer ON (Caller.Company_ref=Customer.Company_ref)
WHERE Customer.Company_ref NOT IN
    (SELECT Company_ref FROM annoy)
GROUP BY Customer.Company_ref, Company_name
ORDER BY abna DESC
LIMIT 1

 * hive://cloudera@quickstart.cloudera:10000/sqlzoo
Done.


company_name,company_ref,abna
High and Co.,146,20
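
The exclusion logic (collect every company that has ever been annoying, then count calls only for the rest) can be sketched in plain Python. This is an illustration under assumptions, not the notebook's method: the function name, the `(company, call_time, shift_end)` tuple layout, and the sample rows are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def most_active_never_annoying(calls, window=timedelta(minutes=5)):
    """calls: list of (company, call_time, shift_end) tuples (hypothetical layout)."""
    # Companies with at least one call in the final `window` of a shift,
    # matching the condition shift end - call time <= 300 seconds.
    annoying = {company for company, call_time, shift_end in calls
                if shift_end - call_time <= window}
    # Count calls only for companies that never appear in that set.
    counts = Counter(company for company, _, _ in calls
                     if company not in annoying)
    return counts.most_common(1)[0] if counts else None

# Hypothetical data: one shift ending at 14:00.
end = datetime(2017, 8, 12, 14, 0)
sample_calls = [
    ("A", datetime(2017, 8, 12, 9, 0), end),
    ("A", datetime(2017, 8, 12, 10, 0), end),
    ("A", datetime(2017, 8, 12, 13, 57), end),  # inside the last five minutes
    ("B", datetime(2017, 8, 12, 9, 30), end),
    ("B", datetime(2017, 8, 12, 11, 0), end),
    ("C", datetime(2017, 8, 12, 12, 0), end),
]
print(most_active_never_annoying(sample_calls))  # ('B', 2)
```

Company A makes the most calls in the sample but is excluded entirely because one of its calls falls in the final five minutes.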


## 14.
**Maximal usage. If every caller registered with a customer makes a call in one day then that customer has "maximal usage" of the service. List the maximal customers for 2017-08-13.**

```
+-------------------+--------------+-------------+
| company_name      | caller_count | issue_count |
+-------------------+--------------+-------------+
| Askew Inc.        |            2 |           2 |
| Bai Services      |            2 |           2 |
| Dasher Services   |            3 |           3 |
| High and Co.      |            5 |           5 |
| Lady Retail       |            4 |           4 |
| Packman Shipping  |            3 |           3 |
| Pitiable Shipping |            2 |           2 |
| Whale Shipping    |            2 |           2 |
+-------------------+--------------+-------------+
```

In [5]:
%%sql
SELECT a.company_name, COUNT(*) caller_count, SUM(n) issue_count FROM
(
 SELECT Customer.Company_ref, Company_name, 
    Caller.Caller_id, 
    CASE WHEN COUNT(Call_ref)>0 THEN 1 ELSE 0 END n
 FROM (SELECT * FROM Issue WHERE DATE_FORMAT(Call_date, 'yyyy-MM-dd')='2017-08-13') iss
    RIGHT JOIN Caller ON (iss.Caller_id=Caller.Caller_id)
    LEFT JOIN Customer ON (Caller.Company_ref=Customer.Company_ref)
    GROUP BY Customer.Company_ref, Caller.Caller_id, Company_name
) AS a
GROUP BY a.company_name
HAVING COUNT(*)=SUM(n)
ORDER BY a.company_name

 * hive://cloudera@quickstart.cloudera:10000/sqlzoo
Done.


company_name,caller_count,issue_count
Askew Inc.,2,2
Bai Services,2,2
Dasher Services,3,3
High and Co.,5,5
Lady Retail,4,4
Packman Shipping,3,3
Pitiable Shipping,2,2
Whale Shipping,2,2
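
The `HAVING COUNT(*)=SUM(n)` test compares, per customer, the number of registered callers with the number of those callers who called on the day. A minimal Python sketch of the same comparison follows; `maximal_customers`, the `registered` mapping, and the sample values are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def maximal_customers(registered, called_today):
    """registered: dict caller_id -> company; called_today: set of caller ids."""
    total = defaultdict(int)   # registered callers per company (COUNT(*))
    active = defaultdict(int)  # callers who called on the day (SUM(n))
    for caller_id, company in registered.items():
        total[company] += 1
        if caller_id in called_today:
            active[company] += 1
    # A company is maximal when every registered caller made a call.
    return sorted(c for c in total if total[c] == active[c])

# Hypothetical data: caller 4 of company B did not call.
registered = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
called_today = {1, 2, 3, 5}
print(maximal_customers(registered, called_today))  # ['A', 'C']
```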


## 15.
**Consecutive calls occur when an operator deals with two callers within 10 minutes. Find the longest sequence of consecutive calls – give the name of the operator and the first and last call date in the sequence.**

```
+----------+---------------------+---------------------+-------+
| taken_by | first_call          | last_call           | calls |
+----------+---------------------+---------------------+-------+
| AB1      | 2017-08-14 09:06:00 | 2017-08-14 10:17:00 |    24 |
+----------+---------------------+---------------------+-------+
```

_Solution in MySQL_:

```sql
SELECT a.taken_by, a.first_call, a.call_date AS last_call, a.call_count AS calls
FROM
(SELECT Issue.taken_by, 
       Issue.call_date,
       @counter := CASE WHEN TIMESTAMPDIFF(MINUTE, @current_call, Issue.call_date) <= 10
                          THEN @counter + 1
                        ELSE 1
                   END AS call_count,
       @first_call := CASE WHEN @counter = 1
                             THEN @first_call := call_date
                           ELSE @first_call
                      END AS first_call,
       @current_call := Issue.call_date
FROM Issue,
(SELECT @counter := 0, @first_call := 0, @current_call := 0) AS initvar
ORDER BY Issue.taken_by, Issue.call_date) AS a
ORDER BY a.call_count DESC
LIMIT 1;
```

In [6]:
%%sql
WITH t AS(
-- flag = 1 when the gap since the operator's previous call exceeds 10 minutes (a new sequence starts); consecutive calls get 0
  SELECT Issue.*, 
    CASE WHEN UNIX_TIMESTAMP(Call_date) - UNIX_TIMESTAMP(LAG(Call_date, 1) OVER (
        PARTITION BY Taken_by ORDER BY Call_date)) > 600 THEN 1 
         ELSE 0 END flag
    FROM Issue
), g AS (
-- running sum of the flags per operator: every call in the same sequence shares one grp value
  SELECT t.*, SUM(t.flag) OVER (
      PARTITION BY t.Taken_by ORDER BY t.Call_date) AS grp
    FROM t
), rslt AS (
-- aggregate
  SELECT Taken_by, grp, MIN(Call_date) first_call, 
    MAX(Call_date) last_call, COUNT(Caller_id) n_calls
    FROM g
    GROUP BY Taken_by, grp
)
SELECT Taken_by, first_call, last_call, n_calls
  FROM rslt 
    ORDER BY n_calls DESC
    LIMIT 1

 * hive://cloudera@quickstart.cloudera:10000/sqlzoo
Done.


taken_by,first_call,last_call,n_calls
AB1,2017-08-14 09:06:00.0,2017-08-14 10:17:00.0,24
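
The flag-and-running-sum approach used in the Hive query is a form of the "gaps and islands" technique. A minimal single-pass sketch in Python is given here for comparison, assuming calls arrive as `(operator, call_datetime)` pairs; `longest_run` and the sample data are hypothetical names and values, not taken from the notebook.

```python
from datetime import datetime, timedelta
from itertools import groupby

def longest_run(calls, max_gap=timedelta(minutes=10)):
    """calls: iterable of (operator, call_datetime). Returns (operator, first, last, n)."""
    best = None
    # Sort by operator, then time (PARTITION BY Taken_by ORDER BY Call_date).
    for operator, rows in groupby(sorted(calls), key=lambda r: r[0]):
        first = last = None
        n = 0
        for _, ts in rows:
            # A gap above max_gap starts a new sequence (flag = 1 in the query).
            if last is None or ts - last > max_gap:
                first, n = ts, 0
            n += 1
            last = ts
            if best is None or n > best[3]:
                best = (operator, first, last, n)
    return best

# Hypothetical data for two operators on one day.
day = datetime(2017, 8, 14)
sample_calls = [
    ("AB1", day.replace(hour=9, minute=0)),
    ("AB1", day.replace(hour=9, minute=5)),
    ("AB1", day.replace(hour=9, minute=12)),
    ("AB1", day.replace(hour=9, minute=30)),
    ("AB1", day.replace(hour=9, minute=35)),
    ("CD2", day.replace(hour=10, minute=0)),
    ("CD2", day.replace(hour=10, minute=20)),
]
print(longest_run(sample_calls))
```

For this sample the longest sequence belongs to AB1, runs from 09:00 to 09:12, and contains three calls. Unlike the SQL version, the sketch keeps only the best sequence seen so far instead of materializing every group.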
