# Using Null

- teacher

id  | dept | name       | phone | mobile
----|------|------------|-------|---------------
101 | 1    | Shrivell   | 2753  | 07986 555 1234
102 | 1    | Throd      | 2754  | 07122 555 1920
103 | 1    | Splint     | 2293  |
104 |      | Spiregrain | 3287  |
105 | 2    | Cutflower  | 3212  | 07996 555 6574
106 |      | Deadyawn   | 3345  |
... |      |            |       |

- dept

id  | name
----|------------
1   | Computing
2   | Design
3   | Engineering
... |

### Teachers and Departments
The school includes many departments. Most teachers work exclusively for a single department. Some teachers have no department.

See also [Selecting NULL values](https://sqlzoo.net/wiki/Selecting_NULL_values) on SQLZoo.

In [1]:
# findspark.init() adds the local Spark installation to sys.path,
# so it must run before pyspark is imported
import findspark
import pandas as pd
findspark.init()

# Host that serves both the Spark master and HDFS
SVR = '192.168.31.31'
from pyspark.sql import SparkSession

# Note: `sc` is a SparkSession here, not a SparkContext
sc = (SparkSession.builder.appName('app08')
      .master(f'spark://{SVR}:7077')
      .config('spark.sql.warehouse.dir', f'hdfs://{SVR}:9000/user/hive/warehouse')
      .config('spark.cores.max', '4')
      .config('spark.executor.instances', '1')
      .config('spark.executor.cores', '2')
      .config('spark.executor.memory', '10g')
      .enableHiveSupport().getOrCreate())

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


## 1. NULL, INNER JOIN, LEFT JOIN, RIGHT JOIN

List the teachers who have NULL for their department.

> _Why we cannot use =_  
> You might think that `dept = NULL` would work here, but it doesn't; use `dept IS NULL` instead.
> 
> _That's not a proper explanation._  
> No, it's not, but you can read a better explanation in the Wikipedia article on Null (SQL).
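
The difference between `= NULL` and `IS NULL` can be checked directly in SQL. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the `sqlzoo.teacher` Hive table and only the six teachers listed above; the variable names are illustrative. A comparison with NULL evaluates to NULL (unknown) rather than true, so `WHERE dept = NULL` keeps no rows, and Spark SQL follows the same three-valued logic.

```python
import sqlite3

# In-memory stand-in for the teacher table (assumption: only the columns needed here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (id INTEGER, dept INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)",
    [(101, 1, "Shrivell"), (102, 1, "Throd"), (103, 1, "Splint"),
     (104, None, "Spiregrain"), (105, 2, "Cutflower"), (106, None, "Deadyawn")],
)

# dept = NULL evaluates to NULL for every row, so WHERE keeps nothing.
equals_null = conn.execute(
    "SELECT name FROM teacher WHERE dept = NULL ORDER BY id").fetchall()

# dept IS NULL is a predicate that is true exactly for the missing values.
is_null = conn.execute(
    "SELECT name FROM teacher WHERE dept IS NULL ORDER BY id").fetchall()

print(equals_null)  # []
print(is_null)      # [('Spiregrain',), ('Deadyawn',)]
conn.close()
```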

In [2]:
teacher = sc.read.table('sqlzoo.teacher')
dept = sc.read.table('sqlzoo.dept')

In [3]:
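# Import only the functions used in this notebook instead of a wildcard,
# which would shadow Python built-ins such as sum and max
from pyspark.sql.functions import isnull, when
teacher.filter(isnull(teacher['dept'])).select('name').toPandas()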

Unnamed: 0,name
0,Spiregrain
1,Deadyawn


## 2.
Note the INNER JOIN misses the teachers with no department and the departments with no teacher.

In [4]:
(teacher.withColumnRenamed('name', 'teacher')
    .join(dept, teacher['dept']==dept['id'])
    .select('teacher', 'name')
    .toPandas())

Unnamed: 0,teacher,name
0,Shrivell,Computing
1,Throd,Computing
2,Splint,Computing
3,Cutflower,Design


## 3.
Use a different JOIN so that all teachers are listed.

In [5]:
(teacher.withColumnRenamed('name', 'teacher')
    .join(dept, teacher['dept']==dept['id'], how='left')
    .select('teacher', 'name')
    .toPandas())

Unnamed: 0,teacher,name
0,Shrivell,Computing
1,Throd,Computing
2,Splint,Computing
3,Spiregrain,
4,Cutflower,Design
5,Deadyawn,


## 4.
Use a different JOIN so that all departments are listed.

In [6]:
(teacher.withColumnRenamed('name', 'teacher')
    .join(dept, teacher['dept']==dept['id'], how='right')
    .select('teacher', 'name')
    .toPandas())

Unnamed: 0,teacher,name
0,Splint,Computing
1,Throd,Computing
2,Shrivell,Computing
3,Cutflower,Design
4,,Engineering
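
For comparison, the three joins in exercises 2 to 4 can be written in plain SQL. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the `sqlzoo` Hive tables and only the rows listed at the top of this page. The "all departments" query is written as `dept LEFT JOIN teacher`, which returns the same rows as `teacher RIGHT JOIN dept` and avoids relying on RIGHT JOIN support in older SQLite builds.

```python
import sqlite3

# In-memory stand-ins for the teacher and dept tables (assumption: sample rows only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER, name TEXT);
    CREATE TABLE teacher (id INTEGER, dept INTEGER, name TEXT);
    INSERT INTO dept VALUES (1, 'Computing'), (2, 'Design'), (3, 'Engineering');
    INSERT INTO teacher VALUES
        (101, 1, 'Shrivell'), (102, 1, 'Throd'), (103, 1, 'Splint'),
        (104, NULL, 'Spiregrain'), (105, 2, 'Cutflower'), (106, NULL, 'Deadyawn');
""")

# INNER JOIN: only teachers whose dept matches a department id.
inner = conn.execute(
    "SELECT t.name, d.name FROM teacher t JOIN dept d ON t.dept = d.id").fetchall()

# LEFT JOIN: every teacher, with NULL for a missing department.
left = conn.execute(
    "SELECT t.name, d.name FROM teacher t LEFT JOIN dept d ON t.dept = d.id").fetchall()

# Every department, with NULL for a department that has no teacher.
right = conn.execute(
    "SELECT t.name, d.name FROM dept d LEFT JOIN teacher t ON t.dept = d.id").fetchall()

print(len(inner), len(left), len(right))  # 4 6 5
conn.close()
```

Python's `None` in the fetched rows corresponds to SQL NULL, so `('Spiregrain', None)` appears in `left` and `(None, 'Engineering')` appears in `right`.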


## 5. Using the [COALESCE](https://sqlzoo.net/wiki/COALESCE) function


Use COALESCE to print the mobile number. Use the number '07986 444 2266' if there is no number given. **Show teacher name and mobile number or '07986 444 2266'**

In [7]:
teacher.select('name', 'mobile').fillna({'mobile': '07986 444 2266'}).toPandas()

Unnamed: 0,name,mobile
0,Shrivell,07986 555 1234
1,Throd,07122 555 1920
2,Splint,07986 444 2266
3,Spiregrain,07986 444 2266
4,Cutflower,07996 555 6574
5,Deadyawn,07986 444 2266
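
The cell above uses `fillna` rather than COALESCE itself. For reference, here is a minimal sketch of the SQL form the exercise names, assuming an in-memory SQLite database as a stand-in for `sqlzoo.teacher` with only the six sample rows. `COALESCE` returns its first non-NULL argument; PySpark exposes the same function as `coalesce` in `pyspark.sql.functions`.

```python
import sqlite3

# In-memory stand-in for the teacher table (assumption: sample rows only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (id INTEGER, name TEXT, mobile TEXT)")
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)",
    [(101, "Shrivell", "07986 555 1234"), (102, "Throd", "07122 555 1920"),
     (103, "Splint", None), (104, "Spiregrain", None),
     (105, "Cutflower", "07996 555 6574"), (106, "Deadyawn", None)],
)

# COALESCE picks the mobile number when present, otherwise the fallback string.
rows = conn.execute(
    "SELECT name, COALESCE(mobile, '07986 444 2266') FROM teacher ORDER BY id"
).fetchall()

for name, mobile in rows:
    print(name, mobile)
conn.close()
```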


## 6.
Use the COALESCE function and a LEFT JOIN to print the teacher name and department name. Use the string 'None' where there is no department.

In [8]:
(teacher.withColumnRenamed('name', 'teacher')
    .join(dept, teacher['dept']==dept['id'], how='left')
    .select('teacher', 'name')
    .fillna({'name': 'None'})
    .toPandas())

Unnamed: 0,teacher,name
0,Shrivell,Computing
1,Throd,Computing
2,Splint,Computing
3,Spiregrain,None
4,Cutflower,Design
5,Deadyawn,None


## 7.
Use COUNT to show the number of teachers and the number of mobile phones.

In [9]:
teacher.agg({'name': 'count', 'mobile': 'count'}).toPandas()

Unnamed: 0,count(name),count(mobile)
0,6,3
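
The two counts differ because `COUNT(column)` skips NULL values, while `COUNT(*)` counts rows. This is a minimal sketch of that behaviour, assuming an in-memory SQLite database as a stand-in for `sqlzoo.teacher` with only the six sample rows.

```python
import sqlite3

# In-memory stand-in for the teacher table (assumption: sample rows only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (id INTEGER, name TEXT, mobile TEXT)")
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)",
    [(101, "Shrivell", "07986 555 1234"), (102, "Throd", "07122 555 1920"),
     (103, "Splint", None), (104, "Spiregrain", None),
     (105, "Cutflower", "07996 555 6574"), (106, "Deadyawn", None)],
)

# COUNT(*) counts rows; COUNT(mobile) counts only rows where mobile is not NULL.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(name), COUNT(mobile) FROM teacher").fetchone()

print(counts)  # (6, 6, 3)
conn.close()
```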


## 8.
Use COUNT and GROUP BY **dept.name** to show each department and the number of staff. Use a RIGHT JOIN to ensure that the Engineering department is listed.

In [10]:
(teacher.withColumnRenamed('name', 'teacher')
 .join(dept, teacher['dept']==dept['id'], how='right')
 .groupBy('name')
 .agg({'teacher': 'count'})
 .toPandas())

Unnamed: 0,name,count(teacher)
0,Computing,3
1,Design,1
2,Engineering,0
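
Engineering shows 0 here only because the count is taken over a teacher column, which is NULL in the unmatched row produced by the outer join. This is a minimal sketch of the contrast with `COUNT(*)`, assuming an in-memory SQLite database with the sample rows; as an assumption for portability, the RIGHT JOIN is written as `dept LEFT JOIN teacher`, which yields the same rows.

```python
import sqlite3

# In-memory stand-ins for the teacher and dept tables (assumption: sample rows only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER, name TEXT);
    CREATE TABLE teacher (id INTEGER, dept INTEGER, name TEXT);
    INSERT INTO dept VALUES (1, 'Computing'), (2, 'Design'), (3, 'Engineering');
    INSERT INTO teacher VALUES
        (101, 1, 'Shrivell'), (102, 1, 'Throd'), (103, 1, 'Splint'),
        (104, NULL, 'Spiregrain'), (105, 2, 'Cutflower'), (106, NULL, 'Deadyawn');
""")

# COUNT(t.id) ignores the NULL teacher in the Engineering row;
# COUNT(*) counts that padded row and reports 1 instead of 0.
staff = conn.execute("""
    SELECT d.name, COUNT(t.id), COUNT(*)
    FROM dept d LEFT JOIN teacher t ON t.dept = d.id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()

print(staff)
# [('Computing', 3, 3), ('Design', 1, 1), ('Engineering', 0, 1)]
conn.close()
```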


## 9. Using [CASE](https://sqlzoo.net/wiki/CASE)


Use CASE to show the **name** of each teacher followed by 'Sci' if the teacher is in **dept** 1 or 2 and 'Art' otherwise.

In [11]:
(teacher.select('name', 'dept', when(teacher['dept'].isin([1, 2]), 'Sci')
                .otherwise('Art').alias('label'))
 .toPandas())

Unnamed: 0,name,dept,label
0,Shrivell,1.0,Sci
1,Throd,1.0,Sci
2,Splint,1.0,Sci
3,Spiregrain,,Art
4,Cutflower,2.0,Sci
5,Deadyawn,,Art


## 10.
Use CASE to show the name of each teacher followed by 'Sci' if the teacher is in dept 1 or 2, show 'Art' if the teacher's dept is 3 and 'None' otherwise.

In [12]:
(teacher.select('name', 'dept',
                when(teacher['dept'].isin([1, 2]), 'Sci')
                .when(teacher['dept'] == 3, 'Art')
                .otherwise('None').alias('label'))
 .toPandas())

Unnamed: 0,name,dept,label
0,Shrivell,1.0,Sci
1,Throd,1.0,Sci
2,Splint,1.0,Sci
3,Spiregrain,,None
4,Cutflower,2.0,Sci
5,Deadyawn,,None
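
Teachers with no department fall through to the final branch because a NULL `dept` makes each `WHEN` condition evaluate to unknown, which CASE treats as not matched. This is a minimal sketch of the equivalent SQL `CASE` expression, assuming an in-memory SQLite database as a stand-in for `sqlzoo.teacher` with only the six sample rows.

```python
import sqlite3

# In-memory stand-in for the teacher table (assumption: sample rows only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (id INTEGER, dept INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)",
    [(101, 1, "Shrivell"), (102, 1, "Throd"), (103, 1, "Splint"),
     (104, None, "Spiregrain"), (105, 2, "Cutflower"), (106, None, "Deadyawn")],
)

# NULL IN (1, 2) and NULL = 3 are both unknown, so those rows reach ELSE.
labels = conn.execute("""
    SELECT name,
           CASE WHEN dept IN (1, 2) THEN 'Sci'
                WHEN dept = 3 THEN 'Art'
                ELSE 'None'
           END AS label
    FROM teacher
    ORDER BY id
""").fetchall()

for name, label in labels:
    print(name, label)
conn.close()
```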


In [13]:
sc.stop()