# SQL with Python Reference Guide 7
# NULL Values
## (Justin M. Olds)
Based on Stanford SQL course: https://lagunita.stanford.edu/courses/DB/SQL/SelfPaced/info

---
**NULL values overview** 

NULL is a special value that specifies that an entity is undefined or unknown.
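As a minimal sketch of what "unknown" means in practice, the snippet below evaluates a few NULL comparisons directly in SQLite. It uses a throwaway in-memory connection named `demo` (an assumption for illustration, separate from the `class.db` connection used in this guide).

```python
import sqlite3

# Throwaway in-memory database; `demo` is a hypothetical name, not the guide's `conn`.
demo = sqlite3.connect(":memory:")

# Comparing anything with NULL yields NULL (unknown); sqlite3 returns it as None.
eq_result = demo.execute("SELECT NULL = NULL").fetchone()[0]
gt_result = demo.execute("SELECT 3.9 > NULL").fetchone()[0]

# IS NULL is the test that gives a definite answer (1 means true).
is_null_result = demo.execute("SELECT NULL IS NULL").fetchone()[0]

print(eq_result, gt_result, is_null_result)  # None None 1
demo.close()
```

Even `NULL = NULL` is unknown, which is why `IS NULL` exists as a separate test.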

In [1]:
import sqlite3
import pandas as pd

conn = sqlite3.connect("class.db")
c = conn.cursor()

---
### Table creation and insert code (same college admissions data as before)

In [4]:
c.execute('DROP TABLE IF EXISTS College')
c.execute('DROP TABLE IF EXISTS Student') 
c.execute('DROP TABLE IF EXISTS Apply') 

c.execute('CREATE TABLE College(cName TEXT, state TEXT, enrollment INT)')
c.execute('CREATE TABLE Student(sID INT, sName TEXT, GPA REAL, sizeHS INT)')
c.execute('CREATE TABLE Apply(sID INT, cName TEXT, major TEXT, decision TEXT)')
conn.commit()

In [5]:
c.execute('DELETE FROM Student')
c.execute('DELETE FROM College')
c.execute('DELETE FROM Apply')

c.execute("INSERT INTO Student VALUES (123, 'Amy', 3.9, 1000)")
c.execute("INSERT INTO Student values (234, 'Bob', 3.6, 1500)")
c.execute("INSERT INTO Student values (345, 'Craig', 3.5, 500)")
c.execute("INSERT INTO Student values (456, 'Doris', 3.9, 1000)")
c.execute("INSERT INTO Student values (567, 'Edward', 2.9, 2000)")
c.execute("INSERT INTO Student values (678, 'Fay', 3.8, 200)")
c.execute("INSERT INTO Student values (789, 'Gary', 3.4, 800)")
c.execute("INSERT INTO Student values (987, 'Helen', 3.7, 800)")
c.execute("INSERT INTO Student values (876, 'Irene', 3.9, 400)")
c.execute("INSERT INTO Student values (765, 'Jay', 2.9, 1500)")
c.execute("INSERT INTO Student values (654, 'Amy', 3.9, 1000)")
c.execute("INSERT INTO Student values (543, 'Craig', 3.4, 2000)")

c.execute("INSERT INTO College values ('Stanford', 'CA', 15000)")
c.execute("INSERT INTO College values ('Berkeley', 'CA', 36000)")
c.execute("INSERT INTO College values ('MIT', 'MA', 10000)")
c.execute("INSERT INTO College values ('Cornell', 'NY', 21000)")

c.execute("INSERT INTO Apply values (123, 'Stanford', 'CS', 'Y')")
c.execute("INSERT INTO Apply values (123, 'Stanford', 'EE', 'N')")
c.execute("INSERT INTO Apply values (123, 'Berkeley', 'CS', 'Y')")
c.execute("INSERT INTO Apply values (123, 'Cornell', 'EE', 'Y')")
c.execute("INSERT INTO Apply values (234, 'Berkeley', 'biology', 'N')")
c.execute("INSERT INTO Apply values (345, 'MIT', 'bioengineering', 'Y')")
c.execute("INSERT INTO Apply values (345, 'Cornell', 'bioengineering', 'N')")
c.execute("INSERT INTO Apply values (345, 'Cornell', 'CS', 'Y')")
c.execute("INSERT INTO Apply values (345, 'Cornell', 'EE', 'N')")
c.execute("INSERT INTO Apply values (678, 'Stanford', 'history', 'Y')")
c.execute("INSERT INTO Apply values (987, 'Stanford', 'CS', 'Y')")
c.execute("INSERT INTO Apply values (987, 'Berkeley', 'CS', 'Y')")
c.execute("INSERT INTO Apply values (876, 'Stanford', 'CS', 'N')")
c.execute("INSERT INTO Apply values (876, 'MIT', 'biology', 'Y')")
c.execute("INSERT INTO Apply values (876, 'MIT', 'marine biology', 'N')")
c.execute("INSERT INTO Apply values (765, 'Stanford', 'history', 'Y')")
c.execute("INSERT INTO Apply values (765, 'Cornell', 'history', 'N')")
c.execute("INSERT INTO Apply values (765, 'Cornell', 'psychology', 'Y')")
c.execute("INSERT INTO Apply values (543, 'MIT', 'CS', 'N')")
conn.commit()


---
### Add some students to the Student table with NULL values for GPA (Kevin and Lori)

In [6]:
c.execute("INSERT INTO Student values (432, 'Kevin', NULL, 1500)")
c.execute("INSERT INTO Student values (321, 'Lori', NULL, 2500)")

<sqlite3.Cursor at 0x19dd519adc0>
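The inserts above write `NULL` directly in the SQL text. When the values come from Python instead, passing `None` as a query parameter stores a SQL NULL. The following is a sketch under that assumption, using a hypothetical in-memory connection named `demo` rather than `class.db`.

```python
import sqlite3

# Hypothetical in-memory database so the guide's class.db is left untouched.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Student(sID INT, sName TEXT, GPA REAL, sizeHS INT)")

# None in a parameter tuple is stored as SQL NULL.
rows = [(432, "Kevin", None, 1500), (321, "Lori", None, 2500)]
demo.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", rows)
demo.commit()

# Reading the rows back returns None for the NULL GPAs.
stored = demo.execute("SELECT sName, GPA FROM Student ORDER BY sID").fetchall()
print(stored)  # [('Lori', None), ('Kevin', None)]
demo.close()
```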

---
### Start with a simple query that selects students with GPAs greater than 3.5
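Kevin and Lori are excluded from the results: comparing their NULL GPA with 3.5 evaluates to unknown rather than true, and WHERE keeps only rows whose condition is true.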

In [7]:
df = pd.read_sql_query("""
    SELECT sID, sName, GPA
    FROM Student
    WHERE GPA > 3.5
""", conn);df

| | sID | sName | GPA |
|---|---|---|---|
| 0 | 123 | Amy | 3.9 |
| 1 | 234 | Bob | 3.6 |
| 2 | 456 | Doris | 3.9 |
| 3 | 678 | Fay | 3.8 |
| 4 | 987 | Helen | 3.7 |
| 5 | 876 | Irene | 3.9 |
| 6 | 654 | Amy | 3.9 |
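A consequence worth seeing once: a condition that looks like it must match every row still skips rows with a NULL GPA. The sketch below uses a small subset of the Student data in a hypothetical in-memory database (`demo`) and shows that adding `IS NULL` brings those rows back.

```python
import sqlite3

# Hypothetical in-memory database holding a small subset of the Student rows.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Student(sID INT, sName TEXT, GPA REAL, sizeHS INT)")
demo.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(123, "Amy", 3.9, 1000), (345, "Craig", 3.5, 500),
     (432, "Kevin", None, 1500), (321, "Lori", None, 2500)],
)

# Looks like a tautology, but both comparisons are unknown when GPA is NULL.
either = demo.execute(
    "SELECT sName FROM Student WHERE GPA > 3.5 OR GPA <= 3.5 ORDER BY sID"
).fetchall()

# An explicit IS NULL test is needed to include Kevin and Lori.
with_nulls = demo.execute(
    "SELECT sName FROM Student "
    "WHERE GPA > 3.5 OR GPA <= 3.5 OR GPA IS NULL ORDER BY sID"
).fetchall()

print(either)      # [('Amy',), ('Craig',)]
print(with_nulls)  # [('Amy',), ('Lori',), ('Craig',), ('Kevin',)]
demo.close()
```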


### A row is returned when one condition of an OR is satisfied, even if the other is unknown because of a NULL value
Kevin is returned in the output below: his GPA is NULL, but his sizeHS of 1500 satisfies `sizeHS < 1600`.

In [8]:
df = pd.read_sql_query("""
    SELECT sID, sName, GPA, sizeHS
    FROM Student
    WHERE GPA > 3.5 OR sizeHS < 1600
""", conn);df

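| | sID | sName | GPA | sizeHS |
|---|---|---|---|---|
| 0 | 123 | Amy | 3.9 | 1000 |
| 1 | 234 | Bob | 3.6 | 1500 |
| 2 | 345 | Craig | 3.5 | 500 |
| 3 | 456 | Doris | 3.9 | 1000 |
| 4 | 678 | Fay | 3.8 | 200 |
| 5 | 789 | Gary | 3.4 | 800 |
| 6 | 987 | Helen | 3.7 | 800 |
| 7 | 876 | Irene | 3.9 | 400 |
| 8 | 765 | Jay | 2.9 | 1500 |
| 9 | 654 | Amy | 3.9 | 1000 |
| 10 | 432 | Kevin | NaN | 1500 |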

### NULL values and aggregate functions
NULL values behave subtly within aggregate functions and subqueries. **Be careful**.

For example, `COUNT(DISTINCT ...)` does not count NULL values. However, if a SELECT statement asks for DISTINCT items, NULL can be returned as one of the distinct items. See below:

In [11]:
df = pd.read_sql_query("""
    SELECT COUNT(DISTINCT GPA)
    FROM Student
""", conn);df

| | COUNT(DISTINCT GPA) |
|---|---|
| 0 | 7 |


In [12]:
df = pd.read_sql_query("""
    SELECT DISTINCT GPA
    FROM Student
""", conn);df

| | GPA |
|---|---|
| 0 | 3.9 |
| 1 | 3.6 |
| 2 | 3.5 |
| 3 | 2.9 |
| 4 | 3.8 |
| 5 | 3.4 |
| 6 | 3.7 |
| 7 | NaN |
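The same pattern applies to other aggregates: `COUNT(*)` counts rows, while `COUNT(GPA)` and `AVG(GPA)` skip NULLs. A minimal sketch follows, using a hypothetical in-memory database (`demo`) with a small subset of the Student data rather than the full table.

```python
import sqlite3

# Hypothetical in-memory database with three known GPAs and two NULLs.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Student(sID INT, sName TEXT, GPA REAL, sizeHS INT)")
demo.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(123, "Amy", 3.9, 1000), (456, "Doris", 3.9, 1000), (345, "Craig", 3.5, 500),
     (432, "Kevin", None, 1500), (321, "Lori", None, 2500)],
)

count_rows, count_gpa, count_distinct, avg_gpa = demo.execute(
    "SELECT COUNT(*), COUNT(GPA), COUNT(DISTINCT GPA), AVG(GPA) FROM Student"
).fetchone()

# COUNT(*) sees all 5 rows; COUNT(GPA) sees only the 3 non-NULL values.
print(count_rows, count_gpa, count_distinct)  # 5 3 2
# AVG divides by the number of non-NULL values (3), not by the row count (5).
print(round(avg_gpa, 2))
demo.close()
```

If NULLs should count as some default value, the query has to say so explicitly, for example with `COALESCE(GPA, 0)`.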
