Null values can be handled using `NOT NULL`. However, this is often of limited use, because we frequently do not know in advance whether a value will be null. For example, if a user leaves a form field blank, that field's value will be null.

Nulls can be given a default value with the `COALESCE` function.
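As a minimal sketch of how `COALESCE` behaves, the snippet below uses a throwaway in-memory SQLite database. The table `form_answers`, its columns, and the placeholder text `'not provided'` are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE form_answers (name TEXT, phone TEXT)')
demo.executemany(
    'INSERT INTO form_answers VALUES (?, ?)',
    [('Ana', '555-0100'), ('Bruno', None)],  # None is stored as NULL
)

# COALESCE returns its first non-NULL argument, so a missing phone
# number is replaced by the default text.
rows = demo.execute(
    "SELECT name, COALESCE(phone, 'not provided') "
    "FROM form_answers ORDER BY name"
).fetchall()
print(rows)  # [('Ana', '555-0100'), ('Bruno', 'not provided')]
demo.close()
```

The stored data is not modified; the default only appears in the query result.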

In [1]:
import pandas as pd
import sqlite3

First, I'll load an example database to work with null values:

In [15]:
con = sqlite3.connect('../primeiro_banco')
cur = con.cursor()
print('connection open')

connection open


In [16]:
query = 'SELECT * FROM Customers LIMIT 3'

df = pd.read_sql_query(query, con)
df

Unnamed: 0,CustomerKey,FirstName,LastName,BirthDate,MaritalStatus,Gender,EmailAddress,AnnualIncome,TotalChildren,EducationLevel,Occupation,HomeOwner
0,11000,JON,YANG,4/8/1966,M,M,jon24@adventure-works.com,90000.0,2,Bachelors,Professional,Y
1,11001,EUGENE,HUANG,5/14/1965,S,M,eugene10@adventure-works.com,60000.0,3,Bachelors,Professional,N
2,11002,RUBEN,TORRES,8/12/1965,M,M,ruben35@adventure-works.com,60000.0,3,Bachelors,Professional,Y


Create a table with null values

In [31]:
con.executescript('''
DROP TABLE IF EXISTS Customers_bk;
CREATE TABLE Customers_bk AS SELECT * FROM Customers;
'''
)

con.commit()

In [44]:
# Simulate missing data: set every income below 70,000 to NULL
con.executescript('''
    UPDATE Customers_bk
    SET AnnualIncome = NULL
    WHERE AnnualIncome < 70000;
    '''
)

con.commit()

In [52]:
query = 'SELECT * FROM Customers_bk LIMIT 3'

df = pd.read_sql_query(query, con)
df

Unnamed: 0,CustomerKey,FirstName,LastName,BirthDate,MaritalStatus,Gender,EmailAddress,AnnualIncome,TotalChildren,EducationLevel,Occupation,HomeOwner
0,11000,JON,YANG,4/8/1966,M,M,jon24@adventure-works.com,90000.0,2,Bachelors,Professional,Y
1,11001,EUGENE,HUANG,5/14/1965,S,M,eugene10@adventure-works.com,,3,Bachelors,Professional,N
2,11002,RUBEN,TORRES,8/12/1965,M,M,ruben35@adventure-works.com,,3,Bachelors,Professional,Y


Find null values

In [50]:
query = '''
SELECT * 
FROM Customers_bk
WHERE AnnualIncome IS NULL
'''

df = pd.read_sql_query(query, con)
df

Unnamed: 0,CustomerKey,FirstName,LastName,BirthDate,MaritalStatus,Gender,EmailAddress,AnnualIncome,TotalChildren,EducationLevel,Occupation,HomeOwner
0,11001,EUGENE,HUANG,5/14/1965,S,M,eugene10@adventure-works.com,,3,Bachelors,Professional,N
1,11002,RUBEN,TORRES,8/12/1965,M,M,ruben35@adventure-works.com,,3,Bachelors,Professional,Y
2,11007,MARCO,MEHTA,5/9/1964,M,M,marco14@adventure-works.com,,3,Bachelors,Professional,Y
3,11008,ROBIN,VERHOFF,7/7/1964,S,F,rob4@adventure-works.com,,4,Bachelors,Professional,Y
4,11011,CURTIS,LU,11/4/1963,M,M,curtis9@adventure-works.com,,4,Bachelors,Professional,Y
...,...,...,...,...,...,...,...,...,...,...,...,...
11532,29479,TOMMY,TANG,7/4/1958,M,M,tommy2@adventure-works.com,,1,Graduate Degree,Clerical,Y
11533,29480,NINA,RAJI,11/10/1960,S,F,nina21@adventure-works.com,,3,Graduate Degree,Clerical,Y
11534,29481,IVAN,SURI,1/5/1960,S,M,ivan0@adventure-works.com,,3,Graduate Degree,Clerical,N
11535,29482,CLAYTON,ZHANG,3/5/1959,M,M,clayton0@adventure-works.com,,3,Bachelors,Clerical,Y


In [54]:
query = '''
SELECT COUNT(*) AS Nulos
FROM Customers_bk
WHERE AnnualIncome IS NULL
'''

df = pd.read_sql_query(query, con)
df

Unnamed: 0,Nulos
0,11537
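The queries above use `IS NULL` rather than `= NULL`. The sketch below, built on a hypothetical in-memory table named `incomes`, shows why: a comparison with `=` against NULL evaluates to NULL (unknown), so the `WHERE` clause rejects every row.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE incomes (customer TEXT, income REAL)')
demo.executemany(
    'INSERT INTO incomes VALUES (?, ?)',
    [('a', 90000.0), ('b', None), ('c', None)],
)

# "income = NULL" is never true, so no row passes the filter.
equals_null = demo.execute(
    'SELECT COUNT(*) FROM incomes WHERE income = NULL'
).fetchone()[0]

# IS NULL is the predicate that actually tests for missing values.
is_null = demo.execute(
    'SELECT COUNT(*) FROM incomes WHERE income IS NULL'
).fetchone()[0]

print(equals_null, is_null)  # 0 2
demo.close()
```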


Ways to handle nulls

Exclude rows with null values

In [55]:
query = '''
SELECT *
FROM Customers_bk
WHERE AnnualIncome IS NOT NULL
'''

df = pd.read_sql_query(query, con)
df

Unnamed: 0,CustomerKey,FirstName,LastName,BirthDate,MaritalStatus,Gender,EmailAddress,AnnualIncome,TotalChildren,EducationLevel,Occupation,HomeOwner
0,11000,JON,YANG,4/8/1966,M,M,jon24@adventure-works.com,90000.0,2,Bachelors,Professional,Y
1,11003,CHRISTY,ZHU,2/15/1968,S,F,christy12@adventure-works.com,70000.0,0,Bachelors,Professional,N
2,11004,ELIZABETH,JOHNSON,8/8/1968,S,F,elizabeth5@adventure-works.com,80000.0,5,Bachelors,Professional,Y
3,11005,JULIO,RUIZ,8/5/1965,S,M,julio1@adventure-works.com,70000.0,0,Bachelors,Professional,Y
4,11009,SHANNON,CARLSON,4/1/1964,S,M,shannon38@adventure-works.com,70000.0,0,Bachelors,Professional,N
...,...,...,...,...,...,...,...,...,...,...,...,...
6606,29317,JAMES,HILL,7/3/1958,S,M,james64@adventure-works.com,90000.0,2,Bachelors,Professional,N
6607,29318,EDGAR,PEREZ,11/18/1958,M,M,edgar22@adventure-works.com,100000.0,4,Bachelors,Management,Y
6608,29319,ALVIN,PAL,7/14/1957,S,M,alvin34@adventure-works.com,70000.0,1,Partial College,Skilled Manual,N
6609,29320,WESLEY,HUANG,2/14/1960,M,M,wesley6@adventure-works.com,70000.0,5,Partial College,Skilled Manual,N


Count null and non-null values

In [57]:
query = '''
SELECT SUM(CASE WHEN AnnualIncome IS NULL THEN 1 ELSE 0 END) AS Nulos,
    SUM(CASE WHEN AnnualIncome IS NOT NULL THEN 1 ELSE 0 END) AS NaoNulos,
    COUNT(*) AS Total
FROM Customers_bk
'''

df = pd.read_sql_query(query, con)
df

Unnamed: 0,Nulos,NaoNulos,Total
0,11537,6611,18148
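An alternative to the `SUM(CASE ...)` pattern, sketched here on a hypothetical in-memory table named `incomes`, relies on the fact that `COUNT(column)` counts only non-NULL values while `COUNT(*)` counts every row. The aliases mirror the ones used in the query above.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE incomes (customer TEXT, income REAL)')
demo.executemany(
    'INSERT INTO incomes VALUES (?, ?)',
    [('a', 90000.0), ('b', None), ('c', None)],
)

# COUNT(income) skips NULLs; COUNT(*) does not.
counts = demo.execute('''
    SELECT COUNT(*) - COUNT(income) AS Nulos,
           COUNT(income)            AS NaoNulos,
           COUNT(*)                 AS Total
    FROM incomes
''').fetchone()

print(counts)  # (2, 1, 3)
demo.close()
```

Both forms should give the same counts; this one is just shorter to write.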


Replace null values with sensible values

In [59]:
query = 'SELECT ROUND(AVG(AnnualIncome), 0) FROM Customers_bk'

df = pd.read_sql_query(query, con)
df

Unnamed: 0,"ROUND(AVG(AnnualIncome), 0)"
0,90924.0
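Note that `AVG` ignores NULLs, so the mean above is computed only over the rows that still have an income. The sketch below, on a hypothetical in-memory table named `incomes`, contrasts that behavior with an average that treats missing incomes as zero.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE incomes (income REAL)')
demo.executemany(
    'INSERT INTO incomes VALUES (?)',
    [(90000.0,), (None,), (70000.0,)],
)

# AVG(income) averages the two known values; wrapping the column in
# COALESCE(income, 0) makes the NULL row count as a zero instead.
avg_skipping_nulls, avg_nulls_as_zero = demo.execute('''
    SELECT AVG(income), AVG(COALESCE(income, 0))
    FROM incomes
''').fetchone()

print(avg_skipping_nulls)  # 80000.0
print(avg_nulls_as_zero)   # lower, because the NULL row now counts as 0
demo.close()
```

Which of the two is appropriate depends on what a missing value means in the data.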


In [61]:
query = '''
SELECT *, COALESCE(AnnualIncome, 90924) AS Income
FROM Customers_bk
'''

df = pd.read_sql_query(query, con)
df

Unnamed: 0,CustomerKey,FirstName,LastName,BirthDate,MaritalStatus,Gender,EmailAddress,AnnualIncome,TotalChildren,EducationLevel,Occupation,HomeOwner,Income
0,11000,JON,YANG,4/8/1966,M,M,jon24@adventure-works.com,90000.0,2,Bachelors,Professional,Y,90000.0
1,11001,EUGENE,HUANG,5/14/1965,S,M,eugene10@adventure-works.com,,3,Bachelors,Professional,N,90924.0
2,11002,RUBEN,TORRES,8/12/1965,M,M,ruben35@adventure-works.com,,3,Bachelors,Professional,Y,90924.0
3,11003,CHRISTY,ZHU,2/15/1968,S,F,christy12@adventure-works.com,70000.0,0,Bachelors,Professional,N,70000.0
4,11004,ELIZABETH,JOHNSON,8/8/1968,S,F,elizabeth5@adventure-works.com,80000.0,5,Bachelors,Professional,Y,80000.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
18143,29479,TOMMY,TANG,7/4/1958,M,M,tommy2@adventure-works.com,,1,Graduate Degree,Clerical,Y,90924.0
18144,29480,NINA,RAJI,11/10/1960,S,F,nina21@adventure-works.com,,3,Graduate Degree,Clerical,Y,90924.0
18145,29481,IVAN,SURI,1/5/1960,S,M,ivan0@adventure-works.com,,3,Graduate Degree,Clerical,N,90924.0
18146,29482,CLAYTON,ZHANG,3/5/1959,M,M,clayton0@adventure-works.com,,3,Bachelors,Clerical,Y,90924.0
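The query above hard-codes the mean that was computed in the previous cell. One possible variation, sketched on a hypothetical in-memory table named `incomes`, computes the mean in a scalar subquery so the fill value follows the data.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE incomes (customer TEXT, income REAL)')
demo.executemany(
    'INSERT INTO incomes VALUES (?, ?)',
    [('a', 90000.0), ('b', None), ('c', 70000.0)],
)

# The scalar subquery returns the rounded mean of the non-NULL incomes,
# and COALESCE uses it only where income is NULL.
filled = demo.execute('''
    SELECT customer,
           COALESCE(income,
                    (SELECT ROUND(AVG(income), 0) FROM incomes)) AS Income
    FROM incomes
    ORDER BY customer
''').fetchall()

print(filled)  # [('a', 90000.0), ('b', 80000.0), ('c', 70000.0)]
demo.close()
```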
