<a href="https://colab.research.google.com/github/sethkipsangmutuba/SQL/blob/main/1c.%20Using_logical_and_comparison_operators_%5BNotebook%5D.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Logical & Comparison Tools with Example References

This table maps key SQL concepts to Titanic-specific examples covered in this notebook:

| SQL Concept             | Titanic Mapping Example                            | Covered in Examples |
|-------------------------|-----------------------------------------------------|----------------------|
| `IS NULL`               | `WHERE fare IS NULL`                                | #3                   |
| `IS NOT NULL`           | `WHERE age IS NOT NULL`                             |                      |
| `IN`                    | `WHERE embark_town IN ('Cherbourg', 'Queenstown')` | #2                   |
| `NOT IN`                | `WHERE class NOT IN ('First')`                      | #5                   |
| `NOT (...)`             | Logical negation of a condition                     | #1, #10              |
| `IN` with `AND`         | Combined set filtering                              | #2                   |
| `IS NULL` with `IN`     | Filter nulls inside an `IN` condition               | #3                   |
| `BETWEEN` + other filters | Range filter with additional conditions           | #4, #10              |
| `OR` with `IS NULL`     | Conditional logic with nulls                        | #6                   |
| `LIKE` pattern matching | Match text patterns (e.g., `LIKE 'S%'`)            | #7                   |
| `CASE` + `IS NULL` logic| Categorizing data with missing values              | #8                   |
| `GROUP BY` + logic      | Aggregation with filtering                          | #9                   |
| `AND`, `OR`             | Combine multiple conditions                         | Most queries         |
| `ORDER BY`              | Sort by `survived`, `fare`, etc.                   |                      |


In [150]:
import seaborn as sns
import sqlite3
import pandas as pd

df = sns.load_dataset("titanic")
conn = sqlite3.connect("titanic.db")
df.to_sql("titanic", conn, if_exists="replace", index=False)


891

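Select passengers in a specific group (e.g., adult male survivors). Note that `who` only holds `man`, `woman`, or `child`, and `adult_male` is a separate boolean column, so the filter below matches no rows. Filtering on `who = 'man'` or `adult_male = 1` would return the intended group.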

In [151]:
pd.read_sql("""
SELECT who, survived, age, fare, class
FROM titanic
WHERE who = 'adult_male' AND survived = 1
""", conn)


Unnamed: 0,who,survived,age,fare,class


Check for NULL values in the fare column

In [152]:
pd.read_sql("""
SELECT who, age, fare, class
FROM titanic
WHERE fare IS NULL
""", conn)


Unnamed: 0,who,age,fare,class
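
A common slip when checking for missing values is to write `= NULL` instead of `IS NULL`. The sketch below illustrates the difference on a tiny in-memory table; the table name `demo` and its three rows are made up for illustration and are not the Titanic data.

```python
import sqlite3

# Hypothetical three-row sample table; not the real Titanic data.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (who TEXT, fare REAL)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("man", 7.25), ("woman", None), ("child", 21.0)],
)

# 'fare = NULL' evaluates to NULL (unknown) for every row, so nothing matches.
equals_null = demo_conn.execute(
    "SELECT who FROM demo WHERE fare = NULL"
).fetchall()

# 'IS NULL' is the operator that actually tests for a missing value.
is_null = demo_conn.execute(
    "SELECT who FROM demo WHERE fare IS NULL"
).fetchall()

print(equals_null)  # []
print(is_null)      # [('woman',)]
```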


Exclude rows where fare is NULL (i.e., clean data)

In [153]:
pd.read_sql("""
SELECT who, age, fare, survived
FROM titanic
WHERE fare IS NOT NULL
""", conn)


Unnamed: 0,who,age,fare,survived
0,man,22.0,7.2500,0
1,woman,38.0,71.2833,1
2,woman,26.0,7.9250,1
3,woman,35.0,53.1000,1
4,man,35.0,8.0500,0
...,...,...,...,...
886,man,27.0,13.0000,0
887,woman,19.0,30.0000,1
888,woman,,23.4500,0
889,man,26.0,30.0000,1


# Compare survival/fare across embarkation towns (IN)
Keep only rows whose embark_town is one of the three named ports, sorted by fare from highest to lowest:

In [154]:
pd.read_sql("""
SELECT embark_town, fare, survived, class
FROM titanic
WHERE embark_town IN ('Southampton', 'Cherbourg', 'Queenstown')
AND fare IS NOT NULL
ORDER BY fare DESC
""", conn)


Unnamed: 0,embark_town,fare,survived,class
0,Cherbourg,512.3292,1,First
1,Cherbourg,512.3292,1,First
2,Cherbourg,512.3292,1,First
3,Southampton,263.0000,0,First
4,Southampton,263.0000,1,First
...,...,...,...,...
884,Southampton,0.0000,0,Second
885,Southampton,0.0000,0,Second
886,Southampton,0.0000,0,First
887,Southampton,0.0000,0,First


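Look at the rest (rows outside the three towns) using NOT IN. The result is empty: the only remaining rows have a NULL embark_town, and NOT IN applied to NULL evaluates to NULL rather than true, so those rows are filtered out.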

In [155]:
pd.read_sql("""
SELECT embark_town, fare, survived, class
FROM titanic
WHERE embark_town NOT IN ('Southampton', 'Cherbourg', 'Queenstown')
AND fare IS NOT NULL
ORDER BY fare DESC
""", conn)


Unnamed: 0,embark_town,fare,survived,class
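
The empty result above comes from how `NOT IN` treats NULL. As a minimal sketch, assuming a made-up table `demo` with one missing town, the snippet below shows that `NOT IN` alone skips the NULL row and that an explicit `IS NULL` test brings it back.

```python
import sqlite3

# Hypothetical sample rows; the third passenger has no recorded town.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (id INTEGER, embark_town TEXT)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(1, "Southampton"), (2, "Cherbourg"), (3, None)],
)

towns = "('Southampton', 'Cherbourg', 'Queenstown')"

# NULL NOT IN (...) evaluates to NULL, so row 3 is filtered out.
not_in_only = demo_conn.execute(
    f"SELECT id FROM demo WHERE embark_town NOT IN {towns}"
).fetchall()

# Adding an explicit IS NULL test returns the missing-town row.
with_null_check = demo_conn.execute(
    f"SELECT id FROM demo WHERE embark_town IS NULL OR embark_town NOT IN {towns}"
).fetchall()

print(not_in_only)      # []
print(with_null_check)  # [(3,)]
```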


Use fare level as a proxy for "access to safety" and sort the groups by survival rate

Group by fare range

In [156]:
pd.read_sql("""
SELECT
  CASE
    WHEN fare < 10 THEN 'Low Fare'
    WHEN fare BETWEEN 10 AND 50 THEN 'Mid Fare'
    WHEN fare > 50 THEN 'High Fare'
  END AS fare_group,
  COUNT(*) AS total,
  SUM(survived) AS survived,
  ROUND(AVG(survived)*100, 1) AS survival_rate
FROM titanic
WHERE fare IS NOT NULL
GROUP BY fare_group
ORDER BY survival_rate DESC
""", conn)


Unnamed: 0,fare_group,total,survived,survival_rate
0,High Fare,160,109,68.1
1,Mid Fare,395,166,42.0
2,Low Fare,336,67,19.9
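
The `CASE` expression above has no `ELSE` branch, which is why the query filters with `fare IS NOT NULL`: a NULL fare fails every `WHEN` test and the expression returns NULL. The sketch below, on a made-up four-row table `demo`, compares the same `CASE` with and without an `ELSE`; the label `'Unknown Fare'` is an assumption chosen for illustration.

```python
import sqlite3

# Hypothetical sample fares, including one missing value.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (id INTEGER, fare REAL)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(1, 5.0), (2, 30.0), (3, 80.0), (4, None)],
)

labels = demo_conn.execute("""
SELECT
  fare,
  CASE
    WHEN fare < 10 THEN 'Low Fare'
    WHEN fare BETWEEN 10 AND 50 THEN 'Mid Fare'
    WHEN fare > 50 THEN 'High Fare'
  END AS no_else,                -- NULL when no branch matches
  CASE
    WHEN fare < 10 THEN 'Low Fare'
    WHEN fare BETWEEN 10 AND 50 THEN 'Mid Fare'
    WHEN fare > 50 THEN 'High Fare'
    ELSE 'Unknown Fare'          -- catches NULL fares
  END AS with_else
FROM demo
ORDER BY id
""").fetchall()

for row in labels:
    print(row)
```

The last row comes back as `(None, None, 'Unknown Fare')`, so an `ELSE` branch is one way to keep missing fares visible as their own group instead of filtering them out.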


# NOT with Combined Conditions
Find passengers who were NOT female and NOT from 1st class

In [157]:
pd.read_sql("""
SELECT sex, class, age
FROM titanic
WHERE NOT (sex = 'female' OR class = 'First')
""", conn)


Unnamed: 0,sex,class,age
0,male,Third,22.0
1,male,Third,35.0
2,male,Third,
3,male,Third,2.0
4,male,Third,20.0
...,...,...,...
450,male,Third,33.0
451,male,Second,28.0
452,male,Third,25.0
453,male,Second,27.0
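
By De Morgan's law, `NOT (A OR B)` is equivalent to `NOT A AND NOT B`. The sketch below checks this on a small made-up table `demo` (not the Titanic data), comparing the parenthesised form with the expanded one.

```python
import sqlite3

# Hypothetical sample rows covering each sex/class combination of interest.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (id INTEGER, sex TEXT, class TEXT)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [
        (1, "male", "Third"),
        (2, "female", "Third"),
        (3, "male", "First"),
        (4, "female", "First"),
        (5, "male", "Second"),
    ],
)

# Negation applied to the whole OR expression.
negated_or = demo_conn.execute(
    "SELECT id FROM demo WHERE NOT (sex = 'female' OR class = 'First') ORDER BY id"
).fetchall()

# Expanded form: each condition negated and joined with AND.
expanded_and = demo_conn.execute(
    "SELECT id FROM demo WHERE sex <> 'female' AND class <> 'First' ORDER BY id"
).fetchall()

print(negated_or)    # [(1,), (5,)]
print(expanded_and)  # [(1,), (5,)]
```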


# Complex IN Filter with AND
Find passengers who embarked at Cherbourg or Southampton AND are in 2nd or 3rd class

In [158]:
pd.read_sql("""
SELECT embark_town, class, fare
FROM titanic
WHERE embark_town IN ('Cherbourg', 'Southampton')
  AND class IN ('Second', 'Third')
""", conn)


Unnamed: 0,embark_town,class,fare
0,Southampton,Third,7.2500
1,Southampton,Third,7.9250
2,Southampton,Third,8.0500
3,Southampton,Third,21.0750
4,Southampton,Third,11.1333
...,...,...,...
595,Southampton,Third,10.5167
596,Southampton,Second,10.5000
597,Southampton,Third,7.0500
598,Southampton,Second,13.0000


# Combined IS NULL + IN
Passengers from Cherbourg or Queenstown who are missing age

In [159]:
pd.read_sql("""
SELECT who, age, embark_town
FROM titanic
WHERE age IS NULL AND embark_town IN ('Cherbourg', 'Queenstown')
""", conn)


Unnamed: 0,who,age,embark_town
0,man,,Queenstown
1,woman,,Cherbourg
2,man,,Cherbourg
3,woman,,Queenstown
4,woman,,Cherbourg
...,...,...,...
82,man,,Queenstown
83,man,,Cherbourg
84,man,,Cherbourg
85,woman,,Cherbourg


# Range + Survival Check (BETWEEN, AND)
Passengers aged 30 to 40 (both ends included) who survived

In [160]:
pd.read_sql("""
SELECT age, survived, class
FROM titanic
WHERE age BETWEEN 30 AND 40 AND survived = 1
""", conn)


Unnamed: 0,age,survived,class
0,38.0,1,First
1,35.0,1,First
2,34.0,1,Second
3,38.0,1,Third
4,38.0,1,First
...,...,...,...
74,31.0,1,Second
75,33.0,1,First
76,39.0,1,First
77,32.0,1,Third
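
`BETWEEN` includes both endpoints, so `age BETWEEN 30 AND 40` behaves like `age >= 30 AND age <= 40`. The sketch below confirms this on a made-up list of ages in a hypothetical table `demo`.

```python
import sqlite3

# Hypothetical ages chosen to sit on and around the range boundaries.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (age REAL)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?)",
    [(29.0,), (30.0,), (35.0,), (40.0,), (41.0,)],
)

# BETWEEN keeps the boundary values 30 and 40.
between_rows = demo_conn.execute(
    "SELECT age FROM demo WHERE age BETWEEN 30 AND 40 ORDER BY age"
).fetchall()

# The same range written with explicit comparison operators.
compare_rows = demo_conn.execute(
    "SELECT age FROM demo WHERE age >= 30 AND age <= 40 ORDER BY age"
).fetchall()

print(between_rows)  # [(30.0,), (35.0,), (40.0,)]
print(compare_rows)  # [(30.0,), (35.0,), (40.0,)]
```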


# Use of NOT IN on Categorical Data
Passengers NOT in 1st or 2nd class

In [161]:
pd.read_sql("""
SELECT class, age, fare
FROM titanic
WHERE class NOT IN ('First', 'Second')
""", conn)


Unnamed: 0,class,age,fare
0,Third,22.0,7.2500
1,Third,26.0,7.9250
2,Third,35.0,8.0500
3,Third,,8.4583
4,Third,2.0,21.0750
...,...,...,...
486,Third,22.0,10.5167
487,Third,25.0,7.0500
488,Third,39.0,29.1250
489,Third,,23.4500


# Low Fare OR Missing Fare
Who paid very little (under 10) OR has a missing fare?

In [162]:
pd.read_sql("""
SELECT who, fare, survived
FROM titanic
WHERE fare < 10 OR fare IS NULL
""", conn)


Unnamed: 0,who,fare,survived
0,man,7.2500,0
1,woman,7.9250,1
2,man,8.0500,0
3,man,8.4583,0
4,man,8.0500,0
...,...,...,...
331,man,7.8958,0
332,man,7.8958,0
333,man,7.8958,0
334,man,7.0500,0
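
When an `OR` condition like this is combined with `AND`, parentheses matter because `AND` binds more tightly than `OR`. The sketch below, using a made-up table `demo`, compares the two readings of "survivors with a low or missing fare".

```python
import sqlite3

# Hypothetical sample rows; not the real Titanic data.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (id INTEGER, fare REAL, survived INTEGER)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [(1, 7.0, 1), (2, 7.0, 0), (3, None, 0), (4, None, 1), (5, 50.0, 1)],
)

# Without parentheses this reads as (survived = 1 AND fare < 10) OR fare IS NULL,
# so the non-survivor with a missing fare (id 3) slips through.
no_parens = demo_conn.execute(
    "SELECT id FROM demo WHERE survived = 1 AND fare < 10 OR fare IS NULL ORDER BY id"
).fetchall()

# Parentheses apply the survival filter to both fare conditions.
with_parens = demo_conn.execute(
    "SELECT id FROM demo WHERE survived = 1 AND (fare < 10 OR fare IS NULL) ORDER BY id"
).fetchall()

print(no_parens)    # [(1,), (3,), (4,)]
print(with_parens)  # [(1,), (4,)]
```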


# LIKE with Wildcard
Passengers whose `who` value starts with "wo" (this dataset has no name column, so a title such as “Mrs” cannot be matched)

In [163]:
pd.read_sql("""
SELECT who, sex, age
FROM titanic
WHERE who LIKE 'wo%'
""", conn)


Unnamed: 0,who,sex,age
0,woman,female,38.0
1,woman,female,26.0
2,woman,female,35.0
3,woman,female,27.0
4,woman,female,58.0
...,...,...,...
266,woman,female,25.0
267,woman,female,22.0
268,woman,female,39.0
269,woman,female,19.0
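
`LIKE` supports two wildcards: `%` matches any run of characters (including none) and `_` matches exactly one character. The sketch below tries three patterns on a made-up table `demo` holding the three `who` values.

```python
import sqlite3

# Hypothetical table holding one row per 'who' category.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (id INTEGER, who TEXT)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(1, "man"), (2, "woman"), (3, "child")],
)

def match(pattern):
    """Return the 'who' values matching a LIKE pattern, in insertion order."""
    rows = demo_conn.execute(
        "SELECT who FROM demo WHERE who LIKE ? ORDER BY id", (pattern,)
    ).fetchall()
    return [r[0] for r in rows]

starts_with_wo = match("wo%")   # prefix match
ends_with_man = match("%man")   # suffix match
three_letters = match("_an")    # exactly one character, then 'an'

print(starts_with_wo)  # ['woman']
print(ends_with_man)   # ['man', 'woman']
print(three_letters)   # ['man']
```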


# Null vs Non-null Survival Stats
Compare counts: known vs unknown age

In [164]:
pd.read_sql("""
SELECT
  CASE WHEN age IS NULL THEN 'Unknown Age' ELSE 'Known Age' END AS age_group,
  COUNT(*) AS total,
  SUM(survived) AS survivors
FROM titanic
GROUP BY age_group
""", conn)


Unnamed: 0,age_group,total,survivors
0,Known Age,714,290
1,Unknown Age,177,52
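
Another way to count missing values is to compare `COUNT(*)`, which counts every row, with `COUNT(age)`, which skips NULLs. A minimal sketch on a made-up four-row table `demo`:

```python
import sqlite3

# Hypothetical sample rows; two of the four ages are missing.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (age REAL, survived INTEGER)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(22.0, 0), (None, 1), (38.0, 1), (None, 0)],
)

# COUNT(*) counts rows; COUNT(age) counts only non-NULL ages.
total, known_age, unknown_age = demo_conn.execute("""
SELECT COUNT(*), COUNT(age), COUNT(*) - COUNT(age)
FROM demo
""").fetchone()

print(total, known_age, unknown_age)  # 4 2 2
```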


# Survival Rate by Gender with IS NOT NULL filter

In [165]:
pd.read_sql("""
SELECT sex, COUNT(*) AS total, SUM(survived) AS survived,
       ROUND(AVG(survived)*100, 1) AS survival_rate
FROM titanic
WHERE age IS NOT NULL
GROUP BY sex
""", conn)


Unnamed: 0,sex,total,survived,survival_rate
0,female,261,197,75.5
1,male,453,93,20.5
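
The survival rate above relies on `AVG(survived)` returning a floating-point mean of the 0/1 flags. Computing the same rate as `SUM(survived) / COUNT(*)` is a known pitfall in SQLite, because dividing two integers truncates. The sketch below shows the difference on a made-up table `demo` with one survivor out of four.

```python
import sqlite3

# Hypothetical sample: one survivor among four passengers.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (survived INTEGER)")
demo_conn.executemany("INSERT INTO demo VALUES (?)", [(1,), (0,), (0,), (0,)])

avg_rate, int_division, float_division = demo_conn.execute("""
SELECT
  ROUND(AVG(survived) * 100, 1),        -- AVG returns a float: 0.25 -> 25.0
  SUM(survived) / COUNT(*),             -- integer division: 1 / 4 -> 0
  100.0 * SUM(survived) / COUNT(*)      -- float operand forces real division
FROM demo
""").fetchone()

print(avg_rate, int_division, float_division)  # 25.0 0 25.0
```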


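# Nested NOT + Range Filtering
Male passengers whose age is outside 15 to 50 (rows with a missing age are not returned, because the negated condition evaluates to NULL for them)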

In [166]:
pd.read_sql("""
SELECT age, sex, class
FROM titanic
WHERE sex = 'male' AND NOT (age BETWEEN 15 AND 50)
""", conn)


Unnamed: 0,age,sex,class
0,54.00,male,First
1,2.00,male,Third
2,2.00,male,Third
3,66.00,male,Second
4,7.00,male,Third
...,...,...,...
81,0.83,male,Second
82,4.00,male,Third
83,74.00,male,Third
84,51.00,male,First
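
Rows with a missing age never appear in this result, since `NOT (NULL BETWEEN 15 AND 50)` is NULL rather than true. As a sketch on a made-up table `demo`, the snippet below shows the default behaviour and one way to include the unknown ages explicitly.

```python
import sqlite3

# Hypothetical ages: one below the range, one inside, one above, one missing.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo (id INTEGER, age REAL)")
demo_conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(1, 10.0), (2, 30.0), (3, 60.0), (4, None)],
)

# The NULL age is dropped: the negated BETWEEN is NULL for that row.
outside_range = demo_conn.execute(
    "SELECT age FROM demo WHERE NOT (age BETWEEN 15 AND 50) ORDER BY id"
).fetchall()

# An explicit IS NULL test keeps passengers of unknown age in the result.
outside_or_unknown = demo_conn.execute(
    "SELECT age FROM demo WHERE age IS NULL OR NOT (age BETWEEN 15 AND 50) ORDER BY id"
).fetchall()

print(outside_range)       # [(10.0,), (60.0,)]
print(outside_or_unknown)  # [(10.0,), (60.0,), (None,)]
```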
