<a href="https://colab.research.google.com/github/sethkipsangmutuba/SQL/blob/main/2a_Data_Analysis_with_Numeric_Functions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Titanic Dataset — Initial Data Analysis with Numeric Functions

In this notebook, we use SQL numeric aggregate functions (`COUNT`, `MIN`, `MAX`, `AVG`) to compute basic summary statistics for the Titanic dataset in a Colab notebook.

---

## Setup in Google Colab


In [16]:
# Load required libraries
import pandas as pd
import sqlite3
import seaborn as sns

# Load Titanic dataset from seaborn
df = sns.load_dataset("titanic")

# Create an in-memory SQLite database and load the DataFrame into it.
# to_sql returns the number of rows written (891 here).
conn = sqlite3.connect(":memory:")
df.to_sql("titanic", conn, index=False, if_exists="replace")


891
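
Before querying, it can help to confirm which columns actually landed in the SQLite table. The sketch below is a minimal illustration using only the standard-library `sqlite3` module; `demo_conn` and its three-row table are hypothetical stand-ins, not the real Titanic data. In the notebook, the same `PRAGMA` statement could be run against `conn` instead.

```python
import sqlite3

# Hypothetical stand-in for the notebook's connection (toy rows, not the real dataset)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE titanic (pclass INTEGER, age REAL, fare REAL)")
demo_conn.executemany(
    "INSERT INTO titanic VALUES (?, ?, ?)",
    [(1, 38.0, 71.28), (3, 22.0, 7.25), (3, None, 8.05)],
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in demo_conn.execute("PRAGMA table_info(titanic)")]
print(columns)
```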

---

## Learning Objectives

By the end of this notebook, you should be able to:

- Perform basic SQL analysis using aggregate functions (`COUNT`, `MIN`, `MAX`, `AVG`)
- Extract summary information such as the total row count, the range of a numeric column, and the number of distinct values
- Apply the same queries to other datasets

---

## Exercise: SQL Analysis on Titanic Dataset

**1. What is the total number of entries in the dataset?**


In [17]:
query = """
SELECT COUNT(*) AS number_of_observations
FROM titanic;
"""
pd.read_sql_query(query, conn)


| | number_of_observations |
|---|---|
| 0 | 891 |


**2. What are the minimum and maximum ages in the dataset?**


In [18]:
query = """
SELECT
    MIN(age) AS min_age,
    MAX(age) AS max_age
FROM titanic;
"""
pd.read_sql_query(query, conn)


| | min_age | max_age |
|---|---|---|
| 0 | 0.42 | 80.0 |
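
In SQL, `MIN`, `MAX`, and `COUNT(column)` skip `NULL` values, while `COUNT(*)` counts every row. That matters for a column like `age` if it has missing entries. The sketch below shows the difference on a hypothetical four-row table (`passengers` and `demo_conn` are illustrative names, not part of the notebook).

```python
import sqlite3

# Toy table with one missing age (hypothetical data)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE passengers (age REAL)")
demo_conn.executemany(
    "INSERT INTO passengers VALUES (?)",
    [(22.0,), (None,), (0.42,), (80.0,)],
)

row = demo_conn.execute(
    """
    SELECT
        COUNT(*)   AS total_rows,   -- counts every row, including NULL ages
        COUNT(age) AS known_ages,   -- counts only non-NULL ages
        MIN(age)   AS min_age,      -- NULLs are ignored
        MAX(age)   AS max_age
    FROM passengers;
    """
).fetchone()
print(row)  # (4, 3, 0.42, 80.0)
```

Comparing `COUNT(*)` with `COUNT(age)` on the real table is a quick way to see how many ages are missing before interpreting the minimum and maximum.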


**3. How many unique passenger classes (`pclass`) are there?**


In [19]:
query = """
SELECT COUNT(DISTINCT pclass) AS number_of_classes
FROM titanic;
"""
pd.read_sql_query(query, conn)


| | number_of_classes |
|---|---|
| 0 | 3 |


**4. What is the average fare paid by passengers?**


In [20]:
query = """
SELECT AVG(fare) AS average_fare
FROM titanic;
"""
pd.read_sql_query(query, conn)


| | average_fare |
|---|---|
| 0 | 32.204208 |
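
An average of currency values is usually easier to read when rounded. One option, sketched here on hypothetical fares rather than the real data, is to wrap `AVG` in SQLite's `ROUND` function; `demo_conn` and the `fares` table are illustrative names only.

```python
import sqlite3

# Toy fares (hypothetical values)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE fares (fare REAL)")
demo_conn.executemany("INSERT INTO fares VALUES (?)", [(10.0,), (20.0,), (25.0,)])

# ROUND(x, 2) keeps two decimal places; AVG of the three fares is 18.333...
(rounded_avg,) = demo_conn.execute(
    "SELECT ROUND(AVG(fare), 2) AS average_fare FROM fares;"
).fetchone()
print(rounded_avg)
```

Rounding in the query changes the stored result, so it is best kept for presentation and left out when the average feeds further calculations.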


---

## Combine All Metrics in One Query

Write a single SQL query that returns the following metrics:

- Total number of entries
- Minimum age
- Maximum age
- Number of unique passenger classes (`pclass`)
- Average fare

Use SQL aggregate functions to compute all metrics in a single result row.


In [21]:
query = """
SELECT
    COUNT(*) AS number_of_observations,
    MIN(age) AS min_age,
    MAX(age) AS max_age,
    COUNT(DISTINCT pclass) AS number_of_classes,
    AVG(fare) AS average_fare
FROM titanic;
"""
pd.read_sql_query(query, conn)


| | number_of_observations | min_age | max_age | number_of_classes | average_fare |
|---|---|---|---|---|---|
| 0 | 891 | 0.42 | 80.0 | 3 | 32.204208 |
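
The same pattern carries over to other tables. As a rough sketch, the hypothetical helper below (`summarize_column` is not part of the notebook) builds the combined query for any numeric column. Table and column names cannot be bound as query parameters, so the helper checks them against the database schema before interpolating them; it assumes names that do not themselves contain double quotes.

```python
import sqlite3

def summarize_column(conn, table, column):
    """Return (row_count, non_null_count, min, max, avg) for one column."""
    # Validate identifiers against the schema, since they cannot be bound with "?"
    tables = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table}")
    columns = {r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')}
    if column not in columns:
        raise ValueError(f"unknown column: {column}")

    query = (
        f'SELECT COUNT(*), COUNT("{column}"), MIN("{column}"), '
        f'MAX("{column}"), AVG("{column}") FROM "{table}"'
    )
    return conn.execute(query).fetchone()

# Usage on a toy table (hypothetical data)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE trips (distance REAL)")
demo_conn.executemany("INSERT INTO trips VALUES (?)", [(1.0,), (2.0,), (None,), (6.0,)])

result = summarize_column(demo_conn, "trips", "distance")
print(result)  # (4, 3, 1.0, 6.0, 3.0)
```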
