### **Basic SQL**

<font color="red">File access required:</font> In Colab this notebook requires first uploading files **Cities.csv**, **Countries.csv**, **Players.csv**, and **Teams.csv** using the *Files* feature in the left toolbar. If running the notebook on a local computer, simply ensure these files are in the same workspace as the notebook.

In [1]:
!pip install prettytable==0.7.2
!pip install ipython-sql


In [2]:
# Set-up
%load_ext sql
%sql sqlite://
import pandas as pd

In [4]:
# Create database tables from CSV files
with open('Cities.csv') as f: Cities = pd.read_csv(f, index_col=0)
%sql drop table if exists Cities;
%sql --persist Cities


with open('Countries.csv') as f: Countries = pd.read_csv(f, index_col=0)
%sql drop table if exists Countries;
%sql --persist Countries

 * sqlite://
Done.
 * sqlite://
 * sqlite://
Done.
 * sqlite://


'Persisted countries'

#### Look at sample of Cities and Countries tables

In [5]:
%%sql
select * from Cities limit 5

 * sqlite://
Done.


city,country,latitude,longitude,temperature
Aalborg,Denmark,57.03,9.92,7.52
Aberdeen,United Kingdom,57.17,-2.08,8.1
Abisko,Sweden,63.35,18.83,0.2
Adana,Turkey,36.99,35.32,18.67
Albacete,Spain,39.0,-1.87,12.62


In [6]:
%%sql
select * from Countries limit 5

 * sqlite://
Done.


country,population,EU,coastline
Albania,2.9,no,yes
Andorra,0.07,no,no
Austria,8.57,yes,no
Belarus,9.48,no,no
Belgium,11.37,yes,yes


### Basic Select statement
Select columns  
From tables  
Where condition  

*Find all countries not in the EU*

In [8]:
%%sql
select country
from Countries
where EU = 'no'

 * sqlite://
Done.


country
Albania
Andorra
Belarus
Bosnia and Herzegovina
Iceland
Kosovo
Liechtenstein
Macedonia
Moldova
Montenegro


*Find all cities with temperature between -5 and 5; return city, country, and temperature*

In [10]:
%%sql
select city, country, temperature
from Cities
where temperature > -5 and temperature < 5

 * sqlite://
Done.


city,country,temperature
Abisko,Sweden,0.2
Augsburg,Germany,4.54
Bergen,Norway,1.75
Bodo,Norway,4.5
Helsinki,Finland,4.19
Innsbruck,Austria,4.54
Kiruna,Sweden,-2.2
Orsha,Belarus,4.93
Oslo,Norway,2.32
Oulu,Finland,1.45


### Ordering

*Modify previous query to sort by temperature*

In [11]:
%%sql
select city, country, temperature
from Cities
where temperature > -5 and temperature < 5
order by temperature

 * sqlite://
Done.


city,country,temperature
Kiruna,Sweden,-2.2
Abisko,Sweden,0.2
Oulu,Finland,1.45
Bergen,Norway,1.75
Oslo,Norway,2.32
Tampere,Finland,3.59
Uppsala,Sweden,4.17
Helsinki,Finland,4.19
Tartu,Estonia,4.36
Bodo,Norway,4.5


*Modify previous query to sort by country, then temperature descending*

In [12]:
%%sql
select city, country, temperature
from Cities
where temperature > -5 and temperature < 5
order by country ASC, temperature DESC

 * sqlite://
Done.


city,country,temperature
Salzburg,Austria,4.62
Innsbruck,Austria,4.54
Orsha,Belarus,4.93
Tallinn,Estonia,4.82
Tartu,Estonia,4.36
Turku,Finland,4.72
Helsinki,Finland,4.19
Tampere,Finland,3.59
Oulu,Finland,1.45
Augsburg,Germany,4.54


### <font color = 'green'>**Your Turn**</font>

*Find all countries with no coastline and with population > 9. Return the country and population, in descending order of population.*

In [1]:
import pandas as pd
import sqlite3

# Load the dataframe
df_countries = pd.read_csv('Countries.csv')

# Create a connection to an in-memory SQLite database
conn = sqlite3.connect(':memory:')

# Write the dataframe to the database
df_countries.to_sql('Countries', conn, index=False, if_exists='replace')

# Query the database
query = """
SELECT country, population
FROM Countries
WHERE coastline = 'no' AND population > 9
ORDER BY population DESC
"""

result = pd.read_sql_query(query, conn)

# Display the result
print(result.to_markdown(index=False, floatfmt=".2f"))

# Close the connection
conn.close()

| country        |   population |
|:---------------|-------------:|
| Czech Republic |        10.55 |
| Hungary        |         9.82 |
| Belarus        |         9.48 |


### Multiple tables in From clause - Joins

*Find all cities with longitude < 10 not in the EU, return city and longitude*

In [16]:
Cities.head(2)  # Python (pandas) command: Cities is also available as a DataFrame

city,country,latitude,longitude,temperature
Aalborg,Denmark,57.03,9.92,7.52
Aberdeen,United Kingdom,57.17,-2.08,8.1


In [17]:
Countries.head(2)

country,population,EU,coastline
Albania,2.9,no,yes
Andorra,0.07,no,no


In [19]:
%%sql
select city, longitude
from Cities, Countries -- two tables in the From clause
where Cities.country = Countries.country -- join condition: match each city to its country
and longitude < 10 and EU = 'no' -- filter conditions

-- In SQL, "--" starts a comment

 * sqlite://
Done.


city,longitude
Andorra,1.52
Basel,7.59
Bergen,5.32
Geneva,6.14
Stavanger,5.68
Zurich,8.56


*Modify previous query to also return country (error then fix)*

In [22]:
%%sql
select city, longitude, Cities.country
from Cities, Countries
where Cities.country = Countries.country -- both tables have a country column, so qualify it with a table name wherever it is referenced
and longitude < 10 and EU = 'no'

 * sqlite://
Done.


city,longitude,country
Andorra,1.52,Andorra
Basel,7.59,Switzerland
Bergen,5.32,Norway
Geneva,6.14,Switzerland
Stavanger,5.68,Norway
Zurich,8.56,Switzerland
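
The "error" half of this exercise can be reproduced outside the notebook. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module instead of the `%%sql` magic, and a tiny in-memory database whose rows (`Atown`, `Btown`, `Xland`, `Yland`) are hypothetical placeholders, not data from the CSV files. Selecting an unqualified `country` fails because both tables have a column with that name; writing `Cities.country` removes the ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Cities (city text, country text, latitude real, longitude real, temperature real);
    create table Countries (country text, population real, EU text, coastline text);
    -- Hypothetical placeholder rows, not taken from Cities.csv / Countries.csv
    insert into Cities values ('Atown', 'Xland', 46.0, 7.5, 9.0);
    insert into Cities values ('Btown', 'Yland', 52.0, 21.0, 8.0);
    insert into Countries values ('Xland', 8.0, 'no', 'no');
    insert into Countries values ('Yland', 38.0, 'yes', 'yes');
""")

# Shared From/Where clauses; only the Select clause differs between the two attempts
join_conditions = """
    from Cities, Countries
    where Cities.country = Countries.country
    and longitude < 10 and EU = 'no'
"""

# Unqualified column: SQLite cannot tell which table's country column is meant
try:
    conn.execute("select city, longitude, country" + join_conditions)
    error_message = None
except sqlite3.OperationalError as e:
    error_message = str(e)
print(error_message)

# Qualified column: the query runs
fixed_rows = conn.execute(
    "select city, longitude, Cities.country" + join_conditions
).fetchall()
print(fixed_rows)

conn.close()
```

Either `Cities.country` or `Countries.country` works here, since the join condition makes the two values equal in every result row.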


*Find all cities with latitude < 50 in a country with population < 5; return city, country, and population, sorted by country*

In [24]:
%%sql
select city, Cities.country, population
from Cities, Countries
where Cities.country = Countries.country
and latitude < 50 and population < 5
order by Cities.country

 * sqlite://
Done.


city,country,population
Elbasan,Albania,2.9
Andorra,Andorra,0.07
Sarajevo,Bosnia and Herzegovina,3.8
Rijeka,Croatia,4.23
Split,Croatia,4.23
Skopje,Macedonia,2.08
Balti,Moldova,4.06
Chisinau,Moldova,4.06
Podgorica,Montenegro,0.63
Ljubljana,Slovenia,2.07


#### Inner Join -- just FYI

*Same query as above*

In [None]:
%%sql
select city, Cities.country, population
from Cities inner join Countries
     on Cities.country = Countries.country -- condition of the INNER JOIN.
where latitude < 50 and population < 5
order by Cities.country

### Select *

*Modify previous queries to return all columns*
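
As a rough sketch of what `select *` does on a join, the example below runs the latitude/population query with `*` in place of the column list. It assumes Python's built-in `sqlite3` module and two hypothetical placeholder rows per table (`Atown`, `Btown`, `Xland`, `Yland` are made up, not from the CSV files). The result contains every column of both tables, so `country` appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Cities (city text, country text, latitude real, longitude real, temperature real);
    create table Countries (country text, population real, EU text, coastline text);
    -- Hypothetical placeholder rows, not taken from Cities.csv / Countries.csv
    insert into Cities values ('Atown', 'Xland', 46.0, 7.5, 9.0);
    insert into Cities values ('Btown', 'Yland', 52.0, 21.0, 8.0);
    insert into Countries values ('Xland', 4.0, 'no', 'no');
    insert into Countries values ('Yland', 38.0, 'yes', 'yes');
""")

cur = conn.execute("""
    select *
    from Cities, Countries
    where Cities.country = Countries.country
    and latitude < 50 and population < 5
    order by Cities.country
""")

# cursor.description holds one entry per result column; item 0 is the column name
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()
print(column_names)
print(rows)

conn.close()
```

To return the columns of only one table, `select Cities.*` can be used instead of `select *`.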

### <font color = 'green'>**Your Turn**</font>

*Find all cities with latitude > 45 in a country with no coastline and with population > 9. Return the city, country, latitude, and whether it's in the EU.*

In [90]:
import pandas as pd
import sqlite3

# Load data and connect to the database
conn = sqlite3.connect(':memory:')
pd.read_csv('Cities.csv').to_sql('Cities', conn, if_exists='replace', index=False)
pd.read_csv('Countries.csv').to_sql('Countries', conn, if_exists='replace', index=False)

# Execute the SQL query
sql_query = """
SELECT
  C.city,
  C.country,
  C.latitude,
  T.EU
FROM Cities AS C
JOIN Countries AS T
  ON C.country = T.country
WHERE
  C.latitude > 45 AND T.coastline = 'no' AND T.population > 9;
"""

result_df = pd.read_sql_query(sql_query, conn)

# Display the result
print(result_df)

        city         country  latitude   EU
0      Brest         Belarus     52.10   no
1     Hrodna         Belarus     53.68   no
2      Mazyr         Belarus     52.05   no
3      Minsk         Belarus     53.90   no
4      Orsha         Belarus     54.52   no
5      Pinsk         Belarus     52.13   no
6       Brno  Czech Republic     49.20  yes
7    Ostrava  Czech Republic     49.83  yes
8     Prague  Czech Republic     50.08  yes
9   Budapest         Hungary     47.50  yes
10  Debrecen         Hungary     47.53  yes
11      Gyor         Hungary     47.70  yes
12    Szeged         Hungary     46.25  yes


### Aggregation and Grouping

*Find the average temperature for all cities*

In [26]:
%%sql
select avg(temperature) as avgTemp
from Cities

 * sqlite://
Done.


avgTemp
9.497840375586858


*Modify previous query to find average temperature of cities with latitude > 55*

In [27]:
%%sql
select avg(temperature)
from Cities
where latitude > 55

 * sqlite://
Done.


avg(temperature)
4.985185185185185


*Modify previous query to also find minimum and maximum temperature of cities with latitude > 55*

In [30]:
%%sql
select min(temperature) as Min_val, max(temperature) as Max_val
from Cities
where latitude > 55

 * sqlite://
Done.


Min_val,Max_val
-2.2,8.6


*Modify previous query to return number of cities with latitude > 55*

*Rename result column as northerns*
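
One possible way to write this query is `count(*)` with a column alias. The sketch below assumes Python's built-in `sqlite3` module and loads only the five sample rows of Cities shown near the top of this notebook, so the count it prints applies to that sample and not to the full **Cities.csv** file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Cities (city text, country text, latitude real, longitude real, temperature real)"
)

# Only the five sample rows displayed earlier in the notebook, not the full file
sample_rows = [
    ("Aalborg", "Denmark", 57.03, 9.92, 7.52),
    ("Aberdeen", "United Kingdom", 57.17, -2.08, 8.1),
    ("Abisko", "Sweden", 63.35, 18.83, 0.2),
    ("Adana", "Turkey", 36.99, 35.32, 18.67),
    ("Albacete", "Spain", 39.0, -1.87, 12.62),
]
conn.executemany("insert into Cities values (?, ?, ?, ?, ?)", sample_rows)

# count(*) counts the rows that satisfy the Where clause; "as" renames the result column
cur = conn.execute("""
    select count(*) as northerns
    from Cities
    where latitude > 55
""")
column_name = cur.description[0][0]
northerns = cur.fetchone()[0]
print(column_name, northerns)

conn.close()
```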



In [33]:
Cities.head(1)

city,country,latitude,longitude,temperature
Aalborg,Denmark,57.03,9.92,7.52


In [35]:
Countries.head(1)

country,population,EU,coastline
Albania,2.9,no,yes


*Find the minimum and maximum temperature of cities in the EU (then not in the EU)*

In [32]:
%%sql
select min(temperature), max(temperature)
from Cities, Countries
where Cities.country = Countries.country
and EU = 'no'

 * sqlite://
Done.


min(temperature),max(temperature)
1.75,18.67


### <font color = 'green'>**Your Turn**</font>

*Find the number of cities with latitude > 45 in countries with no coastline and with population > 9; also return the minimum and maximum latitude among those cities*

In [91]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

# Load dataframes (assuming Cities.csv and Countries.csv are available)
try:
    cities_df = pd.read_csv('Cities.csv')
    countries_df = pd.read_csv('Countries.csv')
except FileNotFoundError as e:
    print(f"Error: Required file not found: {e}. Cannot proceed with the answer.")
    raise

def execute_query(sql_query):
    """Executes a given SQL query against an in-memory SQLite database containing Cities and Countries tables."""
    conn = sqlite3.connect(':memory:')
    # Load tables into the database
    cities_df.to_sql('Cities', conn, if_exists='replace', index=False)
    countries_df.to_sql('Countries', conn, if_exists='replace', index=False)
    
    # Execute the query
    result = pd.read_sql_query(sql_query, conn)
    conn.close()
    return result

# The new query to find the count, minimum latitude, and maximum latitude
sql_agg_query = """
SELECT
  COUNT(C.city) AS number_of_cities,
  MIN(C.latitude) AS min_latitude,
  MAX(C.latitude) AS max_latitude
FROM Cities AS C
JOIN Countries AS T
  ON C.country = T.country
WHERE
  C.latitude > 45 AND T.coastline = 'no' AND T.population > 9;
"""
result_agg = execute_query(sql_agg_query)
print(result_agg.to_markdown(index=False))

|   number_of_cities |   min_latitude |   max_latitude |
|-------------------:|---------------:|---------------:|
|                 13 |          46.25 |          54.52 |


*Find the average temperature for each country*

In [None]:
%%sql
select country, avg(temperature)
from Cities
group by country

*Modify previous query to sort by descending average temperature*

*Modify previous query to show countries only*
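
The two modifications above could look like the following sketch. It assumes Python's built-in `sqlite3` module and a reduced three-column Cities table holding only a handful of rows whose temperatures appear in earlier outputs of this notebook, so the averages are for this subset and not for the full file. The point is that an aggregate can be used in the Order By clause even when it is not returned in the Select clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced schema for illustration; the real table also has latitude and longitude
conn.execute("create table Cities (city text, country text, temperature real)")
subset = [
    ("Kiruna", "Sweden", -2.2),
    ("Abisko", "Sweden", 0.2),
    ("Oulu", "Finland", 1.45),
    ("Helsinki", "Finland", 4.19),
    ("Bergen", "Norway", 1.75),
    ("Oslo", "Norway", 2.32),
]
conn.executemany("insert into Cities values (?, ?, ?)", subset)

# Sort by descending average temperature, using the alias in Order By
sorted_rows = conn.execute("""
    select country, avg(temperature) as avgTemp
    from Cities
    group by country
    order by avgTemp desc
""").fetchall()
print(sorted_rows)

# Show countries only: the aggregate moves from Select into Order By
countries_only = conn.execute("""
    select country
    from Cities
    group by country
    order by avg(temperature) desc
""").fetchall()
print(countries_only)

conn.close()
```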

*Find the average temperature for cities in countries with and without coastline*

In [None]:
%%sql
select coastline, avg(temperature)
from Cities, Countries
where Cities.country = Countries.country
group by coastline

*Modify previous query to find the average temperature for cities in the EU and not in the EU, then all combinations of coastline and EU*

*Modify previous query to only include cities with latitude < 50, then latitude < 40*
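
Grouping by more than one column gives one result row per combination of values that occurs. The sketch below illustrates this with `group by EU, coastline` plus a latitude filter. It assumes Python's built-in `sqlite3` module; every row is a hypothetical placeholder (`A1`, `Xland`, and so on are made up), and the helper name `avg_temp_by_eu_and_coastline` is introduced here only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Cities (city text, country text, latitude real, longitude real, temperature real);
    create table Countries (country text, population real, EU text, coastline text);
    -- Hypothetical placeholder rows, not taken from the CSV files
    insert into Cities values ('A1', 'Xland', 48.0, 10.0, 10.0);
    insert into Cities values ('A2', 'Xland', 52.0, 11.0, 6.0);
    insert into Cities values ('B1', 'Yland', 41.0, 2.0, 16.0);
    insert into Cities values ('B2', 'Yland', 45.0, 3.0, 12.0);
    insert into Cities values ('C1', 'Zland', 60.0, 20.0, 2.0);
    insert into Countries values ('Xland', 8.0, 'yes', 'no');
    insert into Countries values ('Yland', 40.0, 'yes', 'yes');
    insert into Countries values ('Zland', 5.0, 'no', 'yes');
""")

def avg_temp_by_eu_and_coastline(max_latitude):
    """Average temperature per (EU, coastline) combination for cities below max_latitude."""
    query = """
        select EU, coastline, avg(temperature) as avgTemp
        from Cities, Countries
        where Cities.country = Countries.country
        and latitude < ?
        group by EU, coastline
        order by EU, coastline
    """
    return conn.execute(query, (max_latitude,)).fetchall()

all_cities = avg_temp_by_eu_and_coastline(90)  # 90 keeps every city
below_50 = avg_temp_by_eu_and_coastline(50)    # only cities with latitude < 50
print(all_cities)
print(below_50)

conn.close()
```

Combinations with no matching cities simply do not appear: in this placeholder data, no city below latitude 50 lies outside the EU, so that group is absent from the second result.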

### <font color = 'green'>**Your Turn**</font>

*For each country in the EU, find the latitude of the northernmost city in the country, i.e., the maximum latitude. Return the country and its maximum latitude, in descending order of maximum latitude.*

In [92]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

# Load dataframes (assuming Cities.csv and Countries.csv are available)
try:
    cities_df = pd.read_csv('Cities.csv')
    countries_df = pd.read_csv('Countries.csv')
except FileNotFoundError as e:
    print(f"Error: Required file not found: {e}. Cannot proceed with the answer.")
    raise

def execute_query(sql_query):
    """Executes a given SQL query against an in-memory SQLite database containing Cities and Countries tables."""
    conn = sqlite3.connect(':memory:')
    # Load tables into the database
    cities_df.to_sql('Cities', conn, if_exists='replace', index=False)
    countries_df.to_sql('Countries', conn, if_exists='replace', index=False)
    
    # Execute the query
    result = pd.read_sql_query(sql_query, conn)
    conn.close()
    return result

# The new query to find the maximum latitude for each EU country, ordered descending
sql_northernmost_eu = """
SELECT
  T.country,
  MAX(C.latitude) AS max_latitude
FROM Cities AS C
JOIN Countries AS T
  ON C.country = T.country
WHERE
  T.EU = 'yes'
GROUP BY
  T.country
ORDER BY
  max_latitude DESC;
"""
result_northernmost_eu = execute_query(sql_northernmost_eu)
print(result_northernmost_eu.to_markdown(index=False))

| country        |   max_latitude |
|:---------------|---------------:|
| Sweden         |          67.85 |
| Finland        |          65    |
| Estonia        |          59.43 |
| United Kingdom |          57.47 |
| Denmark        |          57.03 |
| Latvia         |          56.95 |
| Lithuania      |          55.72 |
| Poland         |          54.2  |
| Germany        |          54.07 |
| Ireland        |          53.33 |
| Netherlands    |          53.22 |
| Belgium        |          51.22 |
| France         |          50.65 |
| Czech Republic |          50.08 |
| Slovakia       |          48.73 |
| Austria        |          48.32 |
| Romania        |          47.75 |
| Hungary        |          47.7  |
| Slovenia       |          46.06 |
| Italy          |          45.7  |
| Croatia        |          45.33 |
| Bulgaria       |          43.85 |
| Spain          |          43.38 |
| Portugal       |          41.55 |
| Greece         |          39.56 |


#### A Bug in SQLite - just FYI

In [None]:
%%sql
select country, avg(temperature)
from Cities
group by country

*Modify previous query - add city to Select clause*

*Now focus on Austria and Sweden*

In [None]:
%%sql
select *
from Cities
where country = 'Austria' or country = 'Sweden'
order by country

In [None]:
%%sql
select country, city, avg(temperature)
from Cities
where country = 'Austria' or country = 'Sweden'
group by country

*Modify previous query to min(temperature), max(temperature), then together in both orders*
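
The behavior this section calls a bug is what the SQLite documentation describes as "bare columns": a column such as `city` that is neither aggregated nor listed in Group By. Standard SQL and many other database systems reject such a query, while SQLite accepts it. With `avg` the city shown comes from an arbitrary row of each group, but when the query has a single `min` or `max` aggregate, SQLite takes the bare column from the row that holds that minimum or maximum. The sketch below assumes Python's built-in `sqlite3` module and a reduced three-column table containing only a few Austria and Sweden rows whose temperatures appear in earlier outputs of this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced schema and a subset of rows, for illustration only
conn.execute("create table Cities (city text, country text, temperature real)")
subset = [
    ("Innsbruck", "Austria", 4.54),
    ("Salzburg", "Austria", 4.62),
    ("Kiruna", "Sweden", -2.2),
    ("Abisko", "Sweden", 0.2),
    ("Uppsala", "Sweden", 4.17),
]
conn.executemany("insert into Cities values (?, ?, ?)", subset)

# With min(): the bare column city comes from the coldest row in each group
coldest = conn.execute("""
    select country, city, min(temperature)
    from Cities
    group by country
    order by country
""").fetchall()
print(coldest)

# With max(): the bare column city comes from the warmest row in each group
warmest = conn.execute("""
    select country, city, max(temperature)
    from Cities
    group by country
    order by country
""").fetchall()
print(warmest)

conn.close()
```

When both `min` and `max` appear in the same query, the city can only come from one row, so it cannot match both aggregates; that is why the exercise above asks for both orders.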

### The Limit clause

*Return any three countries with population > 20*

In [None]:
%%sql
select country
from Countries
where population > 20
limit 3

*Find the ten coldest cities*

In [None]:
%%sql
select city, temperature
from Cities
order by temperature
limit 10

### <font color = 'green'>**Your Turn**</font>

*Find the five easternmost (greatest longitude) cities in countries with no coastline. Return the city and country names.*

In [93]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

# Load dataframes (assuming Cities.csv and Countries.csv are available)
try:
    cities_df = pd.read_csv('Cities.csv')
    countries_df = pd.read_csv('Countries.csv')
except FileNotFoundError as e:
    print(f"Error: Required file not found: {e}. Cannot proceed with the answer.")
    raise

def execute_query(sql_query):
    """Executes a given SQL query against an in-memory SQLite database containing Cities and Countries tables."""
    conn = sqlite3.connect(':memory:')
    # Load tables into the database
    cities_df.to_sql('Cities', conn, if_exists='replace', index=False)
    countries_df.to_sql('Countries', conn, if_exists='replace', index=False)
    
    # Execute the query
    result = pd.read_sql_query(sql_query, conn)
    conn.close()
    return result

# The query to find the five easternmost (greatest longitude) cities in countries with no coastline.
sql_easternmost = """
SELECT
  C.city,
  C.country
FROM Cities AS C
JOIN Countries AS T
  ON C.country = T.country
WHERE
  T.coastline = 'no'
ORDER BY
  C.longitude DESC
LIMIT 5;
"""
result_easternmost = execute_query(sql_easternmost)
print("Easternmost cities in countries with no coastline:")
print(result_easternmost.to_markdown(index=False))

Easternmost cities in countries with no coastline:
| city     | country   |
|:---------|:----------|
| Orsha    | Belarus   |
| Mazyr    | Belarus   |
| Chisinau | Moldova   |
| Balti    | Moldova   |
| Minsk    | Belarus   |


### <font color = 'green'>**Your Turn - Basic SQL on World Cup Data**</font>

In [None]:
# Create database tables from CSV files
with open('Players.csv') as f: Players = pd.read_csv(f, index_col=0)
%sql drop table if exists Players;
%sql --persist Players
with open('Teams.csv') as f: Teams = pd.read_csv(f, index_col=0)
%sql drop table if exists Teams;
%sql --persist Teams

#### Look at sample of Players and Teams tables

In [None]:
%%sql
select * from Players limit 5

In [None]:
%%sql
select * from Teams limit 5

*1)  What player on a team with “ia” in the team name played less than 200 minutes and made more than 100 passes? Return the player surname. Note: To check if attribute A contains string S use "A like '%S%'"*

In [94]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads all known CSV files into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load all tables ---
    table_files = {
        'Cities': 'Cities.csv',
        'Countries': 'Countries.csv',
        'Players': 'Players.csv',
        'Teams': 'Teams.csv',
        'Titanic': 'Titanic.csv'
    }
    
    loaded_tables = []
    for table_name, file_name in table_files.items():
        try:
            df = pd.read_csv(file_name)
            df.to_sql(table_name, conn, if_exists='replace', index=False)
            loaded_tables.append(table_name)
        except FileNotFoundError:
            # If a file is missing, the query might fail, but we continue loading available data
            pass
    
    if not loaded_tables:
        conn.close()
        return {}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question: Player from a team with "ia" in the name, played < 200 minutes, and made > 100 passes. ---
sql_new_q1 = """
SELECT
  P.surname
FROM Players AS P
JOIN Teams AS T
  ON P.team = T.team
WHERE
  T.team LIKE '%ia%'
  AND P.minutes < 200
  AND P.passes > 100;
"""

queries_to_run = {
    "New Question 1": sql_new_q1,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        print(result.to_markdown(index=False))
    else:
        print(result)

--- New Question 1 ---
| surname    |
|:-----------|
| Kuzmanovic |


*2) Find all players who took more than 20 shots. Return all player information in descending order of shots taken.*

In [95]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads all known CSV files into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load all tables ---
    table_files = {
        'Cities': 'Cities.csv',
        'Countries': 'Countries.csv',
        'Players': 'Players.csv',
        'Teams': 'Teams.csv',
        'Titanic': 'Titanic.csv'
    }
    
    loaded_tables = []
    for table_name, file_name in table_files.items():
        try:
            df = pd.read_csv(file_name)
            df.to_sql(table_name, conn, if_exists='replace', index=False)
            loaded_tables.append(table_name)
        except FileNotFoundError:
            # If a file is missing, the query might fail, but we continue loading available data
            pass
    
    if not loaded_tables:
        conn.close()
        return {}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question 2: Find all players who took more than 20 shots. ---
sql_new_q2 = """
SELECT
  *
FROM Players
WHERE
  shots > 20
ORDER BY
  shots DESC;
"""

queries_to_run = {
    "Question 2": sql_new_q2,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        # Using index=False for a cleaner table output
        print(result.to_markdown(index=False))
    else:
        print(result)

--- Question 2 ---
| surname   | team      | position   |   minutes |   shots |   passes |   tackles |   saves |
|:----------|:----------|:-----------|----------:|--------:|---------:|----------:|--------:|
| Gyan      | Ghana     | forward    |       501 |      27 |      151 |         1 |       0 |
| Villa     | Spain     | forward    |       529 |      22 |      169 |         2 |       0 |
| Messi     | Argentina | forward    |       450 |      21 |      321 |        10 |       0 |


*3) Find the goalkeepers of teams that played more than four games. List the surname of the goalkeeper, the team, and the number of minutes the goalkeeper played.*

In [96]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads all known CSV files into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load all tables ---
    table_files = {
        'Cities': 'Cities.csv',
        'Countries': 'Countries.csv',
        'Players': 'Players.csv',
        'Teams': 'Teams.csv',
        'Titanic': 'Titanic.csv'
    }
    
    loaded_tables = []
    for table_name, file_name in table_files.items():
        try:
            df = pd.read_csv(file_name)
            df.to_sql(table_name, conn, if_exists='replace', index=False)
            loaded_tables.append(table_name)
        except FileNotFoundError:
            # If a file is missing, the query might fail, but we continue loading available data
            pass
    
    if not loaded_tables:
        conn.close()
        return {}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question 3: Find goalkeepers of teams that played more than four games. ---
sql_new_q3 = """
SELECT
  P.surname,
  T.team,
  P.minutes
FROM Players AS P
JOIN Teams AS T
  ON P.team = T.team
WHERE
  P.position = 'goalkeeper' AND T.games > 4;
"""

queries_to_run = {
    "Question 3": sql_new_q3,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        # Using index=False for a cleaner table output
        print(result.to_markdown(index=False))
    else:
        print(result)

--- Question 3 ---
| surname      | team        |   minutes |
|:-------------|:------------|----------:|
| Romero       | Argentina   |       450 |
| Julio Cesar  | Brazil      |       450 |
| Neuer        | Germany     |       540 |
| Kingson      | Ghana       |       510 |
| Stekelenburg | Netherlands |       540 |
| Villar       | Paraguay    |       480 |
| Casillas     | Spain       |       540 |
| Muslera      | Uruguay     |       570 |


*4) How many players who play on a team with ranking <10 played more than 350 minutes? Return one number in a column named 'superstar'.*

In [97]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads all known CSV files into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load all tables ---
    table_files = {
        'Players': 'Players.csv',
        'Teams': 'Teams.csv',
    }
    
    loaded_tables = []
    for table_name, file_name in table_files.items():
        try:
            df = pd.read_csv(file_name)
            df.to_sql(table_name, conn, if_exists='replace', index=False)
            loaded_tables.append(table_name)
        except FileNotFoundError:
            print(f"Warning: File {file_name} not found. Skipping table {table_name}.")
            pass
    
    if not loaded_tables:
        conn.close()
        return {}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question 4: How many players who play on a team with ranking <10 played more than 350 minutes? ---
sql_new_q4 = """
SELECT
  COUNT(P.surname) AS superstar
FROM Players AS P
JOIN Teams AS T
  ON P.team = T.team
WHERE
  T.ranking < 10 AND P.minutes > 350;
"""

queries_to_run = {
    "Question 4": sql_new_q4,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        # Using index=False for a cleaner table output
        print(result.to_markdown(index=False))
    else:
        print(result)

--- Question 4 ---
|   superstar |
|------------:|
|          54 |


*5) What is the average number of passes made by forwards? By midfielders? Write one query that gives both values with the corresponding position.*

In [98]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads all known CSV files into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load all tables ---
    table_files = {
        'Players': 'Players.csv',
    }
    
    loaded_tables = []
    for table_name, file_name in table_files.items():
        try:
            df = pd.read_csv(file_name)
            df.to_sql(table_name, conn, if_exists='replace', index=False)
            loaded_tables.append(table_name)
        except FileNotFoundError:
            print(f"Warning: File {file_name} not found. Skipping table {table_name}.")
            pass
    
    if not loaded_tables:
        conn.close()
        return {}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question 5: Average passes by forward and midfielder ---
sql_new_q5 = """
SELECT
  position,
  AVG(passes) AS average_passes
FROM Players
WHERE
  position IN ('forward', 'midfielder')
GROUP BY
  position;
"""

queries_to_run = {
    "Question 5": sql_new_q5,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        # Using index=False and floatfmt=".2f" for a cleaner table output
        print(result.to_markdown(index=False, floatfmt=".2f"))
    else:
        print(result)

--- Question 5 ---
| position   |   average_passes |
|:-----------|-----------------:|
| forward    |            50.83 |
| midfielder |            95.27 |


*6) Which team has the highest ratio of goalsFor to goalsAgainst? Return the team and the ratio.*

In [99]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads all known CSV files into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load all tables ---
    table_files = {
        'Teams': 'Teams.csv',
    }
    
    loaded_tables = []
    for table_name, file_name in table_files.items():
        try:
            df = pd.read_csv(file_name)
            df.to_sql(table_name, conn, if_exists='replace', index=False)
            loaded_tables.append(table_name)
        except FileNotFoundError:
            print(f"Warning: File {file_name} not found. Skipping table {table_name}.")
    
    if not loaded_tables:
        conn.close()
        return {}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question 6: Team with the highest ratio of goalsFor to goalsAgainst ---
# CAST forces floating-point division (integer / integer would truncate in SQLite).
# NULLIF turns a zero denominator (goalsAgainst = 0) into NULL, which makes the ratio NULL.
# SQLite sorts NULLs last under ORDER BY ... DESC, so such a team ranks after every team with a defined ratio.
sql_new_q6 = """
SELECT
  team,
  CAST(goalsFor AS REAL) / NULLIF(goalsAgainst, 0) AS ratio
FROM Teams
ORDER BY
  ratio DESC
LIMIT 1;
"""

queries_to_run = {
    "Question 6": sql_new_q6,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        # Using index=False and floatfmt=".2f" for a cleaner table output
        print(result.to_markdown(index=False, floatfmt=".2f"))
    else:
        print(result)

--- Question 6 ---
| team     |   ratio |
|:---------|--------:|
| Portugal |    7.00 |


### <font color = 'green'>**Your Turn Extra - Basic SQL on Titanic Data**</font>

<font color="red">File access required:</font> In Colab these extra problems require first uploading **Titanic.csv** using the *Files* feature in the left toolbar. If running the notebook on a local computer, simply ensure this file is in the same workspace as the notebook.

In [111]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads the Titanic CSV file into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load Titanic table ---
    file_name = 'Titanic.csv'
    table_name = 'Titanic'
    
    try:
        df = pd.read_csv(file_name)
        df.to_sql(table_name, conn, if_exists='replace', index=False)
    except FileNotFoundError:
        conn.close()
        return {"Error": f"File {file_name} not found."}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

#### Look at sample of Titanic table

In [None]:
%%sql
select * from Titanic limit 5

*1) How many passengers sailed for free (i.e., fare is zero)?*

In [100]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads the Titanic CSV file into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load Titanic table ---
    file_name = 'Titanic.csv'
    table_name = 'Titanic'
    
    try:
        df = pd.read_csv(file_name)
        df.to_sql(table_name, conn, if_exists='replace', index=False)
    except FileNotFoundError:
        conn.close()
        return {"Error": f"File {file_name} not found."}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question 1: How many passengers sailed for free (i.e., fare is zero)? ---
sql_new_q1 = """
SELECT
  COUNT(*) AS free_passengers
FROM Titanic
WHERE
  fare = 0;
"""

queries_to_run = {
    "Question 1": sql_new_q1,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        print(result.to_markdown(index=False))
    else:
        print(result)

--- Question 1 ---
|   free_passengers |
|------------------:|
|                15 |


*2) How many married women over age 50 embarked in Cherbourg? (Married women’s first names begin with "Mrs."). Note: To check if attribute A begins with string S use "A like 'S%'"*

In [101]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads the Titanic CSV file into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load Titanic table ---
    file_name = 'Titanic.csv'
    table_name = 'Titanic'
    
    try:
        df = pd.read_csv(file_name)
        df.to_sql(table_name, conn, if_exists='replace', index=False)
    except FileNotFoundError:
        conn.close()
        return {"Error": f"File {file_name} not found."}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question 2: How many married women over age 50 embarked in Cherbourg? ---
# Note: The column containing "Mrs." is 'first' in the Titanic.csv snippet.
sql_new_q2 = """
SELECT
  COUNT(*) AS married_women_cherbourg_over_50
FROM Titanic
WHERE
  first LIKE 'Mrs.%'
  AND age > 50
  AND embarked = 'Cherbourg';
"""

queries_to_run = {
    "Question 2": sql_new_q2,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        print(result.to_markdown(index=False))
    else:
        print(result)

--- Question 2 ---
|   married_women_cherbourg_over_50 |
|----------------------------------:|
|                                 4 |
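
As a quick check of how the three conditions combine, the sketch below runs the same `WHERE` clause on a handful of made-up passengers (hypothetical rows, not from Titanic.csv), each chosen to fail exactly one condition except the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Titanic (last TEXT, first TEXT, age REAL, embarked TEXT)")
conn.executemany(
    "INSERT INTO Titanic VALUES (?, ?, ?, ?)",
    [
        ("Doe", "Mrs. Jane", 55, "Cherbourg"),    # matches all three conditions
        ("Roe", "Mrs. Ann", 45, "Cherbourg"),     # not over 50
        ("Poe", "Miss Mary", 60, "Cherbourg"),    # does not start with 'Mrs.'
        ("Moe", "Mrs. Eve", 62, "Southampton"),   # different port
        ("Loe", "Mr. Sam", 70, "Cherbourg"),      # 'Mr.' is not matched by 'Mrs.%'
        ("Noe", "Mrs. Kay", None, "Cherbourg"),   # missing age: 'age > 50' is not true
    ],
)

(match_count,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM Titanic
    WHERE first LIKE 'Mrs.%'
      AND age > 50
      AND embarked = 'Cherbourg'
    """
).fetchone()
conn.close()

print(match_count)  # 1
```

Note that a passenger with a missing age is silently excluded, because a comparison against a missing value never evaluates to true.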


*3) Write three queries to find: (i) the total number of passengers; (ii) the number of passengers under 18; (iii) the number of passengers 18 or older. Notice that the second and third numbers don't add up to the first.*

In [103]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads the Titanic CSV file into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load Titanic table ---
    file_name = 'Titanic.csv'
    table_name = 'Titanic'
    
    try:
        df = pd.read_csv(file_name)
        df.to_sql(table_name, conn, if_exists='replace', index=False)
    except FileNotFoundError:
        conn.close()
        return {"Error": f"File {file_name} not found."}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- Question 3: Three queries to count passengers ---
sql_q3_i = """
SELECT COUNT(*) AS total_passengers FROM Titanic;
"""

queries_to_run = {
    "(i) Total Passengers": sql_q3_i,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        print(result.to_markdown(index=False))
    else:
        print(result)

--- (i) Total Passengers ---
|   total_passengers |
|-------------------:|
|                891 |


In [104]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads the Titanic CSV file into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load Titanic table ---
    file_name = 'Titanic.csv'
    table_name = 'Titanic'
    
    try:
        df = pd.read_csv(file_name)
        df.to_sql(table_name, conn, if_exists='replace', index=False)
    except FileNotFoundError:
        conn.close()
        return {"Error": f"File {file_name} not found."}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- Question 3: Three queries to count passengers ---
sql_q3_ii = """
SELECT COUNT(*) AS passengers_under_18 FROM Titanic WHERE age < 18;
"""

queries_to_run = {
    "(ii) Passengers under 18": sql_q3_ii,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        print(result.to_markdown(index=False))
    else:
        print(result)

--- (ii) Passengers under 18 ---
|   passengers_under_18 |
|----------------------:|
|                   113 |


In [105]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads the Titanic CSV file into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load Titanic table ---
    file_name = 'Titanic.csv'
    table_name = 'Titanic'
    
    try:
        df = pd.read_csv(file_name)
        df.to_sql(table_name, conn, if_exists='replace', index=False)
    except FileNotFoundError:
        conn.close()
        return {"Error": f"File {file_name} not found."}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- Question 3: Three queries to count passengers ---
sql_q3_iii = """
SELECT COUNT(*) AS passengers_18_or_older FROM Titanic WHERE age >= 18;
"""

queries_to_run = {
    "(iii) Passengers 18 or older": sql_q3_iii
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        print(result.to_markdown(index=False))
    else:
        print(result)

--- (iii) Passengers 18 or older ---
|   passengers_18_or_older |
|-------------------------:|
|                      601 |


*Missing values in SQL tables are given a special value called 'null', and the conditions 'A is null' and 'A is not null' can be used in Where clauses to check whether attribute A has the 'null' value. Write a query to find the number of passengers whose age is missing -- now your passenger numbers should add up. Modify the query to also return the average fare paid by those passengers.*

In [106]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads the Titanic CSV file into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load Titanic table ---
    file_name = 'Titanic.csv'
    table_name = 'Titanic'
    
    try:
        df = pd.read_csv(file_name)
        df.to_sql(table_name, conn, if_exists='replace', index=False)
    except FileNotFoundError:
        conn.close()
        return {"Error": f"File {file_name} not found."}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- Question 3 (follow-up): Count passengers with missing age and their average fare ---
sql_q3_missing_age = """
SELECT
  COUNT(*) AS passengers_age_null,
  AVG(fare) AS average_fare_missing_age
FROM Titanic
WHERE
  age IS NULL;
"""

queries_to_run = {
    "Question 3 (follow-up)": sql_q3_missing_age,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        print(result.to_markdown(index=False, floatfmt=".2f"))
    else:
        print(result)

--- Question 3 (follow-up) ---
|   passengers_age_null |   average_fare_missing_age |
|----------------------:|---------------------------:|
|                177.00 |                      22.16 |
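
With the missing ages included, the counts reported above reconcile: 113 + 601 + 177 = 891. The sketch below reproduces the effect on a made-up six-row table (hypothetical data) and computes all four counts in one query. It relies on SQLite evaluating a comparison to 1, 0, or NULL, and on `SUM` skipping NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Titanic (last TEXT, age REAL)")
# Hypothetical rows: two minors, two adults, two passengers with unknown age.
conn.executemany(
    "INSERT INTO Titanic VALUES (?, ?)",
    [("A", 5), ("B", 17.5), ("C", 18), ("D", 40), ("E", None), ("F", None)],
)

total, under_18, adults, missing = conn.execute(
    """
    SELECT COUNT(*),
           SUM(age < 18),      -- NULL ages yield NULL here and are skipped by SUM
           SUM(age >= 18),     -- likewise
           SUM(age IS NULL)    -- IS NULL is always 0 or 1, never NULL
    FROM Titanic
    """
).fetchone()
conn.close()

print(total, under_18, adults, missing)  # 6 2 2 2
```

A row with a NULL age satisfies neither `age < 18` nor `age >= 18`, which is why the first two counts alone fall short of the total.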


*4) Find all passengers whose age is not an integer; return last name, first name, and age, from youngest to oldest. Note: Consider using the round() function*

In [107]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads the Titanic CSV file into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load Titanic table ---
    file_name = 'Titanic.csv'
    table_name = 'Titanic'
    
    try:
        df = pd.read_csv(file_name)
        df.to_sql(table_name, conn, if_exists='replace', index=False)
    except FileNotFoundError:
        conn.close()
        return {"Error": f"File {file_name} not found."}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question 4: Find all passengers whose age is not an integer. ---
sql_new_q4 = """
SELECT
  last,
  first,
  age
FROM Titanic
WHERE
  age IS NOT NULL AND age != ROUND(age)
ORDER BY
  age ASC;
"""

queries_to_run = {
    "Question 4": sql_new_q4,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        print(result.to_markdown(index=False, floatfmt=".1f"))
    else:
        print(result)

--- Question 4 ---
| last          | first                          |   age |
|:--------------|:-------------------------------|------:|
| Thomas        | Master Assad Alexander         |   0.4 |
| Hamalainen    | Master Viljo                   |   0.7 |
| Baclini       | Miss Helene Barbara            |   0.8 |
| Baclini       | Miss Eugenie                   |   0.8 |
| Caldwell      | Master Alden Gates             |   0.8 |
| Richards      | Master George Sibley           |   0.8 |
| Allison       | Master Hudson Trevor           |   0.9 |
| Zabour        | Miss Hileni                    |  14.5 |
| Lovell        | Mr. John Hall ("Henry")        |  20.5 |
| Hanna         | Mr. Mansour                    |  23.5 |
| Sawyer        | Mr. Frederick Charles          |  24.5 |
| Novel         | Mr. Mansouer                   |  28.5 |
| Williams      | Mr. Leslie                     |  28.5 |
| Mangan        | Miss Mary                      |  30.5 |
| Tomlin        | Mr. Ernest Portage 
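
The `ROUND()` comparison used in the query above can be checked on a few made-up ages. This is a minimal sketch with hypothetical rows; it also shows that a NULL age is filtered out by the comparison itself, so the explicit `age IS NOT NULL` test is harmless but not strictly required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Titanic (last TEXT, age REAL)")
conn.executemany(
    "INSERT INTO Titanic VALUES (?, ?)",
    [("A", 0.5), ("B", 30.0), ("C", 28.5), ("D", None), ("E", 2.0)],
)

# A whole-number age equals its rounded value; a fractional age does not.
# For a NULL age the comparison is NULL, which WHERE treats as "not true".
fractional = conn.execute(
    "SELECT last, age FROM Titanic WHERE age != ROUND(age) ORDER BY age"
).fetchall()
conn.close()

print(fractional)  # [('A', 0.5), ('C', 28.5)]
```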

*5) What is the most common last name among passengers, and how many passengers have that last name?*

In [108]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads the Titanic CSV file into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load Titanic table ---
    file_name = 'Titanic.csv'
    table_name = 'Titanic'
    
    try:
        df = pd.read_csv(file_name)
        df.to_sql(table_name, conn, if_exists='replace', index=False)
    except FileNotFoundError:
        conn.close()
        return {"Error": f"File {file_name} not found."}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question 5: Most common last name and its count ---
sql_new_q5 = """
SELECT
  last AS most_common_last_name,
  COUNT(last) AS count
FROM Titanic
GROUP BY
  last
ORDER BY
  count DESC
LIMIT 1;
"""

queries_to_run = {
    "Question 5": sql_new_q5,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        print(result.to_markdown(index=False))
    else:
        print(result)

--- Question 5 ---
| most_common_last_name   |   count |
|:------------------------|--------:|
| Andersson               |       9 |
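
`ORDER BY ... LIMIT 1` returns a single row even when several last names share the top count. As an optional variation that goes slightly beyond the question, the sketch below returns every tied name by comparing each group's count with the maximum count. The rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Titanic (last TEXT)")
# Hypothetical rows in which 'Smith' and 'Jones' tie for most common.
conn.executemany(
    "INSERT INTO Titanic VALUES (?)",
    [("Smith",), ("Smith",), ("Jones",), ("Jones",), ("Brown",)],
)

top_names = conn.execute(
    """
    SELECT last, COUNT(*) AS n
    FROM Titanic
    GROUP BY last
    HAVING COUNT(*) = (
        SELECT MAX(cnt) FROM (SELECT COUNT(*) AS cnt FROM Titanic GROUP BY last)
    )
    ORDER BY last
    """
).fetchall()
conn.close()

print(top_names)  # [('Jones', 2), ('Smith', 2)]
```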


*6) What is the average fare paid by passengers in the three classes, and the average age of passengers in the three classes?*

In [109]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads the Titanic CSV file into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load Titanic table ---
    file_name = 'Titanic.csv'
    table_name = 'Titanic'
    
    try:
        df = pd.read_csv(file_name)
        df.to_sql(table_name, conn, if_exists='replace', index=False)
    except FileNotFoundError:
        conn.close()
        return {"Error": f"File {file_name} not found."}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question 6: Average fare and average age by passenger class (Titanic Data) ---
sql_new_q6 = """
SELECT
  class,
  AVG(fare) AS average_fare,
  AVG(age) AS average_age
FROM Titanic
GROUP BY
  class
ORDER BY
  class;
"""

queries_to_run = {
    "Question 6": sql_new_q6,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        print(result.to_markdown(index=False, floatfmt=".2f"))
    else:
        print(result)

--- Question 6 ---
|   class |   average_fare |   average_age |
|--------:|---------------:|--------------:|
|    1.00 |          84.15 |         38.23 |
|    2.00 |          20.66 |         29.88 |
|    3.00 |          13.68 |         25.14 |
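
When reading the average ages above, keep in mind that `AVG` skips NULL values, so each class's average age covers only passengers whose age is recorded. A small sketch on made-up rows (hypothetical data) makes the difference between `COUNT(*)` and `COUNT(age)` visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Titanic (class INTEGER, fare REAL, age REAL)")
# Hypothetical rows; one first-class passenger has no recorded age.
conn.executemany(
    "INSERT INTO Titanic VALUES (?, ?, ?)",
    [(1, 80.0, 40), (1, 100.0, None), (3, 10.0, 20), (3, 8.0, 30)],
)

by_class = conn.execute(
    """
    SELECT class,
           AVG(fare),
           AVG(age),      -- averages only the non-NULL ages
           COUNT(*),      -- all rows in the group
           COUNT(age)     -- rows with a recorded age
    FROM Titanic
    GROUP BY class
    ORDER BY class
    """
).fetchall()
conn.close()

print(by_class)  # [(1, 90.0, 40.0, 2, 1), (3, 9.0, 25.0, 2, 2)]
```

In class 1 the fare average uses both rows while the age average uses only one.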


*7) For male survivors, female survivors, male non-survivors, and female non-survivors, how many passengers are in each of those four categories and what is their average fare? Return your results from lowest to highest average fare.*

In [110]:
# Setup: Load data and create a reusable function to run the queries against an in-memory database
import pandas as pd
import sqlite3

def execute_queries(queries):
    """Loads the Titanic CSV file into an in-memory SQLite DB and executes a dictionary of queries."""
    conn = sqlite3.connect(':memory:')
    
    # --- Load Titanic table ---
    file_name = 'Titanic.csv'
    table_name = 'Titanic'
    
    try:
        df = pd.read_csv(file_name)
        df.to_sql(table_name, conn, if_exists='replace', index=False)
    except FileNotFoundError:
        conn.close()
        return {"Error": f"File {file_name} not found."}
    
    # --- Execute queries ---
    results = {}
    for name, sql in queries.items():
        try:
            results[name] = pd.read_sql_query(sql, conn)
        except Exception as e:
            results[name] = f"Error executing {name}: {e}"

    conn.close()
    return results

# --- New Question 7: Count and average fare by gender and survival status (Titanic Data) ---
sql_new_q7 = """
SELECT
  gender,
  survived,
  COUNT(*) AS passenger_count,
  AVG(fare) AS average_fare
FROM Titanic
GROUP BY
  gender, survived
ORDER BY
  average_fare ASC;
"""

queries_to_run = {
    "Question 7": sql_new_q7,
}

results = execute_queries(queries_to_run)

# --- Print Results ---
for name, result in results.items():
    print(f"--- {name} ---")
    if isinstance(result, pd.DataFrame):
        print(result.to_markdown(index=False, floatfmt=".2f"))
    else:
        print(result)

--- Question 7 ---
| gender   | survived   |   passenger_count |   average_fare |
|:---------|:-----------|------------------:|---------------:|
| M        | no         |               468 |          21.96 |
| F        | no         |                81 |          23.03 |
| M        | yes        |               109 |          40.82 |
| F        | yes        |               233 |          51.94 |
