# 04 Performance Profiling

**Assignment 1: Query profiling and code profiling with quantified improvements**

- Query profiling (SQLite EXPLAIN QUERY PLAN)
- Code profiling (cProfile)
- Outputs are written to profiling/query_results.txt and profiling/code_profiling.txt

In [2]:
import sys
from pathlib import Path
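# Resolve the project root whether the notebook runs from the root or from a subfolder
project_root = Path.cwd() if (Path.cwd() / "src").exists() else Path.cwd().parent
# Make the src package importable
sys.path.insert(0, str(project_root))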

from sqlalchemy import text
from src.utils import get_engine

engine = get_engine(sample=True)

## 1. Query profiling (EXPLAIN QUERY PLAN)

In [3]:
queries = [
    ("Count reviews", "SELECT COUNT(*) FROM reviews"),
    ("Avg rating by hotel", "SELECT offering_id, AVG(rating_overall) FROM reviews GROUP BY offering_id"),
    ("Top 10 hotels", "SELECT offering_id, COUNT(*) AS n FROM reviews GROUP BY offering_id ORDER BY n DESC LIMIT 10"),
    ("Reviews with rating >= 4", "SELECT * FROM reviews WHERE rating_overall >= 4 LIMIT 1000"),
]
out = []
with engine.connect() as conn:
    for name, sql in queries:
        out.append(f"--- {name} ---")
        out.append(f"SQL: {sql}")
        # Each plan row is (id, parent, notused, detail); detail describes the scan or search step
        for row in conn.execute(text(f"EXPLAIN QUERY PLAN {sql}")).fetchall():
            out.append(str(row))
        out.append("")
# Write the plans to profiling/query_results.txt (the assignment allows only two .txt files in profiling/)
(project_root / "profiling").mkdir(parents=True, exist_ok=True)
(project_root / "profiling" / "query_results.txt").write_text("\n".join(out), encoding="utf-8")
print("\n".join(out))
print("Wrote profiling/query_results.txt")

--- Count reviews ---
SQL: SELECT COUNT(*) FROM reviews
(4, 0, 0, 'SCAN TABLE reviews USING COVERING INDEX idx_reviews_rating_overall')

--- Avg rating by hotel ---
SQL: SELECT offering_id, AVG(rating_overall) FROM reviews GROUP BY offering_id
(7, 0, 0, 'SCAN TABLE reviews USING INDEX idx_reviews_offering')

--- Top 10 hotels ---
SQL: SELECT offering_id, COUNT(*) AS n FROM reviews GROUP BY offering_id ORDER BY n DESC LIMIT 10
(8, 0, 0, 'SCAN TABLE reviews USING COVERING INDEX idx_reviews_offering')
(42, 0, 0, 'USE TEMP B-TREE FOR ORDER BY')

--- Reviews with rating >= 4 ---
SQL: SELECT * FROM reviews WHERE rating_overall >= 4 LIMIT 1000
(4, 0, 0, 'SEARCH TABLE reviews USING INDEX idx_reviews_rating_overall (rating_overall>?)')

Wrote profiling/query_results.txt
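
The plans above show which index each query uses, but they do not measure time. Below is a minimal sketch of how a before/after comparison could be quantified. It assumes a synthetic in-memory `reviews` table and a hypothetical index name `idx_demo_offering`; neither is part of the project database, and the helper `plan_and_time` is illustrative only.

```python
import sqlite3
import time

def plan_and_time(conn, sql, repeats=20):
    """Return (plan detail strings, best wall-clock seconds over `repeats` runs)."""
    # The fourth column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    plan = [row[3] for row in conn.execute(f"EXPLAIN QUERY PLAN {sql}")]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return plan, best

# Synthetic stand-in for the reviews table (hypothetical data, not the project's).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (id INTEGER PRIMARY KEY, offering_id INTEGER, rating_overall REAL)"
)
conn.executemany(
    "INSERT INTO reviews (offering_id, rating_overall) VALUES (?, ?)",
    [(i % 500, 1 + (i % 5)) for i in range(50_000)],
)

sql = "SELECT COUNT(*) FROM reviews WHERE offering_id = 42"

# Measure without an index on offering_id, then again after adding one.
plan_before, t_before = plan_and_time(conn, sql)
conn.execute("CREATE INDEX idx_demo_offering ON reviews (offering_id)")
plan_after, t_after = plan_and_time(conn, sql)

print("before:", plan_before, f"{t_before * 1e3:.3f} ms")
print("after: ", plan_after, f"{t_after * 1e3:.3f} ms")
print(f"speedup: {t_before / max(t_after, 1e-9):.1f}x")
```

Taking the best of several runs reduces noise from caching and scheduling. The exact plan wording and timings depend on the SQLite version and the machine, so the numbers should be read as relative, not absolute.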


## 2. Code profiling (cProfile)

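The cell below profiles get_reviews_df and comparable_groups_by_volume_and_rating with cProfile, prints the 30 entries with the highest cumulative time, and writes the report to profiling/code_profiling.txt. Per the assignment, the profiling folder contains only these two .txt files.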

In [4]:
import cProfile
import pstats
import io
from src.benchmarking import get_reviews_df, comparable_groups_by_volume_and_rating

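# Profile only the data load and the comparable-groups step
prof = cProfile.Profile()
prof.enable()
df = get_reviews_df(sample=True)
peers = comparable_groups_by_volume_and_rating(df)
prof.disable()
# Format the 30 entries with the highest cumulative time into a string
s = io.StringIO()
ps = pstats.Stats(prof, stream=s).strip_dirs().sort_stats("cumulative")
ps.print_stats(30)
code_profiling_txt = s.getvalue()
# Ensure the output folder exists even if the query-profiling cell was skipped
(project_root / "profiling").mkdir(parents=True, exist_ok=True)
(project_root / "profiling" / "code_profiling.txt").write_text(code_profiling_txt, encoding="utf-8")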
print(code_profiling_txt)
print("Wrote profiling/code_profiling.txt")

         18229 function calls (17956 primitive calls) in 0.046 seconds

   Ordered by: cumulative time
   List reduced from 1139 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.000    0.000    0.046    0.015 interactiveshell.py:3543(run_code)
      4/3    0.000    0.000    0.046    0.015 {built-in method builtins.exec}
        1    0.000    0.000    0.031    0.031 benchmarking.py:13(get_reviews_df)
        1    0.000    0.000    0.031    0.031 sql.py:572(read_sql)
        1    0.000    0.000    0.029    0.029 sql.py:1791(read_query)
        3    0.000    0.000    0.019    0.006 result.py:1331(fetchall)
        3    0.000    0.000    0.019    0.006 result.py:555(_allrows)
        3    0.000    0.000    0.017    0.006 cursor.py:2251(_fetchall_impl)
        3    0.000    0.000    0.017    0.006 cursor.py:1191(fetchall)
        3    0.017    0.006    0.017    0.006 {method 'fetchall' of 'sqlite3.Cursor' objects}
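
In the report above, the largest self time (tottime 0.017 s of 0.046 s total) sits in sqlite3's `fetchall`, while most rows sorted by cumulative time are wrappers around it. Sorting by internal time surfaces such hot spots directly. Below is a minimal sketch that uses hypothetical `slow_sum` and `workload` functions as stand-ins for the project's code, and the context-manager form of `cProfile.Profile` (available from Python 3.8).

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Hypothetical hot spot: a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def workload():
    """Hypothetical stand-in for the profiled project functions."""
    return [slow_sum(20_000) for _ in range(20)]

# The context manager enables the profiler on entry and disables it on exit,
# even if the profiled code raises.
with cProfile.Profile() as prof:
    workload()

buf = io.StringIO()
# SortKey.TIME orders by tottime (time spent in the function itself).
stats = pstats.Stats(prof, stream=buf).strip_dirs().sort_stats(pstats.SortKey.TIME)
stats.print_stats(5)
report = buf.getvalue()
print(report)
```

Comparing the tottime-sorted and cumulative-sorted views of the same run helps separate functions that are slow themselves from functions that merely call something slow.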
        