# COMS W4111 -- Introduction to Databases, Spring 2025, Lecture 1 Examples

# Notebook Setup

In [1]:
%load_ext sql

In [10]:
%config SqlMagic.style = '_DEPRECATED_DEFAULT'

In [3]:
db_url = "mysql+pymysql://root:dbuserdbuser@localhost"

In [4]:
%sql $db_url

## Generative AI

```"Using Sean Lahman's baseball database, write a MySQL query to calculate the career statistics for players, including first and last names, total games (G), at-bats (AB), hits (H), batting average (BA), on-base percentage (OBP) calculated using only hits and walks, slugging percentage (SLG), home runs (HR), and RBIs. Limit the results to players with at least 10,000 career at-bats, sort by batting average in descending order, and return only the top 15 players."```

In [17]:
%%sql

use lahmansbaseballdb;

SELECT
    p.nameFirst AS FirstName,
    p.nameLast AS LastName,
    SUM(a.G_all) AS Games,
    SUM(b.AB) AS AtBats,
    SUM(b.H) AS Hits,
    ROUND(SUM(b.H) * 1.0 / NULLIF(SUM(b.AB), 0), 3) AS BattingAverage,
    ROUND(
        (SUM(b.H) + SUM(b.BB)) * 1.0 /
        NULLIF(SUM(b.AB) + SUM(b.BB), 0), 3
    ) AS OnBasePercentage,
    ROUND(
        (SUM(b.H) + SUM(b.`2B`) + 2 * SUM(b.`3B`) + 3 * SUM(b.HR)) * 1.0 /
        NULLIF(SUM(b.AB), 0), 3
    ) AS SluggingPercentage,
    SUM(b.HR) AS HomeRuns,
    SUM(b.RBI) AS RBIs
FROM
    People p
JOIN
    Batting b ON p.playerID = b.playerID
JOIN
    Appearances a ON p.playerID = a.playerID
GROUP BY
    p.playerID, p.nameFirst, p.nameLast
HAVING
    SUM(b.AB) >= 10000 -- Limit to players with at least 10,000 at-bats
ORDER BY
    BattingAverage DESC
LIMIT 15; -- Limit the result to the top 15 players




 * mysql+pymysql://root:***@localhost
0 rows affected.
15 rows affected.


| FirstName | LastName | Games | AtBats | Hits | BattingAverage | OnBasePercentage | SluggingPercentage | HomeRuns | RBIs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ty | Cobb | 72816 | 274464 | 100536 | 0.366 | 0.429 | 0.512 | 2808 | 46656 |
| Ross | Barnes | 4491 | 21519 | 7740 | 0.36 | 0.389 | 0.468 | 54 | 3114 |
| Rogers | Hornsby | 54216 | 196152 | 70320 | 0.358 | 0.431 | 0.577 | 7224 | 38016 |
| Shoeless Joe | Jackson | 18648 | 69734 | 24808 | 0.356 | 0.417 | 0.517 | 756 | 10990 |
| Levi | Meyerle | 2456 | 11544 | 4104 | 0.356 | 0.36 | 0.479 | 80 | 2224 |
| Lefty | O'Doul | 11640 | 39168 | 13680 | 0.349 | 0.41 | 0.532 | 1356 | 6504 |
| Cal | McVey | 4770 | 22617 | 7821 | 0.346 | 0.354 | 0.447 | 99 | 4041 |
| Ed | Delahanty | 29392 | 120160 | 41552 | 0.346 | 0.405 | 0.505 | 1616 | 23456 |
| Tris | Speaker | 61424 | 224290 | 77308 | 0.345 | 0.423 | 0.5 | 2574 | 33638 |
| Ted | Williams | 43548 | 146414 | 50426 | 0.344 | 0.481 | 0.634 | 9899 | 34941 |
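
A caution when reading these totals: the query joins `Batting` and `Appearances` on `playerID` alone, so every batting row for a player is paired with every appearances row for that player, and each `SUM` is multiplied accordingly. Ty Cobb's 274,464 at-bats, 100,536 hits, and 72,816 games are each exactly 24 times 11,436, 4,189, and 3,034. The ratio columns (batting average, OBP, SLG) are unaffected because numerator and denominator are inflated by the same factor, but the `HAVING SUM(b.AB) >= 10000` filter is applied to the inflated sum, which would explain why players with short careers appear in the list. The sketch below reproduces the effect on a tiny made-up data set, using Python's built-in `sqlite3` as a stand-in for MySQL; the player `x01`, team `AAA`, and all numbers are invented for illustration. Aggregating each table per player before joining is one way to avoid the problem, not necessarily the only one.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Batting (playerID TEXT, yearID INT, stint INT, AB INT, H INT);
    CREATE TABLE Appearances (playerID TEXT, yearID INT, teamID TEXT, G_all INT);

    -- One made-up player with three seasons in each table.
    INSERT INTO Batting VALUES
        ('x01', 2001, 1, 500, 150),
        ('x01', 2002, 1, 400, 100),
        ('x01', 2003, 1, 100,  50);
    INSERT INTO Appearances VALUES
        ('x01', 2001, 'AAA', 150),
        ('x01', 2002, 'AAA', 120),
        ('x01', 2003, 'AAA',  30);
""")

# Joining on playerID alone: 3 batting rows x 3 appearances rows = 9 joined rows.
fanned_out = conn.execute("""
    SELECT SUM(b.AB), SUM(a.G_all)
    FROM Batting b
    JOIN Appearances a ON b.playerID = a.playerID
    GROUP BY b.playerID
""").fetchone()

# Aggregating each table per player first, then joining one row to one row.
pre_aggregated = conn.execute("""
    SELECT b.AB, a.G
    FROM (SELECT playerID, SUM(AB) AS AB FROM Batting GROUP BY playerID) b
    JOIN (SELECT playerID, SUM(G_all) AS G FROM Appearances GROUP BY playerID) a
      ON b.playerID = a.playerID
""").fetchone()

print(fanned_out)      # (3000, 900): true totals multiplied by 3
print(pre_aggregated)  # (1000, 300): true totals
```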


```"Using MySQL and Sean Lahman's baseball database (lahmansbaseballdb), create a new schema called lahmansfast. Copy the People, Batting, and Appearances tables into the new schema using CREATE TABLE ... AS SELECT. After copying, add appropriate primary keys: playerID for the People table, a composite primary key of (playerID, yearID, stint) for the Batting table, and a composite primary key of (playerID, yearID, teamID) for the Appearances table. Additionally, add indexes to optimize queries on Batting.AB and Appearances.G_all. Drop the schema lahmansfast if it already exists before creating it."```

In [22]:
%%sql

-- Step 1: Drop the schema if it already exists
DROP SCHEMA IF EXISTS lahmansfast;

-- Step 2: Create the new schema
CREATE SCHEMA lahmansfast;


 * mysql+pymysql://root:***@localhost
1 rows affected.
1 rows affected.


[]

In [23]:
%%sql

-- Step 3: Create the People table with data and add the primary key
CREATE TABLE lahmansfast.People AS
SELECT * FROM lahmansbaseballdb.People;
ALTER TABLE lahmansfast.People ADD PRIMARY KEY (playerID); -- Assuming playerID uniquely identifies each person

-- Step 4: Create the Batting table with data and add the primary key
CREATE TABLE lahmansfast.Batting AS
SELECT * FROM lahmansbaseballdb.Batting;
ALTER TABLE lahmansfast.Batting 
ADD PRIMARY KEY (playerID, yearID, stint); -- Composite key to uniquely identify each record
-- Add necessary indexes for performance
CREATE INDEX idx_batting_AB ON lahmansfast.Batting (AB);

-- Step 5: Create the Appearances table with data and add the primary key
CREATE TABLE lahmansfast.Appearances AS
SELECT * FROM lahmansbaseballdb.Appearances;
ALTER TABLE lahmansfast.Appearances 
ADD PRIMARY KEY (playerID, yearID, teamID); -- Composite key for uniqueness
-- Add necessary indexes for performance
CREATE INDEX idx_appearances_games ON lahmansfast.Appearances (G_all);



 * mysql+pymysql://root:***@localhost
19878 rows affected.
0 rows affected.
107429 rows affected.
0 rows affected.
0 rows affected.
107356 rows affected.
0 rows affected.
0 rows affected.


[]
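
The `ALTER TABLE ... ADD PRIMARY KEY` statements above succeed only if the chosen columns are unique in the copied data, so the comments that say "assuming" are effectively verified when the statements run without error. When a candidate key is less certain, a `GROUP BY ... HAVING COUNT(*) > 1` query can check it first. The following is a minimal sketch of that check, again using Python's built-in `sqlite3` as a stand-in for MySQL; the helper `duplicate_keys` and the sample rows are hypothetical and not part of the course database.

```python
import sqlite3

def duplicate_keys(conn, table, key_columns):
    """Return (key values..., count) for every key that occurs more than once.

    Hypothetical helper: table and column names are interpolated directly,
    so it should only be called with trusted identifiers.
    """
    cols = ", ".join(key_columns)
    sql = (
        f"SELECT {cols}, COUNT(*) FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Batting (playerID TEXT, yearID INT, stint INT, AB INT);
    INSERT INTO Batting VALUES
        ('x01', 2001, 1, 300),
        ('x01', 2001, 2, 200),  -- same player and year, second stint
        ('x01', 2002, 1, 450);
""")

# (playerID, yearID) is not unique here because of the second stint.
too_narrow = duplicate_keys(conn, "Batting", ["playerID", "yearID"])

# Adding stint makes every row's key distinct.
full_key = duplicate_keys(conn, "Batting", ["playerID", "yearID", "stint"])

print(too_narrow)  # [('x01', 2001, 2)]
print(full_key)    # []
```

An empty result means the columns are safe to declare as a primary key, provided none of them contains NULLs.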

I could continue with example after example.

ChatGPT is remarkably good at __all of the material__ in this course.

If I gave only take-home exams and homework assignments, I would be testing ChatGPT instead of students. Most of you would not use ChatGPT or other generative AI if I did not allow it. But some of you would, and that would put the honorable students at a disadvantage.

## Scenario

I am building an application to help me manage my classes, especially the cloud computing courses.

The primary complexities are:
- Assigning and managing project teams and students on teams.
- Tracking and commenting on project documents.
- Meeting management.
- ... ...

<img src="./course-management-application.jpg" width="700px">

This is primarily an interactive application that uses multiple databases.


There are also some data engineering and analysis problems, however, such as loading data from CourseWorks.


CourseWorks has an API. I can write code to call the API and extract the information.

## Some Setup