In [1]:
import pyodbc
pyodbc.drivers()


['ODBC Driver 18 for SQL Server']

In [2]:
import pandas as pd
from sqlalchemy import create_engine, URL

USER = "sa"                 # <-- edit
PWD  = "YourStrong!Passw0rd"    # <-- edit
HOST = "localhost"          
PORT = 1433                 # default for SQL Server
DB   = "AdventureWorks" # <-- edit if different

url = URL.create(
    "mssql+pyodbc",
    username=USER,
    password=PWD,
    host=HOST,
    port=PORT,
    database=DB,
    query={
        "driver": "ODBC Driver 18 for SQL Server",
        "Encrypt": "yes",                    # or "no" if your setup needs it
        "TrustServerCertificate": "yes"      # often needed for local/dev
    },
)

engine = create_engine(
    url,
    pool_pre_ping=True,          # avoids stale connections
    fast_executemany=True,       # pyodbc option: speeds up bulk INSERTs (not needed for the SELECTs below)
)

def run(sql: str) -> pd.DataFrame:
    with engine.connect() as conn:
        return pd.read_sql(sql, conn)

# smoke test (AdventureWorks table name is singular "Customer")
df = run("SELECT TOP (5) * FROM Sales.Customer;")
df.head()




Unnamed: 0,CustomerID,PersonID,StoreID,TerritoryID,AccountNumber,rowguid,ModifiedDate
0,1,,934,1,AW00000001,3F5AE95E-B87D-4AED-95B4-C3797AFCB74F,2014-09-12 11:15:07.263
1,2,,1028,1,AW00000002,E552F657-A9AF-4A7D-A645-C429D6E02491,2014-09-12 11:15:07.263
2,3,,642,4,AW00000003,130774B1-DB21-4EF3-98C8-C104BCD6ED6D,2014-09-12 11:15:07.263
3,4,,932,4,AW00000004,FF862851-1DAA-4044-BE7C-3E85583C054D,2014-09-12 11:15:07.263
4,5,,1026,4,AW00000005,83905BDC-6F5E-4F71-B162-C98DA069F38A,2014-09-12 11:15:07.263


### Proposition 1 – StateProvinceID for every state except CountryRegionCode = 'FR'  

**Description:**  
Lists all states and provinces from the `Person.StateProvince` table, excluding those that belong to the country region code `'FR'` (France).  

**How it works:**  
The first `SELECT` statement retrieves all state or province records from the table.  
The second `SELECT` retrieves only those records where `CountryRegionCode = 'FR'`.  
Using `EXCEPT` removes all matching rows from the first result set, leaving every other country’s state or province.  

**Why it’s useful:**  
This query demonstrates how the `EXCEPT` operator subtracts one result set from another. For this table the result should match a single `SELECT` with `WHERE CountryRegionCode <> 'FR'`, so the value here is seeing set subtraction at work on a familiar table.  
It’s useful in business analysis when you want every region except those of one country, for example to prepare a report that leaves out the French market.
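
As a sanity check on the `EXCEPT` logic described above, the sketch below rebuilds the same pattern on a tiny in-memory SQLite table rather than the AdventureWorks SQL Server database. The unqualified table name `StateProvince` and its four rows are made up for illustration; only the shape of the query mirrors the proposition. Assuming `CountryRegionCode` contains no NULLs, the `EXCEPT` form and a plain `<>` filter should return the same rows.

```python
import sqlite3

# Toy stand-in for Person.StateProvince (hypothetical rows, SQLite instead of SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StateProvince (StateProvinceID INTEGER, Name TEXT, CountryRegionCode TEXT)"
)
conn.executemany(
    "INSERT INTO StateProvince VALUES (?, ?, ?)",
    [(1, "Alberta", "CA"), (2, "Ain", "FR"), (3, "Texas", "US"), (4, "Aisne", "FR")],
)

# Same shape as the proposition: every row EXCEPT the French ones.
except_rows = conn.execute("""
    SELECT StateProvinceID, Name, CountryRegionCode FROM StateProvince
    EXCEPT
    SELECT StateProvinceID, Name, CountryRegionCode FROM StateProvince
    WHERE CountryRegionCode = 'FR'
    ORDER BY CountryRegionCode, StateProvinceID
""").fetchall()

# Single-SELECT alternative using an ordinary filter.
where_rows = conn.execute("""
    SELECT StateProvinceID, Name, CountryRegionCode FROM StateProvince
    WHERE CountryRegionCode <> 'FR'
    ORDER BY CountryRegionCode, StateProvinceID
""").fetchall()

print(except_rows)                # the two non-French rows
print(except_rows == where_rows)  # both forms agree on this toy data
conn.close()
```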



In [7]:
df1= run("""SELECT StateProvinceID, Name, CountryRegionCode
FROM Person.StateProvince
EXCEPT
SELECT StateProvinceID, Name, CountryRegionCode
FROM Person.StateProvince
WHERE CountryRegionCode = 'FR'
ORDER BY CountryRegionCode, StateProvinceID;
""")
df1


Unnamed: 0,StateProvinceID,Name,CountryRegionCode
0,5,American Samoa,AS
1,50,New South Wales,AU
2,64,Queensland,AU
3,66,South Australia,AU
4,71,Tasmania,AU
...,...,...,...
80,79,Washington,US
81,80,Wisconsin,US
82,81,West Virginia,US
83,82,Wyoming,US


### Proposition 2 – Products except those that are discontinued  

**Description:**  
Shows all products in the company’s catalog except the ones that have been discontinued.  

**How it works:**  
The first `SELECT` retrieves every product from `Production.Product`.  
The second `SELECT` retrieves only the discontinued products, where `DiscontinuedDate` is not null.  
Using `EXCEPT` removes those from the full product list, leaving only currently active products.  

**Why it’s useful:**  
This query helps inventory or sales analysts isolate the products that are still active. The same rows could be fetched with `WHERE DiscontinuedDate IS NULL`; the `EXCEPT` form makes the "full catalog minus discontinued items" logic explicit.
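
The rows that survive the `EXCEPT` are exactly those whose `DiscontinuedDate` is NULL. A minimal sketch of that equivalence follows, using an in-memory SQLite table as a stand-in for `Production.Product`; the "Old Widget" row and its date are invented for the example.

```python
import sqlite3

# Toy stand-in for Production.Product; "Old Widget" and its date are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER, Name TEXT, DiscontinuedDate TEXT)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, "Adjustable Race", None), (2, "Old Widget", "2013-05-30"), (316, "Blade", None)],
)

# Full catalog minus the discontinued products.
except_rows = conn.execute("""
    SELECT ProductID, Name, DiscontinuedDate FROM Product
    EXCEPT
    SELECT ProductID, Name, DiscontinuedDate FROM Product
    WHERE DiscontinuedDate IS NOT NULL
    ORDER BY ProductID
""").fetchall()

# Direct filter on the NULL date.
is_null_rows = conn.execute("""
    SELECT ProductID, Name, DiscontinuedDate FROM Product
    WHERE DiscontinuedDate IS NULL
    ORDER BY ProductID
""").fetchall()

print(except_rows)
print(except_rows == is_null_rows)
conn.close()
```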



In [8]:
df2 = run("""
SELECT ProductID, Name, DiscontinuedDate
FROM Production.Product

EXCEPT

SELECT ProductID, Name, DiscontinuedDate
FROM Production.Product
WHERE DiscontinuedDate IS NOT NULL

ORDER BY ProductID;
""")
df2



Unnamed: 0,ProductID,Name,DiscontinuedDate
0,1,Adjustable Race,
1,2,Bearing Ball,
2,3,BB Ball Bearing,
3,4,Headset Ball Bearings,
4,316,Blade,
...,...,...,...
499,995,ML Bottom Bracket,
500,996,HL Bottom Bracket,
501,997,"Road-750 Black, 44",
502,998,"Road-750 Black, 48",


### Proposition 3 – Email addresses of people with EmailPromotion = 2 whose first name starts with 'Dev'  
**Description:**  
Finds people who have `EmailPromotion = 2`, whose first name starts with `Dev`, and who also have an email record.  

**How it works:**  
The `INTERSECT` keeps only `BusinessEntityID`s that appear in both the Person table (promotion = 2 and `FirstName LIKE 'Dev%'`) and the EmailAddress table; the outer query then returns the email addresses for those IDs.  

**Why it’s useful:**  
Ensures marketing emails are sent only to people who opted in and have an email address on file, improving targeting accuracy.  
This is especially useful for verifying clean and marketable contact segments.
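
To illustrate the `INTERSECT`-inside-`IN` pattern described under "How it works", here is a small sketch on in-memory SQLite tables. The table contents and the `example.com` addresses are hypothetical. It also compares the result with a plain `JOIN`, which should agree as long as `Person` has one row per `BusinessEntityID`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy stand-ins for Person.Person and Person.EmailAddress (invented rows).
    CREATE TABLE Person (BusinessEntityID INTEGER, FirstName TEXT, EmailPromotion INTEGER);
    CREATE TABLE EmailAddress (BusinessEntityID INTEGER, EmailAddress TEXT);
    INSERT INTO Person VALUES (1, 'Devin', 2), (2, 'Devon', 1), (3, 'Ann', 2), (4, 'Devin', 2);
    INSERT INTO EmailAddress VALUES
        (1, 'devin0@example.com'), (2, 'devon0@example.com'), (3, 'ann0@example.com');
""")

# Same shape as the proposition: IDs present in both sets, then look up the address.
intersect_rows = conn.execute("""
    SELECT ea.BusinessEntityID, ea.EmailAddress
    FROM EmailAddress AS ea
    WHERE ea.BusinessEntityID IN (
        SELECT BusinessEntityID FROM Person
        WHERE EmailPromotion = 2 AND FirstName LIKE 'Dev%'
        INTERSECT
        SELECT BusinessEntityID FROM EmailAddress
    )
    ORDER BY ea.EmailAddress
""").fetchall()

# JOIN-based alternative for comparison.
join_rows = conn.execute("""
    SELECT ea.BusinessEntityID, ea.EmailAddress
    FROM EmailAddress AS ea
    JOIN Person AS p ON p.BusinessEntityID = ea.BusinessEntityID
    WHERE p.EmailPromotion = 2 AND p.FirstName LIKE 'Dev%'
    ORDER BY ea.EmailAddress
""").fetchall()

print(intersect_rows)  # person 4 matches the filter but has no email row, so is absent
print(intersect_rows == join_rows)
conn.close()
```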



In [6]:
df3 = run("""
SELECT ea.BusinessEntityID, ea.EmailAddress
FROM Person.EmailAddress AS ea
WHERE ea.BusinessEntityID IN (
    SELECT BusinessEntityID FROM Person.Person WHERE EmailPromotion = 2 AND FirstName LIKE 'Dev%'
    INTERSECT
    SELECT BusinessEntityID FROM Person.EmailAddress
)
ORDER BY ea.EmailAddress;
""")
df3


Unnamed: 0,BusinessEntityID,EmailAddress
0,4910,devin0@adventure-works.com
1,4934,devin12@adventure-works.com
2,4952,devin20@adventure-works.com
3,4959,devin24@adventure-works.com
4,4970,devin29@adventure-works.com
5,4974,devin32@adventure-works.com
6,4983,devin36@adventure-works.com
7,13699,devin45@adventure-works.com
8,4919,devin5@adventure-works.com
9,13716,devin54@adventure-works.com


### Proposition 4 – Job candidates who became employees  
**Description:**  
Identifies job candidates that later appear as employees.  

**How it works:**  
`INTERSECT` compares `BusinessEntityID`s from the JobCandidate and Employee tables, returning only IDs found in both. The outer query then groups those employees by `JobTitle` and counts them.  

**Why it’s useful:**  
Useful for HR analysis to see which candidate applications resulted in hires and which positions were filled from the candidate pool.
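
The query filters out NULL candidate IDs before the `INTERSECT`. A rough sketch on toy SQLite tables (all IDs and rows invented) suggests the filter is a harmless safeguard: a NULL can only survive an `INTERSECT` if the other side also contains a NULL, and an employee key column would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy stand-ins for HumanResources.JobCandidate and HumanResources.Employee.
    CREATE TABLE JobCandidate (JobCandidateID INTEGER, BusinessEntityID INTEGER);
    CREATE TABLE Employee (BusinessEntityID INTEGER, JobTitle TEXT);
    INSERT INTO JobCandidate VALUES (1, NULL), (2, 10), (3, NULL), (4, 20);
    INSERT INTO Employee VALUES
        (10, 'North American Sales Manager'),
        (20, 'Quality Assurance Supervisor'),
        (30, 'Design Engineer');
""")

# With the explicit NULL filter, as in the proposition.
with_filter = conn.execute("""
    SELECT BusinessEntityID FROM JobCandidate WHERE BusinessEntityID IS NOT NULL
    INTERSECT
    SELECT BusinessEntityID FROM Employee
    ORDER BY BusinessEntityID
""").fetchall()

# Without the filter: NULL candidates still find no match on the Employee side.
without_filter = conn.execute("""
    SELECT BusinessEntityID FROM JobCandidate
    INTERSECT
    SELECT BusinessEntityID FROM Employee
    ORDER BY BusinessEntityID
""").fetchall()

print(with_filter)
print(with_filter == without_filter)
conn.close()
```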



In [7]:
df4 = run("""
SELECT e.JobTitle, COUNT(*) AS NumEmployees
FROM HumanResources.Employee AS e
WHERE e.BusinessEntityID IN (
    SELECT BusinessEntityID FROM HumanResources.JobCandidate
    WHERE BusinessEntityID IS NOT NULL
    INTERSECT
    SELECT BusinessEntityID FROM HumanResources.Employee
)
GROUP BY e.JobTitle
ORDER BY NumEmployees DESC, e.JobTitle;
""")
df4


Unnamed: 0,JobTitle,NumEmployees
0,North American Sales Manager,1
1,Quality Assurance Supervisor,1


### Proposition 5 – Addresses located in StateProvinceID = 79  
**Description:**  
Shows addresses that belong specifically to the state with ID 79.  

**How it works:**  
Both SELECTs pull `AddressID`s from Person.Address; the second limits results to `StateProvinceID = 79`.  
`INTERSECT` returns only those IDs present in both sets.  

**Why it’s useful:**  
Helps analysts quickly isolate addresses from one region for targeted mailings or region-specific service tracking.
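
Because the second set is a subset of the first, the `INTERSECT` returns exactly the second set. The sketch below checks this on a toy SQLite table; the third row and its `StateProvinceID` of 10 are invented.

```python
import sqlite3

# Toy stand-in for Person.Address (the 'Elsewhere' row is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Address (AddressID INTEGER, AddressLine1 TEXT, City TEXT, StateProvinceID INTEGER)"
)
conn.executemany(
    "INSERT INTO Address VALUES (?, ?, ?, ?)",
    [
        (1, "1970 Napa Ct.", "Bothell", 79),
        (2, "9833 Mt. Dias Blv.", "Bothell", 79),
        (3, "1 Example Rd", "Elsewhere", 10),
    ],
)

# All IDs INTERSECT the IDs in state 79.
intersect_ids = conn.execute("""
    SELECT AddressID FROM Address
    INTERSECT
    SELECT AddressID FROM Address WHERE StateProvinceID = 79
    ORDER BY AddressID
""").fetchall()

# The second branch on its own.
filtered_ids = conn.execute("""
    SELECT AddressID FROM Address WHERE StateProvinceID = 79
    ORDER BY AddressID
""").fetchall()

print(intersect_ids)
print(intersect_ids == filtered_ids)
conn.close()
```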



In [8]:
df5 = run("""
SELECT AddressID, AddressLine1, City, StateProvinceID
FROM Person.Address
WHERE AddressID IN (
    SELECT AddressID FROM Person.Address
    INTERSECT
    SELECT AddressID FROM Person.Address WHERE StateProvinceID = 79
)
ORDER BY AddressID;
""")
df5


Unnamed: 0,AddressID,AddressLine1,City,StateProvinceID
0,1,1970 Napa Ct.,Bothell,79
1,2,9833 Mt. Dias Blv.,Bothell,79
2,3,7484 Roundtree Drive,Bothell,79
3,4,9539 Glenside Dr,Bothell,79
4,5,1226 Shoe St.,Bothell,79
...,...,...,...,...
2631,32517,177 11th Ave,Sammamish,79
2632,32518,8040 Hill Ct,Redmond,79
2633,32519,137 Mazatlan,Seattle,79
2634,32520,5863 Sierra,Bellevue,79


### Proposition 6 – Employees from departments 1 and 2  
**Description:**  
Combines employee lists from Department 1 and Department 2 into one view.  

**How it works:**  
Two SELECTs retrieve the same columns (name and department).  
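`UNION` merges both sets and removes duplicate rows. Because `DepartmentID` is part of each row, a person who appears in both departments (Rob Walters in the output) is listed once per department.  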

**Why it’s useful:**  
Useful for management when comparing or merging staff rosters from related departments or teams in joint projects.
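
A small sketch of how `UNION` deduplicates whole rows rather than names, compared with `UNION ALL`. It uses an in-memory SQLite table named `History` (a hypothetical, flattened stand-in for the joined tables); the repeated Gail Erickson row is invented to make the deduplication visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE History (EmployeeName TEXT, DepartmentID INTEGER)")
conn.executemany(
    "INSERT INTO History VALUES (?, ?)",
    [
        ("Rob Walters", 1),
        ("Rob Walters", 2),
        ("Gail Erickson", 1),
        ("Gail Erickson", 1),  # invented exact duplicate row
    ],
)

query = """
    SELECT EmployeeName, DepartmentID FROM History WHERE DepartmentID = 1
    {op}
    SELECT EmployeeName, DepartmentID FROM History WHERE DepartmentID = 2
    ORDER BY EmployeeName, DepartmentID
"""

union_rows = conn.execute(query.format(op="UNION")).fetchall()
union_all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()

print(union_rows)      # Gail once; Rob twice, because the department differs
print(union_all_rows)  # every input row, duplicates included
conn.close()
```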



In [9]:
df6 = run("""
SELECT p.FirstName + ' ' + p.LastName AS EmployeeName, e.DepartmentID
FROM HumanResources.EmployeeDepartmentHistory AS e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID
WHERE e.DepartmentID = 1
UNION
SELECT p.FirstName + ' ' + p.LastName, e.DepartmentID
FROM HumanResources.EmployeeDepartmentHistory AS e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID
WHERE e.DepartmentID = 2;
""")
df6


Unnamed: 0,EmployeeName,DepartmentID
0,Gail Erickson,1
1,Janice Galvin,2
2,Jossef Goldberg,1
3,Michael Sullivan,1
4,Ovidiu Cracium,2
5,Rob Walters,1
6,Rob Walters,2
7,Roberto Tamburello,1
8,Sharon Salavaria,1
9,Terri Duffy,1


### Proposition 7 – Employees in Department 1 except those with names starting with 'D'  
**Description:**  
Lists Department 1 employees but removes anyone whose first name starts with D.  

**How it works:**  
The first query selects all employees in Department 1.  
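The second selects Department 1 employees whose first names start with D.  
`EXCEPT` subtracts the second set from the first. In this data no Department 1 employee has a first name starting with D, so the output is the full Department 1 list.  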

**Why it’s useful:**  
Demonstrates how `EXCEPT` filters out a subset of data; could be used to exclude a specific group from communications or reports.



In [10]:
df7 = run("""
SELECT p.FirstName + ' ' + p.LastName AS EmployeeName, e.DepartmentID
FROM HumanResources.EmployeeDepartmentHistory AS e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID
WHERE e.DepartmentID = 1
EXCEPT
SELECT p.FirstName + ' ' + p.LastName, e.DepartmentID
FROM HumanResources.EmployeeDepartmentHistory AS e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID
WHERE e.DepartmentID = 1 AND p.FirstName LIKE 'D%';
""")
df7


Unnamed: 0,EmployeeName,DepartmentID
0,Gail Erickson,1
1,Jossef Goldberg,1
2,Michael Sullivan,1
3,Rob Walters,1
4,Roberto Tamburello,1
5,Sharon Salavaria,1
6,Terri Duffy,1


### Proposition 8 – Employee job titles: Managers and Engineers  
**Description:**  
Combines employees whose job titles include 'Manager' or 'Engineer' using `UNION`.  

**How it works:**  
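Two simple SELECTs return the same two columns (`JobTitle`, `LoginID`); `UNION` merges them and removes duplicates, so a row matching both patterns, such as Engineering Manager, appears only once.  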

**Why it’s useful:**  
Allows HR to view all technical and managerial roles together for workforce analysis without duplication.
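
The two `LIKE` patterns can overlap: a title such as "Engineering Manager" contains both "Manager" and "Engineer". The sketch below, on a toy SQLite table with simplified and partly invented logins, shows `UNION` returning that row once where `UNION ALL` would return it twice.

```python
import sqlite3

# Toy stand-in for HumanResources.Employee; logins are simplified, 'buyer0' is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (JobTitle TEXT, LoginID TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [
        ("Engineering Manager", "roberto0"),
        ("Design Engineer", "gail0"),
        ("Finance Manager", "wendy0"),
        ("Buyer", "buyer0"),
    ],
)

query = """
    SELECT JobTitle, LoginID FROM Employee WHERE JobTitle LIKE '%Manager%'
    {op}
    SELECT JobTitle, LoginID FROM Employee WHERE JobTitle LIKE '%Engineer%'
    ORDER BY JobTitle
"""

union_rows = conn.execute(query.format(op="UNION")).fetchall()
union_all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()

print(union_rows)           # three distinct rows
print(len(union_all_rows))  # four: 'Engineering Manager' matched both patterns
conn.close()
```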



In [11]:
df8 = run("""
SELECT JobTitle, LoginID 
FROM HumanResources.Employee
WHERE JobTitle LIKE '%Manager%'
UNION
SELECT JobTitle, LoginID 
FROM HumanResources.Employee
WHERE JobTitle LIKE '%Engineer%'
ORDER BY JobTitle;
""")
df8


Unnamed: 0,JobTitle,LoginID
0,Accounts Manager,adventure-works\david6
1,Design Engineer,adventure-works\gail0
2,Design Engineer,adventure-works\jossef0
3,Design Engineer,adventure-works\sharon0
4,Document Control Manager,adventure-works\zainal0
5,Engineering Manager,adventure-works\roberto0
6,European Sales Manager,adventure-works\amy0
7,Facilities Manager,adventure-works\gary1
8,Finance Manager,adventure-works\wendy0
9,Human Resources Manager,adventure-works\paula0


### Proposition 9 – Addresses located in Dallas or Nevada  
**Description:**  
Retrieves all addresses whose `City` is Dallas or Nevada.  
The `UNION` operator combines two separate queries, one for each city, into a single result set.  
Unlike `UNION ALL`, `UNION` removes duplicate rows automatically, so each address appears only once.  

**How it works:**  
1. The first SELECT gets all addresses where City = 'Dallas'.  
2. The second SELECT gets all addresses where City = 'Nevada'.  
3. `UNION` merges both result sets and removes duplicates.  
4. `ORDER BY` sorts the results alphabetically by City.  

**Why it’s useful (Analyst POV):**  
Helps analysts merge regional data (e.g., addresses from multiple states or cities) into a single clean list for reporting, logistics, or marketing campaigns.  
It simplifies analysis by avoiding repeated rows and combining two datasets efficiently.
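
The four steps above can be sketched on a toy in-memory SQLite table (the Bothell row is only there to be filtered out). The point to notice is that a trailing `ORDER BY` applies to the combined `UNION` result, not just to the second `SELECT`.

```python
import sqlite3

# Toy stand-in for Person.Address with a handful of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Address (AddressID INTEGER, AddressLine1 TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Address VALUES (?, ?, ?)",
    [
        (577, "2500 North Stemmons Freeway", "Dallas"),
        (27, "2487 Riverside Drive", "Nevada"),
        (25, "9178 Jumping St.", "Dallas"),
        (1, "1970 Napa Ct.", "Bothell"),
    ],
)

rows = conn.execute("""
    SELECT AddressID, AddressLine1, City FROM Address WHERE City = 'Dallas'
    UNION
    SELECT AddressID, AddressLine1, City FROM Address WHERE City = 'Nevada'
    ORDER BY City, AddressID  -- sorts the merged result as a whole
""").fetchall()

cities = [city for _, _, city in rows]
print(rows)
print(cities)
conn.close()
```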



In [12]:
df9 = run("""
SELECT AddressID, AddressLine1, City
FROM Person.Address
WHERE City LIKE 'Dallas'
UNION
SELECT AddressID, AddressLine1, City
FROM Person.Address
WHERE City LIKE 'Nevada'
ORDER BY City;
""")
df9


Unnamed: 0,AddressID,AddressLine1,City
0,577,2500 North Stemmons Freeway,Dallas
1,25,9178 Jumping St.,Dallas
2,325,9491 Toyon Dr,Dallas
3,588,"99828 Routh Street, Suite 825",Dallas
4,572,P.O. Box 6256916,Dallas
5,581,Po Box 8035996,Dallas
6,574,Po Box 8259024,Dallas
7,27,2487 Riverside Drive,Nevada


### Proposition 10 – Employees without middle names  
**Description:**  
Displays all employees who do not have a recorded middle name.  
Uses `EXCEPT` to remove anyone who has a middle name from the full list of employees.  

**How it works:**  
The first SELECT gets all employee names.  
The second SELECT gets names of employees who have a non-NULL middle name.  
`EXCEPT` subtracts the second set from the first, leaving only employees with no middle name.  

**Why it’s useful:**  
Helpful for HR data cleanup and completeness checks: it identifies records with a missing field (here, a blank middle name) so they can be reviewed and updated in personnel files.
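
One caveat worth checking: the `EXCEPT` here compares concatenated names, not people. The sketch below uses a toy SQLite table in which a second, hypothetical "Rob Walters" has a middle name; it is not a claim about the actual AdventureWorks data. In that situation the name-based `EXCEPT` would drop both people, while a row-level `MiddleName IS NULL` filter keeps the one without a middle name. SQLite concatenates with `||` where T-SQL uses `+`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy stand-in for the employee rows of Person.Person (person 2 is invented).
    CREATE TABLE Person (BusinessEntityID INTEGER, FirstName TEXT, MiddleName TEXT, LastName TEXT);
    INSERT INTO Person VALUES
        (1, 'Rob',   NULL,  'Walters'),
        (2, 'Rob',   'J',   'Walters'),
        (3, 'Terri', 'Lee', 'Duffy'),
        (4, 'Bryan', NULL,  'Baker');
""")

# Name-based EXCEPT, same shape as the proposition.
by_name = [row[0] for row in conn.execute("""
    SELECT FirstName || ' ' || LastName AS EmployeeName FROM Person
    EXCEPT
    SELECT FirstName || ' ' || LastName FROM Person WHERE MiddleName IS NOT NULL
    ORDER BY EmployeeName
""")]

# Row-level filter on the middle name.
by_row = [row[0] for row in conn.execute("""
    SELECT FirstName || ' ' || LastName AS EmployeeName FROM Person
    WHERE MiddleName IS NULL
    ORDER BY EmployeeName
""")]

print(by_name)  # the shared name is subtracted for both people
print(by_row)   # the Rob Walters without a middle name is kept
conn.close()
```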


In [13]:
df10 = run("""
SELECT p.FirstName + ' ' + p.LastName AS EmployeeName
FROM Person.Person AS p
WHERE p.BusinessEntityID IN (SELECT BusinessEntityID FROM HumanResources.Employee)
EXCEPT
SELECT p.FirstName + ' ' + p.LastName
FROM Person.Person AS p
WHERE p.BusinessEntityID IN (SELECT BusinessEntityID FROM HumanResources.Employee)
  AND p.MiddleName IS NOT NULL;
""")
df10


Unnamed: 0,EmployeeName
0,A. Scott Wright
1,Bryan Baker
2,Eric Gubbels
3,Jian Shuo Wang
4,Jillian Carson
5,Krishna Sunkammurali
6,Michael Raheem
7,Nuan Yu
8,Rob Walters
9,Roberto Tamburello


### Final Reflection – NACE Competencies & Collaboration Notes  

During this assignment, I was originally paired with **Andrea**, but she was unfortunately dropped from the group at the last moment.  
By that point, most teams were already finalized, so the path of least resistance was for our team to restructure into a **2 – 2 – 1** split.  
I took on the **individual role**, contributing independently while still aligning with my teammates’ shared objectives and formatting standards.

This project required us to transition into the **new SQLAlchemy + `engine.connect()` setup**, which introduced unexpected connection issues and debugging challenges at first.  
Working through that process strengthened several key **NACE Career Competencies**:

- **Critical Thinking / Problem Solving:** I had to troubleshoot the new database connection syntax and ensure every query executed properly in the updated environment.  
- **Technology Application:** I adapted from the older `%sql` magic workflow to the more modern `create_engine()` and `run()` function pattern, learning how professional analysts interact with databases programmatically.  
- **Professionalism / Work Ethic:** Despite the group changes and setup challenges, I stayed consistent in formatting, commenting, and testing all ten propositions for reproducibility.  
- **Teamwork / Collaboration:** Even working as a single contributor in the 2 – 2 – 1 division, I maintained shared communication and cross-checked query structures with the rest of Group 5 to preserve overall consistency.

Overall, this assignment pushed me to be **self-reliant, adaptive, and technically precise**, mirroring real-world scenarios where tools or teams can change unexpectedly and you must still deliver clean, functional results.
