### **Key Advanced Topics**

## **CRUD Ops**

In [None]:
# Initialize "users table"
users = []

# -------------------------------
# CREATE (equivalent of SQL INSERT)
# -------------------------------
def create_user(user_id, name):
    """Insert a new user into the table."""
    users.append({"id": user_id, "name": name})
    return f"User {name} added!"

# -------------------------------
# READ (equivalent of SQL SELECT)
# -------------------------------
def read_user(user_id):
    """Select a user by ID."""
    for user in users:
        if user["id"] == user_id:
            return user
    return "User not found."

# -------------------------------
# UPDATE (equivalent of SQL UPDATE)
# -------------------------------
def update_user(user_id, new_name):
    """Update a user's name by ID."""
    for user in users:
        if user["id"] == user_id:
            user["name"] = new_name
            return f"User {user_id} updated to {new_name}"
    return "User not found."

# -------------------------------
# DELETE (equivalent of SQL DELETE)
# -------------------------------
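def delete_user(user_id):
    """Delete a user by ID."""
    before = len(users)
    users[:] = [u for u in users if u["id"] != user_id]  # modify the list in place (no global needed)
    if len(users) == before:
        return "User not found."
    return f"User {user_id} deleted."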

# -------------------------------
# EXAMPLE RUN
# -------------------------------
print(create_user(1, "Kev"))         # INSERT
print(create_user(2, "Jordan"))      # INSERT
print(read_user(1))                  # SELECT
print(update_user(2, "Mike"))        # UPDATE
print(delete_user(1))                # DELETE
print(users)                         # Show remaining table
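The `users` table above is a plain in-memory list, so it is lost when the kernel restarts. One way to persist it is a JSON file with a unique-ID check and graceful handling of a missing, empty, or corrupted file. This is a sketch under assumptions: the file name `users.json` and the helpers `load_users`, `save_users`, and `add_user` are hypothetical.

```python
import json
from pathlib import Path


def load_users(path="users.json"):
    """Load the users table; return [] if the file is missing, empty, or corrupted."""
    file = Path(path)
    if not file.exists():
        return []
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        # Empty or malformed file: start with an empty table instead of crashing
        return []
    return data if isinstance(data, list) else []


def save_users(users, path="users.json"):
    """Write the whole table back to disk as pretty-printed JSON."""
    Path(path).write_text(json.dumps(users, indent=2), encoding="utf-8")


def add_user(user_id, name, path="users.json"):
    """INSERT only if the ID is not already taken (unique-ID validation)."""
    users = load_users(path)
    if any(u["id"] == user_id for u in users):
        return f"User ID {user_id} already exists."
    users.append({"id": user_id, "name": name})
    save_users(users, path)
    return f"User {name} added!"
```

Starting from no file, calling `add_user(1, "Kev")` twice adds Kev once and then returns `User ID 1 already exists.`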



## **Command Line Interface (CLI) Program**

In [None]:
from argparse import ArgumentParser  # For parsing command line arguments
 
parser = ArgumentParser() # Create the parser

parser.add_argument('--output', '-o', required=True, help='The destination file for the output of this program') # Add the arguments to the parser
parser.add_argument('--text', '-t', required=True, help='The text to write to the file') # Add the arguments to the parser


args = parser.parse_args()  # Parse the command line arguments

with open(args.output, 'w') as f: # Open the output file for writing
    f.write(args.text+'\n') # Write the text to the file

print(f'Wrote "{args.text}" to file "{args.output}"')  # Print a message indicating success

## Things you may want to adjust:
# File path - For flexibility, make the filename a CLI argument (e.g., --file users.json).
# Data format - JSON is easiest, but CSV also works (it needs the csv module and a fixed set of columns).
# Validation - Once data persists between runs, prevent duplicates by enforcing unique IDs.
# Error handling - If the file is empty or corrupted, handle it gracefully with try/except.

usage: ipykernel_launcher.py [-h] --output OUTPUT --text TEXT
ipykernel_launcher.py: error: the following arguments are required: --output/-o, --text/-t


SystemExit: 2
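The `SystemExit: 2` above happens because `parse_args()` reads `sys.argv`, which inside Jupyter holds the kernel launcher's arguments rather than `--output` and `--text`. A minimal sketch for trying the same parser in a notebook passes an explicit argument list instead; the file name `notebook_demo.txt` and the text are placeholder values.

```python
from argparse import ArgumentParser
from pathlib import Path

# Same two required options as the CLI program above
parser = ArgumentParser()
parser.add_argument('--output', '-o', required=True, help='The destination file for the output of this program')
parser.add_argument('--text', '-t', required=True, help='The text to write to the file')

# Passing a list bypasses sys.argv, so this works inside a notebook cell.
# 'notebook_demo.txt' and 'Hello from a notebook' are placeholders.
args = parser.parse_args(['-o', 'notebook_demo.txt', '--text', 'Hello from a notebook'])

Path(args.output).write_text(args.text + '\n')
print(f'Wrote "{args.text}" to file "{args.output}"')
```

From a terminal, the original cell works unchanged because the real command-line arguments are in `sys.argv`.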



**Example**

**1. Navigation & Files**

**# CLI commands**

ls       # list files

cd ..    # move up one directory (cd <folder> enters a folder)

pwd      # show current path

mkdir test_folder   # make directory

rm -r test_folder   # remove a directory and everything inside it (permanent, no recycle bin)


**2. Chaining**

In [None]:
cat file.txt | grep "error" > errors.txt
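The pipeline reads `file.txt`, keeps only the lines containing `error`, and redirects (`>`) them into `errors.txt`, overwriting that file. As a sketch, the same logic in Python could look like this (handy where `grep` may not be available, such as a default Windows shell); the helper name `grep_to_file` is hypothetical, and the file names come from the command above.

```python
def grep_to_file(src="file.txt", dest="errors.txt", pattern="error"):
    """Copy every line of src containing pattern into dest (like: cat src | grep pattern > dest)."""
    matches = []
    with open(src, encoding="utf-8") as infile:
        for line in infile:
            # For a literal pattern, a case-sensitive substring check matches grep without -i
            if pattern in line:
                matches.append(line)
    # Mode "w" overwrites dest, mirroring the shell's ">" redirect (">>" would append)
    with open(dest, "w", encoding="utf-8") as outfile:
        outfile.writelines(matches)
    return len(matches)
```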


**3. Process Control**

In [None]:
ps aux       # show all running processes with their PIDs
kill -9 PID  # force-kill a process (replace PID with the process ID from ps)
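`kill -9` sends SIGKILL, which stops a process immediately without letting it clean up; plain `kill` sends the gentler SIGTERM. A minimal sketch of the same control flow with Python's standard `subprocess` module follows; the throwaway child process that sleeps for 30 seconds is an assumption for demonstration.

```python
import subprocess
import sys

# Start a throwaway child process (a Python interpreter that sleeps for 30 s)
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
print("Started child with PID", child.pid)

# Ask politely first: terminate() sends SIGTERM on POSIX (like plain `kill PID`)
child.terminate()
try:
    child.wait(timeout=5)
except subprocess.TimeoutExpired:
    # Fall back to kill(): SIGKILL on POSIX (like `kill -9 PID`)
    child.kill()
    child.wait()

# On POSIX, a negative return code is the number of the signal that stopped the process
print("Child exited with return code", child.returncode)
```

Trying `terminate()` before `kill()` mirrors the usual shell habit of reaching for `-9` only when a process ignores the normal signal.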


**4. Scripting with Python + argparse**

In [None]:
# save as cli_demo.py
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--name", "-n", required=True)
args = parser.parse_args()

print(f"Hello {args.name}, CLI works!")


**Then run file in terminal**




In [None]:
python cli_demo.py -n Kevin   # prints: Hello Kevin, CLI works!

**How They Work Together (ETL Pipeline)**

In [None]:
"""
ETL Demo:
1. Fetch data from API (Extract).
2. Transform with pandas.
3. Save to CSV (Load).
4. Automate with cron/Task Scheduler.
"""
import requests
import pandas as pd

# Extract
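url = "https://jsonplaceholder.typicode.com/users"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop the pipeline if the API returned an error status
data = response.json()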

# Transform
df = pd.DataFrame(data)[["id", "name", "email"]]

# Load
df.to_csv("users.csv", index=False)

print("Pipeline complete! Data saved to users.csv")
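Splitting the three stages into functions makes each one testable on its own and easier to schedule later. The sketch below is a variant under assumptions, not the pipeline above: it swaps pandas and requests for the standard-library `csv` and `urllib` modules, and the function names `extract`, `transform`, `load`, and `run_pipeline` are hypothetical.

```python
import csv
import json
from urllib.request import urlopen

URL = "https://jsonplaceholder.typicode.com/users"


def extract(url=URL):
    """Extract: fetch the JSON list of users (urlopen raises HTTPError on 4xx/5xx)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def transform(records):
    """Transform: keep only the id, name, and email fields of each record."""
    return [{"id": r["id"], "name": r["name"], "email": r["email"]} for r in records]


def load(rows, path="users.csv"):
    """Load: write the rows to CSV with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name", "email"])
        writer.writeheader()
        writer.writerows(rows)


def run_pipeline():
    """Run all three stages; this is the function a scheduler would call."""
    load(transform(extract()))
    print("Pipeline complete! Data saved to users.csv")
```

Calling `run_pipeline()` fetches the live data and writes `users.csv`, while `transform` and `load` can be checked offline with a small hand-made list of records.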


## **Python & SQL Connection**

**Python & PostgreSQL Connection Using SQLAlchemy | STEP 1**

In [None]:
## First do: pip install sqlalchemy psycopg2   # psycopg2 for PostgreSQL, or another driver

from sqlalchemy import create_engine, text

## Create an engine (it manages connections to the database)
engine = create_engine("postgresql+psycopg2://postgres:password@localhost:0000/database_name") ## Database Engine Connector

## Open a connection and send a SQL command
with engine.connect() as conn:
    result = conn.execute(text("SELECT version();"))
    print(result.fetchone())


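## Things to change for your setup:
# Username (postgres) and password
# Host (localhost if the database runs on your machine)
# Port (0000 is a placeholder; PostgreSQL's default is 5432)
# Database name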


('PostgreSQL 17.6 on x86_64-windows, compiled by msvc-19.44.35213, 64-bit',)


**Creating a Table from Python That Appears in PostgreSQL | STEP 2**

In [None]:
with engine.begin() as conn:
    conn.execute(text("CREATE SCHEMA IF NOT EXISTS portfolio;"))
    conn.execute(text("""
        CREATE TABLE IF NOT EXISTS portfolio.ai_learning_population (        
            id SERIAL PRIMARY KEY,
            score NUMERIC,
            stage VARCHAR(20)
        )
    """))

## Things to change:
# Schema design (schema name, table name, and column definitions)
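A natural next step is inserting and reading rows with bound parameters instead of string formatting, which protects against SQL injection. Since a running PostgreSQL server cannot be assumed, this sketch swaps in Python's built-in `sqlite3` with an in-memory database; SQLite has no `portfolio` schema, so the table is created without it, and the sample scores and stages are made up. With SQLAlchemy, the same idea uses named placeholders, e.g. `conn.execute(text("INSERT INTO portfolio.ai_learning_population (score, stage) VALUES (:score, :stage)"), {"score": 72.5, "stage": "beginner"})`.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for a runnable demo)
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS ai_learning_population (
        id INTEGER PRIMARY KEY,   -- SQLite's closest equivalent of SERIAL
        score NUMERIC,
        stage VARCHAR(20)
    )
    """
)

# Hypothetical sample rows; "?" placeholders let the driver pass values safely
rows = [(72.5, "beginner"), (88.0, "intermediate"), (95.5, "advanced")]
with conn:  # commits on success, rolls back on error (similar to engine.begin())
    conn.executemany(
        "INSERT INTO ai_learning_population (score, stage) VALUES (?, ?)", rows
    )

# Parameterized SELECT: rows with a score of at least 80
high = conn.execute(
    "SELECT id, score, stage FROM ai_learning_population WHERE score >= ? ORDER BY id",
    (80,),
).fetchall()
print(high)
```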


## **APIs & HTTP (Hypertext Transfer Protocol) Methods**

In [None]:
import requests

base_url = "https://jsonplaceholder.typicode.com/users"  # fake test API: writes are simulated, not saved

# GET (Read)
resp = requests.get(f"{base_url}/1") 
print(resp.json())   # user with id=1

# POST (Create)
resp = requests.post(base_url, json={"name": "Kev"})
print(resp.json())

# PUT (Update - full)
resp = requests.put(f"{base_url}/1", json={"name": "Kevin Updated"})
print(resp.json())

# PATCH (Update - partial)
resp = requests.patch(f"{base_url}/1", json={"name": "Kev R"})
print(resp.json())

# DELETE
resp = requests.delete(f"{base_url}/1")
print(resp.status_code)  # 200 if success
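The five calls above differ mainly in the HTTP method and in whether a JSON body is attached. To inspect those requests without sending anything, this sketch swaps in the standard library's `urllib.request.Request`; the helper name `build_request` is hypothetical.

```python
import json
from urllib.request import Request

base_url = "https://jsonplaceholder.typicode.com/users"


def build_request(method, url, payload=None):
    """Build (but do not send) an HTTP request; payload is serialized as a JSON body."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


requests_to_send = [
    build_request("GET", f"{base_url}/1"),                             # read
    build_request("POST", base_url, {"name": "Kev"}),                  # create
    build_request("PUT", f"{base_url}/1", {"name": "Kevin Updated"}),  # full update
    build_request("PATCH", f"{base_url}/1", {"name": "Kev R"}),        # partial update
    build_request("DELETE", f"{base_url}/1"),                          # delete
]

for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data)
# Sending one would be: urllib.request.urlopen(req, timeout=10)
```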


**REST API**

In [None]:
# REST API Demo with JSONPlaceholder
import requests

url = "https://jsonplaceholder.typicode.com/users"

# GET (read) 
print("GET:", requests.get(url + "/1").json()) 

# POST (create)
print("POST:", requests.post(url, json={"name": "Kev"}).json()) 

# PUT (update full)
print("PUT:", requests.put(url + "/1", json={"name": "Kevin Updated"}).json())

# DELETE
print("DELETE:", requests.delete(url + "/1").status_code)


**SOAP API (Legacy & XML)**

In [None]:
from zeep import Client

wsdl = "http://www.dneonline.com/calculator.asmx?WSDL"
client = Client(wsdl)
print("SOAP Add:", client.service.Add(5, 7))

# --- Step 1: Import the SOAP client library (zeep) ---
# Zeep is a Python library that makes it easy to work with SOAP APIs
# SOAP (Simple Object Access Protocol) = XML-based web services (older than REST)
# --- Step 2: Point to the WSDL (Web Services Description Language) ---
# A WSDL file describes the SOAP service: available operations, inputs, outputs.
# Here we use a public demo service: "Calculator"
# --- Step 3: Create a SOAP client bound to the WSDL ---
# The Client object reads the WSDL and creates Python methods
# that map to the SOAP operations (e.g., Add, Subtract, Multiply, Divide).
# --- Step 4: Call a SOAP operation ---
# Now we can call the service like a Python method.
# Example: Add(5, 7) should return 12.

# --- Step 5: Print the result ---



SOAP Add: 12
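Because SOAP is XML over HTTP, what zeep sends for `Add(5, 7)` is an XML envelope. The sketch below builds a SOAP 1.1 envelope by hand with `xml.etree.ElementTree` to show its shape. The `http://tempuri.org/` namespace and the `intA`/`intB` element names are assumptions about this demo calculator's WSDL, so check the WSDL before relying on them; `build_add_envelope` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://tempuri.org/"  # assumed target namespace of the calculator service

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("tem", SERVICE_NS)


def build_add_envelope(a, b):
    """Return the XML text of a SOAP 1.1 request for the (assumed) Add(intA, intB) operation."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    add = ET.SubElement(body, f"{{{SERVICE_NS}}}Add")
    ET.SubElement(add, f"{{{SERVICE_NS}}}intA").text = str(a)
    ET.SubElement(add, f"{{{SERVICE_NS}}}intB").text = str(b)
    return ET.tostring(envelope, encoding="unicode")


xml_text = build_add_envelope(5, 7)
print(xml_text)
```

zeep generates an envelope like this from the WSDL automatically, which is the main reason to use it instead of building XML by hand.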


**GraphQL API**

In [None]:
import requests

url = "https://countries.trevorblades.com/"
query = """
{
  country(code: "US") {
    name
    capital
    emoji
  }
}
"""
r = requests.post(url, json={"query": query})
print("GraphQL:", r.json())

# --- Step 1: Import requests library ---
# requests lets us send HTTP requests in Python (GET, POST, etc.)
# --- Step 2: Define the GraphQL endpoint ---
# This is a public GraphQL API that provides data about countries.
# --- Step 3: Write the GraphQL query ---
# GraphQL queries look like JSON, but they describe *what fields we want back*.
# Here: get the country with code "US" and return only name, capital, and emoji.
# --- Step 4: Send the query to the API ---
# GraphQL APIs typically receive queries via POST requests.
# The query must be sent as JSON with the key "query".
# --- Step 5: Print the response in JSON format ---
# The API will only return the fields we asked for.


GraphQL: {'data': {'country': {'capital': 'Washington D.C.', 'emoji': '🇺🇸', 'name': 'United States'}}}
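Hard-coding `"US"` into the query string works, but GraphQL also supports variables, sent as a separate `variables` object next to `query`, so one query can be reused for any country. A sketch, assuming the countries API declares the argument as `country(code: ID!)` (confirm in its schema); the helpers `build_country_payload` and `fetch_country` are hypothetical.

```python
COUNTRY_QUERY = """
query GetCountry($code: ID!) {
  country(code: $code) {
    name
    capital
    emoji
  }
}
"""


def build_country_payload(code):
    """Return the JSON body for a GraphQL POST: the query plus its variables."""
    return {"query": COUNTRY_QUERY, "variables": {"code": code.upper()}}


def fetch_country(code, url="https://countries.trevorblades.com/"):
    """Send the query (requires network access and the requests package)."""
    import requests  # imported here so the payload builder works without it

    r = requests.post(url, json=build_country_payload(code), timeout=10)
    r.raise_for_status()
    body = r.json()
    # GraphQL servers often report query problems in an "errors" list even with HTTP 200
    if "errors" in body:
        raise RuntimeError(body["errors"])
    return body["data"]["country"]
```

For example, `fetch_country("FR")` would return only the `name`, `capital`, and `emoji` fields for France.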


**Authentication**

In [None]:
import requests

API_KEY = "your_api_key_here"
url = "https://api.openweathermap.org/data/2.5/weather"
params = {"q": "London", "appid": API_KEY}

r = requests.get(url, params=params)
print("Weather:", r.json())

# --- Step 1: Import requests library ---
# We'll use requests to send HTTP requests to the weather API.
# --- Step 2: Define API credentials and endpoint ---
# You need a valid API key from OpenWeatherMap (free signup).
# --- Step 3: Define parameters for the request ---
# "q" = city name
# "appid" = authentication with your API key
# --- Step 4: Send GET request with parameters ---
# requests.get() will build the full URL with ?q=London&appid=...
# --- Step 5: Print the response in JSON format ---
# The API responds with weather data in JSON format.


Weather: {'cod': 401, 'message': 'Invalid API key. Please see https://openweathermap.org/faq#error401 for more info.'}
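The 401 response above is expected with the placeholder key. Rather than pasting a real key into the notebook, a common approach is to read it from an environment variable. In this sketch the variable name `OPENWEATHER_API_KEY` and the helper `build_weather_params` are assumptions; the `q` and `appid` parameters come from the cell above.

```python
import os


def build_weather_params(city, env_var="OPENWEATHER_API_KEY"):
    """Build query parameters, reading the API key from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending a request that returns 401
        raise RuntimeError(f"Set the {env_var} environment variable to your OpenWeatherMap key.")
    return {"q": city, "appid": api_key}


# Usage with requests, as in the cell above:
# r = requests.get("https://api.openweathermap.org/data/2.5/weather",
#                  params=build_weather_params("London"), timeout=10)
# r.raise_for_status()
```

Keeping the key out of the code also means the notebook can be shared or committed without leaking credentials.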


## **WEB SCRAPING**

**Web Scraping Example with BeautifulSoup #1**

In [None]:
import requests # type: ignore
from bs4 import BeautifulSoup

url = "http://books.toscrape.com/"
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}

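response = requests.get(url, headers=headers, timeout=10)
response.encoding = "utf-8"  # the page is UTF-8; without this, "£" is mis-decoded as "Â£"
soup = BeautifulSoup(response.text, "html.parser")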

books = []

# Each book is inside <article class="product_pod">
for book in soup.find_all("article", class_="product_pod"):
    title = book.h3.a["title"]
    price = book.find("p", class_="price_color").get_text(strip=True)
    books.append({"title": title, "price": price})

# Show first 5
for b in books[:5]:
    print(b)

# --- Step 1: Import libraries ---
# requests: to send HTTP requests and fetch HTML
# BeautifulSoup: to parse and extract data from HTML
# --- Step 2: Define the target URL ---
# We're scraping a demo site made for practice: "Books to Scrape"
# --- Step 3: Add headers ---
# This makes the request look like it's coming from a browser (helps avoid blocks).
# --- Step 4: Send GET request to fetch the page ---
# --- Step 5: Parse HTML with BeautifulSoup ---
# --- Step 6: Extract info ---



{'title': 'A Light in the Attic', 'price': '£51.77'}
{'title': 'Tipping the Velvet', 'price': '£53.74'}
{'title': 'Soumission', 'price': '£50.10'}
{'title': 'Sharp Objects', 'price': '£47.82'}
{'title': 'Sapiens: A Brief History of Humankind', 'price': '£54.23'}
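The prices come back as text such as `£51.77`, so they need converting before any numeric analysis. A minimal sketch: the hypothetical helper `parse_price` keeps only digits and the decimal point, so it also copes with a mis-decoded prefix like `Â£`, which appears when the page is decoded with the wrong character set.

```python
import re


def parse_price(text):
    """Convert a price string such as '£51.77' to a float by dropping everything but digits and '.'."""
    cleaned = re.sub(r"[^0-9.]", "", text)
    if not cleaned:
        raise ValueError(f"No numeric price found in {text!r}")
    return float(cleaned)


sample = [
    {"title": "A Light in the Attic", "price": "£51.77"},
    {"title": "Tipping the Velvet", "price": "Â£53.74"},  # mis-decoded prefix still parses
]
for book in sample:
    book["price_value"] = parse_price(book["price"])
print(sample)
```

With a `books` list like the one scraped above, `[parse_price(b["price"]) for b in books]` gives numbers ready for `sum()` or a pandas column.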


**Web Scraping Example with BeautifulSoup #2**

In [None]:
import requests
from bs4 import BeautifulSoup

url = "http://quotes.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

quotes = []

for q in soup.find_all("div", class_="quote"):
    text = q.find("span", class_="text").get_text(strip=True)
    author = q.find("small", class_="author").get_text(strip=True)
    quotes.append({"quote": text, "author": author})

# Show first 5
for q in quotes[:5]:
    print(q)

# --- Step 1: Import libraries ---
# requests: send HTTP requests to fetch HTML
# BeautifulSoup: parse and extract information from HTML
# --- Step 2: Define target URL ---
# "Quotes to Scrape" is a sandbox site for practicing scraping
# --- Step 3: Fetch the page content ---
# --- Step 4: Parse HTML with BeautifulSoup ---
# --- Step 5: Extract quotes ---
# Each quote is inside <div class="quote">
# --- Step 6: Show sample results ---
# Print the first 5 scraped quotes

{'quote': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'author': 'Albert Einstein'}
{'quote': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'author': 'J.K. Rowling'}
{'quote': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'author': 'Albert Einstein'}
{'quote': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'author': 'Jane Austen'}
{'quote': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'author': 'Marilyn Monroe'}


**Example: Loading API Data Directly into a pandas DataFrame for Analysis**

In [None]:
import requests
import pandas as pd

url = "https://jsonplaceholder.typicode.com/posts"
response = requests.get(url)
data = response.json()

df = pd.DataFrame(data)
print(df.head())

# --- Step 1: Import libraries ---
# requests: to call the API
# pandas: to store and analyze data in a DataFrame
# --- Step 2: Define the API endpoint ---
# JSONPlaceholder is a free fake REST API for testing and prototyping
# --- Step 3: Send GET request ---
# --- Step 4: Convert API response to JSON ---
# .json() turns the response into a Python list of dictionaries
# --- Step 5: Load JSON into a pandas DataFrame ---
# Each dictionary becomes a row, each key becomes a column
# --- Step 6: Preview the first 5 rows (df.head() shows 5 by default) ---

   userId  id                                              title  \
0       1   1  sunt aut facere repellat provident occaecati e...   
1       1   2                                       qui est esse   
2       1   3  ea molestias quasi exercitationem repellat qui...   
3       1   4                               eum et est occaecati   
4       1   5                                 nesciunt quas odio   

                                                body  
0  quia et suscipit\nsuscipit recusandae consequu...  
1  est rerum tempore vitae\nsequi sint nihil repr...  
2  et iusto sed quo iure\nvoluptatem occaecati om...  
3  ullam et saepe reiciendis voluptatem adipisci\...  
4  repudiandae veniam quaerat sunt sed\nalias aut...  


**Web Scraping with a For Loop**

In [None]:
##Web Scraping Example
import requests
from bs4 import BeautifulSoup   # pip install beautifulsoup4

url = "https://www.cnn.com/"
response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

# Example: get all links
for link in soup.find_all("a"):
    print(link.get("href"))


# --- Step 1: Define the target URL ---
# --- Step 2: Send an HTTP GET request to fetch the page ---
# --- Step 3: Parse the HTML with BeautifulSoup ---
# --- Step 4: Extract all links ---
# Each link is inside an <a> tag
# .find_all("a") finds every anchor element on the page
# .get("href") extracts the hyperlink (destination URL); it returns None if the <a> tag has no href

"""
===============================
 SCRAPING DECISION MATRIX
===============================

 What you CAN scrape directly from HTML (requests + BeautifulSoup):
- Static text (headlines, paragraphs, quotes) → <h1>, <p>, <span>
- Links → <a href=...>
- Images → <img src=...>
- Tables → <table>, <tr>, <td>
- Metadata → <title>, <meta>
- Lists of products, quotes, or posts inside <div> blocks

 When HTML ALONE is not enough:
- Dynamic content (loaded with JavaScript) → requests won’t see it
- Infinite scroll feeds (social media, product catalogs) → only loads on scroll
- Interactive forms / login required → HTML just shows login, not data
- Live data (stocks, sports scores) → comes from hidden API calls
- Embedded media (videos, audio streams) → often split into chunks

 Solutions when HTML is not enough:
- Check Network Tab in browser dev tools → often reveals hidden JSON APIs
- If API exists, use requests to hit it directly (cleaner & faster than scraping)
- If no API, simulate browser with Selenium or Playwright to render JS

RULE OF THUMB:
- Start with BeautifulSoup (fast & lightweight).
- If missing data → look for API in Network Tab.
- If still stuck → use Selenium/Playwright as a last resort.
"""




https://www.cnn.com
https://www.cnn.com/us
https://www.cnn.com/world
https://www.cnn.com/politics
https://www.cnn.com/business
https://www.cnn.com/health
https://www.cnn.com/entertainment
https://www.cnn.com/cnn-underscored
https://www.cnn.com/style
https://www.cnn.com/travel
https://www.cnn.com/sports
https://www.cnn.com/science
https://www.cnn.com/climate
https://www.cnn.com/weather
https://www.cnn.com/world/europe/ukraine
https://www.cnn.com/world/middleeast/israel
https://www.cnn.com/games
https://www.cnn.com/cnn-underscored/deals/prime-day
None
https://www.cnn.com/us
https://www.cnn.com/world
https://www.cnn.com/politics
https://www.cnn.com/business
https://www.cnn.com/health
https://www.cnn.com/entertainment
https://www.cnn.com/cnn-underscored
https://www.cnn.com/style
https://www.cnn.com/travel
https://www.cnn.com/sports
https://www.cnn.com/science
https://www.cnn.com/climate
https://www.cnn.com/weather
https://www.cnn.com/world/europe/ukraine
https://www.cnn.com/world/middleeas
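The raw output above includes `None` (an `<a>` tag without an `href`) and repeated URLs. A small clean-up step, sketched here as the hypothetical helper `clean_links`, drops missing values, resolves relative paths such as `/us` against the page URL with `urllib.parse.urljoin`, and removes duplicates while keeping the original order.

```python
from urllib.parse import urljoin


def clean_links(hrefs, base_url):
    """Return absolute, de-duplicated links in first-seen order, skipping missing hrefs."""
    seen = set()
    cleaned = []
    for href in hrefs:
        if not href:  # None or empty string: the <a> tag had no usable href
            continue
        absolute = urljoin(base_url, href.strip())
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


# With BeautifulSoup, as in the cell above:
# links = clean_links((a.get("href") for a in soup.find_all("a")), url)
```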

## **Cron: Automated Job Scheduling**

**STEPS & WORKFLOW EXAMPLE**

**1. The crontab schedule format**

In [29]:
0 6 * * * /usr/bin/python3 /home/kev/scripts/etl_pipeline.py 

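## → Runs every day at 6:00 AM. This is crontab syntax, not Python, which is why running it as a notebook cell raises the SyntaxError below.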

SyntaxError: invalid syntax (2694831360.py, line 1)
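A crontab line has five time fields followed by the command: minute, hour, day of month, month, and day of week. `0 6 * * *` therefore means minute 0 of hour 6, every day of every month. As a sketch, the hypothetical helper `describe_cron` just labels those fields; it does not validate ranges, expand values like `*/15`, or handle shortcuts such as `@daily`.

```python
CRON_FIELDS = ["minute", "hour", "day_of_month", "month", "day_of_week"]


def describe_cron(line):
    """Split a crontab line into its five schedule fields and the command."""
    parts = line.split(maxsplit=5)  # the command itself may contain spaces
    if len(parts) < 6:
        raise ValueError("Expected 5 schedule fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    schedule["command"] = parts[5]
    return schedule


print(describe_cron("0 6 * * * /usr/bin/python3 /home/kev/scripts/etl_pipeline.py"))
```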

**2. A Python ETL script (etl_pipeline.py)**

In [None]:
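import requests
import pandas as pd

url = "https://jsonplaceholder.typicode.com/users"
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail loudly so the error shows up in the cron log
data = response.json()
df = pd.DataFrame(data)[["id", "name", "email"]]
df.to_csv("/home/kev/data/users.csv", index=False)  # absolute path: cron jobs usually start in your home directory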

## → Pulls API data, transforms with pandas, saves to CSV.

**3. How to register it with CRON**

In [None]:
crontab -e   # opens your crontab in an editor; add the line below, then save and exit
0 6 * * * /usr/bin/python3 /home/kev/scripts/etl_pipeline.py >> /home/kev/logs/etl.log 2>&1

## → Runs the script automatically every day at 6:00 AM; `>>` appends its output to etl.log and `2>&1` sends errors to the same log.