# URL Operations - Learning Exercises

This notebook contains step-by-step exercises to practice working with URLs in Python using `urllib`, `requests`, and related libraries.

Fill in the code cells as instructed.


## Exercise 1: Parse a URL
Parse the URL `'https://example.com/search?q=python&sort=asc#section2'` and print:
- Scheme
- Netloc
- Path
- Query string
- Fragment

In [31]:
from urllib.parse import urlparse, parse_qs

url = 'https://example.com/search?q=python&sort=asc#section2'

# Your code here
parsed = urlparse(url)

print(parsed.scheme)
print(parsed.netloc)
print(parsed.path)
print(parsed.query)
print(parsed.fragment)
# Optional: turn the query string into a dict of lists
# print(parse_qs(parsed.query))


https
example.com
/search
q=python&sort=asc
section2


In [32]:
# Parse all six components: <scheme>://<netloc>/<path>;<params>?<query>#<fragment>

from urllib.parse import urlparse, parse_qs

url = "https://example.com:8080/products/list;view?category=books&sort=asc#reviews"

# Parse URL
parsed = urlparse(url)

print("Scheme   :", parsed.scheme)
print("Netloc   :", parsed.netloc)
print("Path     :", parsed.path)
print("Params   :", parsed.params)
print("Query    :", parsed.query)
print("Fragment :", parsed.fragment)

# Convert query string into dictionary
params = parse_qs(parsed.query)
print("Query Params (dict):", params)

# Extract hostname and port
domain = parsed.hostname
port = parsed.port

print("Domain and Port are :", domain, "and", port)


Scheme   : https
Netloc   : example.com:8080
Path     : /products/list
Params   : view
Query    : category=books&sort=asc
Fragment : reviews
Query Params (dict): {'category': ['books'], 'sort': ['asc']}
Domain and Port are : example.com and 8080
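You can also edit a parsed URL and put it back together. `urlparse()` returns a named tuple, so `_replace()` gives a modified copy and `geturl()` rebuilds the string. The sketch below changes the `sort` value and drops the fragment of the URL above (the new values are only illustrative). It also shows that `urlsplit()` leaves `;params` inside the path:

```python
from urllib.parse import urlparse, urlsplit, parse_qs, urlencode

url = "https://example.com:8080/products/list;view?category=books&sort=asc#reviews"
parsed = urlparse(url)

# Decode the query, change one value, and re-encode it
query = parse_qs(parsed.query)
query["sort"] = ["desc"]

# _replace() returns a new ParseResult; the original is unchanged
updated = parsed._replace(query=urlencode(query, doseq=True), fragment="")
print(updated.geturl())  # https://example.com:8080/products/list;view?category=books&sort=desc

# urlsplit() has no params field, so ";view" stays in the path
print(urlsplit(url).path)  # /products/list;view
```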


## Exercise 2: Construct a URL
Construct a URL from the following parts:
- Scheme: `https`
- Domain: `openai.com`
- Path: `research`
- Query: `topic=ml`

In [None]:
# Construct a URL from all six parts:
#   <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
from urllib.parse import urlunparse

# Parts of the URL (params is empty; the fragment is optional)
scheme = "https"
netloc = "openai.com"
path = "research"
params = ""
query = "topic=ml"
fragment = "fragment_data"

# Construct URL
url = urlunparse((scheme, netloc, path, params, query, fragment))

print(url)


https://openai.com/research?topic=ml#fragment_data


In [34]:
# Construct using a separate hostname and port:
#   <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
from urllib.parse import urlunparse

# Define parts
scheme = "https"
hostname = "example.com"
port = 8080
netloc = f"{hostname}:{port}"   # combine hostname and port

path = "products/list"
params = "view"
query = "category=books&sort=asc"
fragment = "reviews"

# Construct URL
url = urlunparse((scheme, netloc, path, params, query, fragment))
print(url)



https://example.com:8080/products/list;view?category=books&sort=asc#reviews
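When the query comes from a dictionary, `urlencode()` can do the escaping and `urlunsplit()` can do the assembly. `urlunsplit()` is the five-part variant, with no `;params` slot. The sketch below uses a hypothetical `build_url` helper and applies it to the Exercise 2 parts:

```python
from urllib.parse import urlencode, urlunsplit

def build_url(scheme, host, path, query_params=None, port=None, fragment=""):
    """Hypothetical helper: assemble a URL, encoding query_params with urlencode()."""
    netloc = host if port is None else f"{host}:{port}"
    query = urlencode(query_params or {}, doseq=True)  # escapes values, expands lists
    return urlunsplit((scheme, netloc, path, query, fragment))

research_url = build_url("https", "openai.com", "/research", {"topic": "ml"})
shop_url = build_url("https", "example.com", "/products/list",
                     {"category": "books", "sort": "asc"}, port=8080, fragment="reviews")
search_url = build_url("https", "example.com", "/search", {"q": "machine learning"})

print(research_url)  # https://openai.com/research?topic=ml
print(shop_url)      # https://example.com:8080/products/list?category=books&sort=asc#reviews
print(search_url)    # https://example.com/search?q=machine+learning
```

Building the query with `urlencode()` avoids hand-escaping values such as the space in `machine learning`.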


## Exercise 3: Encode & Decode Query Parameters
Use `urllib.parse.urlencode` to encode a dictionary into a query string, then decode it back using `parse_qs`.

In [35]:
from urllib.parse import urlencode, parse_qs

params = {'name': 'Alice', 'age': 25}
# Your code here
query_string = urlencode(params)
print(query_string)

decoded = parse_qs(query_string)
print(decoded)

name=Alice&age=25
{'name': ['Alice'], 'age': ['25']}
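Two details matter once values contain spaces, symbols or lists:

- `urlencode()` percent-encodes values, and spaces become `+` by default.
- List values need `doseq=True`; otherwise the Python list is stringified.

Here is a short sketch with illustrative values. Note that `parse_qsl()` keeps repeated keys as separate pairs:

```python
from urllib.parse import urlencode, parse_qs, parse_qsl

params = {"q": "python & urls", "tags": ["web", "http"], "page": 2}

# doseq=True expands list values into repeated keys
query_string = urlencode(params, doseq=True)
print(query_string)  # q=python+%26+urls&tags=web&tags=http&page=2

# Without doseq, the list is converted with str() and then encoded
print(urlencode({"tags": ["web", "http"]}))  # tags=%5B%27web%27%2C+%27http%27%5D

# parse_qs groups values per key; parse_qsl keeps (key, value) pairs in order
print(parse_qs(query_string))   # {'q': ['python & urls'], 'tags': ['web', 'http'], 'page': ['2']}
print(parse_qsl(query_string))  # [('q', 'python & urls'), ('tags', 'web'), ('tags', 'http'), ('page', '2')]
```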


## Exercise 4: Fetch HTML Content
Fetch content from `https://httpbin.org/html` and print:
- Status code
- First 200 characters of the response

In [39]:
import requests

url = 'https://httpbin.org/html'
# Your code here

response = requests.get(url)

print("Status code: ", response.status_code)

print("Content preview: ", response.text[:200])

Status code:  200
Content preview:  <!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
      <h1>Herman Melville - Moby-Dick</h1>

      <div>
        <p>
          Availing himself of the mild, summer-cool weather that now reigned in t
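For comparison, here is a sketch of the same fetch using only the standard library (`urllib.request` instead of `requests`), wrapped in a hypothetical `fetch_preview` helper. Unlike `requests`, `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses rather than returning them. The demo uses a `data:` URL so it runs without network access:

```python
from urllib.request import urlopen

def fetch_preview(url, n=200, timeout=10):
    """Hypothetical helper: return (status code, first n characters of the decoded body)."""
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        text = resp.read().decode(charset, errors="replace")
        return resp.getcode(), text[:n]

# Offline demo: a data: URL stands in for https://httpbin.org/html.
# data: URLs carry no HTTP status, so getcode() returns None here.
status, preview = fetch_preview("data:text/html;charset=utf-8,<h1>Moby-Dick</h1>", n=8)
print(status, preview)  # None <h1>Moby
```

With network access, `fetch_preview("https://httpbin.org/html")` should return a 200 status and the start of the same HTML document as the `requests` cell.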


## Exercise 5: GET Request with Parameters
Make a GET request to `https://httpbin.org/get` with parameters `name=John`, `age=30`.

In [None]:
import requests

url = 'https://httpbin.org/get'
params = {'name': 'John', 'age': 30}
# Your code here
response = requests.get(url, params=params)

print("Status code: ", response.status_code)
print("Final URL:", response.url)
print("Response JSON: ", response.json())

Status code:  200
Final URL: https://httpbin.org/get?name=John&age=30
Response JSON:  {'args': {'age': '30', 'name': 'John'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.32.3', 'X-Amzn-Trace-Id': 'Root=1-68a2c94f-1c73f3c267c57df84d48b003'}, 'origin': '49.36.182.14', 'url': 'https://httpbin.org/get?name=John&age=30'}
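You can see how `requests` turns `params` into a query string without sending anything. Build the request and prepare it locally with `requests.Request(...).prepare()`. In the sketch below the extra parameter values are illustrative. List values become repeated keys, spaces become `+`, and keys whose value is `None` are dropped:

```python
import requests

params = {"name": "John Smith", "age": 30, "tags": ["python", "web"], "debug": None}

# Prepare (but do not send) the request to inspect the final URL
prepared = requests.Request("GET", "https://httpbin.org/get", params=params).prepare()
print(prepared.url)  # https://httpbin.org/get?name=John+Smith&age=30&tags=python&tags=web
```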


## Exercise 6: POST Request with JSON
POST to `https://httpbin.org/post` with JSON body: `{'username': 'test_user', 'password': '1234'}`

In [49]:
import requests

url = 'https://httpbin.org/post'
data = {'username': 'test_user', 'password': '1234'}
# Your code here
response = requests.post(url, json=data)

print(response.json())

{'args': {}, 'data': '{"username": "test_user", "password": "1234"}', 'files': {}, 'form': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Content-Length': '45', 'Content-Type': 'application/json', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.32.3', 'X-Amzn-Trace-Id': 'Root=1-68a2d080-6f73e99159193f4524a8c737'}, 'json': {'password': '1234', 'username': 'test_user'}, 'origin': '103.188.127.76', 'url': 'https://httpbin.org/post'}


In [45]:
# ✅ Option 1: Send JSON body (preferred for APIs)
import requests

url = "https://httpbin.org/post"

payload = {
    "username": "test_user",
    "password": "1234"
}
response = requests.post(url, json=payload)

print(response.json())


{'args': {}, 'data': '{"username": "test_user", "password": "1234"}', 'files': {}, 'form': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Content-Length': '45', 'Content-Type': 'application/json', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.32.3', 'X-Amzn-Trace-Id': 'Root=1-68a2cec1-1dcd7d9745df296c7d2d037d'}, 'json': {'password': '1234', 'username': 'test_user'}, 'origin': '103.188.127.76', 'url': 'https://httpbin.org/post'}


In [46]:
# ✅ Option 2: Send Form Data
import requests

url = "https://httpbin.org/post"

form_data = {
    "user": "alice",
    "role": "admin"
}
response = requests.post(url, data=form_data)

print(response.json())


{'args': {}, 'data': '', 'files': {}, 'form': {'role': 'admin', 'user': 'alice'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Content-Length': '21', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.32.3', 'X-Amzn-Trace-Id': 'Root=1-68a2cecd-36a4cbff7ddd015158cf5931'}, 'json': None, 'origin': '103.188.127.76', 'url': 'https://httpbin.org/post'}


In [47]:
# ✅ Option 3: Send a JSON-encoded string as one form field (not common, but possible)
import requests
import json

url = "https://httpbin.org/post"

form_data = {
    "user": "alice",
    "extra": json.dumps({"username": "test_user", "password": "1234"})
}

response = requests.post(url, data=form_data)

print(response.json())


{'args': {}, 'data': '', 'files': {}, 'form': {'extra': '{"username": "test_user", "password": "1234"}', 'user': 'alice'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, br, zstd', 'Content-Length': '88', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.32.3', 'X-Amzn-Trace-Id': 'Root=1-68a2cf1e-57c42de8610a7f0f178eac3a'}, 'json': None, 'origin': '103.188.127.76', 'url': 'https://httpbin.org/post'}
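You can see the difference between `json=` and `data=` before anything is sent, by preparing the requests locally. This sketch reuses the payload from above. The third request shows that passing `data=` and `json=` together does not send both. When `data` is non-empty, `requests` encodes only the form data and ignores `json`:

```python
import json
import requests

url = "https://httpbin.org/post"
payload = {"username": "test_user", "password": "1234"}

as_json = requests.Request("POST", url, json=payload).prepare()
as_form = requests.Request("POST", url, data=payload).prepare()
both = requests.Request("POST", url, data={"user": "alice"}, json=payload).prepare()

# json= serializes the dict and sets Content-Type: application/json
print(as_json.headers["Content-Type"], as_json.body)
# data= form-encodes the dict and sets Content-Type: application/x-www-form-urlencoded
print(as_form.headers["Content-Type"], as_form.body)
# With both, only the form data ends up in the body
print(both.headers["Content-Type"], both.body)
```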


## Exercise 7: Handle Redirects
Request `http://github.com` and check the final redirected URL.

In [53]:
import requests

url = 'http://github.com'
# Your code here
response = requests.get(url)

final_url = response.url

print("final_url :", final_url)
print("status_code :", response.status_code)

final_url : https://github.com/
status_code : 200
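The cell above depends on github.com's live behaviour. To watch what `requests` does during a redirect without the network, this sketch starts a throwaway local server. The paths `/old` and `/new` and the handler are made up for the demo. `response.history` holds the intermediate responses, and `allow_redirects=False` stops at the first one:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

class RedirectHandler(BaseHTTPRequestHandler):
    """Demo server: /old answers 301 -> /new, every other path answers 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the notebook output quiet

server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

with requests.Session() as session:
    session.trust_env = False  # ignore proxy settings from the environment for this local test
    followed = session.get(f"{base}/old", timeout=5)
    not_followed = session.get(f"{base}/old", timeout=5, allow_redirects=False)

server.shutdown()
server.server_close()

# Status codes of the intermediate hops, then the final status and URL
print([r.status_code for r in followed.history], "->", followed.status_code, followed.url)
# Without following, you get the 301 itself and its Location header
print(not_followed.status_code, not_followed.headers["Location"])
```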


## Exercise 8: Download an Image
Download the image from `https://httpbin.org/image/png` and save it as `downloaded.png`.

In [56]:
import requests

url = 'https://httpbin.org/image/png'
# Your code here
filename = "downloaded.png"

response = requests.get(url, timeout=10)
response.raise_for_status()  # avoid saving an HTML error page as an image

with open(filename, "wb") as f:
    f.write(response.content)

print(f"Image saved as {filename}")

Image saved as downloaded.png


In [54]:
import requests

url = "https://httpbin.org/image/png"   # sample image
filename = "downloaded.png"

# Fetch image
response = requests.get(url)

# Save as file (in binary mode)
with open(filename, "wb") as f:
    f.write(response.content)

print(f"Image saved as {filename}")


Image saved as downloaded.png


In [55]:
import requests

url = "https://httpbin.org/image/jpeg"
filename = "downloaded.jpg"

with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open(filename, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

print(f"Image saved as {filename}")


Image saved as downloaded.jpg
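`raise_for_status()` catches HTTP errors. Checking the first bytes of the saved file also confirms it really is an image and not an error page. This sketch uses the PNG and JPEG file signatures; the `sniff_image_type` helper and the stand-in files are hypothetical:

```python
import os
import tempfile

# File signatures ("magic numbers") from the PNG and JPEG formats
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"

def sniff_image_type(path):
    """Return 'png', 'jpeg', or None based on the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(PNG_SIGNATURE):
        return "png"
    if head.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None

# Demo with stand-in files instead of real downloads
with tempfile.TemporaryDirectory() as tmp:
    fake_png = os.path.join(tmp, "downloaded.png")
    error_page = os.path.join(tmp, "error.png")
    with open(fake_png, "wb") as f:
        f.write(PNG_SIGNATURE + b"rest of image data")
    with open(error_page, "wb") as f:
        f.write(b"<html><body>404 Not Found</body></html>")
    png_result = sniff_image_type(fake_png)
    error_result = sniff_image_type(error_page)

print(png_result, error_result)  # png None
```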


## Exercise 9: Scrape Links
Fetch all `<a href=...>` links from `https://example.com`.
(Hint: use `BeautifulSoup`)

In [61]:
import requests
from bs4 import BeautifulSoup

url = 'https://example.com'
# Your code here
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

links = [a.get("href") for a in soup.find_all("a", href=True)]

print("Found links: ")
for link in links:
    print(link)



Found links: 
https://www.iana.org/domains/example


In [59]:
import requests
from bs4 import BeautifulSoup

url = "https://example.com"

# Fetch page
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Extract all <a> tags with href
links = [a.get("href") for a in soup.find_all("a", href=True)]

print("Found links:")
for link in links:
    print(link)


Found links:
https://www.iana.org/domains/example


In [58]:
from urllib.parse import urljoin

absolute_links = [urljoin(url, link) for link in links]
print("Absolute links:", absolute_links)


Absolute links: ['https://www.iana.org/domains/example']
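If `bs4` is not available, the standard library's `html.parser` can collect the same links. Resolving each `href` with `urljoin` as it is found turns relative links into absolute ones. This sketch runs on an inline HTML snippet; the snippet, base URL and `LinkCollector` class are made up for the demo:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href=...> tag fed to the parser."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href
                self.links.append(urljoin(self.base_url, href))

html = (
    '<p><a href="/about">About</a> '
    '<a href="https://www.iana.org/domains/example">IANA</a> '
    '<a name="top">no href</a> '
    '<a href="docs/intro.html">Docs</a></p>'
)

collector = LinkCollector("https://example.com/guide/")
collector.feed(html)
print(collector.links)
# ['https://example.com/about', 'https://www.iana.org/domains/example', 'https://example.com/guide/docs/intro.html']
```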


## Exercise 10: Handle Errors
Request a non-existing page and handle errors gracefully with `try-except`.

In [30]:
import requests

url = 'https://httpbin.org/status/404'
# Your code here

In [62]:
import requests

url = "https://httpbin.org/status/404"  # non-existing page

try:
    response = requests.get(url)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    print("Success:", response.status_code)
except requests.exceptions.HTTPError as e:
    print("HTTP Error:", e)
except requests.exceptions.RequestException as e:
    print("Request failed:", e)


HTTP Error: 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404
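The cell above puts every non-HTTP failure into one branch. The sketch below separates timeouts from connection failures, using a hypothetical `describe_failure` helper. `Timeout` is caught before `ConnectionError` because `requests.exceptions.ConnectTimeout` subclasses both. The demo targets a local port with nothing listening on it, so it fails fast without the network:

```python
import socket

import requests

def describe_failure(url, timeout=5):
    """Hypothetical helper: GET `url` and return a short label for the outcome."""
    with requests.Session() as session:
        session.trust_env = False  # skip proxy settings from the environment
        try:
            response = session.get(url, timeout=timeout)
            response.raise_for_status()
            return f"ok {response.status_code}"
        except requests.exceptions.HTTPError as e:
            return f"http error {e.response.status_code}"
        except requests.exceptions.Timeout:
            # Must come before ConnectionError: ConnectTimeout subclasses both
            return "timeout"
        except requests.exceptions.ConnectionError:
            return "connection error"
        except requests.exceptions.RequestException as e:
            return f"request failed: {e}"

# Find a local port with nothing listening: bind, note the port, release it
with socket.socket() as s:
    s.bind(("127.0.0.1", 0))
    closed_port = s.getsockname()[1]

outcome = describe_failure(f"http://127.0.0.1:{closed_port}/")
print(outcome)
```

With network access, `describe_failure("https://httpbin.org/status/404")` should return `"http error 404"`.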


In [63]:
# Retry failed requests up to three times
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

url = "https://httpbin.org/status/500"  # server error

# Retry strategy
retry_strategy = Retry(
    total=3,                # retry up to 3 times (4 attempts in total)
    backoff_factor=1,       # urllib3 sleeps 0s, 2s, then 4s between retries
    status_forcelist=[500, 502, 503, 504],
    allowed_methods=["GET", "POST"]  # POST is not idempotent; retry it only if duplicates are harmless
)

# Attach retry logic to a Session
adapter = HTTPAdapter(max_retries=retry_strategy)
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

try:
    response = session.get(url)
    response.raise_for_status()
    print("Success:", response.status_code)
except requests.exceptions.HTTPError as e:
    print("HTTP Error:", e)
except requests.exceptions.RequestException as e:
    print("Request failed:", e)


Request failed: HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /status/500 (Caused by ResponseError('too many 500 error responses'))
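You can also write the retry-with-backoff idea by hand, which makes the schedule explicit. This is a minimal sketch with a hypothetical `call_with_retries` helper. It takes an injectable `sleep` so you can test it without waiting. Unlike urllib3's `Retry`, it waits `backoff_factor * 2**attempt` seconds (1s, 2s, 4s). The demo retries on the built-in `ConnectionError`; with `requests` you would pass `retry_on=(requests.exceptions.ConnectionError,)` instead:

```python
import time

def call_with_retries(func, retries=3, backoff_factor=1.0,
                      retry_on=(ConnectionError,), sleep=time.sleep):
    """Call func(); if it raises one of `retry_on`, retry up to `retries` more times.

    Waits backoff_factor * 2**attempt seconds between tries (1s, 2s, 4s for factor 1).
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # no retries left: let the last error propagate
            sleep(backoff_factor * 2 ** attempt)

# Demo: a stand-in operation that fails twice, then succeeds
attempts = {"count": 0}

def flaky_operation():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network hiccup")
    return "ok"

delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_operation, sleep=delays.append)
print(result, attempts["count"], delays)  # ok 3 [1.0, 2.0]
```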
