# Networking and Web

In this notebook, we will explore fundamental concepts of **Networking and Web** in Python:

1. **Sockets Programming**
2. **Working with APIs (REST, GraphQL)**
3. **Web Scraping (Beautiful Soup, Selenium)**

We'll include theoretical explanations and illustrative code examples.

## Table of Contents
1. [Sockets Programming](#sockets)
2. [Working with APIs](#apis)
    - [REST API Requests](#rest)
    - [GraphQL Requests](#graphql)
3. [Web Scraping](#webscraping)
    - [Beautiful Soup](#beautifulsoup)
    - [Selenium](#selenium)

Let's begin!

## 1. Sockets Programming <a name="sockets"></a>

Sockets are low-level endpoints for sending and receiving data between **clients** and **servers** over a network. Python provides the standard-library `socket` module for this purpose.

### Key Concepts
- **AF_INET**: Address Family for IPv4
- **SOCK_STREAM**: TCP sockets (reliable, connection-based)
- **SOCK_DGRAM**: UDP sockets (unreliable, connectionless)
- **Server**: Binds to a particular IP/Port, listens for connections
- **Client**: Connects to the server, sends/receives data
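
The `SOCK_DGRAM` entry above can be illustrated with a short sketch that sends one datagram over the loopback interface within a single process. The function name `udp_round_trip` and the `b"ping"` payload are illustrative assumptions, and port `0` asks the operating system to pick a free port.

```python
import socket

def udp_round_trip(host="127.0.0.1"):
    # Two UDP sockets in one process: one receives, one sends.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind((host, 0))          # port 0: let the OS choose a free port
        receiver.settimeout(2.0)          # do not wait forever if the datagram is lost
        port = receiver.getsockname()[1]

        # No listen()/accept()/connect(): each datagram carries its own destination.
        sender.sendto(b"ping", (host, port))
        data, _sender_addr = receiver.recvfrom(1024)
        return data

udp_payload = udp_round_trip()
print("Received datagram:", udp_payload)
```

On a real network, datagrams may be lost or reordered, which is why the receiver sets a timeout rather than blocking indefinitely.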

Below is a minimal TCP client-server example.

In [28]:
### server.py example
# Run this code in one cell (or a separate script) to act as a server.

import socket

def run_server(host='127.0.0.1', port=65432):
    # Create socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, port))
        s.listen()
        print(f"Server listening on {host}:{port}...")

        # Accept a connection
        conn, addr = s.accept()
        with conn:
            print(f"Connected by {addr}")
            while True:
                data = conn.recv(1024)
                if not data:
                    break
                print(f"Received from client: {data.decode('utf-8')}")
                response = "Hello from server!".encode('utf-8')
                conn.sendall(response)

# Note: In a typical scenario, you'd run this in a separate file or console.
if __name__ == "__main__":
    # Uncomment the line below to actually run the server in a script.
    # run_server()
    pass

In [30]:
### client.py example
# Run this code in another cell (or a separate script) to act as a client.

import socket

def run_client(host='127.0.0.1', port=65432):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, port))
        message = "Hello from client!".encode('utf-8')
        s.sendall(message)
        data = s.recv(1024)
        print(f"Received from server: {data.decode('utf-8')}")

# Note: In a typical scenario, you'd run this in a separate file or console.
if __name__ == "__main__":
    # Uncomment the line below to actually run the client in a script.
    # run_client()
    pass

**How to run** (simplified steps):
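1. In one terminal, uncomment `run_server()` and run the server script. It listens on `host`:`port` (`127.0.0.1:65432` by default) and blocks until a client connects.
2. In a second terminal, uncomment `run_client()` and run the client script. It connects to the server, sends one message, and prints the server's reply.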
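
To try the same exchange inside a single notebook cell, one option is to run the server side in a background thread. The sketch below is a simplified variant of the code above, not a replacement: `serve_once` and `demo_round_trip` are hypothetical names, the server handles exactly one message, and binding to port `0` lets the operating system choose a free port.

```python
import socket
import threading

def serve_once(server_sock):
    # Accept one connection, read one message, send one reply.
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        if data:
            conn.sendall(b"Hello from server!")

def demo_round_trip(host="127.0.0.1"):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind((host, 0))       # port 0: let the OS choose a free port
        server_sock.listen()
        port = server_sock.getsockname()[1]

        # Run the blocking accept() in a background thread.
        worker = threading.Thread(target=serve_once, args=(server_sock,), daemon=True)
        worker.start()

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect((host, port))
            client.sendall(b"Hello from client!")
            reply = client.recv(1024)

        worker.join(timeout=5)
    return reply.decode("utf-8")

reply_text = demo_round_trip()
print("Received from server:", reply_text)
```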

That’s the basic flow of creating a socket-based client-server application in Python!
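
One caveat with the examples above: TCP is a byte stream, so a single `recv(1024)` call is not guaranteed to return exactly one message. A common remedy is to prefix each message with its length. The sketch below assumes a 4-byte big-endian length header; the helper names `send_message`, `recv_exact`, and `recv_message` are hypothetical, and `socket.socketpair()` stands in for a real client-server connection.

```python
import socket
import struct

def send_message(sock, payload: bytes) -> None:
    # Prefix the payload with its length as a 4-byte big-endian unsigned int.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n: int) -> bytes:
    # Keep calling recv() until exactly n bytes have arrived.
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_message(sock) -> bytes:
    # Read the 4-byte header first, then exactly that many payload bytes.
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

# Two already-connected sockets, used here instead of a real client and server.
left, right = socket.socketpair()
with left, right:
    send_message(left, b"Hello from client!")
    send_message(left, b"second message")
    first = recv_message(right)
    second = recv_message(right)

print(first, second)
```

Because each read is bounded by the header, the two messages stay separate even though they were sent back to back on the same stream.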

## 2. Working with APIs <a name="apis"></a>

APIs (Application Programming Interfaces) allow communication between different software systems or services. In the **web** world, this commonly involves **HTTP** requests.

### Python Tools
- **requests** library for simple HTTP(S) requests.
- **json** library for parsing JSON responses.
- Dedicated libraries or frameworks for GraphQL (e.g., `gql`), though you can also use plain `requests`.

Below, let's see how we can work with **REST** and **GraphQL** APIs.

### 2.1 REST API Requests <a name="rest"></a>

**REST (Representational State Transfer)** is a common architectural style for building web services. REST services typically exchange data as **JSON**.

**Example**: We will perform a GET request to a public API. For instance, [JSONPlaceholder](https://jsonplaceholder.typicode.com/) is a free fake REST API for testing.

We can fetch posts from the `/posts` endpoint.

In [35]:
import requests

def get_example_posts():
    url = "https://jsonplaceholder.typicode.com/posts"
    response = requests.get(url, timeout=10)  # time out instead of hanging indefinitely
    if response.status_code == 200:
        posts = response.json()  # This is a list of dictionaries
        return posts
    else:
        print(f"Request failed with status: {response.status_code}")
        return []

posts_data = get_example_posts()
print("Total posts fetched:", len(posts_data))
print("First post:", posts_data[0] if posts_data else "No data")

Total posts fetched: 100
First post: {'userId': 1, 'id': 1, 'title': 'sunt aut facere repellat provident occaecati excepturi optio reprehenderit', 'body': 'quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto'}


**Explanation**:
- We use `requests.get(url)` to make a GET request.
- `response.json()` directly parses JSON data into a Python object.
- Typically, we handle errors by checking `response.status_code`.
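
The status-code check can also be exercised without an external service. The sketch below is a stand-in under several assumptions: it starts a throwaway local server with the standard library's `http.server`, and it uses `urllib.request` in place of `requests` so that it has no third-party dependencies. The `PostsHandler` class, the `fetch_json` helper, and the sample `POSTS` data are all hypothetical.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

POSTS = [{"userId": 1, "id": 1, "title": "demo post"}]  # hypothetical sample data

class PostsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/posts":
            body = json.dumps(POSTS).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the notebook output quiet

# An opener with no proxies, so loopback requests are never routed elsewhere.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def fetch_json(url, timeout=5):
    # Return (status, parsed_body); parsed_body is None on an HTTP error status.
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return err.code, None

server = HTTPServer(("127.0.0.1", 0), PostsHandler)  # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

ok_status, posts = fetch_json(base + "/posts")
missing_status, _ = fetch_json(base + "/nope")

server.shutdown()
server.server_close()
print(ok_status, posts, missing_status)
```

With `requests`, the equivalent check is `response.status_code`, or `response.raise_for_status()` to turn an error status into an exception.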

### 2.2 GraphQL Requests <a name="graphql"></a>

**GraphQL** is a query language for APIs. Instead of multiple endpoints, GraphQL typically has a single endpoint (e.g., `/graphql`), and the client sends **queries** or **mutations** describing exactly what data is needed.

We can use the Python `requests` library as well. For instance, if you have a GraphQL endpoint, you typically send a POST request with a JSON body containing your **query** (and optionally **variables**).

In [39]:
import requests

def get_graphql_data():
    # Example GraphQL endpoint. If the host fails to resolve (as in the
    # recorded output below), substitute a live GraphQL endpoint to test.
    url = "https://api.spacex.land/graphql/"
    # Example query: let's fetch some data about launches.
    query = """
    {
      launchesPast(limit: 2) {
        mission_name
        launch_date_utc
        launch_site {
          site_name_long
        }
      }
    }
    """

    response = requests.post(url, json={'query': query}, timeout=10)
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"Query failed with status code {response.status_code}")

try:
    data = get_graphql_data()
    # Data is a dictionary with 'data' and possibly 'errors'.
    print("GraphQL response:", data)
except Exception as e:
    print(str(e))

HTTPSConnectionPool(host='api.spacex.land', port=443): Max retries exceeded with url: /graphql/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x000001812A92F2C0>: Failed to resolve 'api.spacex.land' ([Errno 11001] getaddrinfo failed)"))


**Explanation**:
- GraphQL requests typically use `POST`.
- The body of the request includes a key `query` containing the GraphQL string.
- Optionally, one might include `variables`.
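- The response contains `data` (the requested fields), `errors`, or both; many servers return HTTP 200 even when `errors` is present, so check the body as well as the status code.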
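
The request body and the response handling described above can be sketched without a live endpoint. In the example below, `build_payload`, `unwrap`, and `GraphQLError` are hypothetical helpers, the two result dictionaries are made-up samples rather than real API output, and the `$limit: Int!` declaration assumes the schema types `limit` as an integer.

```python
import json

class GraphQLError(Exception):
    """Raised when a GraphQL response carries an 'errors' list."""

def build_payload(query, variables=None):
    # The JSON body of a GraphQL POST: 'query' plus optional 'variables'.
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    return payload

def unwrap(result):
    # Return result['data'], or raise if the server reported errors.
    errors = result.get("errors")
    if errors:
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise GraphQLError(messages)
    return result.get("data")

query = """
query PastLaunches($limit: Int!) {
  launchesPast(limit: $limit) {
    mission_name
  }
}
"""

payload = build_payload(query, {"limit": 2})
body = json.dumps(payload)  # roughly what requests.post(url, json=payload) would send

# Made-up sample responses, shaped like typical GraphQL results.
ok_result = {"data": {"launchesPast": [{"mission_name": "Demo Mission"}]}}
bad_result = {"errors": [{"message": "Cannot query field \"foo\""}]}

launches = unwrap(ok_result)["launchesPast"]
try:
    unwrap(bad_result)
    error_text = None
except GraphQLError as exc:
    error_text = str(exc)

print(launches, error_text)
```

Passing values through `variables` rather than formatting them into the query string keeps the query text constant and leaves escaping to the JSON encoder.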

## 3. Web Scraping <a name="webscraping"></a>

**Web Scraping** is the process of extracting data from websites. In Python, two common libraries are:
1. **Beautiful Soup** (BS4) for parsing HTML.
2. **Selenium** for browser automation (which can handle dynamic JavaScript-based pages).

> **Note**: Always check a website’s **robots.txt** and **Terms of Service** before scraping. Make sure you have the right to scrape the information.
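
A robots.txt check can be automated with the standard library's `urllib.robotparser`. The sketch below parses an inline, made-up rule set so that it runs offline; the user-agent string `my-scraper` and the URLs are illustrative assumptions. For a real site, one would instead point the parser at the site's robots.txt with `set_url(...)` and `read()`.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration; a real file lives at <site>/robots.txt.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

allowed = parser.can_fetch("my-scraper", "https://example.com/index.html")
blocked = parser.can_fetch("my-scraper", "https://example.com/private/data.html")
print("index allowed:", allowed, "| private allowed:", blocked)
```

This covers only robots.txt; a site's Terms of Service still need to be read separately.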


### 3.1 Beautiful Soup <a name="beautifulsoup"></a>

**Beautiful Soup** is well suited to scraping static HTML content. We typically:
1. Fetch the page HTML using `requests`.
2. Parse it with Beautiful Soup.
3. Locate tags, classes, IDs, etc., and extract the desired data.

In [44]:
import requests
from bs4 import BeautifulSoup

def scrape_example():
    url = "https://example.com/"  # A simple, static HTML page
    response = requests.get(url, timeout=10)  # time out instead of hanging indefinitely
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        # Example: Extract the text inside <h1> tag
        h1 = soup.find('h1')
        if h1:
            print("H1 text:", h1.get_text())
        else:
            print("No <h1> found.")
    else:
        print(f"Failed to retrieve page: status {response.status_code}")

# Let's call the function
scrape_example()

H1 text: Example Domain


**Explanation**:
- `BeautifulSoup(response.text, 'html.parser')` will parse HTML content.
- `soup.find('h1')` finds the first `<h1>` element in the document.
- We can also use `soup.find_all(...)` or CSS selectors `soup.select(...)` for more advanced queries.
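
The `find_all` and `select` calls mentioned above can be tried on an inline HTML string, which avoids any network access. The markup, class names, and links below are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML snippet used only for this demonstration.
html = """
<html><body>
  <h1 id="title">Example Domain</h1>
  <ul class="links">
    <li><a href="/a">First</a></li>
    <li><a href="/b" class="external">Second</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag, in document order.
link_texts = [a.get_text() for a in soup.find_all("a")]

# select takes a CSS selector: here, <a> tags inside <ul class="links">.
hrefs = [a["href"] for a in soup.select("ul.links a")]

# select_one returns the first match, or None if nothing matches.
external = soup.select_one("a.external")

print(link_texts, hrefs, external["href"])
```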

### 3.2 Selenium <a name="selenium"></a>

**Selenium** drives a real (or headless) browser, making it suitable for **dynamic** sites where HTML is generated by JavaScript.

**Setup**:
- Install Selenium: `pip install selenium`
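- You also need a **WebDriver** (e.g., [ChromeDriver](https://chromedriver.chromium.org/), GeckoDriver for Firefox, etc.). Selenium 4.6 and later include Selenium Manager, which can locate or download a matching driver automatically when no driver path is given.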

**Basic Steps**:
1. Download the browser driver (e.g., `chromedriver.exe`).
2. Specify the driver path.
3. Use Selenium to open a browser, navigate, and interact with elements.


In [48]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def selenium_example():
    # Path to your locally installed ChromeDriver
    # On Windows, e.g.: Service('C:/path/to/chromedriver.exe')
    # On macOS/Linux, e.g.: Service('/usr/local/bin/chromedriver')
    # For demonstration, we keep it generic.
    service = Service("/path/to/chromedriver")
    driver = webdriver.Chrome(service=service)

    try:
        driver.get("https://example.com")

        # Wait up to 10 seconds for the <h1> to appear instead of sleeping
        # for a fixed time; this returns as soon as the element is present.
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.TAG_NAME, 'h1'))
        )
        print("H1 text via Selenium:", element.text)

    finally:
        driver.quit()

# Note: This won't run here if you don't have Selenium properly set up.
# Uncomment the line below after installing selenium & chromedriver.
# selenium_example()

**Explanation**:
- `webdriver.Chrome` launches a Chrome browser (or Chrome in headless mode if configured).
- `driver.get(url)` navigates to a web page.
- You can locate elements using `By.ID`, `By.CSS_SELECTOR`, `By.XPATH`, etc.
- **Selenium** is excellent for websites that rely heavily on JavaScript for content rendering.

# Summary
In this notebook, we covered:

1. **Sockets Programming**: Using the standard-library `socket` module for low-level TCP client-server communication.
2. **Working with APIs (REST, GraphQL)**: Using `requests` to make HTTP requests and parse JSON. Demonstrated a GraphQL query example.
3. **Web Scraping**:
    - **Beautiful Soup** for parsing static HTML content.
    - **Selenium** for automating a browser and scraping dynamic web pages.

These are core skills for many **data engineering**, **backend**, and **automation** tasks. Use them responsibly: respect each site's Terms of Service and avoid placing heavy load on servers.