# APIs for Web Data Collection and AI Workflows

APIs (Application Programming Interfaces) are structured ways for programs to communicate and exchange data. In web data collection and AI pipelines, APIs allow us to request or send specific information directly to a server — usually in JSON or XML format — without scraping or parsing full web pages.

---

## What Is an API?

APIs provide a structured way for developers to access external data or services and integrate them directly into their own applications. When a domain (like `api.weather.com` or `api.openai.com`) offers an API, it's essentially saying: “Here’s a standardized way for your application to access and reuse our services or data — without having to build that functionality from the ground up.”

You don’t need to understand the internal logic of the system. You only need to refer to the API documentation to know what you can request and how to do it.

An API typically has a base URL (like `https://api.example.com`) and multiple **endpoints** attached to it, each designed to access or modify a specific resource. For example:

* `GET https://api.example.com/users` – returns a list of users
* `GET https://api.example.com/users/7` – returns details of user with ID 7
* `POST https://api.example.com/users` – creates a new user

Each URL, combined with an HTTP method (GET, POST, etc.), makes up an API **request**.

You can think of the **base URL** as the main address of a restaurant, and the **endpoints** as the individual items or actions listed on the menu. You send a request to a specific endpoint (like placing an order), and the server responds with the appropriate data.

**So, “using an API” essentially means accessing one or more of its endpoints programmatically in your app.**
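
To make the base URL, endpoint, and method split concrete, here is a minimal sketch that builds the three example requests from the list above using only Python's standard library. Nothing is sent over the network; `api.example.com` is the placeholder host from the examples, and `build_request` is a hypothetical helper name.

```python
from urllib.parse import urljoin
from urllib.request import Request

# Placeholder base URL from the examples above (not a real service).
BASE_URL = "https://api.example.com/"


def build_request(method: str, endpoint: str) -> Request:
    """Combine the base URL, an endpoint path, and an HTTP method."""
    url = urljoin(BASE_URL, endpoint.lstrip("/"))
    return Request(url, method=method)


# These objects only describe requests; nothing is sent.
list_users = build_request("GET", "/users")
get_user = build_request("GET", "/users/7")
create_user = build_request("POST", "/users")

for req in (list_users, get_user, create_user):
    print(req.get_method(), req.full_url)
```

Note that `/users` appears twice with different methods: the same endpoint can support several actions, and the method selects which one you get.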

---

## HTTP and APIs

Most APIs use **HTTP (HyperText Transfer Protocol)** to send and receive data. HTTP defines methods that specify the type of action you want to perform.

### Common HTTP Methods:

* `GET`: Retrieve data from the server
* `POST`: Send data to the server, typically to create a new resource
* `PUT`: Replace or update an existing resource
* `DELETE`: Remove a resource

These methods form the foundation of how clients interact with web-based APIs.

### Why APIs Differ from Regular HTTP Requests

It is easy to assume that a GET request to a website and a GET request to an API are interchangeable. Both use the same HTTP method, but they differ in what the server sends back:

* A **GET request to a website** returns an entire HTML page meant for browsers. You often need to manually parse it to extract relevant data.
* A **GET request to an API** returns only the specific, structured data you asked for, such as JSON, without presentation elements.

APIs are designed for programs, not humans, making data access cleaner and more predictable.
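
The difference shows up in the code needed to read a single value. The sketch below uses two made-up response bodies carrying the same temperature reading: one as an HTML page, one as JSON. The markup, the field name `temperature_c`, and the `TempParser` class are all assumptions for illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical responses carrying the same temperature reading.
html_body = """
<html><body>
  <nav>Home | Forecast</nav>
  <div class="temp"><span id="value">21.5</span> C</div>
</body></html>
"""
json_body = '{"temperature_c": 21.5}'


class TempParser(HTMLParser):
    """Collect the text inside the element with id="value"."""

    def __init__(self):
        super().__init__()
        self.in_value = False
        self.value = None

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == "value":
            self.in_value = True

    def handle_data(self, data):
        if self.in_value:
            self.value = float(data)
            self.in_value = False


# HTML: walk the markup and pick out the one element we care about.
parser = TempParser()
parser.feed(html_body)
temp_from_html = parser.value

# JSON: parse once and read the field by name.
temp_from_json = json.loads(json_body)["temperature_c"]

print(temp_from_html, temp_from_json)
```

The HTML version also depends on the page keeping `id="value"` on that element, which is the kind of detail a site redesign can change without notice.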

---

## HTTP Clients

An **HTTP client** is a tool or library that sends HTTP requests and handles the responses.

Examples:

* **Python libraries**: `requests`, `httpx`
* **Command-line**: `curl`
* **GUI tools**: Postman
* **Browsers**: Send GET requests when you visit a URL

---

## Example: GET Request in Python

```python
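import requests

# A timeout keeps the script from hanging if the server never responds.
response = requests.get("https://api.github.com/users/mayurondata", timeout=10)

print(response.status_code)  # e.g., 200 (OK)
print(response.json())       # Response body parsed from JSON into a dict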
```

---

## Example: GET Request Using curl

```bash
curl https://api.github.com/users/mayurondata
```

---

## Example: POST Request in Python

```python
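import requests

# api.example.com is a placeholder; substitute a real endpoint before running.
url = "https://api.example.com/users"
data = {
    "name": "Mayuresh",
    "email": "mayu@example.com"
}

# json=data serializes the dict and sets the Content-Type: application/json header.
response = requests.post(url, json=data, timeout=10)

print(response.status_code)  # e.g., 201 (Created)
print(response.json())       # Response body parsed from JSON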
```

---

## Example: POST Request Using curl

```bash
curl -X POST https://api.example.com/users \
     -H "Content-Type: application/json" \
     -d '{"name": "Mayuresh", "email": "mayu@example.com"}'
```

---

## RESTful APIs

A RESTful API is a web API that follows the design principles of REST (Representational State Transfer), an architectural style for organizing how clients (like your Python script) and servers (like a weather service or a user database) communicate over the web.

The main idea behind REST is:

- *“Treat everything as a resource (like users, posts, or products), and use standard web actions (GET, POST, PUT, DELETE) to work with them.”*

Each resource has its own URL (called an API endpoint), and you interact with it using HTTP methods. These methods tell the server what action to perform — such as fetching data, creating a new item, updating something, or deleting it.

### Key REST Principles:

* Each resource has its own URL
* Standard HTTP methods are used
* Stateless communication: Each request from the client to the server is processed independently. The server does not remember any previous interaction. This means all the necessary context (such as authentication details or parameters) must be included in every request.
* Data is returned in a structured format (usually JSON)
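
Statelessness is easiest to see from the client side. The following sketch builds two requests with the standard library (nothing is sent) and attaches the same credentials to each. The host, the token value, the `authed_request` helper, and the choice of a Bearer token in the `Authorization` header are assumptions; a real API's documentation specifies its own authentication scheme.

```python
from urllib.request import Request

# Hypothetical values for illustration only.
BASE_URL = "https://api.example.com"
API_TOKEN = "example-token"


def authed_request(method: str, path: str) -> Request:
    """Build a request that carries its own credentials."""
    return Request(
        BASE_URL + path,
        method=method,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Accept": "application/json",
        },
    )


first = authed_request("GET", "/users/7")
second = authed_request("DELETE", "/users/7")

# Each request repeats the Authorization header, because the server
# keeps no memory of the earlier call.
for req in (first, second):
    print(req.get_method(), req.full_url, req.get_header("Authorization"))
```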

### Example: REST API Endpoints

| Action            | URL        | Method |
| ----------------- | ---------- | ------ |
| Get all users     | `/users`   | GET    |
| Get specific user | `/users/7` | GET    |
| Create new user   | `/users`   | POST   |
| Update user       | `/users/7` | PUT    |
| Delete user       | `/users/7` | DELETE |
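
To show how the table maps onto behavior, here is a toy, in-memory sketch of the server side: a single `handle` function that routes a method and path to an action on a `users` dictionary. It is not how any particular framework works; the function name, the sample data, and the chosen status codes (200, 201, 204, 404, 405) are illustrative assumptions.

```python
import itertools

# In-memory store standing in for a database (hypothetical data).
users = {7: {"name": "Mayuresh", "email": "mayu@example.com"}}
id_counter = itertools.count(8)  # IDs for newly created users

NOT_FOUND = (404, {"error": "not found"})
NOT_ALLOWED = (405, {"error": "method not allowed"})


def handle(method, path, body=None):
    """Return (status_code, payload) for a request to the /users resource."""
    parts = path.strip("/").split("/")
    if parts[0] != "users" or len(parts) > 2:
        return NOT_FOUND

    if len(parts) == 1:  # collection: /users
        if method == "GET":
            return 200, [{"id": uid, **data} for uid, data in users.items()]
        if method == "POST":
            user_id = next(id_counter)
            users[user_id] = dict(body)
            return 201, {"id": user_id, **users[user_id]}
        return NOT_ALLOWED

    # single resource: /users/<id>
    if not parts[1].isdigit() or int(parts[1]) not in users:
        return NOT_FOUND
    user_id = int(parts[1])
    if method == "GET":
        return 200, {"id": user_id, **users[user_id]}
    if method == "PUT":
        users[user_id] = dict(body)
        return 200, {"id": user_id, **users[user_id]}
    if method == "DELETE":
        del users[user_id]
        return 204, None
    return NOT_ALLOWED


print(handle("GET", "/users/7"))
print(handle("POST", "/users", {"name": "Asha", "email": "asha@example.com"}))
print(handle("PUT", "/users/7", {"name": "Mayuresh", "email": "new@example.com"}))
print(handle("DELETE", "/users/7"))
print(handle("GET", "/users/7"))  # the user was deleted, so this is a 404
```

The URL identifies *which* resource is involved, and the method identifies *what* to do with it, which is why `/users/7` appears in three rows of the table.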

---

## Common Data Formats Returned by APIs

| Format | Description                          |
| ------ | ------------------------------------ |
| JSON   | Lightweight, easy to parse           |
| XML    | More verbose, used in legacy systems |
| CSV    | Tabular data                         |
| YAML   | Used in config-heavy environments    |
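
As a rough comparison, the sketch below parses the same made-up user record from JSON, XML, and CSV using only the standard library. YAML is left out because parsing it requires a third-party package. The record contents and variable names are assumptions for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in three formats.
json_text = '{"id": 7, "name": "Mayuresh"}'
xml_text = "<user><id>7</id><name>Mayuresh</name></user>"
csv_text = "id,name\n7,Mayuresh\n"

# JSON maps directly onto Python dicts, lists, numbers, and strings.
from_json = json.loads(json_text)

# XML is a tree of elements; here each child becomes a key/value pair.
root = ET.fromstring(xml_text)
from_xml = {child.tag: child.text for child in root}

# CSV is rows of text; DictReader pairs each row with the header line.
from_csv = next(csv.DictReader(io.StringIO(csv_text)))

print(from_json)
print(from_xml)
print(from_csv)
```

One practical difference: JSON keeps `id` as a number, while the XML and CSV versions yield the string `"7"` and leave type conversion to you.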

---

## Real-Life Analogy: Restaurant and API

| Concept     | API Equivalent                        |
| ----------- | ------------------------------------- |
| Customer    | Client (script or browser)            |
| Menu        | API documentation                     |
| Order       | HTTP request                          |
| Waiter      | HTTP client (requests, curl, Postman) |
| Kitchen     | API server                            |
| Served meal | HTTP response (JSON, XML, etc.)       |

---

## APIs vs Web Scraping

| Feature       | APIs                             | Web Scraping                    |
| ------------- | -------------------------------- | ------------------------------- |
| Data Format   | Structured (JSON, XML)           | HTML markup built for display   |
| Stability     | Stable if documented             | Breaks with layout changes      |
| Legal Clarity | Terms usually stated in the docs | May violate terms of service    |
| Performance   | Faster, more efficient           | Slower, parsing overhead        |
| Use Case      | Preferred when API is present    | Needed when no API is available |

---

## Why APIs Matter in AI and Data Science

APIs are crucial for modern AI and data pipelines. They allow:

* Access to live or large-scale data sources
* Integration with third-party services (e.g., OpenAI, Hugging Face)
* Deployment of ML models as services
* Access to vector search systems (e.g., Pinecone, Qdrant)
* Real-time dashboards and workflow automation

---

## When to Use an API

Use an API when:

* A public or documented API is available
* You want structured and reliable data
* You need to avoid the fragility of web scraping
* Your application requires real-time or automated access to data

---

## Summary: APIs vs Regular GET Requests

An API call and a regular website visit can both use HTTP GET, but APIs differ because:

* They are designed to deliver specific, structured data (like JSON)
* They eliminate the need to manually extract information from HTML
* They are meant for software-to-software communication

In contrast, a GET request to a regular web page returns all the HTML, requiring additional parsing to extract only what you need.
