
# APIs for Web Data Collection and AI Workflows

APIs (Application Programming Interfaces) are structured ways for programs to communicate and exchange data. In web data collection and AI pipelines, APIs allow us to request specific information directly from a server — usually in JSON or XML format — without scraping or parsing web pages.

---

## What Is an API?

An API is like a menu at a restaurant. It tells you what information or actions you can request from a system. You (the client) send a request, and the system (the server) returns what you asked for — usually in a well-organized format like JSON.

You don’t need to know how the kitchen works (the internal code or database). You just need to know what is available (API documentation) and how to ask for it (via API requests).

---

## What Is HTTP?

**HTTP (HyperText Transfer Protocol)** is the protocol that browsers, servers, and other programs use to exchange data on the web. Web APIs use HTTP to carry requests and responses.

Most API URLs start with `https://`. The **S** in **HTTPS** (HyperText Transfer Protocol Secure) means the connection is encrypted, which protects the data sent and received from eavesdropping and tampering in transit.

An HTTP request generally includes:

1. **Method** — the action you want to perform:
   - `GET` → retrieve data
   - `POST` → send new data
   - `PUT` → update existing data
   - `DELETE` → remove data

2. **URL** — the location of the resource you want to interact with

3. **Headers** — extra information like content type or API key

4. **Body** — the data you are sending to the server (typically used with methods like POST and PUT; GET requests normally have no body)

Example (not an actual command):

```
GET https://api.example.com/users/7
```

This means: "Fetch the data about user 7 from the API."
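
To see the four parts as separate pieces, you can build a request without sending it. The sketch below uses Python's standard-library `urllib.request.Request`; `api.example.com` is a placeholder host and nothing goes over the network. One quirk worth knowing: `urllib` normalizes header names, so `Content-Type` is stored as `Content-type`.

```python
import json
import urllib.request

# Build (but do not send) a request; api.example.com is a placeholder host.
payload = {"name": "Mayuresh", "email": "mayu@example.com"}
request = urllib.request.Request(
    url="https://api.example.com/users",           # 2. URL
    data=json.dumps(payload).encode("utf-8"),      # 4. Body (must be bytes)
    headers={"Content-Type": "application/json"},  # 3. Headers
    method="POST",                                 # 1. Method
)

print(request.get_method())                # POST
print(request.full_url)                    # https://api.example.com/users
print(request.get_header("Content-type"))  # application/json (name normalized by urllib)
print(request.data)                        # the JSON body as bytes
```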

---

## What Is an HTTP Client?

An **HTTP client** is a program or tool that can send HTTP requests and receive responses from a server.

Simple explanation:  
If the API is a restaurant menu, the HTTP client is the waiter — it takes your order (request) and brings back your food (response).

### Common HTTP Clients:

- **Your browser** — sends GET requests to websites
- **Python libraries**:
  - `requests` — easy to use for most basic API work
  - `httpx` — more modern and supports async operations
- **Command-line tools**:
  - `curl` — sends HTTP requests via the terminal
- **Graphical tools**:
  - `Postman` — a GUI tool to test, explore, and document APIs

---

## Example: Making a GET Request in Python

```python
import requests

response = requests.get("https://api.github.com/users/mayurondata", timeout=10)  # give up after 10 seconds

print(response.status_code)  # Shows the response status (e.g., 200 OK)
print(response.json())       # Prints the returned data as a Python dictionary
```
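
The GitHub example needs a network connection. To experiment offline, a sketch like the one below starts a throwaway local server and queries it with the standard library's `urllib.request` (used here instead of `requests` so nothing extra needs installing). The `/users/<id>` route, the sample data, and the `get_json` helper are hypothetical; the point is to see a status code and a JSON body for both a hit (200) and a miss (404).

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample data served by the local test server.
USERS = {7: {"id": 7, "name": "Mayuresh"}}


class UserHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Accept paths of the form /users/<id>; anything else is a 404.
        parts = self.path.strip("/").split("/")
        user = None
        if len(parts) == 2 and parts[0] == "users" and parts[1].isdigit():
            user = USERS.get(int(parts[1]))
        status = 200 if user is not None else 404
        data = user if user is not None else {"error": "not found"}
        body = json.dumps(data).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def get_json(url):
    """Send a GET request and return (status_code, parsed JSON body)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the error still carries the status and body.
        return err.code, json.loads(err.read())


server = HTTPServer(("127.0.0.1", 0), UserHandler)  # port 0 lets the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_address[1]}"

found = get_json(f"{base_url}/users/7")
missing = get_json(f"{base_url}/users/99")
print(found)    # (200, {'id': 7, 'name': 'Mayuresh'})
print(missing)  # (404, {'error': 'not found'})

server.shutdown()
server.server_close()
```

Note that `urllib` raises `HTTPError` for 4xx/5xx responses, whereas `requests` returns the response normally and only raises if you call `response.raise_for_status()`.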

---

## Using curl to Make GET Requests

```bash
curl https://api.github.com/users/mayurondata
```

This sends an HTTP GET request to GitHub's API and prints the response body, which GitHub returns as JSON.

---

## Making a POST Request – Explanation and Examples

### What is POST?

* A **POST** request is used when you want to **send data** to a server.
* This is common when submitting a form, uploading something, or creating a new resource (like a user or post).

---

### Example: POST Request Using Python

```python
import requests

url = "https://api.example.com/users"
data = {
    "name": "Mayuresh",
    "email": "mayu@example.com"
}

response = requests.post(url, json=data, timeout=10)  # json= serializes the dict and sets Content-Type: application/json

print(response.status_code)  # e.g., 201 Created
print(response.json())       # The server's response
```

---

### Example: POST Request Using curl

```bash
curl -X POST https://api.example.com/users \
     -H "Content-Type: application/json" \
     -d '{"name": "Mayuresh", "email": "mayu@example.com"}'
```

Explanation:

* `-X POST`: explicitly tells curl to make a POST request (optional here, since `-d` already makes curl send a POST)
* `-H`: adds a header (`Content-Type: application/json`)
* `-d`: the data to send in the body of the request

---

## RESTful APIs – Explained Simply

A **RESTful API** follows an architectural style that makes APIs consistent and predictable. REST stands for **Representational State Transfer**.

### REST principles:

* Each piece of data is a **resource** with its own URL
* You use standard HTTP methods (`GET`, `POST`, etc.) to interact with resources
* The server does not keep client session state between requests (it is **stateless**): each request carries everything needed to process it, such as authentication details
* Data is usually returned in structured formats like JSON

---

### Example: REST API Endpoints

| Action              | URL        | Method |
| ------------------- | ---------- | ------ |
| Get all users       | `/users`   | GET    |
| Get a specific user | `/users/7` | GET    |
| Create a new user   | `/users`   | POST   |
| Update a user       | `/users/7` | PUT    |
| Delete a user       | `/users/7` | DELETE |

REST APIs are popular because they are easy to work with using tools like `requests`, `curl`, Postman, and many others.
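
The table can be read as a routing rule: the server looks at the method and the path together. Below is a minimal in-memory sketch of that idea, not a real server. `handle` and `users` are hypothetical names, no network is involved, and a production API would sit behind an HTTP framework. Each call receives everything it needs (method, path, body) as arguments and the handler keeps no memory of earlier calls from a client, which mirrors the stateless principle above. The status codes are the conventional ones (201 Created, 204 No Content, 405 Method Not Allowed).

```python
import re

# Hypothetical in-memory store standing in for the server's database.
users = {7: {"id": 7, "name": "Mayuresh"}}


def handle(method, path, body=None):
    """Dispatch on (method, path) like the table above; returns (status_code, data)."""
    if path == "/users":
        if method == "GET":
            return 200, list(users.values())
        if method == "POST":
            new_id = max(users, default=0) + 1
            users[new_id] = {"id": new_id, **body}
            return 201, users[new_id]  # 201 Created
        return 405, {"error": "method not allowed"}

    match = re.fullmatch(r"/users/(\d+)", path)
    if match is None:
        return 404, {"error": "unknown resource"}
    user_id = int(match.group(1))
    if user_id not in users:
        return 404, {"error": "user not found"}
    if method == "GET":
        return 200, users[user_id]
    if method == "PUT":
        users[user_id] = {"id": user_id, **body}  # PUT replaces the whole record
        return 200, users[user_id]
    if method == "DELETE":
        del users[user_id]
        return 204, None  # 204 No Content
    return 405, {"error": "method not allowed"}


print(handle("GET", "/users/7"))                  # (200, {'id': 7, 'name': 'Mayuresh'})
print(handle("POST", "/users", {"name": "Ana"}))  # (201, {'id': 8, 'name': 'Ana'})
print(handle("DELETE", "/users/7"))               # (204, None)
print(handle("GET", "/users/7"))                  # (404, {'error': 'user not found'})
```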

---

## Common Data Formats Returned by APIs

| Format | Description                          |
| ------ | ------------------------------------ |
| JSON   | Easy to read, widely used            |
| XML    | Verbose format, used in older APIs   |
| CSV    | Tabular, useful for spreadsheets     |
| YAML   | Used for configuration-heavy systems |
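
Parsing these formats in Python takes only the standard library, except YAML, which needs a third-party package such as PyYAML and is left out here. The sketch parses one hypothetical user record delivered in three formats.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record, as three different APIs might return it.
json_text = '{"id": 7, "name": "Mayuresh"}'
xml_text = "<user><id>7</id><name>Mayuresh</name></user>"
csv_text = "id,name\n7,Mayuresh\n"

from_json = json.loads(json_text)                       # id stays an int
root = ET.fromstring(xml_text)
from_xml = {child.tag: child.text for child in root}    # every value is a string
from_csv = next(csv.DictReader(io.StringIO(csv_text)))  # every value is a string

print(from_json)  # {'id': 7, 'name': 'Mayuresh'}
print(from_xml)   # {'id': '7', 'name': 'Mayuresh'}
print(from_csv)   # {'id': '7', 'name': 'Mayuresh'}
```

JSON keeps the number `7` as an integer, while XML and CSV hand back text, so values from those formats usually need explicit conversion.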

---

## Real-Life Analogy: Restaurant and API

| Real-World Example | API Equivalent                        |
| ------------------ | ------------------------------------- |
| You (customer)     | Client (browser, script, etc.)        |
| Menu               | API documentation                     |
| Order              | HTTP request                          |
| Waiter             | HTTP client (requests, curl, Postman) |
| Kitchen            | API server                            |
| Served meal        | HTTP response (JSON, XML, etc.)       |

---

## Common Use Cases of APIs

* **Weather Data** — Real-time info from OpenWeatherMap or WeatherAPI
* **Financial Data** — Stocks, crypto prices, and financial news
* **Social Media Bots** — Fetching or posting on Twitter, Reddit, etc.
* **Location and Maps** — Directions, distances, coordinates (e.g., Google Maps)
* **AI/ML Services** — Text generation, classification, embeddings (e.g., OpenAI, HuggingFace)

---

## Key Concepts When Working with APIs

| Concept        | Meaning                                               |
| -------------- | ----------------------------------------------------- |
| Endpoint       | The URL you send your request to                      |
| Method         | HTTP action type (GET, POST, PUT, DELETE)             |
| Parameters     | Extra info sent with the request (in URL or body)     |
| Headers        | Metadata like `Content-Type`, API key, etc.           |
| Status Codes   | Response codes like 200 (OK), 404 (Not Found), etc.   |
| Authentication | API key or token required for access                  |
| Rate Limiting  | Limits the number of allowed requests per time period |
| Pagination     | Breaks large responses into smaller "pages"           |
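
Pagination and rate limiting tend to show up together: collecting a large result means many requests, and each one counts against the limit. The sketch below assumes page-number pagination that ends with an empty page and spaces calls by a fixed interval. `fetch_page` is a hypothetical stand-in for a real HTTP call; real APIs differ (cursors, `Link` headers, rate-limit response headers), so check the documentation of the API you use.

```python
import time

# Hypothetical data standing in for a paginated endpoint such as
# GET /users?page=N&per_page=2 (parameter names vary between APIs).
ALL_USERS = [{"id": i} for i in range(1, 6)]


def fetch_page(page, per_page=2):
    """Return one page of results; an empty list means there are no more pages."""
    start = (page - 1) * per_page
    return ALL_USERS[start:start + per_page]


def fetch_all(min_interval=0.1):
    """Collect every page, waiting at least `min_interval` seconds between calls."""
    results, page = [], 1
    last_call = float("-inf")
    while True:
        # Simple client-side rate limiting: space the calls out.
        wait = min_interval - (time.monotonic() - last_call)
        if wait > 0:
            time.sleep(wait)
        last_call = time.monotonic()
        items = fetch_page(page)
        if not items:  # empty page: we have everything
            break
        results.extend(items)
        page += 1
    return results


print(fetch_all())  # [{'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}, {'id': 5}]
```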

---

## Tools for Working with APIs

| Tool/Library | Purpose                                  |
| ------------ | ---------------------------------------- |
| `requests`   | Send API calls in Python                 |
| `httpx`      | Async-capable HTTP client in Python      |
| `curl`       | Command-line HTTP requests               |
| `Postman`    | GUI to explore and test APIs             |
| `axios`      | JavaScript HTTP client (Node.js/browser) |

---

## APIs vs Web Scraping – Comparison

| Feature        | APIs                          | Web Scraping                         |
| -------------- | ----------------------------- | ------------------------------------ |
| Data Format    | Structured (JSON, XML)        | Unstructured (HTML)                  |
| Stability      | Stable if documented          | Breaks if the page structure changes |
| Legal Clarity  | Clear terms of use            | May violate terms of service         |
| Performance    | Fast and clean                | Slower and fragile                   |
| Preferred When | An API is available           | No API exists but data is needed     |
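
The Data Format and Stability rows are easier to see side by side. Below, the same hypothetical follower count is extracted from a JSON API response and from an HTML page, using only the standard library (`json` and `html.parser`). `FollowerParser` and the sample strings are illustrative.

```python
import json
from html.parser import HTMLParser

# The same hypothetical fact, delivered two ways.
api_response = '{"user": {"name": "Mayuresh", "followers": 42}}'
web_page = '<div class="profile"><h1>Mayuresh</h1><span class="followers">42 followers</span></div>'

# With an API: the field is addressed by name.
followers_from_api = json.loads(api_response)["user"]["followers"]


# With scraping: find the right tag by its class, then clean up the text.
class FollowerParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_followers = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "followers") in attrs:
            self.in_followers = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_followers = False

    def handle_data(self, data):
        if self.in_followers:
            self.text += data


parser = FollowerParser()
parser.feed(web_page)
followers_from_html = int(parser.text.split()[0])  # "42 followers" -> 42

print(followers_from_api, followers_from_html)  # 42 42
```

If the site renamed the `followers` class, the scraper would stop finding the value, while the JSON field stays addressable by name.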

---

## Why APIs Matter in AI and Data Science

APIs are essential building blocks of modern data workflows. They allow us to:

* Fetch live training or analysis data
* Build real-time dashboards and pipelines
* Serve machine learning models as APIs
* Use services like OpenAI or HuggingFace
* Work with vector databases (Pinecone, Weaviate, Qdrant) for advanced search and retrieval

---

## When to Use an API

* The data is offered through a public or official API
* You want consistent, structured, and reliable data
* You’re building scalable, modular systems
* You want to avoid the legal or technical risks of scraping

---

## What’s Next?

Try using one of these public APIs:

* [GitHub REST API](https://docs.github.com/en/rest)
* [OpenWeatherMap](https://openweathermap.org/api)
* [NewsAPI](https://newsapi.org/)
* [Reqres (Fake REST API for Testing)](https://reqres.in)
* [httpbin (Echo testing)](https://httpbin.org/)

Then explore them using:

* `curl` on the command line
* `requests` in Python
* Postman for visual exploration

