## Azure Data Factory - Web Activity
This notebook explains how to use **Web Activity** in Azure Data Factory (ADF) to call REST APIs with practical examples.

What is Web Activity in ADF?
- Web Activity is used to call a custom REST API from Azure Data Factory.
- It lets you send and receive data over HTTP using methods such as GET, POST, PUT, and DELETE.
- You can also pass datasets and linked services as JSON payloads.

Common Use Cases
- Triggering REST APIs to retrieve or send data.
- Interacting with services that expose a REST endpoint.
- Calling APIs for third-party integrations.

Example API
- Websites: https://dummy.restapiexample.com/ (used for the endpoints below), https://jsonplaceholder.typicode.com/users
- Endpoints (relative to `https://dummy.restapiexample.com/api/v1`):
  - GET `/employees` – Get all employee data.
  - GET `/employee/{id}` – Get single employee data.
  - POST `/create` – Create a new employee record.
  - DELETE `/delete/{id}` – Delete an employee record.

Steps to Use Web Activity in ADF
1. Open Azure Data Factory.
2. Create a new pipeline.
3. Drag **Web Activity** from the General section.
4. Configure the following in **Settings**:
   - **URL**: Endpoint of the REST API (e.g., `https://dummy.restapiexample.com/api/v1/employees`)
   - **Method**: `GET`, `POST`, etc.
   - **Authentication** (if needed): Basic, MSI, Client Certificate, etc.
5. Optional: Pass Body (JSON), Headers, Datasets.
6. Publish the pipeline.
7. Trigger and monitor execution.
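
For orientation, the URL, Method, Headers, and Body settings above correspond to an ordinary HTTP request. The sketch below builds (but does not send) an equivalent request with the Python standard library. The helper name `build_request` and the sample payload fields are hypothetical; the base URL comes from the example API in this notebook.

```python
import json
import urllib.request

BASE_URL = "https://dummy.restapiexample.com/api/v1"  # example API from this notebook


def build_request(method, path, body=None):
    """Build the HTTP request that matches a given set of Web Activity settings.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    data = None
    headers = {"Accept": "application/json"}
    if body is not None:
        # The Body setting becomes the JSON payload of the request.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


get_req = build_request("GET", "/employees")
# Payload fields below are illustrative assumptions, not a documented schema.
post_req = build_request("POST", "/create", {"name": "test", "salary": "123", "age": "23"})

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data.decode("utf-8"))
```

Passing a request to `urllib.request.urlopen` would actually send it; in ADF the Web Activity does this for you and exposes the response as the activity output.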

## Implement Incremental Uploads for REST & GraphQL APIs
What:
Instead of pulling full data dumps, configure ADF to load only new or changed records from API sources like VMSNext.

Why:
Reduces processing time, API load, and cost, which makes each pipeline run more efficient.

Action:
- Identify fields like lastModifiedDate, createdAt, or cursors
- Store checkpoint values in a parameter table
- Configure ADF pipelines to read only delta changes since last run

✅ OVERVIEW: Incremental Load Flow (REST or GraphQL API)
- Read Last Timestamp from a checkpoint table
- Call API with filter using lastModified or createdAt
- Copy data to staging layer
- Update checkpoint with latest max(modified_date) value
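
The four steps above can be sketched outside ADF to make the logic concrete. This is a minimal simulation, assuming an in-memory SQLite database in place of the real checkpoint and staging stores and a hypothetical `fetch_modified_after` function in place of the API call; the table name `api_checkpoint` mirrors the one used in this notebook.

```python
import sqlite3

# Hypothetical stand-in for the API source: (id, modified_date) pairs.
SOURCE_RECORDS = [
    (1, "2024-01-05T10:00:00"),
    (2, "2024-02-01T08:30:00"),
    (3, "2024-03-15T12:45:00"),
]


def fetch_modified_after(timestamp):
    """Stand-in for the filtered API call (e.g. ?modifiedAfter=...)."""
    # ISO 8601 strings in one consistent format compare correctly as text.
    return [r for r in SOURCE_RECORDS if r[1] > timestamp]


def run_incremental_load(conn, source_name):
    # 1. Read the last timestamp from the checkpoint table.
    (last_run,) = conn.execute(
        "SELECT last_run_time FROM api_checkpoint WHERE source_name = ?",
        (source_name,),
    ).fetchone()
    # 2. Call the API with a filter on the modified date.
    rows = fetch_modified_after(last_run)
    # 3. Copy the delta to the staging layer.
    conn.executemany("INSERT INTO staging (id, modified_date) VALUES (?, ?)", rows)
    # 4. Advance the checkpoint only when something new arrived.
    if rows:
        conn.execute(
            "UPDATE api_checkpoint SET last_run_time = ? WHERE source_name = ?",
            (max(r[1] for r in rows), source_name),
        )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_checkpoint (source_name TEXT PRIMARY KEY, last_run_time TEXT)")
conn.execute("CREATE TABLE staging (id INTEGER, modified_date TEXT)")
conn.execute("INSERT INTO api_checkpoint VALUES ('vmsnext_api', '2024-01-31T00:00:00')")

first = run_incremental_load(conn, "vmsnext_api")   # picks up records 2 and 3
second = run_incremental_load(conn, "vmsnext_api")  # nothing new since the checkpoint
print(first, second)
```

Updating the checkpoint last, and only after the load succeeds, means a failed run is simply retried from the previous timestamp.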

1. 🔒 Checkpoint Table (Azure SQL / Snowflake / etc.)

``` sql
CREATE TABLE api_checkpoint (
    source_name VARCHAR(100) PRIMARY KEY,
    last_run_time DATETIME
);

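-- Insert the initial checkpoint value (UTC). The trailing 'Z' is omitted
-- because a SQL Server DATETIME column may reject it.
INSERT INTO api_checkpoint (source_name, last_run_time)
VALUES ('vmsnext_api', '2024-01-01T00:00:00');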

```

2. 📦 ADF Pipeline Flow
Activities:
- Lookup: Get last_run_time
- Web: Call REST/GraphQL with dynamic timestamp
- Copy: Load to raw zone
- Stored Procedure: Update checkpoint after success

🔍 2.1 Lookup Activity (`lkup_lastdate`) – Fetch last_run_time

``` sql
SELECT last_run_time FROM api_checkpoint WHERE source_name = 'vmsnext_api';
```

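2.2 Set Variable – Store lastRunTime
Name: lastRunTime

Value: @activity('lkup_lastdate').output.firstRow.last_run_time

Note: Run the pipeline in Debug mode and inspect the output of the Lookup and Set Variable activities to confirm that the timestamp is read correctly. The `firstRow` property is only present when **First row only** is enabled on the Lookup.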


2.3 REST Dataset with Dynamic Parameter
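Create a parameterized REST dataset. A dataset cannot read pipeline parameters or variables directly, so declare a dataset parameter and reference it with `dataset()`:
``` json
{
  "type": "RestResource",
  "parameters": {
    "lastRunTime": { "type": "String" }
  },
  "typeProperties": {
    "relativeUrl": "records?modifiedAfter=@{dataset().lastRunTime}"
  }
}
```
In the Copy activity, set the dataset parameter lastRunTime to the timestamp from the Lookup, for example `@variables('lastRunTime')`.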
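
Timestamps contain characters such as `:` and `+` that are reserved in URLs, so it can help to see what the final relative URL looks like once encoded. The sketch below is illustrative only; `build_relative_url` is a hypothetical helper, and the `records` resource and `modifiedAfter` parameter are taken from the dataset example above.

```python
from urllib.parse import urlencode


def build_relative_url(last_run_time, resource="records"):
    """Return the relative URL with the timestamp URL-encoded."""
    # urlencode percent-encodes reserved characters such as ':' and '+'.
    return f"{resource}?{urlencode({'modifiedAfter': last_run_time})}"


print(build_relative_url("2024-01-01T00:00:00Z"))
# records?modifiedAfter=2024-01-01T00%3A00%3A00Z
```

Inside an ADF expression, the `encodeUriComponent()` function plays a similar role if the target API requires an encoded value.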

2.4 GraphQL Example (Using Web Activity)
GraphQL requests are usually sent as a POST with the query in the JSON body rather than in URL parameters:

Request body (Web Activity):
``` json
{
  "query": "query { getUpdatedRecords(modifiedAfter: \"@{activity('lkup_lastdate').output.firstRow.last_run_time}\") { id name updatedAt } }"
}

```
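
Interpolating the timestamp into the query string requires careful quote escaping. A common alternative is to pass it as a GraphQL variable, which the sketch below illustrates by building the request body in Python. The field name `getUpdatedRecords` comes from the example above; the variable name `$modifiedAfter` and its `String!` type are assumptions about the target schema.

```python
import json

# Field name from the example above; variable name and type are assumed.
QUERY = (
    "query ($modifiedAfter: String!) { "
    "getUpdatedRecords(modifiedAfter: $modifiedAfter) { id name updatedAt } }"
)


def build_graphql_body(last_run_time):
    """Serialize a GraphQL POST body, passing the timestamp as a variable."""
    return json.dumps({"query": QUERY, "variables": {"modifiedAfter": last_run_time}})


body = build_graphql_body("2024-01-01T00:00:00Z")
print(body)
```

With this shape, only the `variables` object changes between runs, so the query text itself can stay static in the Web Activity body.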

2.5 Copy Activity – Load into Staging Table
Use a Copy activity to write the API response to one of the following:
- Blob / ADLS Gen2 (raw layer)
- Snowflake or Azure SQL directly, via staging

Example mapping (if JSON):

``` json
"format": {
    "type": "JsonFormat",
    "filePattern": "setOfObjects"
}

```

2.6 Stored Procedure – Update Checkpoint
``` sql
CREATE PROCEDURE update_checkpoint
    @source_name VARCHAR(100),
    @new_timestamp DATETIME
AS
BEGIN
    UPDATE api_checkpoint
    SET last_run_time = @new_timestamp
    WHERE source_name = @source_name;
END

```
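In ADF:
Call the procedure from a Stored Procedure activity and pass the new checkpoint as `@new_timestamp`. The Copy activity output does not expose a maximum column value, so an expression such as @{activity('Copy Activity').output.maxModifiedDate} will not resolve as written. Compute the value first, for example with a Lookup that runs `SELECT MAX(modified_date)` on the staging table, a Data Flow, or a post-load script.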
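
One way to derive the new checkpoint in a post-load script is to take the latest timestamp among the records just loaded, falling back to the current checkpoint when nothing was loaded. The sketch below assumes the records carry an ISO 8601 `updatedAt` field, as in the GraphQL example; the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)


def next_checkpoint(records, current_checkpoint, field="updatedAt"):
    """Return the latest timestamp among the records, or the current checkpoint if none."""
    latest = parse_utc(current_checkpoint)
    for record in records:
        latest = max(latest, parse_utc(record[field]))
    return latest.strftime("%Y-%m-%dT%H:%M:%SZ")


loaded = [
    {"id": 1, "updatedAt": "2024-02-01T08:30:00Z"},
    {"id": 2, "updatedAt": "2024-03-15T12:45:00Z"},
]
print(next_checkpoint(loaded, "2024-01-01T00:00:00Z"))  # 2024-03-15T12:45:00Z
print(next_checkpoint([], "2024-01-01T00:00:00Z"))      # 2024-01-01T00:00:00Z
```

Using the maximum source timestamp rather than the pipeline run time avoids skipping records that were modified while the load was in progress.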

Azure Data Factory incremental data uploads without timestamps in source table