# 10 minutes for a Point of Presence (v0.4.0)

**Note:** You can run this notebook online [here](https://drive.google.com/file/d/1ZahNm4Jrf6pOzj80VkuB9vogRKQKGv8a/view?usp=sharing).

The **pointofpresence** library (Point of Presence) is a Python client designed to simplify interactions with a `POP API`. Concretely, this library handles the HTTP requests (GET, POST, PUT, DELETE) to the POP API and formats the results to make them more understandable for the user. This way, you can focus on working with the data directly without worrying about low-level API details.

In this tutorial, we will cover the following points:

1. [**Setting Up**](#1-setting-up): How to install and configure the `pointofpresence` library.
2. [**Initializing the Client**](#2-initializing-the-client): Creating and authenticating the client to connect with the POP API.
3. [**Managing Organizations**](#3-managing-organizations): An organization is how the POP API groups different resources. In this section, we will learn how to register, list, and delete organizations.

   3.1. [**Registering an Organization**](#31-registering-an-organization)

   3.2. [**Listing Organizations**](#32-listing-organizations)

   3.3. [**Deleting an Organization**](#33-deleting-an-organization)

4. [**Working with Resources**](#4-working-with-resources): Creating, updating, and deleting various types of resources.
   
   4.1. [**Working with Kafka Topics**](#41-working-with-kafka-topics): Managing resources specific to Kafka topics.

   4.2. [**Working with S3 Resources**](#42-working-with-s3-resources): Managing resources stored in S3.

   4.3. [**Working with URLs**](#43-working-with-urls): Managing URL-based resources.

   4.4. [**Searching Resources**](#44-searching-resources)

   4.5. [**Deleting Resources**](#45-deleting-resources)

   4.6. [**Working with Kafka Connection Details**](#46-working-with-kafka-connection-details)


## 1. Setting Up

To start using the `pointofpresence` library (for example, in Google Colab), follow these steps:

### Step 1: Install the Library

If you haven't already installed `pointofpresence`, you can do so via pip. Run the following command in your terminal:

```bash
pip install pointofpresence
```

Or run the following cell here:

In [None]:
!pip install pointofpresence

Or, if you're working in a virtual environment and have cloned the repository, you can install it by running the following command in the root folder:

```bash
pip install -e .
```

### Step 2: Enter API Credentials

To configure the client, you need the base URL where the POP API is hosted, plus either:
1. A valid username and password for authentication, **or**
2. A valid token for authentication.
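
Since the base URL is always needed and authentication uses one of the two options above, you can check your inputs before creating the client. Below is a minimal sketch; `check_credentials` is a hypothetical helper written for this tutorial, not part of `pointofpresence`, and it only encodes the rule stated above.

```python
from typing import Optional


def check_credentials(
    base_url: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
    token: Optional[str] = None,
) -> None:
    """Raise ValueError if these values cannot configure a client.

    Hypothetical helper (not part of pointofpresence): it only encodes the
    rule above, i.e. a base URL plus a username/password pair or a token.
    """
    # The base URL must include the scheme (http:// or https://).
    if not base_url.startswith(("http://", "https://")):
        raise ValueError("base_url must start with http:// or https://")

    has_password_auth = bool(username) and bool(password)
    has_token_auth = bool(token)
    if not (has_password_auth or has_token_auth):
        raise ValueError("provide a username and password, or a token")


# Both calls pass silently; a missing password or token would raise ValueError.
check_credentials("https://pop.example.org", username="alice", password="secret")
check_credentials("https://pop.example.org", token="abc123")
```

You could call it after either of the input cells below, e.g. `check_credentials(api_base_url, api_username, api_password, api_token)`.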

#### Option 1: Username and Password Authentication

Use the following code to configure the client using your username and password:

In [1]:
from getpass import getpass

# Prompt the user for API credentials
api_base_url = input("Enter the POP API base URL (include http:// or https://): ")
api_username = input("Enter your POP API username: ")
api_password = getpass("Enter your POP API password: ")
api_token = None  # Not needed for username/password authentication

#### Option 2: Token-Based Authentication

If you already have a token, you can use it directly instead of providing a username and password:

In [1]:
# Prompt the user for API credentials
api_base_url = input("Enter the POP API base URL (include http:// or https://): ")
api_token = input("Enter your POP API token: ")
api_username = None  # Not needed for token authentication
api_password = None

## 2. Initializing the Client

Now that you have the required credentials, you can import and configure the `pointofpresence` client:

In [2]:
from pointofpresence import APIClient

# Initialize the API client
client = APIClient(base_url=api_base_url,
                   username=api_username,
                   password=api_password,
                   token=api_token)

Your client is now configured and ready to interact with the POP API. In the next sections, we will explore how to use this client to manage organizations, resources, and more.

## 3. Managing Organizations

In the POP API, organizations are logical groupings that help categorize and manage resources. You can think of an organization as a directory, and a resource as a file within it.

### 3.1. Registering an Organization

You can create an organization using the `register_organization` method. This method requires a dictionary as input, with the following fields:

- **name**: A unique identifier for the organization (must be distinct from existing organizations).
- **title**: The display name of the organization, which will appear in user interfaces.
- **description**: An optional field to describe the purpose or nature of the organization.

Make sure to include at least the required fields (`name` and `title`) in your dictionary when registering a new organization.
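
A small builder can assemble this dictionary and fail early when a required field is empty. This is a sketch: `build_organization_payload` is a hypothetical helper, and it does not check any naming rules the POP API may enforce on `name`.

```python
from typing import Any, Dict, Optional


def build_organization_payload(
    name: str, title: str, description: Optional[str] = None
) -> Dict[str, Any]:
    """Build the dictionary passed to register_organization.

    Hypothetical helper: it only checks that the required fields are
    non-empty; server-side naming rules (if any) are not checked here.
    """
    if not name.strip():
        raise ValueError("name is required")
    if not title.strip():
        raise ValueError("title is required")
    payload: Dict[str, Any] = {"name": name, "title": title}
    # description is optional, so include it only when provided.
    if description is not None:
        payload["description"] = description
    return payload


organization_data = build_organization_payload(
    "example_org", "Example Organization", "This organization groups datasets for project X."
)
print(organization_data)
```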

#### Example: Registering an Organization

In [None]:
organization_data = {
    "name": "example_org",  # Unique identifier for the organization
    "title": "Example Organization",  # Display name for the organization
    "description": "This organization groups datasets for project X."
}

# Call the register_organization method to create the organization
try:
    response = client.register_organization(organization_data)
    print("Organization registered successfully with ID:", response["id"])
except ValueError as e:
    print("Failed to register organization.")
    print(e)

### 3.2. Listing Organizations

The `list_organizations` method retrieves the names of the organizations registered in the POP API. This is useful for quickly identifying the available organizations.

The method accepts an optional filtering parameter:

- **name** (optional): A partial string to filter the organization names. The filtering is case-insensitive.
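
To make the filter semantics concrete, here is a local sketch of a case-insensitive partial match over a list of names. It assumes the `name` filter behaves like a case-insensitive substring search; `filter_organization_names` is hypothetical and does not call the POP API.

```python
from typing import List


def filter_organization_names(names: List[str], partial: str) -> List[str]:
    """Keep the names that contain `partial`, ignoring case.

    Local illustration only: assumes the server-side `name` filter behaves
    like a case-insensitive substring match.
    """
    needle = partial.casefold()
    return [name for name in names if needle in name.casefold()]


print(filter_organization_names(["example_org", "Example_Lab", "other_org"], "example"))
# ['example_org', 'Example_Lab']
```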


#### Example: Listing All Organizations

In [None]:
# List all organizations without filtering
try:
    all_organizations = client.list_organizations()
    print("All organizations:", all_organizations)
except ValueError as e:
    print("Failed to retrieve organizations.")
    print(e)

#### Example: Listing Organizations with a Filter

In [None]:
# List organizations with a filter by name
try:
    filtered_organizations = client.list_organizations(name="example")
    print("Filtered organizations:", filtered_organizations)
except ValueError as e:
    print("Failed to retrieve filtered organizations.")
    print(e)

### 3.3. Deleting an Organization

You can delete an existing organization using the `delete_organization` method. This method will remove an organization and **all associated resources**.
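
Because this operation also removes every resource in the organization, you may want a confirmation step. The guard below is a minimal sketch; `delete_organization_confirmed` is a hypothetical helper that takes the delete function as an argument, so it can be tried without a live POP API.

```python
from typing import Any, Callable, Dict


def delete_organization_confirmed(
    delete_fn: Callable[[str], Dict[str, Any]], name: str, confirmation: str
) -> Dict[str, Any]:
    """Call delete_fn(name) only if `confirmation` repeats the name exactly.

    Hypothetical guard: pass client.delete_organization as delete_fn.
    Deleting an organization also deletes all of its resources, so a typo
    should not be able to trigger it.
    """
    if confirmation != name:
        raise ValueError(f"confirmation {confirmation!r} does not match {name!r}")
    return delete_fn(name)


# Usage with the client from Section 2 (not run here):
# delete_organization_confirmed(client.delete_organization, "example_org",
#                               input("Type the organization name to confirm: "))
```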

#### Example: Deleting an Organization

Here's an example of how to delete an organization. In this example, we specify the name of the organization we want to delete:

In [None]:
# Specify the name of the organization to delete
organization_name = "example_org"

# Call the delete_organization method to remove the organization
try:
    response = client.delete_organization(organization_name)
    print("Organization deleted successfully:", response["message"])
except ValueError as e:
    print("Failed to delete organization.")
    print(e)

## 4. Working with Resources

In the POP API, a resource is a link to a specific data source along with a set of metadata associated with that source. Currently, you can register three types of resources:

- **Kafka topics**: For managing data streams within Kafka.
- **S3 repository links**: For referencing data stored in Amazon S3 or similar storage services.
- **URL links to other data sources**: For linking to external data sources through a standard URL.

Each resource type allows you to define the location and metadata of a specific dataset, making it easier to organize and manage data access.


### 4.1 Working with Kafka Topics

The `register_kafka_topic` method requires a dictionary as an argument, which must contain the following fields:

- **dataset_name**: A unique name for the dataset you are registering.
- **dataset_title**: The title for the dataset.
- **owner_org**: The name of the organization to which the dataset belongs.
- **kafka_topic**: The name of the Kafka topic.
- **kafka_host**: The host of the Kafka server.
- **kafka_port**: The port of the Kafka server.
- **dataset_description** (optional): A description of the dataset.
- **extras** (optional): Additional metadata in key-value format to include with the dataset.
- **mapping** (optional): Mapping information for selecting and renaming fields to send.
- **processing** (optional): Information on how to process the dataset.

Make sure your dictionary includes all the required fields when calling this method to register a Kafka topic.
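
To catch missing fields before the request is sent, you can check the dictionary locally. This is a minimal sketch; `missing_fields` is a hypothetical helper, and the POP API remains the authority on what it accepts.

```python
from typing import Any, Dict, List, Tuple

# Required fields for register_kafka_topic, as listed above.
KAFKA_REQUIRED_FIELDS: Tuple[str, ...] = (
    "dataset_name",
    "dataset_title",
    "owner_org",
    "kafka_topic",
    "kafka_host",
    "kafka_port",
)


def missing_fields(
    data: Dict[str, Any], required: Tuple[str, ...] = KAFKA_REQUIRED_FIELDS
) -> List[str]:
    """Return the required fields that are absent, None, or empty strings."""
    return [field for field in required if data.get(field) in (None, "")]


payload = {
    "dataset_name": "example_kafka_dataset_3",
    "dataset_title": "Example Kafka Dataset",
    "owner_org": "example_org",
    "kafka_topic": "example_topic",
    "kafka_host": "kafka_host",
}
print(missing_fields(payload))  # ['kafka_port']
```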

#### Example: Registering a Kafka Topic

Here's an example of how to register a Kafka topic with all the required and optional fields:

In [None]:
# Define the payload data for the Kafka topic registration
data = {
    "dataset_name": "example_kafka_dataset_3",
    "dataset_title": "Example Kafka Dataset",
    "owner_org": "example_org",
    "kafka_topic": "example_topic",
    "kafka_host": "kafka_host",
    "kafka_port": "9092",
    "dataset_description": "This is a sample Kafka dataset.",
    "extras": {"key1": "value1", "key2": "value2"},
    "mapping": {"field1": "mapped_field1", "field2": "mapped_field2"},
    "processing": {"data_key": "data", "info_key": "info"}
}

# Call the register_kafka_topic method to add the Kafka topic
try:
    response = client.register_kafka_topic(data)
    print("Kafka topic registered successfully with ID:", response["id"])
except ValueError as e:
    print("Failed to register Kafka topic.")
    print(e)

You can update an existing Kafka topic by using the `update_kafka_topic` method. The fields that can be updated are the following:

- **dataset_name** (optional): The unique name of the dataset.
- **dataset_title** (optional): The title of the dataset.
- **owner_org** (optional): The ID of the organization that owns the dataset.
- **kafka_topic** (optional): The name of the Kafka topic.
- **kafka_host** (optional): The host of the Kafka server.
- **kafka_port** (optional): The port of the Kafka server.
- **dataset_description** (optional): A description of the dataset.
- **extras** (optional): Additional metadata in key-value format.
- **mapping** (optional): Mapping information for structuring the dataset.
- **processing** (optional): Processing details for the dataset.
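
Since every field is optional, a misspelled key may be easy to miss. The following sketch rejects keys outside the list above before the request is sent; `check_update_keys` is hypothetical, and how the POP API itself handles unknown keys is not covered here.

```python
from typing import Any, Dict, FrozenSet

# Fields accepted by update_kafka_topic, as listed above.
KAFKA_UPDATABLE_FIELDS: FrozenSet[str] = frozenset({
    "dataset_name", "dataset_title", "owner_org", "kafka_topic", "kafka_host",
    "kafka_port", "dataset_description", "extras", "mapping", "processing",
})


def check_update_keys(
    update: Dict[str, Any], allowed: FrozenSet[str] = KAFKA_UPDATABLE_FIELDS
) -> None:
    """Raise ValueError if `update` contains a key that is not updatable."""
    unknown = sorted(set(update) - allowed)
    if unknown:
        raise ValueError(f"unknown update fields: {unknown}")


check_update_keys({"kafka_topic": "example_topic_updated"})  # passes
```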

#### Example: Updating a Kafka Topic

Here's an example of how to update an existing Kafka topic. In this example, we keep the same `dataset_name` and change the `kafka_topic` field:

In [None]:
# Define the payload data for updating the Kafka topic
update_data = {
    "dataset_name": "example_kafka_dataset_3",
    "kafka_topic": "example_topic_updated"
}

# Specify the dataset ID of the Kafka topic to update (replace with your own dataset ID)
dataset_id = "64d34019-21ef-41ac-82c7-91d86fbc5a8c"

# Call the update_kafka_topic method to modify the Kafka topic
try:
    response = client.update_kafka_topic(dataset_id, update_data)
    print("Kafka topic updated successfully:", response["message"])
except ValueError as e:
    print("Failed to update Kafka topic.")
    print(e)

### 4.2 Working with S3 Resources

An S3 resource represents a reference or link to data stored in an S3 (Simple Storage Service) bucket.

You can register an S3 resource using the `register_s3_link` method. The input dictionary must include the following fields:

- **resource_name**: A unique identifier for the S3 resource.
- **resource_title**: The title of the S3 resource.
- **owner_org**: The ID of the organization to which the resource belongs.
- **resource_s3**: The S3 URL of the resource (e.g., `"s3://bucket-name/object-key"`).
- **notes** (optional): Additional notes describing the resource.
- **extras** (optional): Additional metadata in key-value format associated with the resource.

#### Example: Registering an S3 Resource Link

Here’s an example of how to register an S3 resource link with both required and optional fields:

In [None]:
# Define the payload data for the S3 resource link registration
data = {
    "resource_name": "example_s3_resource_2",
    "resource_title": "Example S3 Resource",
    "owner_org": "example_org",
    "resource_s3": "s3://bucket-name/object-key",
    "notes": "This is a sample S3 resource.",
    "extras": {"key1": "value1", "key2": "value2"}
}

# Call the register_s3_link method to add the S3 resource link
try:
    response = client.register_s3_link(data)
    print("S3 resource link registered successfully with ID:", response["id"])
except ValueError as e:
    print("Failed to register S3 resource link.")
    print(e)

You can update an existing S3 resource with the `update_s3_resource` method by passing any of the following fields in the input dictionary. These fields overwrite the existing values for the specified resource:

- **resource_name** (optional): The unique name of the resource.
- **resource_title** (optional): The title of the resource.
- **owner_org** (optional): The ID of the organization that owns the resource.
- **resource_s3** (optional): The S3 URL of the resource.
- **notes** (optional): Additional notes about the resource.
- **extras** (optional): Additional metadata as key-value pairs.
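
Because the fields you provide overwrite the stored values, it can help to send only the fields that actually change. The sketch below assumes you already know the current values (for example, from your own records); `changed_fields` is a hypothetical helper and does not fetch anything from the POP API.

```python
from typing import Any, Dict


def changed_fields(current: Dict[str, Any], desired: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the entries of `desired` whose values differ from `current`."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


current = {"resource_name": "example_s3_resource_2", "resource_s3": "s3://bucket-name/object-key"}
desired = {"resource_name": "example_s3_resource_2", "resource_s3": "s3://new-bucket-name/new-object-key"}
print(changed_fields(current, desired))
# {'resource_s3': 's3://new-bucket-name/new-object-key'}
```

This keeps each update request minimal and makes it explicit which fields an update actually changes.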

#### Example: Updating an S3 Resource

Here's an example of how to update an existing S3 resource. In this example, we modify the `resource_name` and `resource_s3` fields:


In [None]:
# Define the payload data for updating the S3 resource
update_data = {
    "resource_name": "updated_resource_name",
    "resource_s3": "s3://new-bucket-name/new-object-key"
}

# Specify the resource ID of the S3 resource to update (replace with your own resource ID)
resource_id = "e6d2c2a2-dcd7-4780-8385-1ab436798578"

# Call the update_s3_resource method to modify the S3 resource
try:
    response = client.update_s3_resource(resource_id, update_data)
    print("S3 resource updated successfully:", response["message"])
except ValueError as e:
    print("Failed to update S3 resource.")
    print(e)

### 4.3 Working with URLs

You can register a URL resource by using the `register_url` method. The input data dictionary must contain the following fields:

- **resource_name**: A unique name for the resource you are creating.
- **resource_title**: The title for the resource.
- **owner_org**: The ID of the organization to which the resource belongs.
- **resource_url**: The URL of the resource.
- **file_type**: The type of the file (`stream`, `CSV`, `TXT`, `JSON`, `NetCDF`).
- **notes** (optional): Additional notes about the resource.
- **extras** (optional): Additional metadata as key-value pairs to include with the resource package.
- **mapping** (optional): Mapping information for structuring and renaming fields.
- **processing** (optional): Processing details specific to the selected file type.

#### File Type-Specific Processing Information (Optional)

For the `processing` field, ensure the structure matches the selected `file_type`. Here’s a breakdown for each type:

1. **Stream**:
    ```json
    "processing": {
        "refresh_rate": "5 seconds",
        "data_key": "results"
    }
    ```

2. **CSV**:
    ```json
    "processing": {
        "delimiter": ",",
        "header_line": 1,
        "start_line": 2,
        "comment_char": "#"
    }
    ```

3. **TXT**:
    ```json
    "processing": {
        "delimiter": "\t",
        "header_line": 1,
        "start_line": 2
    }
    ```

4. **JSON**:
    ```json
    "processing": {
        "info_key": "count",
        "additional_key": "metadata",
        "data_key": "results"
    }
    ```

5. **NetCDF**:
    ```json
    "processing": {
        "group": "group_name"
    }
    ```
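
The keys above differ per `file_type`, so a typo such as `header_lines` is easy to miss. This sketch flags keys that do not appear in the examples above for the chosen type; since those examples may not list every supported option, treat the result as a warning rather than an error. `unexpected_processing_keys` is hypothetical.

```python
from typing import Any, Dict, List, Set

# Processing keys shown in the examples above, per file_type.
# This may not be the complete set the POP API accepts.
PROCESSING_KEYS: Dict[str, Set[str]] = {
    "stream": {"refresh_rate", "data_key"},
    "CSV": {"delimiter", "header_line", "start_line", "comment_char"},
    "TXT": {"delimiter", "header_line", "start_line"},
    "JSON": {"info_key", "additional_key", "data_key"},
    "NetCDF": {"group"},
}


def unexpected_processing_keys(file_type: str, processing: Dict[str, Any]) -> List[str]:
    """List processing keys not shown for this file_type (possible typos)."""
    if file_type not in PROCESSING_KEYS:
        raise ValueError(f"unknown file_type: {file_type!r}")
    return sorted(set(processing) - PROCESSING_KEYS[file_type])


print(unexpected_processing_keys("CSV", {"delimiter": ",", "header_lines": 1}))
# ['header_lines']
```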

#### Example: Registering a URL Resource

Here’s an example showing how to register a URL resource using the `register_url` method, along with the required fields and optional metadata:

In [None]:
# Define the payload data for the URL resource registration
data = {
    "resource_name": "example_url_resource_3",
    "resource_title": "Example URL Resource",
    "owner_org": "example_org",
    "resource_url": "http://example.com/data.csv",
    "file_type": "CSV",
    "notes": "This is a sample URL resource.",
    "extras": {"extra_source": "external", "extra_author": "data_provider"},
    "mapping": {"field1": "mapped_field1", "field2": "mapped_field2"},
    "processing": {
        "delimiter": ",",
        "header_line": 1,
        "start_line": 2,
        "comment_char": "#"
    }
}

# Call the register_url method to add the URL resource
try:
    response = client.register_url(data)
    print("URL resource registered successfully with ID:", response["id"])
except ValueError as e:
    print("Failed to register URL resource.")
    print(e)

You can update an existing URL resource with the `update_url_resource` method by passing any of the following fields in the input data dictionary:

- **resource_name** (optional): The unique name of the resource.
- **resource_title** (optional): The title of the resource.
- **owner_org** (optional): The ID of the organization that owns the resource.
- **resource_url** (optional): The URL of the resource.
- **file_type** (optional): The type of the file (`stream`, `CSV`, `TXT`, `JSON`, `NetCDF`).
- **notes** (optional): Additional notes about the resource.
- **extras** (optional): Additional metadata as key-value pairs.
- **mapping** (optional): Mapping information for structuring and renaming fields.
- **processing** (optional): Processing details specific to the selected `file_type`.

#### Example: Updating a URL Resource

Here's an example of how to update an existing URL resource. This example supplies every updatable field; in practice, you only need to include the fields you want to change:


In [None]:
# Define the payload data for updating the URL resource
update_data = {
    "resource_name": "example_resource_name",
    "resource_title": "Example Resource Title",
    "owner_org": "example_org_id",
    "resource_url": "http://example.com/resource",
    "file_type": "CSV",
    "notes": "Additional notes about the resource.",
    "extras": {"key1": "value1", "key2": "value2"},
    "mapping": {"field1": "mapping1", "field2": "mapping2"},
    "processing": {
        "delimiter": ",", "header_line": 1, "start_line": 2,
        "comment_char": "#"
    }
}

# Specify the resource ID of the URL resource to update (placeholder: replace with a real resource ID)
resource_id = "12345678-abcd-efgh-ijkl-1234567890ab"

# Call the update_url_resource method to modify the URL resource
try:
    response = client.update_url_resource(resource_id, update_data)
    print("URL resource updated successfully:", response["message"])
except ValueError as e:
    print("Failed to update URL resource.")
    print(e)

### 4.4. Searching Resources

The `search_datasets` method searches for datasets either on the local server or globally, across all federated datasets. It supports both global searches (matching terms across all fields) and key-specific searches (matching terms within specific fields), and you can combine the two in a single query.

#### Parameters

- **`terms`** *(List[str], required)*:  
  A list of keywords to search within dataset titles, descriptions, or other fields.  

- **`keys`** *(List[Optional[str]], optional)*:  
  An optional list of keys specifying the fields to search for each corresponding term.  
  - If `keys` is omitted, all `terms` are searched globally.  
  - If `keys[i]` is `None`, the corresponding `terms[i]` is searched globally.  
  - If `keys[i]` specifies a particular field, the corresponding `terms[i]` is searched within that field.  

- **`server`** *(Literal['local', 'global'], optional)*:  
  Specifies which server to query for the datasets. Defaults to `'local'`.
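
The pairing rules for `terms` and `keys` can be written out as a small function. This is a sketch that mirrors the documented behavior, not the library's own implementation; `pair_terms_with_keys` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def pair_terms_with_keys(
    terms: Sequence[str], keys: Optional[Sequence[Optional[str]]] = None
) -> List[Tuple[str, Optional[str]]]:
    """Pair each search term with its field key (None means a global search)."""
    if keys is None:
        # keys omitted: every term is searched globally.
        return [(term, None) for term in terms]
    if len(keys) != len(terms):
        raise ValueError(f"got {len(terms)} terms but {len(keys)} keys")
    return list(zip(terms, keys))


print(pair_terms_with_keys(["example", "temperature"], [None, "extras.key1"]))
# [('example', None), ('temperature', 'extras.key1')]
```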

#### Examples

##### Global Search (Default Behavior)
Search for terms globally across all fields.

In [None]:
# Example search terms
search_terms = ["example", "test"]

# Perform search on local server
results = client.search_datasets(terms=search_terms, server="local")

# Display results
print("Search Results:", results)

##### Key-Specific Search
Search for terms within specific fields using the `keys` parameter.

In [None]:
# Example search terms and corresponding keys
search_terms = ["example", "temperature"]
search_keys = ["description", "extras.key1"]

# Perform key-specific search on local server
results = client.search_datasets(terms=search_terms, keys=search_keys, server="local")

# Display results
print("Search Results:", results)

##### Mixed Global and Key-Specific Search
Combine global and key-specific searches in a single query.

In [None]:
# Example with mixed global and key-specific search
search_terms = ["example", "temperature"]
search_keys = [None, "extras.key1"]  # Global search for "example", specific search for "temperature"

# Perform mixed search
results = client.search_datasets(terms=search_terms, keys=search_keys, server="local")

# Display results
print("Search Results:", results)

##### Error Handling for Invalid Input
An exception is raised if the number of terms does not match the number of keys.

In [None]:
# Mismatched terms and keys
search_terms = ["example", "temperature"]
search_keys = ["description"]  # Missing key for second term

try:
    results = client.search_datasets(terms=search_terms, keys=search_keys, server="local")
except ValueError as e:
    print(f"Error: {e}")

#### Notes

1. If the `keys` parameter is not provided, the method performs a global search for all terms.
2. The `keys` parameter enables more precise searches, which is particularly useful for datasets with detailed metadata or custom fields.
3. The `server` parameter determines whether the search runs on local datasets (`'local'`) or across all federated datasets (`'global'`).

### 4.5. Deleting Resources

You can delete a resource in two ways:

1. By specifying the **resource ID** with the `delete_resource_by_id` method.
2. By specifying the **resource name** with the `delete_resource_by_name` method.

These methods allow you to remove resources from CKAN either by their unique identifier (ID) or by their name, making it easier to manage your resources.
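
If you have an identifier and are unsure whether it is an ID or a name, you could choose the method automatically. This sketch assumes resource IDs are UUIDs (as in the examples in this tutorial) and that no resource name is itself a valid UUID; `delete_resource` is a hypothetical helper that only calls the two methods described above.

```python
import uuid
from typing import Any


def delete_resource(client: Any, identifier: str) -> Any:
    """Delete by ID if `identifier` parses as a UUID, otherwise by name."""
    try:
        uuid.UUID(identifier)
    except ValueError:
        # Not a UUID: treat it as a resource name.
        return client.delete_resource_by_name(identifier)
    return client.delete_resource_by_id(identifier)


# Usage with the client from Section 2 (not run here):
# delete_resource(client, "example_url_resource_3")
```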

#### Example: Deleting a Resource by ID

In [None]:
# Specify the ID of the resource to delete (replace with your own resource ID)
resource_id = "0c97be99-2bf8-4577-80e4-751d304db8f9"

# Call the delete_resource_by_id method to remove the resource
try:
    response = client.delete_resource_by_id(resource_id)
    print("Resource deleted successfully:", response["message"])
except ValueError as e:
    print("Failed to delete resource by ID.")
    print(e)

#### Example: Deleting a Resource by Name

In [None]:
# Specify the name of the resource to delete
resource_name = "example_url_resource_3"

# Call the delete_resource_by_name method to remove the resource
try:
    response = client.delete_resource_by_name(resource_name)
    print("Resource deleted successfully:", response["message"])
except ValueError as e:
    print("Failed to delete resource by name.")
    print(e)

### 4.6 Working with Kafka Connection Details

You can retrieve the Kafka connection details from the POP API, including the Kafka host, port, and connection status.

The following steps demonstrate how to make a request to the endpoint and interpret the response.

#### Example Code

Run the following code cell to get Kafka connection details:


In [None]:
# Reuse the `client` initialized in Section 2. If you run this cell on its own,
# initialize a client first, for example:
# from pointofpresence import APIClient
# client = APIClient(base_url="https://your-api-url", token="your-auth-token")

# Retrieve Kafka connection details
kafka_details = client.get("/status/kafka-details")
print("Kafka Connection Details:", kafka_details)


#### Expected Output

The output will display the Kafka connection details in the following format:

```json
{
    "kafka_host": "localhost",
    "kafka_port": 9092,
    "kafka_connection": true
}
```

Make sure the Kafka environment is configured correctly for the endpoint to return accurate details.
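
Assuming `client.get` returns the parsed JSON as a Python dictionary (so `true` becomes `True`), the response can be turned into the `host:port` string that Kafka client libraries typically use for bootstrap servers. `kafka_bootstrap_server` is a hypothetical helper that relies only on the keys shown in the expected output above.

```python
from typing import Any, Dict


def kafka_bootstrap_server(details: Dict[str, Any]) -> str:
    """Return "host:port" from the Kafka details, or raise if not connected."""
    if not details.get("kafka_connection"):
        raise RuntimeError("the POP API reports no active Kafka connection")
    return f"{details['kafka_host']}:{details['kafka_port']}"


print(kafka_bootstrap_server({"kafka_host": "localhost", "kafka_port": 9092, "kafka_connection": True}))
# localhost:9092
```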