# IPARO Implementation using IPFS

This notebook demonstrates a simple implementation of InterPlanetary Archival Record Objects (IPAROs) using IPFS. We'll create, store, and link IPAROs, and explore how to retrieve and navigate between them.

## Prerequisites

1. Install IPFS and ensure the daemon is running on your system:

   ```bash
   ipfs daemon
   ```

2. Install the `requests` library so Python can interact with the IPFS HTTP API:

   ```bash
   pip install requests
   ```

In [1]:
import json
import requests
from datetime import datetime

# IPFS API URL
ipfs_api_url = 'http://127.0.0.1:5001/api/v0'

## Step 2: Define Functions for Creating and Storing IPAROs

We will define functions to create IPAROs, add them to IPFS, and update links between them.

In [2]:
def create_iparo(content, prev_cid=None, next_cid=None):
    """
    Create an IPARO with the given content and links to previous and next IPAROs.

    Args:
        content (str): The content of the IPARO (WARC files, simplified here to a string).
        prev_cid (str or None): CID of the previous IPARO, if any.
        next_cid (str or None): CID of the next IPARO, if any.

    Returns:
        dict: The created IPARO.
    """
    iparo = {
        'content': content,
        'prev_cid': prev_cid,
        'next_cid': next_cid,
        'timestamp': datetime.utcnow().isoformat()
    }
    return iparo


def add_to_ipfs(iparo):
    """
    Add the given IPARO to IPFS and return its CID.

    Args:
        iparo (dict): The IPARO to add to IPFS.

    Returns:
        str: The CID of the added IPARO.
    """
    iparo_json = json.dumps(iparo)
    response = requests.post(f'{ipfs_api_url}/add', files={'file': iparo_json})
    response.raise_for_status()  # Fail loudly if the daemon rejects the request
    cid = response.json()['Hash']
    return cid

## Step 3: Create and Link IPAROs

We will create two IPAROs and link them in both directions, forming a small doubly linked list.

In [3]:
# Create initial IPARO
iparo1 = create_iparo('Initial content')
cid1 = add_to_ipfs(iparo1)

# Create second IPARO and link to the first
iparo2 = create_iparo('Second version', prev_cid=cid1)
cid2 = add_to_ipfs(iparo2)

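# Update first IPARO to include link to the second.
# IPFS content is immutable, so re-adding the modified IPARO yields a new CID;
# iparo2's prev_cid still points to the original version (cid1).
iparo1['next_cid'] = cid2
cid1_updated = add_to_ipfs(iparo1)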

# Print CIDs
print(f"CID of IPARO 1: {cid1}")
print(f"CID of IPARO 2: {cid2}")
print(f"Updated CID of IPARO 1: {cid1_updated}")

CID of IPARO 1: QmfE4qtpdmB3cnmSowdwiFvK3Lrot9TzLMaeSGThYhcs5o
CID of IPARO 2: Qme3cECUhX4p5JUDHtttaNEzK8bcigpTrrRgQ95r1sadp6
Updated CID of IPARO 1: QmVWVtrGP2p9skf7RqR7Z1qV2dox6EdkZRvkLGvDXUmhvS
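
The output shows two different CIDs for IPARO 1 because a CID is derived from the stored bytes. The sketch below illustrates that effect without a running daemon. It is only an analogy: `pseudo_cid` is a hypothetical helper that uses a SHA-256 digest of the JSON as a stand-in, not the real IPFS CID algorithm.

```python
import hashlib
import json


def pseudo_cid(iparo):
    """Stand-in for a CID: SHA-256 hex digest of the IPARO's JSON bytes."""
    data = json.dumps(iparo, sort_keys=True).encode('utf-8')
    return hashlib.sha256(data).hexdigest()


first = {'content': 'Initial content', 'prev_cid': None, 'next_cid': None}
original_id = pseudo_cid(first)

second = {'content': 'Second version', 'prev_cid': original_id, 'next_cid': None}
second_id = pseudo_cid(second)

# Adding the forward link changes the bytes, and therefore the identifier.
first['next_cid'] = second_id
updated_id = pseudo_cid(first)

print(original_id != updated_id)          # True
print(second['prev_cid'] == original_id)  # True: still the pre-update version
```

As a consequence, the back-link stored in the second IPARO refers to the version of the first IPARO that has no forward link.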




## Step 4: Retrieve and Navigate IPAROs

We will define a function to retrieve IPAROs from IPFS and demonstrate how to navigate between them.


In [4]:
def get_iparo(cid):
    """
    Retrieve an IPARO from IPFS using its CID.

    Args:
        cid (str): The CID of the IPARO to retrieve.

    Returns:
        dict: The retrieved IPARO.
    """
    response = requests.post(f'{ipfs_api_url}/cat', params={'arg': cid})
    response.raise_for_status()  # Surface HTTP errors instead of failing in json.loads
    iparo = json.loads(response.content.decode('utf-8'))
    return iparo


# Retrieve and print IPARO 1
retrieved_iparo1 = get_iparo(cid1_updated)
print("Retrieved IPARO 1:", json.dumps(retrieved_iparo1, indent=2), "\n")

# Retrieve and print IPARO 2
retrieved_iparo2 = get_iparo(retrieved_iparo1['next_cid'])
print("Retrieved IPARO 2:", json.dumps(retrieved_iparo2, indent=2))

Retrieved IPARO 1: {
  "content": "Initial content",
  "prev_cid": null,
  "next_cid": "Qme3cECUhX4p5JUDHtttaNEzK8bcigpTrrRgQ95r1sadp6",
  "timestamp": "2024-07-13T15:02:09.183282"
} 

Retrieved IPARO 2: {
  "content": "Second version",
  "prev_cid": "QmfE4qtpdmB3cnmSowdwiFvK3Lrot9TzLMaeSGThYhcs5o",
  "next_cid": null,
  "timestamp": "2024-07-13T15:02:09.242960"
}
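
The two retrievals above can be generalized to a chain of any length. The following is a minimal sketch, assuming IPAROs shaped like the ones in this notebook. `walk_chain` and the in-memory `store` are hypothetical names introduced for illustration; in the notebook, `get_iparo` could be passed as `fetch` instead.

```python
def walk_chain(start_cid, fetch, link='next_cid', max_hops=100):
    """Yield (cid, iparo) pairs by following `link` from start_cid."""
    cid = start_cid
    seen = set()  # guards against cycles in the links
    while cid is not None and cid not in seen and len(seen) < max_hops:
        seen.add(cid)
        iparo = fetch(cid)
        yield cid, iparo
        cid = iparo.get(link)


# In-memory stand-in for IPFS so the sketch runs without a daemon.
store = {
    'cid-a': {'content': 'Initial content', 'prev_cid': None, 'next_cid': 'cid-b'},
    'cid-b': {'content': 'Second version', 'prev_cid': 'cid-a', 'next_cid': None},
}

for cid, iparo in walk_chain('cid-a', store.__getitem__):
    print(cid, iparo['content'])
```

Passing `link='prev_cid'` walks the same chain backwards from the newest version.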


## Summary

This notebook demonstrates a very basic implementation of IPAROs as a doubly linked list using IPFS. We created, stored, linked, and retrieved IPAROs, providing a foundational understanding of how to work with decentralized web archiving objects. Below are some possible metrics, with explanations, that can be used to test IPARO implementations.


## Possible Metrics to Test with the IPARO Data Structure

### 1. **Storage Efficiency**

- **Data Size**: Measure the size of each IPARO stored in IPFS. Compare the storage size of the original content with the IPARO structure to understand the overhead introduced by metadata and linking information.
- **Deduplication**: Evaluate how well the data structure handles duplicate content. Since IPFS uses content-based addressing, identical content should only be stored once. Check the storage savings achieved through deduplication.
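
One way to estimate the metadata overhead is to compare the byte length of the content with the byte length of the serialized IPARO. This is a rough sketch: `storage_overhead` is a hypothetical helper, and it measures only the JSON payload, not the block size IPFS reports after its own wrapping.

```python
import json


def storage_overhead(iparo):
    """Return (content_bytes, total_bytes, overhead_ratio) for one IPARO."""
    content_bytes = len(iparo['content'].encode('utf-8'))
    total_bytes = len(json.dumps(iparo).encode('utf-8'))
    overhead = (total_bytes - content_bytes) / total_bytes
    return content_bytes, total_bytes, overhead


# Sample values copied from the retrieved IPARO 1 shown earlier.
sample = {
    'content': 'Initial content',
    'prev_cid': None,
    'next_cid': 'Qme3cECUhX4p5JUDHtttaNEzK8bcigpTrrRgQ95r1sadp6',
    'timestamp': '2024-07-13T15:02:09.183282',
}
content_bytes, total_bytes, overhead = storage_overhead(sample)
print(f'{content_bytes} content bytes, {total_bytes} total bytes, {overhead:.0%} overhead')
```

Because each IPARO in this notebook embeds its own timestamp and links, two IPAROs with identical content still serialize to different bytes, which is worth keeping in mind when measuring deduplication.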

### 2. **Retrieval Time**

- **Latency**: Measure the time it takes to retrieve an IPARO from IPFS. This includes the time to fetch the IPARO from the network and decode the content.
- **Traversal Time**: For linked IPAROs, measure the time required to traverse from one IPARO to another, especially when navigating through a sequence of linked versions.
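
Retrieval latency could be sampled along these lines. The sketch assumes a `fetch` callable such as the notebook's `get_iparo`; `time_fetch` and `fake_fetch` are hypothetical names, and the stub only simulates a delay so the example runs without a daemon.

```python
import statistics
import time


def time_fetch(fetch, cid, repeats=5):
    """Time fetch(cid) several times; report the first and the median duration."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch(cid)
        samples.append(time.perf_counter() - start)
    return {'first': samples[0], 'median': statistics.median(samples)}


def fake_fetch(cid):
    """Stand-in for a network fetch: sleeps briefly, then returns a stub IPARO."""
    time.sleep(0.01)
    return {'content': 'stub', 'prev_cid': None, 'next_cid': None}


result = time_fetch(fake_fetch, 'cid-a', repeats=3)
print(f"first: {result['first']:.4f}s, median: {result['median']:.4f}s")
```

The first sample is reported separately because later fetches of the same CID are likely to be served from the local node's cache rather than from the network.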

### 3. **Version Control**

- **Version Linking**: Test the efficiency of linking and retrieving multiple versions of content. Ensure that each version correctly references the previous and next versions, and measure the ease of navigating between them.
- **Conflict Resolution**: If implementing a more complex linking method (e.g., graph-based), test the system’s ability to handle conflicts and merge changes from different versions.
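
A version-linking test might check that every forward link is matched by a back-link. The sketch below is one possible check, not a definitive one. `find_broken_links` is a hypothetical helper, and the in-memory `store` uses placeholder CIDs that mirror the situation created in Step 3.

```python
def find_broken_links(start_cid, fetch, max_hops=100):
    """Follow next_cid from start_cid; report hops whose back-link disagrees."""
    broken = []
    cid = start_cid
    for _ in range(max_hops):
        iparo = fetch(cid)
        next_cid = iparo.get('next_cid')
        if next_cid is None:
            break
        successor = fetch(next_cid)
        if successor.get('prev_cid') != cid:
            # Record (source, target, what the target actually points back to).
            broken.append((cid, next_cid, successor.get('prev_cid')))
        cid = next_cid
    return broken


store = {
    'cid-1': {'content': 'Initial content', 'prev_cid': None, 'next_cid': None},
    'cid-1-updated': {'content': 'Initial content', 'prev_cid': None, 'next_cid': 'cid-2'},
    'cid-2': {'content': 'Second version', 'prev_cid': 'cid-1', 'next_cid': None},
}

print(find_broken_links('cid-1-updated', store.__getitem__))
# [('cid-1-updated', 'cid-2', 'cid-1')]
```

The reported mismatch is expected for the simple scheme in this notebook, since the second IPARO was stored before the first one was re-added with its forward link.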

### 4. **Scalability**

- **Large-scale Archiving**: Evaluate the performance of the IPARO data structure with a large number of archived objects. Test how the system scales as the number of IPAROs increases, particularly in terms of storage and retrieval time.
- **Network Performance**: Measure the impact on network performance when adding and retrieving a large number of IPAROs. Assess the bandwidth usage and the system’s ability to handle network congestion.

### 5. **Data Integrity**

- **Content Verification**: Verify the integrity of the stored content by comparing the original content with the retrieved IPARO content. Check for any data corruption or loss.
- **Merkle Trees**: If using Merkle trees for linking, test the efficiency and reliability of the structure for verifying data integrity.
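
Content verification can be sketched by recording a digest when the content is archived and comparing it after retrieval. The helper names `content_digest` and `verify_content` are assumptions made for this example, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def content_digest(text):
    """Return the SHA-256 hex digest of a string's UTF-8 bytes."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def verify_content(expected_digest, retrieved_iparo):
    """Return True if the retrieved IPARO's content matches the recorded digest."""
    return content_digest(retrieved_iparo['content']) == expected_digest


# Digest recorded at archive time.
expected = content_digest('Initial content')

# Stubs standing in for IPAROs fetched later (for example, via get_iparo).
intact = {'content': 'Initial content', 'prev_cid': None, 'next_cid': None}
corrupted = {'content': 'Initial c0ntent', 'prev_cid': None, 'next_cid': None}

print(verify_content(expected, intact))     # True
print(verify_content(expected, corrupted))  # False
```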

### 6. **Redundancy and Reliability**

- **Replication**: Test the redundancy mechanisms in IPFS to ensure that IPAROs remain accessible even if some nodes in the network go down. Measure the replication factor and the reliability of data retrieval under different network conditions.
- **Pinning**: Evaluate the effectiveness of pinning IPAROs to ensure they are retained by the nodes that pin them, since pinned data is exempt from garbage collection. Measure the success rate of retrieving pinned IPAROs over time.
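
Pinning can be exercised through the same HTTP API used elsewhere in this notebook. The sketch below assumes the daemon's `pin/add` and `pin/ls` endpoints and their `Pins` and `Keys` response fields. `pin_iparo`, `is_pinned`, `FakeResponse`, and `fake_post` are hypothetical names; the fake transport exists only so the example runs offline, and `requests.post` could be passed in its place.

```python
DEFAULT_API_URL = 'http://127.0.0.1:5001/api/v0'  # same address the notebook uses


def pin_iparo(cid, post, api_url=DEFAULT_API_URL):
    """Ask the local node to pin `cid`; return the CIDs it reports as pinned."""
    response = post(f'{api_url}/pin/add', params={'arg': cid})
    response.raise_for_status()
    return response.json().get('Pins', [])


def is_pinned(cid, post, api_url=DEFAULT_API_URL):
    """Return True if the local node lists `cid` among its pins."""
    response = post(f'{api_url}/pin/ls', params={'arg': cid})
    if response.status_code != 200:
        # Assumption: a non-200 answer means "not pinned"; it could also be another error.
        return False
    return cid in response.json().get('Keys', {})


class FakeResponse:
    """Offline stand-in for a requests.Response (hypothetical, for illustration)."""

    def __init__(self, payload, status_code=200):
        self._payload = payload
        self.status_code = status_code

    def json(self):
        return self._payload

    def raise_for_status(self):
        if self.status_code != 200:
            raise RuntimeError(f'HTTP {self.status_code}')


def fake_post(url, params=None):
    """Pretend that only 'cid-a' is pinned."""
    cid = params['arg']
    if url.endswith('/pin/add'):
        return FakeResponse({'Pins': [cid]})
    if cid == 'cid-a':
        return FakeResponse({'Keys': {cid: {'Type': 'recursive'}}})
    return FakeResponse({'Message': 'not pinned'}, status_code=500)


print(pin_iparo('cid-a', fake_post))  # ['cid-a']
print(is_pinned('cid-a', fake_post))  # True
print(is_pinned('cid-b', fake_post))  # False
```

A retention test could call `is_pinned` for each archived CID at intervals and record the fraction that are still pinned and retrievable.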

### 7. **Access Control**

- **Permission Management**: If implementing access control mechanisms, test the system’s ability to manage permissions for different users. Ensure that only authorized users can access, modify, or delete IPAROs.
- **Audit Logs**: Evaluate the system’s ability to maintain audit logs of access and modifications to IPAROs. Measure the efficiency of tracking changes and retrieving audit logs.
