# IPARO Implementation using IPFS

This notebook demonstrates a simple implementation of InterPlanetary Archival Record Objects (IPAROs) using IPFS. We'll create, store, and link IPAROs, and explore how to retrieve and navigate between them.

## Prerequisites

1. Install IPFS and ensure it's running on your system with:

   ```bash
   ipfs daemon
   ```

2. Install the `requests` library for Python to interact with the IPFS HTTP API:

   ```bash
   pip install requests
   ```

## Step 1: Import Libraries and Configure the IPFS API

In [25]:
import json
import requests
from datetime import datetime, UTC  # datetime.UTC requires Python 3.11+

# Base URL of the local IPFS daemon's HTTP API (default port 5001)
ipfs_api_url = 'http://127.0.0.1:5001/api/v0'

## Step 2: Define Functions for Creating and Storing IPAROs

We define functions to create an IPARO and to add it to IPFS. The linking strategies themselves are built on top of these in Step 3.

In [26]:
def create_iparo(content, prev_cids=None, next_cid=None):
    """
    Create an IPARO with the given content and links to previous and next IPAROs.

    Args:
        content (str): The content of the IPARO.
        prev_cids (list): List of CIDs of the previous IPAROs.
        next_cid (str): CID of the next IPARO.

    Returns:
        dict: The created IPARO.
    """
    iparo = {
        'content': content,
        'prev_cids': prev_cids or [],
        'next_cid': next_cid,
        'timestamp': datetime.now(UTC).isoformat()
    }
    return iparo


def add_to_ipfs(iparo):
    """
    Add the given IPARO to IPFS and return its CID.

    Args:
        iparo (dict): The IPARO to add to IPFS.

    Returns:
        str: The CID of the added IPARO.
    """
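    iparo_json = json.dumps(iparo)
    # Upload the serialized IPARO as a single file via the /add endpoint
    response = requests.post(f'{ipfs_api_url}/add', files={'file': iparo_json})
    response.raise_for_status()  # fail early on HTTP errors
    cid = response.json()['Hash']
    return cid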

## Step 3: Create Functions for Different Linkages

### 3.1: Create a Chain with Each Node Linking to All Preceding Nodes

In [27]:
def create_chain_all_preceding(num_nodes):
    """
    Create a chain of IPAROs where each node links to all preceding nodes.

    Args:
        num_nodes (int): The number of nodes to create.

    Returns:
        list: A list of CIDs for the created IPAROs.
    """
    cids = []
    for i in range(num_nodes):
        content = f"Node {i + 1}"
        prev_cids = cids.copy()  # every CID created so far; empty for the first node
        iparo = create_iparo(content, prev_cids=prev_cids)
        cid = add_to_ipfs(iparo)
        cids.append(cid)
    return cids


# Example usage
cids_all_preceding = create_chain_all_preceding(5)
print("CIDs (All Preceding):", cids_all_preceding)

CIDs (All Preceding): ['QmWPyMVLc8LAUGHXaXYGxyYaezC1bfZZGbSXomj85YQrGT', 'QmeEGML2bP8aVe663vjWgid3qNq12CLEbdvdWQkED55gE7', 'QmNa2ShZSo9zWAfwt81e69mviH5BUoYZrUS7GWuymHQ9xf', 'QmYkfoJc8Z3uxXMNSXZfXg61qBKro3KHdc117xcYKvj8FG', 'Qmaf2BYc9XNXyhyxHc86d26nQJvi2rXGtojG1qTvZ1PwKJ']


### 3.2: Create a Chain with Each Node Linking Only to the Prior Node


In [28]:
def create_chain_prior_node(num_nodes):
    """
    Create a chain of IPAROs where each node links only to the prior node.

    Args:
        num_nodes (int): The number of nodes to create.

    Returns:
        list: A list of CIDs for the created IPAROs.
    """
    cids = []
    prev_cid = None
    for i in range(num_nodes):
        content = f"Node {i + 1}"
        prev_cids = [prev_cid] if prev_cid else []
        iparo = create_iparo(content, prev_cids=prev_cids)
        cid = add_to_ipfs(iparo)
        cids.append(cid)
        prev_cid = cid
    return cids


# Example usage
cids_prior_node = create_chain_prior_node(5)
print("CIDs (Prior Node):", cids_prior_node)

CIDs (Prior Node): ['QmR2kJnBGF3wx7fcvgf3cLbvmNtCLwcrDYLYLV31Kbqb9F', 'QmWCNpeCE4w8ssuJ1RrcZhaZYoVoF6mABrbpLFkysXQg67', 'QmS6dpo7ugLDYHkWNxjj3eGg9hZp5DPxYBxyu1DsAQNLz7', 'QmXPC8MDvE5hP1Vx5NnBx2TT9Zgnf7dbgkWhRo5oMuVipu', 'QmeU7VstgBhrRAYCrXXZweg1MBFkFzZCVtxBuWNG1d9AU8']
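
In the prior-node chain, reaching an older version means following `prev_cids` one hop at a time. A minimal sketch of that walk is below. The names `walk_back` and `store` are hypothetical and used only for illustration, and `fetch` stands in for any retrieval function, such as one that reads from IPFS. Taking the last entry of `prev_cids` is an assumption that the list is ordered oldest first, as both chain builders above produce it.

```python
def walk_back(start_cid, fetch):
    """Follow prev_cids from start_cid back to the oldest node.

    fetch is any callable mapping a CID to an IPARO dict (an assumption:
    in the notebook this would be a function that reads from IPFS).
    """
    path = []
    seen = set()
    cid = start_cid
    while cid is not None and cid not in seen:
        seen.add(cid)  # guard against accidental cycles
        path.append(cid)
        prev_cids = fetch(cid).get('prev_cids') or []
        if isinstance(prev_cids, str):
            prev_cids = [prev_cids]  # tolerate a single CID stored as a plain string
        # The last entry is the immediate predecessor when the list is oldest-first.
        cid = prev_cids[-1] if prev_cids else None
    return path


# In-memory stand-in for IPFS, keyed by made-up CIDs.
store = {
    'cid-1': {'content': 'Node 1', 'prev_cids': []},
    'cid-2': {'content': 'Node 2', 'prev_cids': ['cid-1']},
    'cid-3': {'content': 'Node 3', 'prev_cids': ['cid-2']},
}
print(walk_back('cid-3', store.__getitem__))
```

Passing the fetch function as an argument keeps the walk testable without a running daemon.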


## Step 4: Retrieve and Navigate IPAROs

In [29]:
def get_iparo(cid):
    """
    Retrieve an IPARO from IPFS using its CID.

    Args:
        cid (str): The CID of the IPARO to retrieve.

    Returns:
        dict: The retrieved IPARO.
    """
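    # Pass the CID as a query parameter so requests handles the URL encoding
    response = requests.post(f'{ipfs_api_url}/cat', params={'arg': cid})
    response.raise_for_status()  # surface HTTP errors instead of failing in json.loads
    iparo_json = response.content.decode('utf-8')
    iparo = json.loads(iparo_json)
    return iparo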


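# Retrieve and print IPARO 1.
# Note: `cid1_updated` is not defined in any cell shown above. It is assumed to hold
# the CID of a previously stored IPARO whose next_cid is set (see the output below).
retrieved_iparo1 = get_iparo(cid1_updated)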
print("Retrieved IPARO 1:", json.dumps(retrieved_iparo1, indent=2), "\n")

# Retrieve and print IPARO 2
retrieved_iparo2 = get_iparo(retrieved_iparo1['next_cid'])
print("Retrieved IPARO 2:", json.dumps(retrieved_iparo2, indent=2))

Retrieved IPARO 1: {
  "content": "Initial content",
  "prev_cids": [],
  "next_cid": "QmTMA9b9c4FFnZ13zvQykb23WLyBJB6im2EyC9JLauB4KD",
  "timestamp": "2024-07-20T13:22:40.773936+00:00"
} 

Retrieved IPARO 2: {
  "content": "Second version",
  "prev_cids": "QmXL9MUTbDKspA8CwSXTP8nERQ4gZFfXmXNarngfbtGwr5",
  "next_cid": null,
  "timestamp": "2024-07-20T13:22:40.789843+00:00"
}
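
IPARO 1 above carries a `next_cid`, which points at an object that did not exist when the first version was created. Because IPFS addresses objects by their content, changing any field yields a different identifier, so adding a forward link means storing a new copy under a new CID. The sketch below illustrates the effect with SHA-256 as a stand-in for the CID computation. This is an assumption for illustration only, since real CIDs are derived differently, and `fake_cid` is a hypothetical helper.

```python
import hashlib
import json


def fake_cid(iparo):
    """Hash the serialized IPARO; a stand-in for a real CID, not the IPFS algorithm."""
    data = json.dumps(iparo, sort_keys=True).encode('utf-8')
    return hashlib.sha256(data).hexdigest()


original = {'content': 'Initial content', 'prev_cids': [], 'next_cid': None}
updated = dict(original, next_cid='cid-of-second-version')  # made-up link value

print(fake_cid(original) == fake_cid(updated))         # False: the link changed the bytes
print(fake_cid(original) == fake_cid(dict(original)))  # True: same bytes, same identifier
```

One consequence is that any IPARO pointing at the old identifier still resolves to the copy without the forward link.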


## Summary

This notebook demonstrates a foundational implementation of InterPlanetary Archival Record Objects (IPAROs) using IPFS. We created, stored, linked, and retrieved IPAROs, providing a basic understanding of how to work with decentralized web archiving objects.

### Implemented Features:

1. **IPARO Creation**: Defined a function to create IPAROs with content, links to previous and next IPAROs, and a timestamp.
2. **Storing IPAROs in IPFS**: Developed a function to add IPAROs to IPFS and retrieve their unique content identifiers (CIDs).
3. **Linkage Methods**:
   - **All Preceding Nodes**: Implemented a function to create a chain where each IPARO links to all preceding IPAROs.
   - **Prior Node Only**: Implemented a function to create a chain where each IPARO links only to the immediately preceding IPARO.
4. **Retrieving and Navigating IPAROs**: Demonstrated how to retrieve IPAROs from IPFS and navigate between linked IPAROs.

This implementation provides a starting point for more complex linkage and version control mechanisms that can be built on top of this structure.


## Possible Metrics to Test with the IPARO Data Structure

### 1. **Storage Efficiency**

- **Data Size**: Measure the size of each IPARO stored in IPFS. Compare the storage size of the original content with the IPARO structure to understand the overhead introduced by metadata and linking information.
- **Deduplication**: Evaluate how well the data structure handles duplicate content. Since IPFS uses content-based addressing, identical content should only be stored once. Check the storage savings achieved through deduplication.
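
The data-size comparison can be approximated before anything is stored. The sketch below measures only the serialized JSON, not what an IPFS node writes to disk, and `iparo_overhead` is a hypothetical helper that mirrors the fields used in this notebook.

```python
import json
from datetime import datetime, timezone


def iparo_overhead(content, prev_cids):
    """Return (content_bytes, iparo_bytes) for a JSON-serialized IPARO."""
    iparo = {
        'content': content,
        'prev_cids': prev_cids,
        'next_cid': None,
        'timestamp': datetime.now(timezone.utc).isoformat(),
    }
    content_bytes = len(content.encode('utf-8'))
    iparo_bytes = len(json.dumps(iparo).encode('utf-8'))
    return content_bytes, iparo_bytes


# 46-character placeholder strings, the same length as the CIDs printed earlier.
placeholder_cids = ['Q' * 46 for _ in range(4)]
content_bytes, iparo_bytes = iparo_overhead('Node 5', placeholder_cids)
print(f'content: {content_bytes} B, IPARO: {iparo_bytes} B, '
      f'overhead: {iparo_bytes - content_bytes} B')
```

For short content such as `"Node 5"`, the links and timestamp dominate the stored size. Note also that the timestamp makes two IPAROs with identical content serialize differently, which matters for the deduplication measurement.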

### 2. **Retrieval Time**

- **Latency**: Measure the time it takes to retrieve an IPARO from IPFS. This includes the time to fetch the IPARO from the network and decode the content.
- **Traversal Time**: For linked IPAROs, measure the time required to traverse from one IPARO to another, especially when navigating through a sequence of linked versions.
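
Latency can be sampled by wrapping any retrieval function in a timer. This is a minimal sketch: `time_call` is a hypothetical helper, and `slow_lookup` simulates a fetch so the example runs without a daemon.

```python
import time


def time_call(fn, *args, repeats=5):
    """Call fn(*args) repeats times and return (last_result, list_of_seconds)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        durations.append(time.perf_counter() - start)
    return result, durations


def slow_lookup(cid):
    """Hypothetical stand-in for a network fetch."""
    time.sleep(0.01)
    return {'content': f'payload for {cid}'}


result, durations = time_call(slow_lookup, 'cid-1', repeats=3)
print(f'min {min(durations) * 1000:.1f} ms, max {max(durations) * 1000:.1f} ms')
```

When timing real retrievals, repeated requests for the same CID are likely to be served from the local node's block store, so the first sample is usually the one that reflects network latency.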

### 3. **Version Control**

- **Version Linking**: Test the efficiency of linking and retrieving multiple versions of content. Ensure that each version correctly references the previous and next versions, and measure the ease of navigating between them.
- **Conflict Resolution**: If implementing a more complex linking method (e.g., graph-based), test the system’s ability to handle conflicts and merge changes from different versions.
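
A simple version-linking check is to confirm that every backward link resolves to a known IPARO. The sketch below works on an in-memory mapping of CIDs to IPARO dicts. `find_broken_links` and the sample CIDs are hypothetical.

```python
def find_broken_links(iparos):
    """Return (cid, missing_prev_cid) pairs for links that point outside iparos.

    iparos maps each CID to its IPARO dict.
    """
    broken = []
    for cid, iparo in iparos.items():
        for prev_cid in iparo.get('prev_cids') or []:
            if prev_cid not in iparos:
                broken.append((cid, prev_cid))
    return broken


good = {
    'cid-1': {'content': 'Node 1', 'prev_cids': []},
    'cid-2': {'content': 'Node 2', 'prev_cids': ['cid-1']},
}
# Same chain plus a node whose link points at a CID that was never stored.
bad = dict(good, **{'cid-3': {'content': 'Node 3', 'prev_cids': ['cid-missing']}})

print(find_broken_links(good))
print(find_broken_links(bad))
```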

### 4. **Scalability**

- **Large-scale Archiving**: Evaluate the performance of the IPARO data structure with a large number of archived objects. Test how the system scales as the number of IPAROs increases, particularly in terms of storage and retrieval time.
- **Network Performance**: Measure the impact on network performance when adding and retrieving a large number of IPAROs. Assess the bandwidth usage and the system’s ability to handle network congestion.
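
The two linkage methods in this notebook scale very differently in the number of stored links. With n nodes, the all-preceding chain holds 0 + 1 + ... + (n - 1) = n(n - 1)/2 links, while the prior-node chain holds n - 1. The sketch below computes both, with `link_counts` as a hypothetical helper.

```python
def link_counts(num_nodes):
    """Return (all_preceding_links, prior_only_links) for a chain of num_nodes."""
    all_preceding = sum(range(num_nodes))  # 0 + 1 + ... + (n - 1) = n(n - 1) / 2
    prior_only = max(num_nodes - 1, 0)     # every node but the first has one link
    return all_preceding, prior_only


for n in (5, 100, 10_000):
    all_preceding, prior_only = link_counts(n)
    print(f'{n} nodes: all-preceding {all_preceding} links, prior-only {prior_only} links')
```

For the five-node examples above this gives 10 links versus 4. The gap grows quadratically, which is worth keeping in mind when choosing chain sizes for a scalability test.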

### 5. **Data Integrity**

- **Content Verification**: Verify the integrity of the stored content by comparing the original content with the retrieved IPARO content. Check for any data corruption or loss.
- **Merkle Trees**: If using Merkle trees for linking, test the efficiency and reliability of the structure for verifying data integrity.
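
Content verification can be sketched as a digest recorded at archive time and checked again after retrieval. The helpers `content_digest` and `verify_content` are hypothetical, and SHA-256 is an assumption chosen for illustration rather than something the notebook prescribes.

```python
import hashlib


def content_digest(content):
    """SHA-256 hex digest of the content string."""
    return hashlib.sha256(content.encode('utf-8')).hexdigest()


def verify_content(retrieved_iparo, expected_digest):
    """True if the retrieved IPARO's content still matches the recorded digest."""
    return content_digest(retrieved_iparo['content']) == expected_digest


expected = content_digest('Node 1')  # recorded when the IPARO is created
intact = {'content': 'Node 1', 'prev_cids': []}
corrupted = {'content': 'Node l', 'prev_cids': []}  # last character altered

print(verify_content(intact, expected), verify_content(corrupted, expected))
```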

### 6. **Redundancy and Reliability**

- **Replication**: Test whether IPAROs remain accessible when some nodes in the network go down. IPFS keeps a copy only on nodes that have fetched or pinned the content, so measure how many nodes hold each IPARO (the effective replication factor) and the reliability of data retrieval under different network conditions.
- **Pinning**: Evaluate the effectiveness of pinning IPAROs so that the pinning node does not garbage-collect them. Measure the success rate of retrieving pinned IPAROs over time.
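
Pinning can be requested through the same HTTP API this notebook already uses. The sketch below assumes the `/pin/add` endpoint of a local daemon at the notebook's API address. `build_pin_request` and `pin_iparo` are hypothetical helper names, and the example relies only on the standard library's `urllib` so that it is self-contained. Only the request construction is exercised here, since sending it requires a running daemon.

```python
import json
import urllib.parse
import urllib.request

IPFS_API_URL = 'http://127.0.0.1:5001/api/v0'  # same address the notebook uses


def build_pin_request(cid, api_url=IPFS_API_URL):
    """Build (but do not send) a POST request asking the node to pin cid."""
    query = urllib.parse.urlencode({'arg': cid})
    return urllib.request.Request(f'{api_url}/pin/add?{query}', method='POST')


def pin_iparo(cid, api_url=IPFS_API_URL):
    """Send the pin request and return the parsed reply; needs a running IPFS daemon."""
    with urllib.request.urlopen(build_pin_request(cid, api_url)) as response:
        return json.loads(response.read().decode('utf-8'))


request = build_pin_request('QmWPyMVLc8LAUGHXaXYGxyYaezC1bfZZGbSXomj85YQrGT')
print(request.get_method(), request.full_url)
```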

### 7. **Access Control**

- **Permission Management**: If implementing access control mechanisms, test the system’s ability to manage permissions for different users. Ensure that only authorized users can access, modify, or delete IPAROs.
- **Audit Logs**: Evaluate the system’s ability to maintain audit logs of access and modifications to IPAROs. Measure the efficiency of tracking changes and retrieving audit logs.
