Notes from 19Aug25:
* Syntax is fine for adding new references
* Deleting references: the expression is fine, but remove the __cascade block so that no extra deletions occur

Can use this test: https://github.com/OpenConceptLab/oclapi2/blob/master/core/samples/sample_collection_references_with_delete.json


### Goals for this script
* Simple conversion of a collection's references from versioned to simple extensional
* High-level method:
     * Get a list of references from the OCL collection of choice. **Question: should we be pointing to an existing collection version, or can we simply rely on OCL's latest or HEAD version?**
     * For references whose expression ends in a version number, strip that version number
         * Extract the concept URL from data.expressions
         * Remove the version number from the concept URL
         * Create a new reference object with the updated, unversioned concept URL
     * Create a new, importable list of references that will delete existing versioned references and create new unversioned extensional references in that collection
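
The high-level method above can be sketched as follows. This is a minimal sketch, not the script's final logic: `unversion_expression` and `convert_reference` are hypothetical names, the regex assumes a versioned expression ends in a numeric version ID after a concept or mapping ID (as in the example output later in these notes), and the DELETE entry omits `__cascade` per the 19Aug25 note.

```python
import re

# Assumption: a versioned expression ends in /<concepts|mappings>/<id>/<numeric version>/
VERSIONED_TAIL = re.compile(r"/(concepts|mappings)/([^/]+)/\d+/?$")

def unversion_expression(expression):
    """Return the expression with a trailing numeric version ID removed."""
    return VERSIONED_TAIL.sub(r"/\1/\2/", expression)

def convert_reference(ref):
    """Build (delete_entries, add_entries) for one import-style reference object."""
    deletes, adds = [], []
    for expression in ref["data"]["expressions"]:
        stripped = unversion_expression(expression)
        if stripped == expression:
            continue  # no version to strip, so leave this expression alone
        base = {"type": "Reference", "collection_url": ref["collection_url"]}
        # DELETE entry keeps the versioned expression and carries no __cascade
        deletes.append({**base, "data": {"expressions": [expression]}, "__action": "DELETE"})
        # New entry points at the unversioned expression
        adds.append({**base, "data": {"expressions": [stripped]}})
    return deletes, adds

sample = {
    "type": "Reference",
    "collection_url": "/users/jamlung/collections/openmrs-demo/",
    "data": {"expressions": ["/users/jamlung/sources/CIEL-Example-OCL/concepts/1/5340299/"]},
}
deletes, adds = convert_reference(sample)
print(deletes[0]["data"]["expressions"][0])
print(adds[0]["data"]["expressions"][0])
```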

Should be able to work in these scenarios:
* Some or all references to concepts in the collection have version numbers (which should be removed)
* Some references are intensional, point to a repo version, and/or have cascade logic
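
To see which of these scenarios a collection actually contains, each fetched reference could be labelled before conversion. The sketch below is an assumption-heavy illustration: `classify_reference` is a hypothetical helper, the field names (`expression`, `version`, `cascade`, `filter`) are taken from the example reference printed later in these notes, and treating "has filters or a query string" as intensional is a working assumption rather than OCL's definition.

```python
import re

def classify_reference(ref):
    """Return a set of scenario labels for one fetched reference dict."""
    labels = set()
    expression = ref.get("expression") or ""
    # Trailing numeric version ID after a concept or mapping ID
    if re.search(r"/(concepts|mappings)/[^/]+/\d+/?$", expression):
        labels.add("resource-version")
    # Reference pinned to a repo version such as v6
    if ref.get("version"):
        labels.add("repo-version")
    # Reference that carries cascade logic
    if ref.get("cascade"):
        labels.add("cascade")
    # Assumption: filters or a query string indicate an intensional reference
    if ref.get("filter") or "?" in expression:
        labels.add("intensional")
    return labels or {"simple"}

repo_ref = {
    "expression": "/users/jamlung/sources/CIEL-Example-OCL/v6/mappings/?excludeWildcard=true",
    "version": "v6",
    "cascade": None,
    "filter": [{"op": "=", "value": "true", "property": "excludeWildcard"}],
}
concept_ref = {
    "expression": "/users/jamlung/sources/CIEL-Example-OCL/concepts/1/5340299/",
    "version": None,
    "cascade": None,
    "filter": None,
}
print(sorted(classify_reference(repo_ref)))
print(sorted(classify_reference(concept_ref)))
```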

### Before running this script
* Make sure to save a version of your collection so that we have a record of what was previously there
* Check the resolved versions in your collection expansion(s) so you have a baseline of which previous versions of CIEL or other sources will get cleared out
    * Check whether any cleanup needs to be done
* Discuss which approach to d

### Steps after this script
* Run a repository version comparison to see what has been added, removed, changed, etc.

### To Dos
* Fix the cascade parameter since that shouldn't be required if it wasn't part of the original file
* (Maybe?) Add in a parameter that might make the references sensitive to the relevant CIEL version that is associated with the concept/mapping version ID
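
One possible shape for the first to-do: only attach `__cascade` when the original reference had cascade logic, and never attach it to a DELETE entry. This is a sketch under stated assumptions. `build_reference` is a hypothetical helper, and passing the original reference's `cascade` value straight through as `__cascade` assumes the two use a compatible format, which has not been checked here.

```python
def build_reference(collection_url, expression, original_cascade=None, delete=False):
    """Build one import entry; __cascade appears only if the original had one."""
    entry = {
        "type": "Reference",
        "collection_url": collection_url,
        "data": {"expressions": [expression]},
    }
    if delete:
        # DELETE entries never carry __cascade, so no extra deletions occur
        entry["__action"] = "DELETE"
    elif original_cascade:
        # Assumption: the original cascade value can be reused as-is
        entry["__cascade"] = original_cascade
    return entry

collection = "/users/jamlung/collections/openmrs-demo/"
plain = build_reference(collection, "/users/jamlung/sources/CIEL-Example-OCL/concepts/1/")
cascaded = build_reference(
    collection,
    "/users/jamlung/sources/CIEL-Example-OCL/concepts/1/",
    original_cascade={"method": "sourcetoconcepts", "cascade_levels": "*"},
)
removed = build_reference(
    collection,
    "/users/jamlung/sources/CIEL-Example-OCL/concepts/1/5340299/",
    original_cascade={"method": "sourcetoconcepts", "cascade_levels": "*"},
    delete=True,
)
print("__cascade" in plain, "__cascade" in cascaded, "__cascade" in removed)
```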

### Future steps:
* Moderate improvements: Move from Versioned references to Repo-specific versioned references if needed (I think? I am slightly forgetting our conversation between Andy, Burke, Joe, and Jon)
* Complex improvements: Move from Versioned references to cascaded/intensional references


In [18]:
# Set Parameters, import necessary libraries and define utility functions

# Parameters
api_url = "https://api.openconceptlab.org"
owner = "/users/jamlung"
repository = "/collections/openmrs-demo"
example_url = api_url + owner + repository + "/references/?q=&page=1&limit=25&includeSearchMeta=true&verbose=true&sortDesc=_score"

import requests
import urllib.parse
import re
import json

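def strip_version_from_url(url):
    # Strip a repo version segment such as /v6 that directly follows /sources/<id>.
    # Note: a later cell redefines strip_version_from_url with a different pattern,
    # which replaces this definition once that cell has run.
    pattern = r'^(.*?/sources/[^/]+)(?:/v\d+)?(.*)$'
    match = re.match(pattern, url)
    if match:
        return match.group(1) + match.group(2)
    return url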


print("Parameters set:")
print(f"API URL: {api_url}")
print(f"Owner: {owner}")
print(f"Repository: {repository}")
print(f"Example URL: {example_url}")

Parameters set:
API URL: https://api.openconceptlab.org
Owner: /users/jamlung
Repository: /collections/openmrs-demo
Example URL: https://api.openconceptlab.org/users/jamlung/collections/openmrs-demo/references/?q=&page=1&limit=25&includeSearchMeta=true&verbose=true&sortDesc=_score


In [19]:
# Define the function to fetch references

def fetch_references(base_url, owner, repo, query="", limit=25):
    all_references = []
    page = 1
    
    while True:
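        url = f"{base_url}{owner}{repo}/references/"
        # Pass the query string as params so requests URL-encodes it (q may contain spaces or '&')
        params = {"q": query, "page": page, "limit": limit,
                  "includeSearchMeta": "true", "verbose": "true", "sortDesc": "_score"}
        response = requests.get(url, params=params, timeout=30)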
        
        if response.status_code != 200:
            print(f"Error fetching page {page}: Status code {response.status_code}")
            break
        
        data = response.json()
        references = data if isinstance(data, list) else data.get('references', [])
        all_references.extend(references)
        
        if len(references) < limit:
            break
        
        page += 1
    
    return all_references

# Test the function
references = fetch_references(api_url, owner, repository)

print(f"Total references fetched: {len(references)}")

# Print the first reference as an example
if references:
    print("\nExample of a fetched reference:")
    print(references[0])

Total references fetched: 240

Example of a fetched reference:
{'expression': '/users/jamlung/sources/CIEL-Example-OCL/v6/mappings/?excludeWildcard=true&excludeFuzzy=true&includeSearchMeta=true', 'reference_type': 'mappings', 'id': 7532268, 'last_resolved_at': '2024-07-19T14:30:04.528237Z', 'uri': '/users/jamlung/collections/openmrs-demo/references/7532268/', 'uuid': '7532268', 'include': True, 'type': 'CollectionReference', 'code': None, 'resource_version': None, 'namespace': None, 'system': '/users/jamlung/sources/CIEL-Example-OCL/', 'version': 'v6', 'valueset': None, 'cascade': None, 'filter': [{'op': '=', 'value': 'true', 'property': 'excludeWildcard'}, {'op': '=', 'value': 'true', 'property': 'excludeFuzzy'}, {'op': '=', 'value': 'true', 'property': 'includeSearchMeta'}], 'display': None, 'created_at': '2024-07-19T14:30:05.566932Z', 'updated_at': '2024-07-19T14:30:05.566947Z', 'concepts': 0, 'mappings': 7, 'translation': 'Include mappings from version "v6" of jamlung/CIEL-Example-

In [31]:
# Define the function to process references

import re

def is_versioned_url(url):
    # True when the URL ends in /<concepts|sources|mappings>/<id>/<numeric version>/.
    # Repo versions such as /sources/<id>/v6/... are not detected by this pattern.
    return bool(re.search(r'/(concepts|sources|mappings)/[^/]+/\d+/?$', url))

def strip_version_from_url(url):
    # Remove the trailing numeric version, keeping /<type>/<id>/.
    # URLs that do not match are returned unchanged.
    return re.sub(r'/(concepts|sources|mappings)/([^/]+)/\d+/?$', r'/\1/\2/', url)

def process_references(references):
    processed_references = []
    references_to_delete = []
    for ref in references:
        expression = ref.get('expression', '')
        if is_versioned_url(expression):
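            # Create a reference to delete
            # NOTE (19Aug25): the __cascade block below should be removed from DELETE
            # entries so that no extra deletions occur; it is still present here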
            delete_ref = {
                "type": "Reference",
                "collection_url": f"{owner}{repository}/",
                "data": {
                    "expressions": [expression]  # Keep the versioned URL
                },
                "__cascade": {
                    "method": "sourcetoconcepts",
                    "map_types": "Q-AND-A,CONCEPT-SET",
                    "cascade_levels": "*",
                    "return_map_types": "*"
                },
                "__action": "DELETE"
            }
            references_to_delete.append(delete_ref)
        
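        # Create unversioned reference (built for every reference; expressions
        # without a trailing numeric version pass through unchanged)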
        new_ref = {
            "type": "Reference",
            "collection_url": f"{owner}{repository}/",
            "data": {
                "expressions": [strip_version_from_url(expression)]
            },
            "__cascade": {
                "method": "sourcetoconcepts",
                "map_types": "Q-AND-A,CONCEPT-SET",
                "cascade_levels": "*",
                "return_map_types": "*"
            }
        }
        processed_references.append(new_ref)
    
    return processed_references, references_to_delete

# Process the references
processed_references, references_to_delete = process_references(references)

print(f"Total processed references: {len(processed_references)}")
print(f"Total references to delete: {len(references_to_delete)}")

# Print examples
if processed_references:
    print("\nExample of a processed reference:")
    print(json.dumps(processed_references[0], indent=2))

if references_to_delete:
    print("\nExample of a reference to delete:")
    print(json.dumps(references_to_delete[0], indent=2))

Total processed references: 240
Total references to delete: 141

Example of a processed reference:
{
  "type": "Reference",
  "collection_url": "/users/jamlung/collections/openmrs-demo/",
  "data": {
    "expressions": [
      "/users/jamlung/sources/CIEL-Example-OCL/v6/mappings/?excludeWildcard=true&excludeFuzzy=true&includeSearchMeta=true"
    ]
  },
  "__cascade": {
    "method": "sourcetoconcepts",
    "map_types": "Q-AND-A,CONCEPT-SET",
    "cascade_levels": "*",
    "return_map_types": "*"
  }
}

Example of a reference to delete:
{
  "type": "Reference",
  "collection_url": "/users/jamlung/collections/openmrs-demo/",
  "data": {
    "expressions": [
      "/users/jamlung/sources/CIEL-Example-OCL/concepts/1/5340299/"
    ]
  },
  "__cascade": {
    "method": "sourcetoconcepts",
    "map_types": "Q-AND-A,CONCEPT-SET",
    "cascade_levels": "*",
    "return_map_types": "*"
  },
  "__action": "DELETE"
}
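
The first processed reference above still contains `/v6/` because `is_versioned_url` and the redefined `strip_version_from_url` only look for a trailing numeric version ID. The check below reproduces that behaviour on the two expressions shown in the output, and shows that the pattern from the first cell would remove the repo version. `strip_repo_version` is a hypothetical name used here only to keep the two patterns apart.

```python
import re

def is_versioned_url(url):
    # Same pattern as the cell above: trailing numeric version only
    return bool(re.search(r'/(concepts|sources|mappings)/[^/]+/\d+/?$', url))

def strip_repo_version(url):
    # Same pattern as the first cell: drops /v<digits> directly after /sources/<id>
    match = re.match(r'^(.*?/sources/[^/]+)(?:/v\d+)?(.*)$', url)
    return match.group(1) + match.group(2) if match else url

repo_versioned = (
    "/users/jamlung/sources/CIEL-Example-OCL/v6/mappings/"
    "?excludeWildcard=true&excludeFuzzy=true&includeSearchMeta=true"
)
concept_versioned = "/users/jamlung/sources/CIEL-Example-OCL/concepts/1/5340299/"

print(is_versioned_url(repo_versioned))     # False: repo version is not detected
print(is_versioned_url(concept_versioned))  # True: trailing numeric version
print(strip_repo_version(repo_versioned))
print(strip_repo_version(concept_versioned))
```

If repo-versioned references should also be converted, one option is to apply both patterns in sequence, but whether that is wanted depends on the approach chosen for intensional references.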


In [40]:
#references
#processed_references
#references_to_delete

In [37]:
# Compare original and processed references

if references and processed_references:
    print("Original reference:")
    print(references[0])
    print("\nProcessed reference:")
    print(processed_references[0])
    
    # Compare the expression only: the processed object stores it under
    # data.expressions and has no top-level 'system' or 'version' fields
    original_expr = references[0].get('expression')
    processed_expr = processed_references[0]['data']['expressions'][0]
    print("\nChange in expression:")
    if original_expr != processed_expr:
        print(f"  Original: {original_expr}")
        print(f"  Processed: {processed_expr}")
    else:
        print("  (unchanged)")

Original reference:
{'expression': '/users/jamlung/sources/CIEL-Example-OCL/v6/mappings/?excludeWildcard=true&excludeFuzzy=true&includeSearchMeta=true', 'reference_type': 'mappings', 'id': 7532268, 'last_resolved_at': '2024-07-19T14:30:04.528237Z', 'uri': '/users/jamlung/collections/openmrs-demo/references/7532268/', 'uuid': '7532268', 'include': True, 'type': 'CollectionReference', 'code': None, 'resource_version': None, 'namespace': None, 'system': '/users/jamlung/sources/CIEL-Example-OCL/', 'version': 'v6', 'valueset': None, 'cascade': None, 'filter': [{'op': '=', 'value': 'true', 'property': 'excludeWildcard'}, {'op': '=', 'value': 'true', 'property': 'excludeFuzzy'}, {'op': '=', 'value': 'true', 'property': 'includeSearchMeta'}], 'display': None, 'created_at': '2024-07-19T14:30:05.566932Z', 'updated_at': '2024-07-19T14:30:05.566947Z', 'concepts': 0, 'mappings': 7, 'translation': 'Include mappings from version "v6" of jamlung/CIEL-Example-OCL having excludeWildcard equal to "true" 

In [39]:
# Save processed references to a file (optional)

def save_jsonl(data, filename):
    with open(filename, 'w') as f:
        for item in data:
            f.write(json.dumps(item) + '\n')
    print(f"Saved {len(data)} items to {filename}")

# Save processed (unversioned) references
unversioned_file = "unversioned_references.jsonl"
save_jsonl(processed_references, unversioned_file)

# Save references to delete
delete_file = "references_to_delete.jsonl"
save_jsonl(references_to_delete, delete_file)

# Optionally, display the first few lines of each file
def display_file_preview(filename, num_lines=3):
    print(f"\nFirst {num_lines} lines of {filename}:")
    with open(filename, 'r') as f:
        for i, line in enumerate(f):
            if i < num_lines:
                print(line.strip())
            else:
                break

display_file_preview(unversioned_file)
display_file_preview(delete_file)

Saved 240 items to unversioned_references.jsonl
Saved 141 items to references_to_delete.jsonl

First 3 lines of unversioned_references.jsonl:
{"type": "Reference", "collection_url": "/users/jamlung/collections/openmrs-demo/", "data": {"expressions": ["/users/jamlung/sources/CIEL-Example-OCL/v6/mappings/?excludeWildcard=true&excludeFuzzy=true&includeSearchMeta=true"]}, "__cascade": {"method": "sourcetoconcepts", "map_types": "Q-AND-A,CONCEPT-SET", "cascade_levels": "*", "return_map_types": "*"}}
{"type": "Reference", "collection_url": "/users/jamlung/collections/openmrs-demo/", "data": {"expressions": ["/users/jamlung/sources/CIEL-Example-OCL/concepts/1/"]}, "__cascade": {"method": "sourcetoconcepts", "map_types": "Q-AND-A,CONCEPT-SET", "cascade_levels": "*", "return_map_types": "*"}}
{"type": "Reference", "collection_url": "/users/jamlung/collections/openmrs-demo/", "data": {"expressions": ["/users/jamlung/sources/CIEL-Example-OCL/mappings/2/"]}, "__cascade": {"method": "sourcetoconcep
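
The goals call for a single importable list that deletes the versioned references and creates the unversioned ones. A minimal sketch of combining the two lists into one JSON Lines file is shown below. `write_import_file` and the `combined_references.jsonl` filename are hypothetical, and writing DELETE entries first assumes the importer processes lines in order, which has not been confirmed here.

```python
import json
import os
import tempfile

def write_import_file(deletes, adds, path):
    """Write DELETE entries first, then new references, one JSON object per line."""
    entries = list(deletes) + list(adds)
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
    return len(entries)

collection_url = "/users/jamlung/collections/openmrs-demo/"
deletes = [{
    "type": "Reference",
    "collection_url": collection_url,
    "data": {"expressions": ["/users/jamlung/sources/CIEL-Example-OCL/concepts/1/5340299/"]},
    "__action": "DELETE",
}]
adds = [{
    "type": "Reference",
    "collection_url": collection_url,
    "data": {"expressions": ["/users/jamlung/sources/CIEL-Example-OCL/concepts/1/"]},
}]

# Written to a temporary directory so the sketch does not touch the real output files
path = os.path.join(tempfile.mkdtemp(), "combined_references.jsonl")
written = write_import_file(deletes, adds, path)

# Read the file back to confirm every line parses and the order is as intended
with open(path, encoding="utf-8") as f:
    lines = [json.loads(line) for line in f]
print(written, [entry.get("__action", "ADD") for entry in lines])
```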