# ORCID Reader Demo

This notebook demonstrates how to use the ORCID Reader to retrieve researcher profiles and build a searchable index.

## Installation

First, install the required packages:

In [3]:
!pip install llama-index-readers-orcid llama-index-core

## Basic Usage

In [4]:
from llama_index.readers.orcid import ORCIDReader
from llama_index.core import VectorStoreIndex

# Initialize the ORCID reader
reader = ORCIDReader()

# Example ORCID iDs (both are publicly visible records)
orcid_ids = [
    "0000-0002-1825-0097",  # Josiah Carberry (fictional test profile)
    "0000-0003-1419-2405",  # Another public profile
]

# Load the data
documents = reader.load_data(orcid_ids=orcid_ids)

print(f"Loaded {len(documents)} researcher profiles")
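ORCID iDs often arrive as full `https://orcid.org/...` URLs, with a lowercase `x` check character, or with duplicates. The sketch below cleans such a list before it is passed to `load_data`. `normalize_orcid_ids` is a hypothetical helper and is not part of `ORCIDReader`.

```python
import re

# Hypothetical helper (not part of ORCIDReader): pulls a bare
# "0000-0000-0000-000X" iD out of bare IDs or orcid.org URLs.
_ORCID_PATTERN = re.compile(r"\b(\d{4}-\d{4}-\d{4}-\d{3}[\dX])\b")


def normalize_orcid_ids(raw_ids):
    """Return bare ORCID iDs, uppercased, de-duplicated, in first-seen order."""
    seen = set()
    result = []
    for raw in raw_ids:
        match = _ORCID_PATTERN.search(raw.strip().upper())
        if match is None:
            continue  # drop input that contains no iD-shaped string
        orcid_id = match.group(1)
        if orcid_id not in seen:
            seen.add(orcid_id)
            result.append(orcid_id)
    return result


ids = normalize_orcid_ids([
    "https://orcid.org/0000-0002-1825-0097",
    "0000-0002-1825-0097",
    " 0000-0003-1419-2405 ",
])
print(ids)  # ['0000-0002-1825-0097', '0000-0003-1419-2405']
```

You can then call `reader.load_data(orcid_ids=ids)` exactly as in the cell above. This helper checks only the shape of each iD. It does not verify the checksum.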

## Exploring the Data

In [None]:
# Look at the first researcher profile
if documents:
    first_doc = documents[0]
    print("Researcher Profile:")
    print("=" * 50)
    print(first_doc.text[:1000] + "..." if len(first_doc.text) > 1000 else first_doc.text)
    print("\nMetadata:")
    print(first_doc.metadata)

## Building a Searchable Index

In [None]:
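# Create a vector store index from the documents.
# With LlamaIndex's default settings this uses OpenAI for embeddings (and for the
# query engine's LLM), so OPENAI_API_KEY must be set in the environment.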
index = VectorStoreIndex.from_documents(documents)

# Create a query engine
query_engine = index.as_query_engine()

print("Index created successfully!")

## Querying the Research Data

In [None]:
# Query the index
response = query_engine.query("What research areas are these researchers involved in?")
print("Query: What research areas are these researchers involved in?")
print("Response:", response)

In [None]:
# Another query
response = query_engine.query("Where do these researchers work?")
print("Query: Where do these researchers work?")
print("Response:", response)

## Advanced Configuration

In [None]:
# Create a reader with custom configuration
custom_reader = ORCIDReader(
    sandbox=False,  # Use production ORCID (default)
    include_works=True,  # Include publications (default)
    include_employment=True,  # Include employment history (default)
    include_education=True,  # Include education history (default)
    max_works=20,  # Limit publications per researcher
    rate_limit_delay=0.5  # Delay between API calls
)

print("Custom reader configured")
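The `sandbox` and `include_*` flags correspond naturally to sections of ORCID's public API v3.0 (`person`, `works`, `employments`, `educations`, served from `pub.orcid.org` or `pub.sandbox.orcid.org`). The sketch below builds those section URLs for one iD. We have not checked which requests `ORCIDReader` actually makes, so treat `section_urls` as a hypothetical illustration of the mapping, not as the reader's internals.

```python
# Hypothetical sketch: public ORCID API v3.0 section URLs matching the
# ORCIDReader flags. The reader's own requests may differ.
PRODUCTION_BASE = "https://pub.orcid.org/v3.0"
SANDBOX_BASE = "https://pub.sandbox.orcid.org/v3.0"


def section_urls(orcid_id, sandbox=False, include_works=True,
                 include_employment=True, include_education=True):
    """List the API URLs one would fetch for a single researcher."""
    base = SANDBOX_BASE if sandbox else PRODUCTION_BASE
    urls = [f"{base}/{orcid_id}/person"]  # basic profile: always needed
    if include_works:
        urls.append(f"{base}/{orcid_id}/works")
    if include_employment:
        urls.append(f"{base}/{orcid_id}/employments")
    if include_education:
        urls.append(f"{base}/{orcid_id}/educations")
    return urls


for url in section_urls("0000-0002-1825-0097"):
    print(url)
```

If each section costs one request, turning off `include_works`, `include_employment`, and `include_education` leaves a single URL per researcher. That fits the profile-only mode below being faster.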

## Profile-Only Mode

In [None]:
# Create a reader that only gets basic profile information
profile_only_reader = ORCIDReader(
    include_works=False,
    include_employment=False,
    include_education=False
)

# This will be faster as it makes fewer API calls
profile_docs = profile_only_reader.load_data(["0000-0002-1825-0097"])

if profile_docs:
    print("Profile-only document:")
    print(profile_docs[0].text)

## Error Handling

In [None]:
# IDs that cannot be retrieved (non-existent, private, or malformed) are skipped instead of raising an error
test_ids = [
    "0000-0002-1825-0097",  # Valid ORCID
    "0000-0000-0000-0000",  # Invalid/non-existent ORCID
    "invalid-id"  # Malformed ORCID
]

# Only valid profiles will be returned
error_test_docs = reader.load_data(orcid_ids=test_ids)
print(f"Loaded {len(error_test_docs)} valid profiles out of {len(test_ids)} requested")
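You can filter out malformed iDs locally, without any API calls, because the last character of an ORCID iD is an ISO 7064 MOD 11-2 check digit over the first 15 digits. The sketch below applies that check to the same `test_ids`. `orcid_check_digit` and `is_valid_orcid` are hypothetical helpers and are not part of the reader.

```python
import re

_FORMAT = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id: str) -> bool:
    """True if the iD is well formed and its check character matches."""
    if not _FORMAT.fullmatch(orcid_id):
        return False
    digits = orcid_id.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


test_ids = ["0000-0002-1825-0097", "0000-0000-0000-0000", "invalid-id"]
for orcid_id in test_ids:
    print(orcid_id, is_valid_orcid(orcid_id))
# 0000-0002-1825-0097 True
# 0000-0000-0000-0000 False
# invalid-id False
```

Passing the checksum does not guarantee that a record exists or is public. The reader still has to skip those cases.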

## Tips for Usage

1. **Rate Limiting**: The reader includes built-in rate limiting to respect ORCID's API limits
2. **Public Data Only**: Only publicly available information is retrieved
3. **Error Handling**: Invalid or private ORCID IDs are skipped gracefully
4. **Flexible Configuration**: Customize what data sections to include based on your needs
5. **Batch Processing**: Process multiple researchers in a single call for efficiency
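Here is a minimal fixed-interval limiter of the kind a `rate_limit_delay` setting suggests. It is applied to a batch of iDs to illustrate the rate-limiting and batch-processing tips. This is an illustration, not the reader's implementation. `MinIntervalLimiter` is a hypothetical name, and the loop body stands in for a real fetch.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum gap, in seconds, between successive wait() calls."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous call

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)  # block until the interval has passed
        self._last = time.monotonic()


limiter = MinIntervalLimiter(0.5)
start = time.monotonic()
for orcid_id in ["0000-0002-1825-0097", "0000-0003-1419-2405"]:
    limiter.wait()
    # a real request for orcid_id would go here
elapsed = time.monotonic() - start
print(f"batch took {elapsed:.2f}s")
```

Only the second request waits, so a batch of *n* iDs takes at least (*n* - 1) × 0.5 seconds of delay, plus the requests themselves.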
