# A1 Agent Testing Notebook
This notebook demonstrates the A1 agent querying biomedical databases, first for CRISPR screen planning and then for drug-focused analyses.

In [1]:
# Import required libraries
from biomni.agent import A1


In [2]:
# Initialize the A1 agent
agent = A1(
    path='./data',
    llm='gpt-4.1',
    source='OpenAI',
    load_datalake=False
)

Skipping datalake download (load_datalake=False)
Note: Some tools may require datalake files to function properly.
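
The agent above is configured with `source='OpenAI'`, so it presumably needs OpenAI credentials at run time. The following is a minimal pre-flight sketch, assuming the key is supplied through the conventional `OPENAI_API_KEY` environment variable (the notebook does not show how credentials are provided). `missing_env_vars` is a hypothetical helper, not part of biomni.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Assumption: the OpenAI backend reads its key from OPENAI_API_KEY.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Set these variables before calling agent.go():", ", ".join(missing))
```

Running this before the first `agent.go(...)` call surfaces a missing key early, rather than partway through a long agent run.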


## Test 1: CRISPR Screen Planning
Test the agent's ability to plan a CRISPR screen for T cell exhaustion genes.

In [3]:
# Execute CRISPR screen planning
agent.go("""
Plan a CRISPR screen to identify genes that regulate T cell exhaustion, 
generate 32 genes that maximize the perturbation effect.
Use the following data sources:
ENCODE for chromatin accessibility in T cells
CellxGENE Census for expression patterns
GO annotations for these genes' functions
""")

Using prompt-based retrieval with the agent's LLM


Plan a CRISPR screen to identify genes that regulate T cell exhaustion, 
generate 32 genes that maximize the perturbation effect.
Use the following data sources:
ENCODE for chromatin accessibility in T cells
CellxGENE Census for expression patterns
GO annotations for these gene's function

parsing error...

FORMATTING ERROR: Your response is missing required XML tags.

You MUST include either:
- <execute>code here</execute> for code execution
- <solution>final answer</solution> for final answers

Please regenerate your response with the proper tags.

<execute>
# Step 1: Query ENCODE for chromatin accessibility data in T cells to identify accessible genomic regions.
from biomni.tool.database import query_encode

encode_prompt = "chromatin accessibility (ATAC-seq or DNase-seq) in human T cells"
encode_results = query_encode(prompt=encode_prompt, max_results=50, verbose=True)
print("ENCODE chromatin accessibility results for T cells:")
p

 '<solution>\nPrioritized CRISPR screen gene list to target for regulation of T cell exhaustion (maximizing perturbation effect, integrating ENCODE chromatin accessibility, CellxGENE expression, and GO annotation):\n\n1. ADAP1\n2. BATF\n3. BCL6\n4. BTLA\n5. C7orf50\n6. CD160\n7. CD226\n8. CD244\n9. CD27\n10. CD274 (PD-L1)\n11. CD28\n12. CD40LG\n13. CD80\n14. CD86\n15. COX19\n16. CTLA4\n17. CYP2W1\n18. ENTPD1 (CD39)\n19. EOMES\n20. GPR146\n21. HAVCR2 (TIM-3)\n22. ICOS\n23. ICOSLG\n24. IRF4\n25. LAG3\n26. MIR339\n27. NFATC1\n28. NFATC2\n29. NR4A1\n30. NR4A2\n31. NR4A3\n32. PDCD1 (PD-1)\n\n**Rationale & Data Integration:**\n- Genes were selected by integrating: \n  - ENCODE T cell chromatin accessibility (prioritizing open/active loci)\n  - CellxGENE Census (T cell and exhaustion-relevant high expression)\n  - GO annotation (immune regulation, T cell activation, exhaustion)\n  - Literature curation for additional immune checkpoint/costimulatory and key transcriptional regulators.\n- This 
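
The log above shows the agent rejecting a response that lacks `<execute>` or `<solution>` tags, and the final answer arrives as a `<solution>` block containing a numbered gene list. The sketch below illustrates both ideas: checking a response for a tagged block and pulling gene symbols out of a numbered list. It is not Biomni's actual parser; `extract_tagged_block` and `parse_numbered_genes` are hypothetical helpers, and the sample text is a shortened stand-in for the real output.

```python
import re

# Matches the first <execute>...</execute> or <solution>...</solution> block.
TAG_PATTERN = re.compile(r"<(execute|solution)>(.*?)</\1>", re.DOTALL)


def extract_tagged_block(response):
    """Return (tag, body) for the first tagged block, or None if no block is present."""
    match = TAG_PATTERN.search(response)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def parse_numbered_genes(solution_text):
    """Collect gene symbols from lines such as '10. CD274 (PD-L1)'."""
    return re.findall(r"^\s*\d+\.\s+([A-Za-z0-9-]+)", solution_text, flags=re.MULTILINE)


# Shortened stand-in for an agent response (not the full notebook output).
sample = "Plan first.\n<solution>\n1. ADAP1\n2. BATF\n10. CD274 (PD-L1)\n</solution>"
tag, body = extract_tagged_block(sample)
print(tag, parse_numbered_genes(body))
```

A response with no tags yields `None`, which corresponds to the situation the agent reports as a parsing error. The truncated text displayed above has no closing tag, so these helpers would need the full return value rather than the notebook preview.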

## Test 2: Validation with Literature and Clinical Data
Look up a single drug (Aspirin) in drug, clinical-trial, and safety databases.

In [9]:
# Query drug, clinical-trial, and safety databases for Aspirin
agent.go("""
For Aspirin
Check DrugCentral and ChEMBL for its most critical info.
Search ClinicalTrials.gov for relevant trials of using it in cancer.
Review safety profiles using openFDA.
""")

 'The FDA adverse event report review for Aspirin reveals:\n- Of 100 total reports, 66% were classified as "serious."\n- Frequent adverse reactions included: rash, drug interactions, type 2 diabetes mellitus, dizziness, and fatigue.\n- The FDA notes these are voluntary reports and do not indicate causality.\n\nUpdated checklist:\n1. [✓] Query DrugCentral for critical information about Aspirin (meta-info retrieved)\n2. [✓] Query ChEMBL for essential bioactivity/drug details for Aspirin (detailed drug info retrieved)\n3. [✓] Search ClinicalTrials.gov for trials of Aspirin in cancer (relevant trials found)\n4. [✓] Review Aspirin\'s safety profile using FDA adverse event reports (key safety data retrieved)\n5. [ ] Summarize and present the findings.\n\nNow, I will summarize and present all findings clearly.\n\n<solution>\nSummary of Key Findings for Aspirin:\n\n1. DrugCentral:\n   - DrugCentral offers comprehensive Aspirin data via API and database, including indications, structure, and re
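
The output above reports that 66% of 100 adverse event reports were classified as serious. The sketch below shows one way such a percentage could be computed from report records. It assumes records shaped like openFDA drug adverse event results, where the `serious` field is `"1"` for serious and `"2"` for non-serious. The records here are placeholders, not real data, and `serious_fraction` is a hypothetical helper.

```python
def serious_fraction(reports):
    """Fraction of reports flagged serious ("1" = serious, "2" = not serious)."""
    if not reports:
        return 0.0
    serious = sum(1 for report in reports if str(report.get("serious")) == "1")
    return serious / len(reports)


# Placeholder records for illustration only.
reports = [{"serious": "1"}, {"serious": "2"}, {"serious": "1"}, {"serious": "1"}]
print(f"{serious_fraction(reports):.0%} serious")
```

As the agent output notes, these are voluntary reports, so the fraction describes what was reported and says nothing about causality or incidence.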

## Test 3: Ontology and Pathway Analysis
Analyze the selected genes using ontology tools.

In [None]:
# Perform ontology analysis
agent.go("""
For the selected gene set: CD274, ERBB2, EGFR
1. Use QuickGO to get GO annotations
2. Use OLS to map terms and pathways
3. Use Reactome for pathway analysis
""")

Using prompt-based retrieval with the agent's LLM


For the selected gene set:CD274, ERBB2, EGFR
1. Use QuickGO to get GO annotations
2. Use OLS to map terms and pathways
3. Identify enriched biological processes

parsing error...

Each response must include thinking process followed by either <execute> or <solution> tag. But there are no tags in the current response. Please follow the instruction, fix and regenerate the response again.

Thank you for pointing out the missing tag. Here’s my thought process and corrected response:

Thinking and Reasoning:
To address your request, I will:
- Start by using QuickGO to gather GO annotations for the genes CD274, ERBB2, and EGFR.
- This will provide the GO terms associated with each gene, which are essential for the next steps.

Let’s proceed to import the QuickGO function and query for GO annotations for the three genes.

<execute>
from biomni.tool.database import query_quickgo

genes = ["CD274", "ERBB2", "EGFR"]
go_annotations = {}

for gen

 'Thank you for your feedback. Here’s my thinking process and the corrected response:\n\nThinking process:\n- The previous attempts to run enrichment analysis failed due to a missing module error ("No module named \'gget\'"), even though the built-in function should not require this.\n- Since the built-in gene_set_enrichment_analysis function cannot be executed successfully in this environment right now, I will summarize the available results and insights using the GO annotations and OLS mapping that were retrieved in steps 1 and 2.\n- I will highlight key biological processes and mapped pathways based on the mapped GO terms for CD274, ERBB2, and EGFR.\n\n<solution>\nSummary of analysis for CD274, ERBB2, EGFR:\n\n1. QuickGO GO annotation retrieval identified GO terms for each gene, including:\n   - CD274: immune response modulation (e.g., GO:0002841), actin cytoskeleton (GO:0015629), positive regulation of transcription (GO:0045893)\n   - ERBB2: transmembrane receptor protein kinase ac
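
The final message above attributes the failed enrichment step to `No module named 'gget'`. A quick way to detect this before launching a long agent run is to check whether the optional modules are importable. This is a minimal sketch using the standard library; which modules a given Biomni tool needs is an assumption, and only `gget` is taken from the error message.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'gget' comes from the error message above; extend the list as needed.
print(missing_modules(["gget"]))
```

An empty list means the module is importable. A non-empty list suggests installing the missing package in the environment the agent executes code in, which may differ from the notebook kernel.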

## Comprehensive Drug Analysis: Aspirin Case Study
We'll perform a detailed analysis of Aspirin using multiple databases to understand its properties, interactions, and clinical applications.

In [None]:
# 1. Chemical and Basic Drug Information
agent.go("""
For Aspirin (acetylsalicylic acid):
1. Using PubChem:
   - Get chemical structure and properties
   - List all known synonyms
   - Find compound classification
   
2. Using DrugCentral:
   - Get primary mechanisms of action
   - List approved indications
   - Get ATC classification
   - Extract pharmacological class information
   
3. Using ChEMBL:
   - Find binding targets and their affinities
   - List documented biological activities
   - Get drug-like properties
""")

 'My thinking:\nThe ChEMBL query returned detailed information for Aspirin, including:\n- Drug-like properties (Lipinski analysis, logP, H-bond donors/acceptors, etc.)\n- Synonyms and identifiers\n- ATC classifications\n- Indication class\n- ChEMBL ID\n\nHowever, the output is lengthy and detailed binding target/affinity and biological activity information may need to be explicitly extracted. I will summarize the main findings and indicate if a follow-up query is needed for specific targets and affinities.\n\nUpdated checklist:\n1. [✓] PubChem: structure, properties, synonyms, classification\n2. [✗] DrugCentral: (not directly accessible via current API)\n3. [✓] ChEMBL: drug-like properties, synonyms, ATC codes, indication class (targets/affinity may need further expansion)\n\nLet’s summarize the information gathered so far and clarify if you want a deeper listing of specific targets/affinities from ChEMBL.\n\n<solution>\nSummary for Aspirin (acetylsalicylic acid):\n\n1. PubChem:\n   - 
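
The ChEMBL result above mentions a Lipinski analysis among the drug-like properties. As a reference point, the rule-of-five check can be sketched in a few lines. The aspirin values below are approximate, widely reported figures entered by hand for illustration; they are not taken from the notebook output and should be verified against PubChem or ChEMBL. `lipinski_violations` is a hypothetical helper.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """List which of Lipinski's rule-of-five criteria a compound violates."""
    checks = {
        "molecular weight > 500": mol_weight > 500,
        "logP > 5": logp > 5,
        "H-bond donors > 5": h_donors > 5,
        "H-bond acceptors > 10": h_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


# Approximate values for aspirin (assumption; verify against a database).
aspirin = dict(mol_weight=180.16, logp=1.2, h_donors=1, h_acceptors=4)
print(lipinski_violations(**aspirin))
```

An empty list indicates no violations under these input values.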

In [None]:
# 2. Clinical Trials Analysis
agent.go("""
Analyze Aspirin clinical trials:
1. Using ClinicalTrials.gov:
   - Find all cancer-related trials (active and completed)
   - Group trials by cancer types
   - Extract dosage information
   - Identify combination therapy trials
   - List primary and secondary outcomes
   
2. Cross-reference with DrugCentral:
   - Compare approved vs. investigational uses
   - Identify potential repurposing opportunities
""")

In [5]:
# 3. Safety and Pharmacovigilance
result = agent.go("""
Comprehensive safety analysis of Aspirin:
1. Using openFDA:
   - Extract adverse event reports
   - Calculate reporting odds ratios
   - Identify black box warnings
   - List drug-drug interactions
   - Get patient demographic patterns

2. Using DailyMed:
   - Get current labeling information
   - Extract contraindications
   - List special populations warnings
   - Compare different formulation safety profiles
""")
print(result)

Using prompt-based retrieval with the agent's LLM


Comprehensive safety analysis of Aspirin:
1. Using openFDA:
   - Extract adverse event reports
   - Calculate reporting odds ratios
   - List drug-drug interactions
   - Get patient demographic patterns

2. Using DailyMed:
   - Get current labeling information
   - Extract contraindications
   - Compare different formulation safety profiles

parsing error...

Each response must include thinking process followed by either <execute> or <solution> tag. But there are no tags in the current response. Please follow the instruction, fix and regenerate the response again.

Thank you for pointing out the formatting requirement. Here’s my revised approach:

Thinking process:  
To begin a comprehensive safety analysis of Aspirin, I need to first extract adverse event reports from openFDA. This will provide an overview of the types and frequencies of adverse events reported in association with Aspirin, which is foundational for the subsequent ste
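
The prompt above asks the agent to calculate reporting odds ratios. For reference, the standard calculation from a 2x2 contingency table is ROR = (a·d) / (b·c), with an approximate 95% confidence interval of exp(ln ROR ± 1.96·sqrt(1/a + 1/b + 1/c + 1/d)). The sketch below implements that formula. How the agent itself computes the value is not shown in the output, and the counts used here are placeholders, not openFDA data.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Reporting odds ratio and approximate confidence interval.

    a: reports with the drug and the event of interest
    b: reports with the drug and any other event
    c: reports with other drugs and the event of interest
    d: reports with other drugs and any other event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ror = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * standard_error)
    upper = math.exp(math.log(ror) + z * standard_error)
    return ror, lower, upper


# Placeholder counts for illustration only.
ror, lower, upper = reporting_odds_ratio(a=20, b=80, c=10, d=90)
print(f"ROR = {ror:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A signal is commonly considered only when the lower bound of the interval exceeds 1, and the result depends heavily on how the comparison set (cells `c` and `d`) is chosen.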

In [3]:
# 4. Drug Interactions and Cross-References
agent.go("""
Map Aspirin across databases:
1. Using UniChem:
   - Get all database identifiers
   - Cross-reference with other systems

2. Using DrugCentral and ChEMBL:
   - List all known drug interactions
   - Categorize by severity
   - Identify mechanism-based interactions
   - Find structural analogs
""")

Using prompt-based retrieval with the agent's LLM


Map Aspirin across databases:
1. Using UniChem:
   - Get all database identifiers
   - Cross-reference with other systems

2. Using DrugCentral and ChEMBL:
   - List all known drug interactions
   - Categorize by severity
   - Identify mechanism-based interactions
   - Find structural analogs

parsing error...

Each response must include thinking process followed by either <execute> or <solution> tag. But there are no tags in the current response. Please follow the instruction, fix and regenerate the response again.

Thank you for your feedback. Here is my revised response including the required tags:

To address the mapping of Aspirin across databases, I will follow this checklist:

1. [ ] Query UniChem for Aspirin to get all database identifiers and cross-references.
2. [ ] Query DrugCentral for Aspirin to retrieve all known drug interactions.
3. [ ] Query ChEMBL for Aspirin to obtain more drug interaction data and structural analog

 'ChEMBL returned a list of structural analogs for Aspirin (CHEMBL25), including its derivatives and close analogs with at least 70% structural similarity. These analogs include compounds such as Aspirin DL-lysine (CHEMBL1697753) and others.\n\nHere’s the updated checklist:\n1. [✗] Query UniChem for Aspirin to get all database identifiers and cross-references (failed).\n2. [✗] Query DrugCentral for Aspirin to retrieve all known drug interactions (requires direct database/API access).\n3. [✓] Query ChEMBL for Aspirin to obtain mechanism of action data (no direct interactions, but mechanism and analogs available).\n4. [✗] Combine and categorize drug interactions by severity (not possible without interaction data).\n5. [ ] Identify mechanism-based interactions (partial: mechanism known, interactions not directly listed).\n6. [✓] List structural analogs found in ChEMBL.\n\nSummary of findings:\n- UniChem and DrugCentral could not return data due to API or access limitations.\n- ChEMBL prov
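
The output above describes analogs with at least 70% structural similarity. The notebook does not name the metric, but similarity searches of this kind commonly use the Tanimoto coefficient over molecular fingerprints. Under that assumption, the threshold can be illustrated as follows. The bit sets are made-up stand-ins for fingerprints, not real molecules, and `tanimoto` is a hypothetical helper.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared 'on' bits divided by total distinct 'on' bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Made-up fingerprint bit positions for illustration only.
query = {1, 4, 7, 9, 12}
candidate = {1, 4, 7, 9, 12, 15}
score = tanimoto(query, candidate)
print(f"{score:.2f}", score >= 0.70)
```

Here the two sets share five of six distinct bits, so the candidate would pass a 0.70 cutoff.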

In [None]:
# 5. Molecular and Pathway Analysis
agent.go("""
Analyze molecular aspects:
1. Using QuickGO:
   - Get GO terms for Aspirin targets
   - Analyze biological processes affected

2. Using OLS:
   - Map to relevant pathways
   - Find disease associations
   - Identify molecular functions

3. Combine with ChEMBL data:
   - Analyze target protein families
   - Map to signaling pathways
""")