# Optional: LLMs for Scientific Research

This notebook shows how to use large language models (LLMs) in research workflows.

## Learning Objectives

1. Understand LLM capabilities and limitations
2. Use APIs for text processing
3. Extract information from scientific text
4. Generate and improve scientific writing
5. Apply LLMs to code generation and debugging

In [None]:
! pip install -q pycse
from pycse.colab import pdf

In [None]:
import os

# APIs require keys - set as environment variables
# export OPENAI_API_KEY="your-key"
# export ANTHROPIC_API_KEY="your-key"

try:
    import openai
    OPENAI_AVAILABLE = True
except ImportError:
    OPENAI_AVAILABLE = False
    print("OpenAI not installed. Run: pip install openai")

try:
    import anthropic
    ANTHROPIC_AVAILABLE = True
except ImportError:
    ANTHROPIC_AVAILABLE = False
    print("Anthropic not installed. Run: pip install anthropic")

## LLM Capabilities for Research

| Task | Use Case | Reliability |
|------|----------|-------------|
| Text summarization | Literature review | High |
| Information extraction | Data mining papers | Medium |
| Writing assistance | Manuscript editing | High |
| Code generation | Prototyping | Medium |
| Code debugging | Error explanation | High |
| Brainstorming | Research ideas | Medium |
| Math/calculations | Numerical work | Low - verify! |

In [None]:
# Example: Information extraction from abstract
abstract = """
We investigated the catalytic hydrogenation of CO2 to methanol over Cu/ZnO/Al2O3 
catalysts at temperatures ranging from 200-300°C and pressures of 30-50 bar. 
The catalyst with 10 wt% Cu loading achieved the highest methanol selectivity 
of 85% at 250°C and 40 bar, with a CO2 conversion of 23%. Characterization by 
XRD and TPR revealed that the optimal Cu dispersion was achieved at this loading.
"""

# Extraction prompt
extraction_prompt = f"""
Extract the following information from this abstract:
1. Catalyst composition
2. Temperature range
3. Pressure range  
4. Best performance conditions
5. Key metrics (conversion, selectivity)

Abstract: {abstract}

Format as JSON.
"""

print("Prompt for information extraction:")
print(extraction_prompt)
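
Asking for JSON only helps if the reply is parsed and checked. The sketch below shows one possible way to do that: it pulls the first JSON object out of a reply and flags any extracted value that does not appear verbatim in the source text. The helper names `parse_json_response` and `values_not_in_source`, the key names, and the example reply are all hypothetical; the reply was written by hand for illustration and is not real model output.

```python
import json
import re


def parse_json_response(response_text: str) -> dict:
    """Pull the first JSON object out of an LLM reply and parse it."""
    # Models often wrap JSON in prose or a Markdown fence, so grab the
    # outermost {...} span instead of parsing the whole reply.
    match = re.search(r"\{.*\}", response_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in response")
    return json.loads(match.group(0))


def values_not_in_source(record: dict, source_text: str) -> list:
    """Return (key, value) pairs whose value is not found verbatim in the source."""
    missing = []
    for key, value in record.items():
        if isinstance(value, dict):
            missing.extend(values_not_in_source(value, source_text))
        elif str(value) not in source_text:
            missing.append((key, value))
    return missing


# Hypothetical reply, hand-written for illustration (not real model output).
example_reply = """Here is the extracted data:
{"catalyst": "Cu/ZnO/Al2O3",
 "best_conditions": {"temperature": "250°C", "pressure": "40 bar"},
 "selectivity": "85%", "conversion": "23%"}
"""

# Shortened stand-in for the abstract so the example runs on its own.
source_text = (
    "Cu/ZnO/Al2O3 catalysts achieved a methanol selectivity of 85% "
    "at 250°C and 40 bar, with a CO2 conversion of 23%."
)

record = parse_json_response(example_reply)
print(record["selectivity"])
print(values_not_in_source(record, source_text))
```

A verbatim substring check is deliberately crude: it catches invented numbers but also flags legitimate paraphrases, so treat a non-empty result as a prompt to look at the paper, not as proof of an error.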

In [None]:
# Example: Code generation prompt
code_prompt = """
Write a Python function that:
1. Takes temperature (K) and pressure (bar) as inputs
2. Calculates the compressibility factor Z using the van der Waals equation
3. Uses a=3.64 L²bar/mol² and b=0.0427 L/mol for CO2
4. Returns Z

Include docstring and type hints.
"""

print("Code generation prompt:")
print(code_prompt)
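
Whatever code the model returns for this prompt should be checked against an independent calculation. The following is a minimal reference sketch, not the expected LLM answer: it solves the van der Waals equation for the molar volume with Newton's method, starting from the ideal-gas volume, and then forms Z = PV/(RT). The function name `vdw_compressibility` and the choice of Newton's method are assumptions made for illustration; solving the cubic directly would work as well.

```python
R = 0.083145  # gas constant in L·bar/(mol·K)


def vdw_compressibility(T: float, P: float, a: float = 3.64, b: float = 0.0427) -> float:
    """Compressibility factor Z = PV/(RT) from the van der Waals equation.

    T in K, P in bar, a in L²·bar/mol², b in L/mol. The defaults are the
    CO2 values given in the prompt. Returns the root that Newton's method
    reaches from the ideal-gas starting guess.
    """
    V = R * T / P  # ideal-gas molar volume as the starting guess, L/mol
    for _ in range(100):
        # Residual of P = RT/(V - b) - a/V² and its derivative in V
        f = R * T / (V - b) - a / V**2 - P
        df = -R * T / (V - b) ** 2 + 2 * a / V**3
        step = f / df
        V -= step
        if abs(step) < 1e-12 * V:
            break
    else:
        raise RuntimeError("Newton iteration did not converge")
    return P * V / (R * T)


Z = vdw_compressibility(300.0, 1.0)
print(f"Z(300 K, 1 bar) = {Z:.4f}")
```

At low pressure the result should sit slightly below 1, which is a quick sanity check for any generated function. Near or below the critical temperature the equation can have several real roots, so the starting guess decides which one is found.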

## Best Practices

1. **Be specific**: Clear prompts get better results
2. **Verify outputs**: Never trust LLM calculations blindly
3. **Iterate**: Refine prompts based on results
4. **Use examples**: Show the format you want
5. **Chain tasks**: Break complex work into steps
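
As an illustration of the "use examples" practice, the sketch below assembles a few-shot prompt from input/output pairs. The function name `build_few_shot_prompt`, the `Input:`/`Output:` labels, and the example sentences are hypothetical choices; any consistent layout should serve the same purpose.

```python
def build_few_shot_prompt(task: str, examples: list, query: str) -> str:
    """Assemble a prompt: task statement, worked examples, then the new input."""
    parts = [task.strip(), ""]
    for example_input, example_output in examples:
        parts += [f"Input: {example_input}", f"Output: {example_output}", ""]
    # End with an open "Output:" so the model continues in the demonstrated format.
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)


# Hypothetical examples written for illustration.
examples = [
    ("The reaction was run at 350 K and 2 bar.", '{"T_K": 350, "P_bar": 2}'),
    ("Experiments used 10 bar at 500 K.", '{"T_K": 500, "P_bar": 10}'),
]

prompt = build_few_shot_prompt(
    "Extract temperature (K) and pressure (bar) as JSON.",
    examples,
    "Samples were held at 298 K under 1 bar.",
)
print(prompt)
```

Two or three examples that cover the variation in the real inputs (here, the order in which temperature and pressure appear) are usually enough to fix the output format.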

In [None]:
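# Example API call structure (requires API key)
def call_claude(prompt, api_key=None):
    """Send a single-turn prompt to the Claude API and return the reply text.

    Falls back to the ANTHROPIC_API_KEY environment variable when api_key
    is None. Returns an explanatory string instead of raising if the
    library or the key is missing.
    """
    if not ANTHROPIC_AVAILABLE:
        return "Anthropic library not installed"

    if api_key is None:
        api_key = os.getenv('ANTHROPIC_API_KEY')

    if not api_key:
        return "API key not set"

    client = anthropic.Anthropic(api_key=api_key)

    # One user message; max_tokens caps the length of the reply
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": prompt}
        ]
    )

    # The reply is a list of content blocks; return the text of the first one
    return message.content[0].text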

# Uncomment to test (requires API key)
# result = call_claude("What is the ideal gas law?")
# print(result)

## Limitations

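- **Hallucinations**: LLMs can invent facts and citations that look plausible
- **Math errors**: Arithmetic and numerical results are often wrong; recompute them in code
- **Knowledge cutoff**: Models may not know work published after their training data ends
- **Context length**: Documents longer than the model's context window must be split or truncated
- **Reproducibility**: Outputs vary between calls, even for the same prompt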
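
One common workaround for the context-length limit is to split a long document into overlapping chunks and process each one separately. The sketch below does this by word count, which is only a rough proxy for the token limits that models actually enforce. The function name `chunk_words` and the default sizes are hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list:
    """Split text into overlapping chunks of at most chunk_size words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Synthetic 450-word document, used only to show the chunk boundaries.
document = " ".join(f"w{i}" for i in range(450))
chunks = chunk_words(document, chunk_size=200, overlap=20)
print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of processing some words twice.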

## Responsible Use

- Always verify facts and calculations
- Cite sources properly (LLM output isn't a source)
- Use as a tool, not a replacement for understanding
- Follow institutional policies on AI use

In [None]:
! pip install -q jupyterquiz
from jupyterquiz import display_quiz

display_quiz("https://raw.githubusercontent.com/jkitchin/s26-06642/main/dsmles/optional/quizzes/llms-quiz.json")

## Summary

LLMs are powerful tools for:
- Literature review and summarization
- Code generation and debugging
- Writing assistance

But they require:
- Careful verification
- Domain expertise to evaluate outputs
- Responsible use