# structflo.ner — Quickstart

This notebook walks through the core features of `structflo.ner`:

1. Basic extraction with a cloud model (Gemini)
2. Local extraction with Ollama
3. Using built-in profiles
4. Custom profiles
5. Working with results
6. Batch extraction

## Setup

```bash
uv add structflo-ner
# for DataFrame support
uv add "structflo-ner[dataframe]"
```

In [None]:
from structflo.ner import NERExtractor, FULL, CHEMISTRY, BIOLOGY, BIOACTIVITY, DISEASE

## 1. Cloud model (Gemini)

The default model is `gemini-2.5-flash`. Pass your API key or set the `GEMINI_API_KEY` environment variable.

In [None]:
extractor = NERExtractor(api_key="YOUR_GEMINI_KEY")  # or set GEMINI_API_KEY env var

text = (
    "Gefitinib (ZD1839) is a first-generation EGFR tyrosine kinase inhibitor "
    "with IC50 = 0.033 µM, approved for non-small cell lung cancer (NSCLC). "
    "Its SMILES is COc1cc2ncnc(Nc3ccc(F)c(Cl)c3)c2cc1OCCCN1CCOCC1."
)

result = extractor.extract(text)
result
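
If you would rather not hard-code the key in a notebook, one option is to resolve it yourself before constructing the extractor. The helper below is a minimal sketch: `resolve_gemini_key` is a hypothetical name, not part of `structflo.ner`, and it assumes an explicitly passed key should take priority over the `GEMINI_API_KEY` environment variable.

```python
import os


def resolve_gemini_key(explicit_key=None):
    """Return an API key from the argument or the GEMINI_API_KEY variable.

    Hypothetical helper for illustration; not part of structflo.ner.
    """
    key = explicit_key or os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError(
            "No Gemini API key found: pass one explicitly or set GEMINI_API_KEY."
        )
    return key


# Usage (assumes structflo.ner is installed):
# extractor = NERExtractor(api_key=resolve_gemini_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error raised in the middle of an extraction call.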

## 2. Local model via Ollama

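Run extraction on your own hardware; no API key is needed.

Make sure Ollama is running locally and the model has been pulled:

```bash
ollama serve            # start the server (skip if Ollama already runs as a service)
ollama pull gemma3:27b  # in a second terminal: download the model
```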

In [None]:
local_extractor = NERExtractor(
    model_id="gemma3:27b",
    model_url="http://localhost:11434",
)

result_local = local_extractor.extract(
    "Sorafenib is a multi-kinase inhibitor targeting VEGFR-2, PDGFR, and RAF "
    "with IC50 values of 90 nM, 57 nM, and 6 nM respectively."
)
result_local
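
Before running a local extraction, it can help to confirm that the server is reachable and that the model has been pulled. The sketch below uses only the standard library and Ollama's `GET /api/tags` endpoint, which lists locally available models. The function names `model_in_tags` and `ollama_has_model` are hypothetical helpers, not part of `structflo.ner`.

```python
import json
import urllib.request


def model_in_tags(payload, model_id):
    """Return True if model_id appears in a parsed /api/tags response."""
    models = payload.get("models") or []
    return any(m.get("name") == model_id for m in models)


def ollama_has_model(base_url, model_id, timeout=5.0):
    """Return True if the Ollama server at base_url lists model_id.

    Returns False if the server cannot be reached or replies with non-JSON.
    """
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError/HTTPError derive from OSError; JSON errors from ValueError.
        return False
    return model_in_tags(payload, model_id)


# Usage:
# if not ollama_has_model("http://localhost:11434", "gemma3:27b"):
#     print("Start Ollama and run `ollama pull gemma3:27b` first.")
```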

## 3. Built-in profiles

Profiles control which entity types are extracted. Use them to focus the model on specific categories.

In [None]:
# Extract only chemical entities
chem_result = extractor.extract(text, profile=CHEMISTRY)
print("Compounds:", chem_result.compounds)
print("Targets:", chem_result.targets)  # empty — not in CHEMISTRY profile

In [None]:
# Merge profiles to combine entity types
combined = CHEMISTRY.merge(BIOLOGY)
print(f"Profile: {combined.name}")
print(f"Entity classes: {combined.entity_classes}")

combined_result = extractor.extract(text, profile=combined)
print("Compounds:", combined_result.compounds)
print("Targets:", combined_result.targets)
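
One way to picture merging is as a union over the entity classes of two profiles. The toy class below illustrates that idea only. `ToyProfile`, its fields, the `a+b` naming of the merged profile, and the example class names are all assumptions made for illustration; they do not describe how `structflo.ner` implements `merge`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyProfile:
    """Illustrative stand-in for a profile; not the structflo.ner class."""

    name: str
    entity_classes: tuple = field(default_factory=tuple)

    def merge(self, other):
        """Return a new profile with the order-preserving union of classes."""
        # dict.fromkeys keeps first-seen order and drops duplicates.
        merged = tuple(dict.fromkeys(self.entity_classes + other.entity_classes))
        return ToyProfile(name=f"{self.name}+{other.name}", entity_classes=merged)


chem = ToyProfile("chemistry", ("compound_name", "smiles"))
bio = ToyProfile("biology", ("target", "compound_name"))
print(chem.merge(bio).entity_classes)
```

To see what the real merged profile contains, inspect `combined.name` and `combined.entity_classes` as in the cell above.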

## 4. Custom profiles

Define your own extraction profiles for domain-specific use cases.

In [None]:
from structflo.ner import EntityProfile

kinase_profile = EntityProfile(
    name="kinase_inhibitors",
    entity_classes=["compound_name", "smiles", "target", "bioactivity"],
    prompt="Extract kinase inhibitor names, their SMILES strings, kinase targets, and potency values (IC50, Ki, Kd).",
    examples=[],  # add your own few-shot examples here for best results
)

kinase_result = extractor.extract(text, profile=kinase_profile)
kinase_result

## 5. Working with results

In [None]:
# Access typed entity lists
print("Compounds:", result.compounds)
print("Targets:", result.targets)
print("Bioactivities:", result.bioactivities)
print("Diseases:", result.diseases)
print("Mechanisms:", result.mechanisms)

In [None]:
# Flat list of all entities
for entity in result.all_entities():
    print(f"{entity.entity_type:20s} | {entity.text}")
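
The flat list is convenient for regrouping. As a sketch, the helper below collects entity texts per type, relying only on the `entity_type` and `text` attributes used above. `group_by_type` is a hypothetical helper, and the demo objects and their type labels are stand-ins so that the example runs without a model call.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_type(entities):
    """Map each entity_type to the list of entity texts, in input order."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity.entity_type].append(entity.text)
    return dict(grouped)


# Stand-in objects (assumed type labels), not real extraction output.
demo = [
    SimpleNamespace(entity_type="compound_name", text="Gefitinib"),
    SimpleNamespace(entity_type="target", text="EGFR"),
    SimpleNamespace(entity_type="compound_name", text="ZD1839"),
]
print(group_by_type(demo))

# With a real result: group_by_type(result.all_entities())
```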

In [None]:
# Export to a pandas DataFrame (needs the optional `structflo-ner[dataframe]` extra from Setup)
df = result.to_dataframe()
df

In [None]:
# Serialize to dict (useful for JSON export)
import json

print(json.dumps(result.to_dict(), indent=2))
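
To keep the output on disk, one option is JSON Lines: one JSON object per line. The sketch below assumes only that `to_dict()` returns a JSON-serializable dict, as the cell above implies. `write_jsonl` is a hypothetical helper, and the file name in the usage comment is a placeholder.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write an iterable of dicts to path as JSON Lines; return the count."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as fh:
        for record in records:
            # ensure_ascii=False keeps characters such as "µ" readable.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# With a real result: write_jsonl([result.to_dict()], "entities.jsonl")
```

Writing with `ensure_ascii=False` and an explicit UTF-8 encoding avoids turning units such as `µM` into `\u00b5M` escapes in the file.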

## 6. Batch extraction

Pass a list of texts to `extract` to process multiple documents in one call. It returns one result per input text.

In [None]:
texts = [
    "Imatinib inhibits BCR-ABL with IC50 = 0.6 µM in CML.",
    "Trastuzumab targets HER2 in breast cancer patients.",
    "Remdesivir (GS-5734) is an antiviral with EC50 = 0.77 µM against SARS-CoV-2.",
]

results = extractor.extract(texts)

for i, r in enumerate(results, start=1):
    print(f"\n--- Text {i} ---")
    for entity in r.all_entities():
        print(f"  {entity.entity_type:20s} | {entity.text}")
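
This notebook does not cover how a batch call behaves when one text fails. If you need per-document isolation, a simple alternative is to call `extract` once per text and record failures separately. The sketch below takes any single-text callable; `extract_each` is a hypothetical helper, not part of `structflo.ner`.

```python
def extract_each(extract_fn, texts):
    """Call extract_fn on each text, collecting results and failures separately.

    Returns (results, failures): results is a list of (index, result) pairs and
    failures is a list of (index, exception) pairs.
    """
    results, failures = [], []
    for index, text in enumerate(texts):
        try:
            results.append((index, extract_fn(text)))
        except Exception as exc:  # keep going; record which text failed
            failures.append((index, exc))
    return results, failures


# With a real extractor: ok, failed = extract_each(extractor.extract, texts)
```

The trade-off is one model call per text, which may be slower than a single list call.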