# Programmatic ingestion via the REST API

The frontend normally drives the ingestion workflow by calling three FastAPI endpoints in sequence:

1. [`/upload`](../backend/score.py) registers a `Document` node and stores the original file under `backend/merged_files`.
2. [`/extract`](../backend/score.py) chunks the text, generates embeddings, writes graph triples, and updates the document status.
3. [`/post_processing`](../backend/score.py) materialises optional indices (vector + hybrid search, similarity graph, etc.).
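The three endpoints are called in a fixed order, and each one depends on the one before it. Below is a minimal sketch of that sequencing using `requests`-style calls. It assumes you supply each endpoint's form fields yourself, because the field names are defined by the handlers in `backend/score.py` and are not reproduced here. It also leaves out the file content that the real `/upload` call needs. `run_ingestion_steps` is a hypothetical name, not part of the repository.

```python
from typing import Any, Callable, List, Mapping

# Order in which the frontend calls the backend (handlers live in backend/score.py).
INGESTION_STEPS = ("/upload", "/extract", "/post_processing")


def run_ingestion_steps(
    base_url: str,
    payloads: Mapping[str, Mapping[str, Any]],
    post: Callable[..., Any],
) -> List[Any]:
    """Call /upload, /extract and /post_processing in order.

    `payloads` maps each endpoint path to its form fields (hypothetical here:
    take the real field names from backend/score.py). `post` is any callable
    with a requests.post-like signature, e.g. requests.post.
    """
    responses = []
    for step in INGESTION_STEPS:
        url = base_url.rstrip("/") + step
        response = post(url, data=dict(payloads.get(step, {})))
        # Stop at the first failing step: later steps depend on earlier ones.
        response.raise_for_status()
        responses.append(response)
    return responses
```

In practice you would pass `requests.post` as `post`. Injecting the callable keeps the sequencing logic testable without a running backend.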

If your project has already been preprocessed into Markdown files, you can bypass the UI entirely. Start the backend as usual (for example, with `uvicorn score:app --reload` from the `backend` directory, or through Docker), then call the helper in `backend/scripts/bulk_ingest_via_api.py`.
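The helper talks to the running backend over HTTP. It can therefore help to wait until the server accepts connections before you start ingesting, for example right after launching the Docker container. Uvicorn listens on port 8000 by default, which matches `http://localhost:8000` in the examples below.

This is a minimal sketch based only on a plain TCP check. A successful connection confirms that something is listening on the port, not that the FastAPI app has finished starting. `wait_for_backend` is a hypothetical helper, not part of the repository.

```python
import socket
import time
from urllib.parse import urlparse


def wait_for_backend(base_url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to base_url's host/port succeeds, False on timeout."""
    parsed = urlparse(base_url)
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Only checks that the port accepts connections; no HTTP request is sent.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```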

### Use directly from Python (ideal for notebooks)

The helper exposes an `ingest_project_via_api` function so you can trigger ingestion from Python without touching the command line. This works well at the end of your preprocessing notebook:

```python
from backend.scripts.bulk_ingest_via_api import ingest_project_via_api

ingest_project_via_api(
    project_name="PROJECT_NAME",
    base_url="http://localhost:8000",
    uri="neo4j://neo4j:7687",
    user="neo4j",
    password="letmein123",
    database="neo4j",
)
```

All parameters fall back to the same environment variables that the UI relies on (`LLM_GRAPH_BUILDER_BASE_URL`, `NEO4J_URI`, etc.), so you can omit explicit arguments when they are already configured.
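The fallback described above follows a common pattern: an explicit argument wins, and otherwise the environment variable is read. Below is a minimal sketch of that pattern, not the helper's actual code. `resolve_setting` and the `http://localhost:8000` default are illustrative assumptions, and only the two variable names mentioned above are used.

```python
import os
from typing import Optional


def resolve_setting(explicit: Optional[str], env_var: str, default: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit argument, then the environment variable, then a default."""
    # An explicitly passed argument always takes precedence.
    if explicit is not None:
        return explicit
    # Otherwise fall back to the environment, then to the (illustrative) default.
    return os.environ.get(env_var, default)


# Example: omit the argument and rely on the environment instead.
base_url = resolve_setting(None, "LLM_GRAPH_BUILDER_BASE_URL", "http://localhost:8000")
```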

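When calling the helper from a notebook, first make the repository importable. To do that, add its root (the first parent directory that contains `backend`) to `sys.path`:

```python
import sys
from pathlib import Path

# Walk up from the notebook's working directory until a folder containing
# `backend/` is found (stops at the filesystem root if none exists).
repo_root = Path.cwd().resolve()
while repo_root != repo_root.parent and not (repo_root / "backend").exists():
    repo_root = repo_root.parent

# Make `backend.scripts...` importable from the notebook.
sys.path.insert(0, str(repo_root))
```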
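The same bootstrap can be wrapped in a reusable function, for example when several notebooks live at different depths in the repository. This sketch assumes the repository root is the first parent that contains a `backend` directory. `find_repo_root` and `add_repo_to_path` are hypothetical names.

It differs from the loop above in two ways. It raises an error instead of silently falling back to the filesystem root. It also avoids inserting the same path twice when the cell is re-run.

```python
import sys
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, marker: str = "backend") -> Path:
    """Return the first directory, from `start` upwards, that contains a `marker` directory."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {current} contains a {marker!r} directory")


def add_repo_to_path(start: Optional[Path] = None, marker: str = "backend") -> Path:
    """Prepend the repository root to sys.path (only once) and return it."""
    root = find_repo_root(start if start is not None else Path.cwd(), marker)
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```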

A fuller call that also selects the LLM model, the chunking parameters, and the post-processing tasks:

```python
from backend.scripts.bulk_ingest_via_api import ingest_project_via_api

PROJECT_NAME = "Mobility_Wellness_Workshop"

# Optional indices to materialise in the post-processing step.
POST_PROCESSING_TASKS = (
    "materialize_text_chunk_similarities",
    "enable_hybrid_search_and_fulltext_search_in_bloom",
    "materialize_entity_similarities",
    "enable_communities",
)

ingest_project_via_api(
    project_name=PROJECT_NAME,
    base_url="http://localhost:8000",
    uri="neo4j://neo4j:7687",
    user="neo4j",
    password="letmein123",
    database="neo4j",
    model="openai_gpt_5_mini",
    token_chunk_size=200,
    chunk_overlap=50,
    chunks_to_combine=1,
    post_processing_tasks=POST_PROCESSING_TASKS,
    skip_post_processing=False,
)
```
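If several preprocessed projects share the same backend and settings, you can repeat the call once per project. This minimal sketch assumes that `ingest_project_via_api` raises an exception when a project fails, which is not confirmed here. `ingest_projects` and the second project name in the usage comment are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable


def ingest_projects(
    project_names: Iterable[str],
    ingest: Callable[..., Any],
    **shared_kwargs: Any,
) -> Dict[str, Exception]:
    """Run `ingest(project_name=..., **shared_kwargs)` for each project.

    Returns {project_name: exception} for the projects that failed, so one
    failing project does not stop the remaining ones.
    """
    failures: Dict[str, Exception] = {}
    for name in project_names:
        try:
            ingest(project_name=name, **shared_kwargs)
        except Exception as exc:  # keep going; report all failures at the end
            failures[name] = exc
    return failures


# Usage (requires the running backend, so shown as a comment):
# failures = ingest_projects(
#     ["Mobility_Wellness_Workshop", "Another_Project"],  # second name is hypothetical
#     ingest_project_via_api,
#     base_url="http://localhost:8000",
#     post_processing_tasks=POST_PROCESSING_TASKS,
# )
```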

The helper logs an upload step and an extraction step for each Markdown file (output truncated):

```text
INFO - Uploading 20251006_Mobility x Wellness Workshop_Mitsuzawa.md
INFO - Extracting 20251006_Mobility x Wellness Workshop_Mitsuzawa.md
INFO - Uploading 250923_alcohol_forWS.md
INFO - Extracting 250923_alcohol_forWS.md
INFO - Uploading Agenda_Bios_Mobility Wellness 1.md
INFO - Extracting Agenda_Bios_Mobility Wellness 1.md
INFO - Uploading Angela 1.md
INFO - Extracting Angela 1.md
INFO - Uploading Angela1 1.md
INFO - Extracting Angela1 1.md
INFO - Uploading Ashley 1.md
INFO - Extracting Ashley 1.md
INFO - Uploading Ashley1 1.md
INFO - Extracting Ashley1 1.md
INFO - Uploading Carrington_WheelSense_MobilityxWellness.md
INFO - Extracting Carrington_WheelSense_MobilityxWellness.md
INFO - Uploading DabelkoSchoeny_99 P Labs_Workshop Oct 2025  (1).pptx.md
INFO - Extracting DabelkoSchoeny_99 P Labs_Workshop Oct 2025  (1).pptx.md
INFO - Uploading Delirium EEG for Honda 10-6-2025.md
INFO - Extracting Delirium EEG for Honda 10-6-2025.md
INFO - Uploading Diego 1.md
INFO - Extracting Diego 1.md
```
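The helper prints one `Uploading` line and one `Extracting` line per file. A saved log can therefore be scanned for files that never reached the extraction step, for example after an interrupted run. This minimal sketch assumes the `INFO - <action> <filename>` format shown in the output above. `files_not_extracted` is a hypothetical helper. Note that an `Extracting` line only shows that extraction started, not that it succeeded.

```python
import re
from typing import Iterable, List

# Matches lines such as "INFO - Uploading Angela 1.md" (format taken from the output above).
LOG_LINE = re.compile(r"^INFO - (Uploading|Extracting) (.+)$")


def files_not_extracted(log_lines: Iterable[str]) -> List[str]:
    """Return files with an 'Uploading' line but no matching 'Extracting' line, in upload order."""
    uploaded: List[str] = []
    extracted = set()
    for line in log_lines:
        match = LOG_LINE.match(line.rstrip("\r\n"))
        if not match:
            continue  # skip unrelated or truncated lines
        action, filename = match.groups()
        if action == "Uploading":
            uploaded.append(filename)
        else:
            extracted.add(filename)
    return [name for name in uploaded if name not in extracted]
```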