# Programmatic ingestion via the REST API

The frontend normally drives the ingestion workflow by calling three FastAPI endpoints in sequence:

1. [`/upload`](../backend/score.py) registers a `Document` node and stores the original file under `backend/merged_files`.
2. [`/extract`](../backend/score.py) chunks the text, generates embeddings, writes graph triples, and updates the document status.
3. [`/post_processing`](../backend/score.py) materialises optional indices (vector + hybrid search, similarity graph, etc.).
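Here is a sketch of the call order for a single file. The HTTP transport is passed in as a plain callable so the example runs without a server. The form-field names (`file_name`, `tasks`) are hypothetical placeholders, not the backend's documented parameters, so check `score.py` for the real ones.

```python
from pathlib import Path
from typing import Callable, Dict, List

# A transport takes (endpoint, form_fields) and returns the decoded JSON body.
# In real use it would wrap an HTTP POST; here it is injected so the call order
# can be exercised offline.
Transport = Callable[[str, Dict[str, str]], dict]


def ingest_one_file(path: Path, post: Transport, run_post_processing: bool = True) -> List[str]:
    """Call the ingestion endpoints in frontend order and return the endpoints hit.

    The form-field names are hypothetical placeholders, not the backend's contract.
    """
    called: List[str] = []

    # 1. Register the Document node and hand the file to the backend.
    post("/upload", {"file_name": path.name})
    called.append("/upload")

    # 2. Chunk, embed, and write graph triples for that file.
    post("/extract", {"file_name": path.name})
    called.append("/extract")

    # 3. Optionally materialise indices (hybrid search, similarity graph, ...).
    if run_post_processing:
        post("/post_processing", {"tasks": "[]"})
        called.append("/post_processing")

    return called


def fake_post(endpoint: str, fields: Dict[str, str]) -> dict:
    # Stand-in transport that always reports success.
    return {"status": "Success"}


print(ingest_one_file(Path("notes.md"), fake_post))
# ['/upload', '/extract', '/post_processing']
```

Injecting the transport keeps the ordering logic separate from HTTP details. That separation makes it easy to test, and lets you swap in whichever HTTP client you already use.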

If your project has already been preprocessed into Markdown files, you can bypass the UI entirely. Start the backend as usual (for example with `uvicorn score:app --reload` from the `backend` directory, or through Docker) and then call the helper in `backend/scripts/bulk_ingest_via_api.py`.

### Use directly from Python (ideal for notebooks)

The helper exposes an `ingest_project_via_api` function so you can trigger ingestion from Python without touching the command line. This works well at the end of your preprocessing notebook:

```python
from backend.scripts.bulk_ingest_via_api import ingest_project_via_api

ingest_project_via_api(
    project_name="PROJECT_NAME",
    base_url="http://localhost:8000",
    uri="neo4j://neo4j:7687",
    user="neo4j",
    password="letmein123",
    database="neo4j",
)
```

The connection parameters fall back to the same environment variables the UI relies on (`LLM_GRAPH_BUILDER_BASE_URL`, `NEO4J_URI`, etc.), so you can omit any argument that is already configured.
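The resolution order is presumably: explicit argument first, then environment variable, then built-in default. Below is a minimal sketch under that assumption, using a hypothetical `resolve_setting` helper. Only `LLM_GRAPH_BUILDER_BASE_URL` and `NEO4J_URI` are named in this document, and the default URL is illustrative.

```python
import os
from typing import Optional


def resolve_setting(explicit: Optional[str], env_var: str, default: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit argument, then the environment variable, then the default."""
    if explicit is not None:
        return explicit
    return os.environ.get(env_var, default)


# Hypothetical usage: the default URL is an illustrative assumption.
base_url = resolve_setting(None, "LLM_GRAPH_BUILDER_BASE_URL", "http://localhost:8000")
neo4j_uri = resolve_setting(None, "NEO4J_URI")
print(base_url, neo4j_uri)
```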

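The `from backend.scripts.bulk_ingest_via_api import ...` line assumes the repository root is on `sys.path`. When the notebook runs from a subdirectory, run a cell like this first:

```python
import sys
from pathlib import Path

# Walk up from the current working directory until a folder containing
# `backend/` is found, so `backend.scripts...` becomes importable.
repo_root = Path.cwd().resolve()
while repo_root != repo_root.parent and not (repo_root / "backend").exists():
    repo_root = repo_root.parent

if not (repo_root / "backend").exists():
    raise RuntimeError("Could not locate the repository root (no `backend/` directory found)")

# Prepend so the repository's `backend` package is found first.
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))
```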

### Run as a CLI (optional)

If you prefer to work from a shell, the module also ships a CLI:

```bash
python backend/scripts/bulk_ingest_via_api.py PROJECT_NAME \
  --base-url http://localhost:8000 \
  --uri neo4j://neo4j:7687 \
  --user neo4j \
  --password letmein123 \
  --database neo4j
```

The helper expects the Markdown files under `preprocessing/input_data/processed/PROJECT_NAME/markdown`. Each file is sent to `/upload` as a single upload chunk, then extracted and (optionally) post-processed. The defaults can be overridden with CLI flags:

| Flag | Purpose |
|------|---------|
| `--model` | Embedding model stored on the `Document` node (defaults to `EMBEDDING_MODEL` or `all-MiniLM-L6-v2`). |
| `--token-chunk-size`, `--chunk-overlap`, `--chunks-to-combine` | Forwarded directly to the `/extract` endpoint for chunking control. |
| `--retry-condition`, `--additional-instructions` | Forwarded unchanged to `/extract`, matching the corresponding UI options. |
| `--post-processing-tasks` | Space-separated list of tasks for `/post_processing` (defaults to enabling hybrid search and chunk similarities). |
| `--skip-post-processing` | Disable the `/post_processing` step entirely. |
| `--env-file` | Load backend credentials from a `.env` file so the command stays short. |
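The sketch below shows, with `argparse`, how the space-separated `--post-processing-tasks` flag might interact with `--skip-post-processing`. The real script's default task identifiers and precedence rules are not documented here. The placeholder task names and the rule that skipping wins are both assumptions.

```python
import argparse
from typing import List, Optional

# Hypothetical placeholder task names; check the backend's /post_processing
# handler for the identifiers it actually accepts.
DEFAULT_TASKS = ["hybrid_search", "chunk_similarities"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a bulk-ingest CLI")
    parser.add_argument("project_name")
    # nargs="+" turns "--post-processing-tasks a b" into ["a", "b"].
    parser.add_argument("--post-processing-tasks", nargs="+", default=DEFAULT_TASKS)
    parser.add_argument("--skip-post-processing", action="store_true")
    return parser


def tasks_to_run(argv: Optional[List[str]] = None) -> List[str]:
    args = build_parser().parse_args(argv)
    # Assumed rule: skipping wins over any task list that was also supplied.
    return [] if args.skip_post_processing else list(args.post_processing_tasks)


print(tasks_to_run(["demo", "--post-processing-tasks", "a", "b"]))
# ['a', 'b']
```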

Every response is checked for a `status` of `"Success"`. A non-success result raises an `IngestionError` that names the endpoint and includes the backend's response payload. HTTP errors also raise, so failures surface in CI and automation pipelines. The script logs progress per file as well (for example `INFO - Uploading <file>` followed by `INFO - Extracting <file>`), which makes large batches easy to follow.
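Here is a sketch of that check. `IngestionError` and the message wording (`Endpoint <url> returned non-success status: <payload>`) mirror what the helper raises for a failed call. The HTTP-status branch and the `check_response` signature are assumptions for illustration, not the helper's actual code.

```python
from typing import Any, Dict


class IngestionError(RuntimeError):
    """Raised when an ingestion endpoint does not report success."""


def check_response(endpoint: str, http_status: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the payload if the call succeeded, otherwise raise IngestionError."""
    # Transport-level failure (4xx/5xx): there may be no usable JSON body.
    if http_status >= 400:
        raise IngestionError(f"Endpoint {endpoint} returned HTTP {http_status}")
    # Application-level failure: the backend answered but did not succeed.
    if payload.get("status") != "Success":
        raise IngestionError(f"Endpoint {endpoint} returned non-success status: {payload}")
    return payload
```

Checking both layers matters because the backend can answer with a successful HTTP status while its JSON body reports `"status": "Failed"`.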

Because the backend still requires local files to originate in `backend/merged_files`, the upload step continues to use the standard `/upload` endpoint. No extra symlinks or directory changes are necessary: the helper sends each Markdown file from the preprocessing directory directly to the backend.
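Reading the files straight from the preprocessing tree could look like the sketch below. It assumes two things: the directory named in the CLI section is relative to the repository root, and only top-level `*.md` files are picked up. `iter_markdown_files` is a hypothetical name.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_markdown_files(repo_root: Path, project_name: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (file name, raw bytes) for each top-level Markdown file, in sorted order."""
    markdown_dir = repo_root / "preprocessing" / "input_data" / "processed" / project_name / "markdown"
    # Raised on first iteration, since this is a generator.
    if not markdown_dir.is_dir():
        raise FileNotFoundError(f"No Markdown directory at {markdown_dir}")
    for path in sorted(markdown_dir.glob("*.md")):
        # Read the bytes as-is; nothing is copied into backend/merged_files locally.
        yield path.name, path.read_bytes()
```

Each yielded pair can then be handed to whatever performs the `/upload` call.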