# FakeScope: Fact-Check + LLM Review Pipeline

This notebook builds a pipeline to: (3) compare your model's prediction against Google Fact Check results, (4) produce a teacher–student LLM review, (5) explain why a claim is not fake (when applicable), and (6) ask an LLM to reflect on your model (prompt engineering hints).

Environment variables required: `OPENAI_API_KEY` and `GOOGLE_FACTCHECK_API_KEY` (`OPENAI_API_BASE` is optional).

## Notes
- If you see warnings about missing `openai` or `joblib`, install them in your environment (e.g., `pip install openai joblib`).
- Set secrets via environment variables, not in notebooks:
  - macOS zsh example: `export OPENAI_API_KEY=...` and `export GOOGLE_FACTCHECK_API_KEY=...`
- The baseline block auto-loads `tfidf_vectorizer.joblib` and `best_baseline_model.joblib` if present next to this notebook. If absent, it returns a neutral prediction.
- You can switch the LLM model name in the pipeline init to any compatible chat model available to your OpenAI account or proxy.
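
The baseline fallback described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `baseline_predict`, the shape of the returned dictionary, and the use of `predict_proba` (which assumes a scikit-learn style classifier) are all assumptions.

```python
from pathlib import Path

try:
    import joblib  # optional dependency; the sketch degrades gracefully without it
except ImportError:
    joblib = None


def baseline_predict(text, model_dir="."):
    """Hypothetical helper: load the baseline artifacts if present, else stay neutral."""
    vec_path = Path(model_dir) / "tfidf_vectorizer.joblib"
    clf_path = Path(model_dir) / "best_baseline_model.joblib"

    # Neutral result when joblib or either artifact is missing.
    # The exact neutral value used by the notebook is not shown; None is a placeholder.
    if joblib is None or not (vec_path.is_file() and clf_path.is_file()):
        return {"source": "neutral", "probabilities": None}

    vectorizer = joblib.load(vec_path)
    model = joblib.load(clf_path)

    # Assumes the saved model exposes predict_proba and classes_ (scikit-learn style).
    proba = model.predict_proba(vectorizer.transform([text]))[0]
    labels = [str(c) for c in model.classes_]
    return {"source": "baseline", "probabilities": dict(zip(labels, map(float, proba)))}
```

Calling `baseline_predict("some claim")` in a folder without the two `.joblib` files returns the neutral result instead of raising.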

## Setup and usage (README)

### 1) Install dependencies

Use your Python environment (3.9+ recommended) and install from `requirements.txt`:

```bash
# macOS (zsh)
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

Note: On Apple Silicon, PyTorch wheels are available via pip. If installation fails, see https://pytorch.org/get-started/locally/

### 2) Configure environment variables

Set secrets using environment variables before running the notebook:

```bash
export OPENAI_API_KEY="<your-openai-key>"
# Optional: only needed if you route through a proxy, Azure, or another OpenAI-compatible endpoint
# export OPENAI_API_BASE="https://your-custom-base/v1"

export GOOGLE_FACTCHECK_API_KEY="<your-google-factcheck-api-key>"
```
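
Before running the pipeline, it can help to confirm from Python that the variables are visible to the notebook kernel. The snippet below is a small sketch; the helper name `missing_env_vars` is hypothetical and not part of the notebook.

```python
import os

# Names taken from the export commands above.
REQUIRED_VARS = ("OPENAI_API_KEY", "GOOGLE_FACTCHECK_API_KEY")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

If a variable shows up as missing even though you exported it, the kernel was probably started from a shell that did not have it set; restart the kernel from the configured shell.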

### 3) Local models (optional)

Place your fine-tuned transformer models in one of the following folders (already present in this repo):
- `distilbert_news_adapted/`
- `distilbert_fakenews_2stage/`
- `distilbert_fakenews/`

The pipeline prefers transformer predictions and falls back to the baseline (`tfidf_vectorizer.joblib` + `best_baseline_model.joblib`) if no transformer is found.
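
One way to picture the selection logic is the sketch below. The helper names (`find_transformer_dir`, `choose_backend`) are hypothetical, and the search order simply follows the list above, which is an assumption about priority. A folder counts as usable only if it holds `config.json` and `model.safetensors`.

```python
from pathlib import Path

# Folder names from the list above; treating this order as the priority is an assumption.
CANDIDATE_DIRS = (
    "distilbert_news_adapted",
    "distilbert_fakenews_2stage",
    "distilbert_fakenews",
)


def find_transformer_dir(root=".", candidates=CANDIDATE_DIRS):
    """Return the first candidate folder that looks like a saved model, else None."""
    for name in candidates:
        folder = Path(root) / name
        has_config = (folder / "config.json").is_file()
        has_weights = (folder / "model.safetensors").is_file()
        if has_config and has_weights:
            return folder
    return None


def choose_backend(root=".", use_transformer=True):
    """Prefer a transformer when allowed and available; otherwise use the baseline."""
    if use_transformer and find_transformer_dir(root) is not None:
        return "transformer"
    return "baseline"
```

Running `find_transformer_dir()` from the notebook's folder is also a quick way to see which model directory, if any, would be picked up.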

### 4) Run the pipeline

Execute the example cell in this notebook (last code cell) or call `FakeScopePipeline(...).run(text)` on your own input.

### 5) Caching

Fact-check results are cached in `factcheck_cache.json` for 24 hours to reduce API calls. Delete the file to clear the cache.
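
A time-limited JSON cache of this kind can be sketched as follows. The on-disk layout (a timestamp plus payload per query) and the function names `cache_get` and `cache_put` are assumptions for illustration; the notebook's actual file format may differ.

```python
import json
import time
from pathlib import Path

CACHE_PATH = Path("factcheck_cache.json")
TTL_SECONDS = 24 * 60 * 60  # 24 hours


def _load_cache(path):
    """Read the cache file, treating a missing or corrupt file as an empty cache."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def cache_get(query, path=CACHE_PATH, now=None):
    """Return cached data for the query, or None if absent or older than the TTL."""
    now = time.time() if now is None else now
    entry = _load_cache(path).get(query)
    if entry is None or now - entry["ts"] > TTL_SECONDS:
        return None
    return entry["data"]


def cache_put(query, data, path=CACHE_PATH, now=None):
    """Store data for the query together with the time it was saved."""
    now = time.time() if now is None else now
    cache = _load_cache(path)
    cache[query] = {"ts": now, "data": data}
    path.write_text(json.dumps(cache), encoding="utf-8")
```

Keying entries by the raw query string keeps the sketch short; a real implementation might normalize the query or include the language code in the key.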

### 6) Output validation

The pipeline’s output is validated against a JSON Schema when `jsonschema` is installed. The fields `schema_valid` and `schema_error` summarize the validation result.
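
The optional validation step might look like the sketch below. The schema shown is a made-up example, not the pipeline's real schema, and `validate_output` is a hypothetical helper. Reporting `schema_valid` as `None` when `jsonschema` is missing is also an assumption.

```python
try:
    import jsonschema  # optional; validation is skipped when it is not installed
except ImportError:
    jsonschema = None

# Illustrative schema only; the pipeline's actual schema is not shown in this README.
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["label", "confidence"],
}


def validate_output(result, schema=EXAMPLE_SCHEMA):
    """Return schema_valid / schema_error fields summarizing the validation."""
    if jsonschema is None:
        return {"schema_valid": None, "schema_error": "jsonschema not installed"}
    try:
        jsonschema.validate(instance=result, schema=schema)
    except jsonschema.ValidationError as exc:
        return {"schema_valid": False, "schema_error": exc.message}
    return {"schema_valid": True, "schema_error": None}
```

With `jsonschema` installed, an output that lacks a required field yields `schema_valid` set to `False` and a readable message in `schema_error`.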

### 7) Troubleshooting

- If Python reports that the `openai` module is not found, install it with `pip install openai` or re-run `pip install -r requirements.txt`.
- If transformer loading fails, ensure the model folders contain `config.json` and `model.safetensors` (and tokenizer files if needed).
- To disable transformer predictions and use only the baseline, initialize the pipeline with `use_transformer=False`.
- If the Google API returns empty results, try a shorter, more direct claim string or adjust `language_code`.
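
To debug empty results, it can be useful to query the Google Fact Check Tools API directly, outside the pipeline. The sketch below uses only the standard library. The function names are hypothetical, and mapping the pipeline's `language_code` onto the API's `languageCode` query parameter is an assumption about how the pipeline passes it through.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://factchecktools.googleapis.com/v1alpha1/claims:search"


def build_search_url(claim, api_key, language_code="en", page_size=5):
    """Build the claims:search request URL for a short, direct claim string."""
    params = {
        "query": claim.strip(),
        "languageCode": language_code,
        "pageSize": page_size,
        "key": api_key,
    }
    return f"{ENDPOINT}?{urlencode(params)}"


def search_claims(claim, api_key, language_code="en", timeout=10):
    """Call the API and return the list of matching claims (empty if none)."""
    url = build_search_url(claim, api_key, language_code)
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    # Responses without matches may omit the "claims" key entirely.
    return payload.get("claims", [])
```

Trying the same claim with a shorter wording or a different `language_code` here shows quickly whether the empty result comes from the API or from the pipeline.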
