### 🛠️ Configuration Parameters

- **paper_source**: Selects the source of papers. Options:
  - `"Semantic Scholar"`: Fetch papers using search queries.
  - `"PDF Zip"`: Use a local ZIP file containing PDFs.
  - `"Journal API"`: Fetch papers from the Web of Science Journal API (requires `api_key_file`).

- **number_papers**: Number of papers to fetch when using `"Semantic Scholar"` as the source.

- **paper_query_file**: Path to a `.txt` file containing one paper search query per line.

- **pdfs_file**: Path to a ZIP file containing PDF papers (only used if `paper_source="PDF Zip"`).

- **api_key_file**: Path to a file containing the API key for Journal API (used only when `paper_source="Journal API"`).

- **ontology_file**: Base ontology file in JSON format.

- **AI**: Selects the AI model to use. Options: `"openAI"`, `"gemini"`, `"llama"`, `"ollama"`, `"claude"`.

- **API_keys**: Path to the file containing the API key(s) for the selected AI model (e.g., a `.txt` file with your OpenAI key).

- **keyword_query_file**: Text file listing keywords to extract and rank from papers.

- **model_knowledge_graph**: Graph backend used to build the knowledge graph. Options:
  - `"neo4j"`: Uses Neo4j graph database.
  - `"networkx"`: Uses an in-memory NetworkX graph.

- **credentials_for_knowledge_graph**: JSON file containing connection credentials for Neo4j (only used when `model_knowledge_graph="neo4j"`).

- **output_dir**: Directory where all output files (results, logs, metadata) will be saved.


```python
from types import SimpleNamespace

params = SimpleNamespace(
    paper_source="Semantic Scholar",  # Options: "Semantic Scholar", "PDF Zip", "Journal API"
    number_papers=3,  # Number of papers to fetch from Semantic Scholar
    paper_query_file="external/input/paper_query.txt",  # Text file with one search query per line
    pdfs_file=None,  # Optional: ZIP file with PDFs (for PDF source)
    api_key_file="",  # API key file for Journal API source
    ontology_file="external/input/ontology.json",  # Base ontology file (JSON or OWL)
    AI="openAI",  # Options: "openAI", "gemini", "llama", "ollama", "claude"
    API_keys="external/input/OpenAI_APIs_SOKEmatgraph1.txt",  # API key file for AI tools
    keyword_query_file="external/input/keyword_query.txt",  # Text file listing keywords
    model_knowledge_graph="networkx",  # Options: "neo4j", "networkx"
    credentials_for_knowledge_graph="external/input/neo4j.json",  # Graph DB credentials
    output_dir="external/output/",  # Output directory
)
```


### 🚀 Run the Full SOKEGraph Pipeline

After you’ve defined your **`params`** object (a `SimpleNamespace`) in a code cell, run the entire pipeline with:

```python
from sokegraph.core.full_pipeline import full_pipeline_main

full_pipeline_main(params)
```

### ✅ What Happens Under the Hood

#### 📄 Paper Retrieval
Fetches papers using one of three sources:
- **Semantic Scholar**: Based on search queries (`paper_query_file`, `number_papers`)
- **PDF Zip**: Loads papers from a local ZIP file (`pdfs_file`)
- **Journal API**: Queries the Web of Science Journal API (requires `api_key_file`)
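
To make the two file-based inputs concrete, here is a minimal sketch of how a query file (one query per line) and a PDF ZIP could be read. The helper names `read_queries` and `list_pdfs` are hypothetical and are not part of SOKEGraph; the pipeline's own loaders may behave differently.

```python
import zipfile
from pathlib import Path


def read_queries(path):
    """Return the non-empty lines of a query file, stripped of whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def list_pdfs(zip_path):
    """Return the names of the PDF members of a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".pdf")]
```

Calling `read_queries(params.paper_query_file)` before a run is a quick way to confirm that the file holds the queries you expect and no stray blank lines.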

#### 🧠 Ontology Enrichment
Uses your selected AI model (`AI = "openAI"`, `"gemini"`, `"llama"`, `"ollama"`, or `"claude"`) to enhance the base ontology with:
- New concepts  
- Keywords and synonyms  
- Topic structures
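
As a rough illustration of what "enhancing" an ontology with keywords and synonyms can mean, the sketch below merges suggested keywords into a base ontology while skipping case-insensitive duplicates. The ontology shape (`{category: [keywords]}`) and the function `merge_suggestions` are assumptions made for this example; the actual SOKEGraph ontology schema and enrichment logic are not shown in this notebook.

```python
import copy


def merge_suggestions(ontology, suggestions):
    """Return a copy of the ontology with suggested keywords added.

    Keywords that already exist in a category (ignoring case) are skipped.
    """
    merged = copy.deepcopy(ontology)  # leave the base ontology untouched
    for category, keywords in suggestions.items():
        existing = merged.setdefault(category, [])  # new categories start empty
        seen = {keyword.lower() for keyword in existing}
        for keyword in keywords:
            if keyword.lower() not in seen:
                existing.append(keyword)
                seen.add(keyword.lower())
    return merged


# Hypothetical data, for illustration only.
base = {"Catalysts": ["platinum"]}
suggested = {"Catalysts": ["Platinum", "Pt"], "Electrolytes": ["KOH"]}
enriched = merge_suggestions(base, suggested)
```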

#### 📊 Paper Ranking
Ranks papers using:
- Keyword match scores (from `keyword_query_file`)  
- AI-generated semantic relevance  
- Filtering based on contradictory or unrelated terms  
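
The keyword-match and filtering steps can be pictured with a small sketch. It assumes a simple occurrence count as the score and a plain substring test for excluded terms; these are stand-ins chosen for illustration, not the scoring that SOKEGraph actually uses, and the AI-generated semantic relevance is left out entirely. The function names and the sample papers are hypothetical.

```python
def keyword_score(text, keywords):
    """Count case-insensitive occurrences of each keyword in the text."""
    lowered = text.lower()
    return sum(lowered.count(keyword.lower()) for keyword in keywords)


def rank_papers(papers, keywords, exclude=()):
    """Sort papers by descending keyword score, dropping any that mention an excluded term."""
    kept = [
        paper
        for paper in papers
        if not any(term.lower() in paper["abstract"].lower() for term in exclude)
    ]
    return sorted(
        kept,
        key=lambda paper: keyword_score(paper["abstract"], keywords),
        reverse=True,
    )


# Hypothetical papers, for illustration only.
papers = [
    {"title": "A", "abstract": "Platinum catalyst for hydrogen evolution."},
    {"title": "B", "abstract": "Hydrogen evolution on a nickel catalyst; the catalyst is stable."},
    {"title": "C", "abstract": "A review of battery recycling."},
]
ranked = rank_papers(papers, ["catalyst", "hydrogen"], exclude=["battery"])
```

Here paper B ranks first because it mentions "catalyst" twice, and paper C is filtered out by the excluded term.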

#### 🕸 Knowledge Graph Construction
Builds a knowledge graph using either:
- **Neo4j**: Connects via `credentials_for_knowledge_graph`
- **NetworkX**: In-memory graph for lightweight analysis

Graphs include:
- Ontology layers and categories  
- Concept-paper links  
- Keyword and metadata connections  
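
One way to picture these graph contents is as a list of labeled edges that either backend could store. The sketch below assumes a nested ontology (`{layer: {category: [keywords]}}`) and a mapping from paper IDs to categories; both shapes, the relation names, and `build_edges` itself are hypothetical and only illustrate the idea.

```python
def build_edges(ontology, paper_categories):
    """Return (source, target, relation) triples for layers, categories, keywords, and papers."""
    edges = []
    for layer, categories in ontology.items():
        for category, keywords in categories.items():
            edges.append((layer, category, "HAS_CATEGORY"))
            for keyword in keywords:
                edges.append((category, keyword, "HAS_KEYWORD"))
    # Concept-paper links: each paper points at the categories it mentions.
    for paper_id, categories in paper_categories.items():
        for category in categories:
            edges.append((paper_id, category, "MENTIONS"))
    return edges


# Hypothetical data, for illustration only.
ontology = {"Materials": {"Catalysts": ["platinum", "nickel"]}}
paper_categories = {"paper-1": ["Catalysts"]}
edges = build_edges(ontology, paper_categories)
```

With NetworkX installed, triples like these could be loaded into a graph with `G.add_edge(source, target, relation=relation)`.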

#### 💾 Output Saving
All outputs are saved to your defined `output_dir`, including:
- Enriched ontology file  
- Ranked paper list (CSV/JSON)  
- Logs and graph data  
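
For readers who want to post-process results, this sketch shows how a ranked list could be written as both CSV and JSON using only the standard library. The file names `ranked_papers.csv` and `ranked_papers.json` and the column names are assumptions for this example; check your `output_dir` for the names the pipeline actually produces.

```python
import csv
import json
from pathlib import Path


def save_ranked(papers, output_dir):
    """Write the ranked papers as CSV and JSON and return both paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory if it is missing
    csv_path = out / "ranked_papers.csv"
    json_path = out / "ranked_papers.json"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["rank", "title", "score"])
        writer.writeheader()
        writer.writerows(papers)
    json_path.write_text(json.dumps(papers, indent=2), encoding="utf-8")
    return csv_path, json_path
```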

> 💡 **Tip:** Double-check that all file paths in your `params` are valid and that services like **Neo4j**, **Ollama**, or your Journal API access are available before starting the pipeline.
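
A pre-flight check along these lines can catch most configuration mistakes before the pipeline starts. This is a minimal sketch, assuming that `ontology_file` and `keyword_query_file` are needed on every run and that the source-specific and Neo4j-specific files are needed only when selected, as the parameter list describes. `check_params` is a hypothetical helper, not part of SOKEGraph, and it does not test whether Neo4j, Ollama, or the Journal API are reachable.

```python
from pathlib import Path
from types import SimpleNamespace

PAPER_SOURCES = {"Semantic Scholar", "PDF Zip", "Journal API"}
AI_MODELS = {"openAI", "gemini", "llama", "ollama", "claude"}
GRAPH_BACKENDS = {"neo4j", "networkx"}


def check_params(params):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []

    # Option values documented in the parameter list.
    if params.paper_source not in PAPER_SOURCES:
        problems.append(f"unknown paper_source: {params.paper_source!r}")
    if params.AI not in AI_MODELS:
        problems.append(f"unknown AI: {params.AI!r}")
    if params.model_knowledge_graph not in GRAPH_BACKENDS:
        problems.append(f"unknown model_knowledge_graph: {params.model_knowledge_graph!r}")

    # Assumption: these two files are read on every run.
    required = ["ontology_file", "keyword_query_file"]
    # Files needed only for the selected paper source or graph backend.
    if params.paper_source == "Semantic Scholar":
        required.append("paper_query_file")
    elif params.paper_source == "PDF Zip":
        required.append("pdfs_file")
    elif params.paper_source == "Journal API":
        required.append("api_key_file")
    if params.model_knowledge_graph == "neo4j":
        required.append("credentials_for_knowledge_graph")

    for name in required:
        value = getattr(params, name, None)
        if not value or not Path(value).is_file():
            problems.append(f"{name} is missing or not a file: {value!r}")
    return problems
```

Running `check_params(params)` and printing the result before calling `full_pipeline_main(params)` gives a short list of anything that needs fixing.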