### 📦 Installing Required Packages

Install the project's dependencies from `requirements.txt` with the following command:

In [None]:
%pip install -r requirements.txt

### 🛠️ Import Required Modules and Classes

This block loads all essential modules and classes from the `sokegraph` package and its dependencies. These components enable the full pipeline, including:

- **Paper sources**  
  Retrieve papers from:
  - **Semantic Scholar**
  - **PDF ZIP files**
  - **Journal API**

- **Paper ranking engine**  
  Score and prioritize papers based on:
  - Keyword relevance  
  - Synonym and concept expansion  
  - Opposite keyword filtering  

- **Knowledge graph interface**  
  Build and manage graphs using:
  - **Neo4j**
  - **NetworkX**

- **AI agents**  
  Integrate powerful language models:
  - **OpenAI**
  - **Gemini**
  - **LLaMA**
  - **Ollama**
  - **Claude**

- **Ontology management**  
  Update and expand ontologies based on enriched content and paper metadata.

- **Logging & configuration**  
  Monitor pipeline progress and capture errors or debug messages via the custom logging system.

- **JSON and file handling**  
  Load configurations, queries, ontologies, and export results efficiently.

These imports prepare the environment for the full research pipeline: **paper collection**, **ontology enrichment**, **paper ranking**, and **graph construction**.



In [5]:
from sokegraph.base_paper_source import BasePaperSource
from sokegraph.semantic_scholar_source import SemanticScholarPaperSource
from sokegraph.pdf_paper_source import PDFPaperSource
from sokegraph.paper_ranker import PaperRanker
from sokegraph.knowledge_graph import KnowledgeGraph
from sokegraph.util.logger import LOG
from sokegraph.ai_agent import AIAgent
from sokegraph.openai_agent import OpenAIAgent
from sokegraph.gemini_agent import GeminiAgent
from sokegraph.ontology_updater import OntologyUpdater
from sokegraph.neo4j_knowledge_graph import Neo4jKnowledgeGraph
from sokegraph.llama_agent import LlamaAgent
from sokegraph.ollama_agent import OllamaAgent
from sokegraph.claude_agent import ClaudeAgent
from sokegraph.journal_api_source import JournalApiPaperSource
from sokegraph.networkx_knowledge_graph import NetworkXKnowledgeGraph
import json

### Initialize and Display the User Interface

- The `SOKEGraphUI` class is imported from the `ui_inputs` module.
- An instance of `SOKEGraphUI` is created and assigned to `ui`.
- The `display_ui()` method is called to render the interactive user interface.

> **Note:**  
> If you change `ui_inputs.py`, re-run the cell below: its `importlib.reload` call reloads the module without restarting the notebook.


In [None]:
import importlib
import sokegraph.ui_inputs

# Reload the module so edits to ui_inputs.py take effect without restarting the kernel
importlib.reload(sokegraph.ui_inputs)
from sokegraph.ui_inputs import SOKEGraphUI

# Create UI instance
ui = SOKEGraphUI()

# Display the UI in the notebook
ui.display_ui()


## 🧠 Step 0: Setup AI Agent

- Begins the pipeline by logging its initialization.
- Selects and initializes the appropriate AI agent based on the `AI` parameter from the user interface.
- Supports the following AI providers:
  - `openAI` → initializes an `OpenAIAgent` with the specified API keys file.
  - `gemini` → initializes a `GeminiAgent` with the same API keys file.
  - `llama` → initializes a `LlamaAgent` with the same API keys file.
  - `ollama` → initializes an `OllamaAgent` (no API keys file is passed).
  - `claude` → initializes a `ClaudeAgent`.
- Raises an error if an unsupported provider is selected.

> ⚠️ **Important:**  
> Ensure the API keys file specified in `ui.params.api_keys_file` exists and contains valid credentials for the selected AI provider.
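
The provider selection described above can also be written as a lookup table instead of an `if`/`elif` chain. The following is a minimal sketch, not the notebook's implementation: `StubAgent`, its subclasses, `AGENT_FACTORIES`, `make_agent`, and the file name `api_keys.json` are hypothetical stand-ins so the example runs without `sokegraph` installed.

```python
from typing import Callable, Dict, Optional


class StubAgent:
    """Hypothetical stand-in for an AIAgent subclass; not part of sokegraph."""

    def __init__(self, api_keys_file: Optional[str] = None):
        self.api_keys_file = api_keys_file


class StubOpenAIAgent(StubAgent):
    pass


class StubOllamaAgent(StubAgent):
    pass


# Map each provider name to a factory that receives the API keys file path.
AGENT_FACTORIES: Dict[str, Callable[[str], StubAgent]] = {
    "openAI": lambda keys_file: StubOpenAIAgent(keys_file),
    "ollama": lambda keys_file: StubOllamaAgent(),  # ignores the keys file
}


def make_agent(provider: str, api_keys_file: str) -> StubAgent:
    """Return an agent for `provider`, or raise ValueError if it is unknown."""
    try:
        factory = AGENT_FACTORIES[provider]
    except KeyError:
        raise ValueError(f"Unsupported AI provider: {provider}") from None
    return factory(api_keys_file)


agent = make_agent("openAI", "api_keys.json")
print(type(agent).__name__)  # StubOpenAIAgent
```

With a table like this, supporting another provider means adding one dictionary entry, and the unsupported-provider error stays in a single place.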


In [None]:
LOG.info("🚀 Starting Full Pipeline")

# 0. Setup AI agent
ai_tool: AIAgent
if ui.params.AI == "openAI":
    ai_tool = OpenAIAgent(ui.params.api_keys_file)
elif ui.params.AI == "gemini":
    ai_tool = GeminiAgent(ui.params.api_keys_file)
elif ui.params.AI == "llama":
    ai_tool = LlamaAgent(ui.params.api_keys_file)
elif ui.params.AI == "ollama":
    ai_tool = OllamaAgent()
elif ui.params.AI == "claude":
    ai_tool = ClaudeAgent(ui.params.api_keys_file)
else:
    raise ValueError(f"Unsupported AI provider: {ui.params.AI}")

## 📄 Step 1: Select Paper Source

- Determines the paper source from `ui.params.paper_source`:
  - **Semantic Scholar**: requires `number_papers` and `paper_query_file`; papers are retrieved with the `SemanticScholarPaperSource` class.
  - **PDF Zip**: requires a ZIP file of PDFs (`pdfs_file`); papers are extracted with the `PDFPaperSource` class.
  - **Journal API**: requires the query file and an API key file; papers are retrieved with the `JournalApiPaperSource` class.

- Logs an error if:
  - A parameter required by the selected source is missing.
  - The selected source is not one of the three options above.

- Calls `fetch_papers()` on the selected paper source class to retrieve papers.

> ⚠️ **Important:**  
> - Select a paper source and provide the parameters it needs:
>   - **Semantic Scholar**: `number_papers` + `paper_query_file`
>   - **PDF Zip**: `pdfs_file`
>   - **Journal API**: `paper_query_file` + `journal_api_key_file`
> - The `fetch_papers()` method returns a list of paper metadata dictionaries that will be used in later steps.
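
The required-parameter checks in the code cell below can be summarized as a table keyed by source name. This is a sketch under assumptions, not the notebook's code: `REQUIRED_PARAMS` and `missing_params` are hypothetical names, `SimpleNamespace` stands in for `ui.params`, and the Journal API key parameter is written as `journal_api_key_file`, following the note above.

```python
from types import SimpleNamespace
from typing import Dict, List

# Parameters each source needs, keyed by the source names used in the code cell.
REQUIRED_PARAMS: Dict[str, List[str]] = {
    "Semantic Scholar": ["number_papers", "paper_query_file"],
    "PDF Zip": ["pdfs_file"],
    "Journal API": ["paper_query_file", "journal_api_key_file"],
}


def missing_params(params: SimpleNamespace) -> List[str]:
    """Return the required parameter names that are unset or empty."""
    source = getattr(params, "paper_source", None)
    if source not in REQUIRED_PARAMS:
        raise ValueError(f"Unsupported paper source: {source!r}")
    return [name for name in REQUIRED_PARAMS[source] if not getattr(params, name, None)]


# Hypothetical stand-in for ui.params: the query file was left empty.
params = SimpleNamespace(
    paper_source="Semantic Scholar",
    number_papers="10",
    paper_query_file="",
)
print(missing_params(params))  # ['paper_query_file']
```

Returning the list of missing names, rather than only logging, lets the caller decide whether to stop the pipeline or prompt for the missing values.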


In [None]:
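# 1. Select paper source
from typing import Optional

# Start from None so the fetch step below still works if validation fails
paper_source: Optional[BasePaperSource] = None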

if ui.params.paper_source == "Semantic Scholar":
    if not ui.params.number_papers or not ui.params.paper_query_file:
        LOG.error("❌ 'number_papers' and 'paper_query_file' are required for Semantic Scholar source.")
    else:
        paper_source = SemanticScholarPaperSource(
            num_papers=int(ui.params.number_papers),
            query_file=ui.params.paper_query_file,
            output_dir=ui.params.output_dir
        )

elif ui.params.paper_source == "PDF Zip":
    if not ui.params.pdfs_file:
        LOG.error("❌ 'pdfs_file' (ZIP file) is required for PDF source.")
    else:
        paper_source = PDFPaperSource(
            zip_path=ui.params.pdfs_file,
            output_dir=ui.params.output_dir
        )

elif ui.params.paper_source == "Journal API":
    if not ui.params.paper_query_file or not ui.params.journal_api_key_file:
        LOG.error("❌ 'paper_query_file' and 'journal_api_key_file' are required for Journal API source.")
    else:
        paper_source = JournalApiPaperSource(
            query_file=ui.params.paper_query_file,
            api_key_file=ui.params.journal_api_key_file,
            output_dir=ui.params.output_dir
        )

else:
    LOG.error("❌ Invalid or unsupported paper source selected.")
    paper_source = None

# Fetch papers from the selected source (empty list if no valid source was set)
if paper_source:
    papers = paper_source.fetch_papers()
else:
    papers = []


## 🧠 Step 2: Update Ontology Using Retrieved Papers

- Creates an instance of `OntologyUpdater` to enrich the ontology file using:
  - The retrieved papers,
  - The selected AI tool (`ai_tool`),
  - The specified output directory for saving results.

- The ontology file (`ontology_file`) contains a structured hierarchy of domain-specific concepts (e.g., in materials science).

- Calls `enrich_with_papers()` to:
  - Extract relevant concepts and keywords from the papers,
  - Integrate these insights into the ontology.

> 🔍 **What this step does:**
> - Associates papers with ontology categories by analyzing their content using the selected LLM agent.
> - Produces an enriched ontology file (its path is stored in `updated_ontology_path`) that is used for downstream tasks like paper ranking and graph construction.

In [None]:
# 2. Update ontology
ontology_updater = OntologyUpdater(
    ui.params.ontology_file,
    papers,
    ai_tool,
    ui.params.output_dir,
)
updated_ontology_path = ontology_updater.enrich_with_papers()

## 📊 Step 3: Rank Papers Based on Ontology and Keywords

- Instantiate the `PaperRanker` using:
  - The selected AI tool (`ai_tool`)
  - The list of fetched papers
  - The enriched ontology output path (`ontology_updater.output_path`)
  - The keyword file (`keywords_file`) containing domain-specific or user-defined terms
  - The output directory for saving ranking results

- Call `rank_papers()` to:
  - Match each paper to ontology categories using keyword relevance
  - Score and rank papers based on alignment with user interests
  - Optionally filter out papers that focus on **opposite concepts** (e.g., exclude “insulator” if your focus is “conductor”)

> 🧠 **What this step does:**
> - Produces a ranked list of relevant papers
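
To make the idea of keyword scoring with opposite-keyword filtering concrete, here is a toy sketch. It is not the algorithm used by `PaperRanker`, which this notebook does not show: `score_paper`, `rank`, the `title` and `abstract` fields, and the sample papers are all invented for illustration, and the scoring is plain substring counting.

```python
from typing import Dict, List


def score_paper(text: str, keywords: List[str], opposites: List[str]) -> int:
    """Count keyword occurrences; return -1 if any opposite keyword appears."""
    lowered = text.lower()
    if any(term.lower() in lowered for term in opposites):
        return -1
    return sum(lowered.count(term.lower()) for term in keywords)


def rank(papers: List[Dict[str, str]], keywords: List[str], opposites: List[str]) -> List[str]:
    """Return the titles of papers with a positive score, best first."""
    scored = [(score_paper(p["abstract"], keywords, opposites), p["title"]) for p in papers]
    kept = [(score, title) for score, title in scored if score > 0]
    return [title for _, title in sorted(kept, key=lambda pair: pair[0], reverse=True)]


# Toy data invented for illustration only.
papers = [
    {"title": "A", "abstract": "A conductor study: conductor films and conductor wires."},
    {"title": "B", "abstract": "An insulator compared with a conductor."},
    {"title": "C", "abstract": "One conductor sample."},
]
print(rank(papers, keywords=["conductor"], opposites=["insulator"]))  # ['A', 'C']
```

Paper B is dropped because it mentions the opposite keyword, and A ranks above C because it mentions the target keyword more often.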

In [None]:
# 3. Rank papers
LOG.info("Ranking papers ...")
ranker = PaperRanker(
    ai_tool,
    papers,
    ontology_updater.output_path,
    ui.params.keywords_file,
    ui.params.output_dir,
)
rank_paper_output = ranker.rank_papers()

## 🕸 Step 4: Build the Knowledge Graph

This step constructs a **knowledge graph** from the enriched ontology, enabling structured querying, visualization, and semantic exploration of materials-related knowledge.

### 🔧 What happens here:
- Loads credentials for the selected graph backend (if required) from a JSON file.
- Depending on the chosen knowledge graph type (`kg_type`):
  - For **Neo4j**:
    - Initializes a `Neo4jKnowledgeGraph` instance with the enriched ontology file and connection details (URI, username, password).
    - Calls `.build_graph()` to create the graph inside the Neo4j database.
  - For **NetworkX**:
    - Initializes a `NetworkXKnowledgeGraph` instance with the enriched ontology file.
    - Calls `.build_graph()` to build an in-memory graph representation using NetworkX.
- Both approaches generate structured nodes and relationships representing ontology concepts, papers, and their links.

> ✅ **Result:**  
> A knowledge graph (either in Neo4j or as a NetworkX object) representing categorized concepts and their relationships—ready for exploration, visualization, or further analysis.
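
Because the Neo4j branch reads its connection details from a JSON file, it can help to validate that file before connecting. The sketch below assumes the three keys used in the code cell (`neo4j_uri`, `neo4j_user`, `neo4j_pass`); the helper `load_neo4j_credentials` is a hypothetical name, and the values written to the temporary file are placeholders, not real credentials.

```python
import json
import os
import tempfile
from typing import Dict

# Keys read by the Neo4j branch of the code cell.
REQUIRED_KEYS = ("neo4j_uri", "neo4j_user", "neo4j_pass")


def load_neo4j_credentials(path: str) -> Dict[str, str]:
    """Load the credentials JSON and fail early if a required key is missing."""
    with open(path, "r", encoding="utf-8") as f:
        credentials = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f"Missing keys in {path}: {', '.join(missing)}")
    return credentials


# Demonstration with a temporary file holding placeholder values.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "kg_credentials.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump(
            {"neo4j_uri": "bolt://localhost:7687", "neo4j_user": "neo4j", "neo4j_pass": "change-me"},
            f,
        )
    demo_credentials = load_neo4j_credentials(demo_path)

print(sorted(demo_credentials))  # ['neo4j_pass', 'neo4j_uri', 'neo4j_user']
```

Checking the keys up front turns a bare `KeyError` deep inside the graph-building cell into a message that names the file and the missing entries.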

---

🎉 **Pipeline Complete!**  
Once the cell below finishes, you have:  
1. Retrieved papers  
2. Enriched the ontology  
3. Ranked relevant publications  
4. Built a knowledge graph with your selected backend  


In [None]:
# 4. Build knowledge graph
LOG.info("Building knowledge graph ...")

graph_builder: KnowledgeGraph
if ui.params.kg_type == "neo4j":
    # Load the Neo4j connection details from the credentials JSON file
    with open(ui.params.kg_credentials_file, "r") as f:
        credentials = json.load(f)
    graph_builder = Neo4jKnowledgeGraph(
        ontology_updater.output_path,
        credentials["neo4j_uri"],
        credentials["neo4j_user"],
        credentials["neo4j_pass"],
    )
elif ui.params.kg_type == "networkx":
    graph_builder = NetworkXKnowledgeGraph(ontology_updater.output_path)
else:
    # Fail early instead of hitting a NameError on graph_builder below
    raise ValueError(f"Unsupported knowledge graph type: {ui.params.kg_type}")

graph_builder.build_graph()

LOG.info("🎉 Pipeline Completed Successfully")