<a href="https://colab.research.google.com/github/sanakashgouli/SOKEGraph/blob/main/full_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Import Required Modules and Classes

This block imports all the necessary modules and classes from the `sokegraph` package and other dependencies. These components provide core functionalities such as:

- Paper sources (retrieving papers from Semantic Scholar or PDFs)
- Paper ranking logic
- Knowledge graph management
- AI agents (OpenAI, Gemini, Llama)
- Ontology updating utilities
- Logging utilities
- Neo4j knowledge graph interface
- JSON handling

These imports set up the environment for later steps like fetching papers, ranking them, updating ontologies, and building the knowledge graph.


In [1]:
from sokegraph.base_paper_source import BasePaperSource
from sokegraph.semantic_scholar_source import SemanticScholarPaperSource
from sokegraph.pdf_paper_source import PDFPaperSource
from sokegraph.paper_ranker import PaperRanker
from sokegraph.knowledge_graph import KnowledgeGraph
from sokegraph.util.logger import LOG
from sokegraph.ai_agent import AIAgent
from sokegraph.openai_agent import OpenAIAgent
from sokegraph.gemini_agent import GeminiAgent
from sokegraph.ontology_updater import OntologyUpdater
from sokegraph.neo4j_knowledge_graph import Neo4jKnowledgeGraph
from sokegraph.llama_agent import LlamaAgent
import json

2025-06-20 14:47:44,119 [INFO ]	sokegraph v1.1


# Initialize and Display the User Interface

- The `SOKEGraphUI` class is imported from the `ui_inputs` module.
- An instance of `SOKEGraphUI` is created and assigned to `ui`.
- The `display_ui()` method is called to render the interactive user interface.

> **Note:**  
> If you make changes to `ui_inputs.py`, you might need to uncomment the `importlib.reload` lines to reload the module without restarting the notebook.


In [None]:
#import importlib
#import sokegraph.ui_inputs
#importlib.reload(sokegraph.ui_inputs)
from sokegraph.ui_inputs import SOKEGraphUI


# Create UI instance
ui = SOKEGraphUI()

# Display the UI in the notebook
ui.display_ui()



In [None]:
# Optional check: confirm that a local Ollama server (port 11434) answers with the llama3 model
import requests

response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        "model": "llama3",  # Make sure this model is pulled: `ollama pull llama3`
        "prompt": "Explain the law of gravity simply.",
        "stream": False
    }
)

# Print the entire JSON response to debug
print("🧾 Full JSON response:")
print(response.json())


🧾 Full JSON response:
{'model': 'llama3', 'created_at': '2025-06-19T17:15:33.805139Z', 'response': "The law of gravity! It's a fundamental concept in physics that explains how objects interact with each other.\n\nIn simple terms, the law of gravity states:\n\n**Every object in the universe attracts every other object with a force proportional to their mass and distance between them.**\n\nHere's what this means:\n\n1. **Mass**: The more massive an object is (like a planet or a car), the stronger its gravitational pull will be.\n2. **Distance**: As objects get farther apart, the gravitational force between them gets weaker.\n\nTo illustrate this, imagine you have a ball and a friend who is standing 5 feet away from you. If your friend has a small amount of mass (like a feather), the gravitational force between you and them would be very weak. But if they had a large amount of mass (like a bowling ball), the gravitational force would be much stronger.\n\nIn fact, that's why planets orbit 
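
The cell above sets `"stream": False`, so Ollama returns a single JSON object whose `response` field holds the whole answer. With streaming enabled, the endpoint instead sends newline-delimited JSON objects, each carrying a text fragment in `response` and a `done` flag. The sketch below joins such fragments. `join_stream_chunks` is a hypothetical helper, and the sample lines are made up to show the shape, not captured from a real server.

```python
import json
from typing import Iterable


def join_stream_chunks(lines: Iterable[str]) -> str:
    """Concatenate the `response` fragments of Ollama-style NDJSON lines.

    Stops at the first object whose `done` flag is true; blank lines are skipped.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate empty keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


# Made-up sample lines in the shape of a streamed /api/generate reply
sample = [
    '{"model": "llama3", "response": "Gravity ", "done": false}',
    '{"model": "llama3", "response": "pulls masses together.", "done": false}',
    '{"model": "llama3", "response": "", "done": true}',
]
print(join_stream_chunks(sample))  # Gravity pulls masses together.
```

Against a live server, you could pass `response.iter_lines(decode_unicode=True)` from a `requests.post(..., stream=True)` call with `"stream": True` in the JSON body.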


## Step 0: Setup AI Agent

- Logs the start of the full pipeline.
- Selects and initializes the AI agent based on the user interface (UI) parameter `AI`.
- Supports three AI providers:
  - `openAI` initializes an `OpenAIAgent` with the API keys file.
  - `gemini` initializes a `GeminiAgent` with the API keys file.
  - `llama` initializes a `LlamaAgent` with the API keys file.
- Raises an error if an unsupported AI provider is selected.

> **Note:**  
> Ensure that the API keys file path provided via `ui.params.api_keys_file` is correct and contains valid credentials for the selected AI provider.


In [None]:
LOG.info("🚀 Starting Full Pipeline")

# 0. Setup AI agent
ai_tool: AIAgent
if ui.params.AI == "openAI":
    ai_tool = OpenAIAgent(ui.params.api_keys_file)
elif ui.params.AI == "gemini":
    ai_tool = GeminiAgent(ui.params.api_keys_file)
elif ui.params.AI == "llama":
    ai_tool = LlamaAgent(ui.params.api_keys_file)
else:
    raise ValueError(f"Unsupported AI provider: {ui.params.AI}")
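
The `if`/`elif` chain above can also be written as a lookup table, so that supporting another provider means adding one dictionary entry. The sketch below uses a hypothetical `StubAgent` stand-in (so it runs without `sokegraph`); `AGENT_FACTORIES`, `make_agent`, and `api_keys.json` are likewise illustrative names.

```python
from typing import Callable, Dict


class StubAgent:
    """Hypothetical stand-in for OpenAIAgent / GeminiAgent / LlamaAgent."""

    def __init__(self, provider: str, api_keys_file: str) -> None:
        self.provider = provider
        self.api_keys_file = api_keys_file


# Map each UI value to a factory that takes the API keys file path
AGENT_FACTORIES: Dict[str, Callable[[str], StubAgent]] = {
    "openAI": lambda keys: StubAgent("openAI", keys),
    "gemini": lambda keys: StubAgent("gemini", keys),
    "llama": lambda keys: StubAgent("llama", keys),
}


def make_agent(provider: str, api_keys_file: str) -> StubAgent:
    """Build the agent for `provider`, or raise ValueError if it is unknown."""
    try:
        factory = AGENT_FACTORIES[provider]
    except KeyError:
        raise ValueError(f"Unsupported AI provider: {provider}") from None
    return factory(api_keys_file)


agent = make_agent("gemini", "api_keys.json")
print(agent.provider)  # gemini
```

In the notebook itself, the dictionary values could simply be the real classes (`{"openAI": OpenAIAgent, "gemini": GeminiAgent, "llama": LlamaAgent}`), since each is constructed from the API keys file alone in the cell above.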

## Step 1: Select Paper Source

- Based on user input, select the source for retrieving papers:
  - If the user specifies `number_papers` and provides a query file (`paper_query_file`), papers are fetched from **Semantic Scholar** using the `SemanticScholarPaperSource` class.
  - If the user provides a ZIP file containing PDFs (`pdfs_file`) without specifying `number_papers`, papers are fetched from the PDFs using the `PDFPaperSource` class.
- Logs errors if:
  - The required query file is missing when using Semantic Scholar.
  - Both or neither `number_papers` and `pdfs_file` are specified.
- Finally, fetches papers from the selected source by calling `fetch_papers()`.

> **Important:**  
> - Make sure to specify either the number of papers and a query file **or** a PDF ZIP file, but not both.
> - The `fetch_papers()` method returns a list of paper metadata dictionaries.


In [None]:
# 1. Select paper source
paper_source: BasePaperSource
if ui.params.number_papers and not ui.params.pdfs_file:
    if not ui.params.paper_query_file:
        LOG.error("❌ paper_query_file is required when using number_papers.")
        raise ValueError("paper_query_file is required when using number_papers.")
    paper_source = SemanticScholarPaperSource(
        num_papers=int(ui.params.number_papers),
        query_file=ui.params.paper_query_file,
        output_dir=ui.params.output_dir
    )
elif ui.params.pdfs_file and not ui.params.number_papers:
    paper_source = PDFPaperSource(
        zip_path=ui.params.pdfs_file,
        output_dir=ui.params.output_dir
    )
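else:
    LOG.error("❌ Please specify either number_papers or pdfs_file, but not both.")
    # Stop here; otherwise `paper_source` is undefined when fetch_papers() is called below
    raise ValueError("Please specify either number_papers or pdfs_file, but not both.")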


papers = paper_source.fetch_papers()
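
The either/or rule from Step 1 can be isolated in a pure function and checked without the UI. This is a minimal sketch: `choose_paper_source` is a hypothetical helper that returns a label instead of constructing `SemanticScholarPaperSource` or `PDFPaperSource`, so it runs on its own.

```python
from typing import Optional


def choose_paper_source(number_papers: Optional[str],
                        paper_query_file: Optional[str],
                        pdfs_file: Optional[str]) -> str:
    """Return "semantic_scholar" or "pdf", or raise ValueError on invalid input."""
    if number_papers and pdfs_file:
        raise ValueError("Specify either number_papers or pdfs_file, not both.")
    if number_papers:
        # Semantic Scholar needs a query file in addition to the paper count
        if not paper_query_file:
            raise ValueError("paper_query_file is required when using number_papers.")
        return "semantic_scholar"
    if pdfs_file:
        return "pdf"
    raise ValueError("Specify either number_papers or pdfs_file.")


print(choose_paper_source("10", "queries.txt", None))  # semantic_scholar
print(choose_paper_source(None, None, "papers.zip"))   # pdf
```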

## Step 2: Update Ontology Using Retrieved Papers

- Create an instance of `OntologyUpdater`, which enriches the ontology file using:
  - The retrieved papers,
  - The selected AI tool (`ai_tool`),
  - The output directory for saving results.
- The ontology file (`ontology_file`) contains a structured hierarchy of materials science concepts.
- Calls `enrich_with_papers()` to extract relevant concepts and keywords from the papers and integrate them into the ontology.

> **What this step does:**
> - Associates papers with ontology categories by analyzing their content using an LLM agent.
> - Produces structured `ontology_extractions` used for graph construction and ranking in later steps.


In [None]:
# 2. Update ontology
ontology_updater = OntologyUpdater(ui.params.ontology_file, papers, ai_tool, ui.params.output_dir)
ontology_extractions = ontology_updater.enrich_with_papers()
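
The notebook does not show the ontology file's schema, so the sketch below assumes a flat `{category: [keywords]}` mapping purely for illustration. `merge_keywords` is a hypothetical helper, not `OntologyUpdater`'s implementation; it shows one way newly extracted keywords could be folded into existing categories without storing keywords that differ only in case.

```python
from typing import Dict, List


def merge_keywords(ontology: Dict[str, List[str]],
                   extracted: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a new ontology with extracted keywords appended per category.

    Keywords are compared case-insensitively; unknown categories are added.
    The input `ontology` is not modified.
    """
    merged = {cat: list(kws) for cat, kws in ontology.items()}
    for cat, kws in extracted.items():
        bucket = merged.setdefault(cat, [])
        seen = {k.lower() for k in bucket}
        for kw in kws:
            if kw.lower() not in seen:
                bucket.append(kw)
                seen.add(kw.lower())
    return merged


base = {"Material": ["graphene", "perovskite"], "Method": ["CVD"]}
from_llm = {"Material": ["Graphene", "MoS2"], "Property": ["conductivity"]}
print(merge_keywords(base, from_llm))
# {'Material': ['graphene', 'perovskite', 'MoS2'], 'Method': ['CVD'], 'Property': ['conductivity']}
```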

## Step 3: Rank Papers Based on Ontology and Keywords

- Instantiate the `PaperRanker` with:
  - The selected AI tool (`ai_tool`)
  - The list of fetched papers
  - The enriched ontology output file (`ontology_updater.output_path`)
  - A keyword file containing user-defined or domain-relevant keywords
  - The output directory for storing results

- Call `rank_papers()` to:
  - Match papers to relevant ontology categories using keywords
  - Score and rank papers based on how well they align with the user’s interests
  - Optionally filter out papers dominated by **opposite concepts** (e.g., if a paper mentions “insulator” often when you are looking for “conductor”)

> **What this step does:**
> - Produces a ranked list of relevant papers
> - Generates visual summaries and a CSV of shared ontology category overlaps


In [None]:
# 3. Rank papers
LOG.info("Ranking papers...")
ranker = PaperRanker(ai_tool, papers, ontology_updater.output_path, ui.params.keywords_file, ui.params.output_dir)
rank_paper_output = ranker.rank_papers()
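
To make the idea of keyword scoring with an "opposite concept" filter concrete, here is a deliberately simple heuristic: count whole-word keyword hits and drop a paper when opposite terms outnumber them. This is an illustrative sketch only, not `PaperRanker`'s actual scoring; the `title`/`abstract` keys and the `count_terms`/`rank` helpers are assumptions.

```python
import re
from typing import Dict, List, Tuple


def count_terms(text: str, terms: List[str]) -> int:
    """Count whole-word, case-insensitive occurrences of the given terms."""
    return sum(
        len(re.findall(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE))
        for term in terms
    )


def rank(papers: List[Dict[str, str]], keywords: List[str],
         opposites: List[str]) -> List[Tuple[str, int]]:
    """Score papers by keyword hits; skip papers dominated by opposite terms."""
    scored = []
    for paper in papers:
        text = f"{paper['title']} {paper['abstract']}"
        hits = count_terms(text, keywords)
        if count_terms(text, opposites) > hits:
            continue  # the paper is mostly about the opposite concept
        scored.append((paper["title"], hits))
    return sorted(scored, key=lambda item: item[1], reverse=True)


papers = [
    {"title": "Paper A", "abstract": "A highly conductive conductor film."},
    {"title": "Paper B", "abstract": "Insulator layers; the insulator dominates the conductor."},
    {"title": "Paper C", "abstract": "Conductor and conductor networks with conductive ink."},
]
print(rank(papers, ["conductor", "conductive"], ["insulator"]))
# [('Paper C', 3), ('Paper A', 2)]
```

The `\b` word boundaries keep "conductor" from matching inside words such as "semiconductor".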

## Step 4: Build the Knowledge Graph

This step constructs a **knowledge graph** from the updated ontology, allowing for structured querying and visualization of relationships between materials, methods, and properties.

### What happens here:
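- Credentials for the target graph database are loaded from the JSON file given by `ui.params.kg_credentials_file`; for Neo4j it must contain the keys `neo4j_uri`, `neo4j_user`, and `neo4j_pass`.
- A `KnowledgeGraph` builder is instantiated for the selected `kg_type` (in this notebook only `"neo4j"` is handled, via `Neo4jKnowledgeGraph`) using: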
  - The enriched ontology file (`updated_ontology.json`)
  - Connection details to the Neo4j server (URI, username, password)
- The `.build_graph()` method builds the actual graph in the database.

> ✅ **Result**: A graph database containing categorized concepts and their relationships, ready for exploration or reasoning tasks.
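
`Neo4jKnowledgeGraph` handles the graph construction internally, and its node labels and relationship types are not shown in this notebook. As a rough sketch of the general idea, the code below flattens a nested ontology (an assumed dict/list shape) into parent/child pairs and pairs them with a parameterized Cypher `UNWIND ... MERGE` query. The `Concept` label, the `HAS_CHILD` relationship, the root name `"Ontology"`, and the `to_edges` helper are all hypothetical.

```python
from typing import Any, List, Tuple

# Hypothetical Cypher: the `Concept` label and `HAS_CHILD` type are illustrative only
CYPHER = (
    "UNWIND $edges AS e "
    "MERGE (p:Concept {name: e.parent}) "
    "MERGE (c:Concept {name: e.child}) "
    "MERGE (p)-[:HAS_CHILD]->(c)"
)


def to_edges(node: Any, parent: str) -> List[Tuple[str, str]]:
    """Flatten a nested dict/list hierarchy into (parent, child) pairs."""
    edges: List[Tuple[str, str]] = []
    if isinstance(node, dict):
        for key, value in node.items():
            edges.append((parent, key))
            edges.extend(to_edges(value, key))  # recurse into sub-categories
    elif isinstance(node, list):
        for item in node:
            edges.append((parent, str(item)))  # leaf keywords
    return edges


ontology = {"Material": {"Catalyst": ["Pt", "IrO2"]}, "Property": ["conductivity"]}
edges = to_edges(ontology, "Ontology")
rows = [{"parent": p, "child": c} for p, c in edges]
print(edges)
```

With the official `neo4j` Python driver, `rows` could be supplied as the `$edges` parameter, e.g. `session.run(CYPHER, edges=rows)`. Using `MERGE` rather than `CREATE` means rerunning the load does not duplicate existing nodes or relationships.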

---

🎉 **Once the cell below finishes, the full pipeline is complete!** You will have fetched papers, enriched the ontology, ranked relevant publications, and built a graph-based representation of your domain knowledge.


In [None]:
# 4. Build knowledge graph
import os

LOG.info("Building knowledge graph...")

# Load graph-database credentials (Neo4j expects neo4j_uri, neo4j_user, neo4j_pass)
with open(ui.params.kg_credentials_file, "r") as f:
    credentials = json.load(f)

# Build the graph with the selected backend
graph_builder: KnowledgeGraph
if ui.params.kg_type == "neo4j":
    graph_builder = Neo4jKnowledgeGraph(
        os.path.join(ui.params.output_dir, "updated_ontology.json"),
        credentials["neo4j_uri"],
        credentials["neo4j_user"],
        credentials["neo4j_pass"],
    )
else:
    raise ValueError(f"Unsupported knowledge graph type: {ui.params.kg_type}")

graph_builder.build_graph()


LOG.info("🎉 Pipeline Completed Successfully")
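
The Step 4 cell reads `neo4j_uri`, `neo4j_user`, and `neo4j_pass` from the credentials file and stops with a bare `KeyError` on the first missing key. A small validation step, sketched below, reports every missing key at once. `load_neo4j_credentials` and the file name `kg_credentials.json` are hypothetical, and the values are placeholders (`bolt://localhost:7687` is Neo4j's default Bolt address), not real credentials.

```python
import json
import os
import tempfile
from typing import Dict

REQUIRED_KEYS = ("neo4j_uri", "neo4j_user", "neo4j_pass")


def load_neo4j_credentials(path: str) -> Dict[str, str]:
    """Load the credentials JSON and fail early if any required key is missing or empty."""
    with open(path, "r", encoding="utf-8") as f:
        credentials = json.load(f)
    missing = [key for key in REQUIRED_KEYS if not credentials.get(key)]
    if missing:
        raise ValueError(f"Missing Neo4j credential(s): {', '.join(missing)}")
    return credentials


# Write an example file with placeholder values, then load it back
example = {"neo4j_uri": "bolt://localhost:7687", "neo4j_user": "neo4j", "neo4j_pass": "change-me"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kg_credentials.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(example, f)
    creds = load_neo4j_credentials(path)
print(creds["neo4j_uri"])  # bolt://localhost:7687
```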