<a href="https://colab.research.google.com/github/singhsrj/Artificial-Intelligence-for-Dummies/blob/main/cognee_with_ontology.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Welcome to cognee 🧠**

**cognee** is your toolkit for turning text into a structured knowledge graph, optionally enhanced by ontologies, and then querying it with advanced retrieval techniques. This notebook will guide you through a simple example.

Let's start with installing cognee!


*NOTE: Google Colab will ask you to restart the session. Restart it, then continue with the notebook as normal.
Ignore pip errors and warnings after restarting the session.*


## 1. Configure Environment Variables and Import cognee 🛠️

Cognee uses OpenAI's gpt-4o-mini model in the default setting. Provide your **OpenAI** API key below.

*Note: The OpenAI free tier does not satisfy the rate limit requirements.*

Please refer to our documentation if you want to use another [remote model](https://docs.cognee.ai/how-to-guides/remote-models) or a [local model](https://docs.cognee.ai/how-to-guides/local-models).
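As a minimal sketch of providing the key without hard-coding it in a saved cell: the snippet below assumes cognee reads the key from an `LLM_API_KEY` environment variable (an assumption to verify against the documentation linked above). `set_llm_api_key` is a hypothetical helper, not part of cognee.

```python
import os
from getpass import getpass


def set_llm_api_key(key: str, env_var: str = "LLM_API_KEY") -> None:
    """Store the key in an environment variable (the variable name is an assumption)."""
    key = key.strip()
    if not key:
        raise ValueError("API key must not be empty")
    os.environ[env_var] = key


# In the notebook, prompt for the key so it never appears in the cell source:
# set_llm_api_key(getpass("OpenAI API key: "))
```

Prompting with `getpass` keeps the key out of the notebook file if it is later shared or committed.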

## 2. Upload Sample Data from cognee repo

We'll upload a text file from our repo containing two text variables (`text_1` and `text_2`) which are brief introductions to German car manufacturers and major tech companies.


In [2]:
!wget -O car_and_tech_companies.txt https://raw.githubusercontent.com/topoteretes/cognee/dev/examples/data/car_and_tech_companies.txt
input_text = "/content/car_and_tech_companies.txt"

# uncomment the print statement below to view the file content
# print(open(input_text, 'r').read())

--2025-09-06 19:24:34--  https://raw.githubusercontent.com/topoteretes/cognee/dev/examples/data/car_and_tech_companies.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4321 (4.2K) [text/plain]
Saving to: ‘car_and_tech_companies.txt’


2025-09-06 19:24:34 (29.4 MB/s) - ‘car_and_tech_companies.txt’ saved [4321/4321]



# Scenario 1: cognee With Ontology

Think of an ontology as a map or blueprint for organizing information: it defines “what types of things exist” (e.g. CarManufacturer, CarModel) and the relationships among them (produces, belongsTo). By mapping raw text to this ontology, cognee can create a more structured knowledge graph.
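To make the "blueprint" idea concrete, here is a toy sketch in plain Python. It is not how cognee represents ontologies internally; the `Ontology` class, the `entity_types` mapping, and the example entity "Audi A4" are all hypothetical and only illustrate how typed classes and relationships can constrain which facts enter a graph.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # Allowed entity types, e.g. "CarManufacturer"
    classes: set = field(default_factory=set)
    # Allowed relationships as (subject_type, predicate, object_type)
    relations: set = field(default_factory=set)

    def allows(self, subject_type: str, predicate: str, object_type: str) -> bool:
        return (subject_type, predicate, object_type) in self.relations


ontology = Ontology(
    classes={"CarManufacturer", "CarModel"},
    relations={("CarManufacturer", "produces", "CarModel")},
)

# Hypothetical entities extracted from text, each tagged with an ontology class.
entity_types = {"Audi": "CarManufacturer", "Audi A4": "CarModel"}


def is_valid_triple(subject: str, predicate: str, obj: str) -> bool:
    """Accept a triple only if both entities are typed and the ontology allows the relation."""
    s_type = entity_types.get(subject)
    o_type = entity_types.get(obj)
    if s_type is None or o_type is None:
        return False
    return ontology.allows(s_type, predicate, o_type)


print(is_valid_triple("Audi", "produces", "Audi A4"))  # manufacturer -> model
print(is_valid_triple("Audi A4", "produces", "Audi"))  # wrong direction
```

The first call is accepted and the second is rejected, because the ontology only allows a `CarManufacturer` to produce a `CarModel`, not the reverse.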


In [28]:
import litellm

In [29]:
litellm._turn_on_debug()

In [37]:
!pip install fastembed

In [39]:
import cognee
import os
# Import the specific configuration getter functions
from cognee.infrastructure.llm.config import get_llm_config
from cognee.infrastructure.databases.vector import get_vectordb_config


In [54]:
import os
import cognee

# --- 1. LLM Configuration (Free local provider with Ollama) ---
llm_settings = {
    "llm_provider": "ollama",  # switch to Ollama
    "llm_model": "llama2:7b",  # example local model
    # Ollama runs locally; typically no API key needed
    # You may set endpoint if applicable, else omit or use default
}

cognee.config.set_llm_config(llm_settings)



In [55]:

# --- 2. Embedding Model Configuration (Local/Free Provider) ---
# Add this section to override the OpenAI embedding default.
# We'll configure cognee to use a free, local model via fastembed.

# Create a dictionary for vector/embedding settings.
# The keys "embedding_provider" and "embedding_model" are standard.
embedding_settings_dictionary = {
    "embedding_provider": "fastembed",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2", # Popular local model
    # "vector_db_key": None # Explicitly set key to None if a setter requires it
}

# Note: this cell only defines the dictionary. Nothing below passes it to cognee,
# so the default embedding settings remain in effect.

# --- Verification ---
print("--- LLM Config ---")
llm_conf = get_llm_config()
print(f"Provider: {llm_conf.llm_provider}")
print(f"Model: {llm_conf.llm_model}")


--- LLM Config ---
Provider: ollama
Model: llama2:7b
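The verification above prints only the LLM settings; the embedding dictionary is never handed to cognee. One possible way to apply the same values is through environment variables. This sketch assumes cognee reads `EMBEDDING_PROVIDER`, `EMBEDDING_MODEL`, and `EMBEDDING_DIMENSIONS` from the environment; those names are an assumption to check against the cognee documentation, and `apply_embedding_env` is a hypothetical helper.

```python
import os

embedding_settings = {
    "embedding_provider": "fastembed",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    # all-MiniLM-L6-v2 produces 384-dimensional vectors
    "embedding_dimensions": 384,
}


def apply_embedding_env(settings: dict) -> dict:
    """Export each setting as an upper-cased environment variable (names are assumptions)."""
    exported = {}
    for key, value in settings.items():
        name = key.upper()
        os.environ[name] = str(value)
        exported[name] = str(value)
    return exported


print(apply_embedding_env(embedding_settings))
```

Libraries often read environment variables once, when their configuration is first loaded, so a cell like this would most likely need to run before cognee is imported or after a runtime restart.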


In [56]:
# First we'll clean out any old data and reset the system metadata so we start from a blank slate.
await cognee.prune.prune_data()
await cognee.prune.prune_system(metadata=True)

# Next, we add the input text to cognee’s data store.
await cognee.add(input_text)

# Now we'll upload the ontology file from cognee repo. This provides a structure for various types of companies (car manufacturers, tech companies), products they make, and the categories of those products.
!wget -O basic_ontology.owl https://raw.githubusercontent.com/topoteretes/cognee/main/examples/python/ontology_input_example/basic_ontology.owl
ontology_path = "/content/basic_ontology.owl"

# We'll give the ontology file as a parameter to cognify, cognee's main pipeline. Cognify is the process that transforms the raw text into a knowledge graph.
await cognee.cognify(ontology_file_path=ontology_path)


2025-09-06T20:04:56.286027 [info     ] Graph deleted successfully.    [cognee.shared.logging_utils]
2025-09-06T20:04:56.292299 [info     ] Database deleted successfully. [cognee.shared.logging_utils]
2025-09-06T20:04:56.667589 [info     ] Model not found in LiteLLM's model_cost. [cognee.shared.logging_utils]
HTTP Request: POST https://generativelanguage.googleapis.com/chat/completions "HTTP/1.1 404 Not Found"
HTTP Request: POST https://generativelanguage.googleapis.com/chat/completions "HTTP/1.1 404 Not Found"
HTTP Request: POST https://generativelanguage.googleapis.com/chat/completions "HTTP/1.1 404 Not Found"
HTTP Request: POST https://generativelanguage.googleapis.com/chat/completions "HTTP/1.1 404 Not Found"
HTTP Request: POST https://generativelanguage.googleapis.com/chat/completions "HTTP/1.1 404

InstructorRetryException: Error code: 404
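The log shows the requests going to `generativelanguage.googleapis.com` even though the configured provider is `ollama`, so the endpoint setting is worth checking before re-running `cognify`. As a hedged pre-flight sketch, the helper below only tests whether a local Ollama server answers. It assumes Ollama's default address `http://localhost:11434` and its `/api/tags` model-listing route; `ollama_is_reachable` is a hypothetical helper, not part of cognee.

```python
import json
import urllib.error
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url (the default address is an assumption)."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            json.load(response)  # a running server replies with JSON listing local models
            return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply
        return False


if not ollama_is_reachable():
    print("No Ollama server found; start one or fix the endpoint before running cognify.")
```

A fresh Colab runtime does not ship with an Ollama server, so this check is expected to fail there unless one has been installed and started in the session.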

### Let's visualize the knowledge graph 👀

Below we'll let cognee render an HTML file for graph visualization. The file will be stored in the artifacts folder and automatically downloaded.

Please open the downloaded file in your browser to view the graph.




In [None]:
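import pathlib
from google.colab import files
# visualize_graph is not imported anywhere above; this import path is assumed from cognee's API layout.
from cognee.api.v1.visualize.visualize import visualize_graph

notebook_dir = pathlib.Path.cwd()
graph_file_path = (notebook_dir / "artifacts" / "graph_visualization_with_ontology.html").resolve()

# Render the knowledge graph to an HTML file, then download it from Colab.
await visualize_graph(str(graph_file_path))

files.download(str(graph_file_path))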

### We can now ask cognee about the data that we cognify'ed.

We'll use `GRAPH_COMPLETION` as our search type. It retrieves the entities in the knowledge graph most related to the user query and prompts the LLM to answer using them.
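The idea behind graph-based retrieval can be sketched with a toy example. This is not cognee's implementation: the triples, `retrieve_triples`, and `build_prompt` are hypothetical and only show the general pattern of finding entities mentioned in the query, collecting their neighbourhood, and handing those facts to an LLM as context.

```python
# Toy knowledge graph as (subject, relation, object) triples; contents are hypothetical.
triples = [
    ("Audi", "produces", "Audi A4"),
    ("Audi A4", "is_a", "Sedan"),
    ("BMW", "produces", "BMW X5"),
    ("BMW X5", "is_a", "SUV"),
]


def retrieve_triples(query: str, graph: list) -> list:
    """Return triples within two hops of any entity named in the query."""
    query_lower = query.lower()
    entities = {node for s, _, o in graph for node in (s, o)}
    seeds = {e for e in entities if e.lower() in query_lower}
    first_hop = [t for t in graph if t[0] in seeds or t[2] in seeds]
    neighbours = {t[0] for t in first_hop} | {t[2] for t in first_hop}
    return [t for t in graph if t[0] in neighbours or t[2] in neighbours]


def build_prompt(query: str, context: list) -> str:
    """Format the retrieved triples as plain-text facts for an LLM prompt."""
    facts = "\n".join(f"{s} {r} {o}" for s, r, o in context)
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {query}"


question = "What are the exact cars and their types produced by Audi?"
print(build_prompt(question, retrieve_triples(question, triples)))
```

Because the relationships are explicit, the retrieved context includes both what Audi produces and the type of each model, while unrelated BMW facts are left out.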









In [None]:
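# SearchType is not imported anywhere above; this import path is assumed from cognee's API layout.
from cognee.api.v1.search import SearchType

search_results_with_ontology = await cognee.search(
    query_type=SearchType.GRAPH_COMPLETION,
    query_text="What are the exact cars and their types produced by Audi?",
)
print(search_results_with_ontology)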

# Scenario 2: Base cognee

What if you don’t have an ontology? Cognee can still parse and connect entities out of the box.

Now we'll add the same text input to cognee without an ontology to see the difference in the graph and the search result.

In [None]:
# clean up the cognee store again for a fresh start.
await cognee.prune.prune_data()
await cognee.prune.prune_system(metadata=True)

# add text input to cognee
await cognee.add(input_text)

# cognify!
await cognee.cognify()

### Let's look at the knowledge graph again, this time without ontology

In [None]:
import pathlib
from google.colab import files

notebook_dir = pathlib.Path.cwd()
graph_file_path = (notebook_dir / "artifacts" / "graph_visualization_base_cognee.html").resolve()

await visualize_graph(str(graph_file_path))

files.download('./artifacts/graph_visualization_base_cognee.html')

### And we'll ask the same question about the car models from Audi ⬇️

In [None]:
search_results_base_cognee = await cognee.search(
    query_type=SearchType.GRAPH_COMPLETION,
    query_text="What are the exact cars and their types produced by Audi?",
)
print(search_results_base_cognee)

# Scenario 3: Traditional vector-based RAG

RAG (Retrieval-Augmented Generation) is another approach that uses vector embeddings to find relevant text chunks, then generates an answer using a language model.

The `RAG_COMPLETION` search type follows this logic: it gets the document chunk most related to the user query and prompts the LLM with it.

This differs from `GRAPH_COMPLETION`, which relies on explicit relationships stored in the knowledge graph.
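For contrast, here is a toy sketch of chunk retrieval by vector similarity. It is not cognee's implementation: real systems use learned embeddings, whereas this example substitutes simple word-count vectors so it runs with the standard library alone. The chunks and the helper names `embed`, `cosine`, and `top_chunk` are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for document chunks.
chunks = [
    "Audi is a German car manufacturer known for sedans and SUVs.",
    "Apple is a tech company that designs phones and laptops.",
]


def embed(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunk(query: str, documents: list) -> str:
    """Return the chunk whose vector is most similar to the query vector."""
    query_vector = embed(query)
    return max(documents, key=lambda doc: cosine(query_vector, embed(doc)))


question = "What are the exact cars and their types produced by Audi?"
print(top_chunk(question, chunks))
```

The retrieved context here is a whole passage of text chosen by similarity, with no explicit links between entities, which is the main difference from the graph-based approach.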


In [None]:
search_results_traditional_rag = await cognee.search(
    query_type=SearchType.RAG_COMPLETION,
    query_text="What are the exact cars and their types produced by Audi?",
)
print(search_results_traditional_rag)

# Let's compare all the results!

Notice how the ontology approach can yield a more structured result.


In [None]:
print(search_results_with_ontology)

In [None]:
print(search_results_base_cognee)

In [None]:
print(search_results_traditional_rag)