# Short dlt folder README

1. App analogy 
* `search_knowledge_graph.ipynb` acts like the app’s user interface, where you interact, ask questions, and see results.
* `filesystem_pipeline.py` and `mount_cognee.py` are like the app’s backend program, working behind the scenes to gather, organize, and structure the data that powers the user interface.
* `api_spec_ontology.owl` is like the app’s data model or schema. It sets the rules for how information is structured, connected, and understood by the backend, ensuring the data is organized in a meaningful and consistent way for the user interface to access and display.

2. Library analogy 
* `filesystem_pipeline.py` is like a librarian who goes out to collect books (API documentation) from different sources, organizes them, and puts them into a cataloged storage room (structured CSV files).
* `mount_cognee.py` is like another librarian who takes those cataloged books, arranges them on the library shelves according to a special system (the ontology), and creates a map of the library (the knowledge graph) so you can find information easily.
* `api_spec_ontology.owl` is like the library’s cataloging system or classification rules (such as Dewey Decimal or Library of Congress). It defines how books (knowledge) are organized, labeled, and related on the shelves, so you can find and connect information efficiently.
* `search_knowledge_graph.ipynb` is the library visitor, using the map and the organized shelves to quickly find answers to your questions by searching through the library’s resources.

## Sending a search query to the knowledge graph (using the Cognee API)

Ask questions about the knowledge graph to quickly find specific information, discover relationships between data points, or extract insights from large and complex datasets. This is useful for research, data analysis, troubleshooting, or building applications that need to retrieve structured knowledge efficiently. By querying the knowledge graph, you can get precise answers without manually searching through documentation or raw data.

In [1]:
import cognee
from cognee.api.v1.search import SearchType
from cognee.modules.engine.models import NodeSet


2025-06-29T15:38:48.088189 [info     ] Deleted old log file: C:\Users\Administrator\Documents\GitHub\venvs\llm-zoomcamp\lib\site-packages\logs\2025-06-27_18-04-02.log [cognee.shared.logging_utils]

2025-06-29T15:38:48.088189 [info     ] Logging initialized [cognee.shared.logging_utils] cognee_version=0.1.44 os_info='Windows 10 (10.0.22631)' python_version=3.10.5 structlog_version=25.4.0

2025-06-29T15:38:48.088189 [info     ] Want to learn more? Visit the Cognee documentation: https://docs.cognee.ai [cognee.shared.logging_utils]

HTTP Request: GET https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json "HTTP/1.1 200 OK"


## Define an asynchronous Python function to ask a specific question about Ticketmaster API endpoints

An asynchronous Python function allows you to write code that can pause and wait for other operations (like network requests or file I/O) to complete without blocking the rest of your program. This is useful for tasks that take time, such as calling APIs or reading files, because it lets your program do other work while waiting.
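As a minimal sketch of this idea (not part of the notebook), the example below uses `asyncio.sleep` as a stand-in for a slow network call. The names `fake_request` and `run_both` are hypothetical and exist only for illustration.

```python
import asyncio
import time


async def fake_request(name: str, delay: float) -> str:
    """Hypothetical stand-in for a slow network call."""
    # While this coroutine sleeps, the event loop is free to run other tasks.
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_both() -> list:
    # gather runs both coroutines concurrently and returns results in argument order.
    return await asyncio.gather(
        fake_request("first", 0.2),
        fake_request("second", 0.2),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_both())
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")
```

Because the two waits overlap, the total time is close to 0.2 seconds rather than 0.4. In a notebook cell you can write `await run_both()` directly; in a plain script, `asyncio.run` starts the event loop for you.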

In [None]:
async def search_cognee(query, node_set, query_type=SearchType.GRAPH_COMPLETION):
    # Ask the knowledge graph a question, restricted to the given node sets
    answer = await cognee.search(  # `await` suspends this function until the search finishes
        query_text=query,  # The text to search for
        query_type=query_type,  # The type of query (graph completion by default)
        node_type=NodeSet,  # The type of nodes to search
        node_name=node_set  # The specific node sets to search within
    )
    return answer

## Call the search_cognee function

Running this cell makes multiple HTTP POST requests to the OpenAI API to generate embeddings (vector representations) for your query and possibly for documents in the knowledge graph. The Cognee library (`import cognee`) handles all communication with OpenAI's API internally: when you call `cognee.search`, Cognee acts as the client, sending the requests and managing authentication, model selection, and API calls behind the scenes.
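To make "vector representations" more concrete, here is a toy sketch of one common way embeddings are compared: cosine similarity. The three-dimensional vectors are made up for illustration and are not real OpenAI embeddings, and Cognee's actual retrieval logic is not shown in this README, so treat this as an assumption about the general technique rather than a description of the library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vector, embeddings):
    """Sort topic names from most to least similar to the query vector."""
    return sorted(
        embeddings,
        key=lambda name: cosine_similarity(query_vector, embeddings[name]),
        reverse=True,
    )


# Made-up vectors (assumption): real embeddings have many more dimensions.
toy_embeddings = {
    "authentication": [0.9, 0.1, 0.0],
    "pagination": [0.1, 0.9, 0.1],
    "event search": [0.0, 0.2, 0.9],
}
# Pretend embedding of a question about authorization.
query_vector = [0.8, 0.2, 0.1]

print(rank(query_vector, toy_embeddings))
```

The topic whose vector points in nearly the same direction as the query vector ranks first, which is the intuition behind embedding-based retrieval.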

In [None]:
# Retrieve a list of available API endpoints and their URLs
results = await search_cognee(
    query="What API endpoints are in the Ticketmaster api? Give me specific endpoint urls.",
    node_set=['developer.ticketmaster.com']
)


EmbeddingRateLimiter initialized: enabled=False, requests_limit=60, interval_seconds=60
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
16:07:28 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/text-embedding-3-large
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
16:07:28 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/text-embedding-3-large

## Answer to the query
The answer matches the question because `search_cognee` sends the query text to the knowledge graph, which contains structured information about the Ticketmaster API. The backend uses language models and embeddings to interpret the question and retrieve the most relevant information. Here the query asked for endpoints, so the answer lists endpoint descriptions; a query such as "What auth info do I need in the ticketmaster API?" instead returns details about authorization requirements. In each case the answer is generated from the data and documentation stored in the knowledge graph.

In [None]:
print(results[0])

Here are some of the API endpoints in the Ticketmaster API:

1. Event Search: `https://developer.ticketmaster.com/explore` - Power your event discovery experience.
2. Partner API: `https://developer.ticketmaster.com/support/partner-api-faq` - Access to the Partner API for specific use cases.
3. Ticket Management: `https://developer.ticketmaster.com/explore` - Validate tickets and manage scanning devices.
4. Inventory Status: `https://developer.ticketmaster.com/explore` - Get status for primary Ticketmaster inventory with real-time updates.
5. Shipping Options: [GET] Shipping endpoint - Fetch valid shipping options for an event.


In [None]:
# Retrieve information about authentication requirements for using the Ticketmaster API
results = await search_cognee(
    query="What auth info do i need in the ticketmaster API?",
    node_set=['developer.ticketmaster.com']
)

16:07:39 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/text-embedding-3-large

In [6]:
print(results[0])

To use the Ticketmaster API, you need authorization. Specifically, it requires API authorization to access its services.


## Get pagination
This query asks how to handle paginated results when making requests to the Ticketmaster API: which parameters or methods are required to retrieve additional pages of data (such as events, tickets, or results) when there are too many to return in a single response. A good answer explains how to request the next set of results, for example with page numbers, offsets, or tokens, according to the Ticketmaster API's documentation.

In [7]:
results = await search_cognee(
    query="What pagination do i need in the ticketmaster API?",
    node_set=['developer.ticketmaster.com']
)

16:08:21 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/text-embedding-3-large

In [8]:
print(results[0])

To implement pagination in the Ticketmaster API, you typically need to use parameters such as `limit` to specify the number of results per page and `offset` to determine the starting point of the results. The API documentation should provide details on how to use these parameters effectively.
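The generated answer is generic, so the following is only a sketch of limit/offset pagination in general, not of the Ticketmaster API itself. The helpers `fetch_page` and `iterate_all` are hypothetical, the data is an in-memory list rather than a real HTTP response, and the parameter names `limit` and `offset` are taken from the answer above and should be checked against the official documentation.

```python
from typing import Callable, Iterator, List


def fetch_page(items: List[str], limit: int, offset: int) -> List[str]:
    """Hypothetical stand-in for one paginated request: return one slice of items."""
    return items[offset:offset + limit]


def iterate_all(fetch: Callable[[int, int], List[str]], limit: int) -> Iterator[str]:
    """Yield every item by requesting pages until an empty page comes back."""
    offset = 0
    while True:
        page = fetch(limit, offset)
        if not page:
            break  # an empty page means there is nothing left to read
        yield from page
        offset += limit  # move the starting point forward by one page


# Seven fake events, read three at a time.
events = [f"event-{i}" for i in range(7)]
all_events = list(iterate_all(lambda limit, offset: fetch_page(events, limit, offset), limit=3))
print(all_events)
```

The loop stops on the first empty page, so it makes one extra request after the last partial page; a real client might stop earlier if the API reports a total count.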



## Broad Search
The call above asks a specific question and searches only within the node set `['developer.ticketmaster.com']`, so the answer focuses on pagination details for the Ticketmaster API.

The broader question below searches within the node set `['docs']` (which may contain general documentation) and explicitly sets `query_type=SearchType.RAG_COMPLETION` (retrieval-augmented generation, RAG). This approach retrieves relevant documents from the knowledge graph and then generates a natural-language answer, which often yields more comprehensive or context-aware responses. It suits broader or more open-ended questions, whereas the default `SearchType.GRAPH_COMPLETION` may focus on direct graph completion or structured lookups.

In summary: the knowledge graph is built by extracting, structuring, and connecting information from reliable documentation and data sources, so it can answer questions accurately and efficiently.

In [28]:
results = await search_cognee('''
    What sort of API information is in this knowledge graph?
    ''', 
    node_set=['docs'], query_type=SearchType.RAG_COMPLETION)

18:07:00 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/text-embedding-3-large
18:07:00 - LiteLLM:INFO: utils.py:2929 - LiteLLM completion() model= gpt-4o-mini; provider = openai
18:07:03 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/gpt-4o-mini-2024-07-18

In [29]:
print(results[0])

The monday.com API information includes the ability to read and update data within a monday.com account using GraphQL. It supports operations on boards, items, column values, users, and workspaces. Use cases for the API involve accessing board data for reports, creating items when records are created in other systems, and importing data. Admins, members, and guests can use the API, with varying levels of access. The API supports multiple monday products, excluding Workforms, and provides a comprehensive GraphQL schema for querying and mutations.


Using only the `docs` node set does not work well here, because it covers a very limited set of nodes.
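One way to picture why the choice of node set matters is a toy model in which each node carries tags and a search only considers nodes whose tags overlap the requested node sets. This is an illustrative assumption, not Cognee's implementation; `ToyNode`, `nodes_in_sets`, and the tag assignments below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNode:
    text: str
    node_sets: frozenset  # names of the node sets this node belongs to


def nodes_in_sets(nodes, node_set):
    """Keep only the nodes that belong to at least one requested node set."""
    wanted = set(node_set)
    return [node for node in nodes if node.node_sets & wanted]


# Hypothetical tagging, chosen to mirror the behaviour observed above.
toy_nodes = [
    ToyNode("monday.com GraphQL overview", frozenset({"docs"})),
    ToyNode("discovery api", frozenset({"developer.ticketmaster.com"})),
    ToyNode("partner api", frozenset({"developer.ticketmaster.com"})),
]

print([node.text for node in nodes_in_sets(toy_nodes, ["docs"])])
print([node.text for node in nodes_in_sets(toy_nodes, ["developer.ticketmaster.com"])])
```

Under this model, a query scoped to a node set with few members can only ever draw on those few nodes, no matter how broad the question is.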

## Focused Search
Scoping the same question to the `developer.ticketmaster.com` node set keeps the answer relevant only to the Ticketmaster API information, rather than searching across all available documentation or data in the knowledge graph.

In [36]:
results = await search_cognee('''
    What sort of API information is in this knowledge graph?
    ''', 
    node_set=['developer.ticketmaster.com'])

18:10:38 - LiteLLM:INFO: cost_calculator.py:655 - selected model name for cost calculation: openai/text-embedding-3-large

In [37]:
print(results[0])

The knowledge graph contains information about various types of APIs, including 'publish api', 'presence api', 'top picks api', 'partner api', 'international discovery api', and 'discovery api'. These APIs are categorized under a general 'api' node, with relationships indicating that each specific API is a type of the general API.
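The structure described in that answer can be sketched as a small list of edges. The API names come from the output above; the `is_a` relationship label and the `subtypes_of` helper are hypothetical, since the README does not show how the graph actually names its relationships.

```python
# API names taken from the answer above.
api_types = [
    "publish api",
    "presence api",
    "top picks api",
    "partner api",
    "international discovery api",
    "discovery api",
]

# Each edge is (source, relationship, target); "is_a" is an assumed label.
edges = [(name, "is_a", "api") for name in api_types]


def subtypes_of(edge_list, parent):
    """Return the sorted names of all nodes linked to `parent` by an is_a edge."""
    return sorted(
        source
        for source, relation, target in edge_list
        if relation == "is_a" and target == parent
    )


print(subtypes_of(edges, "api"))
```

Listing the subtypes of the general `api` node in this toy graph returns the same six API names that the search answer reported.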
