# CoexistAI Tool Tutorial

Welcome to the tutorial for the CoexistAI tool! This notebook walks through the tool's main features: web search, document processing, generative models, answer generation, YouTube summarization, map generation, and Reddit summarization. Each section pairs a short explanation with a code example to help you get started quickly.

## 1. Setup and Initialization

First, let's import the required libraries, set up environment variables, and initialize the main components. This ensures that all dependencies are loaded and ready for use.

In [1]:
from utils.utils import *
set_logging(True)
from langchain_text_splitters import TokenTextSplitter
import os

# Start a SearxNG container unless one is already running.
# Host port 30 is mapped to the container's port 8080, so BASE_URL uses port 30.
if not is_searxng_running():
    !docker run --rm \
        -d -p 30:8080 \
        -v "${PWD}/searxng:/etc/searxng" \
        -e "BASE_URL=http://localhost:30/" \
        -e "INSTANCE_NAME=my-instance" \
        searxng/searxng
else:
    print("SearxNG docker container is already running.")

os.environ['GOOGLE_API_KEY'] = 'YOUR_API_KEY'  # Replace with your actual key
text_splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=128)
from utils.websearch_utils import *
searcher = SearchWeb(30)  # Initialize web search with a result limit
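
`TokenTextSplitter(chunk_size=512, chunk_overlap=128)` produces windows of up to 512 tokens, and consecutive windows share 128 tokens, so each new window starts 384 tokens after the previous one. The sketch below illustrates that windowing on a plain list of integers standing in for token IDs. `sliding_windows` is a hypothetical helper written for this illustration, not the library's implementation.

```python
def sliding_windows(tokens, chunk_size=512, chunk_overlap=128):
    """Split a token list into windows that overlap by chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # distance between window starts
    windows = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        windows.append(tokens[start:end])
        if end == len(tokens):
            break  # the last window reached the end of the input
        start += stride
    return windows


# Integers stand in for token IDs (illustrative data only).
tokens = list(range(1000))
windows = sliding_windows(tokens)
starts = [w[0] for w in windows]
print(starts, [len(w) for w in windows])
```

A larger overlap keeps more context across chunk boundaries at the cost of more chunks to embed.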



## 2. Loading Models

Load embedding models and cross-encoders using the `load_model` function. You can choose between different embedding modes such as 'gemini', 'huggingface', or 'infinity_emb'.

In [2]:
hf_embeddings, cross_encoder = load_model("models/embedding-001", _embed_mode='gemini')

[2025-06-07 19:35:15,718] INFO utils.utils: Loading model: models/embedding-001 with embedding mode: gemini
[2025-06-07 19:35:20,841] INFO sentence_transformers.cross_encoder.CrossEncoder: Use pytorch device: mps
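
An embedding model and a cross-encoder are commonly combined in two stages: embeddings give a cheap similarity ranking over many chunks, and the cross-encoder then rescores a short list of (query, chunk) pairs. The sketch below shows that general pattern with toy vectors and a stand-in scoring function. It is an assumption about how the two returned objects are typically used, not CoexistAI's code; `retrieve_then_rerank`, `toy_pair_score`, and the data are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(query_vec, query_text, chunks, pair_score, top_k=2):
    """chunks is a list of (text, vector); returns texts ordered by pair_score."""
    # Stage 1: cheap vector similarity narrows the candidates to top_k.
    candidates = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)[:top_k]
    # Stage 2: a (normally more expensive) pairwise scorer reorders the short list.
    texts = [text for text, _ in candidates]
    return sorted(texts, key=lambda text: pair_score(query_text, text), reverse=True)


def toy_pair_score(query, text):
    """Stand-in for a cross-encoder: counts shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


# Toy chunks with made-up 3-dimensional "embeddings".
chunks = [
    ("Bangalore is a city in India", [0.9, 0.1, 0.0]),
    ("India is a country in South Asia", [0.8, 0.2, 0.1]),
    ("Diffusion models generate images", [0.0, 0.1, 0.9]),
]
ranked = retrieve_then_rerank([1.0, 0.0, 0.0], "which country is India in", chunks, toy_pair_score)
print(ranked)
```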


## 4. Web Search Integration

Use the `SearchWeb` class to perform web searches and retrieve results. This is useful for augmenting LLMs with up-to-date information from the web.

In [3]:
results = searcher.query_search("latest news in AI", num_results=3)
print(results)

[2025-06-07 19:35:24,031] INFO utils.websearch_utils: Search results for query 'latest news in AI': [{'snippet': 'Anthropic launches Claude AI models for US national security · Reddit sues Anthropic over AI data scraping · Tackling hallucinations: MIT spinout teaches AI to ...', 'title': 'AI News | Latest AI News, Analysis & Events', 'link': 'https://www.artificialintelligence-news.com/', 'engines': ['google'], 'category': 'general'}, {'snippet': 'News coverage on artificial intelligence and machine learning tech, the companies building them, and the ethical issues AI raises today.', 'title': 'AI News & Artificial Intelligence', 'link': 'https://techcrunch.com/category/artificial-intelligence/', 'engines': ['google'], 'category': 'general'}, {'snippet': "Google CEO Sundar Pichai downplays AI jobs threats, says 'it allows us to do more' · FILE — Jensen Huang, CEO of NVIDIA, speaks at NVIDIA GTC in San Jose.", 'title': 'Artificial Intelligence: Read latest news updates on AI ...', 'link'
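
The log above shows each hit as a dictionary with `snippet`, `title`, `link`, `engines`, and `category` keys. Assuming `query_search` returns that same list (the log alone does not confirm the return value), a minimal sketch of turning the hits into a numbered context block for an LLM prompt could look like this. `format_results` is a hypothetical helper, and the sample data is abridged from the log.

```python
def format_results(results, max_snippet_chars=200):
    """Render search hits as a numbered text block suitable for an LLM prompt."""
    lines = []
    for i, hit in enumerate(results, start=1):
        title = hit.get("title", "(no title)")
        link = hit.get("link", "")
        snippet = hit.get("snippet", "")[:max_snippet_chars]
        lines.append(f"[{i}] {title}\n    {link}\n    {snippet}")
    return "\n".join(lines)


# Sample hits abridged from the logged output (snippets shortened).
sample = [
    {"title": "AI News | Latest AI News, Analysis & Events",
     "link": "https://www.artificialintelligence-news.com/",
     "snippet": "Anthropic launches Claude AI models for US national security",
     "engines": ["google"], "category": "general"},
    {"title": "AI News & Artificial Intelligence",
     "link": "https://techcrunch.com/category/artificial-intelligence/",
     "snippet": "News coverage on artificial intelligence and machine learning tech",
     "engines": ["google"], "category": "general"},
]
context = format_results(sample)
print(context)
```

Numbering the hits makes it easy to ask the model to cite sources by index.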

## 5. Document Conversion from URLs

Convert URLs into document objects using the `urls_to_docs` function. This allows you to process and analyze web content as structured documents.

In [4]:
docs = await urls_to_docs([
    "https://en.wikipedia.org/wiki/India",
    "https://en.wikipedia.org/wiki/Bangalore"
])
print(f"Loaded {len(docs)} documents.")
print(docs[0])

[2025-06-07 19:35:24,048] INFO utils.websearch_utils: Fetching URL: https://en.wikipedia.org/wiki/India
[2025-06-07 19:35:24,052] INFO utils.websearch_utils: Fetching URL: https://en.wikipedia.org/wiki/Bangalore
[2025-06-07 19:35:24,353] INFO utils.websearch_utils: Fetched content from https://en.wikipedia.org/wiki/Bangalore with type text/html; charset=UTF-8
[2025-06-07 19:35:24,366] INFO utils.websearch_utils: Fetched content from https://en.wikipedia.org/wiki/India with type text/html; charset=UTF-8
[2025-06-07 19:35:26,169] INFO utils.websearch_utils: Processed markdown for: https://en.wikipedia.org/wiki/Bangalore
[2025-06-07 19:35:26,182] INFO utils.websearch_utils: Processed markdown for: https://en.wikipedia.org/wiki/India
[2025-06-07 19:35:26,183] INFO utils.websearch_utils: Successfully processed and added document(s) for URL: https://en.wikipedia.org/wiki/India
[2025-06-07 19:35:26,183] INFO utils.websearch_utils: Successfully processed and added document(s) for URL: https://
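
The interleaved log lines (both fetches start before either finishes) suggest the URLs are fetched concurrently. The sketch below shows the general `asyncio.gather` pattern behind that kind of behaviour. It is not the body of `urls_to_docs`; `fake_fetch` is a hypothetical stand-in that sleeps instead of touching the network.

```python
import asyncio


async def fake_fetch(url, delay):
    """Hypothetical stand-in for a network fetch; sleeps instead of downloading."""
    await asyncio.sleep(delay)
    return {"source": url, "content": f"<content of {url}>"}


async def fetch_all(urls):
    # gather() runs the coroutines concurrently and returns results in the
    # same order as the inputs, regardless of which one finishes first.
    return await asyncio.gather(*(fake_fetch(u, d) for u, d in urls))


# The second URL "finishes" first, yet the result order follows the input order.
urls = [
    ("https://en.wikipedia.org/wiki/India", 0.02),
    ("https://en.wikipedia.org/wiki/Bangalore", 0.01),
]
docs = asyncio.run(fetch_all(urls))
print([d["source"] for d in docs])
```

Inside a notebook, where an event loop is already running, use `await fetch_all(urls)` instead of `asyncio.run(...)`, as the cell above does with `urls_to_docs`.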

## 6. Using Generative Models

Initialize and use generative models like `ChatGoogleGenerativeAI` and `ChatOpenAI` for text generation tasks. The `get_generative_model` function helps you select and configure the model.

In [5]:
llmgoogle = get_generative_model(model_name='gemini-1.5-flash',
                    type='google',
                    _tools=None,
                    kwargs={'temperature': 0.1, 'max_tokens': None, 'timeout': None, 'max_retries': 2, 
                            'api_key': os.environ['GOOGLE_API_KEY'],
                            'generation_config':{"response_mime_type": "application/json"}})

# For a local model served by Ollama (support for other local backends is planned)

# llmlocal = get_generative_model(model_name='qwen:0.5b-chat',
#                     type='local',
#                     _tools=None,
#                     kwargs={'temperature': 0.1, 'max_tokens': None, 'timeout': None, 'max_retries': 2})

                generation_config was transferred to model_kwargs.
                Please confirm that generation_config is what you intended.
  llmgoogle = get_generative_model(model_name='gemini-1.5-flash',

The next cell passes this model to `query_web_response`, which answers a question from live web search results. The second and third positional arguments are the current date and weekday.


In [6]:
web_response = await query_web_response(
    "Top news of today",
    '06-06-2025',
    'Friday',
    searcher,
    hf_embeddings,
    True,
    cross_encoder,
    llmgoogle,
    text_model=llmgoogle,
    num_results=1,
    document_paths=[],
    local_mode=False,
    split=False
)
print(web_response[0])

domain: "googleapis.com"
metadata {
  key: "service"
  value: "generativelanguage.googleapis.com"
}
, locale: "en-US"
message: "API key not valid. Please pass a valid API key."
]. Falling back to prompt-based extraction.
[2025-06-07 19:35:28,398] ERROR utils.answer_generation: Both structured and prompt-based extraction failed: Invalid argument provided to Gemini: 400 API key not valid. Please pass a valid API key. [reason: "API_KEY_INVALID"
domain: "googleapis.com"
metadata {
  key: "service"
  value: "generativelanguage.googleapis.com"
}
, locale: "en-US"
message: "API key not valid. Please pass a valid API key."
]
[2025-06-07 19:35:28,398] ERROR utils.websearch_utils: Error generating search response for query 'Top news of today': not enough values to unpack (expected 3, got 0)
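
The errors above come from Gemini rejecting the key (`API_KEY_INVALID`), which is consistent with the `'YOUR_API_KEY'` placeholder from the setup cell being left in place. The final `not enough values to unpack` message appears to be a downstream symptom of the failed call. A small guard, sketched below, fails early with a clearer message. `require_api_key` and `PLACEHOLDER_KEYS` are hypothetical names, not part of CoexistAI.

```python
import os

# Values that indicate the key was never filled in (illustrative list).
PLACEHOLDER_KEYS = {"", "YOUR_API_KEY"}


def require_api_key(var_name="GOOGLE_API_KEY", environ=None):
    """Return the key from the environment, or raise if it is missing or a placeholder."""
    environ = os.environ if environ is None else environ
    key = environ.get(var_name, "").strip()
    if key in PLACEHOLDER_KEYS:
        raise RuntimeError(
            f"{var_name} is not set to a real key; set it before creating the model."
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
try:
    require_api_key(environ={"GOOGLE_API_KEY": "YOUR_API_KEY"})
    placeholder_rejected = False
except RuntimeError as exc:
    placeholder_rejected = True
    print(exc)
```

Calling `require_api_key()` before `get_generative_model(...)` would surface the problem before any request is sent.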


## 9. YouTube Transcript Summarization

Summarize YouTube video transcripts using the `youtube_transcript_response` function. This is useful for extracting insights from long videos.
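
A YouTube watch URL such as the one used in this section carries the video ID in its `v` query parameter and an optional start offset in `t`. The sketch below pulls both out with the standard library. It is not taken from `youtube_transcript_response`; `parse_watch_url` is a hypothetical helper and only handles offsets given in plain seconds.

```python
from urllib.parse import urlparse, parse_qs


def parse_watch_url(url):
    """Return (video_id, start_seconds) from a youtube.com/watch URL."""
    query = parse_qs(urlparse(url).query)
    video_id = query.get("v", [None])[0]
    # "t" is optional and may look like "6648s" or "6648".
    raw_start = query.get("t", ["0"])[0].rstrip("s")
    start_seconds = int(raw_start) if raw_start.isdigit() else 0
    return video_id, start_seconds


video_id, start = parse_watch_url("https://www.youtube.com/watch?v=o8NiE3XMPrM&t=6648s")
print(video_id, start)
```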

In [6]:
summary = youtube_transcript_response("https://www.youtube.com/watch?v=o8NiE3XMPrM&t=6648s", 
                                      'summarize in bullets and themes', llmgoogle)
print(summary)

TypeError: string argument expected, got 'ExpatError'

[2025-06-07 19:35:29,580] INFO IPKernelApp: Exception in execute request:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/Documents/llama_index_exp/coexist/coexist_trial/lib/python3.13/site-packages/IPython/core/async_helpers.py:128, in _pseudo_sync_runner(coro)
    120 """
    121 A runner that does not really allow async execution, and just advance the coroutine.
    122 
   (...)
    125 Credit to Nathaniel Smith
    126 """
    127 try:
--> 128     coro.send(None)
    129 except StopIteration

## 10. Generating Maps

Generate maps with routes and points of interest using the `generate_map` function. You can visualize directions and locations directly in your notebook.

In [None]:
# Example: Generate a map with a route and POIs
from utils.map import *
from IPython.display import display, HTML

s = generate_map("M G Road, Bangalore", "Indiranagar, Bangalore")

# Load the generated HTML; uncomment the last line to render it inline.
with open("output/map_with_route_and_pois.html") as f:
    html_content = f.read()
# display(HTML(html_content))

2025-06-06 19:40:54,384 [INFO] Found 3 probable locations for 'M G Road, Bangalore'.
2025-06-06 19:40:54,385 [INFO] Auto-selected location: Mahatma Gandhi Road, Tasker Town, Shivajinagar, Bengaluru, Bangalore North, Bengaluru Urban, Karnataka, 560001, India
2025-06-06 19:40:55,716 [INFO] Found 3 probable locations for 'Indiranagar, Bangalore'.
2025-06-06 19:40:55,719 [INFO] Auto-selected location: Indiranagar, Basaveshwaranagar, Bengaluru, Bangalore North, Bengaluru Urban, Karnataka, 560079, India
2025-06-06 19:40:56,064 [INFO] Route found between start and end coordinates.
2025-06-06 19:40:56,852 [INFO] Found 78 POIs near (12.9747828, 77.6096698).
2025-06-06 19:40:57,368 [INFO] Found 12 POIs near (12.9962979, 77.5452778).
2025-06-06 19:40:57,468 [INFO] Map generated and saved as 'map_with_route_and_pois.html'.
2025-06-06 19:40:57,468 [INFO] Generated route directions.
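
The log reports POI searches near two coordinates, which presumably correspond to the auto-selected start and end points. A straight-line distance check is a cheap way to sanity-check geocoded locations before trusting a route. The sketch below uses the haversine formula; `haversine_km` is a hypothetical helper and is not part of `utils.map`.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Coordinates copied from the POI log lines above.
start = (12.9747828, 77.6096698)
end = (12.9962979, 77.5452778)
distance_km = haversine_km(*start, *end)
print(f"{distance_km:.1f} km")
```

If the straight-line distance looks implausible for the two place names, the geocoder may have auto-selected a different location than intended.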


## 11. Advanced Query Handling

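Handle more demanding queries with `query_web_response`, such as generating detailed reports or toy examples, summarizing one or more web pages named in the query, and summarizing local files. Each cell in this section uses the same call with a different query or set of document paths.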

In [None]:
detailed_report = await query_web_response(
    "Give me end to end working for text diffusion model.",
    '06-06-2025',
    'Friday',
    searcher,
    hf_embeddings,
    True,
    cross_encoder,
    llmgoogle,
    text_model=llmgoogle,
    num_results=3,
    document_paths=[],
    local_mode=False,
    split=False
)
print(detailed_report[0])

[2025-06-06 19:41:00,998] INFO utils.websearch_utils: Search phrases for query 'Give me end to end working for text diffusion model.': ['text diffusion models', 'text diffusion working', 'text diffusion process', 'text diffusion steps']
[2025-06-06 19:41:01,794] INFO utils.websearch_utils: Search results for query 'text diffusion models': [{'snippet': 'by Q Yi · 2024 · Cited by 13 — Diffusion models are a kind of math-based model that were first applied to image generation. Recently, they have drawn wide interest in ...', 'title': 'Diffusion models in text generation: a survey', 'link': 'https://peerj.com/articles/cs-1905/', 'engines': ['google'], 'category': 'general'}, {'snippet': 'Gemini Diffusion is our state-of-the-art research model exploring what diffusion means for la
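
The calls in this tutorial hard-code the date and weekday as `'06-06-2025'` and `'Friday'`. A minimal sketch for deriving both from the current date follows. `date_and_weekday` is a hypothetical helper, and the day-month-year order is an assumption, since `'06-06-2025'` does not reveal which order `query_web_response` expects.

```python
from datetime import date


def date_and_weekday(d=None):
    """Return (date_string, weekday_name) for d, defaulting to today."""
    d = d or date.today()
    # Assumption: day-month-year order. The tutorial's '06-06-2025' is
    # ambiguous between DD-MM-YYYY and MM-DD-YYYY.
    return d.strftime("%d-%m-%Y"), d.strftime("%A")


date_str, day_name = date_and_weekday(date(2025, 6, 6))
print(date_str, day_name)
```

The weekday name produced by `%A` follows the process locale, so it is English only under an English or default locale.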

In [None]:
# You can also summarise a full page by including its URL in the query
detailed_report = await query_web_response(
    "Summarise this full page in detail, and give me learning notes https://ollama.com/blog/secureminions",
    '06-06-2025',
    'Friday',
    searcher,
    hf_embeddings,
    True,
    cross_encoder,
    llmgoogle,
    text_model=llmgoogle,
    num_results=3,
    document_paths=[],
    local_mode=False,
    split=False
)
print(detailed_report[0])

[2025-06-06 19:41:27,972] INFO utils.websearch_utils: Search phrases for query 'Summarise this full page in detail, and give me learning notes https://ollama.com/blog/secureminions': ['ollama.com secureminions summary', 'ollama.com secureminions learning']
[2025-06-06 19:41:27,973] INFO utils.websearch_utils: Extracted URLs from query 'Summarise this full page in detail, and give me learning notes https://ollama.com/blog/secureminions': ['https://ollama.com/blog/secureminions']
[2025-06-06 19:41:27,973] INFO utils.websearch_utils: Created Document for sourc

In [None]:
# Or summarise multiple pages in a single query
detailed_report = await query_web_response(
    "Summarise https://ollama.com/blog/secureminions and https://github.com/ggml-org/llama.cpp",
    '06-06-2025',
    'Friday',
    searcher,
    hf_embeddings,
    True,
    cross_encoder,
    llmgoogle,
    text_model=llmgoogle,
    num_results=3,
    document_paths=[],
    local_mode=False,
    split=False
)
print(detailed_report[0])

[2025-06-06 19:41:40,304] INFO utils.websearch_utils: Search phrases for query 'Summarise https://ollama.com/blog/secureminions and https://github.com/ggml-org/llama.cpp': ['ollama.com secureminions summary', 'github.com llama.cpp summary']
[2025-06-06 19:41:40,305] INFO utils.websearch_utils: Extracted URLs from query 'Summarise https://ollama.com/blog/secureminions and https://github.com/ggml-org/llama.cpp': ['https://ollama.com/blog/secureminions', 'https://github.com/ggml-org/llama.cpp']
[2025-06-06 19:41:40,305] INFO utils.websearch_utils

[2025-06-06 19:41:48,020] INFO utils.websearch_utils: Response generated for query 'Summarise https://ollama.com/blog/secureminions and https://github.com/ggml-org/llama.cpp'.
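
The `Extracted URLs from query` log lines show that URLs embedded in the query text are detected and fetched directly. A minimal sketch of that kind of extraction with a regular expression follows. The pattern and `extract_urls` are assumptions for illustration; the tool's actual matching rules are not shown in this tutorial.

```python
import re

# Simple pattern: a scheme followed by any run of non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(query):
    """Return URLs found in free text, without trailing sentence punctuation."""
    return [m.rstrip(".,;:!?)") for m in URL_PATTERN.findall(query)]


query = "Summarise https://ollama.com/blog/secureminions and https://github.com/ggml-org/llama.cpp"
urls = extract_urls(query)
print(urls)
```
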


In [None]:
# You can also search and summarise local files
detailed_report = await query_web_response(
    "Summarise this document",
    '06-06-2025',
    'Friday',
    searcher,
    hf_embeddings,
    True,
    cross_encoder,
    llmgoogle,
    text_model=llmgoogle,
    num_results=3,
    document_paths=[['documents/1706.03762v7.pdf']],
    local_mode=True,
    split=False
)
print(detailed_report[0])

[2025-06-06 19:41:55,621] INFO utils.websearch_utils: Search phrases for query 'Summarise this document': ['document summary']
[2025-06-06 19:41:55,623] INFO utils.websearch_utils: Starting context_to_docs for 1 URL groups.
[2025-06-06 19:41:58,705] INFO utils.websearch_utils: Processing local file: documents/1706.03762v7.pdf
[2025-06-06 19:42:01,639] INFO utils.websearch_utils: Processed markdown for: documents/1706.03762v7.pdf
[2025-06-06 19:42:01,642] INFO utils.websearch_utils: Successfully processed and added document(s) for URL: documents/1706.03762v7.pdf

[2025-06-06 19:42:07,195] INFO utils.websearch_utils: Response generated for query 'Summarise this document'.
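
In the local-file cell, `document_paths` is a list of lists, and the log line `Starting context_to_docs for 1 URL groups` suggests each inner list is treated as one group. Under that assumption, the sketch below filters out paths that do not exist before the call is made. `existing_path_groups` is a hypothetical helper, not part of CoexistAI.

```python
import tempfile
from pathlib import Path


def existing_path_groups(groups):
    """Keep only paths that exist on disk; drop groups that end up empty."""
    cleaned = []
    for group in groups:
        present = [p for p in group if Path(p).is_file()]
        for p in group:
            if p not in present:
                print(f"Skipping missing file: {p}")
        if present:
            cleaned.append(present)
    return cleaned


# Demonstration with a temporary directory instead of real documents.
with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp) / "paper.pdf"
    real.write_bytes(b"%PDF-1.4")
    groups = [[str(real), str(Path(tmp) / "missing.pdf")]]
    cleaned = existing_path_groups(groups)
    kept_names = [[Path(p).name for p in g] for g in cleaned]
print(kept_names)
```

The result could then be passed as `document_paths=existing_path_groups([['documents/1706.03762v7.pdf']])`.
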


The next cell demonstrates how to summarize Reddit posts and comments from the "OpenAI" subreddit using the `reddit_reader_response` function. It retrieves the top 2 "hot" posts and summarizes their content with the help of the `llmgoogle` generative model. This is useful for quickly extracting key insights from Reddit discussions.

In [None]:
from utils.reddit_utils import *
summary = reddit_reader_response(
  subreddit="OpenAI",
  url_type="hot",
  n=2,
  k=2,
  custom_url=None,
  time_filter="today",
  search_query=None,
  sort_type="hot",
  model=llmgoogle
)
print(summary)

---

This concludes the tutorial for **CoexistAI**. You have learned how to perform web search, document processing, answer generation, YouTube summarization, map generation, and Reddit summarization. For further details, refer to the project README or explore the codebase for advanced usage patterns.