Scrape results from Serper results using Olostep #627

Merged
merged 8 commits into from Jan 29, 2024


sabaimran
Collaborator

  • Use Olostep to go through the top results from Serper's Google search
  • Add an intermediate step that summarizes the extracted web results for use in the downstream response
  • If answerBox is present in the query results, no need to do the additional scraping
  • Add truncation logic to the model wrapper
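
The flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `summarize_online_results`, `truncate_words`, and the injected `scrape` and `summarize` callables are hypothetical names, the response shape (an optional `answerBox` plus an `organic` list of results with a `link` field) is assumed, and word-count truncation is a crude stand-in for whatever token-based truncation the model wrapper actually uses.

```python
from typing import Callable, Dict


def truncate_words(text: str, max_words: int) -> str:
    """Keep at most max_words whitespace-separated words.

    Hypothetical helper: a crude stand-in for token-based truncation.
    """
    return " ".join(text.split()[:max_words])


def summarize_online_results(
    search_response: Dict,
    scrape: Callable[[str], str],
    summarize: Callable[[str], str],
    max_links: int = 3,
    max_words: int = 500,
) -> Dict[str, object]:
    """Summarize the top search results, skipping scraping when a direct answer exists.

    `scrape` stands in for the web scraper (e.g. a call to Olostep) and
    `summarize` for the intermediate summarization step; both are injected
    so this sketch makes no assumptions about their real APIs.
    """
    # If the search already returned a direct answer, skip the extra scraping.
    if search_response.get("answerBox"):
        return {"answerBox": search_response["answerBox"], "summaries": {}}

    summaries: Dict[str, str] = {}
    for result in search_response.get("organic", [])[:max_links]:
        link = result.get("link")
        if not link:
            continue
        page_text = scrape(link)
        # Truncate the page text before handing it to the summarizer.
        summaries[link] = summarize(truncate_words(page_text, max_words))
    return {"answerBox": None, "summaries": summaries}
```

Injecting the scraper and summarizer as callables keeps the decision logic (answer box short-circuit, result limit, truncation) testable without network access or a model call.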

… results

- The prompt isn't working well at extracting summary information from the target web pages, so this requires further investigation
- Do some minor refactors to pass a system prompt to the OpenAI model when making a query
- Integrate Olostep to perform the web scraping
@sabaimran sabaimran added the upgrade New feature or request label Jan 26, 2024
@sabaimran sabaimran changed the title Scrape results from serper google search Scrape results from Serper results using Olostep Jan 28, 2024
@debanjum
Member

Can we add chat actor and director tests to verify that the /online flows with website crawling work as intended? It's getting a little complicated. It'd be good to see what kind of stuff Khoj can do with this vs. without it (it'll improve insight into our chat quality benchmarking via pytest).

Member

@debanjum debanjum left a comment

This change will be super useful for deeper insights from online results! It'd be great if we can add some chat quality tests to evaluate Khoj's improved capabilities with these changes

src/khoj/processor/conversation/utils.py
src/khoj/routers/helpers.py
src/khoj/processor/tools/online_search.py (outdated)
@sabaimran sabaimran merged commit b782683 into master Jan 29, 2024
9 checks passed
@sabaimran sabaimran deleted the features/go-through-serp-resulst branch January 29, 2024 08:46
2 participants