wraper

Wraper extracts various details from a webpage & creates a summary of it. Using Ollama & Docker, the parsed details are passed to Orca Mini for analysis. This is Ollama's lightest model, with 3B parameters, suitable for systems with at least 8GB of RAM. While more resource-intensive models can also be used, Orca Mini was chosen for the sake of accessibility.

Demo

[Image: wraperexample]

Above is an example using GitHub's Wikipedia page & the response generated by wraper. Once the Ollama Docker image has been set up, wraper automatically runs Docker & starts the container.

How does it work?

[Image: wraperarch]

Because Orca Mini runs locally through Ollama, it cannot make web searches itself. Instead, TextGatherer makes an HTTP request & extracts the text of the webpage, which is then passed to the model. With the use of Generative AI, webpages with large amounts of text can be summarized in fewer than 100 words, allowing users to analyse multiple websites much more quickly.
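The fetch-and-extract step can be sketched roughly as follows. This is a minimal illustration, not wraper's actual TextGatherer: the class name `TextGathererSketch` and its methods are hypothetical, it assumes Java 11+ for `java.net.http.HttpClient`, and the regex-based tag stripping is a simplification that a real HTML parser would handle more robustly.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical stand-in for wraper's TextGatherer; not the project's actual code.
final class TextGathererSketch {

    // Downloads the raw HTML of a page with the JDK's built-in HTTP client.
    static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Reduces HTML to visible text: drops script/style blocks, then all
    // remaining tags, decodes two common entities, and collapses whitespace.
    static String extractText(String html) {
        return html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                .replaceAll("(?s)<[^>]+>", " ")
                .replace("&nbsp;", " ")
                .replace("&amp;", "&")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TextGathererSketch.java <url>");
            return;
        }
        System.out.println(extractText(fetchHtml(args[0])));
    }
}
```

Keeping `extractText` separate from `fetchHtml` means the text clean-up can be checked on a fixed HTML string without touching the network.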

Given enough computing power, multiple websites can be summarized simultaneously. One possible application is studying naturally verbose material locally, as an alternative to paying for a subscription to other Generative AI services.
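One way to summarize several pages at once is to hand each URL to a thread pool, as in the sketch below. The class `ParallelSummarizerSketch` and the `summarizer` function are hypothetical placeholders for whatever fetches a page and queries the model. Whether a single Ollama instance actually processes requests in parallel depends on its configuration and the available hardware.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper; wraper itself may organise this differently.
final class ParallelSummarizerSketch {

    // Runs the summarizer for every URL on a fixed-size pool and returns
    // the summaries keyed by URL, in the same order as the input list.
    static Map<String, String> summarizeAll(List<String> urls,
                                            Function<String, String> summarizer,
                                            int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one asynchronous task per URL.
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(CompletableFuture.supplyAsync(() -> summarizer.apply(url), pool));
            }
            // Wait for each result; join() rethrows a task failure
            // wrapped in a CompletionException.
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                results.put(urls.get(i), futures.get(i).join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool size caps how many pages are processed at once, which matters on machines near the 8GB RAM minimum.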

Getting Started

  • Install Docker & save it to your Program Files folder.

  • Run Ollama's Docker image using these commands (one-time setup):

    docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
    
    docker exec -it ollama ollama run orca-mini
    

Once the initial setup is complete, Docker & the Ollama container will start automatically whenever wraper is run. Do note that it may take some time to receive a response; how long depends on the webpage & your system specifications. You can also use a dedicated GPU to run Ollama's models by setting up the NVIDIA Container Toolkit on Linux or on systems with WSL2.
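As a rough sketch of what happens on each run, the code below restarts the container created during setup and sends the page text to Ollama's `/api/generate` endpoint on port 11434. The class `OllamaClientSketch`, the prompt wording and the five-minute timeout are assumptions made for illustration, not wraper's actual code. It also assumes the Docker daemon is already running and that the container is named `ollama`, as in the setup commands.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

// Hypothetical client; names and prompt wording are illustrative only.
final class OllamaClientSketch {

    // Command that restarts the container created by the one-time setup.
    static List<String> startContainerCommand() {
        return List.of("docker", "start", "ollama");
    }

    // Escapes a string so it can sit inside a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the request body; "stream": false asks for one JSON reply
    // instead of a stream of partial responses.
    static String buildGenerateBody(String model, String prompt) {
        return "{\"model\":\"" + jsonEscape(model)
                + "\",\"prompt\":\"" + jsonEscape(prompt)
                + "\",\"stream\":false}";
    }

    // Starts the container, then posts the page text for summarization.
    // Returns the raw JSON reply; the summary is in its "response" field.
    static String generate(String pageText) throws Exception {
        new ProcessBuilder(startContainerCommand()).inheritIO().start().waitFor();
        String prompt = "Summarize the following webpage in less than 100 words:\n" + pageText;
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/generate"))
                .timeout(Duration.ofMinutes(5)) // generous, since local inference can be slow
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildGenerateBody("orca-mini", prompt)))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```

Right after `docker start`, the server inside the container may need a moment before it accepts connections, so a real implementation would likely retry the request or wait for the port to open.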

About

A web scraper written in Java.
