
flofriday/websearch


websearch

[Figure: screenshot]

Let's build a search engine for the web, just for fun. 🥳

Features

  • Crawling, searching and a web server
  • Single sqlite file to store the index
  • Result ranking (just query to document match)
  • Can index 1k pages in about 10 seconds

And many more are planned ^^
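
The ranking feature above is described as a plain query-to-document match. Here is a minimal sketch of that idea in Go, assuming a page is scored by the number of distinct query terms it contains. The `Doc` type and the `tokenize`, `score` and `rank` functions are hypothetical names for illustration, not the project's actual code or schema.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Doc is a hypothetical indexed page; the real index schema may differ.
type Doc struct {
	URL  string
	Text string
}

// tokenize lowercases s and splits it on anything that is not a letter or digit.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// score counts how many distinct query terms occur in the document text.
func score(query string, d Doc) int {
	words := make(map[string]bool)
	for _, w := range tokenize(d.Text) {
		words[w] = true
	}
	counted := make(map[string]bool)
	n := 0
	for _, q := range tokenize(query) {
		if words[q] && !counted[q] {
			counted[q] = true
			n++
		}
	}
	return n
}

// rank returns the documents that match at least one query term,
// highest score first. Ties keep their input order.
func rank(query string, docs []Doc) []Doc {
	type hit struct {
		doc   Doc
		score int
	}
	var hits []hit
	for _, d := range docs {
		if s := score(query, d); s > 0 {
			hits = append(hits, hit{doc: d, score: s})
		}
	}
	sort.SliceStable(hits, func(i, j int) bool {
		return hits[i].score > hits[j].score
	})
	out := make([]Doc, 0, len(hits))
	for _, h := range hits {
		out = append(out, h.doc)
	}
	return out
}

func main() {
	docs := []Doc{
		{URL: "https://example.org/windows", Text: "Windows tips"},
		{URL: "https://example.org/desktop", Text: "Linux desktop"},
		{URL: "https://example.org/kernel", Text: "Linux kernel news"},
	}
	for _, d := range rank("linux kernel", docs) {
		fmt.Println(d.URL)
	}
}
```

Counting distinct terms keeps the sketch short; a real ranker would likely also weigh term frequency or document length.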

Build it yourself

You need Go and Node.js to build this project.

npm install
npx tailwindcss -i ./web/style.css -o ./web/static/style.css

go build
./websearch index
./websearch search "Linux"
./websearch server

Note: During development it is handy to let the tailwind command run with the --watch flag in a separate terminal.

Build with docker

# Build the image
docker build -t "websearch" .

# Build the index
docker volume create websearch_index
docker run \
    --rm \
    --mount source=websearch_index,target=/app/data \
    websearch index --sqlite data/index.db

# Serve the index 
docker run \
    --rm \
    -p 8080:8080 \
    --mount source=websearch_index,target=/app/data \
    websearch server --sqlite data/index.db

Profiling

To improve performance, you need to know where the bottlenecks are and whether an optimization really has the desired impact.

When working on the indexer you can create a profile with the --profile flag and open it with pprof (the flamegraph is quite helpful).

./websearch index -n 500 --profile
go tool pprof -http="localhost:7000" cpu.prof
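
For readers unfamiliar with Go profiling, the sketch below shows one way a `--profile` flag can produce a `cpu.prof` file using the standard `runtime/pprof` package. It is an illustration under assumptions, not the indexer's actual implementation: `startCPUProfile` and `work` are hypothetical names, and `work` merely stands in for the indexing loop.

```go
package main

import (
	"flag"
	"log"
	"os"
	"runtime/pprof"
)

// startCPUProfile begins writing a CPU profile to path and returns a
// function that stops profiling and closes the file.
func startCPUProfile(path string) (func(), error) {
	f, err := os.Create(path)
	if err != nil {
		return nil, err
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		f.Close()
		return nil, err
	}
	return func() {
		pprof.StopCPUProfile()
		f.Close()
	}, nil
}

// work is a placeholder for the code being profiled (hypothetical).
func work(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

func main() {
	// Go's flag package accepts both -profile and --profile.
	profile := flag.Bool("profile", false, "write a CPU profile to cpu.prof")
	flag.Parse()

	if *profile {
		stop, err := startCPUProfile("cpu.prof")
		if err != nil {
			log.Fatal(err)
		}
		defer stop()
	}

	log.Println("result:", work(50_000_000))
}
```

The resulting file can be opened with the same `go tool pprof` command shown above.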

Architecture

[Figure: architecture diagram]

Contributing

Almost all improvements are welcome, just try it out and pick a subject that looks interesting to you. 😉

However, if you don't know where to start, you can pick a random FIXME and make some progress on it: git grep -i "fixme" | shuf -n 1

You can also open the index.db database (for example with DB Browser for SQLite), look for anomalies there and try to fix them.

In any case, just have fun 🥳
