
elwin/WikiIndex


Wiki Index

This project was created to experiment with the Wikipedia dataset. The goal is to calculate the distance between two pages, that is, the minimum number of clicks needed to get from one page to the other using only the links on each page.
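The distance described above is a shortest-path length in the directed link graph, which a breadth-first search (BFS) finds for unweighted edges. The following is a minimal sketch, assuming the links are already loaded into a hypothetical in-memory `map[string][]string` from page title to linked titles. It is not necessarily how WikiIndex stores or searches its index.

```go
package main

import "fmt"

// distance returns the minimum number of clicks needed to get from start to
// target, or -1 if target cannot be reached by following links.
// Assumption: links is a hypothetical map from a page title to the titles it
// links to; it is not taken from the WikiIndex source.
func distance(links map[string][]string, start, target string) int {
	if start == target {
		return 0
	}
	// dist records the BFS depth of every page seen so far.
	dist := map[string]int{start: 0}
	queue := []string{start}
	for len(queue) > 0 {
		page := queue[0]
		queue = queue[1:]
		for _, next := range links[page] {
			if _, seen := dist[next]; seen {
				continue // already reached via an equally short or shorter path
			}
			dist[next] = dist[page] + 1
			if next == target {
				return dist[next]
			}
			queue = append(queue, next)
		}
	}
	return -1
}

func main() {
	links := map[string][]string{
		"Go":     {"Google", "C"},
		"Google": {"Search engine"},
		"C":      {"Unix"},
		"Unix":   {"Linux"},
	}
	fmt.Println(distance(links, "Go", "Linux")) // Go -> C -> Unix -> Linux
	fmt.Println(distance(links, "Linux", "Go")) // Linux has no outgoing links
}
```

Because links are directed, the distance from A to B can differ from the distance from B to A, or one direction may be unreachable.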

First, download the dataset, e.g. from here. Next, build and run the index:

go build
./WikiIndex -i ~/Downloads/simplewiki-20170820-pages-meta-current.xml.bz2 

After that, the web interface is available at http://localhost:8080.

About

Retrieve the distance between two Wikipedia pages
