This repository has been archived by the owner on Feb 9, 2023. It is now read-only.

A simple web crawler, indexer, and searcher for Persian Wikipedia based on Scrapy and Elasticsearch.


smmsadrnezh/wikipedia_searcher


This program is a simple web crawler, indexer, and searcher for Persian Wikipedia, based on Python 3.5 and Elasticsearch.

How to run

This program requires Python 3.5 and several Python packages. You can install all of the packages with the following command:

pip3 install -r requirements.txt

How to use

Start the Elasticsearch service on your computer:

service elasticsearch start

Run the program from your console:

python3.5 start.py

First, collect your data corpus by choosing the first option shown in the console. Next, run the indexing and clustering operations. Once both are done, you have all the data needed to run searches.
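
The order of these steps matters: searching needs an index, and an index needs a corpus. As a rough illustration only, the sketch below mimics that flow with an in-memory inverted index standing in for Elasticsearch and a hand-written corpus standing in for the crawler. The names `tokenize`, `build_index`, and `search`, as well as the sample documents, are hypothetical and do not come from this repository.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (real Persian text needs more normalization)."""
    return text.lower().split()


def build_index(corpus):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Step 1 (stand-in for crawling): a hypothetical, hand-written corpus.
corpus = {
    1: "تهران پایتخت ایران است",
    2: "اصفهان شهری در ایران است",
    3: "دانشنامه آزاد است",
}

# Step 2 (stand-in for indexing): build the inverted index.
index = build_index(corpus)

# Step 3: search only works once the index exists.
print(sorted(search(index, "ایران است")))  # prints [1, 2]
```

In the actual program, Elasticsearch takes the place of `build_index` and `search`, so this sketch shows only the dependency between the steps, not how the project implements them.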

TODO

  1. Frontier Queue Prioritization

  2. Using Mutual Information to identify cluster names

  3. Progress Bar for K-means

  4. Finding best K with the given maximum L
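
None of these items is implemented yet. For item 4, one possible approach (an assumption, not the author's planned method) is to run K-means for K = 1 up to the maximum L and stop as soon as adding a cluster no longer reduces the within-cluster error by much. The sketch below uses plain Lloyd's algorithm with a deterministic initialization; the names `kmeans`, `best_k`, and `min_gain`, the 0.3 threshold, and the sample points are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; initial centroids are evenly spaced input points."""
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


def best_k(points, max_k, min_gain=0.3):
    """Largest K <= max_k reached before the relative inertia gain drops below min_gain."""
    best = 1
    _, prev = kmeans(points, 1)
    for k in range(2, max_k + 1):
        _, inertia = kmeans(points, k)
        if prev == 0 or (prev - inertia) / prev < min_gain:
            break
        best, prev = k, inertia
    return best


# Three well-separated groups of 2D points (hypothetical data).
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]
print(best_k(points, max_k=5))  # prints 3
```

This heuristic stops at the first small gain, so it can settle too early on data where the improvement is uneven across K; a real implementation would likely need a more robust criterion and a better initialization.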
