Symfony documentation bot

It uses an Elasticsearch backend with the Symfony docs re-indexed per paragraph, which allows paragraph-level searching and yields more exact results. The query itself, however, still has room for improvement.

It consists of two parts: the scraper, which is a spider written for Scrapy, and a plugin for CloudBot.

##Tagging/boosting results

Specific URLs can be tagged by editing tags.json, which is loaded and consumed at indexing time. The format is:

{"url": [0.00, ["tag 1","tag n"]]}

Where url is the exact link to the section (including the # part). The value assigned to it is an array with exactly 2 elements: the first is the boost to be added to the default/computed one, and the second is an array of tags (tags may contain spaces).

The smallest step for the boost is 0.05. The boost is applied regardless of the tags, which means that if the indexed entry is matched for a different reason, the boost specified here still applies. Use the boost only for articles that are important to show up in their topic.
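The format and boost rules above can be sketched as a small validation helper. This is only an illustration and not the project's actual indexing code: the function names (`validate_entry`, `load_tags`, `final_boost`) are hypothetical, and treating every boost as a whole multiple of 0.05 is an assumption based on the "smallest step" wording.

```python
import json
import math

BOOST_STEP = 0.05  # smallest boost step named in the text


def validate_entry(url, value):
    """Check one tags.json entry and return it as (boost, tags)."""
    if not isinstance(value, list) or len(value) != 2:
        raise ValueError(f"{url}: expected [boost, [tags]]")
    boost, tags = value
    if isinstance(boost, bool) or not isinstance(boost, (int, float)):
        raise ValueError(f"{url}: boost must be a number")
    # Assumption: boosts are whole multiples of the 0.05 step.
    steps = round(boost / BOOST_STEP)
    if not math.isclose(steps * BOOST_STEP, boost, abs_tol=1e-9):
        raise ValueError(f"{url}: boost {boost} is not a multiple of {BOOST_STEP}")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError(f"{url}: tags must be a list of strings")
    return float(boost), tags


def load_tags(text):
    """Parse the JSON text of tags.json into {url: (boost, tags)}."""
    data = json.loads(text)
    return {url: validate_entry(url, value) for url, value in data.items()}


def final_boost(default_boost, url, tag_map):
    """Add the configured boost to the default/computed one.

    The boost applies regardless of which tag (if any) matched;
    URLs that are not listed keep their default boost.
    """
    extra, _tags = tag_map.get(url, (0.0, []))
    return default_boost + extra
```

A call such as `final_boost(1.0, url, load_tags(open("tags.json").read()))` would then give the boost used for that section, assuming the default boost is computed elsewhere.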

Changes to the tags are welcome, but will only be accepted if well reasoned and well formatted. The tags cannot contain duplicates, including duplicates of the tags generated by the indexing. At the moment, the only generated tags are the component names for URLs ending in introduction.html; for example, http://symfony.com/doc/current/components/dependency_injection/introduction.html already has the tag dependency injection generated for it.
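As a rough sketch of the duplicate rule, the generated tag can be derived from the URL and checked against the manual tags before submitting a change. The function names are hypothetical, the underscore-to-space conversion is inferred from the dependency injection example, and the case-sensitive comparison is an assumption; the real indexer may behave differently.

```python
from urllib.parse import urlparse


def generated_tags(url):
    """Return the tags the indexer would presumably generate for url."""
    parts = urlparse(url).path.strip("/").split("/")
    # Only component introduction pages get a generated tag.
    if len(parts) >= 2 and parts[-1] == "introduction.html":
        # Assumption: the directory name becomes the tag, with
        # underscores turned into spaces.
        return [parts[-2].replace("_", " ")]
    return []


def check_no_duplicates(url, manual_tags):
    """Raise ValueError if manual_tags repeat each other or a generated tag."""
    seen = set(generated_tags(url))
    for tag in manual_tags:
        if tag in seen:
            raise ValueError(f"{url}: duplicate tag {tag!r}")
        seen.add(tag)
```

For the introduction page mentioned above, adding "dependency injection" by hand would be rejected, since that tag is already generated.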

##Synonyms

Synonyms are declared in synonyms.txt, following the Elasticsearch (ES) synonym format.

##Vagrant up!

To run it in Vagrant, cd into the directory and run vagrant up. It will try to use 192.168.100.2 as a private IP. After the VM is up, you will have to edit the config for the bot in /opt/cloudbot/config; it is pretty straightforward. To run the scraping, cd into /vagrant/symfony_doc_spider and run scrapy crawl doc. You can add -L INFO for less verbose output.

#####Contributors

Lumbendil
