kenob/solr_conf

solr_conf

This repository holds the Solr configuration files for indexing news articles from the NY Times news corpus and Wikipedia articles from Wikimedia XML files.

Basic Setup

- Install Solr as described here
- Navigate to the example folder in the directory of your Solr installation
- Delete the solr folder
- Clone this repository, and rename the resulting folder to "solr"

Instance-Specific Setup

NY Times Corpus
- Create a folder
- Download all tgz files from the NYT Corpus folder under projects on ublearns, and place them in the folder you just created
- Extract the tgz files
- Change the newsDataDirectory property on line 10 of newsArticleCollection/core.properties to the ABSOLUTE path of the folder created above. No quotes are required, but on Windows you'll have to replace "\\" with "\\\"
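The extraction step can be scripted. The following is a minimal sketch using only the Python standard library; the function name extract_all_tgz is hypothetical and not part of this repository, and it assumes the archives sit directly inside the data folder.

```python
import tarfile
from pathlib import Path


def extract_all_tgz(data_dir):
    """Extract every .tgz archive found directly inside data_dir, in place.

    Hypothetical helper for illustration; returns the names of the
    archives that were extracted, in sorted order.
    """
    data_dir = Path(data_dir).resolve()
    extracted = []
    for archive in sorted(data_dir.glob("*.tgz")):
        # "r:gz" opens a gzip-compressed tar archive for reading
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=data_dir)
        extracted.append(archive.name)
    return extracted
```

Calling extract_all_tgz with the path of the folder created above unpacks the archives next to the downloaded files. Only run it on archives from a source you trust, since extracting a tar file can write arbitrary paths.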
Wikipedia Articles
- Download and extract the Wikipedia XML files [[TODO:place url here]] and put them in a folder
- Repeat the final step for the NY Times corpus above, replacing newsDataDirectory with wikiDataDirectory and newsArticleCollection with wikiArticleCollection

Usage

Indexing

- Open the Solr admin panel and execute a dataImport on the newsArticleCollection core to index news articles, or on the wikiArticleCollection core to index Wikipedia articles
OR
  • Navigate to the flask app base folder
  • With your virtual environment activated, run the command "python run.py refresh_index"
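A data import can also be triggered over HTTP instead of through the admin panel. The sketch below is not the code behind refresh_index; it assumes Solr is running at the default http://localhost:8983/solr address and that each core registers the DataImportHandler at /dataimport, neither of which this README confirms. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dataimport_url(core, base_url="http://localhost:8983/solr", command="full-import"):
    """Build the DataImportHandler URL for a core (assumed /dataimport path)."""
    query = urlencode({"command": command, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/dataimport?{query}"


def start_import(core, **kwargs):
    """Send the import command and return Solr's raw response text."""
    with urlopen(dataimport_url(core, **kwargs)) as response:
        return response.read().decode("utf-8")
```

For example, start_import("newsArticleCollection") would request a full import of the news core, and passing command="status" asks the handler for the state of a running import.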

Searching

- Run queries on the Solr admin panel
- To search in the flask web application, import and use either the search or get_item helper defined in app/utils.py as shown in app/views.py
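Queries can also be sent straight to a core's standard select handler, which is handy for checking that indexing worked. This sketch is independent of the search and get_item helpers in app/utils.py, whose signatures are not shown here; it assumes the default http://localhost:8983/solr address, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(core, query, rows=10, base_url="http://localhost:8983/solr"):
    """Build a /select URL that asks Solr for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def run_query(core, query, **kwargs):
    """Run the query and return the list of matching documents."""
    with urlopen(select_url(core, query, **kwargs)) as response:
        return json.load(response)["response"]["docs"]
```

For example, run_query("wikiArticleCollection", "*:*", rows=5) would return up to five documents from the Wikipedia core once it has been indexed.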

About

solr config for titansearch
