solr_conf

This repository currently holds the Solr configuration files for indexing news articles from the NY Times news corpus and Wikipedia articles from Wikimedia XML files.

Basic Setup

- Install Solr as described here.
- Navigate to the example folder inside your Solr installation directory.
- Delete the solr folder.
- Clone this repository and rename the resulting folder to "solr".

Instance-Specific Setup

NY Times Corpus

- Create a folder.
- Download all .tgz files from the NYT Corpus folder under Projects on UBlearns and place them in that folder.
- Extract the .tgz files.
- Set the newsDataDirectory property on line 10 of newsArticleCollection/core.properties to the ABSOLUTE path of that folder. No quotes are required, but on Windows you must escape each backslash, i.e. replace "\" with "\\".
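If there are many archives, the extraction step can be scripted. The following is a minimal Python sketch, assuming all the .tgz files sit directly in the folder you created and should be extracted into that same folder; the function name extract_all_tgz is hypothetical and not part of this repository.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def extract_all_tgz(folder: str) -> list[Path]:
    """Extract every .tgz archive found directly in `folder` into `folder`.

    Returns the archives that were extracted, in sorted order.
    """
    root = Path(folder).resolve()
    extracted: list[Path] = []
    for archive in sorted(root.glob("*.tgz")):
        with tarfile.open(archive, "r:gz") as tar:
            # Use the "data" extraction filter when this Python provides it;
            # it refuses members that would be written outside `root`.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(path=root, filter="data")
            else:
                tar.extractall(path=root)
        extracted.append(archive)
    return extracted
```

For example, extract_all_tgz("/home/me/nyt_corpus") (a placeholder path) extracts every archive in that folder; the resulting absolute path is what newsDataDirectory should point to.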

Wikipedia Articles

- Download and extract the Wikipedia XML files [[TODO:place url here]] and put them in a folder.
- Repeat the final step of the NY Times corpus setup above, replacing newsDataDirectory with wikiDataDirectory and newsArticleCollection with wikiArticleCollection.
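The property edit for either core can also be done from Python, which takes care of the Windows backslash escaping. This is a minimal sketch, assuming core.properties uses plain key=value lines; set_core_property is a hypothetical helper that matches the property by key rather than relying on it being on line 10.

```python
from pathlib import Path


def set_core_property(properties_file: str, key: str, value: str) -> None:
    """Set `key=value` in a core.properties file, appending it if missing."""
    path = Path(properties_file)
    # Java .properties files treat a backslash as an escape character, so
    # each "\" in a Windows path has to be written as "\\".
    escaped = value.replace("\\", "\\\\")
    new_line = f"{key}={escaped}"
    lines = path.read_text(encoding="utf-8").splitlines()
    for i, line in enumerate(lines):
        # Assumes "key=value" lines; compare the whole key, not a prefix.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Usage, with placeholder data paths: set_core_property("newsArticleCollection/core.properties", "newsDataDirectory", r"C:\data\nyt") and set_core_property("wikiArticleCollection/core.properties", "wikiDataDirectory", "/home/me/wiki").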

Usage

Indexing

- Open the Solr admin panel and run a dataImport on either the newsArticleCollection or the wikiArticleCollection core to index news articles or Wikipedia articles, respectively.

OR

- Navigate to the base folder of the Flask app.
- With your virtual environment activated, run the command "python run.py refresh_index".
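The admin-panel dataImport can also be triggered over HTTP. The sketch below is a minimal example, assuming Solr runs locally on its default port 8983 and that each core registers Solr's DataImportHandler at /dataimport in its solrconfig.xml (check the core's config if the path differs). It does not reproduce what run.py refresh_index does; dataimport_url and run_dataimport are hypothetical names. Note that the DataImportHandler is bundled with Solr up to 8.x and is not shipped with Solr 9.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions: local Solr on the default port, DataImportHandler at /dataimport.
SOLR_BASE_URL = "http://localhost:8983/solr"
CORES = ("newsArticleCollection", "wikiArticleCollection")


def dataimport_url(core: str, command: str = "full-import",
                   base_url: str = SOLR_BASE_URL) -> str:
    """Build the DataImportHandler URL for one of this repository's cores."""
    if core not in CORES:
        raise ValueError(f"unknown core: {core}")
    params = urlencode({"command": command, "clean": "true",
                        "commit": "true", "wt": "json"})
    return f"{base_url}/{core}/dataimport?{params}"


def run_dataimport(core: str, command: str = "full-import") -> dict:
    """Send a DataImportHandler command and return Solr's JSON response."""
    with urlopen(dataimport_url(core, command), timeout=30) as resp:
        return json.load(resp)
```

run_dataimport("wikiArticleCollection") starts a full import; because imports run asynchronously, calling run_dataimport("wikiArticleCollection", "status") afterwards reports progress.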

Searching

- Run queries in the Solr admin panel.
- To search from the Flask web application, import either the search or the get_item helper defined in app/utils.py and use it as shown in app/views.py.
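For a sense of what a search helper talks to, here is a minimal sketch of querying a core through Solr's standard /select handler. It is not the implementation in app/utils.py; select_url, extract_docs, and solr_search are hypothetical names, and it assumes a local Solr on the default port with JSON responses.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE_URL = "http://localhost:8983/solr"  # assumed local Solr instance


def select_url(core: str, query: str, rows: int = 10,
               base_url: str = SOLR_BASE_URL) -> str:
    """Build a /select query URL for a core."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/{core}/select?{params}"


def extract_docs(response: dict) -> list:
    """Pull the matching documents out of a Solr JSON response."""
    return response.get("response", {}).get("docs", [])


def solr_search(core: str, query: str, rows: int = 10) -> list:
    """Query a core and return the matching documents."""
    with urlopen(select_url(core, query, rows), timeout=10) as resp:
        return extract_docs(json.load(resp))
```

For example, solr_search("newsArticleCollection", "election", rows=5) returns up to five matching documents as dictionaries; which fields they contain depends on the core's schema.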

kenob/solr_conf