The following information was contributed by Praful Bagai and J. Gobel.
This tutorial assumes that you are customizing the Reuters tutorial. It has been tested with Solr 3.6 and Nutch 1.6.
Solr and Nutch
In your Solr schema, change
<field name="content" type="text" stored="false" indexed="true"/>
to
<field name="content" type="text" stored="true" indexed="true"/>
to make the value of the content field retrievable during a search.
Check the following properties in your Nutch configuration:
<property>
  <name>fetcher.store.content</name>
  <value>true</value>
  <description>If true, fetcher will store content.</description>
</property>
<property>
  <name>parser.caching.forbidden.policy</name>
  <value>content</value>
  <description>If a site (or a page) requests through its robot metatags
  that it should not be shown as cached content, apply this policy. Currently
  three keywords are recognized: "none" ignores any "noarchive" directives.
  "content" doesn't show the content, but shows summaries (snippets).
  "all" doesn't show either content or summaries.</description>
</property>
You may also need to copy fields from your Nutch schema to your Solr schema.
Next, follow this tutorial up to step 3.1. At step 3.1, do not run the command below, then continue up to step 6. You should then be able to log in to your Solr server and search for what Nutch crawled.
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
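Once the crawl has been indexed into Solr, one way to confirm that the content field is retrievable is to request it explicitly in a query. The sketch below only builds the query URL. The helper name buildSelectUrl is hypothetical, and the /select path and the localhost address are assumptions based on a default Solr setup, so adjust them to your installation.

```javascript
// Hypothetical helper: builds a Solr select URL that asks for specific
// stored fields. Assumes the default /select request handler.
function buildSelectUrl(solrUrl, query, fields, rows) {
  // Drop trailing slashes so the path is joined cleanly.
  var base = solrUrl.replace(/\/+$/, '');
  var params = [
    'q=' + encodeURIComponent(query),
    'fl=' + encodeURIComponent(fields.join(',')),
    'rows=' + rows,
    'wt=json'
  ];
  return base + '/select?' + params.join('&');
}

// Ask for the title, url and content of the first five documents.
var checkUrl = buildSelectUrl(
  'http://localhost:8983/solr/', '*:*', [ 'title', 'url', 'content' ], 5);
```

Opening the resulting URL in a browser should return documents that include a content value if the field is stored.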
Download AJAX Solr and unpack the ZIP file into its own directory where your web server can find it.
Update solrUrl to point to your Solr server, and update the Solr parameters in var params to reflect the structure of your Solr documents:
- Replace the value of facet.field with the fields on which you want to facet, e.g. [ 'title' ]
- Remove f.countryCodes.facet.limit unless your Solr documents have a countryCodes field
- Remove all facet.date parameters unless your Solr documents have a date field on which you want to facet
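The parameter changes listed above can be sketched as a small transformation. The helper name adaptParams is hypothetical, and the input object is only an illustration of a Reuters-style parameter set (its values are assumptions, not copied from the example). In practice you would probably edit var params by hand.

```javascript
// Hypothetical helper: adapts Reuters-style Solr parameters to a Nutch index.
function adaptParams(params, facetFields) {
  var adapted = {};
  for (var name in params) {
    if (!Object.prototype.hasOwnProperty.call(params, name)) continue;
    // Drop the per-field limit for countryCodes.
    if (name === 'f.countryCodes.facet.limit') continue;
    // Drop facet.date and every facet.date.* parameter.
    if (name.indexOf('facet.date') === 0) continue;
    adapted[name] = params[name];
  }
  // Facet on the fields that exist in the Nutch documents.
  adapted['facet.field'] = facetFields;
  return adapted;
}

// Illustrative input only; these values are assumptions.
var reutersStyle = {
  facet: true,
  'facet.field': [ 'topics', 'countryCodes' ],
  'facet.limit': 20,
  'f.countryCodes.facet.limit': -1,
  'facet.date': 'date',
  'facet.date.gap': '+1DAY',
  'json.nl': 'map'
};

var nutchParams = adaptParams(reutersStyle, [ 'title' ]);
```

Here nutchParams keeps facet, facet.limit and json.nl, and facets on title only.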
Either update or remove the tag cloud, autocomplete, country code and calendar widgets. For the tag cloud, you can set the associated Solr fields by changing the value of var fields, e.g. [ 'title', 'url', 'content' ].
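As a rough sketch of how var fields might drive the tag cloud widgets, the snippet below builds one options object per field. The helper name tagcloudOptions is hypothetical. The id, target and field option names follow the pattern used by the Reuters example as I understand it, so check them against your copy of the code. Each target also assumes that the page has an element with a matching id.

```javascript
// Fields from the Nutch index that should each get a tag cloud.
var fields = [ 'title', 'url', 'content' ];

// Hypothetical helper: builds the options for one tag cloud widget.
// The target selector assumes an element with that id exists in the page.
function tagcloudOptions(field) {
  return { id: field, target: '#' + field, field: field };
}

var widgetOptions = fields.map(tagcloudOptions);

// In the page itself, each options object would be handed to the widget,
// for example (requires AJAX Solr to be loaded):
// Manager.addWidget(new AjaxSolr.TagcloudWidget(widgetOptions[i]));
```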
Nutch uses a content field, instead of a text field like Reuters. In examples/reuters/widgets/ResultWidget.js, in the template method, replace all occurrences of doc.text with doc.content. Nutch has no dateline field, so remove all occurrences of doc.dateline + ' ' +.
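After these changes, the template method might look roughly like the standalone function below. This is a sketch modeled on the Reuters example, not a verbatim copy. The 300-character cutoff, the markup, and the fallback to an empty string when content is missing are assumptions.

```javascript
// Sketch of a result template for Nutch documents. It reads doc.content
// and has no dateline prefix.
function template(doc) {
  // Assumption: treat a missing content field as an empty string.
  var content = doc.content || '';
  var snippet = '';
  if (content.length > 300) {
    // Show the first 300 characters and hide the rest behind a "more" link.
    snippet += content.substring(0, 300);
    snippet += '<span style="display:none;">' + content.substring(300);
    snippet += '</span> <a href="#" class="more">more</a>';
  } else {
    snippet += content;
  }
  var output = '<div><h2>' + doc.title + '</h2>';
  output += '<p id="links_' + doc.id + '" class="links"></p>';
  output += '<p>' + snippet + '</p></div>';
  return output;
}
```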
You should now be able to open examples/reuters/index.html in a browser.