
Tutorial: Nutch

jpmckinney edited this page Nov 26, 2012 · 6 revisions

The following information was contributed by Praful Bagai.

This tutorial assumes that you are customizing the Reuters tutorial.

Widgets

In reuters.js, update the Solr parameters in var params to reflect the structure of your Solr documents:

  • Update facet.field with the fields on which you want to facet
  • Remove f.topics.facet.limit and f.countryCodes.facet.limit unless your Solr documents have topics or countryCodes fields
  • Remove all facet.date parameters unless your Solr documents have a date field on which you want to facet
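After these changes, the parameter block might look roughly like the sketch below. The field names host and type are assumptions chosen for illustration; replace them with fields that your Solr schema actually indexes. The loop assumes that reuters.js hands each parameter to Manager.store.addByValue, as the Reuters tutorial does, and the typeof guard exists only so the sketch can run outside the page.

```javascript
// Sketch of var params in reuters.js for a Nutch-backed index.
// 'host' and 'type' are assumed field names, not taken from the tutorial.
var params = {
  facet: true,
  'facet.field': ['host', 'type'],
  'facet.limit': 20,
  'facet.mincount': 1,
  'json.nl': 'map'
  // f.topics.facet.limit, f.countryCodes.facet.limit and all
  // facet.date* entries are left out, per the list above.
};

// Manager is the AjaxSolr manager that reuters.js creates earlier in the file.
if (typeof Manager !== 'undefined') {
  for (var name in params) {
    Manager.store.addByValue(name, params[name]);
  }
}
```

If your documents do have a date field you want to facet on, keep the facet.date parameters and point them at that field instead of deleting them.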

Either update or remove the tag cloud, autocomplete, country code and calendar widgets. For the tag cloud, you can set the associated Solr fields by changing the value of var fields.

Theme

Nutch stores page text in a content field, whereas the Reuters documents use a text field. In reuters.theme.js, in the AjaxSolr.theme.prototype.snippet function, replace every occurrence of doc.text with doc.content. Nutch has no dateline field, so also remove doc.dateline + ' ' + wherever it appears in that function.
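A minimal sketch of what the adjusted snippet function could look like follows. The 300-character cutoff and the "more" link markup are assumptions modeled on the Reuters theme, so keep whatever your copy of reuters.theme.js uses. The fallback to an empty string for documents without a content field is an addition for safety, not part of the original theme.

```javascript
// Sketch of a Nutch-friendly snippet function: reads doc.content
// and no longer prepends doc.dateline.
function snippet(doc) {
  // Assumed safeguard: treat a missing content field as empty text.
  var text = doc.content || '';
  var output = '';
  if (text.length > 300) {
    // Show the first 300 characters and hide the rest behind a "more" link.
    output += text.substring(0, 300);
    output += '<span style="display:none;">' + text.substring(300);
    output += '</span> <a href="#" class="more">more</a>';
  } else {
    output += text;
  }
  return output;
}

// In reuters.theme.js this would replace the existing theme function.
if (typeof AjaxSolr !== 'undefined') {
  AjaxSolr.theme.prototype.snippet = snippet;
}
```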

Configuration files

Check that the following properties in your nutch-default.xml are set as shown:

<property>
  <name>fetcher.store.content</name>
  <value>true</value>
  <description>If true, fetcher will store content.</description>
</property>
<property>
  <name>parser.caching.forbidden.policy</name>
  <value>content</value>
  <description>If a site (or a page) requests through its robot metatags
  that it should not be shown as cached content, apply this policy. Currently
  three keywords are recognized: "none" ignores any "noarchive" directives.
  "content" doesn't show the content, but shows summaries (snippets).
  "all" doesn't show either content or summaries.</description>
</property>

You may also need to copy fields from your Nutch schema to your Solr schema.
