This repository has been archived by the owner on Jun 3, 2020. It is now read-only.

fix urls, filters and conf for doc indexing
eherget committed Sep 24, 2017
1 parent 45c2868 commit 627709d
Showing 3 changed files with 27 additions and 6 deletions.
@@ -39,6 +39,8 @@
 # accept hosts in MY.DOMAIN.NAME
 #+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
 #+^http://([a-z0-9]*\.)*redhat.com/
+# skip the single page html doc. the - line must come first
+-^https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html-single/
 +^https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/
 +^https://github.com/spacewalkproject/spacewalk/wiki/ReleaseNotes27
 # skip everything else
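
The ordering note in the filter hunk above ("the - line must come first") makes sense if rules are checked top to bottom and the first matching pattern decides the outcome. The Python sketch below illustrates that first-match behaviour with the three active rules from the hunk. `RULES` and `accepts` are hypothetical names, and rejecting unmatched URLs is an assumption standing in for the "skip everything else" rule that the hunk cuts off; this is not Nutch's own code.

```python
import re

# Active rules from the hunk, in file order: (sign, pattern).
# "-" rejects a matching URL, "+" accepts it.
RULES = [
    ("-", r"^https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html-single/"),
    ("+", r"^https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/"),
    ("+", r"^https://github.com/spacewalkproject/spacewalk/wiki/ReleaseNotes27"),
]


def accepts(url, rules=RULES):
    """Return True if the first rule that matches `url` is a '+' rule.

    Assumption: a URL that matches no rule is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False
```

With the rules in this order, a URL under `html-single/` hits the `-` rule before the broader `5.8/` rule can accept it. If the first two rules were swapped, the broader `+` rule would match first and the single-page documents would be crawled after all.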
19 changes: 19 additions & 0 deletions search-server/spacewalk-doc-indexes/crawl_www/conf/nutch-site.xml
@@ -5,6 +5,15 @@

 <configuration>

+<property>
+<name>http.content.limit</name>
+<value>-1</value>
+<description>The length limit for downloaded content, in bytes.
+If this value is nonnegative (>=0), content longer than it will be truncated;
+otherwise, no truncation at all.
+</description>
+</property>
+
 <property>
 <name>http.agent.name</name>
 <value>Spacewalk</value>
@@ -24,6 +33,16 @@
 </description>
 </property>

+<property>
+<name>http.robots.agents</name>
+<value>Spacewalk,*</value>
+<description>The agent strings we'll look for in robots.txt files,
+comma-separated, in decreasing order of precedence. You should
+put the value of http.agent.name as the first agent name, and keep the
+default * at the end of the list. E.g.: BlurflDev,Blurfl,*
+</description>
+</property>
+
 <property>
 <name>fetcher.server.delay</name>
 <value>0</value>
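
The `http.content.limit` description in the nutch-site.xml diff above reduces to a simple rule: a non-negative limit truncates downloaded content to that many bytes, and a negative value such as the `-1` set in this commit disables truncation. The Python sketch below only restates that description; `apply_content_limit` is a hypothetical helper, not part of Nutch.

```python
def apply_content_limit(content: bytes, limit: int) -> bytes:
    """Apply the truncation rule from the http.content.limit description.

    limit >= 0: keep at most `limit` bytes.
    limit < 0:  return the content unchanged (no truncation).
    """
    if limit >= 0:
        return content[:limit]
    return content
```

Under this reading, setting the value to `-1` means long documentation pages are passed on in full rather than cut off at a byte limit.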
12 changes: 6 additions & 6 deletions search-server/spacewalk-doc-indexes/crawl_www/urls/urls.txt
@@ -1,7 +1,7 @@
 https://github.com/spacewalkproject/spacewalk/wiki/ReleaseNotes27
-https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/installation_guide/index
-https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/proxy_installation_guide/index
-https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/client_configuration_guide/index
-https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/user_guide/index
-https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/getting_started_guide/index
-https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/api_guide/index
+https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/installation_guide
+https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/proxy_installation_guide
+https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/client_configuration_guide
+https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/user_guide
+https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/getting_started_guide
+https://access.redhat.com/documentation/en-us/red_hat_satellite/5.8/html/api_guide
