* Split the Robot* directives into their own "robots.cfg" file to make
      it easier to keep the file up to date.

      If, like me, you have multiple Interchange instances per server,
      then you can symlink all of the robots.cfg files together.  This
      means that you only have to edit one file to update all of the lists
      used by all of the Interchange instances.

    * Modified the Robot* lists to reflect the values currently in use on
      my servers.
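
The symlink arrangement described above could look roughly like the following. This is only a sketch: the directory names (`shared`, `ic-one`, `ic-two`) are hypothetical and are created in a scratch directory, and the three-entry `RobotUA` list is a stand-in for the real file.

```shell
#!/bin/sh
# Sketch only: every path below is hypothetical and lives in a scratch directory.
set -eu

base=$(mktemp -d)
mkdir -p "$base/shared" "$base/ic-one" "$base/ic-two"

# One master copy of the robot lists (shortened stand-in content).
printf 'RobotUA <<EOR\nbot, crawl, spider,\nEOR\n' > "$base/shared/robots.cfg"

# Each instance gets a symlink where its interchange.cfg expects robots.cfg.
for inst in "$base/ic-one" "$base/ic-two"; do
    ln -s "$base/shared/robots.cfg" "$inst/robots.cfg"
done

# An edit to the master copy is visible through every link.
printf '# updated\n' >> "$base/shared/robots.cfg"
tail -n 1 "$base/ic-two/robots.cfg"
```

Each running instance would presumably still need to reload its configuration before the edited list takes effect.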
Kevin Walsh committed Mar 8, 2007
1 parent 12482f4 commit 27783a0
Showing 3 changed files with 39 additions and 27 deletions.
1 change: 1 addition & 0 deletions MANIFEST
@@ -600,6 +600,7 @@ dist/lib/UI/vars/UI_STD_FILE_NAV
dist/lib/UI/vars/UI_STD_FOOTER
dist/lib/UI/vars/UI_STD_HEAD
dist/locale.error
+dist/robots.cfg
dist/src/compile.pl
dist/src/config.h.in
dist/src/configure
29 changes: 2 additions & 27 deletions dist/interchange.cfg.dist
@@ -119,30 +119,5 @@ include usertag/*.tag
# Include if you want non-English error messages. Not complete.
# include locale.error

-RobotUA <<EOR
-ATN_Worldwide, AltaVista, Arachnoidea, Aranha, Architext, Ask, Atomz,
-BackRub, Builder, CMC, Contact, Digital*Integrity, Directory, EZResult,
-Excite, Ferret, Fireball, GoogleBot, Gromit, Gulliver, Harvest, Hubater,
-H?m?h?kki, INGRID, IncyWincy, Jack, KIT*Fireball, Kototoi, LWP, Lycos,
-MegaSheep, Mercator, Nazilla, NetMechanic, NetResearchServer, NetScoop,
-ParaSite, Refiner, RoboDude, Rover, Rutgers, Scooter, Slurp, Spyder,
-T-H-U-N-D-E-R-S-T-O-N-E, Toutatis, Tv*Merc, Valkyrie, Voyager, WIRE,
-Walker, Wget, WhizBang, Wire, Wombat, Yahoo, Yandex, ZyBorg, appie,
-asterias, bot, contact, crawl, collector, fido, find, gazz, grabber,
-griffon, archiver, legs, marvin, mirago, moget, newscan, seek, speedy,
-spider, suke, tarantula, agent, topiclink, whowhere, winona, worm, xtreme,
-EOR
-
-RobotIP <<EOR
-202.9.155.123, 204.152.191.41, 208.146.26.19,
-208.146.26.233, 209.185.141.209, 209.185.141.211,
-209.202.148.36, 209.202.148.41, 216.200.130.207,
-216.35.103.6?, 216.35.103.70,
-EOR
-
-RobotHost <<EOR
-*.crawler*.com, *.excite.com, *.googlebot.com,
-*.infoseek.com, *.inktomi.com, *.inktomisearch.com,
-*.lycos.com, *.pa-x.dec.com, add-url.altavista.com,
-westinghouse-rsl-com-usa.NorthRoyalton.cw.net,
-EOR
+# Include the robot recognition lists
+include robots.cfg
36 changes: 36 additions & 0 deletions dist/robots.cfg
@@ -0,0 +1,36 @@
+# $Id: robots.cfg,v 2.1 2007-03-08 15:03:01 kwalsh Exp $
+
+RobotUA <<EOR
+ATN_Worldwide, AltaVista, Arachnoidea, Aranha, Architext, Argus, Ask,
+Atomz, BackRub, Bookdog, BookmarkSync, Builder, CFNetwork, CMC, Contact,
+Creep, Digital*Integrity, Directory, EZResult, Excite, FavOrg, Ferret,
+Fireball, GoogleBot, Google-Sitemaps, GetRight, Gromit, Gulliver, Harvest,
+Hubater, H?m?h?kki, INGRID, IncyWincy, Jack, JPluck, KIT*Fireball, Kototoi,
+Leech, LWP, Lycos, Mediapartners, MegaSheep, Mercator, MimeLive, Miva,
+Nazilla, NetMechanic, NetScoop, Nutch, Ocelli, ParaSite, Pokey, Pompos,
+Refiner, RoboDude, Rover, Rutgers, Scooter, Slurp, Snappy, Snoopy, Spyder,
+T-H-U-N-D-E-R-S-T-O-N-E, Toutatis, Tv*Merc, Valkyrie, Voyager,
+W3C_Validator, Walker, WhizBang, Wire, Wombat, WordPress, Yahoo, Yandex,
+ZyBorg, adressendeutschland, archive, appie, agent, asterias, bot, ccubee,
+cfetch, contact, crawl, collector, complex_network_group, dogpile, fido,
+find, gazz, gonzo, grab, griffon, holmes, index, larbin, legs, locator,
+marvin, mirago, moget, newscan, ozelot, pagebull, retrieve, search, seek,
+speedy, silk, sna, spider, suke, swish, tarantula, topiclink, urllib,
+voyager, wget, whowhere, winona, worm, wwwster, xtreme,
+EOR
+
+RobotIP <<EOR
+202.9.155.123, 204.152.191.41, 208.146.26.19,
+208.146.26.233, 209.185.141.209, 209.185.141.211,
+209.202.148.36, 209.202.148.41, 216.200.130.207,
+216.35.103.6?, 216.35.103.70,
+EOR
+
+RobotHost <<EOR
+*.ask.com, *.crawler*.com, *.csccorporatedomains.com,
+*.excite.com, *.analys.google.com, *.googlebot.com,
+*.infoseek.com, *.inktomi.com, *.inktomisearch.com,
+*.lycos.com, msnbot.msn.com, *.pa-x.dec.com,
+add-url.altavista.com,
+westinghouse-rsl-com-usa.NorthRoyalton.cw.net,
+EOR
