100644 219 lines (215 sloc) 3.291 kb
27783a0 * Split the Robot* directives into their own "robots.cfg" file to…
Kevin Walsh authored
1 RobotUA <<EOR
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
2 adressendeutschland,
6d39282 @jonjensen Add some more user-agents to the robots list
jonjensen authored
3 AdsBot-Google,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
4 agent,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
5 AltaVista,
6d39282 @jonjensen Add some more user-agents to the robots list
jonjensen authored
6 Apache (internal dummy connection),
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
7 appie,
8 AppleSyndication,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
9 Arachnoidea,
10 Aranha,
11 Architext,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
12 archive,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
13 Argus,
14 Ask,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
15 asterias,
16 ATN_Worldwide,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
17 Atomz,
263c49b * More robots recognised.
Kevin Walsh authored
18 AurNet,
5aaf698 * Added some RSS feed readers and other non-browsers, as seen in the
Kevin Walsh authored
19 Awasu,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
20 BackRub,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
21 bender,
5e6986c @racke MSNBot has been renamed to BingBot. Thanks to Justin La Sotten for th…
racke authored
22 bingbot,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
23 Bookdog,
24 BookmarkSync,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
25 bot,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
26 Builder,
6d39282 @jonjensen Add some more user-agents to the robots list
jonjensen authored
27 CCBot,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
28 ccubee,
29 cfetch,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
30 CFNetwork,
6d39282 @jonjensen Add some more user-agents to the robots list
jonjensen authored
31 check_http,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
32 CMC,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
33 collector,
34 complex_network_group,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
35 Contact,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
36 crawl,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
37 Creep,
38 Digital*Integrity,
39 Directory,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
40 dogpile,
3bf143a @racke Add robot DotBot (http://www.dotnetdotcom.org/) to RobotUA.
racke authored
41 DotBot,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
42 Excite,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
43 EZResult,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
44 FavOrg,
5aaf698 * Added some RSS feed readers and other non-browsers, as seen in the
Kevin Walsh authored
45 FeedDemon,
6d39282 @jonjensen Add some more user-agents to the robots list
jonjensen authored
46 FeedFetcher-Google,
5aaf698 * Added some RSS feed readers and other non-browsers, as seen in the
Kevin Walsh authored
47 Feedreader,
48 FeedValidator,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
49 Ferret,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
50 fido,
51 find,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
52 Fireball,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
53 gazz,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
54 GetRight,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
55 gonzo,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
56 Google-Sitemaps,
57 GoogleBot,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
58 grab,
59 griffon,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
60 Gromit,
61 Gulliver,
62 H?m?h?kki,
263c49b * More robots recognised.
Kevin Walsh authored
63 heritrix,
c331da5 * Periodic insertion of search engine robots, and other non-brows…
Kevin Walsh authored
64 HTTrack,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
65 Harvest,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
66 holmes,
5aaf698 * Added some RSS feed readers and other non-browsers, as seen in the
Kevin Walsh authored
67 HTMLDOC,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
68 Hubater,
69 IncyWincy,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
70 index,
71 INGRID,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
72 Jack,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
73 JPluck,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
74 KIT*Fireball,
75 Kototoi,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
76 larbin,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
77 Leech,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
78 legs,
1c761a2 @jonjensen Recognize as a robot LWP::UserAgent in its default configuration
jonjensen authored
79 libwww-perl,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
80 locator,
81 LWP,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
82 Lycos,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
83 marvin,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
84 Mediapartners,
85 MegaSheep,
5a679b0 * Allow comments (starting with "#" and ending with EOL) in the
Kevin Walsh authored
86 MEGAUPLOAD,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
87 Mercator,
5a679b0 * Allow comments (starting with "#" and ending with EOL) in the
Kevin Walsh authored
88 MFC_Tear_Sample,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
89 Microsoft Data Access,
90 Microsoft Office,
c331da5 * Periodic insertion of search engine robots, and other non-brows…
Kevin Walsh authored
91 Microsoft URL Control,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
92 Microsoft-WebDAV,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
93 MimeLive,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
94 mirago,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
95 Miva,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
96 moget,
97 MSFrontPage,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
98 Nazilla,
99 NetMechanic,
100 NetScoop,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
101 newscan,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
102 Nutch,
103 Ocelli,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
104 ozelot,
93ed82b * More non-browsers for the grinder.
Kevin Walsh authored
105 ozzie,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
106 pagebull,
06871a1 @jonjensen Add new robot User-Agent http://panscient.com/
jonjensen authored
107 panscient.com,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
108 ParaSite,
c331da5 * Periodic insertion of search engine robots, and other non-brows…
Kevin Walsh authored
109 pavuk,
93ed82b * More non-browsers for the grinder.
Kevin Walsh authored
110 POE-Component,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
111 Pokey,
112 Pompos,
113 Refiner,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
114 retrieve,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
115 RoboDude,
116 Rover,
5aaf698 * Added some RSS feed readers and other non-browsers, as seen in the
Kevin Walsh authored
117 Rssbandit,
118 RSSOwl,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
119 Rutgers,
120 Scooter,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
121 search,
122 seek,
c331da5 * Periodic insertion of search engine robots, and other non-brows…
Kevin Walsh authored
123 shelob,
6d39282 @jonjensen Add some more user-agents to the robots list
jonjensen authored
124 ShopWiki,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
125 silk,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
126 Slurp,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
127 sna,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
128 Snappy,
129 Snoopy,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
130 speedy,
131 spider,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
132 Spyder,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
133 suke,
134 Susie,
135 swish,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
136 T-H-U-N-D-E-R-S-T-O-N-E,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
137 tarantula,
138 topiclink,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
139 Toutatis,
c331da5 * Periodic insertion of search engine robots, and other non-brows…
Kevin Walsh authored
140 TurnitinBot,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
141 Tv*Merc,
6e8e55f * Twiceler experimental web crawler.
Kevin Walsh authored
142 Twiceler,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
143 urllib,
c331da5 * Periodic insertion of search engine robots, and other non-brows…
Kevin Walsh authored
144 VB Project,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
145 Valkyrie,
146 Voyager,
147 W3C_Validator,
148 Walker,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
149 wget,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
150 WhizBang,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
151 whowhere,
6e8e55f * Twiceler experimental web crawler.
Kevin Walsh authored
152 Wiki,
5aaf698 * Added some RSS feed readers and other non-browsers, as seen in the
Kevin Walsh authored
153 WinInet,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
154 winona,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
155 Wire,
156 Wombat,
157 WordPress,
158 worm,
159 wwwster,
93ed82b * More non-browsers for the grinder.
Kevin Walsh authored
160 WWW-Mechanize,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
161 xtreme,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
162 Yahoo,
163 Yandex,
263c49b * More robots recognised.
Kevin Walsh authored
164 Zeus,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
165 ZyBorg,
27783a0 * Split the Robot* directives into their own "robots.cfg" file to…
Kevin Walsh authored
166 EOR
167
56537e5 @jonjensen Recognize that AskTB is not a robot
jonjensen authored
168 NotRobotUA <<EOR
169 AskTB,
e993135 robots.cfg: Exclude non-robot Seekmo which got caught by "seek"
Richard Templet authored
170 Seekmo,
fba6e86 @perusionjosh Add SearchToolbar to NotRobotUA in robots.cfg.
perusionjosh authored
171 SearchToolbar,
7898616 @machack666 Add MSIE and Gecko UA strings to the default list of NotRobotUA patterns
machack666 authored
172 MSIE,
173 Gecko,
56537e5 @jonjensen Recognize that AskTB is not a robot
jonjensen authored
174 EOR
175
27783a0 * Split the Robot* directives into their own "robots.cfg" file to…
Kevin Walsh authored
176 RobotIP <<EOR
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
177 202.9.155.123,
178 204.152.191.41,
179 208.146.26.19,
180 208.146.26.233,
181 209.185.141.209,
182 209.185.141.211,
183 209.202.148.36,
184 209.202.148.41,
185 216.200.130.207,
186 216.35.103.6?,
187 216.35.103.70,
5a679b0 * Allow comments (starting with "#" and ending with EOL) in the
Kevin Walsh authored
188 82.99.30.?, # Munax
189 82.99.30.1?, # Munax
190 82.99.30.2?, # Munax
191 82.99.30.3?, # Munax
192 82.99.30.4?, # Munax
193 82.99.30.5?, # Munax
194 82.99.30.6?, # Munax
195 82.99.30.70, # Munax
196 82.99.30.71, # Munax
197 82.99.30.72, # Munax
198 82.99.30.73, # Munax
27783a0 * Split the Robot* directives into their own "robots.cfg" file to…
Kevin Walsh authored
199 EOR
200
201 RobotHost <<EOR
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
202 *.analys.google.com,
33c0a12 * Racke suggested (in IRC) that the file would be more version co…
Kevin Walsh authored
203 *.ask.com,
204 *.crawler*.com,
205 *.csccorporatedomains.com,
206 *.excite.com,
207 *.googlebot.com,
208 *.infoseek.com,
209 *.inktomi.com,
210 *.inktomisearch.com,
211 *.lycos.com,
212 *.pa-x.dec.com,
16be811 * The yokels at Microsoft are using a bogus TLD for their MSN spi…
Kevin Walsh authored
213 *.phx.gbl,
263c49b * More robots recognised.
Kevin Walsh authored
214 *.search.live.com,
27783a0 * Split the Robot* directives into their own "robots.cfg" file to…
Kevin Walsh authored
215 add-url.altavista.com,
d63c00e * Re-arranged the entries into alphabetical order, for ease of
Kevin Walsh authored
216 msnbot.msn.com,
27783a0 * Split the Robot* directives into their own "robots.cfg" file to…
Kevin Walsh authored
217 westinghouse-rsl-com-usa.NorthRoyalton.cw.net,
218 EOR
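
The commit notes above ("Exclude non-robot Seekmo which got caught by "seek"", "Recognize that AskTB is not a robot") suggest that RobotUA entries behave as substring patterns and that NotRobotUA entries override them. The sketch below illustrates that idea only. It assumes case-insensitive matching, "*" and "?" as glob wildcards, and NotRobotUA taking precedence, none of which is confirmed by this file. The helper names (glob_to_regex, compile_patterns, is_robot) are hypothetical and are not part of Interchange.

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Translate one list entry into a regex fragment.

    Assumption: "*" matches any run of characters and "?" matches exactly one.
    """
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def compile_patterns(spec: str) -> re.Pattern:
    """Compile a comma-separated pattern list into one case-insensitive regex.

    Text from "#" to end of line is treated as a comment, as the commit
    note about comments describes.
    """
    parts = []
    for line in spec.splitlines():
        line = line.split("#", 1)[0]  # drop a trailing comment, if any
        for item in line.split(","):
            item = item.strip()
            if item:
                parts.append(glob_to_regex(item))
    if not parts:
        return re.compile(r"(?!)")  # an empty list matches nothing
    return re.compile("|".join(parts), re.IGNORECASE)


def is_robot(user_agent: str, robot_re: re.Pattern, not_robot_re: re.Pattern) -> bool:
    """Assumed precedence: a NotRobotUA match overrides any RobotUA match."""
    if not_robot_re.search(user_agent):
        return False
    return bool(robot_re.search(user_agent))


# Small subsets of the lists above, for illustration only.
ROBOT_UA = compile_patterns("Ask,\nGoogleBot,\nseek,\n")
NOT_ROBOT_UA = compile_patterns("AskTB,\nSeekmo,\nMSIE,\nGecko,\n")
```

With these subsets, a user agent containing "Seekmo" matches the "seek" pattern but is still classed as a non-robot because the NotRobotUA list is consulted first.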