The current list of robots that are excluded from download logging is:

- out of date
- hard-coded into a Perl module.
Ideally the list of robots would live in its own file (possibly configurable on a per-archive basis?) somewhere under ~/lib/, and could be updated by a cron job / Event / Bazaar plugin.
One potential issue is that a script may have to clean legacy data (the access table) whenever new robots are added to the list. If that is done, then derived data (irstats1/2) will have to be regenerated from scratch (it is also worth noting that both irstats1 and irstats2 carry their own robots definitions).
Lots of processing in sight.... :-/
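The cleanup step above could be sketched roughly as follows (Python for illustration only; row shape and field name are assumptions, not the actual access table schema):

```python
import re

def refilter_access_rows(rows, new_patterns):
    """Drop legacy access rows whose User-Agent matches a newly added
    robot pattern. Derived stats (irstats1/2) would then need rebuilding
    from whatever rows remain.

    `rows` is assumed to be an iterable of dicts with a 'user_agent' key.
    """
    compiled = [re.compile(p, re.IGNORECASE) for p in new_patterns]
    return [
        row for row in rows
        if not any(c.search(row["user_agent"]) for c in compiled)
    ]
```

Even in this simplified form it is clear the cost scales with the full size of the access table, which is why doing the filtering upstream is attractive.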
IMO COUNTER/PIRUS should clean the data upstream.
Comparing: 3.3 vs master
https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Apache/LogHandler.pm#L61-L98
https://github.com/eprints/eprints/blob/master/perl_lib/EPrints/Apache/LogHandler.pm#L251-L474
The current COUNTER list is http://www.projectcounter.org/r4/COUNTER_Robots_list_Jan2014.txt