Timbrrr is a user agent analysis tool that operates on Apache logs. It includes a log parser as well as a front-end web UI for visualizing user agent trends using MarkLogic Server.
This is by no means a generic or robust way to work with logs. I just needed to scratch an itch.
config directory includes a database configuration package to set the appropriate index settings. You can import this into a MarkLogic 5 instance to create or update a database.
load directory includes scripts for parsing log entries as well as seeding master data about the universe of user agent strings (using User Agent String.com). I’ve used Information Studio in MarkLogic to load the raw access log files into a collection called
After loading and processing, a log document will end up looking something like
<?xml version="1.0" encoding="UTF-8"?> <log id="16792903662867185711"> <ip>111.222.333.444</ip> <timestamp raw="01/Jan/2011:00:00:00 -0700" day-of-week="2">2011-01-01T00:00:00</timestamp> <request raw="GET /path/?foo=bar HTTP/1.1"> <method>GET</method> <url>/path/?foo=bar</url> <protocol>HTTP/1.1</protocol> </request> <responseCode>302</responseCode> <referrer>http://hostname:8000/</referrer> <userAgent raw="Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"> <os_type>Windows</os_type> <agent_type>Browser</agent_type> <agent_version>9.0</agent_version> <agent_language/> <agent_name>Internet Explorer</agent_name> <os_name>Windows 7</os_name> <os_versionNumber/> <os_producerURL/> <agent_languageTag/> <os_versionName/> <linux_distibution>Null</linux_distibution> <os_producer/> <agent_version_major>9</agent_version_major> <agent_version_minor>0</agent_version_minor> </userAgent> </log>
analytics directory includes a UI and some web services to render a visualization of requests by user agent over time. You should set the root of a MarkLogic app server to point to this directory and the context database to be the one you created in the configuration step above.
Copyright 2011 Justin Makeig <https://github.com/jmakeig>
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
DataTables (v1.8.2) is copyright Allan Jardine and is included under the terms of the BSD license.
HighCharts (v2.1.8) is copyright Highsoft Solutions AS and is included under the terms of the Creative Commons Attribution-NonCommercial 3.0 License.