import_logs.py and IIS/w3c date format #6968
The regex fails because it does not use named groups, so the script doesn't know which group is the 'date' group (or any of the other required groups). I'll run some tests to see if the script will work when supplying logs via stdin. If not, I'll see if I can get it to work.
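To illustrate the point about named groups, here is a minimal sketch using Python's standard `re` module. The sample line, the patterns, and the group names `date`, `method` and `path` are chosen for illustration only; they are not taken from the importer's source.

```python
import re

line = "2015-01-07 19:47:50 GET /index.html"

# Unnamed groups: the line matches, but nothing tells the caller
# which positional group holds the date.
unnamed = re.match(r"(\S+ \S+) (\S+) (\S+)", line)
unnamed_fields = unnamed.groupdict()  # empty dict, no names to look up

# Named groups: each captured value can be looked up by name.
named = re.match(r"(?P<date>\S+ \S+) (?P<method>\S+) (?P<path>\S+)", line)
fields = named.groupdict()

print(unnamed_fields)
print(fields["date"], fields["path"])
```

A script that looks values up by name (for example `fields["date"]`) has nothing to work with when the pattern only defines positional groups.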
@kevinjc I modified the importer so logs in the W3C extended log file format can be imported from stdin. To do this, run the script with the
Copied the latest version and ran a quick test, with the following error: Using the following configuration in the wrapper script: --log-format-name=w3c_extended --w3c-time-taken-millisecs \
@kevinjc I recognize the error and it shouldn't occur w/ the code in master... Can you provide an example log file w/ one or two log lines (please include the
I cannot count on the #Fields line being present when the script runs, since the input is a constant stream from the syslog-ng collector. This is one of the reasons why I thought I needed to pursue the regex option. I'm happy not to use a regex if that is possible! Note: I have turned off a few of the fields, but I can re-enable them if necessary:
I now have a snippet of logs with the commented lines to run for testing. It gets closer, but now I am getting a 500 error on import from my Apache server running Piwik. Is this because the server has not been updated to 2.10.0? Is this version required? I thought import_logs.py was a little more independent of the version on the server, but that may just have been an assumption on my part.
The #Fields line is necessary in order to build the regex used to parse log lines... You could get away w/ supplying a regex, but this approach is more error prone; small mistakes in the regex can cause problems that are hard to diagnose. I think I can create a middle ground, however. I'll add a new option --w3c-fields so you can specify the fields format in the log importer command.
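As a rough illustration of why the `#Fields` line matters, the sketch below builds a regex with named groups from such a line. It is not the importer's actual code: the `FIELD_PATTERNS` mapping, the group names, and the `regex_from_fields` helper are hypothetical and exist only for this example.

```python
import re

# Hypothetical mapping from W3C field names to (group name, pattern).
FIELD_PATTERNS = {
    "date": ("date", r"\d{4}-\d{2}-\d{2}"),
    "time": ("time", r"\d{2}:\d{2}:\d{2}"),
    "cs-method": ("method", r"\S+"),
    "cs-uri-stem": ("path", r"\S+"),
    "sc-status": ("status", r"\d+"),
}


def regex_from_fields(fields_line):
    """Build a compiled regex from a '#Fields: ...' header line."""
    names = fields_line.split(":", 1)[1].split()
    parts = []
    for name in names:
        if name in FIELD_PATTERNS:
            group, pattern = FIELD_PATTERNS[name]
            parts.append("(?P<%s>%s)" % (group, pattern))
        else:
            # Unknown field: consume it without capturing.
            parts.append(r"\S+")
    # Fields in a log line are separated by whitespace.
    return re.compile(r"\s+".join(parts))


pattern = regex_from_fields("#Fields: date time s-ip cs-method cs-uri-stem sc-status")
match = pattern.match("2015-01-07 19:47:50 10.0.0.1 GET /index.html 200")
print(match.groupdict())
```

Because the field order comes from the header, the same code copes with servers that log different subsets of fields. Without the header, the order has to be supplied some other way.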
This is the intent; however, changes made to Piwik's reporting and tracking APIs (both of which the log importer depends on) can create incompatibilities between log importer and Piwik versions. It is of course recommended to update to the newest available version, but you can work around this specific error by changing line 1028 to
…3C extended log file format can be imported from stdin w/o a '#Fields:' line being present.
@kevinjc You should be able to import logs w/o the '#Fields:' line w/ the following options:
…files in W3C extended log file format can be imported from stdin w/o a '#Fields:' line being present.
I am using import_logs.py and feeding it IIS W3C-based logs via STDIN from my syslog-ng collector. Syslog-ng is configured to send only the message to the wrapper script.
The wrapper script passes the Python script a regex matching the log format, because the logs contain only the important fields rather than the full set. I think the w3cextended class in the Python script looks for the header fields through a file seek, so STDIN would probably have to use a regex anyway.
However, the problem is that the Python script does not seem to accept the date as one of its known date formats. Here is my regex pattern and the debug output:
--log-format-regex='(?P^\d+[-\d]+\s\d+[:\d+]+) (\S+) (?P.?) (?P<query_string>\S) (?P\S+) (?P[\d_.]) (?P<user_agent>.?) (?P._?) ((?P[\w-.]*)(?::\d+)?) (?P\d+) (?P\S+) (?P<generation_time_secs>\d+)' \
2015-01-09 08:19:09,286: [DEBUG] Invalid line detected (invalid date): 2015-01-07 19:47:50 GET /tenbanana/rifd/_scripts/showGrid.js _=1420660073409 - 192.168.86.240 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:31.0)+Gecko/20100101+Firefox/31.0 https://mytestserv2.local/oranges/rifd/?fuseaction=planData.incView&selectedInc=10&nav_type=E&nav_link=manf_d mytestserv1.local 200 8602 14
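The "invalid date" message suggests that the captured date string could not be parsed with whatever date format was expected. The sketch below shows that kind of failure using Python's standard `datetime.strptime`. The two format strings are illustrative assumptions, not a statement about which format the importer applies to a custom regex, and `parses` is a hypothetical helper.

```python
from datetime import datetime

# Date and time as they appear at the start of the rejected log line.
captured = "2015-01-07 19:47:50"


def parses(value, date_format):
    """Return True if value can be parsed with date_format."""
    try:
        datetime.strptime(value, date_format)
        return True
    except ValueError:
        return False


# A W3C-style format matches the captured string.
w3c_style = parses(captured, "%Y-%m-%d %H:%M:%S")
# An Apache-style format does not match it.
apache_style = parses(captured, "%d/%b/%Y:%H:%M:%S")

print(w3c_style, apache_style)
```

A value can therefore be captured correctly by the regex and still be rejected if the date format applied to it does not match.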