Throws Exceptions when Encountering Real World Logs #50

Closed
astorm opened this issue Oct 10, 2020 · 2 comments

astorm commented Oct 10, 2020

Hello there -- first off, thank you for building this and saving us all the trouble of building our own regular expressions to parse Apache's log files.

When I tried using this package on my real-world Apache logs, it mostly worked. However, a number of lines failed to parse and threw an exception in my program.

My log format looks like this:

$parser->setFormat('%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"');

Here's one line that failed to parse:

199.195.254.38 - - [27/Sep/2020:19:27:26 +0000] "GET ../../proc/ HTTP" 400 506 "-" "-"

And here are a few others:

240e:d9:d800:200::d4 - - [29/Sep/2020:19:52:18 +0000] "\x16\x03\x01" 501 290 "-" "-"

172.105.43.21 - - [30/Sep/2020:01:05:53 +0000] "\x16\x03\x01" 501 290 "-" "-"

Is there a way to configure this library to be less strict when trying to parse these log lines?

If not, do you have any time/interest in enhancing the functionality of this library so it can handle cases like these?

kassner (Owner) commented Oct 11, 2020

Hi. This case is similar to #49, which involves badly formed HTTP requests. Given they're not technically valid, I don't know how much value you get from parsing them, but the $parser->addPattern('%r', '(?P<request>.+)'); trick mentioned there is a good workaround when the main parse fails.

I'd keep parsing logs with the format you have and have a second instance of LogParser configured with the addPattern call, then parse the failing line again with it to extract things like the IP address and User-Agent.

Something like:

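// Imports assume the package's Kassner\LogParser namespace.
use Kassner\LogParser\FormatException;
use Kassner\LogParser\LogParser;

// Strict parser: the regular format.
$parser = new LogParser();
$parser->setFormat('%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"');

// Lax parser: same format, but %r accepts any non-empty request string.
$laxParser = new LogParser();
$laxParser->setFormat('%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"');
$laxParser->addPattern('%r', '(?P<request>.+)');

foreach ($lines as $line) {
    try {
        try {
            $entry = $parser->parse($line);
        } catch (FormatException $e) {
            // Strict parse failed; retry with the lax parser.
            $entry = $laxParser->parse($line);
        }
    } catch (FormatException $e) {
        // Neither parser could handle the line; skip it.
        continue;
    }

    // process $entry
}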
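
To illustrate why the fallback helps, here is a minimal, self-contained sketch that uses plain preg_match on the quoted request field alone, without the library. Both patterns and the classifyRequest helper are hypothetical stand-ins chosen for illustration; the strict one is only an approximation of "method, path, HTTP version" and is not the library's actual %r pattern.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for the quoted request field only.
// These are NOT the library's real patterns; they only mimic strict vs. lax matching.
const STRICT_REQUEST = '/^"(?P<request>[A-Z]+ \S+ HTTP\/\d(?:\.\d)?)"$/';
const LAX_REQUEST    = '/^"(?P<request>.+)"$/';

/**
 * Report which pattern accepts the quoted request field:
 * 'strict', 'lax' (fallback only), or 'none'.
 */
function classifyRequest(string $field): string
{
    if (preg_match(STRICT_REQUEST, $field) === 1) {
        return 'strict';
    }
    if (preg_match(LAX_REQUEST, $field) === 1) {
        return 'lax';
    }
    return 'none';
}

// One well-formed request, plus the two request fields from the failing lines in this issue.
$fields = [
    '"GET /index.html HTTP/1.1"',
    '"GET ../../proc/ HTTP"',
    '"\x16\x03\x01"',
];

foreach ($fields as $field) {
    echo classifyRequest($field), "\t", $field, PHP_EOL;
}
```

The well-formed request is accepted by the strict pattern, while the two malformed ones only pass the lax pattern. That is the same split the two-parser loop relies on: the lax parser is consulted only for lines the strict one rejects.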

astorm (Author) commented Oct 12, 2020

> This case is similar to #49, which involves badly formed HTTP requests.

> I'd keep parsing logs with the format you have and have a second instance ...

While it's not what I wanted to hear -- that's a fair philosophy. Closing out.
