New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replaying piwik.php requests, ignore the invalid requests parameters exceptions (eg. wrong cid) instead of aborting import #96

Closed
mattab opened this Issue Aug 26, 2015 · 8 comments

Comments

Projects
None yet
3 participants
@mattab
Member

mattab commented Aug 26, 2015

Reproduce

  • Created cid.log with content 5348954:180.153.201.215 - - [02/Aug/2015:21:15:38 +0000] "GET /piwik.php?action_name=hduy uyeuy nuo 36 ??\xAC&idsite=1&rec=1&r=199956&h=0&m=18&s=24&url=http://www.xxx.lt/x?gclid=CL6m2O6oi8cCFY_JtAod_tYIPQ&price=216,456&urlref=http://www.xxxx.lt/ss-aparatai?gclid=CL6m2O6oi8cCFY_JtAod_tYIPQ&_id=54f474c0c50144df&_idts=1438550142&_idvc=1&_idn=0&_refts=1438550142&_viewts=1438550142&_ref=http://www.googleadservices.com/pagead/aclk?sa=L&ai=Cii7VcIi-VeDRGIaRjAaPqK-QBZr23PYFwq3-29QBtcrnsxUIABABIL_gsAMoA2C5A6ABrqKPywPIAQGpAtRzbKu7bLI-qgQiT9Dly94npwbCg5D7SkUsuqSK-SLDbgVFTQ55B4AxabBQloAFkE6IBgGAB7rd8DSQBwOoB6LCG6gHpr4b2AcB&ohost=www.google.com&cid=5GjGWDPimmkdeh1FrgcuUImc3pojqtIChK-EDtUjr4JZmw&sig=AOD64_3GESvwtzmjkBhlZRJcFCvb4QN70g&clui=0&rct=j&q=&ved=0CBoQ0QxqFQoTCIz3keuoi8cCFcKyFAodIQ0Ktw&adurl=http://www.oigai.lt/agepdf=1&qt=0&realp=0&wma=1&dir=0&fla=1&java=1&gears=0&ag=0&cookie=1&res=1366x768&gt_ms=7972 HTTP/1.1" 200 54 "http://www.xx.lt/eagea-aparatai?gclid=CL6m2O6oi8cCFY_JtAod_tYIPQ&price=216,456" "Mozilla/5.0 (Linux; U; Android 4.4.2; zh-cn; GT-I9500 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko)Version/4.0 MQQBrowser/5.0 QQ-Manager Mobile Safari/537.36"
  • Run import command with replay tracking: $ ./misc/log-analytics/import_logs.py --url=localhost/piwik-master cid.log --idsite=1 --replay-tracking

Got

$ ./misc/log-analytics/import_logs.py --url=localhost/piwik-master cid.log --idsite=1 --replay-tracking
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log cid.log...
2015-08-27 06:06:43,037: [INFO] Error when connecting to Piwik: HTTP Error 400: Bad Request
2015-08-27 06:06:43,037: [INFO] Retrying request, attempt number 2
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2015-08-27 06:06:53,322: [INFO] Error when connecting to Piwik: HTTP Error 400: Bad Request
2015-08-27 06:06:53,322: [INFO] Retrying request, attempt number 3
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
4 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2015-08-27 06:07:03,606: [INFO] Error when connecting to Piwik: HTTP Error 400: Bad Request
2015-08-27 06:07:03,606: [INFO] Max number of attempts reached, server is unreachable!
Fatal error: Server returned status 400 (Bad Request).
Is the BulkTracking plugin disabled?
You can restart the import of "cid.log" from the point it failed by specifying --skip=1 on the command line.

Expected behavior

When piwik.php log lines contain any wrong request parameter and Tracking API returns 400 status, then:

  • log analytics should not error but ignore those requests and proceed
  • we should report these bad requests: 6 log lines with invalid request parameters under X requests ignored: in the summary at the end

Fixing this issue will make replaying tracking logs a much more stable experience for Piwik administrators (and easier to automate!).

@mattab mattab added the bug label Aug 26, 2015

@mattab mattab added this to the Current sprint milestone Aug 26, 2015

@diosmosis

This comment has been minimized.

Show comment
Hide comment
@diosmosis

diosmosis Aug 26, 2015

Member

Note: May have to modify BulkTracking to ignore those errors and continue tracking other requests, instead of returning 400. Or returning 400 but tracking other requests?

CC @tsteur, maybe you have an opinion?

Member

diosmosis commented Aug 26, 2015

Note: May have to modify BulkTracking to ignore those errors and continue tracking other requests, instead of returning 400. Or returning 400 but tracking other requests?

CC @tsteur, maybe you have an opinion?

@tsteur

This comment has been minimized.

Show comment
Hide comment
@tsteur

tsteur Aug 27, 2015

Member

I think bulk tracking only returns a HTTP 400 if nothing was tracked, similar to normal tracking request. Not sure what needs to be changed there? It should still return the normal response containing the number of invalid requests along the HTTP 400. At least I think so, haven't checked it

Member

tsteur commented Aug 27, 2015

I think bulk tracking only returns a HTTP 400 if nothing was tracked, similar to normal tracking request. Not sure what needs to be changed there? It should still return the normal response containing the number of invalid requests along the HTTP 400. At least I think so, haven't checked it

@diosmosis

This comment has been minimized.

Show comment
Hide comment
@diosmosis

diosmosis Sep 23, 2015

Member

If we change this line: https://github.com/piwik/piwik/blob/master/plugins/BulkTracking/Tracker/Handler.php#L57 to catch any Exception, the importer will skip those lines & report the invalid line numbers. Not sure if this should be changed though, what do you think @tsteur?

Member

diosmosis commented Sep 23, 2015

If we change this line: https://github.com/piwik/piwik/blob/master/plugins/BulkTracking/Tracker/Handler.php#L57 to catch any Exception, the importer will skip those lines & report the invalid line numbers. Not sure if this should be changed though, what do you think @tsteur?

@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Sep 23, 2015

Member

maybe we create new bucket for failed requests eg. failed along with invalid and tracked? http status of the bulk request would be 500 if any of the request failed.

Member

mattab commented Sep 23, 2015

maybe we create new bucket for failed requests eg. failed along with invalid and tracked? http status of the bulk request would be 500 if any of the request failed.

@diosmosis

This comment has been minimized.

Show comment
Hide comment
@diosmosis

diosmosis Sep 23, 2015

Member

bulk request would be 500 if any of the request failed.

this would make the importer fail.

Member

diosmosis commented Sep 23, 2015

bulk request would be 500 if any of the request failed.

this would make the importer fail.

@tsteur

This comment has been minimized.

Show comment
Hide comment
@tsteur

tsteur Sep 24, 2015

Member

It may still return an HTTP 400 or 500 if an exception is thrown here eg during init or track: https://github.com/piwik/piwik/blob/master/core/Tracker.php#L96-L98 . Catching all exception where you mentioned would reduce the possibility that an exception will be caught there and therefore returning a HTTP 400 or 500.

Why did so many requests fail in this case?

In Bulk Requests we won't return an HTTP 400 if nothing was tracked so I think it must have been either a UnexpectedWebsiteFoundException or InvalidRequestParameterException see (https://github.com/piwik/piwik/blob/master/core/Tracker/Handler.php#L86-L93). In all other cases we return an HTTP 500 so far. We already catch and ignore a UnexpectedWebsiteFoundException see link from @diosmosis

InvalidRequestParameterException is thrown if eg VisitorId is invalid which almost must be the case here. It seems unlikely though and I believe the HTTP 400 was maybe actually returned by your webserver? I'm not sure...

@diosmosis the link you mentioned maybe we could at least catch InvalidRequestParameterException as well there as this is an invalid request and not a "failed" one.

Member

tsteur commented Sep 24, 2015

It may still return an HTTP 400 or 500 if an exception is thrown here eg during init or track: https://github.com/piwik/piwik/blob/master/core/Tracker.php#L96-L98 . Catching all exception where you mentioned would reduce the possibility that an exception will be caught there and therefore returning a HTTP 400 or 500.

Why did so many requests fail in this case?

In Bulk Requests we won't return an HTTP 400 if nothing was tracked so I think it must have been either a UnexpectedWebsiteFoundException or InvalidRequestParameterException see (https://github.com/piwik/piwik/blob/master/core/Tracker/Handler.php#L86-L93). In all other cases we return an HTTP 500 so far. We already catch and ignore a UnexpectedWebsiteFoundException see link from @diosmosis

InvalidRequestParameterException is thrown if eg VisitorId is invalid which almost must be the case here. It seems unlikely though and I believe the HTTP 400 was maybe actually returned by your webserver? I'm not sure...

@diosmosis the link you mentioned maybe we could at least catch InvalidRequestParameterException as well there as this is an invalid request and not a "failed" one.

@diosmosis

This comment has been minimized.

Show comment
Hide comment
@diosmosis

diosmosis Sep 24, 2015

Member

InvalidRequestParameterException is thrown if eg VisitorId is invalid which almost must be the case here. It seems unlikely though and I believe the HTTP 400 was maybe actually returned by your webserver? I'm not sure...

I can replicate locally w/ one correct log line and the above failed log line. Instead of one line being imported, the whole request fails, because the exception (I guess InvalidRequestParameterException, I didn't check which exactly) propagates and is caught by core tracker code.

@diosmosis the link you mentioned maybe we could at least catch InvalidRequestParameterException as well there as this is an invalid request and not a "failed" one.

Would fix this specific case, I'll make a PR.

Member

diosmosis commented Sep 24, 2015

InvalidRequestParameterException is thrown if eg VisitorId is invalid which almost must be the case here. It seems unlikely though and I believe the HTTP 400 was maybe actually returned by your webserver? I'm not sure...

I can replicate locally w/ one correct log line and the above failed log line. Instead of one line being imported, the whole request fails, because the exception (I guess InvalidRequestParameterException, I didn't check which exactly) propagates and is caught by core tracker code.

@diosmosis the link you mentioned maybe we could at least catch InvalidRequestParameterException as well there as this is an invalid request and not a "failed" one.

Would fix this specific case, I'll make a PR.

diosmosis added a commit to matomo-org/matomo that referenced this issue Oct 2, 2015

Fixes matomo-org/matomo-log-analytics#96, do not fail bulk tracking r…
…equest if individual request has an incorrect request parameter.
@mattab

This comment has been minimized.

Show comment
Hide comment
@mattab

mattab Oct 6, 2015

Member

👍

Member

mattab commented Oct 6, 2015

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment