Error exception if hostname in logfile is empty = Python script loops forever #126

Closed
justnx opened this issue Feb 6, 2016 · 0 comments

@justnx justnx commented Feb 6, 2016

The log import throws a Python exception on log lines that come from GET requests with a spoofed Host header. It doesn't matter whether the lines were logged by Apache, Nginx, or varnishncsa: if a spammer sends a GET request with an empty Host header, the host field is logged as empty.

When the host field is empty, the Python script loops forever until you kill the process by hand. This is annoying because the hanging cron job needs manual intervention every day.

Here is the exception from import_logs.py:

Parsing log /var/log/nginx.log...
2016-02-05 21:06:45,116: [DEBUG] Site ID for hostname  not in cache
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/opt/piwik/misc/log-analytics/import_logs.py", line 1557, in _run_bulk
    self._record_hits(hits)
  File "/opt/piwik/misc/log-analytics/import_logs.py", line 1702, in _record_hits
    'requests': [self._get_hit_args(hit) for hit in hits]
  File "/opt/piwik/misc/log-analytics/import_logs.py", line 1599, in _get_hit_args
    site_id, main_url = resolver.resolve(hit)
  File "/opt/piwik/misc/log-analytics/import_logs.py", line 1483, in resolve
    return self._resolve_by_host(hit)
  File "/opt/piwik/misc/log-analytics/import_logs.py", line 1469, in _resolve_by_host
    site_id = self._resolve(hit)
  File "/opt/piwik/misc/log-analytics/import_logs.py", line 1440, in _resolve
    site_id = res[0]['idsite']
KeyError: 0
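
A note on reading the traceback: `KeyError: 0` (rather than `IndexError`) indicates that `res` was a dict at that point, not a list, since indexing an empty list with `0` raises `IndexError`. The log does not show what the lookup actually returned, so the dict in the sketch below is a made-up stand-in, and `first_site_id` is a hypothetical helper for illustration, not code from import_logs.py.

```python
def first_site_id(res):
    """Return res[0]['idsite'] when res is a non-empty list of dicts, else None.

    Hypothetical helper: it only illustrates guarding the lookup that
    fails in the traceback (site_id = res[0]['idsite']).
    """
    if isinstance(res, list) and res and isinstance(res[0], dict):
        return res[0].get('idsite')
    # Anything else (a dict, an empty list, None) yields no site ID.
    return None


# A list of site records resolves normally.
print(first_site_id([{'idsite': 3}]))

# A dict in place of the list (assumed shape) no longer raises KeyError: 0.
print(first_site_id({'result': 'error'}))
```

With a guard like this, an unresolvable hostname would come back as `None` and the caller could decide to skip the hit instead of letting the worker thread die.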

Here is the regex I use, together with the log line that causes import_logs.py to fail: https://regex101.com/r/fM8iC2/1

host                  [32-32]    ``
ip                    [33-47]    `115.231.222.14`
date                  [53-73]    `06/Feb/2016:09:33:42`
timezone              [74-79]    `+0100`
path                  [86-153]   `http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.6027916056127199`
status                [164-167]  `401`
length                [168-171]  `590`
referrer              [173-174]  `-`
user_agent            [177-247]  `Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; 360SE)`
generation_time_secs  [249-254]  `0.001`

The Piwik log import should either skip such lines or, on insert, replace the empty host with a placeholder such as host = unknown.
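
To make the two options concrete, here is a minimal sketch of both behaviours. It assumes hits are plain dicts with a `host` key, which is a simplification for illustration; the function names `normalize_host` and `filter_hits` are hypothetical and do not exist in import_logs.py.

```python
def normalize_host(host, placeholder=None):
    """Return the stripped host, or the placeholder if the host is empty."""
    host = (host or '').strip()
    return host if host else placeholder


def filter_hits(hits, placeholder=None):
    """Skip hits with an empty host, or rewrite them if a placeholder is given.

    Hits are assumed to be dicts with a 'host' key (illustrative only).
    """
    kept = []
    for hit in hits:
        host = normalize_host(hit.get('host'), placeholder)
        if host is None:
            # Option 1: no placeholder configured, so drop the line.
            continue
        # Option 2 (or a normal hit): keep a copy with the cleaned host.
        kept.append(dict(hit, host=host))
    return kept


hits = [
    {'host': '', 'ip': '115.231.222.14'},   # empty Host header, as in the log line above
    {'host': 'example.org', 'ip': '192.0.2.1'},
]

skipped = filter_hits(hits)                          # drops the empty-host hit
replaced = filter_hits(hits, placeholder='unknown')  # keeps it as host = unknown
```

Skipping is the safer default, since a placeholder host would still have to resolve to some site ID before the hit can be recorded.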

@justnx changed the title from "Error exception when hostname in logfile contains whitespace" to "Error exception when hostname in logfile contains whitespace = looping forever" on Feb 9, 2016
@justnx changed the title from "Error exception when hostname in logfile contains whitespace = looping forever" to "Error exception when hostname in logfile contains whitespace = Python script looping forever" on Feb 9, 2016
@justnx changed the title from "Error exception when hostname in logfile contains whitespace = Python script looping forever" to "Error exception if hostname in logfile is empty = Python script loops forever" on Feb 9, 2016
justnx added a commit to justnx/piwik-log-analytics that referenced this issue Apr 30, 2016
@justnx justnx mentioned this issue Apr 30, 2016
@mattab mattab closed this in aa03d9a Jun 20, 2017