New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ready to land] Detecting suspicious/explosive crashes #1394

Merged
merged 12 commits into from Aug 27, 2013

Conversation

Projects
None yet
4 participants
@shuhaowu
Contributor

shuhaowu commented Aug 9, 2013

Still work in progress. But this should be the idea for the cron job. Feedback will be nice

Right now the model is fairly "dumb". Smarter models could be developed but most of them are based on the existing math (and code framework) and should be fairly easy to add in.

Todos:

  • Testing? What is the policy on this?
  • DB migration: created a new table. Haven't done this yet
  • Further model creation (ARIMA model next. Don't have parameters yet) [DROPPED FOR NOW]
  • Remove some debugging code: I left some in, I'll take them out as I get closer to a more final version.
  • Middleware
  • Front end code
  • Time series graph
  • Legal clarifications for the code
  • Middleware unittests
  • Webapp unittests
  • Middleware documentations

Note about code review: There is a blob of math somewhere in this PR (not sure what the final structure will look like). There are some comments describing them and links to the original implementations/papers. As (if) I develop more complicated models, more math functions (blobs) will be added into the mix. While for some of these stuff I don't have theoretical verifications that I did this correctly, I did experimentally verify them to the best of my abilities (results are in docstrings/comments). If someone wants to mathematically prove it and tell me I'm wrong, please do so :).

Almost ready for merge. There are some optimizations that needs to be done. Though those might be best with Bugzilla bugs.

R?

@shuhaowu

This comment has been minimized.

Contributor

shuhaowu commented Aug 20, 2013

One thing: do i need to test the view that renders the HTML page for the report?

@shuhaowu

This comment has been minimized.

Contributor

shuhaowu commented Aug 20, 2013

Some stuff that we can refine (possibly at a later time), in no particular order.

  • Switch to use reports_clean table to speed up the graph look up, or cache the time series during the cron job.
  • Allow switching of date range in the web UI (backend is ready for this).
  • Allow filtering of products/versions on the web UI.
  • Compute explosiveness for particular products/versions (not sure how easy this would be).

These, with possibly the exception of the first one, should probably be bugzilla bugs.

df: Degrees of freedom.

Notes:
It is not exactly know if there is any problems with this

This comment has been minimized.

@lonnen

lonnen Aug 21, 2013

Contributor

Did you follow up on this?

This comment has been minimized.

@shuhaowu

shuhaowu Aug 21, 2013

Contributor

Yeah, everything about this has been resolved.

@lonnen

View changes

socorro/cron/jobs/suspicious_crashes.py Outdated
# convert.
p = (1 - q) * 2

# Just don't even bother beyond this point. At least the normal cdf

This comment has been minimized.

@lonnen

lonnen Aug 21, 2013

Contributor

clean these comments up

@lonnen

View changes

socorro/cron/jobs/suspicious_crashes.py Outdated
pass


class PredictiveModel(BasicModel):

This comment has been minimized.

@lonnen

lonnen Aug 21, 2013

Contributor

no docstring?

raise NotImplementedError


class SlopeBased(BasicModel):

This comment has been minimized.

@lonnen

lonnen Aug 21, 2013

Contributor

also needs a docstring

@lonnen

View changes

socorro/cron/jobs/suspicious_crashes.py Outdated
self.bin = bin
self.counts = {}

def buckify(self, dt):

This comment has been minimized.

@lonnen

lonnen Aug 21, 2013

Contributor

buckify and unbuckify need docstring oneliners

@lonnen

View changes

socorro/cron/jobs/suspicious_crashes.py Outdated
required_config.add_option(
'training_data_length',
default=10,
doc='The number of days used for the training data feed to the models.'

This comment has been minimized.

@lonnen

lonnen Aug 21, 2013

Contributor

the number of days of training data to feed to the models

@lonnen

View changes

socorro/external/postgresql/crashes.py Outdated
SELECT
COUNT(*)
FROM
reports

This comment has been minimized.

@lonnen

lonnen Aug 21, 2013

Contributor

We've discussed that reports is slow. It's proabably ok to land with this, but if you have time I'd like to see this redone with reports_clean.

This comment has been minimized.

@selenamarie

selenamarie Aug 21, 2013

Contributor

This kind of query I thought was done through a service that provides crash counts...

This comment has been minimized.

@shuhaowu

shuhaowu Aug 21, 2013

Contributor

I looked at /crashes/frequency. But it doesn't seem to be returning the correct results. I can't find anything else beyond that.

Also: how do we convert the number in the reports table, which is throttled, back to approximately the real values?

This comment has been minimized.

@selenamarie

selenamarie Aug 21, 2013

Contributor

Let's ask @rhelmer and @peterbe. I see related data in the home_page_graph table, but segmented by product_version_id. I wish to avoid a scan on reports and I wish to decouple this information from the reports table, which is something we may deprecate entirely in the future.

@lonnen

This comment has been minimized.

Contributor

lonnen commented Aug 21, 2013

This PR has some meat on it. Most of the time we'd land this is stages -- cron + database, middleware service, then UI. I left a few things to address, but we should get other sets of eyes on this as well.

It would be nice if @selenamarie could check the migration and give some advice about reports vs reports clean.
I'd also like to see someone take a second look over the django work. You may need to pester individuals directly in IRC to get their attention (or find them in person, since we're all in MV this week).

@selenamarie

View changes

alembic/versions/2c03d8ea0a50_adding_in_a_suspicio.py Outdated
op.create_table(u'suspicious_crash_signatures',
sa.Column(u'id', sa.INTEGER()),
sa.Column(u'signature', sa.VARCHAR(255)),
sa.Column(u'date', sa.TIMESTAMP(timezone=True))

This comment has been minimized.

@selenamarie

selenamarie Aug 21, 2013

Contributor

Let's call this report_date because date is a reserved word and then would require double quotes when being used.

@selenamarie

View changes

alembic/versions/2c03d8ea0a50_adding_in_a_suspicio.py Outdated

def upgrade():
op.create_table(u'suspicious_crash_signatures',
sa.Column(u'id', sa.INTEGER()),

This comment has been minimized.

@selenamarie

selenamarie Aug 21, 2013

Contributor

Please name this suspicious_crash_signature_id to make JOIN and USING useful.


Returns crashes that are explosive/suspicious. These crashes should be examined
by people to make sure there are no regressions in product code base.

This comment has been minimized.

@selenamarie

selenamarie Aug 21, 2013

Contributor

A brief description of what makes crashes "suspicious" would be great

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

+1 to what selena said

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

oops. The next line explains what "explosive" means.

@@ -0,0 +1,382 @@
from __future__ import division

This comment has been minimized.

@selenamarie

selenamarie Aug 21, 2013

Contributor

We need the moz license boilerplate here.

@selenamarie

View changes

socorro/external/postgresql/models.py Outdated
id = Column(u'id', Integer(), primary_key=True)
signature = Column(u'signature', VARCHAR(length=255))
counts = Column
date = Column(u'date', TIMESTAMP(timezone=True))

This comment has been minimized.

@selenamarie

selenamarie Aug 21, 2013

Contributor

Said above -- but should be report_date and suspicious_crash_signature_id

@selenamarie

View changes

socorro/middleware/middleware_app.py Outdated
@@ -36,7 +36,7 @@
(r'/crash_data/(.*)', 'crash_data.CrashData'),
(r'/crash/(.*)', 'crash.Crash'),
(r'/crashes/'
r'(comments|daily|frequency|paireduuid|signatures|'
r'(comments|daily|frequency|count_by_day|paireduuid|signatures|'

This comment has been minimized.

@selenamarie

selenamarie Aug 21, 2013

Contributor

We usually don't put underscores in these interfaces, also should be alphabetized to come after comments

This comment has been minimized.

@shuhaowu

shuhaowu Aug 21, 2013

Contributor

I see things like /signaturesummary/report_type/distinct_install/ or like /util/versions_info/

I feel like with an underscore it is a little bit more readable.

@peterbe

View changes

socorro/cron/jobs/suspicious_crashes.py Outdated

from __future__ import division

from datetime import timedelta

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

it's better to use

import datetime
...
datetime.timedelta(...)
from socorro.cron.base import PostgresBackfillCronApp


def mean(x):

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

perhaps better call it seq or iterable instead of just x

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

Also, this is misleading in that if you do mean([1,2]) you don't get 1.5 which is what you might expect.

This comment has been minimized.

@shuhaowu

shuhaowu Aug 22, 2013

Contributor

Wait. This should give you 1.5.

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor
>>> sum([1,2]) / len([1,2])
1

This comment has been minimized.

@shuhaowu

shuhaowu Aug 22, 2013

Contributor

From future import division should make this return 1.5.

Shuhao
Sent from my phone.
On Aug 22, 2013 7:56 AM, "Peter Bengtsson" notifications@github.com wrote:

In socorro/cron/jobs/suspicious_crashes.py:

@@ -0,0 +1,415 @@
+# This Source Code Form is subject to the terms of the Mozilla Public
+# License, v. 2.0. If a copy of the MPL was not distributed with this
+# file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+from future import division
+
+from datetime import timedelta
+from math import sqrt, log, pi, cos, sin, exp
+
+from configman import Namespace
+from socorro.cron.base import PostgresBackfillCronApp
+
+
+def mean(x):

sum([1,2]) / len([1,2])
1


Reply to this email directly or view it on GitHubhttps://github.com//pull/1394/files#r5925946
.

return sum(x) / len(x)


def var(x, ddof=0):

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

again, x is not a great variable name. It's ok with single character variable names in tight loops.

This comment has been minimized.

@shuhaowu

shuhaowu Aug 22, 2013

Contributor

This is a pretty standard name for math functions in general though. x here is just a regular vector.

@peterbe

View changes

socorro/cron/jobs/suspicious_crashes.py Outdated

class SuspiciousCrashesApp(PostgresBackfillCronApp):
app_name = 'suspicious-crashes'

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

Add app_version and app_description too.

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

hm... perhaps it's not worth it. Perhaps none of the others bother.

@peterbe

View changes

webapp-django/crashstats/crashstats/views.py Outdated

@pass_default_context
@utils.json_view
def explosive_data(request, signature=None, date=None, default_context=None):

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

I think you can remove the =None on 'signature' and 'date' because the URL dictates that it will always be something.

@peterbe

View changes

webapp-django/crashstats/crashstats/views.py Outdated
@pass_default_context
@utils.json_view
def explosive_data(request, signature=None, date=None, default_context=None):
explosive_date = datetime.datetime.strptime(date, '%Y-%m-%d')

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor
try:
    explosive_date = datetime.datetime.strptime(date, '%Y-%m-%d') 
except ValueError:
    return http.HttpResponseBadRequest('Invalid date')
@peterbe

View changes

webapp-django/crashstats/crashstats/views.py Outdated
explosive_date = datetime.datetime.strptime(date, '%Y-%m-%d')

now = datetime.datetime.utcnow()
now = datetime.datetime(now.year, now.month, now.day)

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

If you want to convert a datetime.datetime instance to a datetime.date you can use the date() function on it:

>>> x=datetime.datetime.utcnow()
>>> x
datetime.datetime(2013, 8, 22, 14, 53, 2, 533427)
>>> type(x)
<type 'datetime.datetime'>
>>> x.date()
datetime.date(2013, 8, 22)
>>> type(x.date())
<type 'datetime.date'>
@peterbe

View changes

webapp-django/crashstats/crashstats/views.py Outdated
days_ahead = min(max((now - explosive_date).days, 0), 3)

end = explosive_date + datetime.timedelta(days_ahead)
current = explosive_date - datetime.timedelta(10 - days_ahead)

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

That number, 10, can you perhaps instead put it into settings/base.py. That way it becomes settable and if you give it a nice variable name, it becomes clear what it does.

@peterbe

View changes

webapp-django/crashstats/crashstats/urls.py Outdated
url('^explosive' + products + versions + '$',
views.explosive,
name="crashstats.explosive"),
url('^explosive-data/signature/(?P<signature>.+)/date/(?P<date>.+)/?$',

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

Can you change the regex for the date part to \d{4}-\d{2}-\d{2}. That'll make it more explicit how it works.

@peterbe

View changes

webapp-django/crashstats/crashstats/tests/test_views.py Outdated
return Response("""{
"total": 100
}""")

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

One thing that has saved my butt many times is to add an extra raise NotImplementedError(url) after the if statement.
Then, upon some refactoring or something, if you mess up the URLs you really tighted up this mocked_get to only do one job and that is that {"total": ...} stuff.

@peterbe

View changes

webapp-django/crashstats/crashstats/templates/crashstats/explosive_crashes.html Outdated
</div>
{% endfor %}
{% endfor %}
{% if not dates %}

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

Can you use this instead:

{% for date in dates %}
  ...
{% else %}
   Hoooray!
{% endfor %}
@peterbe

View changes

webapp-django/crashstats/crashstats/templates/crashstats/explosive_crashes.html Outdated
{% for signature in explosives[date] %}
<div class="explosive-item" data-signature="{{ signature }}" data-date="{{ date }}" data-url="{{ url('crashstats.report_list') }}?signature={{ signature|urlencode }}&amp;range_value=1&amp;range_unit=days">
<div class="explosive-plot">
<img src="/static/img/ajax-loader.gif" />

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

I think that's supposed to be:

<img src="{{ STATIC_URL }}img/ajax-loader.gif" alt="Loading...">

instead.
We don't want the word "/static" to be hardcoded in the code. If it is in places, that's bad and we should deal with that.

@peterbe

View changes

webapp-django/crashstats/crashstats/templates/crashstats/explosive_crashes.html Outdated
{% endblock %}

{% block page_title %}
Explosive Crashes

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

I know this is nitpicky but why is this different from the title in the <h2> below? Can they be the same?

@@ -0,0 +1,89 @@
/* This Source Code Form is subject to the terms of the Mozilla Public

This comment has been minimized.

@peterbe

peterbe Aug 22, 2013

Contributor

Sorry but I fear that the convention is 4 spaces in Socorro's javascript. I can never remember but I opened a couple of them in crashstats/static/crashstats/js/socorro and they're all 4 spaces.

@peterbe

View changes

socorro/external/postgresql/crashes.py Outdated

params = external_common.parse_arguments(filters, kwargs)

if not params.signature or not params.start_date:

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

Break this up into two if statements.

@peterbe

View changes

socorro/external/postgresql/crashes.py Outdated
current += datetime.timedelta(1)

return {"hits": hits, "total": len(hits)}

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

All this hard-coded "%Y-%m-%d" makes me nervous. Make it a variable once and reuse.

This comment has been minimized.

@shuhaowu

shuhaowu Aug 26, 2013

Contributor

The problem is that there's a lot of them across a lot of files. I'm not sure where to put this... really. (if i put it as a variable, I would have to put them in every file where I need them)

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

No, just do it once within the function/method scope. Not as a global/module constant.

This comment has been minimized.

@shuhaowu

shuhaowu Aug 27, 2013

Contributor

This is done.

@peterbe

View changes

socorro/external/postgresql/suspicious.py Outdated

def get(self, **kwargs):
filters = [
('start_date', None, 'string'),

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

string? Shouldn't that be "datetime"?

@peterbe

View changes

socorro/external/postgresql/suspicious.py Outdated
]

params = external_common.parse_arguments(filters, kwargs)
if params['start_date'] is None:

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

Change this to
if not params['start_date']:
in case someone sends it in as a blank string and the parse_arguments() doesn't catch that.

@peterbe

View changes

socorro/external/postgresql/crashes.py Outdated
filters = [
("signature", None, "str"),
("start_date", None, "datetime"),
("end_date", None, "datetime")

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

shouldn't this be "date" too now?

@peterbe

View changes

socorro/unittest/cron/jobs/test_suspicious_crashes.py Outdated
today = (utc_now() - datetime.timedelta(1)).strftime('%Y-%m-%d')
for row in cursor.fetchall():
self.assertEquals('sig', row[0])
self.assertEquals(today, row[1].strftime('%Y-%m-%d'))

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

This and line 128 shouldn't need to be strings.

@peterbe

View changes

socorro/unittest/external/postgresql/test_suspicious.py Outdated

sigs = []
for s in cursor:
sigs.append(s[0])

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

sigs = [s[0] for s in cursor]

@peterbe

View changes

socorro/unittest/external/postgresql/test_suspicious.py Outdated
INSERT INTO suspicious_crash_signatures
(signature_id, report_date)
VALUES
({0}, '{1}'::timestamp without time zone)

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

note

@@ -81,6 +81,7 @@
<option value="{{ url('crashstats.topchangers', product) }}" {% if report == 'topchangers' %}selected{% endif %}>Top Changers</option>
<option value="{{ url('crashstats.topcrasher', product) }}" {% if report == 'topcrasher' %}selected{% endif %}>Top Crashers</option>
<option value="{{ url('crashstats.crash_trends', product)}}" {% if report == 'crash_trends' %}selected{% endif %}>Crash Trends</option>
<!-- <option value="{{ url('crashstats.explosive', product) }}" {% if report == 'explosive' %}selected{% endif %}>Explosive Crashes</option> disabled for now-->

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

Brilliant! Thanks.

It would be great if you can add something to your bug about "See crashstats_base.html to enable the Explosive report"

This comment has been minimized.

@shuhaowu

shuhaowu Aug 26, 2013

Contributor

Should I file this bug when it land? or now?

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

Yes. Please do.
Make it block on this bug you're working on.
That way, you can at least mark your bug as resolved and feel good about calling it done!

This comment has been minimized.

@peterbe

View changes

webapp-django/crashstats/crashstats/tests/test_views.py Outdated

@mock.patch('requests.get')
def test_explosive_data_today(self, rget):
now = datetime.datetime.now()

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

Use utcnow() here. It might matter.

@peterbe

View changes

webapp-django/crashstats/crashstats/urls.py Outdated
@@ -76,6 +76,19 @@
url('^topcrasher' + products + versions + os_name + '$',
views.topcrasher,
name='crashstats.topcrasher'),
url('^explosive/?$',

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

Why the ?. I think it's fine to use the / always.

context = default_context or {}

# TODO: allow query other periods
days = 5

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

Can you move this number to settings/base.py?

This comment has been minimized.

@shuhaowu

shuhaowu Aug 26, 2013

Contributor

The idea is that later on we will add in the UI to get 3 days, 5 days, 7 days and so forth like top crashes.

I'll file a bug about this when we land this.

This comment has been minimized.

@peterbe

View changes

webapp-django/crashstats/crashstats/views.py Outdated
hits = models.CrashesCountByDay().get(signature=signature,
start_date=start,
end_date=end)['hits']
hits = sorted(hits.items(), key=lambda x: x[0])

This comment has been minimized.

@peterbe

peterbe Aug 26, 2013

Contributor

The default thing when sorting on a tuple is to sort on the first element.
So you can just write
hits = sorted(hits.items())

@peterbe

This comment has been minimized.

Contributor

peterbe commented Aug 27, 2013

I noticed you got two test failures.

One is weird. The one about the private tempdir. I wouldn't be surprised that that'll just work next time you try.

However, the other test you have there is failing bad. I'm sure you can figure it out.

Once you figure out the test and jenkins passes; r+

@shuhaowu

This comment has been minimized.

Contributor

shuhaowu commented Aug 27, 2013

I couldn't replicate both tests locally. I changed the date casting thing and it is running on Jenkins again and it seems like it is working...

shuhaowu added some commits Aug 7, 2013

Added framework for suspicious crash detection.
Not yet complete. Not really tested with real data.

Fixed some typos

Fixed small bug in buckify code
Minor adjustment to the algorithm.
Corrected some simple mistakes.

Also added some integration tests.
Added a view for explosiveness
Boom!

Went back to Evan's implementation.

Added docs, middleware tests

Rebased and got everything hooked up to work.

Fixed views and directory layout.

Nothing like coding at 11:00PM with Rockband in front of you.
Added test for new middleware
Fixed window and added message if no crashes.

Added last view test.

Changed cache to 18 hours as the view is slow

We also don't need to recompute within essentially 24 hours.
Yay! Migrated to using reports_clean
For mware and cron job only.

Changed getting counts from reports_clean

Adjusted based on feedback.

Changes based on feedback

Fixed according to comments

Fix jenkins error

Fixed static URL
Moved count_by_day to get a date range.
Unexposed api

Also fixed a small bug.
Webapp update, generate fake data.
Changed crashes to datetime.

Updated with space
Fixed an issue where cronjob is using today's data
Should be using yesterday's data as that's the time when we have a full
day of data where as we do not have a full day of data today.

Fixed last comment that was forgotten about..

Turns out some files were not added

I was editing the generated copy as oppose to the working copy.
Temporarily disabled report
To be reverted later when we are confident that this works.
Big commit with all fixes.
Fixed things came up in comments.

Additional fixes according to comments.

Date casting change..

shuhaowu added a commit that referenced this pull request Aug 27, 2013

Merge pull request #1394 from shuhaowu/suspicious
Boom! 

Man. Had to wait so long to make that joke.

@shuhaowu shuhaowu merged commit e3305ab into mozilla-services:master Aug 27, 2013

1 check passed

default Jenkins build 'socorro-github' #1476 has succeeded
Details

@shuhaowu shuhaowu deleted the shuhaowu:suspicious branch Aug 27, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment