Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic analysis tasks #663

Merged
merged 41 commits into from
Nov 6, 2018
Merged

Automatic analysis tasks #663

merged 41 commits into from
Nov 6, 2018

Conversation

berggren
Copy link
Contributor

@berggren berggren commented Sep 28, 2018

This PR implements automatic analysis of timelines added to Timesketch. It supports analysis plugins to run after indexing, or when a new timeline is added to a sketch.

It implements a simple abstraction for creating new analysis workers taking care of all background tasks scheduling etc.

@berggren berggren requested a review from aarontp October 2, 2018 13:09
@berggren berggren removed the request for review from aarontp October 3, 2018 10:21
@berggren berggren changed the title Auto similarity scorer Automatic analysis tasks Oct 16, 2018
@berggren berggren requested a review from aarontp October 16, 2018 17:59
Copy link
Member

@aarontp aarontp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, mostly just some small things.

task_directory = {u'plaso': run_plaso,
u'csv': run_csv_jsonl,
u'jsonl': run_csv_jsonl}
from timesketch.lib import tasks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just import this at the top?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the way Celery works together with Flask. In order to avoid circular imports this is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, cool.

# If enabled, run sketch analyzers when timeline is added.
try:
if current_app.config[u'ENABLE_SKETCH_ANALYZERS']:
from timesketch.lib import tasks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same import question here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

timesketch/lib/analyzers/interface.py Outdated Show resolved Hide resolved
sketch_id: The Sketch ID.
"""
self.id = sketch_id
self.sql_sketch = SQLSketch.query.get(sketch_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How safe is this? Should this get wrapped in a try/except block? or will an appropriate exception be raised already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLalchemy returns None if the sketch doesn't exist. This will raise later in the code path but I have added an early raise here as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Maybe add a Raises: section in the docstring too then? :)

timesketch/lib/analyzers/interface.py Outdated Show resolved Hide resolved
analyzer_class = manager.AnalysisManager.get_analyzer(analyzer_name)
analyzer = analyzer_class(index_name=index_name, **kwargs)
result = analyzer.run_wrapper()
logging.info('[%s] result: %s' % (analyzer_name, result))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you're not using {0:s} and .format() here? I thought it might be because it's a logging line, but there are instances of using format() in logging lines in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

# Run psort.py
try:
cmd_output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
subprocess.check_output(cmd, stderr=subprocess.STDOUT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to add shell=True here in order for psort_path to work when it is set to just psort.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but no, it seems to work without shell=True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. OK.


return cmd_output
return index_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the return type should be changed in the docstrings then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


return {u'Events processed': total_events}
return index_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the return type in docstrings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

db_session.commit()

# If enabled, run sketch analyzers when timeline is added.
from timesketch.lib import tasks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not import at the top of the file? Maybe add a comment about that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. Added comment to explain as well.

@berggren
Copy link
Contributor Author

berggren commented Nov 6, 2018

@aarontp PTAL

aarontp
aarontp previously approved these changes Nov 6, 2018
Copy link
Member

@aarontp aarontp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aarontp
Copy link
Member

aarontp commented Nov 6, 2018

Re-approved, :)

@berggren berggren merged commit 9a70d94 into master Nov 6, 2018
@berggren berggren deleted the 662-auto-similarity-scorer branch November 6, 2018 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants