
Use a link to a "parent Evidence" instead of subclassing #296

Merged · 11 commits · Dec 5, 2018

Conversation

@rgayon (Collaborator) commented Nov 2, 2018

Instead of stacking evidence types, as shown in #261, we just implement a link from an Evidence object to a parent Evidence object. When this parent exists, we run its preprocessing before our own (and its postprocessing after ours).

When we need to generate new evidence from a context Evidence, which will then be processed by all the other Jobs, the new evidence type (e.g. DockerContainerEvidence) can just extend Evidence and require that a parent evidence is provided.
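The parent link described here can be sketched roughly as follows. This is an illustrative toy, not Turbinia's actual code; the `_preprocess`/`_postprocess` split and the `DockerContainer` constructor signature are assumptions for the example:

```python
# Minimal sketch of the idea: an Evidence may hold a parent_evidence
# link, preprocessing walks up to the parent first, and postprocessing
# unwinds in the reverse order.

class Evidence:
    def __init__(self, parent_evidence=None):
        self.parent_evidence = parent_evidence

    def _preprocess(self):
        """Type-specific preprocessing; subclasses override this."""

    def _postprocess(self):
        """Type-specific postprocessing; subclasses override this."""

    def preprocess(self):
        if self.parent_evidence:
            self.parent_evidence.preprocess()  # set up parent context first
        self._preprocess()

    def postprocess(self):
        self._postprocess()
        if self.parent_evidence:
            self.parent_evidence.postprocess()  # tear down parent context last


class DockerContainer(Evidence):
    """A context-dependent evidence type that requires a parent Evidence."""

    def __init__(self, container_id, parent_evidence):
        super().__init__(parent_evidence=parent_evidence)
        self.container_id = container_id
```

With this shape, a task that mounts a disk (the parent) and then enters a container gets both setup steps in the right order from a single `preprocess()` call.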

@rgayon rgayon requested a review from aarontp November 2, 2018 14:56
@aarontp (Member) left a comment

Don't we also need to change the Task input/output to pass in and then return the actual evidence ancestors? Otherwise, when the Evidence is passed to the next Task, it will have a dangling link to the parent evidence object, won't it?

@rgayon (Collaborator, Author) commented Nov 6, 2018

Not sure what you mean. I would expect the parent evidence to also be serialized when passed between tasks.

@aarontp (Member) commented Nov 6, 2018

Yeah, that's the part that I meant. :) Based on some quick testing, I think it will work transparently when returning the Evidence object from the Task, but the Evidence serialization that happens on the server (when scheduling tasks) will likely need some changes. You'll probably have to change the Evidence serialize() and possibly the to_json() methods to recurse over the ancestors. We should double-check what serialization mechanism Celery uses as well. This is part of why I was originally thinking about implementing this as a new EvidenceChain() object, so that what gets passed around would be explicit rather than implicit in the serialization method.
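One hypothetical shape for that recursion over ancestors (the method names `serialize()` and `to_json()` come from the comment above, but the bodies here are an assumed sketch, not the actual patch):

```python
# Sketch of recursive Evidence serialization: serialize() emits a plain
# dict and embeds the parent's dict inside it, so to_json() captures the
# whole ancestor chain instead of leaving a dangling object reference.
import json


class Evidence:
    def __init__(self, name, parent_evidence=None):
        self.name = name
        self.parent_evidence = parent_evidence

    def serialize(self):
        data = {'type': type(self).__name__, 'name': self.name}
        if self.parent_evidence:
            # Recurse so every ancestor is embedded in the output.
            data['parent_evidence'] = self.parent_evidence.serialize()
        return data

    def to_json(self):
        return json.dumps(self.serialize())
```

Deserialization would walk the same structure in reverse, rebuilding each ancestor before attaching it to its child.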

Another thing to consider here is where/when the parent_evidence attribute gets set. I haven't looked at the task code for this, but I expect that we'll set that wherever the request config data gets copied into the Evidence object (I think in TurbiniaTaskResult.add_evidence()).

@aarontp (Member) left a comment

A couple more comments

(Three review threads on turbinia/evidence.py, all resolved.)
@rgayon (Collaborator, Author) commented Nov 8, 2018

This is part of why I was originally thinking about implementing this as a new EvidenceChain() object so that it would be explicit what gets passed around, rather than implicit depending on the serialization method.

Adapting the current Evidence objects means we don't have to update all Jobs to accept a new evidence type. I also imagine that otherwise all Tasks would have to rewrite TurbiniaTaskResult setup() and close().
My specific use case is: I need to generate new Evidence objects in a Task, where:

  1. These new evidence objects will need to be run through most other workers (so, basically new 'ROOT' evidence of the same kind we pass to turbiniactl)
  2. I can't copy all the data to be analyzed from the input Evidence

I can't think of any case where we would want the "chain" length to be more than 2; that would mean an entire OS environment buried inside a disk twice (a VM in a VM in a guest OS).
I'll make it so that you can't set an Evidence which already has a parent as a "parent", to avoid weird loops.
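A minimal sketch of such a guard, assuming a `parent_evidence` property (illustrative only, not the actual patch):

```python
# Sketch: refuse to set a parent that itself already has a parent,
# which caps the chain length at 2 and rules out loops.

class Evidence:
    def __init__(self):
        self._parent_evidence = None

    @property
    def parent_evidence(self):
        return self._parent_evidence

    @parent_evidence.setter
    def parent_evidence(self, parent):
        if parent is not None and parent.parent_evidence is not None:
            raise ValueError(
                'Parent evidence already has a parent; chains longer '
                'than 2 are not supported.')
        self._parent_evidence = parent
```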

You'll probably have to change the evidence serialize() and possibly the to_json() methods to recurse over the ancestors. We should double check what serialization mechanism that Celery uses as well.

Celery does pickling, which should preserve everything.
PSQ wants JSON; I will adapt the JSON serialization to serialize/deserialize the parent evidence.
Do you expect we will have to support more serialization methods?

Another thing to consider here is where/when the parent_evidence attribute gets set. I haven't looked at the task code for this, but I expect that we'll set that wherever the request config data gets copied into the Evidence object (I think in TurbiniaTaskResult.add_evidence()).
In my DockerContainer test code, I just create a new DockerContainer evidence this way:

container_evidence = DockerContainer(container_id=container_id, parent_evidence=evidence)

@ericzinnikas (Contributor) commented:

FYI I'm looking at moving Celery over to JSON instead of pickle, so we can standardize on just one serialization method.

@rgayon (Collaborator, Author) commented Nov 13, 2018

Next steps:

  1. Try to have the "chaining" automatically done and stored within add_evidence().
  2. Use the same kind of new property on e.g. DockerContainerEvidence (for example, mandatory_chaining=True somewhere) to decide whether to build the chain or not.
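That idea could look roughly like this. The names `context_dependent`, `input_evidence`, and `TurbiniaTaskResult.add_evidence()` follow the discussion in this thread, but the bodies are an assumed sketch, not the merged code:

```python
# Sketch: when a task result collects a new Evidence, link it to the
# task's input evidence automatically, but only for evidence types
# flagged as context dependent.

class Evidence:
    context_dependent = False  # class-level flag, overridden by subclasses

    def __init__(self):
        self.parent_evidence = None


class DockerContainer(Evidence):
    context_dependent = True


class TurbiniaTaskResult:
    def __init__(self, input_evidence):
        self.input_evidence = input_evidence
        self.evidence = []

    def add_evidence(self, evidence):
        if evidence.context_dependent:
            # Build the chain here, so individual tasks don't have to.
            evidence.parent_evidence = self.input_evidence
        self.evidence.append(evidence)
```

Centralizing the link in add_evidence() keeps the chaining decision in one place instead of in every Task's run method.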

@rgayon (Collaborator, Author) commented Nov 16, 2018

New Evidence objects are generated within the Task run method.
add_evidence() is called from there, but without passing the input Evidence.
The only way I see to access it is the self.input_evidence property. Problem is, this is a list of Evidence.
Is this actually being used as a list? https://github.com/google/turbinia/blob/master/turbinia/workers/__init__.py#L333 seems to just make a list of one Evidence object, and I can't see anywhere that a new Evidence gets appended.

@rgayon (Collaborator, Author) commented Nov 27, 2018

Still needs #309 to be pushed.

@rgayon (Collaborator, Author) commented Nov 27, 2018

#310 contains code that showcases this PR being used to enumerate Docker containers and create new DockerEvidence objects, using the DockerEnumerate Task and its associated pre/postprocessors.

@rgayon rgayon requested a review from aarontp November 27, 2018 18:45
@aarontp (Member) left a comment

Thanks, I think this will work better like this. Just a couple small items.

@@ -77,6 +78,8 @@ class Evidence(object):
processing this evidence.
cloud_only: Set to True for evidence types that can only be processed in a
cloud environment, e.g. GoogleCloudDisk.
context_dependant: Whethere this evidence required to be built upon the
@aarontp (Member):

I think you want s/dependant/dependent/ s/Whethere/Whether/ s/required/is required/ :)

@rgayon (Collaborator, Author):

HOWDOIENGLISH
Fixed.

request_id: The id of the request this evidence came from, if any
request_id: The id of the request this evidence came from, if any.
parent_evidence: The Evidence object that was used to generate this one, and
which pre/post process methods we need to re-execute to access data
@aarontp (Member):

Maybe double indent this to match style here?

@rgayon (Collaborator, Author):

Done

@@ -169,8 +202,6 @@ class RawDisk(Evidence):
"""Evidence object for Disk based evidence.

Attributes:
loopdevice_path: Path to the losetup device for this disk.
@aarontp (Member):

Why are these getting removed?

@rgayon (Collaborator, Author):

I'm glad you asked. Fixed.

@@ -77,6 +78,8 @@ class Evidence(object):
processing this evidence.
cloud_only: Set to True for evidence types that can only be processed in a
cloud environment, e.g. GoogleCloudDisk.
context_dependant: Whethere this evidence required to be built upon the
context of a parent evidence.
@aarontp (Member):

Don't we need to set some of the other evidence types as context dependent then? Or does that get done in the next PR when we actually move around the pre/post-processors?

@rgayon (Collaborator, Author):

There is currently no other evidence type where this is absolutely needed, but a later PR could indeed make GoogleCloudDiskRawEmbedded dependent on GoogleCloudDisk.

@rgayon (Collaborator, Author) commented Dec 5, 2018

Tests are failing because of #315

@rgayon rgayon requested a review from aarontp December 5, 2018 16:19
@aarontp (Member) left a comment

LGTM, thanks!

(Six review threads on turbinia/evidence.py, all resolved.)
@aarontp (Member) commented Dec 5, 2018

OK, I tried to make a GitHub suggestion for the docstrings, but it doesn't seem to handle multi-line suggestions very well (I had to make separate suggestions to delete the follow-up lines), and I don't have permission to push it anyway. I'll just delete these and send a separate PR so I can get this one merged.

@aarontp aarontp merged commit dc4737b into google:master Dec 5, 2018