
Use a link to a "parent Evidence" instead of subclassing #296

Merged · 11 commits · Dec 5, 2018

Conversation

@rgayon (Collaborator) commented Nov 2, 2018

Instead of stacking evidence types, as shown in #261, we just implement a link from an Evidence object to a parent Evidence object. When this parent exists, we run its preprocessing before our own (and its postprocessing after ours).

When we need to generate new evidence from a context Evidence, which will then be processed by all the other Jobs, the new evidence type (e.g. DockerContainerEvidence) can just extend Evidence and require that a parent evidence is provided.
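The parent link described here can be sketched roughly as follows. This is an illustrative toy, not Turbinia's actual code; the `_preprocess`/`_postprocess` split and the `DockerContainer` constructor signature are assumptions for the example:

```python
# Minimal sketch of the idea: an Evidence may hold a parent_evidence
# link, preprocessing walks up to the parent first, and postprocessing
# unwinds in the reverse order.

class Evidence:
    def __init__(self, parent_evidence=None):
        self.parent_evidence = parent_evidence

    def _preprocess(self):
        """Type-specific preprocessing; subclasses override this."""

    def _postprocess(self):
        """Type-specific postprocessing; subclasses override this."""

    def preprocess(self):
        if self.parent_evidence:
            self.parent_evidence.preprocess()  # set up parent context first
        self._preprocess()

    def postprocess(self):
        self._postprocess()
        if self.parent_evidence:
            self.parent_evidence.postprocess()  # tear down parent context last


class DockerContainer(Evidence):
    """A context-dependent evidence type that requires a parent Evidence."""

    def __init__(self, container_id, parent_evidence):
        super().__init__(parent_evidence=parent_evidence)
        self.container_id = container_id
```

With this shape, a task that mounts a disk (the parent) and then enters a container gets both setup steps in the right order from a single `preprocess()` call.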

@rgayon rgayon requested a review from aarontp November 2, 2018 14:56
@aarontp (Member) left a comment

Don't we also need to change the Task input/output to pass in and then return the actual evidence ancestors? Otherwise, when the Evidence is passed to the next Task, it will have a dangling link to the parent evidence object, won't it?

@rgayon (Collaborator, Author) commented Nov 6, 2018

Not sure what you mean. I would expect the parent evidence to also be serialized when passed between tasks.

@aarontp (Member) commented Nov 6, 2018

Yeah, that's the part that I meant. :) Based on some quick testing, I think it will work transparently when returning the Evidence object from the Task, but the Evidence serialization that happens on the server (when scheduling tasks) will likely need some changes. You'll probably have to change the Evidence serialize() and possibly the to_json() methods to recurse over the ancestors. We should double-check what serialization mechanism Celery uses as well. This is part of why I was originally thinking about implementing this as a new EvidenceChain() object, so that what gets passed around would be explicit rather than implicit in the serialization method.
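One hypothetical shape for that recursion over ancestors (the method names `serialize()` and `to_json()` come from the comment above, but the bodies here are an assumed sketch, not the actual patch):

```python
# Sketch of recursive Evidence serialization: serialize() emits a plain
# dict and embeds the parent's dict inside it, so to_json() captures the
# whole ancestor chain instead of leaving a dangling object reference.
import json


class Evidence:
    def __init__(self, name, parent_evidence=None):
        self.name = name
        self.parent_evidence = parent_evidence

    def serialize(self):
        data = {'type': type(self).__name__, 'name': self.name}
        if self.parent_evidence:
            # Recurse so every ancestor is embedded in the output.
            data['parent_evidence'] = self.parent_evidence.serialize()
        return data

    def to_json(self):
        return json.dumps(self.serialize())
```

Deserialization would walk the same structure in reverse, rebuilding each ancestor before attaching it to its child.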

Another thing to consider here is where/when the parent_evidence attribute gets set. I haven't looked at the task code for this, but I expect that we'll set that wherever the request config data gets copied into the Evidence object (I think in TurbiniaTaskResult.add_evidence()).

@aarontp (Member) left a comment

A couple more comments

(Three review threads on turbinia/evidence.py, all resolved.)
@rgayon (Collaborator, Author) commented Nov 8, 2018

This is part of why I was originally thinking about implementing this as a new EvidenceChain() object so that it would be explicit what gets passed around, rather than implicit depending on the serialization method.

Adapting the current Evidence objects means we don't have to update all Jobs to accept a new evidence type. I also imagine that otherwise all Tasks would have to rewrite TurbiniaTaskResult setup() and close().
My specific use case is: I need to generate new Evidence objects in a Task, where:

  1. These new evidence objects will need to be run through most other workers (so, basically new 'ROOT' evidence of the same kind we pass to turbiniactl)
  2. I can't copy all the data to be analyzed from the input Evidence

I can't think of any case where we would want the "chain" length to be more than 2; that would mean an entire OS environment buried inside a disk twice (a VM in a VM in a guest OS).
I'll make it so that you can't set an Evidence which already has a parent as a "parent", to avoid weird loops.
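A minimal sketch of such a guard, assuming a `parent_evidence` property (illustrative only, not the actual patch):

```python
# Sketch: refuse to set a parent that itself already has a parent,
# which caps the chain length at 2 and rules out loops.

class Evidence:
    def __init__(self):
        self._parent_evidence = None

    @property
    def parent_evidence(self):
        return self._parent_evidence

    @parent_evidence.setter
    def parent_evidence(self, parent):
        if parent is not None and parent.parent_evidence is not None:
            raise ValueError(
                'Parent evidence already has a parent; chains longer '
                'than 2 are not supported.')
        self._parent_evidence = parent
```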

You'll probably have to change the evidence serialize() and possibly the to_json() methods to recurse over the ancestors. We should double check what serialization mechanism that Celery uses as well.

Celery does pickling, which should preserve everything.
PSQ wants JSON; I will adapt the JSON serialization to serialize/deserialize the parent evidence.
Do you expect we will have to support more serialization methods?

Another thing to consider here is where/when the parent_evidence attribute gets set. I haven't looked at the task code for this, but I expect that we'll set that wherever the request config data gets copied into the Evidence object (I think in TurbiniaTaskResult.add_evidence()).
In my DockerContainer test code, I just create a new DockerContainer evidence this way:

container_evidence = DockerContainer(container_id=container_id, parent_evidence=evidence)

@ericzinnikas (Contributor) commented:

FYI I'm looking at moving Celery over to JSON instead of pickle, so we can standardize on just one serialization method.

@rgayon (Collaborator, Author) commented Nov 13, 2018

Next steps:

  1. Try to have the "chaining" automatically done and stored within add_evidence().
  2. Use the same kind of new property on e.g. DockerContainerEvidence (for example, mandatory_chaining=True somewhere) to decide whether to build the chain or not.
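That idea could look roughly like this. The names `context_dependent`, `input_evidence`, and `TurbiniaTaskResult.add_evidence()` follow the discussion in this thread, but the bodies are an assumed sketch, not the merged code:

```python
# Sketch: when a task result collects a new Evidence, link it to the
# task's input evidence automatically, but only for evidence types
# flagged as context dependent.

class Evidence:
    context_dependent = False  # class-level flag, overridden by subclasses

    def __init__(self):
        self.parent_evidence = None


class DockerContainer(Evidence):
    context_dependent = True


class TurbiniaTaskResult:
    def __init__(self, input_evidence):
        self.input_evidence = input_evidence
        self.evidence = []

    def add_evidence(self, evidence):
        if evidence.context_dependent:
            # Build the chain here, so individual tasks don't have to.
            evidence.parent_evidence = self.input_evidence
        self.evidence.append(evidence)
```

Centralizing the link in add_evidence() keeps the chaining decision in one place instead of in every Task's run method.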

@rgayon (Collaborator, Author) commented Nov 16, 2018

New Evidence objects are generated within the Task run method.
add_evidence() is called from there, but without passing the input Evidence.
The only way I see to access it is the self.input_evidence property. Problem is, this is a list of Evidence.
Is this actually being used as a list? https://github.com/google/turbinia/blob/master/turbinia/workers/__init__.py#L333 seems to just make a list of one Evidence object, and I can't see anywhere that a new Evidence gets appended.

@rgayon (Collaborator, Author) commented Nov 27, 2018

Still needs #309 to be pushed.

@rgayon (Collaborator, Author) commented Nov 27, 2018

#310 contains code that showcases this PR being used to enumerate Docker containers and create new DockerEvidence objects, using the DockerEnumerate Task and its associated pre/postprocessors.

@rgayon rgayon requested a review from aarontp November 27, 2018 18:45
@aarontp (Member) left a comment

Thanks, I think this will work better like this. Just a couple small items.

@@ -77,6 +78,8 @@ class Evidence(object):
processing this evidence.
cloud_only: Set to True for evidence types that can only be processed in a
cloud environment, e.g. GoogleCloudDisk.
context_dependant: Whethere this evidence required to be built upon the
@aarontp (Member):

I think you want s/dependant/dependent/ s/Whethere/Whether/ s/required/is required/ :)

@rgayon (Collaborator, Author):

HOWDOIENGLISH
Fixed.

request_id: The id of the request this evidence came from, if any
request_id: The id of the request this evidence came from, if any.
parent_evidence: The Evidence object that was used to generate this one, and
which pre/post process methods we need to re-execute to access data
@aarontp (Member):

Maybe double indent this to match style here?

@rgayon (Collaborator, Author):

Done

@@ -169,8 +202,6 @@ class RawDisk(Evidence):
"""Evidence object for Disk based evidence.

Attributes:
loopdevice_path: Path to the losetup device for this disk.
@aarontp (Member):

Why are these getting removed?

@rgayon (Collaborator, Author):

I'm glad you asked. Fixed.

@@ -77,6 +78,8 @@ class Evidence(object):
processing this evidence.
cloud_only: Set to True for evidence types that can only be processed in a
cloud environment, e.g. GoogleCloudDisk.
context_dependant: Whethere this evidence required to be built upon the
context of a parent evidence.
@aarontp (Member):

Don't we need to set some of the other evidence types as context dependent then? Or does that get done in the next PR when we actually move around the pre/post-processors?

@rgayon (Collaborator, Author):

There is currently no other evidence type where this is absolutely needed, but a later PR could indeed make GoogleCloudDiskRawEmbedded dependent on GoogleCloudDisk.

@rgayon (Collaborator, Author) commented Dec 5, 2018

Tests are failing because of #315

@rgayon rgayon requested a review from aarontp December 5, 2018 16:19
@aarontp (Member) left a comment

LGTM, thanks!

(Six review threads on turbinia/evidence.py, all resolved.)
@aarontp (Member) commented Dec 5, 2018

OK, I tried to make a GitHub suggestion for the docstrings, but it doesn't seem to handle multi-line suggestions very well (I had to make separate suggestions to delete the follow-up lines), and I don't have permission to push it anyway. I'll just delete these and send a separate PR so I can get this one merged.

@aarontp aarontp merged commit dc4737b into google:master Dec 5, 2018