Use a link to a "parent Evidence" instead of subclassing #296
Conversation
Don't we also need to change the tasks' input/output to pass in and then return the actual evidence ancestors? Otherwise, when the Evidence is passed to the next task, it will have a dangling link to the parent evidence object, won't it?
Not sure what you mean. I would expect the parent evidence would also be serialized when passed around tasks.
Yeah, that's the part that I meant :). Based on some quick testing, I think it will work transparently when returning the Evidence object from the Task, but the Evidence serialization that happens from the server (when scheduling tasks) will likely need some changes. You'll probably have to change the evidence serialization code there as well.

Another thing to consider here is where/when the parent_evidence attribute gets set. I haven't looked at the task code for this, but I expect that we'll set that wherever the request config data gets copied into the Evidence object (I think in TurbiniaTaskResult.add_evidence()).
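To make the serialization concern above concrete, here is a minimal sketch of what recursive Evidence serialization could look like. The serialize/deserialize names and the toy attributes are illustrative assumptions, not the actual Turbinia API:

```python
import json

class Evidence(object):
  """Toy stand-in for turbinia.evidence.Evidence."""

  def __init__(self, name=None, parent_evidence=None):
    self.name = name
    self.parent_evidence = parent_evidence

  def serialize(self):
    """Returns a JSON-serializable dict, recursing into the parent."""
    serialized = self.__dict__.copy()
    if self.parent_evidence:
      serialized['parent_evidence'] = self.parent_evidence.serialize()
    return serialized

  @classmethod
  def deserialize(cls, data):
    """Rebuilds an Evidence object (and its ancestors) from a dict."""
    evidence = cls()
    evidence.__dict__.update(data)
    if data.get('parent_evidence'):
      evidence.parent_evidence = cls.deserialize(data['parent_evidence'])
    return evidence

# Round-trip through JSON: the parent link is rebuilt, not dangling.
disk = Evidence(name='rawdisk')
container = Evidence(name='container', parent_evidence=disk)
restored = Evidence.deserialize(json.loads(json.dumps(container.serialize())))
assert restored.parent_evidence.name == 'rawdisk'
```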
A couple more comments:

Adapting the current Evidence objects means we don't have to update all Jobs to make them accept this new evidence type; I also imagine that all Tasks would otherwise have to be rewritten.

I can't think of any case where we want the "chain" length to be more than 2. That would mean an entire OS environment being buried inside a disk twice (a VM in a VM in a guest OS).
Celery does pickling, which should preserve everything.
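For what it's worth, here is a quick toy example of why pickling preserves the link (a sketch with a stand-in class, not the real Evidence):

```python
import pickle

class Evidence(object):
  """Toy stand-in for the real Evidence class."""

  def __init__(self, name, parent_evidence=None):
    self.name = name
    self.parent_evidence = parent_evidence

disk = Evidence('rawdisk')
container = Evidence('docker_container', parent_evidence=disk)

# pickle serializes the whole object graph, so the parent link
# survives the round-trip with no custom serialization code.
restored = pickle.loads(pickle.dumps(container))
assert restored.parent_evidence.name == 'rawdisk'
```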
FYI I'm looking at moving Celery over to JSON instead of pickle, so we can standardize on just one serialization method.
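For reference, pinning Celery to JSON is a standard configuration change; it would look roughly like this (the app name is a placeholder, and this assumes Celery 4-style lowercase settings):

```python
from celery import Celery

app = Celery('turbinia')

# Use JSON for both task arguments and results, and reject any other
# incoming content type, so there is one serialization method end to end.
app.conf.update(
    task_serializer='json',
    result_serializer='json',
    accept_content=['json'],
)
```

The caveat is that, unlike pickle, JSON won't round-trip arbitrary objects, which is exactly why the Evidence serialization above needs explicit handling of parent_evidence.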
Next step: the place where new Evidence objects are generated is within the Task's run method.
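As a sketch of that flow (all names here are hypothetical, and the Task/result API is assumed from the surrounding discussion, not taken from the final code):

```python
class DockerContainersEnumerationTask(TurbiniaTask):
  """Hypothetical Task whose run() generates new child Evidence."""

  def run(self, evidence, result):
    # Each discovered container becomes a new Evidence object that is
    # linked back to the evidence it was found in, so its
    # pre/postprocessors can rebuild the parent context later.
    for container_id in self.enumerate_containers(evidence):
      container = DockerContainer(container_id=container_id)
      container.parent_evidence = evidence
      result.add_evidence(container, evidence.config)
    result.close(self, success=True)
    return result
```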
Force-pushed from a0a6d24 to 6f49503.
Still needs #309 to be pushed.
#310 contains code used to showcase this PR being used in the context of enumerating Docker containers and creating new DockerEvidence objects, using the DockerEnumerate Task and associated pre/postprocessors.
Thanks, I think this will work better like this. Just a couple small items.
turbinia/evidence.py (Outdated)
@@ -77,6 +78,8 @@ class Evidence(object):
       processing this evidence.
     cloud_only: Set to True for evidence types that can only be processed in a
         cloud environment, e.g. GoogleCloudDisk.
+    context_dependant: Whethere this evidence required to be built upon the
+        context of a parent evidence.
I think you want s/dependant/dependent/ s/Whethere/Whether/ s/required/is required/ :)
HOWDOIENGLISH
Fixed.
turbinia/evidence.py (Outdated)
-    request_id: The id of the request this evidence came from, if any
+    request_id: The id of the request this evidence came from, if any.
+    parent_evidence: The Evidence object that was used to generate this one, and
+      which pre/post process methods we need to re-execute to access data
Maybe double indent this to match style here?
Done
turbinia/evidence.py (Outdated)
@@ -169,8 +202,6 @@ class RawDisk(Evidence):
   """Evidence object for Disk based evidence.

-  Attributes:
-    loopdevice_path: Path to the losetup device for this disk.
Why are these getting removed?
I'm glad you asked. Fixed.
@@ -77,6 +78,8 @@ class Evidence(object):
       processing this evidence.
     cloud_only: Set to True for evidence types that can only be processed in a
         cloud environment, e.g. GoogleCloudDisk.
+    context_dependant: Whethere this evidence required to be built upon the
+        context of a parent evidence.
Don't we need to set some of the other evidence types as context dependent then? Or does that get done in the next PR when we actually move around the pre/post-processors?
There is currently no other evidence type where this is absolutely needed, but a later PR could indeed make GoogleCloudDiskRawEmbedded dependent on GoogleCloudDisk.
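For illustration, the flag and the linking step discussed earlier (around TurbiniaTaskResult.add_evidence()) might fit together roughly like this; a sketch using the names from this thread, not the final code:

```python
class GoogleCloudDiskRawEmbedded(Evidence):
  """Disk image embedded inside another GoogleCloudDisk."""

  # Can only be processed once the parent GoogleCloudDisk is attached.
  context_dependant = True

class TurbiniaTaskResult(object):

  def add_evidence(self, evidence, evidence_config):
    """Adds new evidence, linking it to its parent when required."""
    evidence.config = evidence_config
    if evidence.context_dependant:
      # Only context-dependent types need the parent chain so that
      # their pre/postprocessors can rebuild the environment first.
      evidence.parent_evidence = self.input_evidence
    self.evidence.append(evidence)
```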
Force-pushed from fab2731 to 191fc57.
Tests are failing because of #315.
LGTM, thanks!
Ok, I tried to make a GitHub suggestion for the docstrings, but it doesn't seem to handle multi-line suggestions very well (so I had to make separate suggestions to delete the follow-up lines), and I don't have permissions to push it anyway. I'll just delete these and send a separate PR so I can get this one merged.
* Pin specific Celery/Kombu/Redis versions (google#305)
* In TurbiniaTaskResult, input_evidence doesn't have to be a list (google#309)
  * input_evidence doesn't have to be a list
  * Fix exception
* Current pypi version of google-cloud-storage (1.13.0) requires google-cloud-core 0.28.1, before the rename of google.cloud.iam (core) to google.api_core.iam (google#315)
* Use a link to a "parent Evidence" instead of subclassing (google#296)
  * parent evidence
  * undo some simplifications for the sake of a simpler CL
  * Add some serialization
  * update docstring
  * Initialize attribute
  * set parent evidence if Evidence type is context dependant
  * don't pass parent_evidence at instantiation
  * undo linter stuff
  * comments
  * fix aim lib breaking tests
  * typo
* Print version on start 3 (google#320)
  * Add files via upload
  * Delete turbiniactl.py
  * Delete turbiniactl.py
  * Add files via upload
  * Delete turbiniactl.py
  * Add files via upload
  * Update turbiniactl.py
  * Caps
* Quick update to evidence docstrings (google#317), to disambiguate between _preprocess() and preprocess().
* Add Job filters (google#247)
  * Add job filters
  * fix docstrings.
  * update docstring
  * Get jobs filters working with new job manager
  * Refactor out FilterJobObjects into new method
  * Update YAPF
  * remove missed confict markers
  * Docstrings and renaming
* Migrate job graph generator to use new manager (google#321)
* Update Evidence local_path when it's saved (google#319)
* Pin google-cloud-storage to 1.13.0 (google#326). Fixes google#325. Looks like google-cloud-storage was updated in googleapis/google-cloud-python#6741, which just got released as 1.13.1: https://pypi.org/project/google-cloud-storage/#history
* Set image export to process all partitions (google#324)
  * Add --partitions all to image_export invocations
  * Fix typo
* Explicitly set saved_paths to list (google#323)
* Move version print after log level setup (google#322)
  * Move version print after log level setup
  * Remove extra whitespace
* Update the pystyle link (google#333)
* Undefined name: Define 'unicode' in Python 3 (google#337). __unicode()__ was removed in Python 3 because all __str__ are Unicode. flake8 (http://flake8.pycqa.org) testing of https://github.com/google/turbinia on Python 3.7.1 gives:

  ```
  $ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
  ./tools/turbinia_job_graph.py:47:40: F821 undefined name 'unicode'
  parser.add_argument('filename', type=unicode, help='where to save the file')
                                       ^
  1     F821 undefined name 'unicode'
  1
  ```

  * Placate PyLint
* Added PSQ timeout to 1 week (google#336)
* Error when worker version doesn't match server google#307 (google#327)
  * Added turbina_version to TurbinaTask
  * First approach
  * Changed to no rise error and return instead
  * Restored the run from run_wrapper
  * Changed format of strings
  * Changed words fixed line too long
* bump version
Instead of stacking evidence types, as shown in #261, we just implement a link from an Evidence object to a parent Evidence object.
When this parent exists, we run its preprocess before our own preprocess (and similarly for postprocess).
When we need to generate new evidence from a context Evidence, which will then be processed by all the other Jobs, the new evidence type (e.g. DockerContainerEvidence) can simply extend Evidence and require that a parent evidence be provided.
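A minimal sketch of that chaining, assuming the split between the public preprocess() and the type-specific _preprocess() hooks mentioned earlier in the thread:

```python
class Evidence(object):

  def __init__(self, parent_evidence=None):
    self.parent_evidence = parent_evidence

  def _preprocess(self):
    """Type-specific setup (e.g. attach a disk); overridden by subclasses."""

  def _postprocess(self):
    """Type-specific teardown; overridden by subclasses."""

  def preprocess(self):
    # Recurse up the chain so the parent context (e.g. an attached
    # disk) exists before this evidence builds on top of it.
    if self.parent_evidence:
      self.parent_evidence.preprocess()
    self._preprocess()

  def postprocess(self):
    # Tear down in reverse order: this evidence first, then its parent.
    self._postprocess()
    if self.parent_evidence:
      self.parent_evidence.postprocess()
```

With a chain of length 2 (e.g. a DockerContainerEvidence whose parent is a disk), preprocess() would attach the disk first and then set up the container view on top of it.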