Add support for saving screenshots, page source, and other arbitrary files to unstructured storage providers #232

Open
englehardt opened this issue Nov 9, 2018 · 1 comment

Comments

@englehardt
Collaborator

Screenshots, page source, and other files collected in the browser manager process are currently written directly to disk. This worked when OpenWPM only saved data locally, but will not work for the S3Aggregator. Instead, BaseAggregator should include a save_file method. In LocalAggregator we can implement that to save to disk, and in S3Aggregator we can upload to S3.
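A minimal sketch of what such a `save_file` hook could look like. The method signature, the constructor arguments, and the `InMemoryS3Aggregator` stand-in (which records uploads in a dict instead of talking to S3) are assumptions made for illustration, not the existing OpenWPM classes:

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict


class BaseAggregator(ABC):
    """Hypothetical base class; only the proposed save_file hook is sketched."""

    @abstractmethod
    def save_file(self, filename: str, content: bytes) -> None:
        """Persist `content` under `filename` in the aggregator's backend."""


class LocalAggregator(BaseAggregator):
    """Saves files below a local directory (constructor argument is assumed)."""

    def __init__(self, data_directory: Path) -> None:
        self.data_directory = data_directory

    def save_file(self, filename: str, content: bytes) -> None:
        target = self.data_directory / filename
        # Create intermediate folders such as "screenshots/" on demand.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)


class InMemoryS3Aggregator(BaseAggregator):
    """Stand-in for S3Aggregator: keeps uploads in a dict, no network calls."""

    def __init__(self, bucket: str) -> None:
        self.bucket = bucket
        self.uploads: Dict[str, bytes] = {}

    def save_file(self, filename: str, content: bytes) -> None:
        # A real implementation would upload `content` to the bucket here.
        self.uploads[f"{self.bucket}/{filename}"] = content
```

With this shape, the browser manager would only ever call `save_file` and would not need to know whether the bytes end up on disk or in a bucket.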

@vringar
Contributor

vringar commented Feb 22, 2021

Updating this comment as #753 removed everything mentioned in the original issue.
Observations:

  • UnstructuredStorageProviders already have an interface suitable for storing a bunch of bytes under a user-defined name
  • The base path for storing is specified at time of object instantiation
  • => There is no longer any need for a data_directory in the manager params, similar to how database_name was removed in Data Aggregator Rewrite #753

Paths forward:

  1. Add a second UnstructuredStorageProvider to the StorageController that is responsible for saving unstructured platform data
  2. Expand the UnstructuredStorageProvider interface with a second method that is responsible for saving unstructured platform data

I prefer option 1 as it is inherently more flexible, e.g. this way screenshots can get saved into the cloud while web content just gets saved to disk.
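Roughly, option 1 could look like the sketch below. The provider interface is simplified, and the `store_blob` signature, the `InMemoryProvider` class, and the controller's constructor and method names are all hypothetical placeholders rather than the current OpenWPM API:

```python
from typing import Dict, Protocol


class UnstructuredStorageProvider(Protocol):
    """Simplified, assumed interface: store bytes under a user-defined name."""

    def store_blob(self, filename: str, blob: bytes) -> None: ...


class InMemoryProvider:
    """Toy provider; the base path is fixed at instantiation time."""

    def __init__(self, base_path: str) -> None:
        self.base_path = base_path
        self.blobs: Dict[str, bytes] = {}

    def store_blob(self, filename: str, blob: bytes) -> None:
        self.blobs[f"{self.base_path}/{filename}"] = blob


class StorageController:
    """Holds one provider for web content and a second for platform data."""

    def __init__(
        self,
        content_provider: UnstructuredStorageProvider,
        platform_provider: UnstructuredStorageProvider,
    ) -> None:
        self.content_provider = content_provider
        self.platform_provider = platform_provider

    def save_content(self, filename: str, blob: bytes) -> None:
        # Web content (e.g. response bodies) goes to the first provider.
        self.content_provider.store_blob(filename, blob)

    def save_platform_file(self, filename: str, blob: bytes) -> None:
        # Screenshots, page source, etc. go to the second provider.
        self.platform_provider.store_blob(filename, blob)


# Screenshots go to a "cloud" provider while web content stays "on disk".
disk = InMemoryProvider("/data/content")
cloud = InMemoryProvider("s3://bucket/platform")
controller = StorageController(content_provider=disk, platform_provider=cloud)
controller.save_content("abc123", b"<script>")
controller.save_platform_file("screenshot.png", b"png-bytes")
```

Because the two providers are independent objects, each can point at a different backend without the interface itself having to change.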

@vringar changed the title from "Add support for saving screenshots, page source, and other arbitrary files to data aggregators" to "Add support for saving screenshots, page source, and other arbitrary files to unstructured storage providers" on Dec 21, 2021