Option to not store images #154

Open
cco3 opened this issue Apr 19, 2021 · 15 comments
Labels
design-needed enhancement New feature or request

Comments

@cco3

cco3 commented Apr 19, 2021

Maybe this already exists, but could there be an option to not store images after a scan completes?

@tdruez
Member

tdruez commented Apr 20, 2021

@cco3 do you mean not storing the images in the database as codebase resources (not scanned)? Or not keeping them on the file system (project work directory) after a scan?

Would you need this option on a per-project basis or for a whole ScanCode.io instance?

@cco3
Author

cco3 commented Apr 20, 2021

I would like to store nothing more than the report and associated metadata. Ideally, the only thing that would need to be persistent would be the DB.

@pombredanne pombredanne added the enhancement New feature or request label Apr 28, 2021
@cco3
Author

cco3 commented May 5, 2021

Furthermore, I'm concerned that even with a large disk, we will fill up local storage with images and the tool will just stop working unless we add some sophisticated way of handling it.
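
As a rough illustration of how the disk-filling concern above could be monitored, here is a minimal sketch using only the Python standard library. The helper names and the 10% free-space threshold are assumptions made for this example; they are not part of ScanCode.io.

```python
import os
import shutil


def directory_size_bytes(root):
    """Sum the sizes of the regular files found under `root`."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def is_disk_nearly_full(path, min_free_ratio=0.10):
    """Return True when the free share of the disk holding `path` is below the threshold.

    The 10% default is an arbitrary value chosen for illustration.
    """
    usage = shutil.disk_usage(path)
    return usage.free / usage.total < min_free_ratio
```

Such a check could run before each scan, pointed at the workspace directory, so that a cleanup is triggered before the disk is full rather than after.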

@tdruez
Member

tdruez commented May 6, 2021

https://scancodeio.readthedocs.io/en/latest/scanpipe-concepts.html#project-workspace

To be sure I understand properly, you would like to remove the content of both the input/ (input files as uploaded/downloaded) and the codebase/ (extracted content) directories?

@cco3
Author

cco3 commented May 6, 2021

Correct. I would like it if after a run there were nothing additional saved on disk (only in the DB). This also includes the report files if possible. Are these stored in the DB or on disk?

@cco3
Author

cco3 commented May 6, 2021

I'd like to be able to throw away the disk and only worry about persisting the DB. An alternative might be to be able to specify settings for remote storage (AWS/Google Storage/ftp/etc.)

tdruez added a commit that referenced this issue May 7, 2021
Signed-off-by: Thomas Druez <tdruez@nexb.com>
@cco3
Author

cco3 commented May 27, 2021

I don't suppose there's a way to use SCANCODEIO_WORKSPACE_LOCATION to accomplish what I want, is there?

@tdruez
Member

tdruez commented May 28, 2021

> I would like it if after a run there were nothing additional saved on disk (only in the DB). This also includes the report files if possible. Are these stored in the DB or on disk?

Generated reports are stored on disk (in the output/ project directory), but they can be regenerated at any time from the DB data. When you click on a "Download" link in the UI, a fresh report is generated and sent.

> I'd like to be able to throw away the disk and only worry about persisting the DB. An alternative might be to be able to specify settings for remote storage (AWS/Google Storage/ftp/etc.)

You can specify the location of the workspace using the SCANCODEIO_WORKSPACE_LOCATION setting https://scancodeio.readthedocs.io/en/latest/scancodeio-settings.html#scancodeio-workspace-location, as long as it's a mounted location on the filesystem. We could add remote storage support in the future.

In the short term, you can wipe the content of your workspace location (its path is shown in the header of any project details view in the web UI).

We will add automated ways to run this cleanup.
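
Until an automated cleanup exists, the manual wipe described above could be scripted. The following is a minimal sketch, assuming a project work directory that contains the input/ and codebase/ sub-directories mentioned earlier in this thread. `wipe_project_files` is a hypothetical helper written for this example, not a ScanCode.io API, and it leaves output/ untouched by default.

```python
import shutil
from pathlib import Path

# Directory names taken from the discussion above; adjust if your layout differs.
DISPOSABLE_DIRS = ("input", "codebase")


def wipe_project_files(project_dir, subdirs=DISPOSABLE_DIRS):
    """Empty the given sub-directories of a project work directory.

    Hypothetical helper, not a ScanCode.io API. The sub-directories
    themselves are kept; only their content is removed.
    Returns the list of top-level entries that were removed.
    """
    removed = []
    for name in subdirs:
        target = Path(project_dir) / name
        if not target.is_dir():
            continue  # nothing to clean for this sub-directory
        for child in target.iterdir():
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)  # remove an extracted directory tree
            else:
                child.unlink()  # remove a plain file or a symlink
            removed.append(child)
    return removed
```

Calling `wipe_project_files("/path/to/project-work-directory")` after a scan completes would drop the uploaded inputs and the extracted codebase while keeping the DB records, from which reports can be regenerated as noted above. The path shown is a placeholder.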

@cco3
Author

cco3 commented May 29, 2021

Thanks! Is this the behavior on the current release or the next one?

@tdruez
Member

tdruez commented May 31, 2021

The SCANCODEIO_WORKSPACE_LOCATION setting and the workspace system have been around for a while.

@cco3
Author

cco3 commented Jun 1, 2021

I meant the behavior to regenerate a report when it's no longer on disk. That hasn't worked for me with the current release.

@tdruez
Member

tdruez commented Jun 1, 2021

@cco3 which reporting format are you using, json or xlsx?

@cco3
Author

cco3 commented Jun 1, 2021

I think we are going to end up primarily using xlsx.

tdruez added a commit that referenced this issue Jun 4, 2021
Signed-off-by: tdruez <tdruez@nexb.com>
@pombredanne
Member

See also #356

@pombredanne
Member

@cco3 Since we now have the option to archive a project with #205, and there is a related issue about using external storage in #356, how do you see this issue evolving? Is it still relevant in this context?


3 participants