Portal Download - bulk #354
Marking this as X-Large for now, but it could be quicker depending on the technical solution. First step here is to decide how to implement the feature. Some of the options include:
The second is the most user friendly, stable, and scalable, but will require the most engineering effort. The first is probably a Large task, but brings with it a number of stability and scalability issues. The third is easiest (could be done entirely client side), but users may have issues running a script. If we decide to go with the second option, I would want to break this into subtasks. |
@jbeezley can we discuss option 1 at the tech sync or the kitware sync meetings this week? @emileyfadrosh would love if this could be done this sprint and thinks the users would prefer option 1. |
I think mod_zip would be a good candidate for generating the streaming responses without incurring the cost on the API server. One technical question to resolve is the persistence of the download cart. The simplest implementation from the server's perspective would keep the download cart purely in memory on the web client. This would mean that if the page is reloaded, everything in the cart would disappear. If we want a persistent cart, then we probably need to store it in the database with an associated user id, which means creating a download cart API. Either way, for the server side of the implementation, I would give this a MEDIUM-LARGE size. What might be a bigger component of this is designing and implementing the UI. We should consider bringing in Faiza to create a design. This would also help clarify some of the technical requirements. |
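For reference, a rough sketch of what the API server would hand to mod_zip: a plain-text manifest with one line per file (CRC-32, size, internal location, name inside the archive), returned with an `X-Archive-Files: zip` response header so nginx streams the archive itself. The function name, the paths, and the `(size, path, name)` tuple layout are assumptions for illustration, not the portal's actual code.

```python
from urllib.parse import quote


def build_mod_zip_manifest(entries):
    """Build a mod_zip manifest body.

    `entries` is an iterable of (size_bytes, internal_path, archive_name)
    tuples. This layout is hypothetical; the real data objects may differ.
    """
    lines = []
    for size, internal_path, archive_name in entries:
        # "-" tells mod_zip that the CRC-32 is not known in advance.
        # The location is an nginx-internal path and must be URL-encoded.
        location = quote(internal_path)
        lines.append(f"- {size} {location} {archive_name}")
    return "\n".join(lines) + "\n"


# Header that hands the response body over to mod_zip.
MOD_ZIP_HEADERS = {"X-Archive-Files": "zip"}
```

With this approach the API server only produces a few bytes of text per file; reading the files and building the zip stream happens in nginx.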
The majority of this work will be done in a future sprint. Waiting to hear back on whether it should be in June or later. First task may be done this sprint. See Jon's email below: From: Jonathan Beezley jonathan.beezley@kitware.com It looks like Brandon has basically finished the last two issues. The first one is primarily my responsibility and will take a little bit of work to expose "multi-omics" related information, but I should be able to do it. |
Yes moving most of this to June makes sense unless other priorities push it back further. If we have the right people for the Kitware sync Thursday, we could discuss the requirements for the front end and have @faiza-a begin UI mockups. |
This may already have been discussed so feel free to ignore if so. When downloading files, the user will want to associate the data files back to the sample somehow. Will the files be packaged into folders named by sample identifiers? or renamed? Just jotting this here to make sure something is being considered. At the moment, individual downloads require the user to manually rename files to something more meaningful. Some files are named identically across samples (scaffolds fna), some seem to have an identifier appended (although not sure where it came from) (e.g. EC TSV). Let me know if this isn't making any sense! |
Yes, we would need that in bulk download. I assume we can roll up the zip file in a way that will include a folder structure for each sample, but that needs @jbeezley 's input. |
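To illustrate the per-sample folder idea, here is a minimal sketch using Python's standard `zipfile` module. The sample identifiers and the `(sample_id, file_name, content)` tuple layout are made up for the example; the point is only that identically named files from different samples no longer collide once each sample gets its own folder.

```python
import io
import zipfile


def zip_by_sample(files):
    """Return zip bytes with one top-level folder per sample.

    `files` is an iterable of (sample_id, file_name, content_bytes)
    tuples. This layout is hypothetical.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for sample_id, file_name, content in files:
            # The archive member name carries the folder structure.
            archive.writestr(f"{sample_id}/{file_name}", content)
    return buffer.getvalue()
```

This builds the archive in memory, so it is only a sketch of the naming scheme and not a proposal for how large downloads should be streamed.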
Here is a pre-mockup with some basic UI ideas for discussion today. @faiza-a @jbeezley @pvangay @kfagnan @emileyfadrosh https://docs.google.com/presentation/d/1gMo1fmlneVEU2hjSoWrrKOjowcXuFStGjihOwf9adZU/edit?usp=sharing |
I had in mind organizing the files something like
In terms of individual downloads, it is currently downloading with the original file name. If you could propose an alternative way to derive a file name, it would be easy to change the download name via content-disposition headers. |
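As a sketch of the content-disposition approach for single-file downloads: the server can keep the stored file name as is and only change the name the browser saves under. The `<sample_id>_<original_name>` scheme below is just one possible convention, assumed here for illustration; nothing in this thread has settled on it.

```python
from urllib.parse import quote


def content_disposition(sample_id, original_name):
    """Build a Content-Disposition header value for a renamed download.

    The "<sample_id>_<original_name>" scheme is a hypothetical example.
    """
    download_name = f"{sample_id}_{original_name}"
    # filename* uses the RFC 5987 encoding, so identifiers containing
    # characters such as ":" or spaces survive as percent-escapes.
    encoded = quote(download_name, safe="")
    return f"attachment; filename*=UTF-8''{encoded}"
```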
@jbeezley i like it! good to know file names can be changed. i think that it would be pretty rare for users to download single files - so this is less of a concern knowing that the bulk download will have a structure like the above. Thanks! |
just a quick comment: I think we need to be careful and also get input from @scanon @hubin-keio about file naming since we need to preserve the names of files. Right now, I don't see a meaningful link to the IDs for any of the omics outputs (eg, assembly_contigs.fna). How is this being dealt with? |
@emileyfadrosh i mentioned something similar above :) it looks like some of this will be preserved in jon's proposed structure for file organization but we should think about whether we want to propose some kind of file naming scheme for all files in NMDC. probably needs to be done in conjunction with the conversation about how to list sample names: microbiomedata/nmdc-metadata#349 (definitely need input from others on path forward) |
Based on the Kitware sync meeting today I will move this to the June sprint. @jbeezley and @jeffbaumes if you prefer I close this and open new issues let me know. |
Notes from today's meeting about bulk download: Kitware Sync 5/27/21
|
Updated UI Mockups based on feedback in Sync meeting on 06/03/21 |
@jbeezley If I understand correctly - what you really need from microbiomedata/nmdc-schema#20 isn't simply access to descriptions (which are already there, but not used, on all data objects) but a file type attribute on each data object to allow the UI discussed here to do filtering by file type. Am I missing something? |
No, I don't think you are missing anything. There is definitely a confusion about "file type" and "description". It appears to me that the "description" is just some free form text that isn't validated (or as you noted displayed in the UI). The file type on the other hand, is an enumerated type that we can do querying on. |
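To make the distinction concrete, here is a small sketch of a data object carrying both a free-form description and an enumerated file type, with filtering done only on the latter. The class names, field names, and enum values are placeholders, not the actual nmdc-schema definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FileType(Enum):
    # Placeholder values; the real enumeration would come from the schema.
    ASSEMBLY_CONTIGS = "Assembly Contigs"
    EC_TSV = "EC TSV"


@dataclass
class DataObject:
    name: str
    description: str  # free-form text, not validated
    file_type: Optional[FileType] = None  # enumerated, safe to query on


def filter_by_file_type(objects, wanted):
    """Keep only the data objects whose file type is in `wanted`."""
    return [obj for obj in objects if obj.file_type in wanted]
```

Objects without a file type simply drop out of every filter, which is why the attribute would need to be populated on each data object for the UI filtering to be useful.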
UI still needs to be completed |
Researchers who are interested in raw data generally want to do bulk downloads (to a server/cloud) for custom analyses. Researchers who are interested in data products may want to download to a server or to their local computer for custom analysis or sending to KBase. Researchers are also interested in downloading the associated sample metadata only.
Priority - medium
Urgency - low