Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete process data #1034

Merged
merged 7 commits into from
Apr 3, 2015
Merged

Conversation

antgonza
Copy link
Member

@antgonza antgonza commented Apr 3, 2015

Adding Processed data deletion, both to the qiita_db and the GUI.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.06%) to 79.12% when pulling 2fa574e on antgonza:delete-process-data into 0744eed on biocore:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.08%) to 79.1% when pulling d63fb3e on antgonza:delete-process-data into 0744eed on biocore:master.

conn_handler.add_to_queue(queue, sql)

conn_handler.execute_queue(queue)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs to delete the files on the filesystem, correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that we have not discussed as a group, but I will just make a note here. If you take a look, it doesn't even remove the row from qiita.filepath table. We are planning to run "purge_filepaths" in a cron job. That function, identifies which rows from qiita.filepath are not referenced and removes them from the table and from the filesystem. The advantage of this, is that the server is more free to answer requests rather than doing maintenance work, that can be done by a worker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For both comments: What we are currently doing is to leave the filepath entry (it doesn't affect anything) as deleting files could be pretty expensive. Additionally, the plan is to drop one of the already created functions once we have this in: #814

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, can an issue be opened specifically for that then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #814

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW @josenavas, we discussed this decision as a group, search in your email for "deleting files and automatic jobs"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question then is will that job be fired off as soon as the server starts or will there be a delay? Any delay means that the files could sit around indefinitely, as the user could start the server, quickly do something, and then shut it down again without that cleanup ever firing.

Also I do not have that email in any of my accounts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that @antgonza I found the email. We should write those small parts on the Contributing.md file. Next week, I'm going to put a new version together and I will submit a PR, so we can all review it.

The reason why I thought it was not discussed is because that is handled inconsistently across the code base, so I thought that was not agreed. Thanks for pointing that out!

@squirrelo I forwarded the email to you for your reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Will you mind moving the discussion about how the job is gonna work
    to the actual issue? That way we will ensure everybody's comments are
    taken into account when that's resolved. Thanks!
  2. Thread to which you replied:
    https://groups.google.com/forum/#!searchin/qiita-dev/deleting$20files$20and$20automatic$20jobs/qiita-dev/X8pu1fHASuY/TT2_fARyS50J

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Will just conclude by saying if this pull request is leveraging that expected functionality, that functionality should probably go in pretty quickly after this request.

@squirrelo
Copy link
Contributor

Few comments. Will test interface once addressed.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.08%) to 79.1% when pulling 99f196f on antgonza:delete-process-data into 0744eed on biocore:master.

@antgonza
Copy link
Member Author

antgonza commented Apr 3, 2015

Ready for another review.


if analysis_ids:
raise QiitaDBError(
"Processed data %d is linked to (meta)analyses: %s" %
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we improve the error message here?
Processed data %d cannot be removed because it is linked to the following (meta)analysis: %s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

analysis names would be better than IDs. There is no way to link an ID to an analysis in the UI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternately, raise error without the IDs and use the Logger to store the IDs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going with adding names and QiitaDBError so the user can see the error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I just thought of if showing names: this will potentially show names of analyses that the user does not have access to. Probably not the end of the world, but could be a privacy issue.

Copy link
Member Author

@antgonza antgonza Apr 3, 2015 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And users should not be able to put it back to sandbox if it was public and
there was analysis using it.

2015-04-03 9:31 GMT-07:00 Antonio Gonzalez notifications@github.com:

In qiita_db/data.py
#1034 (comment):

  •        If the processed data status is not sandbox
    
  •    """
    
  •    if cls(processed_data_id).status != 'sandbox':
    
  •        raise QiitaDBStatusError(
    
  •            "Illegal operation on non sandbox processed data")
    
  •    conn_handler = SQLConnectionHandler()
    
  •    analysis_ids = [str(_id[0]) for _id in conn_handler.execute_fetchall(
    
  •        "SELECT DISTINCT analysis_id FROM qiita.analysis_sample WHERE "
    
  •        "processed_data_id = {0} ORDER BY "
    
  •        "analysis_id".format(processed_data_id))]
    
  •    if analysis_ids:
    
  •        raise QiitaDBError(
    
  •            "Processed data %d is linked to (meta)analyses: %s" %
    

I think that's fine. The only way this could happen is if the owner made
it public, then move it to sandbox, and is trying to delete it. Which will
be a full other beast ...


Reply to this email directly or view it on GitHub
https://github.com/biocore/qiita/pull/1034/files#r27738714.

Jose Navas

@josenavas
Copy link
Contributor

Few more comments @antgonza. Thanks for putting this together!

QiitaDBError
If the processed data has (meta)analyses
"""
if cls(processed_data_id).status != 'sandbox':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just thinking about this. Should we limit the removal only to sandbox? I think we should be able to remove the processed data in any moment except if it is public (i.e. why we should not allow the removal if it is private?)

Copy link
Member Author

@antgonza antgonza Apr 3, 2015 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fine with me. We can revisit this later and it is an easy change if we decide something different. I'm going to pull down the code to test.

@josenavas
Copy link
Contributor

👍

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.06%) to 79.12% when pulling c195dad on antgonza:delete-process-data into 0744eed on biocore:master.

squirrelo added a commit that referenced this pull request Apr 3, 2015
@squirrelo squirrelo merged commit 2199bdd into qiita-spots:master Apr 3, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants