New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snapshots.py max_count bad list count #1227

Merged
merged 3 commits into from Mar 26, 2016

Conversation

Projects
None yet
2 participants
@MFlyer
Member

MFlyer commented Mar 22, 2016

Example: max_count set to 25

Got 25 snaps so len(snapshots) > max_count false
Add another snap so we go to 26 (25+1)

Next time IF statement is true so we delete [max_count:] -> [25:] -> element 25 and so on
another snap, back to 26 snaps (25+1) cos list index start from 0!

snapshots.py max_count bad list count
Example: max_count set to 25

Got 25 snaps so len(snapshots) > max_count false
Add another snap so we go to 26 (25+1)

Next time IF statement is true so we delete [max_count:] -> [25:] -> element 25 and so on
another snap, back to 26 snaps (25+1) cos list index start from 0!
@MFlyer

This comment has been minimized.

Member

MFlyer commented Mar 22, 2016

manually deleted snapshot to get under my max_count, but recorsively it should aid itself to the correct number

Different proposal: First take new snapshot than delete the maxcount+1 snap (in this case don't commit and just invert snapshost creation / delete code) :)

@schakrava

This comment has been minimized.

Member

schakrava commented Mar 23, 2016

I like your new/different proposal @MFlyer . In case snapshot creation fails, it will ensure that max_count of snapshots are present. But if there's a bug in delete, this will result in ever increasing number. So we could put a check before creating the snapshot to some upper bound. For example, if the current number of snapshots is max_count + 1, then delete first and only if it succeeds, create new one.

This is a nice little logic rework. Would you be interested in finishing it up?

@MFlyer

This comment has been minimized.

Member

MFlyer commented Mar 24, 2016

Hi @schakrava I partially agree. Thinking about colleagues messing on files i would prefer always to get a max_count+1 or more (debug on console or mail alert) rather than max_count-1, so we could choose an easier but safe solution: snapshot creation - on failure mail alert; snapshot deletion with an if statement (only if current snapshots number is greater than max_count - last created/supposed to be created snapshot substitutes the last one - this way, also assuming a failing delete task, on the next first successful delete it will reduce snapshots to desired max_count).

Recap: switch snapshot delete with snapshot create and before deleting check if snapshots collection meets our max_count ( so doing acts also as a min_count check ).
Obviously interested in finishing up, probably will ask you to edit for mail part / explain how to do it :)

M.

snapshsot.py update with increased checks
First step was to invert snapshot creation / deletion, so now we get

snapshot creation - len(snapshots) should be always greater than max_count
read snapshots list <- here come the problem
check list length and if > max_count go on with delete (max_count-1: cos list starting always from 0)

Final add: check on delete error and debug to console

TESTING & PROBLEMS:

Got snapshots going over max count without delete - OK
Got then delete task and OK on multiple snapshots delete

real matter: sync with Snapshot.objcets query

moved after snapshot creation to get the real number of snapshots (if before you get snaps number less one, last eventually created) but still gets outdated count (don't see last snapshot created) - tried to add a time.sleep(5) and the time.sleep(10) too, but without success
@MFlyer

This comment has been minimized.

Member

MFlyer commented Mar 24, 2016

Hi @schakrava think we've got some issues with snapshots creation / snapshots list - check last commit, it seems like snapshot creation is too slow to get caught by snapshots listing

possible solutions
go back to delete / create way, so you always get the actual (and real) snapshots list length
add a check var during snap creation and add in delete if statement to act max_count-2

EDIT - Updating/think solved, probably got error because still using a compromised share (https://forum.rockstor.com/t/shadow-copies-on-samba-over-btrfs-snapshots-2-simple-reasons-to-avoid-it/1239/4?u=flyer)

Final comit for snapshots task - ready to merge
Found corrupted share because of Shadow copies test (I was still working on the same share) - snapshots nested inside snapshots and others

New snapshots policy
First create the new snapshot to get max_count+1 or, in case of failuer, max_count
Read snapshots list with Snapshot.object (now get correctly updated, wasn't so because of share errors)
Introduced "exploit var" reduce_snapshots to 0 (used to avoid another call to Snapshot.object)
delete task if snapshots > max_count
back to snapshots[max_count:] because getting the new snap before delete now we have snapshots = snapshots +1 (last created - and if it hasn't been created we don't enter if statement for max_count check)
on every successful deletion increase reduce_snapshots by 1

Added final control (delete task failure) considering len(snapshots)-reduce_snapshots (successful deletion total)

TESTING:
- Scheduled a task with 5 snapshots max_count OK
- Increased max_count to 10 OK
- Decreased max_count to 8 (from 10 snapshots got 1 new snapshot and 3 deleted, 11 - 3 = 8)
- Increased max_count again to 10 OK
- Decreased max_count to 9 OK
[24/Mar/2016 14:01:02] DEBUG [scripts.scheduled_tasks.snapshot:79] created snapshot at shares/Dota/snapshots/Ona_201603241401
[24/Mar/2016 14:01:02] DEBUG [scripts.scheduled_tasks.snapshot:96] deleted old snapshot at shares/Dota/snapshots/Ona_201603241352
[24/Mar/2016 14:01:03] DEBUG [scripts.scheduled_tasks.snapshot:96] deleted old snapshot at shares/Dota/snapshots/Ona_201603241351
 

Here we are :)

Flyer/Mirko
@MFlyer

This comment has been minimized.

Member

MFlyer commented Mar 24, 2016

@schakrava
Finally got solved, Snapshot.object collects data correctly, error was on share (issue #1230)
Ready to merge / your suggestions

@schakrava schakrava merged commit d060fa2 into rockstor:master Mar 26, 2016

@schakrava

This comment has been minimized.

Member

schakrava commented Mar 26, 2016

Nicely done @MFlyer. I refactored and documented all the new changes. Do take a look the final changes and let me know if you see anything to improve further. It should be helpful reading for your future pull requests as well.

Really glad to have you as a contributor!

@MFlyer MFlyer deleted the MFlyer:max_count branch Mar 26, 2016

@MFlyer

This comment has been minimized.

Member

MFlyer commented Mar 26, 2016

Hi @schakrava just seen your refactored version, but don't agree at all for deletion before creation (line 98 if(delete(aw, share, stype, prefix, max_count)): )

So doing if you get ok on delete but fail on create you are on a max_count-1 snapshots: avoid extra snapshots over max_count and go safe for disk space, but with a possible missing snapshot.

Talking about refactoring: ok, got it for future commits, I'll be more explicit in code and will also feel free to refactor by adding funcs :)

@MFlyer

This comment has been minimized.

Member

MFlyer commented Mar 26, 2016

@schakrava Generally talking about refactoring and expanding scheduling snapshots, what do you think about let users decide?

Your way - Space conservative - exclude exceding data but lets snapshots incosistency (if you get error on delete you don't create snaps so users could have latest snasphot refering to n hours ago, while new data was added or modified on share)

My way - Conservative against current data state but with risk of 'over snapshots' - get always your snapshot first also if delete fails

Other thing i'd like to add: scheduling snapshots on an anacron basis ( want my snapshots every 5 mins but only during office time 9to6 )

Waiting for your opinion cos i'd like to work on these things :)

@schakrava

This comment has been minimized.

Member

schakrava commented Mar 26, 2016

I am assuming you've read my refactored code and comments. I allow at most 1 extra snapshot in cases where deleting oldest snapshot fails. If deletion of snapshot fails, chances are, creation also fails. Even if it doesn't there's a high likelihood that it only creates bigger problems if ignored for longer. So, the best is to not create any new snapshots and then alert the user immediately. The alert part is missing currently and something we should get to sooner than later.

Regarding your second question about anacron style snapshots, I really like the idea. Feel free to look into it if you are interested. The backend part is easier than the UI part, I think. If you do work on it, create a separate issue just for this.

@MFlyer

This comment has been minimized.

Member

MFlyer commented Mar 26, 2016

Even if it doesn't there's a high likelihood that it only creates bigger problems if ignored for longer. So, the best is to not create any new snapshots and then alert the user immediately. The alert part is missing currently and something we should get to sooner than later.

Ok, assuming an alert system fully agree on current code (that was my only real worry) ;)

Regarding your second question about anacron style snapshots, I really like the idea. Feel free to look into it if you are interested. The backend part is easier than the UI part, I think. If you do work on it, create a separate issue just for this.

Got it #1233 Think it will be a mix on backend and some UI adds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment