WIP: Journal - Heketi should keep track of transactions #661

MohamedAshiqrh · 2017-01-31T16:21:27Z

Journal support is a simple concept: heketi keeps track of
transactions and can either revert to a consistent state or
resume from the point where a transaction was interrupted.

Take the volume create transaction as an example. After the
bricks (LVs on the devices) are created, brick entries are added
to the devices and to the volume entry. If the heketi process is
forcefully terminated before all the bricks for the volume have
been created, DB parsing fails because the volume entries are
incomplete.

Someone has to clean up or resume the transaction. That is now
the journal's responsibility.

Pair-Programmed-With: Raghavendra Talur rtalur@redhat.com

Signed-off-by: Mohamed Ashiq Liyazudeen mliyazud@redhat.com
Signed-off-by: Raghavendra Talur rtalur@redhat.com

centos-ci · 2017-01-31T16:23:55Z

Can one of the admins verify this patch?

MohamedAshiqrh · 2017-01-31T16:25:39Z

@heketi/dev @heketi/maintainers
Hi,

@lpabon @humblec @obnoxxx @jarrpa @raghavendra-talur @ramkrsna
Please take a look at this and share your ideas.

Test:
Run a volume create command and forcefully kill heketi before the volume create finishes.
The journal currently throws a critical error; this will be replaced by a journal handler that reverts or resumes the transaction.

For now, the journal is written to /tmp/journal; it can be moved under /var/lib/heketi or anywhere else.
There are just two labels:
START
END

The transaction state is determined from the count of these labels.

We still have to revert the DB entries and delete the stale LVs, based on the input recorded along with START and END.
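The counting rule can be sketched in Go as follows. `journalState` and the `TxState` values are hypothetical names; the sketch only compares the number of START and END labels and, as an assumption, does not check that each END names the same operation as its START.

```go
package main

import (
	"bufio"
	"strings"
)

// TxState is the transaction state inferred from the journal.
type TxState int

const (
	TxClean      TxState = iota // every START has a matching END (or the journal is empty)
	TxIncomplete                // more STARTs than ENDs: heketi died mid-transaction
	TxCorrupt                   // more ENDs than STARTs, or unreadable: do not trust it
)

// journalState counts the START and END labels (the first word of each line)
// and returns the inferred state plus the number of unmatched labels.
func journalState(journal string) (TxState, int) {
	starts, ends := 0, 0
	sc := bufio.NewScanner(strings.NewReader(journal))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		switch fields[0] {
		case "START":
			starts++
		case "END":
			ends++
		}
	}
	if sc.Err() != nil {
		return TxCorrupt, 0 // e.g. a line longer than the scanner's buffer
	}
	switch {
	case starts == ends:
		return TxClean, 0
	case starts > ends:
		return TxIncomplete, starts - ends
	default:
		return TxCorrupt, ends - starts
	}
}
```

Counting alone says *that* something is open; deciding *what* to undo still needs the arguments recorded on each START line.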


lpabon · 2017-02-01T05:24:06Z

Interesting. I'll definitely look at it tomorrow.

lpabon · 2017-02-02T01:01:07Z

Hi guys, I'm a little confused, because Heketi already does this through the use of defer functions. Take a look at the defer functions here. Do you mind explaining what a journal provides over defer functions?

Ah, I get it: if Heketi crashes and needs to come back up, this can help. I believe you may want to describe in a document how the Heketi journal would be replayed to get back to a consistent state before any code is written.

MohamedAshiqrh · 2017-02-03T11:32:45Z

@lpabon Hi,

Let me explain why I want this and where it will make a difference.
We hit the issue of the DB being left in an unclean state, along with stale LVs, when Heketi goes down while a volume create is in progress. IMO this is what we hit most often today, so our solution below is based on this path. Let us know if it will conflict with other paths (node add/delete, device add/delete, and cluster add/delete).

So we were thinking of writing the brick ID to the journal as each brick is created, and building a structure from these entries. We then call removeBrickFromDB and destroy the bricks, which deletes both the DB entries and the LVs.

An example journal, after Heketi goes down during VolumeCreate, looks roughly like this:
START VolumeCreate VolumeName size .....
START BrickCreate brickid 3247198744519sdf83
END BrickCreate
START BrickCreate brickid sdahfwiu233
END BrickCreate

Next, we construct a BrickEntry and a VolumeEntry from the context available in the journal file, so that we can call into the DB.
Then we call removeBrickFromDB and DestroyBrick.

Then we clean the journal.
At this point all the stale brick entries and LVs have been deleted, so we are back to a good state.

Hurray! Good To Go.
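The replay described above could look roughly like this in Go. This is a sketch under assumptions: `BrickCleaner` and its methods only echo the removeBrickFromDB / DestroyBrick names from the comment, and their signatures here are invented for illustration; `dryRunCleaner` just records the calls it would make. It assumes the caller has already determined that the VolumeCreate is incomplete, and it cleans every brick that has a START line, including one whose END is missing, since that LV may be partially created.

```go
package main

import (
	"strings"
)

// BrickCleaner abstracts the two cleanup calls from the proposal. The
// method signatures are hypothetical, chosen only for this sketch.
type BrickCleaner interface {
	RemoveBrickFromDB(volume, brickID string) error
	DestroyBrick(brickID string) error // should tolerate an LV that was never created
}

// rollbackVolumeCreate replays a journal left behind by an interrupted
// VolumeCreate and returns the brick IDs it cleaned up.
func rollbackVolumeCreate(journal string, c BrickCleaner) ([]string, error) {
	var volume string
	var bricks []string
	for _, line := range strings.Split(journal, "\n") {
		f := strings.Fields(line)
		switch {
		case len(f) >= 3 && f[0] == "START" && f[1] == "VolumeCreate":
			volume = f[2]
		case len(f) >= 4 && f[0] == "START" && f[1] == "BrickCreate" && f[2] == "brickid":
			bricks = append(bricks, f[3])
		}
	}
	// Undo in reverse order of creation.
	for i := len(bricks) - 1; i >= 0; i-- {
		if err := c.RemoveBrickFromDB(volume, bricks[i]); err != nil {
			return nil, err
		}
		if err := c.DestroyBrick(bricks[i]); err != nil {
			return nil, err
		}
	}
	return bricks, nil
}

// dryRunCleaner records the calls it would make instead of touching the DB
// or LVM, which is handy for checking a replay before running it for real.
type dryRunCleaner struct{ calls []string }

func (d *dryRunCleaner) RemoveBrickFromDB(volume, id string) error {
	d.calls = append(d.calls, "db:"+volume+"/"+id)
	return nil
}

func (d *dryRunCleaner) DestroyBrick(id string) error {
	d.calls = append(d.calls, "lv:"+id)
	return nil
}
```

Only after every call succeeds would the journal itself be truncated, matching the "clean the journal" step.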

MohamedAshiqrh · 2017-02-03T11:36:23Z

@heketi/dev @heketi/maintainers @heketi/admin Please see the comment above and share your feedback.

@humblec @obnoxxx @jarrpa @raghavendra-talur @ramkrsna

lpabon · 2017-02-05T02:20:16Z

@MohamedAshiqrh I think this is really, really good and necessary. I would highly suggest saving the information (journal) in the DB instead of a file. The issue is that a file will not be available after Heketi crashes if it was running as a container in Kubernetes. By saving the data in the DB, the journal can be replayed and then cleaned up or continued.

I would recommend creating a Journal structure with an array of steps that can be saved as an entry. A new bucket can be created in the DB to hold these journals.
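A minimal Go sketch of that suggestion, assuming a `JournalEntry` that holds an array of `JournalStep`s, serialized with `encoding/gob` and stored under its ID. All type and function names are hypothetical, and the `bucket` map only stands in for a dedicated bucket in heketi's BoltDB database; a real implementation would do the put/get inside a bolt transaction.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// JournalStep is one step of a transaction, e.g. one BrickCreate.
type JournalStep struct {
	Op   string            // e.g. "BrickCreate"
	Args map[string]string // e.g. {"brickid": "..."}
	Done bool              // set once the step completed (the END marker)
}

// JournalEntry groups the steps of one transaction.
type JournalEntry struct {
	ID    string
	Op    string // e.g. "VolumeCreate"
	Steps []JournalStep
}

// Marshal encodes the entry with gob (an illustrative choice of encoding).
func (j *JournalEntry) Marshal() ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(j)
	return buf.Bytes(), err
}

// Unmarshal decodes an entry produced by Marshal.
func (j *JournalEntry) Unmarshal(b []byte) error {
	return gob.NewDecoder(bytes.NewReader(b)).Decode(j)
}

// OpenSteps returns the steps that started but never completed.
func (j *JournalEntry) OpenSteps() []JournalStep {
	var open []JournalStep
	for _, s := range j.Steps {
		if !s.Done {
			open = append(open, s)
		}
	}
	return open
}

// bucket stands in for a "journal" bucket: key = journal ID, value = bytes.
type bucket map[string][]byte

func saveJournal(b bucket, j *JournalEntry) error {
	data, err := j.Marshal()
	if err != nil {
		return fmt.Errorf("marshal journal %s: %w", j.ID, err)
	}
	b[j.ID] = data
	return nil
}

func loadJournal(b bucket, id string) (*JournalEntry, error) {
	data, ok := b[id]
	if !ok {
		return nil, fmt.Errorf("journal %s not found", id)
	}
	j := &JournalEntry{}
	if err := j.Unmarshal(data); err != nil {
		return nil, err
	}
	return j, nil
}
```

Keeping the steps in a structured entry, rather than parsing START/END text, means a replay can work directly from `OpenSteps()` and the recorded arguments.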

lpabon · 2017-02-05T02:22:01Z

@MohamedAshiqrh If you don't mind, please write up a design (markdown) document describing the steps to save and restore the journal and to determine how to repair. Also, document how to test.

MohamedAshiqrh · 2017-02-07T20:57:25Z

@lpabon I thought of sticking with a file because the DB would be in an incorrect state; a file may also help show exactly what went wrong if the DB gets corrupted. We mount the gluster volume on /var/lib/heketi, so if we place the file there, we can persist it in the container world too. I also thought that if heketi goes stateless, the journal would not need many changes.

I will definitely write a design doc.

MohamedAshiqrh · 2017-02-08T20:51:32Z

With PR #671, a separate volume is no longer required, so I agree with @lpabon's point about having a separate bucket for the journal. I will proceed that way. 👍 Good job on #671, @lpabon.

obnoxxx · 2017-02-20T23:46:29Z

Generally, good thinking, @MohamedAshiqrh! I don't quite understand yet, though, how storing the heketi DB in a kube secret (PR #671) has any effect on this change. Firstly, heketi runs in non-Kubernetes deployments as well. Secondly, AFAICT, that is just a different place to store the heketi DB. It should not affect the decision of whether to store the journal inside the DB or outside...

I need to look deeper, but generally I tend to agree with @lpabon that the journal should be put into the DB. This is not a contradiction, since the journal would essentially be used to keep the DB in a consistent / roll-back-able state.

MohamedAshiqrh · 2017-02-27T07:06:37Z

@obnoxxx In #671 (@lpabon, correct me if I'm wrong), the secret is mounted on /backup, and heketi's working directory (/var/lib/heketi) is an emptyDir (a directory from the host exposed to the container, which is actually a bind mount of a folder that AFAIK is located under /var/lib/docker/container/id/something* on the host). This means a journal file would land in heketi's working directory, which is not persisted. The DB is backed up to the secret, but the DB is not used from the secret itself; in other words, the secret is not a mount point where we can place the contents of heketi's working directory and have them persisted. The secret can hold only the DB. Hope that makes sense.

lpabon added the in progress label Jan 31, 2017

MohamedAshiqrh force-pushed the journal branch from b78947e to f31c5e8 January 31, 2017 16:43

MohamedAshiqrh force-pushed the journal branch from f31c5e8 to 5b0469c January 31, 2017 17:15

lpabon self-requested a review February 1, 2017 05:25

lpabon closed this Jun 13, 2017

lpabon removed the in progress label Jun 13, 2017
