Alternative ways of exchanging arrival/departure dates #27

wrygiel · 2017-05-31T13:25:18Z

During the Skype call with Ghent, Ghent developers noted that - in their setup - it is possible to create a kind of a race condition in the update-arrival-departure-dates-v1 update request.

Currently, this update request allows the caller to simply replace the actual-arrival and actual-departure field values. Ghent intends to have a daemon which pushes these update requests at the sending HEI, and they have multiple servers which can do the pushing. Due to this, and to network latency, if two updates are queued at the same time, there's no way of knowing which one will arrive first.

Ghent wants to add some kind of versioning, so that the server (the sending HEI in this case) will be able to detect which of the two "replace" actions was the last one. If the server detects an "outdated" request, it will respond with HTTP 409 Conflict (or other proper error response status).

@erasmus-without-paper/all-members

The text was updated successfully, but these errors were encountered:

kamil-olszewski-uw · 2017-05-31T13:39:42Z

Maybe a timestamp in every update request will be enough? Timezones and other technical time differences are not important in this case - we may treat such timestamp as receiving HEI's local counter - the higher value, the newer update.

We may use this timestamp in all update elements, not only in update-arrival-departure-dates-v1.

wrygiel · 2017-05-31T13:42:25Z

Maybe a timestamp in every update request will be enough?

Not sure. Ghent uses multiple servers for generating and making these requests. Can we require all their clocks to be properly synchronized? As far as I understood, they are considering changes made within milliseconds of one another.

wrygiel · 2017-05-31T13:49:02Z

Can we require all their clocks to be properly synchronized?

A simple incremented integer seems a safer choice. But what would be the scope of this integer? Would the sending HEI be required to keep a single integer for the entire mobility? Or per each modifiable field? Or something in between?

BavoNootaert · 2017-05-31T13:55:21Z

I think we should just ignore old versions, not return 409 conflicts.

A single integer for the entire mobility wouldn't work. If version 1 is a change in the dates, and version 2 is a change in the components, if version 2 is processed first, version 1 would not be.

We also proposed an alternative scheme, where the receiving institution just sends a notification that something changed (like a CNR), and then the sending institution would GET the new data (dates, suggestions for changes in the draft). This would require a new API (or endpoints) though.

wrygiel · 2017-05-31T14:07:07Z

I think we should just ignore old versions, not return 409 conflicts.

You can ignore error responses, but I still think the server should return them. Returning HTTP 200 in this case would create confusion in some other cases (e.g. when the caller misplaces his own version ID for some reason, i.e. due to a bug).

BTW, it's worth noting, that - as far as we stick with the plan that the sending HEI is the master of this data (perhaps we won't) - the server can respond with a useful message. For example "Sorry, we couldn't save this arrival date, because it's after the departure date, so it doesn't make sense". Some servers may include other kinds of such "checks", and respond with other kinds of error messages. Ghent intends to ignore such messages (due to their system architecture), but probably other partners won't.

wrygiel · 2017-05-31T15:05:30Z

Note, that this problem applies also - theoretically - to updating statuses. (E.g. The receiving HEI could accept, and then quickly reject a nomination, and we would want such nomination to be rejected in this case.)

Solution 1 (the current one)

The update endpoint is (currently) the only place in all EWP APIs which was intended to be used "online". This means, that we indeed expect the clients to fail, if the sending HEI is offline for some reason.

Pros:

Easy for the sending HEI implementers. They just expect everything be pushed at them in proper order.

Cons:

Difficult for the receiving HEI implementers. They must decide between:
- Forcing their users to retry the action after some time, if the sending HEI is offline.
- Implementing a specialized fallback daemon (which would kick in if the sending HEI is offline), which would make sure that all requests are sent and received in the correct order.

Solution 2 (switch the master)

We could decide that it is the receiving HEI who is the master of the actual arrival and departure dates. We would need to specify two new APIs:

Incoming Mobilities API (for starters, it would contain just the outgoing-mobility-ID, along with the two dates). Note, that we would need to identify mobilities by their "outgoing" IDs here (the ones that the sending HEI has assigned). Perhaps in the future we would also add a seperate incoming-mobility-id, if necessary.
Incoming Mobilities CNR API (implemented by the sending HEI, allows it to be notified when arrival/departure dates have changed on the receiving HEI's servers)

In other words, we could require the sending HEI to fetch this data from the receiving HEI (instead of requiring the receiving HEI to push it at the sending HEI, as it is required now).

We would also drop the actual arrival and departure dates from the get endpoint of the existing Outgoing Mobilities API (because this API should contain only these properties of which the sending HEI is the master of).

Pros:

Perhaps indeed the receiving HEI is more of the "owner" of these dates.
If it turns out in the future, that there are other elements which the receiving HEI seems to be the master of, then this API will come in handy. (E.g. the Transcript of Records?)

Cons:

Two more APIs to implement.
I'm not 100% certain who's really the master/owner of this data. Is it true that the receiving HEI is "always right" in this case?

Solution 3 (attach previous values)

In this solution, we require the requester to provide the previous (or, you might say, current) values of actual arrival and actual departure dates. The server compares these values with the ones it keeps locally, and generates HTTP 409 if they don't match.

Cons:

While this "theoretically" allows the server to detect "edit conflicts" themselves, it doesn't solve the actual synchronization problem as put by Ghent. In fact, it may even introduce more conflicts. If three changes are added A->B, B->A, A->C, then the server might still end up with A, B, or C in the end, depending on the order.

Solution 4 (attach version numbers)

We require server implementers to keep additional integers ("incremental request number") for each update type of each mobility.

Pros:

We solve this particular problem, and don't introduce new APIs.

Cons:

It requires both sides to keep additional fields in their database. It may introduce problems if it turns out that either side somehow "loses track" of this number.
This solution assumes that there's only one "external updater". It will fail if we allow a single update-type to be called by two unrelated requesters. In EWP, currently we have only one "external updater" per mobility (the receiving HEI), but we do allow one receiving HEI to be covered by multiple partners. So, if two partners will attempt to make modifications to a single mobility, they could "clash" with each other. It won't probably come to that though.

wrygiel · 2017-05-31T15:52:36Z

Solution 5 (attach timestamps)

Similar to solution 4, but timestamps are stored, instead of "incremental request numbers". This is similar to what @kamil-olszewski-uw has proposed above.

Pros:

The requesters don't need to keep track of the numbers.
No problems with the "multiple external requesters", as described in solution 4.

Cons:

If parallel requests are generated (not necessarily sent) by multiple machines, then their clocks should be synchronized.

georgschermann · 2017-06-02T08:50:16Z

Maybe I miss something, but I don't see a problem at all.

If 2 Servers say that the student arrived on June 1st and June 2nd at the same moment, either one of them would have wrong data, which usually would get corrected later on and thus sent again correct.
And if these changes would appear in one user session - inserted incorrect and then corrected - I highly doubt that this would/should trigger a race condition.
So if the local system handles race conditions properly, I don't see a way in which data could be inserted, which would lead to a race condition on the other side of this API.

For example in our System 2 users can't modify the same data set simultaneously and one user can't spread across multiple servers due to sticky sessions, and one server wouldn't switch requests or send them in parallel.

But I am also ok to go with any of the proposed solutions, preferably 4 or 5 because of simplicity.

wrygiel · 2017-06-02T09:06:34Z

Guys from Ghent, are you sure you are unable to implement a "fallback daemon", similar to the one I described in Solution 1?

@georgschermann is right - if you would just send your requests in the proper order (and wait for responses), then the problem would vanish.

huntering · 2017-06-02T11:56:08Z

We are sure that we will not synchronously push the data. We could probably work around the problem but it adds complexity that we want to avoid because clients can implement the API poorly and it would be very hard to debug. (see #25 (comment))
Solution 2 and 4 are viable solution. We prefer solution 2 because it is similar to the other CNR interfaces and it provides extra interfaces that allow to recover data in both directions if something goes wrong. We would prefer to use this mechanism so that all interfaces behave similarly.

wrygiel · 2017-06-02T12:37:21Z

We prefer solution 2 because it is similar to the other CNR interfaces and it provides extra interfaces that allow to recover data in both directions if something goes wrong.

This is true. But we have also discussed this solution a couple of months ago (during the meeting in Warsaw), and most developers leaned toward solution 1, so we decided to go with solution 1 (I'm not sure if you were present, or had to leave early). Personally, I like both solutions 1 and 2, but isn't it too late for such a big change? According to the timeline, by now some partners should already be finishing with this API. Guys from Sweden keep complaining that we keep changing things, etc.

huntering · 2017-06-13T11:39:06Z

I'm guessing from the response, that the number of partners currently working on this is very limited so now is probably the right time to propose a change?

wrygiel · 2017-06-13T15:29:55Z

I can make a "full-blown" proposal (new repositories, modified XSDs, etc.) based on solution 2. However, such proposal will not be merged until we meet with all partners, talk about it, and they accept it. So, the decision will probably be delayed for some time (even months!). This means that you would still need to implement Solution 1 for now. Would this work for you?

huntering · 2017-06-14T07:45:55Z

Currently we are still in the process of analysing the mobility API. So it will take a while before we implement anything either way.
I am looking forward to your proposal and I hope we can discuss with everybody who is working on this mobility API through the right channels. Do you know who is really currently looking at this API? Nobody is responding on github, so I'm guessing most people didn't get to it yet?

wrygiel · 2017-06-14T08:53:05Z

The most accurate way of knowing who's actually done something is by searching for "api-mobility" in dev catalogue.

wrygiel · 2017-06-22T07:52:01Z

I can make a "full-blown" proposal (new repositories, modified XSDs, etc.) based on solution 2.

Just a note: @janinamincer-daszkiewicz won't let invest any more of my time into EWP this month, so I will return to this topic not earlier than in July. Also, I already have other EWP topics scheduled to work on in July (security, in particular), so it might be delayed even further.

MartaJuzepczuk · 2017-09-05T12:28:04Z

As Wojtek noticed, the same problem applies to updating statuses. Solution 2 will not fix this problem. I think there is no clear master of nomination status. The sending hei is the one that will notify eg. about student resignation, but the receiving hei will accept and reject nominations. So it does not fit to the partition to incoming and outgoing mobilities.

janinamincer-daszkiewicz · 2017-09-05T20:58:16Z

Let me also remind you that we are replacing the old way of sending information via emails by the new way of sending it via APIs. I would vote for the simplest solution possible and I am sure that it will be better than what we have now. Otherwise we risk that partners will not implement it.

janinamincer-daszkiewicz · 2017-09-05T21:19:29Z

In my opinion this is the receiving HEI which knows the exact arrival and departure dates. On the other hand the sending HEI is the one which really needs this information.

As I remember Klementyna describing the scenario, the incoming student visits IRO of the receiving institution for the last time to say good buy and for other reasons. IRO puts into the system the departure date. From this time on the information is ready for transfer to the sending HEI. So this seems more natural that the receiving HEI sends CNR (or update - whichever you choose).

The alternative scenario might be that the sending EI does not get any notifications, becomes nervous and tries to get this information from the receiving HEI (takes the initiative).

Kamil, could you please verify this scenario with Klementyna?

Versioning seems like overkill for this simple use case.

MartaJuzepczuk · 2017-09-06T05:32:58Z

There is one more thing I don't understand. This issue is about race condition problem. During the Skype talk, Ghent was talking about the problem, that the sending HEI server has to be online to receive update request. Maybe I misunderstood something, but these seem to be two separate problems. And the one with being online will not be solved by switching from updates to CNRs.

MartaJuzepczuk · 2017-09-06T05:46:06Z

From georgschermann's comment:

If 2 Servers say that the student arrived on June 1st and June 2nd at the same moment, either one of them would have wrong data, which usually would get corrected later on and thus sent again correct.

We continue talking about solving race condition, but isn't it a good argument to tell, that they are not a big problem now?

On the other hand, when we switch to CNR's and the partner, after receiving CNR, will call get method, he will get the data from one of these two servers. So there is still some synchronization problem to solve.

wrygiel · 2017-09-06T08:28:01Z

During the Skype call yesterday, @huntering suggested that Solution 2 could work for all update types. I have spent some time thinking through it, and I think I disagree.

Solution 2 is fine for data for which the receiving HEI is the master of. However - out of 4 currently defined update types - the receiving HEI is only the master of 1 (arrival and departure dates). Solution 2 won't work well for the other update types.

Example

Let's say that receiving HEI wants to suggest updates to LA.

How it works currently

Currently, it does so by pushing a HTTP request as this one up to the sending HEI.

If the sending HEI is offline, then the receiving HEI will not be allowed to push its suggestions. There are a couple of ways of handling this problem in this model. I will use the letters from the CAP theorem to label the approaches below:

A+P (the worst one): The receiving HEI will automatically retry the user's request after some time, and modify the <current-latest-draft-snapshot> section automatically, so that the request always succeeds. This may introduce consistency errors, because the receiving HEI doesn't take into account the fact that <current-latest-draft-snapshot> might have changed between the initial user's request, and the final HTTP request which has been received.
Easy C+P (good enough, for starters, I think): The user at the receiving HEI cancels this action until the sending HEI is online again. This is bad user experience, but it's good for consistency. The "bad user experience" will surface only when the partner is offline.
Complex C+P (the best one): Similar to A+P above, but the server doesn't modify the <current-latest-draft-snapshot> section automatically. This means that if the request succeeds, then data is still consistent. If it doesn't (HTTP 409 Conflict), then the receiving HEI will need to notify its user that his request conflicted.

How it would (not?) work in Solution 2

In Solution 2, receiving HEI would simply publish its suggestions to LA (via the newly introduced Incoming Mobilities API), and notify the sending HEI (via CNR) that it had changed the LA.

The sending HEI is the master of the LA. Most often, the sending coordinator will first need to review the suggestions, before they are merged. During this time (before the changes are reviewed), the LA can change in parallel, on the sending HEI's side.

Let's assume, for example, that the student adds some courses to the LA.

At this point, receiving HEI gets notified that LA has been changed - new courses were added by the student to the latest-draft-snapshot section.

How should the receiving HEI react? According to the specs, the sending HEI is the master of the data, so the most reasonable thing to do is to assume that this change is the result of the review process of the suggestions which the receiving HEI has sent earlier. So the receiving HEI (wrongly) interprets this result as "we do not accept your suggestions".

In other words, I don't see a clear way of handling conflict resolution in Solution 2. Solution 2 will work okay, but only for data which gets modified on one side only. Whenever the data gets modified on both sides, we run into problems. These problems surface regardless of the fact that we know who is the master of the data (because we cannot reliably distinguish "suggestions" vs "inconsistencies").

This is exactly why we chose to follow the master-slave approach. The example above is very similar to problems surfacing in multi-master systems.

huntering · 2017-09-06T08:33:08Z

Updating information asynchronously eventually ends up in a race condition. If you start queueing the updates and you want to do this in a loadbalanced way, you don't have controle over the order of the messages being processed.
Using CNR not only solves this problem but also makes it possible to always collect the data when needed. I don't see how a get can create a race condition, you would need to clarify that to me.

Regarding the status, there is not really one status which both systems operate upon. There are in fact two, one at the sending and one at the receiving. The sending sets his status to "nomination" after which the receiving sets his status to either "rejected" or "live" and then "completed" (which seems missing and is maybe not necessary, there is a status recognized but I don't see the use of it). The sending can at any time set it's status cancelled. Of course not all combination of both statusses are relevant and on a functional level they can be combined, but I don't see why on a technical level we should limit ourselves to one status? And I guess that's how all institutions will implement it anyway, and probably keep track of all the changes in some kind of historylog.

huntering · 2017-09-06T13:08:13Z

Regarding the possibility for the receiving institution to propose changes.

The sending is the master of the LA and the sending can choose to apply the proposed changes or not. So there is never a real conflict. It's not because we use CNR that there is no longer a master-slave.

When two people work on the LA at the same time. What will happen with the UPDATE scenario? As I see it now it is very well possible that the receiving proposes changes on a LA that in the meantime has changed at the sending. The UPDATE scenario doesn't fix this as far as I can see.

I think the way it is analyzed right now, the only thing you can do is compare snapshots from both sides. Regardless whether this is communicated through an UPDATE or a GET.
Providing something more elegant would require a reference to a specific snapshot and that complicates things.

janinamincer-daszkiewicz · 2017-09-07T08:58:55Z

After internal discussion, we have decided to prepare specifications according to Ghent's proposal (solution 2), but only for the fields which we are sure it will work correctly (arrival and departure dates). We may still consider expanding this in EWP 2.0, but we don't have enough time to discuss it now, because the project ends in less than two months. We have already implemented update for nominations, we will most probably not implement LA (update included). That means that changing other updates at this phase of the project would not be reasonable.
The first free day we will arrange Skype with Ghent or will invite Ghent to Warsaw to discuss the ‘update business’ in detail and present ready solution to other teams.

This is needed to avoid naming clashes with the newly introduced Incoming Mobility entities: #27

This is needed to avoid naming clashes with the newly introduced Incoming Mobility entities: erasmus-without-paper/ewp-specs-api-omobilities#27

These two fields were moved to the new Incoming Mobilities API, as requested in this thread: #27

wrygiel · 2017-09-19T19:28:13Z

All done.

wrygiel changed the title ~~Possible undetectable conflicts when updating arrival/departure dates~~ Alternative ways of exchanging arrival/departure dates Jun 14, 2017

wrygiel added a commit that referenced this issue Sep 7, 2017

Refactoring: mobility -> omobility

b6e37c8

This is needed to avoid naming clashes with the newly introduced Incoming Mobility entities: #27

wrygiel added a commit that referenced this issue Sep 19, 2017

Remove actual-arrival-date and actual-departure-date

b79e024

These two fields were moved to the new Incoming Mobilities API, as requested in this thread: #27

wrygiel closed this as completed Sep 19, 2017

wrygiel mentioned this issue Sep 20, 2017

ToRs vs Incoming Mobilities erasmus-without-paper/ewp-specs-api-imobility-tors#3

Closed

BavoNootaert mentioned this issue May 4, 2018

error recovery in update endpoints #31

Closed

MartaJuzepczuk mentioned this issue Jul 27, 2018

Moving nomination status to Incoming Mobilities API #32

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alternative ways of exchanging arrival/departure dates #27

Alternative ways of exchanging arrival/departure dates #27

wrygiel commented May 31, 2017 •

edited

Loading

kamil-olszewski-uw commented May 31, 2017

wrygiel commented May 31, 2017

wrygiel commented May 31, 2017

BavoNootaert commented May 31, 2017 •

edited

Loading

wrygiel commented May 31, 2017

wrygiel commented May 31, 2017 •

edited

Loading

wrygiel commented May 31, 2017 •

edited

Loading

georgschermann commented Jun 2, 2017 •

edited

Loading

wrygiel commented Jun 2, 2017 •

edited

Loading

huntering commented Jun 2, 2017

wrygiel commented Jun 2, 2017

huntering commented Jun 13, 2017

wrygiel commented Jun 13, 2017

huntering commented Jun 14, 2017

wrygiel commented Jun 14, 2017

wrygiel commented Jun 22, 2017

MartaJuzepczuk commented Sep 5, 2017

janinamincer-daszkiewicz commented Sep 5, 2017

janinamincer-daszkiewicz commented Sep 5, 2017

MartaJuzepczuk commented Sep 6, 2017

MartaJuzepczuk commented Sep 6, 2017

wrygiel commented Sep 6, 2017

huntering commented Sep 6, 2017

huntering commented Sep 6, 2017

janinamincer-daszkiewicz commented Sep 7, 2017

wrygiel commented Sep 19, 2017

Alternative ways of exchanging arrival/departure dates #27

Alternative ways of exchanging arrival/departure dates #27

Comments

wrygiel commented May 31, 2017 • edited Loading

kamil-olszewski-uw commented May 31, 2017

wrygiel commented May 31, 2017

wrygiel commented May 31, 2017

BavoNootaert commented May 31, 2017 • edited Loading

wrygiel commented May 31, 2017

wrygiel commented May 31, 2017 • edited Loading

Solution 1 (the current one)

Solution 2 (switch the master)

Solution 3 (attach previous values)

Solution 4 (attach version numbers)

wrygiel commented May 31, 2017 • edited Loading

Solution 5 (attach timestamps)

georgschermann commented Jun 2, 2017 • edited Loading

wrygiel commented Jun 2, 2017 • edited Loading

huntering commented Jun 2, 2017

wrygiel commented Jun 2, 2017

huntering commented Jun 13, 2017

wrygiel commented Jun 13, 2017

huntering commented Jun 14, 2017

wrygiel commented Jun 14, 2017

wrygiel commented Jun 22, 2017

MartaJuzepczuk commented Sep 5, 2017

janinamincer-daszkiewicz commented Sep 5, 2017

janinamincer-daszkiewicz commented Sep 5, 2017

MartaJuzepczuk commented Sep 6, 2017

MartaJuzepczuk commented Sep 6, 2017

wrygiel commented Sep 6, 2017

Example

How it works currently

How it would (not?) work in Solution 2

huntering commented Sep 6, 2017

huntering commented Sep 6, 2017

janinamincer-daszkiewicz commented Sep 7, 2017

wrygiel commented Sep 19, 2017

wrygiel commented May 31, 2017 •

edited

Loading

BavoNootaert commented May 31, 2017 •

edited

Loading

wrygiel commented May 31, 2017 •

edited

Loading

wrygiel commented May 31, 2017 •

edited

Loading

georgschermann commented Jun 2, 2017 •

edited

Loading

wrygiel commented Jun 2, 2017 •

edited

Loading