Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide CNR API to notify changes in factsheet/institution #2

Closed
umesh-qs opened this issue Oct 22, 2021 · 106 comments
Closed

Provide CNR API to notify changes in factsheet/institution #2

umesh-qs opened this issue Oct 22, 2021 · 106 comments

Comments

@umesh-qs
Copy link

When the count of institutions increases in the network it will become resource-intensive to call and check for changes in the factsheet data for each institution. In fact, there should be a CNR API for institutions as well.

@janinamincer-daszkiewicz
Copy link
Member

What the other developers think?

@janinamincer-daszkiewicz
Copy link
Member

To whom this CNR should be sent?

@umesh-qs
Copy link
Author

umesh-qs commented Nov 7, 2021

To the partners with whom the sender of the CNR has agreements with.

@janinamincer-daszkiewicz
Copy link
Member

Who else is interested in such API? Please give pros and cons.

@ArgyrisBesinas
Copy link

We could add a modified_since parameter in the factsheet endpoint that returns an empty response if nothing changed in order to minimise the network load and avoid having a CNR api with no specific targets.

This is considering that we save every factsheet from other heis in our system and don't just call it every time it is needed.

@mkurzydlowski
Copy link
Contributor

I think the easiest and most straightforward approach is the best here.

You shouldn't need to store the factsheet copy of your partners. What is good enough is to ask for the factsheet but respect the caching headers sent in the response.

I would suggest using ETags for those clients that want to have the factsheet as fresh as it can be. That way there is no unnecessary bandwidth usage if the factsheet doesn't change. That is also the fastest way to get a fresh factsheet (waiting for a notification, if it is sent to over 1000 nodes, would introduce more lag in such scenario).

@umesh-qs
Copy link
Author

@mkurzydlowski
Factsheet data is used in IIA template. From what I understand you are suggesting to internally call factsheet API, build the IIA document and then respond to the caller of IIA API. Please correct me if understanding is wrong.

@umesh-qs
Copy link
Author

We could add a modified_since parameter in the factsheet endpoint that returns an empty response if nothing changed in order to minimise the network load and avoid having a CNR api with no specific targets.

This is considering that we save every factsheet from other heis in our system and don't just call it every time it is needed.

@ArgyrisBesinas can you share more details on how this will work?

@kamil-olszewski-uw
Copy link

@umesh-qs I'm not sure what do you mean by internally calling API or building IIA. If you mean that a client system can do it locally, i.e. download a factsheet from a server system and then display IIA to its user with the most up-to-date fasctsheet data, then this is how it could look like.

When we display IIAs to users, we won't be showing them a factsheet data with each agreement. There will be a separate form for this in the system. On the business side, a factsheet is part of every IIA, but we won't inform users of a change in each agreement with a partner that changed something in their factsheet data. For IIA as such, it does not matter that something has changed, e.g. in the information about visa or housing.

@umesh-qs
Copy link
Author

@kamil-olszewski-uw
We generate IIA PDF and send it in the IIA API using some of factsheet data. So both are in a way linked.
Which means calling factsheet API every time a user opens any IIA of the partner.

Below is the agreement template we use for sending PDF in IIA API
https://erasmus-plus.ec.europa.eu/resources-and-tools/inter-institutional-agreement

@umesh-qs
Copy link
Author

umesh-qs commented Apr 7, 2022

CNR will be needed further since content in factsheet will also being linked to mobility API in coming days.

@jiripetrzelka
Copy link

From the Mandatory Business Requirements, page 7 (EWP User stories regarding Factsheets):
image

I don't quite understand the statement "partners are notified": Does this mean that users of the partner system should be notified by their system (an internal message, e-mail, ...) whenever any Factsheet of any partner with whom they have an IIA changes, or does it imply that the authors of the document have already decided that a technical solution must be based on the push approach and not on the pull approach?

@janinamincer-daszkiewicz
Copy link
Member

As I recall technical solution preceded Mandatory Business Requirements. We can now come back to this issue. Definitely we want to have consistent documents.

@umesh-qs
Copy link
Author

umesh-qs commented Mar 8, 2023

Even without mandatory business requirement, it is quite obvious that there has to be some way to tell partners about any changes in the factsheet. It is all about how far one can think when designing any API.

@georgschermann
Copy link

I think CNRs are not viable for these non-bilateral APIs. If you have a client which changes the factsheet data, forgets something, changes it again, made a type and changes it again, this results in ~10000 CNRs?
Currently we go with internal Caches with defined lifetimes.
For these APIs we could easily make Cache-Control, ETag, Expires, etc. headers suggested or mandatory to control caching.

@umesh-qs
Copy link
Author

I think CNRs are not viable for these non-bilateral APIs. If you have a client which changes the factsheet data, forgets something, changes it again, made a type and changes it again, this results in ~10000 CNRs? Currently we go with internal Caches with defined lifetimes. For these APIs we could easily make Cache-Control, ETag, Expires, etc. headers suggested or mandatory to control caching.

10000 CNRs? Do you have so many partners having active agreement for current academic year?
In-fact it is other way around. One would have to call factsheet API ~10000 times, for each mobility application, IIA if there is no CNR.

@umesh-qs
Copy link
Author

If CNR is not convenient, can we have institution index API with modified_since parameter. This will give list of all the institution that has any changes in institution data and changes in factsheet data.

@mkurzydlowski
Copy link
Contributor

mkurzydlowski commented Mar 16, 2023

Using caching is simpler, more convenient for both server and client than requiring servers to implement modified_since (and returning empty result if not changed?). Why not go with what I already proposed:

#2 (comment)

Also Georg seems to agree with this solution:

#2 (comment)

@umesh-qs
Copy link
Author

This is not a solution. Problem is how would the partners know about the changes in the factsheet.

@umesh-qs
Copy link
Author

Also if I remember correctly you also proposed a similar for file changes. But that was rejected in favor of IIA CNR

@janinamincer-daszkiewicz
Copy link
Member

This is another use case and a general mechanism concerning handling of files, not only in the context of IIAs. See https://github.com/erasmus-without-paper/ewp-specs-api-file#file-immutability. Shortly speaking: IIA CNR is sent when file id changes. File id changes when the content of the file changes.

@umesh-qs
Copy link
Author

This is another use case and a general mechanism concerning handling of files, not only in the context of IIAs. See https://github.com/erasmus-without-paper/ewp-specs-api-file#file-immutability. Shortly speaking: IIA CNR is sent when file id changes. File id changes when the content of the file changes.

Yes. There was also a discussion on not changing file ID and getting the file content real time. For this proposal caching was proposed. But that proposal was dropped in favor of changing the file id and using CNR

@janinamincer-daszkiewicz
Copy link
Member

As I told yesterday not every use case is the same and yes we have different solutions for different use cases.
But let's give the others chances to share their opinion. Let's see if there are more providers who want to have CNR for factsheets/organizational units/institutions.
We have to have one solution for all.

@umesh-qs
Copy link
Author

I also proposed another solution if CNR is not convenient. We can have index endpoint for institution API that has "modified_since" parameter to returns only changed institution/factsheet

@janinamincer-daszkiewicz
Copy link
Member

Here is counter proposal:#2 (comment).

@umesh-qs
Copy link
Author

that proposal is just a workaround and not catering to the actual requirement/business requirement

@umesh-qs
Copy link
Author

I also proposed another solution if CNR is not convenient. We can have index endpoint for institution API that has "modified_since" parameter to returns only changed institution/factsheet

what is the problem with this?

@mkurzydlowski
Copy link
Contributor

I don't see any pros compared to caching but I certainly see some cons:

  • clients need to be re-implemented to pass the date of last request (?) and handle empty responses,
  • API needs to allow empty responses,
  • servers need to be re-implemented to handle the new parameter and cannot cache the response; they need to check the parameter and compare it to last modification date.

What problem are you trying to resolve with your solution? Why is caching not good enough for you?

@umesh-qs
Copy link
Author

Single index endpoint (per provider) that gives list of institution that has most recent change in either of them should work fine.

But after Discovery v6 there is only one institution for every node in the EWP network.

Please specify the parameters of this new index endpoint and the response.

Will it be called before every GET to Institution and Factsheet?

At the time of Discovery v6, I suggested to keep institution related and course unit APIs out of 1-to-1 mapping because it might need bulk refresh. We might have to make provision for such cases.

@georgschermann
Copy link

a combined index endpoint for "master-data" inst/ounits/factsheet/(courses-in-general) is still a valid option, but I think still far less feasible than adding the last changed information to the manifest

if added to the manifest I think it should be made mandatory, so it can be reliably used for all partners.

A specific parameter could be defined for this but it is not needed, any URL change will do. The URLs are fetched from the catalogue anyway, the only changes needed in our case would be the added parameter in manifest and a switch of the max staleness of these requests to infinity.

@mkurzydlowski
Copy link
Contributor

mkurzydlowski commented May 29, 2023

To sum this up, I propose to choose solutions in the two categories as suggested by @jiripetrzelka.

I would suggest choosing at most one solution for A and one or none for B.

A. Strategy to inform partner about changes in Factsheet/Institution/Ounits

I. Adding CNR for Factsheet, Institution, Ounits

  • The list of CNR receivers must be well defined.
  • There might be hundreds of partners we would need to notify. They might not be even interested in preloading the new data and would ignore the CNR and rather ask for the newest data when needed. Client needs to implement some kind of queue to handle this.
  • Still partners need to manually ask for the data, as there is not guarantee in CNR delivery.
  • Needs new endpoint and implementation – both client and server.

II. Publishing the date of the last modification of Factsheet/Institution/Ounits in the manifest

  • Clients can display the last modification date and give the user an option to decide or they might store the last modification date and compare (periodically when browsing the catalogue?).
  • Manifests would need to store a completely new kind of data – modification dates.
  • Needs changes to Discovery API, servers and clients.

III. Changing API URLs after Factsheet/Institution/Ounits data changes

  • Clients might compare (periodically when browsing the catalogue?) with the previous URL. [what if the &last_modification_date= was used as part of the URL? – see option II]
  • No changes to Discovery, only server needs to change and those clients who indeed want to detect a change. Otherwise they would just ask and receive fresh (not cached) data (see B.I).

IV. Adding an index endpoint for Factsheet and Institution with last_modified parameter

  • What is the response? Doesn’t really matter (but still has to be decided) as it only signifies that there is a change to Factsheet or Institution that occured after last_modified date.
  • The response is empty if neither Factsheet nor Institution has changed after the given last_modified date.
  • Needs new endpoint and implementation – both client and server.

B. Strategy to reduce the load once the request is actually made

I. Adding HTTP Caching recommended for Factsheet, Institution, Ounits server implementations

  • Doesn't need any specification changes other than recommending to use HTTP caching for those APIs. For example:
  • Servers can define how long data should be considered fresh with Cache-Control max-age.
    • ETag can optimize HTTP load when data cannot be considered fresh anymore.
    • Switching caching on for clients is easy as this is a standard HTTP library feature. Adding caching for server is supported by frameworks.
  • Treats this data as a nearly static HTTP resource.
  • Can enable (optional and if we allow) the most network saving option of unconditional caching and invalidating the cache by changing API endpoint URL when data changes (also suggested in GitHub in a form of storing last modification date in the manifest). Also no specification and client implementation changes needed.

@georgschermann
Copy link

looks good, we vote A3+B1

@umesh-qs
Copy link
Author

umesh-qs commented Jun 1, 2023

IV. Adding an index endpoint for Factsheet and Institution with last_modified parameter

  • What is the response? Doesn’t really matter (but still has to be decided) as it only signifies that there is a change to Factsheet or Institution that occured after last_modified date.
  • The response is empty if neither Factsheet nor Institution has changed after the given last_modified date.
  • Needs new endpoint and implementation – both client and server.

@mkurzydlowski This is not what I proposed. I proposed a single endpoint per provider, that returns the list of HEIs hosted by the provider, that have changes in Institution, any ounit and factsheet response since last_modified date.

@mkurzydlowski
Copy link
Contributor

After Discovery 6.0 has been introduced each manifest has only one EWP host and can cover only one HEI, so there is not way to have one endpoint for many HEIs. Also endpoints are unique in EWP network.

@umesh-qs, can you explain?

@umesh-qs
Copy link
Author

umesh-qs commented Jun 1, 2023

After Discovery 6.0 has been introduced each manifest has only one EWP host and can cover only one HEI, so there is not way to have one endpoint for many HEIs. Also endpoints are unique in EWP network.

@umesh-qs, can you explain?

As I mentioned earlier, I was not in favor having "only one EWP host and can cover only one HEI" for General Purpose APIs. But you insisted. For this new API we just have to add in the manifest description to ignore the caller HEI when sending response. Not a big change.

@mkurzydlowski
Copy link
Contributor

There is no way to have one endpoint that covers multiple HEIs because every endpoint needs to be unique.

@umesh-qs
Copy link
Author

umesh-qs commented Jun 1, 2023

There is no way to have one endpoint that covers multiple HEIs because every endpoint needs to be unique.

Yes with current manifest design, there is no use of institution index endpoint for each HEI. My proposal is only valid if we can make way for single endpoint per provider for general purpose APIs

@demilatof
Copy link

B. Strategy to reduce the load once the request is actually made

I. Adding HTTP Caching recommended for Factsheet, Institution, Ounits server implementations

Why are we intruding with the local system strategy?
The caching depends on:

  • The server: I receive too many request then I choose to implement a caching
  • The client: I don't want to ask the same data whenever I need it, I save it locally every day (for example)

It's not up to us to decide about this, we could only recommend an optimal refresh time.

@demilatof
Copy link

A. Strategy to inform partner about changes in Factsheet/Institution/Ounits

I think we can answer to this question only if we know the BPOs requirements about:

  • How many times Factsheet/Institution/Ounits change?
  • Where we need to expose them?
  • Are they part of the IIA and Learning Agreements?
  • If yes, have us to take a snapshot of them each time we approve an IIA or a Learning Agreement?
  • Can someone remove an Institution or an Ounit after it has been used in an approved IIA/LA?
  • Are they valid at runtime, that is they matter for the values they offer when we read them?

Do we know all of this?

@serkanUH
Copy link

serkanUH commented Jun 5, 2023

Hasselt University :
We voted None for A because this would mean some extra implementation for the near future and a extra maintenance on longer terms during the life-cycle of the application without really adding an added value.

We would only be interested to know the changes of these objects if we cared about keeping history of it. For us we are only interested in live data of these objects and this can be easely done by calling the endpoints. Worst case is that you have to call the eindpoint more than one time ( 2 , 3 times , so what ? )

Let me also express my worries again that we try to find ways to let a distributed system behave like a centralised one. This has results that we are ending in not well thought , low quality of coding ( read: hard to maintain code ) . Like the options given by this question.

For B we opted for option 1 . Adding HTTP Caching recommended for Factsheet, Institution, Ounits server implementations. But we would like to stress that is "recommended" like stated above and not mandatory.

@georgschermann
Copy link

It's not up to us to decide about this, we could only recommend an optimal refresh time.

I also understand the proposal as "recommending to use caching for those APIs", specifics are up to the server / client providers.

Worst case is that you have to call the eindpoint more than one time ( 2 , 3 times , so what ? )

Currently looking at 3471508 served requests (all APIs) last month.
If you multiply the 3 calls by the 3200 partners in the network or the exchanges done, you could save quite some requests/responses server side.

Plus about 200ms added to the user requests including this data when live-calling the APIs, but that remains a decision of the client implementer.

As stated in earlier discussions clients not utilazing it doesn't need to be changed at all. Server implementations are "only" required to add the last changed date to URL or manifest, this is some additional effort, but should be managable.

@umesh-qs
Copy link
Author

umesh-qs commented Jun 5, 2023

It's not up to us to decide about this, we could only recommend an optimal refresh time.

I also understand the proposal as "recommending to use caching for those APIs", specifics are up to the server / client providers.

@georgschermann then what is the need for the vote?

Worst case is that you have to call the eindpoint more than one time ( 2 , 3 times , so what ? )

Currently looking at 3471508 served requests (all APIs) last month. If you multiply the 3 calls by the 3200 partners in the network or the exchanges done, you could save quite some requests/responses server side.

Plus about 200ms added to the user requests including this data when live-calling the APIs, but that remains a decision of the client implementer.

As stated in earlier discussions clients not utilazing it doesn't need to be changed at all. Server implementations are "only" required to add the last changed date to URL or manifest, this is some additional effort, but should be managable.

@georgschermann
Copy link

then what is the need for the vote?

I don't know.
Probably some providers may have objections to their responses being cached? Some providers may prefer not to have such recommendations to not further complicate the specifications? Probably all changes to the specification should be based on a voting?

At least it gave us the possibility to express that we are in favor of explicitly recommending the use of caching mechanisms.

@serkanUH
Copy link

serkanUH commented Jun 5, 2023

Currently looking at 3471508 served requests (all APIs) last month.
If you multiply the 3 calls by the 3200 partners in the network or the exchanges done, you could save quite some requests/responses server side.

@georgschermann that is why you don't want institution cnr . No need to automatically keep data of all institutions up to date all the time . IMO the cnr's are the source of this traffic and you want more cnrs ?

We will do just 1 call of the institution api only when needed. Not to keep our db up to date on any given moment.

@umesh-qs
Copy link
Author

umesh-qs commented Jun 5, 2023

then what is the need for the vote?

I don't know. Probably some providers may have objections to their responses being cached? Some providers may prefer not to have such recommendations to not further complicate the specifications? Probably all changes to the specification should be based on a voting?

At least it gave us the possibility to express that we are in favor of explicitly recommending the use of caching mechanisms.

@georgschermann so what you are saying is that if the vote is "None" for cache, you are no longer allowed to use caching mechanisms? And whoever has already implement caching must remove it?

@umesh-qs
Copy link
Author

umesh-qs commented Jun 5, 2023

Currently looking at 3471508 served requests (all APIs) last month.
If you multiply the 3 calls by the 3200 partners in the network or the exchanges done, you could save quite some requests/responses server side.

@georgschermann that is why you don't want institution cnr . No need to automatically keep data of all institutions up to date all the time . IMO the cnr's are the source of this traffic and you want more cnrs ?

We will do just 1 call of the institution api only when needed. Not to keep our db up to date on any given moment.

@serkanUH you are free to disregard the CNR option and choose another one. You might not have that requirement for keeping db up to date. Othes might have. This is already discussed in this thread. Please have a look at the complete discussion.

@georgschermann
Copy link

georgschermann commented Jun 5, 2023

IMO the cnr's are the source of this traffic and you want more cnrs ?

to quote myself

I think CNRs are not viable for these non-bilateral APIs. If you have a client which changes the factsheet data, forgets something, changes it again, made a typo and changes it again, this results in ~10000 CNRs?

The data in question changes rarely, so we don't want to fetch the same data tens of thousands of times just because it MAY change once a year, hence we support/propose a url change or date in manifest. Bonus points to your user wanting to live-loading the faculty hierarchy of some HEIs with 200+ ounits...

so what you are saying is that if the vote is "None" for cache, you are no longer allowed to use caching mechanisms? And whoever has already implement caching must remove it?

if the majority of providers votes none for the caching, it won't be added as a recommendation to the specification.
if another voting is done in favor to FORBID caching in the specification and it is supported by the majority of providers ,..

@umesh-qs
Copy link
Author

umesh-qs commented Jun 5, 2023

if the majority of providers votes none for the caching, it won't be added as a recommendation to the specification.

You are contradicting your point "Probably some providers may have objections to their responses being cached?".

@georgschermann
Copy link

The question voted was to add a recommendation for caching to the specification. With answers yes-add/no-leave-as-is.

If the question would have been "caching?" with answers yes-must, yes-recommended, don't-care-no-change, no-discouraged, no-forbidden, it would have been different, but that was not the question.

As said, anyone objecting caching, can voice his reasoning, request a new vote to forbid/discourace caching, etc.

@demilatof
Copy link

I think I'll answer none to be more conservative with actual strategy, since the questions are not previously discussed.
I've not understood what is the refresh rate of that data and when they are valid (in real time, even for previous documents that are using them, or we have to keep a history?).

As concern HTTP Caching, it is not exactly my primary specialization, but it seems to me that HTTP Caching could mean different caching strategies. Without the final solution to be implemented, I think the vote is useless.
HTTP Caching could use Cache-Control, ETags, or Last-Modified...
If I implement Etags and @umesh-qs implements Last-Modified, where is the advantage?
Have we all to implement all the solutions?

As concern publishing the date of the last modification in the manifest, it means that we, as a client, have to read many times the manifest to be sure to have the more recent date; whilst as a server we should have programmatically access to the manifest to edit it every time we make a change. And it could not be always possible (e.g., we are a provider behind another provider who takes care of the security level).

A good solution may be the index, hoping that it would be always updated correctly.
At this point, why don't we add the same timestamp for IIA-Index?
I think that most of us are reading daily (at least) the index to fetch IIAs. We could skip a lot of IIAs if we know we have already fetched them previously.

@demilatof
Copy link

As said, anyone objecting caching, can voice his reasoning, request a new vote to forbid/discourace caching, etc.

Before we have to:

  • Understand the problem;
  • Decide when Factsheet, Institutions, Ounits may change and what does it mean;
  • Decide if the caching is needed for others API;
  • Decide if a proposal involves the adoption of new APIs (such as a new Index for Factsheet, Institutions, Ounits)
  • Decide the caching strategy;

Then we may vote; voting on a black box according to everyone's suggestion is not good and brings only confusion

@umesh-qs
Copy link
Author

umesh-qs commented Jun 5, 2023

The question voted was to add a recommendation for caching to the specification. With answers yes-add/no-leave-as-is.

If the question would have been "caching?" with answers yes-must, yes-recommended, don't-care-no-change, no-discouraged, no-forbidden, it would have been different, but that was not the question.

As said, anyone objecting caching, can voice his reasoning, request a new vote to forbid/discourace caching, etc.

@georgschermann that is your understanding. But lets assume that voting is for as you described. So in either case users are free to implement caching. And voting is to decided if this has to be added as recommended in the specification.
Really? Are we serious? And you are ok with this? And you want everyone to participate in this unnecessary stuff, just because someone is over enthusiastic about caching.

@georgschermann
Copy link

There are currently already recommendations regarding caching, etc. in the specifications. I would welcome such a recommendation for these nearly-static APIs as well, so newer developers are aware of the consequences of their decisions. If there is a voting for this or anyone participates - i don't really care.

Why only consider inst/ounit/factsheet live data, why not read the registry catalogue everytime you need a url, since the urls could change anytime, better be sure your have up to date urls?

I have seen the registry, dashboard and some of our nodes go down several times because of poor caching decisions / implementations by providers (us included), if this could be prevented - why not.

@umesh-qs
Copy link
Author

umesh-qs commented Jun 6, 2023

There are currently already recommendations regarding caching, etc. in the specifications. I would welcome such a recommendation for these nearly-static APIs as well, so newer developers are aware of the consequences of their decisions. If there is a voting for this or anyone participates - i don't really care.

And who is stopping this to be added as a recommendation? Was there a voting for https://github.com/erasmus-without-paper/ewp-specs-api-registry#caching or any other recommendations? Point was, why confuse everyone if as per you it anyway will be optional. Since you justified the voting hence questions are directed to you.
And I do care, because it is only creating confusion. Tomorrow @janinamincer-daszkiewicz might show voting results and say caching is mandatory.

Why only consider inst/ounit/factsheet live data, why not read the registry catalogue everytime you need a url, since the urls could change anytime, better be sure your have up to date urls?

I have seen the registry, dashboard and some of our nodes go down several times because of poor caching decisions / implementations by providers (us included), if this could be prevented - why not.

@janinamincer-daszkiewicz
Copy link
Member

Email from the Relationship manager:
Please find the results of the latest survey of the infrastructure forum on the shared drive.
We are analysing the results internally and will inform you about the conclusions as soon as possible.

@janinamincer-daszkiewicz
Copy link
Member

Strategy to reduce the load once the request is actually made: Adding HTTP Caching recommended for Factsheet, Institution, Ounits server implementations

We will add these recommendations to relevant APIs when making other changes .

@mkurzydlowski
Copy link
Contributor

Please review this proposal:
https://github.com/erasmus-without-paper/ewp-specs-api-factsheet/compare/caching
The same changes will be introduced for Institutions API and Ounits API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests