
add GET pagination #3

Merged 1 commit into netascode:main on Feb 27, 2024
Conversation

@jabielecki (Contributor) commented Feb 22, 2024

Before this change, Client.Get could fetch at most 500 items from the majority of API paths, because that is their default (and maximum) limit. Now it can transparently concatenate multiple GET calls into a single giant gjson result.

Assumption: a minority of API paths default to a smaller limit (e.g. 25 used by /v2/ippools), but a developer can force the full page size with an explicit limit=500.

Before a GET is issued, the function ensures that no write (DELETE/POST/PUT) runs concurrently with it anywhere on the client (on any path, to keep it simple).

It uses a single synchronizing goroutine instead of sync.RWMutex because there are use cases where both GETs and POSTs are lengthy, and fully serializing either would be quite painful (e.g. in the basic guide with 3 images, serialization already adds over an hour).

With multiple GETs, the concurrency protection is uninterrupted from the first page to the last. Protection against concurrent POST or DELETE guards against boundary items shifting between pages. Protection against concurrent PUT guards against items moving between pages when the sort order becomes unstable due to item modification.
Unfortunately, the protection does not cover requests from other clients/processes/systems.

For WaitTask the protection spans until the task ends, because I presume the DB commit happens at the end of a task.

Known gc problem: presently this always leaves one goroutine per client constructed. Cleaning that up would require an interface change (either a cleanupFn or a ctx), but I think it is unnoticeable for anyone who does not construct hundreds of client instances.

Alternative protection approaches considered before settling on this one:

1. Specify limit=+infinity for GET:
   - Con: not in scope of the client; it's an API change.
   - Pro: no concurrency problems.

2. Shift the offset by about +450 when fetching 500 items, i.e. keep a margin:
   - Pro: no locking.
   - Pro: decent protection against *all* clients, not only ourselves.
   - Con: the caller-facing Get() interface would need rework to explicitly include the id field used for duplicate elimination. It's not always `id`; the `someNameId` convention is also common.

```
@@ -319,17 +347,111 @@ func (client *Client) WaitTask(req *Req, res *Res) (Res, error) {
	return *res, nil
}

// Get makes a GET request and returns a GJSON result.
// Results will be the raw data structure as returned by Catalyst Center
var maxItems = 500
```
A Member commented on `var maxItems = 500`:

Should we add this as a const at the beginning?

@jabielecki (Contributor, author) replied:

Ah, what I want to do (next PR or this PR, as you prefer) is to change the var in the unit tests. The test output is cumbersome with 500, so I'd change it to maxItems=3 there.

@danischm merged commit 4bb3e0a into netascode:main on Feb 27, 2024. 1 check passed.