feat(#9237): add functionality of getting people with pagination in cht-datasource #9266

sugat009 · 2024-07-16T08:43:22Z

Description

Add functionality of getting people with pagination in cht-datasource.

Usage:

Person.v1.getPage(limit, skip); // limit and skip are optional with defaults as 100 and 0 respectively

Code review checklist

Readable: Concise, well named, follows the style guide, documented if necessary.
Documented: Configuration and user documentation on cht-docs
Tested: Unit and/or e2e where appropriate

Compose URLs

If Build CI hasn't passed, these may 404:

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

…ht-datasource

jkuester

Sorry, I was not able to make it through everything today (just ran out of time), but want to share my thoughts so far. Will continue going through this tomorrow!

shared-libs/cht-datasource/src/local/libs/doc.ts

shared-libs/cht-datasource/src/local/libs/lineage.ts

shared-libs/cht-datasource/src/local/person.ts

shared-libs/cht-datasource/src/person.ts

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

shared-libs/cht-datasource/src/person.ts

shared-libs/cht-datasource/src/index.ts

shared-libs/cht-datasource/src/remote/person.ts

shared-libs/cht-datasource/src/local/person.ts

sugat009 · 2024-07-19T10:40:29Z

@jkuester I did not add unit tests for this function(

cht-core/shared-libs/cht-datasource/src/remote/libs/data-context.ts

Line 63 in de63197

    
           export const getResources = (context: RemoteDataContext, path: string) => async <T>(

) as NodeJS 20+ has an issue in which mocking fetch is not possible. Also, existing tests that have mocked fetched are failing as well. For eg:

cht-core/shared-libs/cht-datasource/src/remote/libs/data-context.ts

Line 41 in 3c2b140

    
           export const getResource = (context: RemoteDataContext, path: string) => async <T>(

References: here and here.

jkuester · 2024-07-23T03:10:31Z

NodeJS 20+ has an issue in which mocking fetch is not possible. Also, existing tests that have mocked fetched are failing as well. For eg:

🤔 These existing tests are working fine for me with node 20.11.0 (and are running in CI with that same version). What version of Node are you using?

jkuester

Looking great! Got a couple other nit-picks and suggestions, but all in all we are close to done here.

shared-libs/cht-datasource/src/local/libs/lineage.ts

shared-libs/cht-datasource/test/local/libs/lineage.spec.ts

shared-libs/cht-datasource/test/local/libs/doc.spec.ts

shared-libs/cht-datasource/test/person.spec.ts

shared-libs/cht-datasource/src/person.ts

shared-libs/cht-datasource/src/index.ts

shared-libs/cht-datasource/test/index.spec.ts

sugat009 · 2024-07-23T04:12:08Z

NodeJS 20+ has an issue in which mocking fetch is not possible. Also, existing tests that have mocked fetched are failing as well. For eg:

🤔 These existing tests are working fine for me with node 20.11.0 (and are running in CI with that same version). What version of Node are you using?

I'm on Node version v20.12.2

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

jkuester · 2024-07-23T13:30:08Z

I'm on Node version v20.12.2

😓 I can reproduce the error locally when I switched to 20.12.2. Going to do some more digging to see what the best response is here....

jkuester · 2024-07-23T13:38:57Z

Actually, digging in more to the issues you linked it looks like this was caused by Node changes introduced in 20.12.0 and then subsequently fixed in 20.13.1.

Since this issue is localized to a few minor Node versions and is fixed in the latest releases of 20.x and 22.x, I think it is fine to not worry about it and @sugat009 you just need to update your Node version to 20.13.1+. 😅 (I confirmed it works fine on v20.15.1.)

jkuester

Looks like CI is failing for a simple linting error. Should be an easy fix. I added a few more nit-pick comments below, but otherwise LGTM! 🚀

shared-libs/cht-datasource/src/person.ts

shared-libs/cht-datasource/test/local/libs/lineage.spec.ts

shared-libs/cht-datasource/test/local/person.spec.ts

shared-libs/cht-datasource/src/local/libs/lineage.ts

shared-libs/cht-datasource/src/person.ts

shared-libs/cht-datasource/src/index.ts

jkuester · 2024-07-25T19:12:59Z

shared-libs/cht-datasource/src/local/person.ts

+
+      const docs = await getDocsByPage([personType.contactType], limit, skip);
+
+      return docs.filter((doc): doc is Person.v1.Person => isPerson(settings, doc, doc?._id));


@m5r @sugat009 I have realized a complication here while looking at the code over in #9281. Basically, we pass the limit through to the Couch view query and so we know docs.length === limit or we have reached the last page of results. However, before this function returns the persons, we are filtering the array. In theory it is possible that some of the results returned from queryDocsByKey could get filtered out (with our current validation logic I cannot come up with a feasible scenario where this could happen, but if/when we add more validations, this will become an issue).

The reason I think this filtering is a problem is because it could result in us giving back an array for the page that is < limit when there are still additional results to retrieve. Of course, this is bad for our getAll logic, but it will also probably be confusing to any other consumers of this code. (How would anyone know when to stop calling getPage?).

The only feasible option I can come up with here is to change the return type to Promise<Nullable<Person.v1.Person>[]> and just pass through null when a returned doc fails the isPerson check. If we try to do anything more fancy, like calling the queryDocsByKey again until we get the full limit of persons, it is going to mess up the skip value for any subsequent calls from the consumer... The only comfort I have here is that we do not really want folks directly calling this function anyway (and the getAll can still abstract away the Nullable). The REST api is going to be a bit janky, but I just don't see any other way around it.

Do either of you have other thoughts here? Ideas on better options?

TBH, I'm surprised that we even need a isPerson validator since we explicitly pass a valid personType to the database. Anyways, @jkuester is right, if in some way an entry is filtered by the isPerson function then the sanctity of the pagination would be disturbed, so let's have a null value in the place of the non-person doc.

As an alternative approach, instead of setting the document entry to null, we could introduce an invalid property or similar indicator. This would signify that the person document contains incorrect or problematic data, without completely removing the entry and let the consumers decide what to do with it.

I'm surprised that we even need a isPerson validator since we explicitly pass a valid personType to the database

This is true. I have gone back and forth on if we should keep the isPerson check since all it is doing currently is validating the contact type (which, in theory should always be correct since that is what we are querying one...). However, regardless of if we need to do any checks like this currently, I do think it is very important that we design these API patterns in such a way that we can filter data we get from Couch before returning it to the user. One of the main points of having cht-datasource is that not everything needs to be modeled directly in the db.... With that in mind, and seeing as the performance cost of isPerson is minimal, it makes sense to me to keep it for the sake of code consistency (and better support for future validations).

the sanctity of the pagination would be disturbed

Quote of the day! 🔥

we could introduce an invalid property or similar indicator

This is an interesting idea! 🤔 Similar to the Pouch API which returns the row objects with id value, etc, but may or may not actually have a doc. Ultimately, though, it seems like it might not actually be useful to consumers of this API (and anyone trying to do data validation will probably want to connect directly to the database).

Another approach we could take here would be to return some kind of page object like this:

Promise<{ entries: Person.v1.Person[], cursor: string }>

This is similar to the paging in GitHub's GraphQL api, where each request returns a different cursor that you can pass back in on subsequent requests to get the next page. In our case we could just return the proper skip value for the next Couch query, but the meaning/value of the cursor should be completely abstracted away from consumers of the cht-datasource api. Consumers can just treat it as a token string for getting the next page. That would give us flexibility in the future if we needed to change up some underlying details of how things work. In the context of the above discussion regarding isPerson, this approach would let us run followup queries to populate a full list of persons to return so that entries.length === limit while returning the proper skip value for the next page request as the cursor.

I don't know guys, the more I think about this approach, the more I like it! It decouples us more from the underlying Couch implementation, which I think is a win, giving us more flexibility for future innovation....

One variation on this new page object proposal would be to include a hasNextPage: boolean in the returned page object so that consumers know when more data is available. Then, we could clarify that the provided limit would be the max value possible for entries, (but some pages might be returned with less). This would free us from having to implement any kind of followup queries in the getPage logic (at least at this time). We could run the Couch query, do any necessary filtering, and return what is left.

@jkuester Cursor-based pagination sounds great and would help if any record is filtered by the isPerson or any future filters. As for the implementation, should we do it via:

overfetching method where we fetch more docs than the limit specifies such that if records are filtered out then, the over-fetched docs can fill in

re-fetching method where we re-query the database with the amount of docs that were filtered out.

Usually, it is said to be better to minimize the no. of queries to the database which is even more true in our case. 😃 So, my vote is in the first method. What do you think?

Looping in @m5r .

I read through the linked sites by @m5r and tried a bit on our couchdb view(medic-client/contacts_by_type). The startkey as a cursor approach does not seem to work for the view as all the keys are person types. We probably need to create a new view designed specifically for pagination.

I have not tried it yet, but at the end of that Couch guide, it notes that for views where the key is not unique, you will need to use a combination of startkey, startkey_docid, and limit. The startkey_docid is what will actual do the heavy lifting for our contacts_by_type query since we are not actually looking for a range of keys...

Good point Josh, the fact that the key is the same is what enables couch to do the pagination here. I gave it a try locally and it should do exactly what we need:

Querying the view for persons:

$ curl "http://admin:password@localhost:5984/medic/_design/medic-client/_view/contacts_by_type?key=\[\"person\"\]" {"total_rows":6,"offset":3,"rows":[ {"id":"3e93a66c-06c9-4766-b68b-eaec896ee935","key":["person"],"value":"false false 3 grootchw"}, {"id":"6892b0eb-7398-47a9-8e92-d40d3012f9f6","key":["person"],"value":"false false 3 grootvisor"}, {"id":"de324069-da24-4ffa-89f3-f7f67b859d70","key":["person"],"value":"false false 3 grootpatient"} ]}

Then using the person with id 6892b0eb-7398-47a9-8e92-d40d3012f9f6 as cursor:

$ curl "http://admin:password@localhost:5984/medic/_design/medic-client/_view/contacts_by_type?key=\[\"person\"\]&startkey_docid=6892b0eb-7398-47a9-8e92-d40d3012f9f6" {"total_rows":6,"offset":4,"rows":[ {"id":"6892b0eb-7398-47a9-8e92-d40d3012f9f6","key":["person"],"value":"false false 3 grootvisor"}, {"id":"de324069-da24-4ffa-89f3-f7f67b859d70","key":["person"],"value":"false false 3 grootpatient"} ]}

Then limiting the number of rows that I get back:

$ curl "http://admin:password@localhost:5984/medic/_design/medic-client/_view/contacts_by_type?key=\[\"person\"\]&startkey_docid=6892b0eb-7398-47a9-8e92-d40d3012f9f6&limit=1" {"total_rows":6,"offset":4,"rows":[ {"id":"6892b0eb-7398-47a9-8e92-d40d3012f9f6","key":["person"],"value":"false false 3 grootvisor"} ]}

Ahhh! Gotcha. I was under the impression that the key also needed to be a uuid. Thanks!

Okay, so just recapping further discussion from the Slack Thread, it turns out that Pouch does not have any support for the startkey_docid parameter. 😓 Also from reading the Couch docs it seems like the performance for skip significantly improved after Couch 1.2 (I guess the guide linked above was old?). The Pouch Docs still note that skip has "poor performance on IndexedDB/LevelDB", but there does not seem to be any other way to actually page queries with Pouch, so... 🤷

So the decision is to stick with just using skip to page the queries for now. We should still do the refactor to return the skip value as a generic cursor to consumers instead of actually exposing skip. That will let us passively refactor our implementation once pouchdb/pouchdb#7326 is completed.

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

…ht-datasource (#9266)

feat(#9237): Add functionality of getting people with pagination in c…

08e2b27

…ht-datasource

sugat009 added the Type: Feature Add something new label Jul 16, 2024

sugat009 requested a review from jkuester July 16, 2024 08:43

sugat009 self-assigned this Jul 16, 2024

sugat009 changed the title ~~feat(#9237): Add functionality of getting people with pagination in cht-datasource~~ feat(#9237): add functionality of getting people with pagination in cht-datasource Jul 16, 2024

jkuester requested changes Jul 17, 2024

View reviewed changes

Update shared-libs/cht-datasource/src/local/libs/doc.ts

ab92f0b

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

jkuester requested changes Jul 17, 2024

View reviewed changes

feat(#9237): Address PR comments

de63197

sugat009 requested a review from jkuester July 19, 2024 15:27

jkuester requested changes Jul 23, 2024

View reviewed changes

sugat009 and others added 7 commits July 23, 2024 10:37

Update shared-libs/cht-datasource/src/qualifier.ts

105d5db

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

Update shared-libs/cht-datasource/test/local/person.spec.ts

2b92b4c

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

Update shared-libs/cht-datasource/src/person.ts

ba2e27a

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

Update shared-libs/cht-datasource/src/remote/person.ts

15068fa

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

Update shared-libs/cht-datasource/src/person.ts

a83a259

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

Update shared-libs/cht-datasource/src/index.ts

29a3ad5

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

feat(#9237): Address PR comments

8fbc245

sugat009 added 3 commits July 23, 2024 22:06

feat(#9237): Add unit tests for getResources in remote mode

a99de72

feat(#9237): Address PR comments

d87c258

(#feat): Minor fix

eceeb75

sugat009 mentioned this pull request Jul 25, 2024

feat(#9238): add functionality of getting people as an AsyncGenerator in cht-datasource #9281

Merged

3 tasks

sugat009 requested a review from jkuester July 25, 2024 15:01

jkuester approved these changes Jul 25, 2024

View reviewed changes

jkuester reviewed Jul 25, 2024

View reviewed changes

shared-libs/cht-datasource/src/person.ts Outdated Show resolved Hide resolved

jkuester reviewed Jul 25, 2024

View reviewed changes

shared-libs/cht-datasource/src/index.ts Outdated Show resolved Hide resolved

jkuester reviewed Jul 25, 2024

View reviewed changes

sugat009 and others added 5 commits July 26, 2024 12:19

Update shared-libs/cht-datasource/src/person.ts

1aab0e3

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

Update shared-libs/cht-datasource/test/local/libs/lineage.spec.ts

54d04cb

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

Update shared-libs/cht-datasource/test/local/libs/lineage.spec.ts

dcc3758

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

Update shared-libs/cht-datasource/test/local/person.spec.ts

e800b1a

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

feat(#9237): Address PR comments

34b466c

sugat009 mentioned this pull request Aug 1, 2024

feat(#9241): create REST API endpoint for getting people #9295

Merged

3 tasks

jkuester self-requested a review August 5, 2024 16:11

sugat009 added 2 commits August 6, 2024 13:30

Implement cursor based pagination

4facbad

Fix unit tests according to implementation of cursor pagination

33d13fd

sugat009 merged commit b20dc22 into 9193-api-endpoints-for-getting-contacts-by-type Aug 9, 2024
18 of 40 checks passed

sugat009 deleted the 9237-add-functionality-of-getting-people-with-pagination-in-cht-datasource branch August 9, 2024 13:33

sugat009 added a commit that referenced this pull request Aug 12, 2024

feat(#9237): add functionality of getting people with pagination in c…

bf70789

…ht-datasource (#9266)

sugat009 added a commit that referenced this pull request Aug 12, 2024

feat(#9237): add functionality of getting people with pagination in c…

cd95b18

…ht-datasource (#9266)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(#9237): add functionality of getting people with pagination in cht-datasource #9266

feat(#9237): add functionality of getting people with pagination in cht-datasource #9266

sugat009 commented Jul 16, 2024 •

edited by github-actions bot

Loading

jkuester left a comment

sugat009 commented Jul 19, 2024

jkuester commented Jul 23, 2024

jkuester left a comment

sugat009 commented Jul 23, 2024

jkuester commented Jul 23, 2024

jkuester commented Jul 23, 2024

jkuester left a comment

jkuester Jul 25, 2024

sugat009 Jul 26, 2024 •

edited

Loading

jkuester Jul 26, 2024

jkuester Jul 26, 2024

sugat009 Jul 29, 2024 •

edited

Loading

sugat009 Jul 30, 2024

jkuester Jul 30, 2024

m5r Jul 30, 2024

sugat009 Jul 30, 2024

jkuester Aug 1, 2024


		const docs = await getDocsByPage([personType.contactType], limit, skip);

		return docs.filter((doc): doc is Person.v1.Person => isPerson(settings, doc, doc?._id));

feat(#9237): add functionality of getting people with pagination in cht-datasource #9266

feat(#9237): add functionality of getting people with pagination in cht-datasource #9266

Conversation

sugat009 commented Jul 16, 2024 • edited by github-actions bot Loading

Description

Code review checklist

Compose URLs

License

jkuester left a comment

Choose a reason for hiding this comment

sugat009 commented Jul 19, 2024

jkuester commented Jul 23, 2024

jkuester left a comment

Choose a reason for hiding this comment

sugat009 commented Jul 23, 2024

jkuester commented Jul 23, 2024

jkuester commented Jul 23, 2024

jkuester left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sugat009 Jul 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sugat009 Jul 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sugat009 commented Jul 16, 2024 •

edited by github-actions bot

Loading

sugat009 Jul 26, 2024 •

edited

Loading

sugat009 Jul 29, 2024 •

edited

Loading