Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of /rest/info and /rest/releases? #182

Closed
ramiromagno opened this issue May 5, 2021 · 2 comments
Closed

Refactoring of /rest/info and /rest/releases? #182

ramiromagno opened this issue May 5, 2021 · 2 comments

Comments

@ramiromagno
Copy link

ramiromagno commented May 5, 2021

Hi PGS Catalog team,

This is not an issue/bug but more of a question or feature request.

I noticed you had included a few new endpoints, some of which requested by me. Thank you so much, really appreciated!

Regarding the new /rest/info may I leave a few suggestions?

Suggestion 1

You've probably thought about this, but still here are my five cents. I think it would be nice to leave the /rest/info endpoint only with details about the software side of the REST API, and the /rest/release/ endpoint reserved only for data related info.

So this would imply removing this JSON element from the /rest/info response:

"latest_release": {
    "date": "2021-04-28",
    "scores": 761,
    "publications": 167,
    "traits": 204
  },

and add extra fields in the /rest/release response by including also the number of traits, and perhaps the number of sample sets. I understand that in /rest/release you were providing the increments in new entities whereas in /rest/info you are giving the totals. I think it would be nice to stick to increments, as we can always add them together to get the total at a given point in time.

In the R package quincunx, the main table resulting from a request to /rest/release/all looks like:

# A tibble: 27 x 5
   date       n_pgs n_ppm n_pgp notes                                                                                         
   <date>     <int> <int> <int> <chr>                                                                                         
 1 2021-04-28     4    25     7 This release contains 4 new Score(s), 7 new Publication(s) and 25 new Performance metric(s)   
 2 2021-04-07     6    22     5 This release contains 6 new Score(s), 5 new Publication(s) and 22 new Performance metric(s)   
 3 2021-03-22    13   144    11 This release contains 13 new Score(s), 11 new Publication(s) and 144 new Performance metric(s)
 4 2021-02-23    17   118    11 This release contains 17 new Score(s), 11 new Publication(s) and 118 new Performance metric(s)
 5 2021-02-03    58   265     8 This release contains 58 new Score(s), 8 new Publication(s) and 265 new Performance metric(s) 
 6 2021-01-07     6    31     6 This release contains 6 new Score(s), 6 new Publication(s) and 31 new Performance metric(s)   
 7 2020-12-15   306   313     4 This release contains 306 new Score(s), 4 new Publication(s) and 313 new Performance metric(s)
 8 2020-12-08     9    65     6 This release contains 9 new Score(s), 6 new Publication(s) and 65 new Performance metric(s)   
 9 2020-11-20     4    50     5 This release contains 4 new Score(s), 5 new Publication(s) and 50 new Performance metric(s)   
10 2020-11-05     4    19     4 This release contains 4 new Score(s), 4 new Publication(s) and 19 new Performance metric(s)   
# … with 17 more rows

So it would be nice to have in addition, as I said, the n_efo (number of new traits) and n_pss (number of new sample sets) migrated from the response from /rest/info.

Suggestion 2

Also move the citation and terms_of_use to the response from /rest/release. Don't you think it belongs here more than in /rest/info/?

"citation": {
    "title": "The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation",
    "doi": "10.1038/s41588-021-00783-5",
    "PMID": 33692568,
    "authors": "Samuel A. Lambert, Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, Gad Abraham, Michael Chapman, Helen Parkinson, John Danesh, Jacqueline A. L. MacArthur and Michael Inouye.",
    "journal": "Nature Genetics",
    "year": 2021
  },
  "terms_of_use": "https://www.ebi.ac.uk/about/terms-of-use"

Suggestion 3

It would be nice to provide a few extra endpoints under /rest/info, namely:

  • /rest/info/all for all REST API versions (this would be the most useful of the endpoints here suggested, because, as of now, I can only see that latest changes to the API, and oftentimes it would be nice to review the changelog over a longer period of time; otherwise, I am left with no other alternative than checking the GitHub repository and revise the commit history... which is not very efficient.)
  • /rest/info/{release_date}, analogous to /rest/release/{release_date}
  • /rest/info/{version}, e.g., /rest/info/1.7

So in its final form, the JSON from /rest/info would be simply an array of:

    "date": "2021-04-28",
    "version": 1.7,
    "changelog": [
      "New data 'ancestry_distribution' in the `/rest/score` endpoints, providing information about ancestry distribution on each stage of the PGS",
      "New endpoint `/rest/ancestry_categories` providing the list of ancestry symbols and names."
    ]

Again, all in all, thanks for the terrific work! These are just some ideas, and are not really that important aspects of the REST API.

@ens-lgil
Copy link
Member

ens-lgil commented May 6, 2021

Dear @ramiromagno,

Thank you for your suggestions.
Regarding these suggestions here are our comments:

As general remarks, the aim of the endpoint rest/info is to list diverse "generic" information regarding the PGS Catalog.
This endpoint is a good place to know the current release date and numbers, the current REST API version, the latest PGS paper, etc ...

Suggestion 1

The rest/release endpoints are only here to list the number of new data on each release.
We are hesitant to add the number of new Traits as this particular type of data can change with the ontolgies (merged traits, new traits, obsolete traits).
Furthermore, we think that the total number of entries is more suitable for the rest/info endpoint.

Suggestion 2

As I wrote it above, we view the rest/info as a place to put different type of information regarding the Catalog. Therefore we think that this is a decent place for the citation and the terms_of_use.

Suggestion 3

It makes sense to have a dedicated endpoint listing the different REST API versions and their changelogs.

Note that the changelog is already available in the https://www.pgscatalog.org/rest page (under REST API version changelog).

Although it would make sense to have it in the same way as the rest/release endpoint, we think it will be a bit overkilling to have 3 endpoints that (e.g. all/current/1.7). We believe that there won't be that many new versions in the future (at least less than the number of new releases) and therefore it should be easy to parse and filter.
The idea would be to have a unique endpoint (e.g. rest/api_versions) and a structure like this:

{
    "current": {
        "version": 1.7,
        "date": "2021-04",
        "changelog": [
            "New data 'ancestry_distribution' in the `/rest/score` endpoints, providing information about ancestry distribution on each stage of the PGS",
            "New endpoint `/rest/ancestry_categories` providing the list of ancestry symbols and names."
        ]
    },
    "previous": [
        {
            "version": 1.6,
            "date": "2021-03",
            "changelog": [
                "New endpoint `/rest/info` with data such as the REST API version, latest release date and counts, PGS citation, ...",
                "New endpoint `/rest/cohort/all` returning all the Cohorts and their associated PGS.",
                "New endpoint `/rest/sample_set/all` returning all the Sample Set data."
            ]
        },
        {
            "version": 1.5,
            "date": "2021-02",
            "changelog": [
                "Split the array of the field 'associated_pgs_ids' (from the `/rest/publication/` endpoint) in 2 arrays 'development' and 'evaluation'.",
                "New flag parameter 'include_parents' for the endpoint `/rest/trait/all` to display the traits in the catalog + their parent traits (default: 0)."
            ]
        },
        ...
    ]
}

I hope this makes sense.

Best regards,
Laurent

@ramiromagno
Copy link
Author

Thank you Laurent for taking the time to explain. Makes perfect sense!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants