Get only id of result #179

Closed
rkrug opened this issue Oct 18, 2023 · 11 comments

Comments

@rkrug

rkrug commented Oct 18, 2023

I am trying to get only the OpenAlex ids of the works resulting from a search. As I did not find this option in openalexR, I just appended ?select=id to the query (see https://docs.openalex.org/how-to-use-the-api/get-single-entities/select-fields).

But something strange happens: the number of results changes:

r$> q <- oa_query(search = "transformative change")

r$> oa_request(paste0(q), verbose = TRUE, count_only = TRUE)
$count
[1] 7619132

$db_response_time_ms
[1] 1082

$page
[1] 1

$per_page
[1] 1


r$> oa_request(paste0(q, "?select=id"), verbose = TRUE, count_only = TRUE)
$count
[1] 184443

$db_response_time_ms
[1] 364

$page
[1] 1

$per_page
[1] 1

Am I doing something wrong or misunderstanding something, or do I need to contact OpenAlex?

@rkrug
Author

rkrug commented Oct 19, 2023

OK, solved. The ? should be an &, since the query URL already contains a ? before search:

r$> q <- oa_query(search = "transformative change")

r$> oa_request(paste0(q), verbose = TRUE, count_only = TRUE) |> unlist()
              count db_response_time_ms                page            per_page 
            7619126                1467                   1                   1 

r$> oa_request(paste0(q, "&select=id"), verbose = TRUE, count_only = TRUE) |> unlist()
              count db_response_time_ms                page            per_page 
            7619118                1531                   1                   1 
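The fix above comes down to how query strings are joined: the first parameter follows a ?, and every later one follows an &. With a second ?, the API presumably treated the trailing text as part of the search value, which would explain the different count. A minimal sketch in base R, where append_param is a hypothetical helper and not part of openalexR:

```r
# Hypothetical helper (not part of openalexR): append one query parameter
# to a URL, using "&" if the URL already has a query string and "?" if not.
append_param <- function(url, key, value) {
  sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?"
  paste0(url, sep, key, "=", utils::URLencode(value, reserved = TRUE))
}

# The URL shape matches what oa_query() returns for this search.
q <- "https://api.openalex.org/works?search=transformative%20change"
append_param(q, "select", "id")
#> [1] "https://api.openalex.org/works?search=transformative%20change&select=id"

# Without an existing query string, "?" is used instead.
append_param("https://api.openalex.org/works", "select", "id")
#> [1] "https://api.openalex.org/works?select=id"
```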

@rkrug rkrug closed this as completed Oct 20, 2023
@trangdata
Collaborator

trangdata commented Oct 20, 2023

Hey @rkrug, glad you figured it out! Another way to set select = "id" is to pass it through options in oa_query. Below, paste0(q, "&select=id") and q2 are identical. Or you can feed it directly into oa_fetch:

library(openalexR)
q <- oa_query(search = "transformative change")
q2 <- oa_query(
  search = "transformative change", 
  options = list(select = "id")
)
identical(paste0(q, "&select=id"), q2)
#> [1] TRUE
oa_request(q2, verbose = TRUE, count_only = TRUE)
#> $count
#> [1] 7619312
#> 
#> $db_response_time_ms
#> [1] 1264
#> 
#> $page
#> [1] 1
#> 
#> $per_page
#> [1] 1

# or
oa_fetch(
  search = "transformative change", 
  options = list(select = "id"),
  verbose = TRUE, 
  count_only = TRUE
)
#> Requesting url: https://api.openalex.org/works?search=transformative%20change&select=id
#>        count db_response_time_ms page per_page
#> [1,] 7619312                1264    1        1

Created on 2023-10-20 with reprex v2.0.2

@rkrug
Author

rkrug commented Oct 20, 2023

Thanks @trangdata. Yes, that solved my issue, as did using the ampersand & instead of the ?.

Although I think the select deserves a spot along the same line as search, as it is not an option as the others are?
But probably I am just biased...

@trangdata
Collaborator

Although I think the select deserves a spot along the same line as search, as it is not an option as the others are?

You're probably right. We implemented search very early on, but it should probably be in options as well. We will need a major refactor of the package for this. Other parameters that I think should go in options include per_page and group_by(s). CC'ing @yjunechoe and @massimoaria if you have other thoughts!

@rkrug
Author

rkrug commented Oct 20, 2023

I do not think it should be in options; rather, I think select should be a top-level argument. options should hold arguments of secondary importance, e.g., as you say, per_page or paging. In other words: options is for power users, and normal users do not have to go there.

@trangdata
Collaborator

trangdata commented Oct 20, 2023

@rkrug Then I think we need to think carefully about which of these parameters are "of secondary importance".

Currently, I'm thinking all of the parameters should be equal and moved to options (with the exception of filter). Take this query for example: https://api.openalex.org/works?selector=id (just an example to see what all the parameters are). We get this:

<selector is not a valid parameter. Valid parameters are: cursor, filter, format, group_by, group-by, group_bys, group-bys, mailto, page, per_page, per-page, q, sample, seed, search, select, sort.>

So, which of these should be in options?

Another factor to consider is the number of arguments in oa_fetch. I'm not sure if there is a recommended style somewhere, but I personally don't want oa_fetch to have too many arguments (the call would already be very long with lots of filters). But I could be convinced otherwise.

@rkrug
Author

rkrug commented Oct 20, 2023

What about using the approach the rgrass package takes? It has a similar problem: it interfaces with GRASS (a GIS program), where each command has many different arguments.

So it is a two-step process:

  1. Get the allowed parameters (in the OpenAlex case via a call like the one you mention above) and parse the response into a list of allowed parameters.
  2. Use ... and parameters <- list(...), which can then be checked for validity (by the names of the arguments) and then processed or passed on to OpenAlex (see https://github.com/rsbivand/rgrass/blob/6611c3d304d91c3c7c918e72696b4bf1c2ce2904/R/xml1.R#L165 for their implementation).

This is not messy and is flexible. In the help pages, one can mention the most important and relevant parameters and how they can be used. It is also future-proof, as unknown parameters can simply be passed on to OpenAlex (with a warning).

I think that would give the best of both worlds.
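The two steps above could be sketched roughly as follows in base R. This is only an illustration under assumptions: parse_allowed_params and check_params are hypothetical names, not openalexR or rgrass functions, and the message text is the one quoted earlier in this thread rather than a live API response.

```r
# Error text as quoted earlier in this thread (not fetched live here).
msg <- paste0(
  "selector is not a valid parameter. Valid parameters are: ",
  "cursor, filter, format, group_by, group-by, group_bys, group-bys, ",
  "mailto, page, per_page, per-page, q, sample, seed, search, select, sort."
)

# Step 1 (hypothetical helper): extract the allowed parameter names.
parse_allowed_params <- function(msg) {
  listed <- sub("^.*Valid parameters are: ", "", msg)
  listed <- sub("\\.$", "", listed)  # drop the trailing full stop
  strsplit(listed, ",\\s*")[[1]]
}

# Step 2 (hypothetical helper): collect ... into a list, warn about names
# that are not in the allowed set, and return everything unchanged.
check_params <- function(..., allowed) {
  params <- list(...)
  unknown <- setdiff(names(params), allowed)
  if (length(unknown) > 0) {
    warning(
      "Unknown parameter(s) passed on unchanged: ",
      paste(unknown, collapse = ", ")
    )
  }
  params
}

allowed <- parse_allowed_params(msg)
params <- check_params(select = "id", per_page = 50, allowed = allowed)
names(params)
#> [1] "select"   "per_page"
```

An unknown name such as foo = 1 would trigger the warning but still be returned, which mirrors the "pass on with a warning" behaviour described above.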

The parameters of oa_fetch would therefore become:

  entity = if (is.null(identifier)) NULL else id_type(shorten_oaid(identifier[[1]])),
  ...,
  output = c("tibble", "dataframe", "list"),
  abstract = TRUE,
  endpoint = "https://api.openalex.org",
  count_only = FALSE,
  mailto = oa_email(),
  api_key = oa_apikey(),
  verbose = FALSE

and in the help page, mention the arguments which are hidden behind ....

And oa_query could use the same approach, i.e. the handling of the ... would be assigned to oa_query.

@trangdata
Collaborator

@rkrug Unfortunately, the ellipses ... are already reserved for the different filter parameters. This was a design choice early on to simplify the levels of nesting; the rationale was that the user would often use doi = ... and similar identifiers in oa_fetch, and making them wrap these inside a dedicated filter argument would be a little too cumbersome.

Because of this, I chose to put other parameters like select and sort in options = list().

@rkrug
Author

rkrug commented Oct 20, 2023

Makes sense.

You know the structures much better than I do, but shouldn't it be possible to use the ellipses for both? Are there any combinations that would lead to collisions? I do not think any filter keywords appear in this list of parameters:

cursor, filter, format, group_by, group-by, group_bys, group-bys, mailto, page, per_page, per-page, q, sample, seed, search, select, sort

So all named arguments in the ellipsis that are in this list would be interpreted as parameters, and all others would be wrapped in the filter as they are now.
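A minimal sketch of this routing idea in base R, assuming the parameter list quoted above is complete. split_dots and top_level are hypothetical names; this is not how openalexR currently handles its arguments.

```r
# Top-level API parameters, copied from the list quoted above.
top_level <- c(
  "cursor", "filter", "format", "group_by", "group-by", "group_bys",
  "group-bys", "mailto", "page", "per_page", "per-page", "q", "sample",
  "seed", "search", "select", "sort"
)

# Hypothetical helper: split the named arguments in ... into top-level
# parameters and filters, based on membership in top_level.
split_dots <- function(...) {
  args <- list(...)
  is_param <- names(args) %in% top_level
  list(params = args[is_param], filters = args[!is_param])
}

res <- split_dots(
  search = "transformative change",
  select = "id",
  doi = "10.1000/example"  # made-up value, used only as a filter example
)
names(res$params)
#> [1] "search" "select"
names(res$filters)
#> [1] "doi"
```

A collision would occur only if a filter name ever matched one of the top-level names, so the sketch depends on the two sets staying disjoint.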

@trangdata
Collaborator

Theoretically, yes, we could do this. However, I don't know if it's best practice to try to combine different levels of parameters (filter and higher-level ones) into one .... I have tried to be fancy like this many times in the past and it never came out well...

I do think we need to revise how we're implementing these arguments to the query and reorganize re what should go in options and what should be moved out. I will create a new issue for this.

@rkrug
Author

rkrug commented Oct 20, 2023 via email
