Get only id of result #179

rkrug · 2023-10-18T15:34:10Z

I am trying to get only the OpenAlex ids of the works resulting from a search. As I did not find this option in openalexR, I just pasted ?select=id onto the query (see https://docs.openalex.org/how-to-use-the-api/get-single-entities/select-fields).

But something strange happens: the number of results changes:

r$> q <- oa_query(search = "transformative change")

r$> oa_request(paste0(q), verbose = TRUE, count_only = TRUE)
$count
[1] 7619132

$db_response_time_ms
[1] 1082

$page
[1] 1

$per_page
[1] 1


r$> oa_request(paste0(q, "?select=id"), verbose = TRUE, count_only = TRUE)
$count
[1] 184443

$db_response_time_ms
[1] 364

$page
[1] 1

$per_page
[1] 1

Am I doing something wrong or misunderstanding something, or do I have to get in touch with OpenAlex?


rkrug · 2023-10-19T13:31:32Z

OK - solved. The ? should be an '&'

r$> q <- oa_query(search = "transformative change")

r$> oa_request(paste0(q), verbose = TRUE, count_only = TRUE) |> unlist()
              count db_response_time_ms                page            per_page 
            7619126                1467                   1                   1 

r$> oa_request(paste0(q, "&select=id"), verbose = TRUE, count_only = TRUE) |> unlist()
              count db_response_time_ms                page            per_page 
            7619118                1531                   1                   1
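
For anyone landing here later, a minimal sketch of why the `?` version behaves differently. It assumes the URL built by `oa_query` already contains a `?` before `search=` (as in the "Requesting url" line further down this thread). `parse_query` is a hypothetical helper written for illustration, not part of openalexR. With a second `?`, the text `?select=id` becomes part of the `search` value instead of a separate parameter, which would presumably explain the different count.

```r
# Hypothetical helper (not part of openalexR): split a URL's query string
# into a named character vector of parameters, using base R only.
parse_query <- function(url) {
  # Everything after the FIRST "?" is the query string.
  query <- sub("^[^?]*\\?", "", url)
  # Parameters are separated by "&".
  pairs <- strsplit(query, "&", fixed = TRUE)[[1]]
  keys <- sub("=.*$", "", pairs)
  values <- sub("^[^=]*=", "", pairs)
  stats::setNames(values, keys)
}

base_url <- "https://api.openalex.org/works?search=transformative%20change"

wrong <- parse_query(paste0(base_url, "?select=id"))
right <- parse_query(paste0(base_url, "&select=id"))

names(wrong)  # a single parameter: "?select=id" ended up inside the search value
names(right)  # two parameters: search and select
```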

trangdata · 2023-10-20T10:24:26Z

Hey @rkrug, glad you figured it out! Another way to set select = "id" is to plug it into options. Below, q and q2 are the same (if you use oa_query). Or you can feed it directly in oa_fetch:

library(openalexR)
q <- oa_query(search = "transformative change")
q2 <- oa_query(
  search = "transformative change", 
  options = list(select = "id")
)
identical(paste0(q, "&select=id"), q2)
#> [1] TRUE
oa_request(q2, verbose = TRUE, count_only = TRUE)
#> $count
#> [1] 7619312
#> 
#> $db_response_time_ms
#> [1] 1264
#> 
#> $page
#> [1] 1
#> 
#> $per_page
#> [1] 1

# or
oa_fetch(
  search = "transformative change", 
  options = list(select = "id"),
  verbose = TRUE, 
  count_only = TRUE
)
#> Requesting url: https://api.openalex.org/works?search=transformative%20change&select=id
#>        count db_response_time_ms page per_page
#> [1,] 7619312                1264    1        1

Created on 2023-10-20 with reprex v2.0.2

rkrug · 2023-10-20T10:57:34Z

Thanks @trangdata. Yes, that solved my issue, along with using the ampersand & instead of the ?.

Although I think the select deserves a spot along the same line as search, as it is not an option as the others are?
But probably I am just biased...

trangdata · 2023-10-20T11:52:21Z

> Although I think the select deserves a spot along the same line as search, as it is not an option as the others are?

You're probably right. We implemented search very early on, but it should probably be in options as well. We will need a major refactor of the package for this. Other parameters that I think should go in options include per_page and group_by(s). CC'ing @yjunechoe and @massimoaria if you have other thoughts!

rkrug · 2023-10-20T12:41:55Z

I do not think it should be in options. I rather think select should be a top-level argument. options should be for arguments that are of secondary importance, e.g., as you say, per_page or paging. In other words: options is for power users, and normal users do not have to go there.

trangdata · 2023-10-20T13:47:45Z

@rkrug Then I think we need to think carefully about which of these parameters are "of secondary importance".

Currently, I'm thinking all of the parameters should be equal and moved to options (with the exception of filter). Take this query for example: https://api.openalex.org/works?selector=id (just an example to see what all the parameters are). We get this:

<selector is not a valid parameter. Valid parameters are: cursor, filter, format, group_by, group-by, group_bys, group-bys, mailto, page, per_page, per-page, q, sample, seed, search, select, sort.>

So, which of these should be in options?

Another factor to consider is the number of arguments in oa_fetch. I'm not sure if there is a recommended style somewhere, but I personally don't want oa_fetch to have too many arguments (the call would already be very long with lots of filters). But I could be convinced otherwise.

rkrug · 2023-10-20T13:56:36Z

What about using the approach the rgrass package is taking? It has a similar problem: it interfaces with GRASS (a GIS program), where each command has many different arguments.

So it is a two-step process:

1. Get the allowed parameters (in the OpenAlex case via a call like the one you mention above) and parse the response into a list of allowed parameters.
2. Use ... and parameters <- list(...), which can then be checked for validity (by the names of the arguments) and then processed or passed on to OpenAlex (see https://github.com/rsbivand/rgrass/blob/6611c3d304d91c3c7c918e72696b4bf1c2ce2904/R/xml1.R#L165 for their implementation).

This is not messy, and it is flexible. In the help pages, one can mention the most important and relevant parameters and how they can be used. It is also future-proof, as unknown parameters can simply be passed on to OpenAlex (with a warning), etc.

I think that would give the best of both worlds.
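
A rough sketch of the second step, to make the idea concrete. The list of valid names is copied from the API error message quoted above. `oa_valid_params` and `collect_params` are hypothetical names used only for illustration; neither exists in openalexR, and this is not how rgrass implements it.

```r
# Parameter names copied from the OpenAlex error message quoted in this thread.
oa_valid_params <- c(
  "cursor", "filter", "format", "group_by", "group-by", "group_bys",
  "group-bys", "mailto", "page", "per_page", "per-page", "q", "sample",
  "seed", "search", "select", "sort"
)

# Hypothetical helper: collect the arguments given via `...`, warn about
# names that are not in the known list, and return everything unchanged
# so that unknown parameters are still passed on.
collect_params <- function(..., valid = oa_valid_params) {
  params <- list(...)
  if (length(params) == 0) {
    return(params)
  }
  nms <- names(params)
  if (is.null(nms) || any(nms == "")) {
    stop("All parameters must be named.", call. = FALSE)
  }
  unknown <- setdiff(nms, valid)
  if (length(unknown) > 0) {
    warning(
      "Passing unknown parameter(s) on to OpenAlex unchanged: ",
      paste(unknown, collapse = ", "),
      call. = FALSE
    )
  }
  params
}

# Known parameters pass through silently.
known <- collect_params(select = "id", per_page = 50)
```

Calling `collect_params(selector = "id")` would return the argument as well, but with a warning that `selector` is not a known parameter.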

The parameters of oa_fetch would therefore become:

  entity = if (is.null(identifier)) NULL else id_type(shorten_oaid(identifier[[1]])),
  ...,
  output = c("tibble", "dataframe", "list"),
  abstract = TRUE,
  endpoint = "https://api.openalex.org",
  count_only = FALSE,
  mailto = oa_email(),
  api_key = oa_apikey(),
  verbose = FALSE

and in the help page, mention the arguments which are hidden behind ....

And oa_query could use the same approach, i.e. the handling of the ... would be assigned to oa_query.

trangdata · 2023-10-20T14:55:08Z

@rkrug Unfortunately, the ellipses ... are already reserved for different filter parameters. This was a design choice early on to simplify the levels of nesting; the rationale was that the user would often use doi = ... and similar identifiers in oa_fetch, and making them wrap it inside a potential argument filter would be a little too cumbersome.

openalexR/R/oa_fetch.R

Line 62 in 5134044

...,

Because of this, I chose to put other parameters like select and sort in options = list().

rkrug · 2023-10-20T15:07:48Z

Makes sense.

You know the structures much better than I do, but shouldn't it be possible to use the ellipses for both? Are there any combinations which would lead to collisions? I do not think any of the filter keywords appear in this list of parameters:

cursor, filter, format, group_by, group-by, group_bys, group-bys, mailto, page, per_page, per-page, q, sample, seed, search, select, sort

So all named arguments in the ellipsis that are in this list would be interpreted as parameters, and all others would be wrapped in the filter as they are now.
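
Roughly what I have in mind, as a sketch only. `oa_top_level` and `split_dots` are hypothetical names, not existing openalexR functions, and the filter name `publication_year` is just used as an example of something that is not in the list.

```r
# Top-level query parameters, copied from the list above.
oa_top_level <- c(
  "cursor", "filter", "format", "group_by", "group-by", "group_bys",
  "group-bys", "mailto", "page", "per_page", "per-page", "q", "sample",
  "seed", "search", "select", "sort"
)

# Hypothetical helper: split the arguments given via `...` into top-level
# query parameters and filter arguments, based only on their names.
split_dots <- function(..., top_level = oa_top_level) {
  dots <- list(...)
  nms <- names(dots)
  # list() without any named element has NULL names; treat all as unnamed.
  if (is.null(nms)) {
    nms <- rep("", length(dots))
  }
  is_param <- nms %in% top_level
  list(
    params = dots[is_param],  # would go into the query string directly
    filter = dots[!is_param]  # would be wrapped in filter, as it is now
  )
}

res <- split_dots(
  search = "transformative change",
  select = "id",
  publication_year = 2023
)
names(res$params)  # search, select
names(res$filter)  # publication_year
```

Unnamed arguments would end up in the filter part here, so a real implementation would probably want to reject them.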

trangdata · 2023-10-20T16:31:31Z

Theoretically, yes, we could do this. However, I don't know if it's best practice to try to combine different levels of parameters (filter and higher-level ones) into one .... I have tried to be fancy like this many times in the past and it never came out well...

I do think we need to revise how we're implementing these arguments to the query and reorganize re what should go in options and what should be moved out. I will create a new issue for this.

rkrug · 2023-10-20T17:03:28Z

This is definitely the best approach for this.

rkrug closed this as completed Oct 20, 2023

trangdata mentioned this issue Oct 20, 2023

Refactor oa_fetch parameters #182

Open
