Add a find_hits function? #21

Closed
maelle opened this issue May 15, 2017 · 38 comments

Comments

@maelle
Member

maelle commented May 15, 2017

As suggested in @njahn82's tweet and following my blog post

My questions regarding adding this function:

  • I think it might be useful to make it more flexible with arguments for type and source (both can't be used at the same time). What is your opinion?

  • In my use case I used it by year and the function was a wrapper to get all results for a given year range. However, is no. of hits / year a sufficient use case? Maybe the function should have a date range and a frequency or a sequence of dates as arguments? 🤔

and if it all becomes too complicated (to imagine all use cases / the main ones), should it rather be an example in a vignette?
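
A minimal sketch of what "a date range and a frequency" could look like, assuming the FIRST_PDATE range syntax used by the function posted later in this thread. The helper name `date_range_queries` and its arguments `from`, `to` and `by` are hypothetical and not part of europepmc; the sketch only builds the query strings and does not call the API.

```r
# Hypothetical helper: build one FIRST_PDATE range query per period.
# `from`, `to` and `by` are illustrative names, not europepmc arguments.
date_range_queries <- function(from, to, by = "year") {
  from <- as.Date(from)
  to <- as.Date(to)
  # start date of every period between `from` and `to`
  starts <- seq(from, to, by = by)
  # each period ends the day before the next one starts; the last ends at `to`
  ends <- c(starts[-1] - 1, to)
  data.frame(
    start = starts,
    end = ends,
    query = paste0("(FIRST_PDATE:[", format(starts), "+TO+", format(ends), "])"),
    stringsAsFactors = FALSE
  )
}

# one query string per quarter of 2016
date_range_queries("2016-01-01", "2016-12-31", by = "quarter")
```

Each row could then be combined with a search term via `AND`, in the same way the per-year query is built, which would keep the yearly case as just one value of `by`.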

@njahn82
Member

njahn82 commented May 15, 2017

Thank you! To your first question:

More flexible arguments would be useful. I am also thinking of the synonym search, which Europe PMC enables by default and which might not be suitable for all tasks. Europe PMC has a very detailed query language (e.g. search within sources or article sections), which could be used to make the function more flexible.

I also agree with your last point: When the function becomes too hard to implement and to maintain, we can write an extra vignette with examples like you did in your blog post.

@maelle
Member Author

maelle commented May 15, 2017

Is your preference a vignette or a function in this case? :-)

@njahn82
Member

njahn82 commented May 15, 2017

Let's include your function in the package. Based on that we could present more complex queries in a vignette. Here's a first attempt based on your function: a simple query for "Aspirin", as well as two more detailed queries for cran.r-project.org* in reference sections. The last example only searches the PubMed/Medline collection and only for review articles.

#' @param query_term Europe PMC search query: see <https://europepmc.org/Help>
#' for details
#' @param synonym logical; if TRUE (the default), expand the query with MeSH
#'   terminology and the UniProt synonyms list
#' @param period integer vector of publication years to query, 1975:2016 by default
#'
find_hits <-
  function(query_term,
           synonym = TRUE,
           period = 1975:2016) {
    years <- period
    results <-
      purrr::map(years,
                 find_hits_by_year,
                 query_term = query_term,
                 synonym = synonym)
    dplyr::bind_rows(results)
  }

find_hits_by_year <- function(year, query_term, synonym) {
  queryforall <-
    paste0('(FIRST_PDATE:[', year, '-01-01+TO+', year, '-12-31])')
  all_hits <-
    as.numeric(europepmc::epmc_profile(queryforall, synonym)$pubType[1, 2])
  
  queryforterm <-
    paste0('(FIRST_PDATE:[',
           year,
           '-01-01+TO+',
           year,
           '-12-31]) AND ',
           query_term)
  
  term_hits <-
    as.numeric(europepmc::epmc_profile(queryforterm, synonym)$pubType[1, 2])
  
  return(tibble::tibble(
    year = year,
    all_hits = all_hits,
    term_hits = term_hits
  ))
}

find_hits('aspirin', period = 2006:2016, synonym = FALSE)
#> # A tibble: 11 × 3
#>     year all_hits term_hits
#>    <int>    <dbl>     <dbl>
#> 1   2006   741228      2463
#> 2   2007   759017      2614
#> 3   2008   794495      3209
#> 4   2009   824702      3608
#> 5   2010   880624      4318
#> 6   2011   932154      4724
#> 7   2012   974162      5341
#> 8   2013  1032377      5974
#> 9   2014  1081101      6441
#> 10  2015  1139797      6685
#> 11  2016  1094684      5532
# link to cran packages in reference lists
find_hits('REF:"cran.r-project.org*"', period = 2006:2016, synonym = FALSE)
#> # A tibble: 11 × 3
#>     year all_hits term_hits
#>    <int>    <dbl>     <dbl>
#> 1   2006   741228         7
#> 2   2007   759017        18
#> 3   2008   794495        45
#> 4   2009   824702        83
#> 5   2010   880624       190
#> 6   2011   932154       300
#> 7   2012   974162       563
#> 8   2013  1032377       915
#> 9   2014  1081101      1423
#> 10  2015  1139797      2039
#> 11  2016  1094684      2206
# more complex with source Medline and publication type review
find_hits('(REF:"cran.r-project.org*") AND (SRC:"MED") AND (PUB_TYPE:"Review" OR PUB_TYPE:"review-article")', period = 2006:2016, synonym = FALSE)
#> # A tibble: 11 × 3
#>     year all_hits term_hits
#>    <int>    <dbl>     <dbl>
#> 1   2006   741228         1
#> 2   2007   759017         2
#> 3   2008   794495         1
#> 4   2009   824702         2
#> 5   2010   880624        11
#> 6   2011   932154         9
#> 7   2012   974162        25
#> 8   2013  1032377        25
#> 9   2014  1081101        39
#> 10  2015  1139797        36
#> 11  2016  1094684        24
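
Because `all_hits` grows from year to year, the raw `term_hits` counts are easier to compare once they are expressed as a share of all records. A small sketch of that step, re-entering the numbers from the `REF:"cran.r-project.org*"` output above by hand so it runs without calling the API (the column name `share_pct` is made up for illustration):

```r
# Values copied from the REF:"cran.r-project.org*" output above
hits <- data.frame(
  year = 2006:2016,
  all_hits = c(741228, 759017, 794495, 824702, 880624, 932154,
               974162, 1032377, 1081101, 1139797, 1094684),
  term_hits = c(7, 18, 45, 83, 190, 300, 563, 915, 1423, 2039, 2206)
)

# share of all records per year that match the query, in percent
hits$share_pct <- 100 * hits$term_hits / hits$all_hits

hits[, c("year", "share_pct")]
```

For these numbers the share rises in every year, including 2016, where both raw counts are lower or barely higher than in 2015.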

@maelle
Member Author

maelle commented May 15, 2017

Ok, do you want me to do a PR with this version? I'll add a test + remove the purrr dependency.
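
One way the purrr dependency could be dropped, sketched in base R. This is not necessarily what the PR ended up doing. `fake_hits_by_year` is a hypothetical stand-in for `find_hits_by_year` so that the example runs offline, and `find_hits_base` is an illustrative name.

```r
# Stand-in for find_hits_by_year() so the sketch runs without network access;
# the real function queries Europe PMC and returns actual counts.
fake_hits_by_year <- function(year, query_term, synonym) {
  data.frame(year = year, all_hits = NA_real_, term_hits = NA_real_)
}

find_hits_base <- function(query_term, synonym = TRUE, period = 1975:2016) {
  # lapply() takes the place of purrr::map(); extra arguments are passed by name
  results <- lapply(period, fake_hits_by_year,
                    query_term = query_term, synonym = synonym)
  # do.call(rbind, ...) stacks the one-row data frames into one table
  do.call(rbind, results)
}

find_hits_base("aspirin", synonym = FALSE, period = 2006:2008)
```

`do.call(rbind, ...)` is enough here because every call returns a one-row data frame with the same columns; `dplyr::bind_rows()` would work on the `lapply()` result as well if dplyr stays a dependency.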

@njahn82
Member

njahn82 commented May 15, 2017

Yes, please send a PR with this version!

@njahn82
Member

njahn82 commented May 15, 2017

(There is currently an issue with the resulttype idlist, which causes tests to fail. I have already alerted the Europe PMC team.)

@maelle
Member Author

maelle commented May 15, 2017

I wasn't too sure how to name the function properly 🤔

@maelle
Member Author

maelle commented May 15, 2017

Also absolutely no problem if you want to remove the ref to my own blog 😁

@njahn82
Member

njahn82 commented May 15, 2017

I like both the name and the link to your blog. I wonder whether you want to add yourself as an author, and whether tibble::tibble could be replaced with dplyr::as_data_frame.

@maelle
Member Author

maelle commented May 15, 2017

True, I had figured tibble was a dependency of dplyr anyway, but this will make the dependency list shorter, so I'll do that! And yep, I'll add myself as ctb, thanks. :-)

@njahn82
Member

njahn82 commented May 15, 2017

Great!

@maelle
Member Author

maelle commented May 15, 2017

Added the modifications. I have a different Roxygen version, so all .Rd files were modified: they no longer end with an empty line. Sorry about that.

njahn82 added a commit that referenced this issue May 15, 2017
adds function for finding hits trends cf #21
@njahn82
Member

njahn82 commented May 15, 2017

Was able to merge it without lots of pain, many thanks again. Will start a vignette about #evergreenreviewgraph this afternoon while on a train. Please feel free to contribute!

@maelle
Member Author

maelle commented May 15, 2017

Cool! And now that this package will have 2 vignettes, why not start using pkgdown?

njahn82 added a commit that referenced this issue May 15, 2017
@maelle
Member Author

maelle commented May 17, 2017

Just had a look at the beginning of the vignette. It looks promising and will give super useful insight! And by that I don't mean the link to my blog post!

njahn82 added a commit that referenced this issue May 18, 2017
njahn82 added a commit that referenced this issue May 18, 2017
njahn82 added a commit that referenced this issue May 18, 2017
njahn82 added a commit that referenced this issue May 19, 2017
njahn82 added a commit that referenced this issue Jun 7, 2017
@maelle
Member Author

maelle commented Jun 7, 2017

A few comments about the vignette (sorry I'm too lazy for a PR today):

  • Great work! I like the use cases. 😁 Especially the second one!

  • The "#" disappears from the title?

  • I've noticed that some non-native speakers don't know what evergreen means. Could you add a sentence to explain it, e.g. "Note that evergreen is applied to, for instance, blog topics that don't depend on the news and can always be run."?

  • "Be careful with the interpretation of the slower growth in the last years because Europe PMC also contains journals, which make their articles openly available after an embargo period that can be up to two years." Could you be more explicit about the influence of these journals on the growth?

  • "four general purpose hosting services for version-controlled code" maybe make the links clickable?

  • " in the acknowledgement" -> " in the acknowledgements"

  • "represented in review graph" -> "represented in the review graph"

  • There's no general conclusion? 😉

But again really nice vignette, hence this issue to make it more visible!

@njahn82
Member

njahn82 commented Jun 7, 2017

Wow, so helpful, many thanks! Will add it to the vignette and will try to start with blogdown!

@maelle
Member Author

maelle commented Jun 7, 2017

Awesome! I can help with blogdown if needed. Do you have admin rights to the repo of this package? You'll need that to change the source of Github pages to master/docs.

@maelle
Member Author

maelle commented Jun 7, 2017

Note: I can help with making the blogdown website but I can't give you admin access if you haven't got it yet, I don't have these rights. 😁

@njahn82
Member

njahn82 commented Jun 7, 2017

I don't find the "Settings" tab, so I guess not.

@maelle
Member Author

maelle commented Jun 7, 2017

Pinging @sckott , could you please give @njahn82 access rights to the Settings tab of this repo? 🙏 😸

@sckott

sckott commented Jun 7, 2017

yep!

@sckott

sckott commented Jun 7, 2017

try it again

@njahn82
Member

njahn82 commented Jun 7, 2017

Thanks, I got it!

@njahn82
Member

njahn82 commented Jun 7, 2017

Cannot change to the master/docs folder, I guess because it does not exist yet.

@maelle
Member Author

maelle commented Jun 7, 2017

Yes, I think you need to run pkgdown::init() and pkgdown::build_site() first.

@njahn82
Member

njahn82 commented Jun 7, 2017

Oh no, I got a "pandoc document conversion failed with error 67" when running pkgdown::build_site()

@maelle
Member Author

maelle commented Jun 7, 2017

Are you an Ubuntu user? I saw this error in ropensci/osmdata#64

I can build the website for you tomorrow on Windows 😎 but it'd be great to understand this Pandoc thing. Are you on the rOpenSci slack?

@njahn82
Member

njahn82 commented Jun 7, 2017

No, I'm a Mac user, and unfortunately I'm also not on the rOpenSci slack. It would be great if you could try to build the website tomorrow.

@maelle
Member Author

maelle commented Jun 7, 2017

I will!

@maelle
Member Author

maelle commented Jun 7, 2017

Maybe try the self_contained thing as in ropensci/osmdata@ea0676e ?

njahn82 added a commit that referenced this issue Jun 7, 2017
@njahn82
Member

njahn82 commented Jun 7, 2017

Wohoo 😃 using your trick did it! I pushed it, so it should be visible soon!

@maelle
Member Author

maelle commented Jun 7, 2017

well that's @mpadge's trick, thank him! 😉

@maelle
Member Author

maelle commented Jun 7, 2017

and awesome!

@maelle
Member Author

maelle commented Jun 7, 2017

Once the website is up I'd recommend putting the link in the repo description :-)

@njahn82 njahn82 mentioned this issue Jun 7, 2017
njahn82 added a commit that referenced this issue Jun 8, 2017
njahn82 added a commit that referenced this issue Jun 8, 2017
@maelle
Member Author

maelle commented Jun 8, 2017

@njahn82 when the vignette is finalized you might want to tell @sckott about it so that it can be included in the rOpenSci newsletter, which goes out once every 2 weeks.

@njahn82
Member

njahn82 commented Jun 8, 2017

Will do!

@njahn82
Member

njahn82 commented May 27, 2020

Forgot to close this issue, thanks again for this great functionality.

@njahn82 njahn82 closed this as completed May 27, 2020
3 participants