New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

oec package #217

Open
pachamaltese opened this Issue May 14, 2018 · 92 comments

Comments

Projects
None yet
5 participants
@pachamaltese

pachamaltese commented May 14, 2018

Summary

  • What does this package do? (explain in 50 words or less):

oec provides an easy way to obtain data from the Observatory of Economic Complexity by accessing its API.

  • Paste the full DESCRIPTION file inside a code block below:
Package: oec
Type: Package
Title: Observatory of Economic Complexity API Wrapper and Utility Program
Version: 2.8.2
Date: 2018-06-09
Authors@R: c(
  person("Mauricio", "Vargas S.", role = c("aut", "cre", "cph"), email = "oec@media.mit.edu"),
  person("Manuel", "Aristaran", role = c("ctb")),
  person("Pablo", "Paladino", role = c("ctb")),
  person("Gabriela", "Perez", role = c("ctb")),
  person(family = "UN Comtrade", role = c("dtc")),
  person(family = "MIT Media Lab", role = c("dtc")),
  person(family = "Datawheel", role = c("fnd"))
  )
URL: https://CRAN.R-project.org/package=oec
BugReports: https://github.com/pachamaltese/oec-r/issues
Description: Access The Observatory of Economic Complexity API from R to download international trade data.
License: MIT + file LICENSE
LazyData: TRUE
Depends:
  R (>= 3.2)
Imports:
  rlang,
  magrittr,
  dplyr,
  stringr,
  curl,
  purrr,
  jsonlite
RoxygenNote: 6.0.1
Suggests: 
    knitr,
    rmarkdown,
    testthat,
    covr
VignetteBuilder: knitr
  • URL for the package (the development repository, not a stylized html page):

https://github.com/pachamaltese/oec-r-package

  • Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

databases, because the package connects to an API and does 3 or more API calls to simplify things for the final user who wants imports/exports and some metrics such as % of increase/decrease.

  •   Who is the target audience and what are scientific applications of this package?  

Non-expert users that use international trade data. This can also be targeted to intermediate/advanced users who can benefit from the speed and short syntax that this package provides.

Not at the moment (in 2 yrs this is the only one)

  •   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

No.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • [] Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no gaurantee that your manuscript willl be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

None.

  • If this is a resubmission following rejection, please explain the change in circumstances:

No.

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

I don't really know.

@maelle

This comment has been minimized.

Member

maelle commented May 15, 2018

@pachamaltese Thanks for your submission! From a quick look you checked "contains a vignette with examples of its essential functions and uses." and "has a test suite." but I do not see neither a vignette nor a test suite in the GitHub repo? We can put this submission on hold while you add them, and we can provide links about e.g. unit testing if necessary.

@maelle

This comment has been minimized.

Member

maelle commented May 15, 2018

Also, could you please expand here on the added value of the data processing offered by your package as opposed to using the original data sources?

@pachamaltese

This comment has been minimized.

pachamaltese commented May 15, 2018

@maelle thanks, no worries, let me check my laptop hdd vs my external drive, I probably forgot to move something

About the added value if I run getdata_batch("chl", "chn", 2010, 2015) I will obtain a tibble with 6 years of data for chilean exports to china but with these additions:

  • country codes corrected and written as ISO-3
  • full product names
  • full country names
  • trade balance
  • economic complexity measures and ranks
  • top importers and exporters
  • a color column to use the same palette as atlas.media.mit.edu

On the other hand, with API calls I would need to read https://atlas.media.mit.edu/hs92/export/2010/chl/chn/show/
https://atlas.media.mit.edu/hs92/export/2011/chl/chn/show/
... etc ...
https://atlas.media.mit.edu/hs92/export/2015/chl/chn/show/

and also to read this
https://atlas.media.mit.edu/hs92/export/2010/chl/all/show/
https://atlas.media.mit.edu/hs92/export/2011/chl/all/show/
... etc ...
https://atlas.media.mit.edu/hs92/export/2015/chl/all/show/

and also this
https://atlas.media.mit.edu/hs92/export/2010/all/all/show/
https://atlas.media.mit.edu/hs92/export/2011/all/all/show/
... etc ...
https://atlas.media.mit.edu/hs92/export/2015/all/all/show/

After reading those 15 json files the user would need some additional operations to see the same result as getdata_batch("chl", "chn", 2010, 2015):

  • a left_join to match API product code (e.g. 2709) to product name (e.g. petroleum), a mayor drawback here is that the code-name table can't be accessed by using the API
  • another left_join to match country codes (e.g. sachl) to country names (e.g. Chile) with the big problem that the API returns non-ISO codes (e.g. sachl vs chl), again the user would need a code-name table and would need to crop the strings
  • join everything and finally use purrr to create a unique tibble
@pachamaltese

This comment has been minimized.

pachamaltese commented May 19, 2018

Hi @maelle.
I did put all the pieces together: external disks, google drive, etc. I have pushed a new version with lots of changes as well as a testing unit.
Have a good weekend!

@maelle

This comment has been minimized.

Member

maelle commented May 24, 2018

Hi @pachamaltese! Thanks.

I've been traveling so my initial checks will take a bit longer than expected.

I am not sure I understood how your package got scattered over hard drives &co but please ask us any git/GitHub question you might have. See also http://happygitwithr.com

@pachamaltese

This comment has been minimized.

pachamaltese commented May 24, 2018

thanks a lot @maelle

happy travelling!!

I have +5 external hdd since my macbook fried itself and I lost tons of tidy useful data :(
I'm still recovering from that accident ;)

@maelle

This comment has been minimized.

Member

maelle commented May 24, 2018

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks again for your submission @pachamaltese!

Here are a some points that should be solved/discussed before I start looking for reviewers. Note that now that you have everything on GitHub, you shouldn't loose pieces again, which is good news. Sorry to hear about your Macbook! 😢

This process is an ongoing discussion so ask any question/make any remark you might have about my comments!

Code

  • You use assign in functions instead of returning output. Please return output, you don't need to use assign here.

  • Why do you use rm inside a function?

  • There is a separate file containing imports (confusingly called exports 😉). It'd be easier to see where functions are imported from if you used the syntax jsonlite::fromJSON. You could then remove the separate R code with imports. Cf http://r-pkgs.had.co.nz/namespace.html#imports

  • Your package depends on purrr for a single function. Consider replacing it with do.call and rbind.

  • Regarding all the stopifnot, have informative error messages when the user entered a wrong parameter value.

  • You could have a single exported function instead of two if I follow correctly? It'd take arguments like the batch one, that could be vector of length 1 (the user would input say one or several country pairs). The other function would be an utility function, that wouldn't need to be exported. It might simplify the interface?

  • If there is only one function call in a pipeline it might be clearer to remove the pipe.

  origin_destination <- origin_destination %>%
    mutate(trade_exchange_val = 
!!sym("export_val") + !!sym("import_val"))

would become

  origin_destination <-  mutate(origin_destination,
    trade_exchange_val = !!sym("export_val") + !!sym("import_val"))
  • In case you might ignore it, and in case it could be useful, there's a countrycode package.

Data

  • Where do the data files in data come from? Why are they pre-prepared, are you sure they won't be updated?

  • You could keep their data processing in data-raw.

Documentation

  • Please add the URL to the data source.

  • Are you allowed to use the logo in your README?

  • A flowchart of what processing your package does might be useful to explain the added value of using the package and to explain what has been preprocessed or not (what's being downloaded vs what's in data/).

  • In the README when mentioning the vignette have a link to the pkgdown website.

  • Please add @return fields to the documentation of the functions.

  • A vignette with an actual use case, and visualizations, would be nice (or an extension of the existing vignette).

  • Please explain what the data is e.g. SITC or add an external link.

  • Because some chunks of the vignette are unevaluated without a pre-generated output, I can't understand what the "problem" is. You could pre-generate the output.

Misc

  • Could you rename the repo "oec" like the package?

  • The "Maintainer" field in DESCRIPTION is useless because it'll be generated automatically from Authors.

  • What are the files .Rapp.history in data/ and .build.timestamp in vignettes/?

  • Please add a code coverage integration and badge (via e.g. usethis::usethis::use_coverage())

goodpractice output

omit "Date" in DESCRIPTION. It is not required and it gets invalid quite
    often. A build date will be added to the package when you perform `R CMD build` on
    it.
  ✖ avoid long code lines, it is bad for readability. Also, many people
    prefer editor windows that are about 80 characters wide. Try make your lines
    shorter than 80 characters

    R\getdata.R:56:1
    R\getdata_batch.R:44:1
    R\getdata_batch.R:68:1
    tests\testthat\test-oec.R:3:1
    tests\testthat\test-oec.R:15:1
    ... and 10 more linesavoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and
    1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on
    the right hand side is zero. Use seq_len() or seq_along() instead.

    R\getdata_batch.R:66:13

Reviewers: @aedobbyn @cimentadaj
Due date: 2018-07-04

@maelle

This comment has been minimized.

Member

maelle commented Jun 7, 2018

@pachamaltese 👋 any question or update? 😺

@pachamaltese

This comment has been minimized.

pachamaltese commented Jun 7, 2018

@maelle not yet, I've been moving a lot. Can I send it tomorrow at night?

@maelle

This comment has been minimized.

Member

maelle commented Jun 7, 2018

If you think you won't have enough time to work on the package in the next weeks, we can set this submission on hold until you have more time.

@pachamaltese

This comment has been minimized.

pachamaltese commented Jun 7, 2018

tomorrow's night is ok, I have to complete the vignettes and that takes 1-2 hours :)

@pachamaltese

This comment has been minimized.

pachamaltese commented Jun 9, 2018

@maelle it's ok now, I did all the steps from your feedback and just a few are explained why yes/no to it is better for this particular API.

@maelle

This comment has been minimized.

Member

maelle commented Jun 9, 2018

Thanks, can you please answer with the explanations in this thread? 🙂

@pachamaltese

This comment has been minimized.

pachamaltese commented Jun 9, 2018

Maelle' s initial feedback

Code

  • You use assign in functions instead of returning output. Please return output, you don't need to use assign here.
  • Why do you use rm inside a function? fixed, it was a bad try to make lighter function calls
  • There is a separate file containing imports (confusingly called exports wink). It'd be easier to see where functions are imported from if you used the syntax jsonlite::fromJSON. You could then remove the separate R code with imports. http://r-pkgs.had.co.nz/namespace.html#imports
  • Your package depends on purrr for a single function. Consider replacing it with do.call and rbind. To (over)simplify now I also use purrr's flatten df function
  • Regarding all the stopifnot, have informative error messages when the user entered a wrong parameter value.
  • You could have a single exported function instead of two if I follow correctly? It'd take arguments like the batch one, that could be vector of length 1 (the user would input say one or several country pairs). The other function would be an utility function, that wouldn't need to be exported. It might simplify the interface?
  • If there is only one function call in a pipeline it might be clearer to remove the pipe.
origin_destination <- origin_destination %>%
    mutate(trade_exchange_val = 
!!sym("export_val") + !!sym("import_val"))

would become

origin_destination <-  mutate(origin_destination,
    trade_exchange_val = !!sym("export_val") + !!sym("import_val"))
  • In case you might ignore it, and in case it could be useful, there's a countrycode package. the OEC uses an ISO modification, so I'm not so sure about it

Data

  • Where do the data files in data come from? Why are they pre-prepared, are you sure they won't be updated? everything in data/ comes from atlas.media.mit.edu and corresponds to unaltered trade codes (i.e HS96 was created not to touch HS92 created in 1992, HS02 was an update to HS96, etc so
    these tables do not change
  • You could keep their data processing in data-raw. ok, I'll add a vignette to explain the changes

Documentation

  • Please add the URL to the data source.
  • Are you allowed to use the logo in your README? yes, I' m a part of the original team
  • A flowchart of what processing your package does might be useful to explain the added value of using the package and to explain what has been preprocessed or not (what's being downloaded vs what's in data/).
  • In the README when mentioning the vignette have a link to the pkgdown website.
  • Please add @return fields to the documentation of the functions.
  • A vignette with an actual use case, and visualizations, would be nice (or an extension of the existing vignette).
  • Please explain what the data is e.g. SITC or add an external link.
  • Because some chunks of the vignette are unevaluated without a pre-generated output, I can't understand what the "problem" is. You could pre-generate the output.

Misc

  • Could you rename the repo "oec" like the package? travis also renamed
  • The "Maintainer" field in DESCRIPTION is useless because it'll be generated automatically from Authors.
  • What are the files .Rapp.history in data/ and .build.timestamp in vignettes/? temp files from R CMD, ok now
  • Please add a code coverage integration and badge (via e.g. usethis::usethis::use_coverage())
@maelle

This comment has been minimized.

Member

maelle commented Jun 10, 2018

Thanks a lot @pachamaltese! 😺 I started looking and two small comments already:

  • The URL to the pkgdown site should be updated now that the repo has a new name.

  • You don't export the batch function any more but it's still present in the website docs at least cf http://pacha.hk/oec/reference/index.html

@maelle

This comment has been minimized.

Member

maelle commented Jun 10, 2018

The link to the logo is also broken at the moment.

@maelle

This comment has been minimized.

Member

maelle commented Jun 10, 2018

Thanks again @pachamaltese!

A few more points before I start looking for reviewers

  • The URL in the repo description should be updated.

  • The documentation website has to be updated (it still documents the function that was removed)

  • The logo doesn't appear any more in the README.

  • You can now remove my feedback from the README 😉 and have a more prominent link to the pkgdown website in it.

  • You can now add a peer-review badge to your README

[![](https://badges.ropensci.org/217_status.svg)](https://github.com/ropensci/onboarding/issues/217)

It'll indicate the package is under review, until after onboarding when it'll then indicate it was peer-reviewed.

  • Please add a code of conduct to your repo, e.g. via usethis::use_code_of_conduct()

  • Rbuildignore data-raw e.g. via usethis::use_build_ignore("data-raw")

  • The codecov badge has a wrong URL.

  • goodpractice output

It is good practice towrite unit tests for all functions, and all package code in
    general. 89% of code lines are covered by test cases.

    R/getdata.R:44:NA
    R/getdata.R:47:NA
    R/getdata.R:50:NA
    R/getdata.R:55:NA
    R/getdata.R:59:NA
    ... and 13 more linesomit "Date" in DESCRIPTION. It is not required and it gets
    invalid quite often. A build date will be added to the package when you
    perform `R CMD build` on it.
  ✖ avoid long code lines, it is bad for readability. Also, many
    people prefer editor windows that are about 80 characters wide. Try make
    your lines shorter than 80 characters

    R\getdata.R:47:1
    R\getdata.R:50:1
    R\getdata.R:59:1
    R\getdata.R:64:1
    R\getdata.R:69:1
    ... and 5 more linesavoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and
    1:NCOL(...) expressions. They are error prone and result 1:0 if the
    expression on the right hand side is zero. Use seq_len() or seq_along()
    instead.

    R\getdata.R:111:32
    R\getdata.R:174:26
    R\getdata.R:200:25

I think that you could easily increase code coverage, looking at https://codecov.io/gh/pachamaltese/oec/src/master/R/getdata.R you should

  • test that the function returns errors when it should

  • test more use cases (e.g. when destination == "all")

Thanks for all your work!

@pachamaltese

This comment has been minimized.

pachamaltese commented Jun 10, 2018

@maelle

This comment has been minimized.

Member

maelle commented Jun 10, 2018

You're welcome, no hurry and no problem!

@maelle

This comment has been minimized.

Member

maelle commented Jun 10, 2018

Note that only one vignette appeared on the pkgdown website last time I looked

@pachamaltese

This comment has been minimized.

pachamaltese commented Jun 10, 2018

@maelle ready, I used Jim Hester's lintr package for this 2nd round of feedback. Thanks a lot for the useful comments !!

@maelle

This comment has been minimized.

Member

maelle commented Jun 11, 2018

Many thanks, I'll now start looking for reviewers!

@maelle

This comment has been minimized.

Member

maelle commented Jun 13, 2018

Thanks @aedobbyn and @cimentadaj for accepting to review this package! 😸

Your reviews are due on 2018-07-04. As a reminder here are links to our bookdown in particular the packaging guidelines and reviewer's guide and to the review template.

@pachamaltese

This comment has been minimized.

pachamaltese commented Aug 20, 2018

@maelle finally I implemented webmockr and I also used vcr and crul, so the functions changed a bit :)

@maelle

This comment has been minimized.

Member

maelle commented Aug 22, 2018

@pachamaltese cool, congrats! So you're done with all comments?

@pachamaltese

This comment has been minimized.

pachamaltese commented Aug 22, 2018

@maelle

This comment has been minimized.

Member

maelle commented Aug 22, 2018

@aedobbyn @cimentadaj can you please have a last look at the package? Thanks!

@cimentadaj

This comment has been minimized.

cimentadaj commented Aug 23, 2018

Will do this weekend!

@aedobbyn

This comment has been minimized.

aedobbyn commented Aug 24, 2018

Looking good @pachamaltese!

Here are a few notes. I hope to be able to to look through more thoroughly this weekend but in case I'm not able to I wanted to give you something. By topic, in no particular order:

get_data

  • In get_data, the warnings No encoding supplied: defaulting to UTF-8. occur during the parse() call

    • This warning isn't very useful to the user (since they can't do anything about it) and there could potentially be a lot of them since you have one call per year. I'd either set the encoding explicitly using parse(encoding = "UTF-8") or suppress the message
  • In read_from_api you have an attempts_left argument but there's no way for the user to change this. Not required but you could think about including this as a user-defined variable in get_data

  • This is super nitpicky but the processing data message is a little misleading because it seems like a progress bar when really it messages all the years upfront. That means that if there were a problem with a certain year these messages wouldn't help you pin down which year failed. Maybe something like sprintf("\nProcessing %s to %s data...", range(years)[1], range(years)[2]) would be better. Otherwise, you could include the message of which year we're on in the read_from_api function.

get_countrycode

  • The error message for country code that doesn't exist could be moe informative than length(countrycode) != 0 is not TRUE

  • You might want to include a mapping from "us" to "united states", "uk" to "united kingdom", and other similar situations since I can see people trying get_countrycode("us") and being surprised when they don't get a US result

Other

  • In the devtools install line, it should be devtools::install_github("pachamaltese/oec")

  • You shouldn't need to prefix package functions with oec:: (e.g. https://github.com/pachamaltese/oec/tree/64b185675e9faf1318f6cae10d16b75f97607423/R/get_data.R#L87), I think

  • I'm not sure what the correct protocol for the fixtures dir . Looks like you use it in tests/testthat/helper-oec.R to get vcr configurations. The presence of the extra directory does give you a NOTE on devtools::check(), so I think it should at least be buildignored

  • Should update the vignette titles (line 7 for both). On devtools::release_checks(): WARNING: placeholder 'Vignette Title' detected in 'title' field and/or 'VignetteIndexEntry' for: oec-data.Rmd,oec.Rmd

  • May want to update the year on your LICENSE and in the README to 2018

  • What was the rationale for keeping the tidyeval instead of removing it and using a globals.R file?

This is looking really solid overall -- will be excited to use it very soon!

@pachamaltese

This comment has been minimized.

pachamaltese commented Aug 24, 2018

WHOA !! thanks a lot @aedobbyn !!!!!!!!!
I'm making changes this weekend. For now, the rationale for keeping the tidyeval was just to mimic some design decisions I like, for example those in the highcharter package by @jbkunst.

@aedobbyn

This comment has been minimized.

aedobbyn commented Aug 24, 2018

No problem, glad it's helpful! And makes sense -- it doesn't affect the functionality of course and assuming rlang stays backward-compatible (which I think they will?) you should be fine if they change the tidyeval syntax.

@pachamaltese

This comment has been minimized.

pachamaltese commented Aug 24, 2018

get_data

  • In get_data, the warnings No encoding supplied: defaulting to UTF-8. occur during the parse() call. This warning isn't very useful to the user (since they can't do anything about it) and there could potentially be a lot of them since you have one call per year. I'd either set the encoding explicitly using parse(encoding = "UTF-8") or suppress the message
  • In read_from_api you have an attempts_left argument but there's no way for the user to change this. Not required but you could think about including this as a user-defined variable in get_data
  • This is super nitpicky but the processing data message is a little misleading because it seems like a progress bar when really it messages all the years upfront. That means that if there were a problem with a certain year these messages wouldn't help you pin down which year failed. Maybe something like sprintf("\nProcessing %s to %s data...", range(years)[1], range(years)[2]) would be better. Otherwise, you could include the message of which year we're on in the read_from_api function.

get_countrycode

  • The error message for country code that doesn't exist could be moe informative than length(countrycode) != 0 is not TRUE
  • You might want to include a mapping from "us" to "united states", "uk" to "united kingdom", and other similar situations since I can see people trying get_countrycode("us") and being surprised when they don't get a US result

Other

  • In the devtools install line, it should be devtools::install_github("pachamaltese/oec")
  • You shouldn't need to prefix package functions with oec:: (e.g. https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L87), I think
  • I'm not sure what the correct protocol for the fixtures dir . Looks like you use it in tests/testthat/helper-oec.R to get vcr configurations. The presence of the extra directory does give you a NOTE on devtools::check(), so I think it should at least be buildignored
  • Should update the vignette titles (line 7 for both). On devtools::release_checks(): WARNING: placeholder 'Vignette Title' detected in 'title' field and/or 'VignetteIndexEntry' for: oec-data.Rmd,oec.Rmd
  • May want to update the year on your LICENSE and in the README to 2018
  • What was the rationale for keeping the tidyeval instead of removing it and using a globals.R file?
@aedobbyn

This comment has been minimized.

aedobbyn commented Aug 26, 2018

Thanks for implementing those changes so quickly! Last bit of suggestions from me, I think:

  • In your get_data @return documentation I would point people to your handy reference about what each of the columns means since a lot of these are non-obvious.

  • If you feel it's worth it (though I don't think strictly necessary), you could refactor out some of the redundancies in the get_data function. For instance, you use the same rename + mutate flow in each extraction, so this could be pulled out into a utility function. Similarly, the checks at the beginning of the function making sure that the years specified are in bounds for the classification could be stored in a lookup table. This way you only need one of these in get_data instead of five.

  • Right now you only allow years up to 2016, but in the coming years will the OEC have data past that year? Instead of updating this by hand is there a way to check what the greatest year available is, maybe on package load and store that as a variable?

  • I think in all cases you can replace single &s with &&s and |s with ||s

  • Small typo here ("lower of equal to")

  • Unnecessary line, I think

  • Future improvement for get_productcode would be a word_boundary boolean param for an exact word match. That would take care of the first result for get_productcode("wine") being "Live swine" 😆

  • Very small detail -- you might want to put the top_exporter_code and top_exporter cols next to each other in the output since they reference the same thing. Right now they're sometimes separated by a few columns (e.g. trade_exchange_val). Same for top_importer_code and top_importer

  • Should re-run pkgdown once new changes are implemented

@pachamaltese

This comment has been minimized.

pachamaltese commented Aug 26, 2018

thanks a lot @aedobbyn !!
I still have some pendings that I am adding to this new list
btw, after all of this cool changes you suggested, can I contact you by email? it's for a related project that benefited a lot from this feedback
mine is mvargas &a t& dcc * uchile * cl

@cimentadaj

This comment has been minimized.

cimentadaj commented Aug 26, 2018

Great job @pachamaltese. I think @aedobbyn has covered most of the major problems. Here are some minor concerns.

  • https://github.com/pachamaltese/oec/blob/master/R/get_countrycode.R#L51-L52 should say 'vector' rather than package.

  • Perhaps you can move this outside the main function so it's cleaner.

  • In general, I would create tests that check that the content of the request resembles what you requested and not only the format of the tibble. I would add a few tests checking whether get_data returned the requested country as well as for the different years. Maybe this applies to some topics as well. I just had a nasty bug in one of my packages related to this and it was awful!

  • Here I would check whether it has a 'try-error' to be safe. This makes sense because you're using try anyways.

Hope these are useful!

@pachamaltese

This comment has been minimized.

pachamaltese commented Aug 28, 2018

@cimentadaj once again thanks for the feedback !! I completed all the points that Amanda listed (above I wrote a clickable list that I completed)

the server is down again, please give 1-2 days to complete Jorge's addresses?

@cimentadaj: any chance to contact you by email? mine is " m vargas a/t dcc uchile cl"

@cimentadaj

This comment has been minimized.

cimentadaj commented Aug 28, 2018

Yes @pachamaltese , feel free to contact me and no worries about the 1-2 days.

@pachamaltese

This comment has been minimized.

pachamaltese commented Sep 5, 2018

@aedobbyn @cimentadaj thanks a lot!!
I had hectic days at the office, here are some changes that I'll try to complete ASAP

  • In your get_data @return documentation I would point people to your handy reference about what each of the columns means since a lot of these are non-obvious.
  • If you feel it's worth it (though I don't think strictly necessary), you could refactor out some of the redundancies in the get_data function. For instance, you use the same rename + mutate flow in each extraction, so this could be pulled out into a utility function. Similarly, the checks at the beginning of the function making sure that the years specified are in bounds for the classification could be stored in a lookup table. This way you only need one of these in get_data instead of five. ok , I added a wrapper to tidy IDs, probably more wrappers coming during the week
  • Right now you only allow years up to 2016, but in the coming years will the OEC have data past that year? Instead of updating this by hand is there a way to check what the greatest year available is, maybe on package load and store that as a variable? there's no way with the current API without loading "trash" with the year value (for future work, sure!)
  • I think in all cases you can replace single &s with &&s and |s with ||s* ok, modified to use correct 1st element comparison
  • Small typo here ("lower of equal to")
  • Future improvement for get_productcode would be a word_boundary boolean param for an exact word match. That would take care of the first result for get_productcode("wine") being "Live swine" laughing Added to issues pachamaltese/oec#22
  • Very small detail -- you might want to put the top_exporter_code and top_exporter cols next to each other in the output since they reference the same thing. Right now they're sometimes separated by a few columns (e.g. trade_exchange_val). Same for top_importer_code and top_importer noted, thanks!! I moved that and others columns to a more "logical" order
  • Should re-run pkgdown once new changes are implemented
  • https://github.com/pachamaltese/oec/blob/master/R/get_countrycode.R#L51-L52 should say 'vector' rather than package. thanks!! totally re-phrased now
  • Here I would check whether it has a 'try-error' to be safe. This makes sense because you're using try anyways. https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L145
  • Unnecessary line, I think (https://github.com/pachamaltese/oec/blob/73f4a7493aa649311f88f60dcbd0cfd4a203c895/R/get_data.R#L97)* Without it the checks fail
  • Perhaps you can move this outside the main function so it's cleaner. https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L112-L161
  • In general, I would create tests that check that the content of the request resembles what you requested and not only the format of the tibble. I would add a few tests checking whether get_data returned the requested country as well as for the different years. Maybe this applies to some topics as well. I just had a nasty bug in one of my packages related to this and it was awful!
@pachamaltese

This comment has been minimized.

pachamaltese commented Sep 12, 2018

@aedobbyn @cimentadaj Hi. Thanks for the feedback! there are four point I can't solve without breaking the check :S
This weekend I' ll try to solve https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L112-L161 as it is totally cleaner with that change that without

@maelle

This comment has been minimized.

Member

maelle commented Sep 27, 2018

👋 @pachamaltese did you get any chance to make progress on the package? When you do, please summarize your response again (using your checklist again I guess 😉).

@pachamaltese

This comment has been minimized.

pachamaltese commented Sep 27, 2018

@maelle Hi, I was not able to test a lot because the server has been down a lot actually.
I updated the list to reflect the only 3 point I could not fix. Those can be more tricky actually!

@maelle

This comment has been minimized.

Member

maelle commented Oct 16, 2018

@pachamaltese any update?

@pachamaltese

This comment has been minimized.

pachamaltese commented Oct 16, 2018

@maelle

This comment has been minimized.

Member

maelle commented Oct 16, 2018

Do you mean you don't trust the data provider enough with stability?

@pachamaltese

This comment has been minimized.

pachamaltese commented Oct 16, 2018

@maelle

This comment has been minimized.

Member

maelle commented Oct 16, 2018

Ok, too bad, will put the submission on hold, I hope you can solve the problem, good luck!

@maelle maelle added the holding label Oct 16, 2018

@maelle

This comment has been minimized.

Member

maelle commented Dec 6, 2018

@pachamaltese 👋, any news?

@pachamaltese

This comment has been minimized.

pachamaltese commented Dec 6, 2018

@maelle

This comment has been minimized.

Member

maelle commented Dec 6, 2018

Cool to read! Is there a public repo we can link from here in case someone reads the thread and gets curious?

@pachamaltese

This comment has been minimized.

pachamaltese commented Dec 6, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment