
[REVIEW]: UTDEventData: An R package to access political event data #1322

Open
whedon opened this Issue Mar 14, 2019 · 18 comments

whedon commented Mar 14, 2019

Submitting author: @KateHyoung (Hyoungah Kim)
Repository: https://github.com/KateHyoung/UTDEventData
Version: v1.0.0
Editor: @alexhanna
Reviewer: @briatte, @andrewheiss
Archive: Pending

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/fd6fd7f126c57c8aed32d2920f66c608"><img src="http://joss.theoj.org/papers/fd6fd7f126c57c8aed32d2920f66c608/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/fd6fd7f126c57c8aed32d2920f66c608/status.svg)](http://joss.theoj.org/papers/fd6fd7f126c57c8aed32d2920f66c608)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@briatte & @andrewheiss, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @alexhanna know.

Please try to complete your review within the next two weeks.

Review checklist for @briatte

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: Does the release version given match the GitHub release (v1.0.0)?
  • Authorship: Has the submitting author (@KateHyoung) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @andrewheiss

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: Does the release version given match the GitHub release (v1.0.0)?
  • Authorship: Has the submitting author (@KateHyoung) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

whedon commented Mar 14, 2019

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @briatte, it looks like you're currently assigned as the reviewer for this paper 🎉.

⭐️ Important ⭐️

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that, with GitHub's default settings, you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

whedon commented Mar 14, 2019

Attempting PDF compilation. Reticulating splines etc...

whedon commented Mar 14, 2019


briatte commented Mar 16, 2019

Comment # 1 [replicating vignette UTDEventData.html]

Quick note re: API registration at http://eventdata.utdallas.edu/signup

The UTD server does not check for email address validity and will return an error (500) if provided an invalid email address.

That's not something that can be blamed on the package, of course, but the UTD server admins might benefit from improving their API registration form.


briatte commented Mar 16, 2019

Comment # 2 [replicating vignette UTDEventData.html]

The error message shown when the API key is wrong could be improved:

DataTables(api_key="foobar")
[1] "{\"STATUS\": \"ERROR\", \"DATA\":\"(<TYPE 'EXCEPTIONS.VALUEERROR'>, VALUEERROR('INVALID API KEY',), <TRACEBACK OBJECT AT 0X7F5ACD310488>)\"}"
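
One way to improve this, sketched under assumptions: check the raw response for the "STATUS": "ERROR" marker and raise a short R error instead of returning the JSON string. The helper name stop_if_api_error() is hypothetical and not part of UTDEventData, and the pattern matching assumes the error payload always has the shape printed above; a real fix might rather parse the JSON properly (e.g. with jsonlite).

```r
# Hypothetical helper, not part of UTDEventData: turn the server's error
# payload into a short R error, and return the response unchanged otherwise.
stop_if_api_error <- function(response) {
  # Assumes failed requests always contain "STATUS": "ERROR"
  if (!grepl('"STATUS":\\s*"ERROR"', response, ignore.case = TRUE)) {
    return(invisible(response))
  }
  # Extract the text inside VALUEERROR('...') if present
  pattern <- "VALUEERROR\\('([^']*)'"
  hit <- regmatches(response, regexpr(pattern, response, ignore.case = TRUE))
  msg <- if (length(hit) == 1) {
    sub(pattern, "\\1", hit, ignore.case = TRUE)
  } else {
    "unknown server error"
  }
  stop("UTD Event Data API request failed: ", msg, call. = FALSE)
}

# The raw string printed above for DataTables(api_key = "foobar")
resp <- '{"STATUS": "ERROR", "DATA":"(<TYPE \'EXCEPTIONS.VALUEERROR\'>, VALUEERROR(\'INVALID API KEY\',), <TRACEBACK OBJECT AT 0X7F5ACD310488>)"}'
try(stop_if_api_error(resp))
```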

briatte commented Mar 16, 2019

Comment # 3 [replicating vignette UTDEventData.html]

Is it deliberate that all variable names start with a space? See the example below. This will be confusing for most users.

r$> tableVar(k, "Phoenix_rt")
 [1] " code"            " src_actor"       " month"
 [4] " tgt_actor"       " country_code"    " year"
 [7] " date8_val"       " id"              " source"
[10] " date8"           " src_agent"       " latitude"
[13] " src_other_agent" " geoname"         " quad_class"
[16] " source_text"     " root_code"       " tgt_other_agent"
[19] " day"             " target"          " goldstein"
[22] " tgt_agent"       " longitude"       " url"
[25] " _id"
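
If the leading space comes from how the field names are parsed, base R's trimws() would remove it in one line. This is a sketch only: raw_vars and the toy data frame below are illustrative stand-ins, and whether the fix belongs on the server or in the package is for the authors to decide.

```r
# Field names as printed by tableVar() above (subset, copied for illustration)
raw_vars <- c(" code", " src_actor", " month", " _id")

# Drop the leading whitespace only
clean_vars <- trimws(raw_vars, which = "left")
print(clean_vars)

# Same idea for a data frame whose column names carry the space
# (toy data frame, made up for this example)
df <- data.frame(x = 1, y = 2)
names(df) <- c(" code", " year")
names(df) <- trimws(names(df), which = "left")
print(names(df))
```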

briatte commented Mar 16, 2019

Typo in vignette ("noting" should read "nothing"):

but "cline_phenix" will return noting.


briatte commented Mar 16, 2019

Comment # 4 [replicating vignette UTDEventData.html]

The orList constructor asks for a list:

# a boolean logic, or, with the two query blocks
or_query <- orList(list(ctr, time))

It will throw an error if the query blocks are passed as separate arguments instead:

r$> orList(ctr, time)
Error in orList(ctr, time) : unused argument (time)

Perhaps it would help users if the function accepted the query blocks directly:

function (...)
{
    return(list(`$or` = list(...)))
}

Same goes for similar query constructors.
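A minimal sketch of a constructor that accepts both call styles, so that code written as orList(list(ctr, time)) keeps working. orList2() is a hypothetical name, and the toy ctr and time blocks are made up; the sketch assumes that individual query blocks are named lists (as MongoDB-style filters usually are), which is what lets it tell a single block apart from the legacy unnamed list of blocks.

```r
# Hypothetical variadic variant of orList(); not the package's current code.
orList2 <- function(...) {
  blocks <- list(...)
  # Legacy call style orList2(list(a, b)): a single unnamed list of blocks.
  # Assumes each individual query block is a *named* list.
  if (length(blocks) == 1 && is.list(blocks[[1]]) && is.null(names(blocks[[1]]))) {
    blocks <- blocks[[1]]
  }
  list(`$or` = blocks)
}

# Toy query blocks, made up for illustration
ctr  <- list(country_code = list(`$in` = c("SYR", "IRQ")))
time <- list(date8 = list(`$gte` = 20100101))

new_style <- orList2(ctr, time)        # objects passed directly
old_style <- orList2(list(ctr, time))  # list, as in the vignette
identical(new_style, old_style)
```

The same pattern could be applied to andList() and the other query constructors.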


briatte commented Mar 16, 2019

Comment # 5 [replicating vignette UTDEventData.html]

Thinking about the API key, it is customary to let users store it as an R option (see ?options) or as an environment variable (see ?Sys.getenv). Is that possible with this package? The vignette does not mention it.
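A sketch of how a key lookup could work, falling back from an explicit argument to an R option and then to an environment variable (which users can set once in ~/.Renviron). The helper get_api_key(), the option name UTDEventData.api_key and the variable name UTD_API_KEY are hypothetical; the package does not define them.

```r
# Hypothetical helper: resolve the API key from (1) an explicit argument,
# (2) an R option, or (3) an environment variable, in that order.
get_api_key <- function(api_key = NULL) {
  if (is.null(api_key)) {
    # Option first, then environment variable ("" if unset)
    api_key <- getOption("UTDEventData.api_key",
                         default = Sys.getenv("UTD_API_KEY"))
  }
  if (!is.character(api_key) || length(api_key) != 1 || !nzchar(api_key)) {
    stop("No API key found: pass `api_key`, set ",
         "options(UTDEventData.api_key = ...), or define UTD_API_KEY ",
         "in ~/.Renviron.", call. = FALSE)
  }
  api_key
}

# Example: a key stored in the environment (e.g. via ~/.Renviron)
Sys.setenv(UTD_API_KEY = "my-secret-key")
get_api_key()
```

A function such as DataTables() could then default to api_key = get_api_key(), so that users never have to paste their key into scripts.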


briatte commented Mar 16, 2019

Comment # 6 [replicating vignette UTDEventData.html]

The vignette is very helpful (and the authors do a great job at documenting possible errors/traps, e.g. Windows memory issues), yet I believe it would benefit from being broken down into more digestible chunks, such as:

  • an 'intro' vignette explaining the gist of the package, how to cite it, and further useful links
  • one or two 'example' vignettes (the first one could combine Examples 1 and 2)
  • a 'designing complex queries' vignette covering things like regex-based queries

briatte commented Mar 16, 2019

Comment # 7 [replicating vignette UTDEventData.html]

I'm done replicating the vignette.

Thinking about the package and data more globally, I would appreciate a mention, somewhere in the docs, of how the data relate to (and whether they can be combined with) similar event data and related event nomenclatures, e.g. those that Phil Schrodt has worked (or is working) on.

Apologies for not being more specific here, as I have limited experience with event data -- if this comment is too vague to be addressed, I'll inquire a bit and reformulate.


briatte commented Mar 16, 2019

Comment # 8 [reviewing paper.bib]

Re: this review point,

References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

It seems to me that IEEE papers always come with a DOI, but those DOIs are missing from the bibliography. The same goes for the Schrodt paper.

As for the Althaus et al. dataset (BibTeX reference cline), is that the kind of data that could be distributed via, e.g., Zenodo (and get a DOI from there)?


briatte commented Mar 16, 2019

Comment # 9 [checking my unchecked review points]

Functionality: Have the functional claims of the software been confirmed?

Yup. Checking off.

Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

I do not think this really applies here. Performance in this package boils down (mostly) to API query speed. While replicating the vignette, I found some queries to be slow-ish, but given the size of the data returned by some examples (250,000+ obs.), I'm fine with it.

Checking off.

Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?

Also checking off this one, given that the package cannot really be tested against the API without providing an API key.

Some very basic unit tests could be written for small parts of the code: perhaps the authors will want to introduce a few input checks (e.g. making sure that the argument to andList is a list).
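As an illustration of the kind of input check and test meant here, a minimal base-R sketch. andList_checked() is hypothetical, and it assumes andList() wraps its argument in a `$and` block the same way orList() wraps its argument in `$or`; inside a package, such assertions would more usually live in tests/testthat/ and use testthat's expect_error().

```r
# Hypothetical input-checked version of andList(); assumes it builds a
# `$and` block the same way orList() builds an `$or` block.
andList_checked <- function(x) {
  if (!is.list(x)) {
    stop("`x` must be a list of query blocks, not ", class(x)[1], ".",
         call. = FALSE)
  }
  list(`$and` = x)
}

# Minimal unit tests in base R
blk <- list(year = 2018)   # toy query block, made up for illustration
stopifnot(identical(andList_checked(list(blk)), list(`$and` = list(blk))))
err <- tryCatch(andList_checked("SYR"), error = function(e) e)
stopifnot(inherits(err, "error"))
stopifnot(grepl("must be a list", conditionMessage(err)))
```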

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Also checking off this one, as I'm not sure the authors really expect community contributions. The README includes everything else users need to email the authors about bugs or questions.

That's pretty much it for me: over to @andrewheiss.

Last, for the editor (@alexhanna), we need:

@whedon check references (see comment # 8 above for why)


arfon commented Mar 20, 2019

@whedon check references


whedon commented Mar 20, 2019

Attempting to check references...

whedon commented Mar 20, 2019


OK DOIs

- None

MISSING DOIs

- https://doi.org/10.1109/isi.2016.7745457 may be missing for title: Near real-time atrocity event coding
- https://doi.org/10.1109/iri.2018.00065 may be missing for title: TwoRavens for Event Data
- https://doi.org/10.1080/03050629.2012.697430 may be missing for title: Precedents, progress, and prospects in political event data
- https://doi.org/10.1109/bigdata.2017.8258256 may be missing for title: Adaptive scalable pipelines for political event data generation

INVALID DOIs

- None

briatte commented Mar 21, 2019

Dear @KateHyoung (I hope it's fine to address submitting authors in JOSS review threads?)

I posted a bunch of numbered comments in the thread above. I hope that you will find them useful in improving your package, which I found helpful and carefully coded. Thanks for your work, looking forward to discussing things further.

All the best~


KateHyoung commented Mar 21, 2019

Thank you @briatte for taking the time to review our R package and documents. Your comments are very helpful for improving our package. Regarding comment #1, I will contact the server manager at UTD to improve security. For the other comments, I will do my best to address them in the code and vignette, and will follow up if I have questions about them. Thank you again!
