Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing PMCID to PMID citation parser #71

Closed
agitter opened this issue Oct 12, 2018 · 5 comments
Closed

Passing PMCID to PMID citation parser #71

agitter opened this issue Oct 12, 2018 · 5 comments

Comments

@agitter
Copy link
Member

agitter commented Oct 12, 2018

I wanted to use Manubot to cite PMCID PMC6063279 and instead received a citation for PMID 6063279. I had accidentally written

manubot cite pmid:PMC6063279

Apparently the Entrez eutils takes the PMCID, strips PMC, and then returns the XML for that numeric PMID. This seems like an error on the part of eutils. Do we want to try to prevent this type of user error? Or should we instead leave the Manubot behavior but notify the eutils maintainers?

@dhimmel
Copy link
Member

dhimmel commented Oct 12, 2018

Ooh that's an interesting behavior to strip PMC.

Do we want to try to prevent this type of user error?

Yes, we could check the pubmed ID for being all numeric.

This seems like an error on the part of eutils

I have found it's difficult to get in touch with the right developers at NCBI. For example, see https://github.com/ncbi/citation-exporter/issues/3: I created a GitHub issue, but I was still told to go through Helpdesk tickets, where I never received a response that actually acknowledged the feature request.

@dhimmel
Copy link
Member

dhimmel commented Oct 12, 2018

Apparently the Entrez eutils takes the PMCID, strips PMC

Also sounds like whoever engineered this considered the character stripping a feature... so it's likely not considered a bug, and is perhaps utilized by other applications (;

@agitter
Copy link
Member Author

agitter commented Oct 12, 2018

Yes, we could check the pubmed ID for being all numeric.

What would be more user-friendly?

  • Throw an error if someone uses pmid:PMC6063279
  • Check whether the non-numeric PMID starts with PMC and redirect the citation to get_pmc_citeproc

@dhimmel
Copy link
Member

dhimmel commented Oct 17, 2018

We could warn and redirect, but I'm in favor of throwing the error such that the user is required to fix it. We want manubot manuscripts to adhere to good standards, especially say for future projects that analyze citation practices.

I'm thinking we need better validation of all identifier types. If we know that a certain ID must adhere to a given format, we should do those checks for the user and provide good error messages, rather than the potentially more confusing error message provided by our retrieval functions.

@agitter
Copy link
Member Author

agitter commented Oct 18, 2018

I'm supportive of throwing an error if the error message is informative. We could potentially make detailed error messages that help catch the most common errors. Something like

PMC6063279 is not a valid PubMed Identifier. You can cite this PubMed Centeral Identifier using pmcid:PMC6063279?

dhimmel added a commit that referenced this issue Nov 13, 2018
Merges #76
Closes #71

* Improve citation string error message
* Add citation validation checks in is_valid_citation_string
* Detect source capitalization issues
* is_valid_citation_string enable allow_* options to properly distinguish the validation
  needs of manuscript citation strings versus cite command citation strings.
* manubot cite CLI: validate citations & do not attempt CSL if invalid citation
* Reformat process.util.get_citation_df error message
* Improve process.manuscript.get_citation_strings docstring
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants