New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Major overhaul to port manubot-rootstock functionality #2
Conversation
@agitter big changes on the way. Packaging lot's of the manubot-rootstock python code here. Another goal which I haven't mentioned elsewhere is to cache citation metadata requests at the earliest possible stage. Therefore if the processing code changes, we don't need to flush the entire cache every time. This will likely use This PR will change the referencing format to be pandoc compatible as per manubot/rootstock#2 (comment). If there are major breaking changes that you've wanted but previously deemed too destructive, now's the time to suggest them. |
Structuring this as a distinct package is a great idea. It will help separate the content from the build system and minimize our need to merge After changing citations to be pandoc compatible, will we still use the previous syntax such as I haven't looked at the code changes closely. Did you reimplement the essential parts of |
Yes. No more bibtex and the arXiv citeproc should now be more extensive.
Yes, but you will need a semicolon to separate multiple references:
But you will be able to do things like:
See this. Currently, I've deviated from pandoc style in 1 important way. References can end in slash. This is because so many URLs end in slash. We don't have to allow this (you can use tags for citations that aren't valid pandoc citation syntax). |
I agree with that choice. URLs ending with slash are too common a case to force the use of tags. |
Manubot is now a command line utility. It now requires a specific set of files to be in the content directory. Process the example-manuscript in the PR like:
The process should now be more tolerant to manuscript citation errors. Builds will still fail if there are any errors, but the process should complete. @agitter now's a good time to review. |
manubot/manubot.py
Outdated
prepare_manuscript(args) | ||
|
||
if error_handler.fired: | ||
print('Failure: exiting with code 1 due to logged errors') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably have this as a log myself ;-)
@dhimmel how thorough of a review would you like? I'll have to wait until this weekend at the earliest to do a complete line-by-line review. |
Happy to wait. This pull request requires lot's of updates to manuscript repos, so we should review it carefully and make sure we get everything right. |
8ca7c86 enables providing additional template variables for jinja2, like:
Therefore, you could have your analysis repo generate a bunch of stats, which you refer to by variable in the manuscript. |
@dhimmel Because there are a lot of changes here, would it be helpful to have one of the contributors from |
It's very hard to review this many changes at once. But @slochower @vsmalladi, your comments are welcome. We can also merge this PR without extensive review and take the codebase from there with follow up PRs to address any changes people have. |
I'm less familiar with doing code review through GitHub's interface. Is there a way I could merge the PR on a local clone of the repository and run it through test of various citation styles to make sure all's well? |
@slochower checking out the code from a PR is a great way to review it. Oftentimes, there is no substitute for actually testing code. You can checkout this PR into a branch. Let's assume this repo is your
|
Well start taking a look and see if I notice anything. |
Me too, sorry didn't get a chance to do it today, but I can take a look early next week if this isn't merged by then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only focused on the tests so far. The test code looks reasonable. Everything passes for me locally except the test that fails on Travis in 21bf789. I don't understand the Invalid response from https://data.datacite.org/10.7287%2Fpeerj.preprints.3100v1
error because the URL works in my browser. Maybe switch to a different example for now?
I ran the local tests in the manubot
conda environment from manubot-rootstock after installing this package by following the .travis.yml
install commands.
Reported upstream at datacite/datacite#187. I'll skip the test for time being. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've browsed through all of the files now and only have these minor comments. The new structure looks logical and more extensible to me. I didn't test the code outside of the formal test cases, but I'm okay merging this and testing it with our real manuscripts.
manubot/citations.py
Outdated
|
||
def citeproc_passthrough(csl_item, set_id=None): | ||
""" | ||
Fix errors in a CSL item and optional change its id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optionally change
manubot/manubot.py
Outdated
Read and process command line arguments. | ||
""" | ||
parser = argparse.ArgumentParser() | ||
parser.add_argument('--content-directory', type=pathlib.Path, required=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--content-directory
and --output-directory
may seem like obvious arguments, but I would find it helpful to add a help message anyway.
manubot/manubot.py
Outdated
parser = argparse.ArgumentParser() | ||
parser.add_argument('--content-directory', type=pathlib.Path, required=True) | ||
parser.add_argument('--output-directory', type=pathlib.Path, required=True) | ||
parser.add_argument('--template-variables-path', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to include an example manuscript that uses the template variables feature, perhaps in manubot-rootstock
instead of here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will add an example manuscript under tests
for testing --template-variables-path
.pipe(collections.Counter) | ||
) | ||
ref_counts['total'] = sum(ref_counts.values()) | ||
stats['reference_counts'] = ref_counts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the reference counts and word count get logged at some level? I may have missed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No ATM they only get saved to variables.json
. Hence they're available to the jinja2 templator / whenever you have access to the output files (anytime besides PR CI builds). Do you think all of variables.tsv should be logged? In what cases do you like seeing the reference and word counts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to have the user set a word count limit that would trigger logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dhimmel if the manuscript has a word count or reference limit imposed by a journal or conference, logging those values so they can be inspected during PR review is potentially useful. Those two variables have special significance to me. I don't think it is necessary to log all of variables.tsv
.
@vsmalladi I think that setting a word count threshold to trigger logging would be more complicated than logging at an appropriate level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agitter Ya i think logging the word count and reference limit would be what i would look at. I think the only other thing I would possibly log, would be the number of figures. Since each journal could also have a limit to number of figures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vsmalladi that's a good idea, it could log figure and table counts. That could come in a later pull request if @dhimmel wants to get this massive one merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agitter That sound reasonable.
I have ran the installation and test code locally and everything seems to be working. I am starting to look line by line at the code. |
See https://travis-ci.org/greenelab/manubot/builds/258991246#L556 ``` error: The 'requests' distribution was not found and is required by manubot ```
Provide JSON variables for jinja2 from either a path or URL. Add logging for jinja2 template variables in the manuscript that are undefined.
Add test for a manuscript with zero files (missing content directory). Closes manubot/rootstock#47
Builds failing due to datacite/datacite#187
Okay thanks everyone for the feedback. In the interest of time, I'm going to merge. |
* Update for expanded manubot package * Create distinct caching directory * Deployment: references branch now output * Rewrite documentation * Rename CONTRIBUTING.md to USAGE.md Refs manubot/manubot#2
This build is based on 084e30d. This commit was created by the following Travis CI build and job: https://travis-ci.org/greenelab/manubot-rootstock/builds/262705136 https://travis-ci.org/greenelab/manubot-rootstock/jobs/262705137 [ci skip] The full commit message that triggered this build is copied below: Update for expanded manubot package (#48) * Update for expanded manubot package * Create distinct caching directory * Deployment: references branch now output * Rewrite documentation * Rename CONTRIBUTING.md to USAGE.md Refs manubot/manubot#2
This build is based on 084e30d. This commit was created by the following Travis CI build and job: https://travis-ci.org/greenelab/manubot-rootstock/builds/262705136 https://travis-ci.org/greenelab/manubot-rootstock/jobs/262705137 [ci skip] The full commit message that triggered this build is copied below: Update for expanded manubot package (#48) * Update for expanded manubot package * Create distinct caching directory * Deployment: references branch now output * Rewrite documentation * Rename CONTRIBUTING.md to USAGE.md Refs manubot/manubot#2
This PR aims to move much of the Python code from manubot-rootstock to this package.