Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor textbook generator to check redirects #27

Merged
merged 4 commits into from Dec 10, 2018

Conversation

matthew-brett
Copy link
Contributor

On a case insensitive file-system, the redirect pages could write over
the originals, changing the case of file name, and breaking the links on
a static site. This wasn't obvious serving locally on my Mac, because
the server is also working on a case-insensitive filesystem and
therefore fetches 'numbers.html' when asked for 'Numbers.html'. But,
when built and pushed up to - say - Github - with a case-sensitive
filesystem, the breakage appears.

@matthew-brett
Copy link
Contributor Author

A little more on the problem.

By default, y'all put a redirect-from into the YaML header of the built markdown. This has a sanitized link, which is lower-cased, and replaces underscores with dashes.

Lots of the Berkeley textbook pages have names with upper-case first letters.

I'm building on a Mac. When the page is called, say Numbers.ipynb then Numbers.md is the built markdown page. That page now has something like:

redirect_from:
  - 'chapters/02/numbers'

When Jekyll does the build, it first builds the page Numbers.html and then builds numbers.html (from the redirect-from section). On a Mac, as standard, the file-system is case-insensitive, and these are the same file, so Numbers.html disappears to be replaced by numbers.html. This isn't a problem serving locally, because the same case-insensitivity means that fetching Numbers.html will find numbers.html. It is a problem when pushed up to - say - Github, or some other file system that is case-sensitive. Now there is noNumbers.html page, and the textbook links are broken.

TMI, I'm sure, but just so you know why I'm suggesting this change.

@matthew-brett
Copy link
Contributor Author

Sorry - I should also say - this is only a problem if you're doing a static build / upload, instead of getting Github to build the pages with Jekyll. Doing a static build is a good idea because it allows fancy plugins, such as https://github.com/archome/jekyll-citation

@choldgraf
Copy link
Member

Ah good catch - I'll take a look this weekend.

re: your second point, I was intentionally not requiring people build locally and push all the HTML up, just because I didn't wanna scare folks off by requiring Ruby. But I agree w/ you there are a lot more possibilities if you do this. Think it's worth documenting? Maybe in an "advanced" section or something?

@matthew-brett
Copy link
Contributor Author

Yes, sure, I don't think it should be required, but maybe supported. The here PR is only because I got pretty confused about what was going on for a while. Reflecting, given that most people are OK with the local server, maybe a flag in the config to specify when you are doing static builds would work, and only raise the error when that flag is set.

For static-build instructions - it's pretty easy - here are the changes I needed - they include irrelevant changes adding a submodule with a bibliography : matthew-brett/dsfe@f96c3b3

@choldgraf
Copy link
Member

@matthew-brett a bit of context for the merge conflicts. I noticed that this PR had been opened on what was a deprecated repo. Development was going on in a new repository called "jupyter book" which was a streamlined version of the original "textbooks with jekyll and jupyter" repository. However I noticed that more people had been coming to the original repo than the new repo, so decided to update the original with a new codebase, rather than move to a new repo entirely. This is a one-time change but it'll create merge conflicts. I'll try to take a look at this PR to implement the feature w/ the new codebase.

Just FYI, the build system from a user's standpoint is quite similar to what it was before, with a couple small differences. Would love feedback on whether you think it's a more straightforward approach (the goal was to reduce cruft in the repo). New build guide is here:

https://predictablynoisy.com/jupyter-book/guide/01_overview

@matthew-brett
Copy link
Contributor Author

Hi Chris. I just had a look. Am I right in thinking that that all the stuff in the content folder gets processed, so it can't have Jekyll yaml in it? For example - some of my current pages have redirect_from yaml bits in them - but I think these can't work with the current setup - is that right?

@matthew-brett
Copy link
Contributor Author

I solved this for my own repo with this hack : matthew-brett/dsfe@7a089ed

@choldgraf
Copy link
Member

@matthew-brett currently that is correct, the yaml would get obliterated. However I've been thinking this would be good to support (having some files w/ pre-defined YAML in them) and then jupyter-book would only append to that list (not overwrite values that already existed). Do you think this would be useful?

@matthew-brett
Copy link
Contributor Author

Yes, right - that's what I'm doing now, with that commit - appending the existing YaML to the new stuff in the book generation step.

@choldgraf
Copy link
Member

yep! I'd be friendly to something like this for sure. Actually your code brings up a second point: we need tests for this repo. What do you think of a script/tests folder to start putting this stuff?

@matthew-brett
Copy link
Contributor Author

Yes, sure, a folder sounds reasonable. We'd have to import the script from the folder, maybe ugly, but better than nothing.

Copy link
Member

@choldgraf choldgraf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally having a chance to take a look at this again :-) In case you'd be willing to push this through, I just had two quick nits on the code + I think we need a rebase since the build has been modified a little bit. There's also a tests/ folder now, BTW...though I'm not sure how we'd test this since we can't mimic both a case-insensitive and case-sensitive filesystem in the tests, I think. Lemme know!

if not cased_fs and page_link.lower() == sanitized:
raise RuntimeError('Redirect {} clashes with page {} '
'on case-insensitive FS'.format(
sanitized, page_link))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should people do in this situation? Make sure their file names are already-sanitized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I guess they would have to do that - or do what I did, and switch the filesystem (by making a disk image with a case-sensitive filesystem on it, and copying there).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool - think it makes sense to add a quick sentence along these lines to the error? Or we can add a short description of this problem in the docs and link to it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking of how to add a guide for the "build the site locally and push the whole site to the internet" pathway for people...I feel like this should be a part of that info

scripts/generate_textbook.py Outdated Show resolved Hide resolved
Added extra flag to signify local build; only check case-sensitivity of
pages in that case.
@matthew-brett
Copy link
Contributor Author

I rebased, and refined the logic a little - to only check when flag --local-build specified. Then, I can just add that flag to my Makefile, to enable the check, without annoying anyone who doesn't need it.

@choldgraf
Copy link
Member

Nice! Thanks for giving it a shot. Lemme figure out why the circle tests are failing...I doubt it's related to this PR so I'll try to debug and push to this branch if that's true!

@choldgraf
Copy link
Member

ah I think I figured it out. The generate_book.py script was actually erroring in the tests but it wasn't being caught. There were two things:

  1. We were using call instead of check_call which is why pytest didn't error when the generate_book script errored
  2. The new function didn't account for the possibility that written wouldn't be define because the try: block ran into an exception. I patched this with a kinda brute force approach (except Exception) but feel free to change if there's a better way!

Lemme know if these changes look OK to you!

@@ -91,6 +91,8 @@ def _case_sensitive_fs(path):
with open(fname, 'wt') as fobj:
fobj.write('text')
written = glob(root + '*')
except Exception:
written = []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. Do you think the routine should return False in this case, or raise an error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why this would error on some machines? Actually it does think that we'd want an error thrown here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess when the BUILD_FOLDER directory is not writable? Maybe it does not exist yet? I guess I could create it, if it does not exist?

Yes, I think an error is the right thing too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool - would you like to wrap this one up, or want me to push to the branch again? No hurry :-)

Allow errors to propagate if files cannot be made.
@@ -233,8 +259,17 @@ def _copy_non_content_files():
# Front-matter YAML
yaml_fm = []
yaml_fm += ['---']
# In case pre-existing links are sanitized
sanitized = url_page.lower().replace('_', '-')
if not args.local_build and cased_fs and url_page.lower() == sanitized:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we only want this logic to trigger if it is a local build, right?
The tests are erroring out because it's triggering this runtime error on the circleci build

Suggested change
if not args.local_build and cased_fs and url_page.lower() == sanitized:
if args.local_build and cased_fs and url_page.lower() == sanitized:

@@ -233,8 +259,17 @@ def _copy_non_content_files():
# Front-matter YAML
yaml_fm = []
yaml_fm += ['---']
# In case pre-existing links are sanitized
sanitized = url_page.lower().replace('_', '-')
if not args.local_build and cased_fs and url_page.lower() == sanitized:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, wouldn't this also trigger the warning if they were using the (correct) lowercase naming already? Since url_page.lower() == sanitized is true regardless of whether url_page == "my/page-one.html" vs. url_page == "my/Page-One.html"? Or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, whoops.

yaml_fm += [' - "/"']
# In case pre-existing links are sanitized
sanitized = url_page.lower().replace('_', '-')
if sanitized != url_page:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok one more annoying gotcha, here it would also fail if people used all lower-case names, but also used underscores. E.g.

if the page name is my_page.html

then the sanitized is my-page.html

then the sanitized != url_page is True and the case check will raise the error.

I think we could get around this w/ the following suggestion:

Suggested change
if sanitized != url_page:
if sanitized != url_page.replace('_', '-'):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for fixing these up @matthew-brett - sorry for the slow response but I was off at the neurips conference and just got back yesterday!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's what the check is for on the next line - no? I mean, in that case, url_page.lower() != sanitized and it won't raise the error. Or am I not thinking straight?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhh you're right. In my example above, the first check (sanitized != url_page would be True, but the second check url_page.lower() == sanitized would be False so the error wouldn't be triggered...in that case I think this LGTM! Merging away, thanks for grinding through this one w/ me :-)

@choldgraf choldgraf merged commit 1af7ff0 into executablebooks:master Dec 10, 2018
choldgraf added a commit to choldgraf/jupyter-book that referenced this pull request Apr 28, 2020
…c_comments

Pages for sections, file for path, docs improvements
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants