New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor textbook generator to check redirects #27
Refactor textbook generator to check redirects #27
Conversation
A little more on the problem. By default, y'all put a Lots of the Berkeley textbook pages have names with upper-case first letters. I'm building on a Mac. When the page is called, say
When Jekyll does the build, it first builds the page TMI, I'm sure, but just so you know why I'm suggesting this change. |
Sorry - I should also say - this is only a problem if you're doing a static build / upload, instead of getting Github to build the pages with Jekyll. Doing a static build is a good idea because it allows fancy plugins, such as https://github.com/archome/jekyll-citation |
Ah good catch - I'll take a look this weekend. re: your second point, I was intentionally not requiring people build locally and push all the HTML up, just because I didn't wanna scare folks off by requiring Ruby. But I agree w/ you there are a lot more possibilities if you do this. Think it's worth documenting? Maybe in an "advanced" section or something? |
Yes, sure, I don't think it should be required, but maybe supported. The here PR is only because I got pretty confused about what was going on for a while. Reflecting, given that most people are OK with the local server, maybe a flag in the config to specify when you are doing static builds would work, and only raise the error when that flag is set. For static-build instructions - it's pretty easy - here are the changes I needed - they include irrelevant changes adding a submodule with a bibliography : matthew-brett/dsfe@f96c3b3 |
@matthew-brett a bit of context for the merge conflicts. I noticed that this PR had been opened on what was a deprecated repo. Development was going on in a new repository called "jupyter book" which was a streamlined version of the original "textbooks with jekyll and jupyter" repository. However I noticed that more people had been coming to the original repo than the new repo, so decided to update the original with a new codebase, rather than move to a new repo entirely. This is a one-time change but it'll create merge conflicts. I'll try to take a look at this PR to implement the feature w/ the new codebase. Just FYI, the build system from a user's standpoint is quite similar to what it was before, with a couple small differences. Would love feedback on whether you think it's a more straightforward approach (the goal was to reduce cruft in the repo). New build guide is here: |
Hi Chris. I just had a look. Am I right in thinking that that all the stuff in the |
I solved this for my own repo with this hack : matthew-brett/dsfe@7a089ed |
@matthew-brett currently that is correct, the yaml would get obliterated. However I've been thinking this would be good to support (having some files w/ pre-defined YAML in them) and then jupyter-book would only append to that list (not overwrite values that already existed). Do you think this would be useful? |
Yes, right - that's what I'm doing now, with that commit - appending the existing YaML to the new stuff in the book generation step. |
yep! I'd be friendly to something like this for sure. Actually your code brings up a second point: we need tests for this repo. What do you think of a |
Yes, sure, a folder sounds reasonable. We'd have to import the script from the folder, maybe ugly, but better than nothing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally having a chance to take a look at this again :-) In case you'd be willing to push this through, I just had two quick nits on the code + I think we need a rebase since the build has been modified a little bit. There's also a tests/
folder now, BTW...though I'm not sure how we'd test this since we can't mimic both a case-insensitive and case-sensitive filesystem in the tests, I think. Lemme know!
scripts/generate_textbook.py
Outdated
if not cased_fs and page_link.lower() == sanitized: | ||
raise RuntimeError('Redirect {} clashes with page {} ' | ||
'on case-insensitive FS'.format( | ||
sanitized, page_link)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should people do in this situation? Make sure their file names are already-sanitized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - I guess they would have to do that - or do what I did, and switch the filesystem (by making a disk image with a case-sensitive filesystem on it, and copying there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool - think it makes sense to add a quick sentence along these lines to the error? Or we can add a short description of this problem in the docs and link to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking of how to add a guide for the "build the site locally and push the whole site to the internet" pathway for people...I feel like this should be a part of that info
Added extra flag to signify local build; only check case-sensitivity of pages in that case.
181ad06
to
20f14f7
Compare
I rebased, and refined the logic a little - to only check when flag |
Nice! Thanks for giving it a shot. Lemme figure out why the circle tests are failing...I doubt it's related to this PR so I'll try to debug and push to this branch if that's true! |
ah I think I figured it out. The
Lemme know if these changes look OK to you! |
scripts/generate_book.py
Outdated
@@ -91,6 +91,8 @@ def _case_sensitive_fs(path): | |||
with open(fname, 'wt') as fobj: | |||
fobj.write('text') | |||
written = glob(root + '*') | |||
except Exception: | |||
written = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch. Do you think the routine should return False in this case, or raise an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why this would error on some machines? Actually it does think that we'd want an error thrown here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess when the BUILD_FOLDER directory is not writable? Maybe it does not exist yet? I guess I could create it, if it does not exist?
Yes, I think an error is the right thing too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool - would you like to wrap this one up, or want me to push to the branch again? No hurry :-)
Allow errors to propagate if files cannot be made.
scripts/generate_book.py
Outdated
@@ -233,8 +259,17 @@ def _copy_non_content_files(): | |||
# Front-matter YAML | |||
yaml_fm = [] | |||
yaml_fm += ['---'] | |||
# In case pre-existing links are sanitized | |||
sanitized = url_page.lower().replace('_', '-') | |||
if not args.local_build and cased_fs and url_page.lower() == sanitized: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we only want this logic to trigger if it is a local build, right?
The tests are erroring out because it's triggering this runtime error on the circleci build
if not args.local_build and cased_fs and url_page.lower() == sanitized: | |
if args.local_build and cased_fs and url_page.lower() == sanitized: |
scripts/generate_book.py
Outdated
@@ -233,8 +259,17 @@ def _copy_non_content_files(): | |||
# Front-matter YAML | |||
yaml_fm = [] | |||
yaml_fm += ['---'] | |||
# In case pre-existing links are sanitized | |||
sanitized = url_page.lower().replace('_', '-') | |||
if not args.local_build and cased_fs and url_page.lower() == sanitized: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, wouldn't this also trigger the warning if they were using the (correct) lowercase naming already? Since url_page.lower() == sanitized
is true regardless of whether url_page == "my/page-one.html"
vs. url_page == "my/Page-One.html"
? Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, whoops.
yaml_fm += [' - "/"'] | ||
# In case pre-existing links are sanitized | ||
sanitized = url_page.lower().replace('_', '-') | ||
if sanitized != url_page: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok one more annoying gotcha, here it would also fail if people used all lower-case names, but also used underscores. E.g.
if the page name is my_page.html
then the sanitized is my-page.html
then the sanitized != url_page
is True
and the case check will raise the error.
I think we could get around this w/ the following suggestion:
if sanitized != url_page: | |
if sanitized != url_page.replace('_', '-'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for fixing these up @matthew-brett - sorry for the slow response but I was off at the neurips conference and just got back yesterday!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's what the check is for on the next line - no? I mean, in that case, url_page.lower() != sanitized
and it won't raise the error. Or am I not thinking straight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohhh you're right. In my example above, the first check (sanitized != url_page
would be True, but the second check url_page.lower() == sanitized
would be False so the error wouldn't be triggered...in that case I think this LGTM! Merging away, thanks for grinding through this one w/ me :-)
…c_comments Pages for sections, file for path, docs improvements
On a case insensitive file-system, the redirect pages could write over
the originals, changing the case of file name, and breaking the links on
a static site. This wasn't obvious serving locally on my Mac, because
the server is also working on a case-insensitive filesystem and
therefore fetches 'numbers.html' when asked for 'Numbers.html'. But,
when built and pushed up to - say - Github - with a case-sensitive
filesystem, the breakage appears.