Refactor textbook generator to check redirects #27

matthew-brett · 2018-10-20T14:44:28Z

On a case insensitive file-system, the redirect pages could write over
the originals, changing the case of file name, and breaking the links on
a static site. This wasn't obvious serving locally on my Mac, because
the server is also working on a case-insensitive filesystem and
therefore fetches 'numbers.html' when asked for 'Numbers.html'. But,
when built and pushed up to - say - Github - with a case-sensitive
filesystem, the breakage appears.

matthew-brett · 2018-10-20T17:29:22Z

A little more on the problem.

By default, y'all put a redirect-from into the YaML header of the built markdown. This has a sanitized link, which is lower-cased, and replaces underscores with dashes.

Lots of the Berkeley textbook pages have names with upper-case first letters.

I'm building on a Mac. When the page is called, say Numbers.ipynb then Numbers.md is the built markdown page. That page now has something like:

redirect_from:
  - 'chapters/02/numbers'

When Jekyll does the build, it first builds the page Numbers.html and then builds numbers.html (from the redirect-from section). On a Mac, as standard, the file-system is case-insensitive, and these are the same file, so Numbers.html disappears to be replaced by numbers.html. This isn't a problem serving locally, because the same case-insensitivity means that fetching Numbers.html will find numbers.html. It is a problem when pushed up to - say - Github, or some other file system that is case-sensitive. Now there is noNumbers.html page, and the textbook links are broken.

TMI, I'm sure, but just so you know why I'm suggesting this change.

matthew-brett · 2018-10-20T17:37:45Z

Sorry - I should also say - this is only a problem if you're doing a static build / upload, instead of getting Github to build the pages with Jekyll. Doing a static build is a good idea because it allows fancy plugins, such as https://github.com/archome/jekyll-citation

choldgraf · 2018-10-20T19:08:54Z

Ah good catch - I'll take a look this weekend.

re: your second point, I was intentionally not requiring people build locally and push all the HTML up, just because I didn't wanna scare folks off by requiring Ruby. But I agree w/ you there are a lot more possibilities if you do this. Think it's worth documenting? Maybe in an "advanced" section or something?

matthew-brett · 2018-10-20T22:34:05Z

Yes, sure, I don't think it should be required, but maybe supported. The here PR is only because I got pretty confused about what was going on for a while. Reflecting, given that most people are OK with the local server, maybe a flag in the config to specify when you are doing static builds would work, and only raise the error when that flag is set.

For static-build instructions - it's pretty easy - here are the changes I needed - they include irrelevant changes adding a submodule with a bibliography : matthew-brett/dsfe@f96c3b3

choldgraf · 2018-10-23T23:47:02Z

@matthew-brett a bit of context for the merge conflicts. I noticed that this PR had been opened on what was a deprecated repo. Development was going on in a new repository called "jupyter book" which was a streamlined version of the original "textbooks with jekyll and jupyter" repository. However I noticed that more people had been coming to the original repo than the new repo, so decided to update the original with a new codebase, rather than move to a new repo entirely. This is a one-time change but it'll create merge conflicts. I'll try to take a look at this PR to implement the feature w/ the new codebase.

Just FYI, the build system from a user's standpoint is quite similar to what it was before, with a couple small differences. Would love feedback on whether you think it's a more straightforward approach (the goal was to reduce cruft in the repo). New build guide is here:

https://predictablynoisy.com/jupyter-book/guide/01_overview

matthew-brett · 2018-10-26T11:08:36Z

Hi Chris. I just had a look. Am I right in thinking that that all the stuff in the content folder gets processed, so it can't have Jekyll yaml in it? For example - some of my current pages have redirect_from yaml bits in them - but I think these can't work with the current setup - is that right?

matthew-brett · 2018-10-26T12:52:50Z

I solved this for my own repo with this hack : matthew-brett/dsfe@7a089ed

choldgraf · 2018-10-26T15:10:52Z

@matthew-brett currently that is correct, the yaml would get obliterated. However I've been thinking this would be good to support (having some files w/ pre-defined YAML in them) and then jupyter-book would only append to that list (not overwrite values that already existed). Do you think this would be useful?

matthew-brett · 2018-10-26T15:15:52Z

Yes, right - that's what I'm doing now, with that commit - appending the existing YaML to the new stuff in the book generation step.

choldgraf · 2018-10-26T15:49:43Z

yep! I'd be friendly to something like this for sure. Actually your code brings up a second point: we need tests for this repo. What do you think of a script/tests folder to start putting this stuff?

matthew-brett · 2018-10-26T15:53:04Z

Yes, sure, a folder sounds reasonable. We'd have to import the script from the folder, maybe ugly, but better than nothing.

choldgraf

Finally having a chance to take a look at this again :-) In case you'd be willing to push this through, I just had two quick nits on the code + I think we need a rebase since the build has been modified a little bit. There's also a tests/ folder now, BTW...though I'm not sure how we'd test this since we can't mimic both a case-insensitive and case-sensitive filesystem in the tests, I think. Lemme know!

choldgraf · 2018-11-22T03:11:22Z

scripts/generate_textbook.py

+        if not cased_fs and page_link.lower() == sanitized:
+            raise RuntimeError('Redirect {} clashes with page {} '
+                               'on case-insensitive FS'.format(
+                                   sanitized, page_link))


What should people do in this situation? Make sure their file names are already-sanitized?

Yeah - I guess they would have to do that - or do what I did, and switch the filesystem (by making a disk image with a case-sensitive filesystem on it, and copying there).

Cool - think it makes sense to add a quick sentence along these lines to the error? Or we can add a short description of this problem in the docs and link to it?

I've been thinking of how to add a guide for the "build the site locally and push the whole site to the internet" pathway for people...I feel like this should be a part of that info

scripts/generate_textbook.py

Added extra flag to signify local build; only check case-sensitivity of pages in that case.

matthew-brett · 2018-11-28T12:59:33Z

I rebased, and refined the logic a little - to only check when flag --local-build specified. Then, I can just add that flag to my Makefile, to enable the check, without annoying anyone who doesn't need it.

choldgraf · 2018-11-28T19:18:00Z

Nice! Thanks for giving it a shot. Lemme figure out why the circle tests are failing...I doubt it's related to this PR so I'll try to debug and push to this branch if that's true!

…e checking pass

choldgraf · 2018-11-29T02:05:25Z

ah I think I figured it out. The generate_book.py script was actually erroring in the tests but it wasn't being caught. There were two things:

We were using call instead of check_call which is why pytest didn't error when the generate_book script errored
The new function didn't account for the possibility that written wouldn't be define because the try: block ran into an exception. I patched this with a kinda brute force approach (except Exception) but feel free to change if there's a better way!

Lemme know if these changes look OK to you!

matthew-brett · 2018-11-29T09:19:35Z

scripts/generate_book.py

@@ -91,6 +91,8 @@ def _case_sensitive_fs(path):
            with open(fname, 'wt') as fobj:
                fobj.write('text')
        written = glob(root + '*')
+    except Exception:
+        written = []


Thanks for the catch. Do you think the routine should return False in this case, or raise an error?

Do you know why this would error on some machines? Actually it does think that we'd want an error thrown here

I guess when the BUILD_FOLDER directory is not writable? Maybe it does not exist yet? I guess I could create it, if it does not exist?

Yes, I think an error is the right thing too.

cool - would you like to wrap this one up, or want me to push to the branch again? No hurry :-)

Allow errors to propagate if files cannot be made.

choldgraf · 2018-11-30T23:24:28Z

scripts/generate_book.py

@@ -233,8 +259,17 @@ def _copy_non_content_files():
        # Front-matter YAML
        yaml_fm = []
        yaml_fm += ['---']
+        # In case pre-existing links are sanitized
+        sanitized = url_page.lower().replace('_', '-')
+        if not args.local_build and cased_fs and url_page.lower() == sanitized:


I believe we only want this logic to trigger if it is a local build, right?
The tests are erroring out because it's triggering this runtime error on the circleci build

Suggested change

if not args.local_build and cased_fs and url_page.lower() == sanitized:

if args.local_build and cased_fs and url_page.lower() == sanitized:

choldgraf · 2018-11-30T23:26:18Z

scripts/generate_book.py

@@ -233,8 +259,17 @@ def _copy_non_content_files():
        # Front-matter YAML
        yaml_fm = []
        yaml_fm += ['---']
+        # In case pre-existing links are sanitized
+        sanitized = url_page.lower().replace('_', '-')
+        if not args.local_build and cased_fs and url_page.lower() == sanitized:


also, wouldn't this also trigger the warning if they were using the (correct) lowercase naming already? Since url_page.lower() == sanitized is true regardless of whether url_page == "my/page-one.html" vs. url_page == "my/Page-One.html"? Or am I missing something?

Yes, whoops.

choldgraf · 2018-12-07T19:39:37Z

scripts/generate_book.py

-            yaml_fm += ['  - "/"']
+        # In case pre-existing links are sanitized
+        sanitized = url_page.lower().replace('_', '-')
+        if sanitized != url_page:


ok one more annoying gotcha, here it would also fail if people used all lower-case names, but also used underscores. E.g.

if the page name is my_page.html

then the sanitized is my-page.html

then the sanitized != url_page is True and the case check will raise the error.

I think we could get around this w/ the following suggestion:

Suggested change

if sanitized != url_page:

if sanitized != url_page.replace('_', '-'):

thanks for fixing these up @matthew-brett - sorry for the slow response but I was off at the neurips conference and just got back yesterday!

I think that's what the check is for on the next line - no? I mean, in that case, url_page.lower() != sanitized and it won't raise the error. Or am I not thinking straight?

ohhh you're right. In my example above, the first check (sanitized != url_page would be True, but the second check url_page.lower() == sanitized would be False so the error wouldn't be triggered...in that case I think this LGTM! Merging away, thanks for grinding through this one w/ me :-)

…c_comments Pages for sections, file for path, docs improvements

matthew-brett mentioned this pull request Oct 24, 2018

Consider porting to new textbook framework matthew-brett/dsfe#2

Open

This was referenced Nov 22, 2018

Citation for Jupyter Book #49

Closed

Allow people to put YAML in their content files #51

Closed

choldgraf reviewed Nov 22, 2018

View reviewed changes

Rebase case-sensitive FS changes, refine

20f14f7

Added extra flag to signify local build; only check case-sensitivity of pages in that case.

matthew-brett force-pushed the case-sensitive-or-no branch from 181ad06 to 20f14f7 Compare November 28, 2018 12:58

updating tests so they're a bit more robust and making filesystem cas…

95e3225

…e checking pass

matthew-brett commented Nov 29, 2018

View reviewed changes

Create directory for case sensitive check

cc36c26

Allow errors to propagate if files cannot be made.

choldgraf reviewed Nov 30, 2018

View reviewed changes

Fix up sanitized checks (maybe)

80fbf3c

choldgraf reviewed Dec 7, 2018

View reviewed changes

choldgraf merged commit 1af7ff0 into executablebooks:master Dec 10, 2018

choldgraf added a commit to choldgraf/jupyter-book that referenced this pull request Apr 28, 2020

Merge pull request executablebooks#27 from ExecutableBookProject/jsta…

3758825

…c_comments Pages for sections, file for path, docs improvements

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor textbook generator to check redirects #27

Refactor textbook generator to check redirects #27

matthew-brett commented Oct 20, 2018

matthew-brett commented Oct 20, 2018

matthew-brett commented Oct 20, 2018

choldgraf commented Oct 20, 2018

matthew-brett commented Oct 20, 2018

choldgraf commented Oct 23, 2018

matthew-brett commented Oct 26, 2018

matthew-brett commented Oct 26, 2018

choldgraf commented Oct 26, 2018

matthew-brett commented Oct 26, 2018

choldgraf commented Oct 26, 2018

matthew-brett commented Oct 26, 2018

choldgraf left a comment

choldgraf Nov 22, 2018

matthew-brett Nov 22, 2018

choldgraf Nov 22, 2018

choldgraf Nov 22, 2018

matthew-brett commented Nov 28, 2018

choldgraf commented Nov 28, 2018

choldgraf commented Nov 29, 2018

matthew-brett Nov 29, 2018

choldgraf Nov 30, 2018

matthew-brett Nov 30, 2018

choldgraf Nov 30, 2018

choldgraf Nov 30, 2018

choldgraf Nov 30, 2018

matthew-brett Dec 1, 2018

choldgraf Dec 7, 2018

choldgraf Dec 7, 2018

matthew-brett Dec 7, 2018

choldgraf Dec 10, 2018

	if not args.local_build and cased_fs and url_page.lower() == sanitized:
	if args.local_build and cased_fs and url_page.lower() == sanitized:

	if sanitized != url_page:
	if sanitized != url_page.replace('_', '-'):

Refactor textbook generator to check redirects #27

Refactor textbook generator to check redirects #27

Conversation

matthew-brett commented Oct 20, 2018

matthew-brett commented Oct 20, 2018

matthew-brett commented Oct 20, 2018

choldgraf commented Oct 20, 2018

matthew-brett commented Oct 20, 2018

choldgraf commented Oct 23, 2018

matthew-brett commented Oct 26, 2018

matthew-brett commented Oct 26, 2018

choldgraf commented Oct 26, 2018

matthew-brett commented Oct 26, 2018

choldgraf commented Oct 26, 2018

matthew-brett commented Oct 26, 2018

choldgraf left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

matthew-brett commented Nov 28, 2018

choldgraf commented Nov 28, 2018

choldgraf commented Nov 29, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment