docs: initial user-guide pass for string functions #2635
Conversation
Nice!
I'd like to review the web-preview, but it's not getting built yet because of an error in the tests.
We should just write some demo data to disk, rather than loading a huge remote resource.
Yeah. Even though it's Zenodo and intended to be a long-lived resource, if we tie ourselves to too many of these things, we'll have to spend time repairing them if they ever do go offline. For datasets in git repos, like the bike paths, I made a fork of that repo and pointed to my fork. This one, though, isn't hosted on GitHub; it's on Zenodo because it's very large.
But another problem with basing tutorials on very large datasets is that it introduces a large computation cost every time we build documentation, as well as a likely source of failure. (For example, the machine running the test to generate the output is overwhelmed and the test sometimes fails to build.)
A tutorial doesn't need a large dataset to be instructive; large datasets are for demos to be convincing. Could these inputs be cut down to something on the order of kB and put in the tests/samples directory? Then we'd be hosting them, and they'd be small.
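A minimal sketch of the "write some demo data to disk" idea suggested above: generate a kB-scale sample file that could live in a `tests/samples`-style directory. The file name, line format, and helper are hypothetical, not taken from the actual PR.

```python
from pathlib import Path

def write_sample_log(path, n_lines=50):
    """Write a tiny, deterministic log-file sample (order of kB) so that
    tutorials don't need to download the full remote dataset.
    The log-line format here is made up for illustration."""
    lines = [
        f"2023-08-0{i % 9 + 1} 12:00:{i:02d} INFO event-{i}"
        for i in range(n_lines)
    ]
    p = Path(path)
    p.write_text("\n".join(lines) + "\n")
    return p.stat().st_size  # size in bytes, to confirm it stays small

size = write_sample_log("sample.log")
print(f"wrote sample.log ({size} bytes)")
```

Because the data is generated rather than downloaded, the documentation build has no network dependency and the sample can be regenerated or edited freely.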
(A large dataset is good for demonstrating scaling, but I'm wary of demonstrating scaling in auto-run tutorials. When I did that with the bike routes example, the top-voted comment on Hacker News was that the demo broke, due to a version issue between Numba and NumPy and the fact that our "tests are broken" error didn't stop the build.)
Yes, that's the main reason that I'm in favour of vendoring (a subset of) these data.
Should this one be included in today's release? I approved it because I think it's the right documentation, though I'm not sure if the switch to vendoring a subset of the data is to happen in this PR or another one. The tests aren't all running because it's a documentation-only PR, which I can force-merge if it's ready to go today.
Yes, we should include it! It contains the vendored log files as discussed :)
Okay! I'll merge it now.
This PR adds three user-guide entries under a new "strings" ToC heading. These guides cover loading strings from an array, splitting and joining, and component extraction.
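The three operations the guides cover can be sketched with plain Python string methods. The actual user-guide API isn't shown in this conversation, so the data and calls below are illustrative only:

```python
import re

# A plain list stands in for whatever array type the user guide loads
# strings from; the record format is invented for illustration.
records = ["alice:42", "bob:17", "carol:99"]

# Splitting and joining.
parts = [r.split(":") for r in records]   # each record -> [name, value]
joined = "; ".join(records)               # one delimited string

# Component extraction with a regular expression.
names = [re.match(r"([a-z]+):(\d+)", r).group(1) for r in records]

print(parts, joined, names)
```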