Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: initial user-guide pass for string functions #2635

Merged
merged 4 commits into from
Aug 11, 2023

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Aug 10, 2023

This PR adds three user-guide entries under a new "strings" ToC heading. These guides handle loading strings from an array, splitting and joining, and component extraction.

Warning
Depends upon #2632

@agoose77
Copy link
Collaborator Author

We should just write some demo data to disk, rather than loading a huge remote resource.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I'd like to review the web-preview, but it's not getting built yet because of an error in the tests.

We should just write some demo data to disk, rather than loading a huge remote resource.

Yeah. Even though it's Zenodo and intended to be a long-lived resource, if we tie ourselves to too many of these things, we'll have to spend time repairing them if they ever do go offline. For datasets in git repos, like the bike paths, I made a fork of that repo and pointed to my fork. This one is not hosted by GitHub, though, it's Zenodo because it's very large.

But another problem with basing tutorials on very large datasets is that it introduces a large computation cost every time we build documentation, as well as a likely source of failure. (For example, the machine running the test to generate the output is overwhelmed and the test sometimes fails to build.)

A tutorial doesn't need a large dataset to be instructive; large datasets are for demos to be convincing. Could these inputs be cut down to something on the order of kB and put in the tests/samples directory? Then we'd be hosting them, and they'd be small.

(A large dataset is good for demonstrating scaling, but I'm wary of demonstrating scaling in auto-run tutorials. When I did that with the bike routes example, the top-voted comment on Hacker News was that the demo broke, due to a version issue between Numba and NumPy and the fact that our "tests are broken" error didn't stop the build.)

@agoose77
Copy link
Collaborator Author

But another problem with basing tutorials on very large datasets is that it introduces a large computation cost every time we build documentation, as well as a likely source of failure. (For example, the machine running the test to generate the output is overwhelmed and the test sometimes fails to build.)

Yes, that's the main reason that I'm in favour of vendoring (a subset of) these data.

@agoose77 agoose77 force-pushed the agoose77/docs-strings-initial-guide branch from ee51821 to 7a3b467 Compare August 11, 2023 11:44
@codecov
Copy link

codecov bot commented Aug 11, 2023

Codecov Report

Merging #2635 (ee51821) into main (2d21296) will increase coverage by 0.02%.
The diff coverage is n/a.

❗ Current head ee51821 differs from pull request most recent head 9e0e37a. Consider uploading reports for the commit 9e0e37a to get more accurate results

Additional details and impacted files

see 3 files with indirect coverage changes

@agoose77 agoose77 temporarily deployed to docs-preview August 11, 2023 12:14 — with GitHub Actions Inactive
@agoose77 agoose77 temporarily deployed to docs-preview August 11, 2023 13:00 — with GitHub Actions Inactive
@jpivarski
Copy link
Member

Should this one be included in today's release? I approved it because I think it's the right documentation, though I'm not sure if the switch to vendoring a subset of the data is to happen in this PR or another one.

The tests aren't all running because it's a documentation-only PR, which I can force-merge if it's ready to go today.

@agoose77
Copy link
Collaborator Author

agoose77 commented Aug 11, 2023

Yes, we should include it! It contains the vendored log files as discussed :)

@jpivarski
Copy link
Member

Okay! I'll merge it now. awkward-cpp is almost done.

@jpivarski jpivarski merged commit e2e5df6 into main Aug 11, 2023
16 checks passed
@jpivarski jpivarski deleted the agoose77/docs-strings-initial-guide branch August 11, 2023 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants