
Use without cache? #34

Closed
garywill opened this issue Nov 15, 2018 · 14 comments

@garywill

Would you consider a feature for using tldr without a cache?
Just curl the page directly and display it.

@raylee
Owner

raylee commented Nov 15, 2018

That's an interesting thought. It'd require two curls at minimum though, given that the tldr pages are organized into a platform-specific directory with a common directory as a fallback.

To clarify, is your intent to make sure tldr never touches the local filesystem? Or are you aiming for a simplification of what the script is currently doing?
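For illustration, that two-request fallback could be sketched like so in POSIX sh. The base URL, branch name, and directory layout here are assumptions about the tldr-pages repository, not what this script necessarily does:

```shell
#!/bin/sh
# Sketch of a cache-free lookup: try the platform-specific page first,
# then fall back to pages/common. URL layout is an assumption.
base="https://raw.githubusercontent.com/tldr-pages/tldr/main/pages"

page_url() {
    # $1 = platform (e.g. linux), $2 = command name
    printf '%s/%s/%s.md' "$base" "$1" "$2"
}

fetch_page() {
    # -s: silent, -f: fail on HTTP errors so the fallback kicks in
    curl -sf "$(page_url "$1" "$2")" || curl -sf "$(page_url common "$2")"
}
```

When the platform-specific page exists, only one request is made; the second curl runs only on a miss.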

@garywill
Author

make sure tldr never touches the local filesystem

That's what I intended.

I thought about this after I found cheat.sh.
It is used simply by running curl cht.sh/blahblah.
cheat.sh must do its searching on the server side.

Two curl requests are fine, I think.

@raylee
Owner

raylee commented Nov 16, 2018

Would you be okay if it required an explicit switch and/or an environment variable to enable that mode? Say "export TLDR_NOCACHE=1" or something?

It should be a relatively straightforward change but I'd like to keep the defaults using the cache.

@garywill
Author

Yes, that's OK!
It would be better if NOCACHE could be set from any of:

  • argument (so people can use an alias)
  • environment variable
  • config file (if there is one)

@raylee
Owner

raylee commented Nov 20, 2018

I've added this into the client in f56b013

The commit describes how to use it:

Add a mode which downloads the page from github every time, with caveats.

Enable with '-n' or the environment variable TLDR_CACHE=no. eg, add
export TLDR_CACHE=no to your startup script, or

alias tldr="env TLDR_CACHE=no tldr"
or
alias tldr="tldr -n"

This will cause tldr to directly curl the requested page from GitHub
each invocation with no local cache, and nothing touching the local
filesystem. You will be subject to GitHub's rate limits however,
which as of this writing appear to be 60 requests per hour.

This mode is not recommended for common usage.

@garywill
Author

Great! Good job.

What about using a personal token to bypass GitHub's rate limits?
curl -u username:token https://api.github.com/......
I think a token with no special scopes can do it. That raises the limit from 60 to 5,000 requests per hour.

If you agree, I'm willing to make a PR (on some weekend).
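One hedged way to shape such a PR: make authentication optional, so the script still works unauthenticated. The variable names `TLDR_GH_USER` and `TLDR_GH_TOKEN` below are made up for this sketch, not part of the script:

```shell
#!/bin/sh
# Hypothetical sketch: emit curl authentication flags only when a
# personal token is configured. TLDR_GH_USER and TLDR_GH_TOKEN are
# invented names for this example.
auth_args() {
    if [ -n "$TLDR_GH_TOKEN" ]; then
        printf '%s' "--user $TLDR_GH_USER:$TLDR_GH_TOKEN"
    fi
}

# e.g.  curl -s $(auth_args) https://api.github.com/rate_limit
```

The `/rate_limit` endpoint shown in the usage comment is a real GitHub API endpoint for checking the current quota.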

@raylee
Owner

raylee commented Dec 3, 2018

That could work. Have you decided whether that's necessary yet? (Meaning, have you hit the rate limit issue?)

@garywill
Author

garywill commented Dec 4, 2018

I haven't hit it yet, so maybe that can be left for now.

@n-st

n-st commented Jul 1, 2020

It seems f56b013 isn't quite the full story yet, unfortunately:
The script runs the config function unconditionally, which automatically populates the cache if the cache directory is non-existent (though not if it's merely empty).
And it has to, since --list reads directly from the contents of the cache directory, without creating/populating it beforehand.

Currently, the only workaround I see would be to split argument parsing and the actual actions (list/fetch/update/…) into two steps, so we can first decide whether to use the cache and only then do anything that might try to use it. Without that, cache usage would depend on the order of arguments (tldr -n -l → no cache, tldr -l -n → with cache).

Oh the joys of orthogonal features…

@raylee
Owner

raylee commented Jul 1, 2020

Good catch. Adding something like the below before handling the flags would be a start.

# Return success if $flag appears among the given arguments.
flagged() {
	flag=$1
	shift
	for i do                 # POSIX: "for i do" iterates over "$@"
		case $i in
			"$flag") return 0 ;;
		esac
	done
	return 1
}

# Scan all arguments up front, before the main flag-handling loop.
if flagged -n "$@"; then
	TLDR_CACHE=no
fi

Making --list useful without a cache or touching the filesystem at all is a bit tricky. The tldr pages archive is ~1.2MB, index.json itself is ~200K. I can't see anyone wanting to download that every time they do a list.

Or would you? What's your use case look like for this?

@n-st

n-st commented Jul 1, 2020

Wow, that was quick!

As for my use case: I originally looked at this issue because I wanted to add a tldr client to my default shell config, but avoid having it use disk space for a lot of tldr pages, most of which I'll likely never need. (The cache directory is 15 MiB at the moment, after all.)
The TLDR_CACHE environment variable would work nicely for that, but I'd prefer to also have bash/zsh completion — which uses --list, which currently requires a fully populated cache…

I don't immediately see a clean universal solution, so just throwing some ideas around…

  • If the goal is to avoid disk access completely (e.g. on a read-only filesystem), I guess the only option is to always load everything live.
    The index file is 206 KiB, but could be gzip'd to 11.6 KiB for the HTTP transfer — but it seems GitHub's HTTP server doesn't do gzip, so no luck there…
  • If the goal is to save disk space:
    • It would give us more flexibility to generate --list from index.json (perhaps with some sed/awk scripting, since grep -o sadly isn't POSIX), so we don't need to keep all pages locally.
    • Then maybe cache pages and index separately (so one can use --list with a small cache, but disable the page cache)?
      For an even smaller cache, one could preprocess index.json into the plain list, and then only cache that list (1.7kB vs 216kB).
    • It would also be a step in this direction to only cache on-demand (as is already done?)
  • If the goal is to have pages more recent than the default 14 days (based on the index cache):
    • Introduce an environment variable to configure the cache duration? Though that's of course a bit out-of-scope for this issue. ;)
      (It's convenient to just clone the tldr-sh-client repo for easier updates, so it helps if configuration doesn't require a change to the script itself.)
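The index.json preprocessing mentioned above could look roughly like this in POSIX sh. The `"name":"…"` field is an assumption about the index schema, which may not match exactly:

```shell
#!/bin/sh
# Sketch: turn index.json into a plain command list without grep -o
# (which isn't POSIX). Splitting on commas first keeps sed's per-line
# matching simple; assumes each entry carries a "name":"..." pair.
names_from_index() {
    tr ',' '\n' | sed -n 's/.*"name":"\([^"]*\)".*/\1/p'
}

# e.g.  curl -s .../index.json | names_from_index > cached-list
```

Only the resulting plain list would then need to be cached for --list and shell completion.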

@raylee
Owner

raylee commented Jul 2, 2020

Yeah, this original issue was to have tldr not touch the filesystem at all. That doesn't preclude another environment option or switch to get the behavior you'd like, of course. But I'd like to not add a third option in the future when someone else points out I forgot another use case for slimming it down further :-)

For saving disk space but still allowing completions (which seems good), I think pulling down the list and extracting just the bits the script needs seems workable. Though I'm glad someone else shares my pain about grep -o not being POSIX. (Really, POSIX shell is the worst possible language for most things, except for something like this which should be universal and light on dependencies.)

This tldr client has an update option to force downloading fresh pages before the default cache expiration time, so that's sorted too.

So I think we're down to a parameter that tells tldr to get just the index.json and use that for its queries (perhaps with preprocessing). Can you think of any variations we're forgetting?

@raylee
Owner

raylee commented Jul 2, 2020

Thinking about this more, the right solution is probably an environment variable which dictates what should be kept from the archive, and what should be tossed. That would generalize for handling different languages as well.
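One possible shape for that idea, sketched with `TLDR_KEEP` as a hypothetical variable name (neither the name, the default set, nor the behavior is part of the script):

```shell
#!/bin/sh
# Hypothetical sketch: TLDR_KEEP lists the archive directories to
# retain when unpacking; everything else would be discarded. The
# same mechanism could filter language directories.
keep_dir() {
    for k in ${TLDR_KEEP:-common linux osx windows sunos}; do
        [ "$1" = "$k" ] && return 0
    done
    return 1
}

# e.g. with TLDR_KEEP="common linux", pages/osx would be dropped.
```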

@raylee
Owner

raylee commented Dec 19, 2021

The new version of the script relies upon the local filesystem to generate a listing. As it's a bit of surgery to remove that assumption I'm going to close this feature as "close enough".

@raylee raylee closed this as completed Dec 19, 2021

3 participants