
Use without cache? #34

Closed
garywill opened this issue Nov 15, 2018 · 14 comments

@garywill

Would you consider a feature for using tldr without a cache?
Just curl the page directly and display it.

@raylee
Owner

raylee commented Nov 15, 2018

That's an interesting thought. It'd require two curls at minimum though, given that the tldr pages are organized into a platform-specific directory with a common directory as a fallback.

To clarify, is your intent to make sure tldr never touches the local filesystem? Or are you aiming for a simplification of what the script is currently doing?
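For illustration, that two-request fallback could be sketched like so in POSIX sh. The base URL, branch name, and directory layout here are assumptions about the tldr-pages repository, not what this script necessarily does:

```shell
#!/bin/sh
# Sketch of a cache-free lookup: try the platform-specific page first,
# then fall back to pages/common. URL layout is an assumption.
base="https://raw.githubusercontent.com/tldr-pages/tldr/main/pages"

page_url() {
    # $1 = platform (e.g. linux), $2 = command name
    printf '%s/%s/%s.md' "$base" "$1" "$2"
}

fetch_page() {
    # -s: silent, -f: fail on HTTP errors so the fallback kicks in
    curl -sf "$(page_url "$1" "$2")" || curl -sf "$(page_url common "$2")"
}
```

When the platform-specific page exists, only one request is made; the second curl runs only on a miss.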

@garywill
Author

make sure tldr never touches the local filesystem

That's what I intended.

I thought about this after I found cheat.sh.
It is used simply by running curl cht.sh/blahblah.
cheat.sh must do its searching on the server side.

Two curl requests are fine, I think.

@raylee
Owner

raylee commented Nov 16, 2018

Would you be okay if it required an explicit switch and/or an environment variable to enable that mode? Say "export TLDR_NOCACHE=1" or something?

It should be a relatively straightforward change but I'd like to keep the defaults using the cache.

@garywill
Author

Yes, that's OK!
It would be better if NOCACHE could be set from any of:

  • argument (so people can use an alias)
  • environment variable
  • config file (if there is one)

@raylee
Owner

raylee commented Nov 20, 2018

I've added this into the client in f56b013

The commit describes how to use it:

Add a mode which downloads the page from github every time, with caveats.

Enable with '-n' or the environment variable TLDR_CACHE=no. eg, add
export TLDR_CACHE=no to your startup script, or

alias tldr="env TLDR_CACHE=no tldr"
or
alias tldr="tldr -n"

This will cause tldr to directly curl the requested page from GitHub
each invocation with no local cache, and nothing touching the local
filesystem. You will be subject to GitHub's rate limits however,
which as of this writing appear to be 60 requests per hour.

This mode is not recommended for common usage.

@garywill
Author

Great! Good job.

What about using a personal token to bypass GitHub's rate limits?
curl -u username:token https://api.github.com/......
I think a token with no special scopes can do it. That raises the limit from 60 to 5,000 requests per hour.

If you agree, I'm willing to make a PR (on some weekend).
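One hedged way to shape such a PR: make authentication optional, so the script still works unauthenticated. The variable names `TLDR_GH_USER` and `TLDR_GH_TOKEN` below are made up for this sketch, not part of the script:

```shell
#!/bin/sh
# Hypothetical sketch: emit curl authentication flags only when a
# personal token is configured. TLDR_GH_USER and TLDR_GH_TOKEN are
# invented names for this example.
auth_args() {
    if [ -n "$TLDR_GH_TOKEN" ]; then
        printf '%s' "--user $TLDR_GH_USER:$TLDR_GH_TOKEN"
    fi
}

# e.g.  curl -s $(auth_args) https://api.github.com/rate_limit
```

The `/rate_limit` endpoint shown in the usage comment is a real GitHub API endpoint for checking the current quota.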

@raylee
Owner

raylee commented Dec 3, 2018

That could work. Have you decided whether that's necessary yet? (Meaning, have you hit the rate limit issue?)

@garywill
Author

garywill commented Dec 4, 2018

I haven't hit it yet, so maybe that can be left for now.

@n-st

n-st commented Jul 1, 2020

It seems f56b013 isn't quite the full story yet, unfortunately:
The script runs the config function unconditionally, which automatically populates the cache if the cache directory is non-existent (though not if it's merely empty).
And it has to, since --list reads directly from the contents of the cache directory, without creating/populating it beforehand.

Currently, the only workaround I see would be to split argument parsing and the actual actions (list/fetch/update/…) into two steps, so we can first decide whether to use the cache and only then do anything that might try to use it. Without that, cache usage would depend on the order of arguments (tldr -n -l → no cache, tldr -l -n → with cache).

Oh the joys of orthogonal features…

@raylee
Owner

raylee commented Jul 1, 2020

Good catch. Adding something like the below before handling the flags would be a start.

# Return success if $flag appears among the given arguments.
flagged() {
	flag=$1
	shift
	for i do                 # POSIX: "for i do" iterates over "$@"
		case $i in
			"$flag") return 0 ;;
		esac
	done
	return 1
}

# Scan all arguments up front, before the main flag-handling loop.
if flagged -n "$@"; then
	TLDR_CACHE=no
fi

Making --list useful without a cache or touching the filesystem at all is a bit tricky. The tldr pages archive is ~1.2MB, index.json itself is ~200K. I can't see anyone wanting to download that every time they do a list.

Or would you? What's your use case look like for this?

@n-st

n-st commented Jul 1, 2020

Wow, that was quick!

As for my use case: I originally looked at this issue because I wanted to add a tldr client to my default shell config, but avoid having it use disk space for a lot of tldr pages, most of which I'll likely never need. (The cache directory is 15 MiB at the moment, after all.)
The TLDR_CACHE environment variable would work nicely for that, but I'd prefer to also have bash/zsh completion — which uses --list, which currently requires a fully populated cache…

I don't immediately see a clean universal solution, so just throwing some ideas around…

  • If the goal is to avoid disk access completely (e.g. on a read-only filesystem), I guess the only option is to always load everything live.
    The index file is 206 KiB, but could be gzip'd to 11.6 KiB for the HTTP transfer — but it seems GitHub's HTTP server doesn't do gzip, so no luck there…
  • If the goal is to save disk space:
    • It would give us more flexibility to generate --list from index.json (perhaps with some sed/awk scripting, since grep -o sadly isn't POSIX), so we don't need to keep all pages locally.
    • Then maybe cache pages and index separately (so one can use --list with a small cache, but disable the page cache)?
      For an even smaller cache, one could preprocess index.json into the plain list, and then only cache that list (1.7kB vs 216kB).
    • It would also be a step in this direction to only cache on-demand (as is already done?)
  • If the goal is to have pages more recent than the default 14 days (based on the index cache):
    • Introduce an environment variable to configure the cache duration? Though that's of course a bit out-of-scope for this issue. ;)
      (It's convenient to just clone the tldr-sh-client repo for easier updates, so it helps if configuration doesn't require a change to the script itself.)
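The index.json preprocessing mentioned above could look roughly like this in POSIX sh. The `"name":"…"` field is an assumption about the index schema, which may not match exactly:

```shell
#!/bin/sh
# Sketch: turn index.json into a plain command list without grep -o
# (which isn't POSIX). Splitting on commas first keeps sed's per-line
# matching simple; assumes each entry carries a "name":"..." pair.
names_from_index() {
    tr ',' '\n' | sed -n 's/.*"name":"\([^"]*\)".*/\1/p'
}

# e.g.  curl -s .../index.json | names_from_index > cached-list
```

Only the resulting plain list would then need to be cached for --list and shell completion.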

@raylee
Owner

raylee commented Jul 2, 2020

Yeah, this original issue was to have tldr not touch the filesystem at all. That doesn't preclude another environment option or switch to get the behavior you'd like, of course. But I'd like to not add a third option in the future when someone else points out I forgot another use case for slimming it down further :-)

For saving disk space but still allowing completions (which seems good), I think pulling down the list and extracting just the bits the script needs seems workable. Though I'm glad someone else shares my pain about grep -o not being POSIX. (Really, POSIX shell is the worst possible language for most things, except for something like this which should be universal and light on dependencies.)

This tldr client has an update option to force downloading fresh pages before the default cache expiration time, so that's sorted too.

So I think we're down to a parameter that tells tldr to get just the index.json and use that for its queries (perhaps with preprocessing). Can you think of any variations we're forgetting?

@raylee
Owner

raylee commented Jul 2, 2020

Thinking about this more, the right solution is probably an environment variable which dictates what should be kept from the archive, and what should be tossed. That would generalize for handling different languages as well.
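One possible shape for that idea, sketched with `TLDR_KEEP` as a hypothetical variable name (neither the name, the default set, nor the behavior is part of the script):

```shell
#!/bin/sh
# Hypothetical sketch: TLDR_KEEP lists the archive directories to
# retain when unpacking; everything else would be discarded. The
# same mechanism could filter language directories.
keep_dir() {
    for k in ${TLDR_KEEP:-common linux osx windows sunos}; do
        [ "$1" = "$k" ] && return 0
    done
    return 1
}

# e.g. with TLDR_KEEP="common linux", pages/osx would be dropped.
```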

@raylee
Owner

raylee commented Dec 19, 2021

The new version of the script relies upon the local filesystem to generate a listing. As it's a bit of surgery to remove that assumption I'm going to close this feature as "close enough".

@raylee raylee closed this as completed Dec 19, 2021

3 participants