Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Daemon.jsonrpc_file_summary: new method to print a summary of claims #3422

Closed
wants to merge 1 commit into from

Conversation

belikor
Copy link
Contributor

@belikor belikor commented Sep 14, 2021

This allows printing a list of all claim streams that were downloaded to the system. The list is printed to the terminal or to a specific file.

It accepts some parameters to control the information that is printed.

lbrynet file summary --blobs --show_channel --title --stream_type --path
lbrynet file summary --show=incomplete --start=10 --end=40
lbrynet file summary --sort=claim_name --reverse --sep=' ;.;'

The --file option writes the list of claims to a file which then can be shared with other users of LBRY in order to download the same claims and contribute to seeding that content.

lbrynet file summary --channel=@SomeChannel --file=summary.txt --fdate

By default it will print the date of the claim which is based on the 'claim_height', when the claim was registered in the blockchain, the 'claim_id', the 'claim_name', and whether the media is present or not in the download directory.

 1/42; 20200610_10:23:37-0500; b231714456ee832daeba4b8356803e7591126dff; "07-S"; no-media
 2/42; 20200610_10:27:06-0500; 31700ff11f900429d742f2f137ba25393bdb3b0a; "09-S"; media
 3/42; 20200609_23:14:47-0500; 70dfefa510ca6eee7023a2a927e34d385b5a18bd; "04-S"; no-media

@coveralls
Copy link

coveralls commented Sep 14, 2021

Coverage Status

Coverage decreased (-0.3%) to 67.638% when pulling 2718739 on belikor:print-summary into 561566e on lbryio:master.

@eukreign
Copy link
Member

@lyoshenka this PR involves API changes, please review

@lyoshenka
Copy link
Member

Thanks for writing this @belikor :-)

What's the purpose of this summary output? Is this for humans to read, or for another program to ingest?

Part of our API design philosophy is to keep it simple. The simpler you can make your APIs, the easier it will be to use and maintain them. Every extra API option is cognitive load on future devs and on your future self.

For example, the boolean flags to this call can be dropped and the extra info (channel, title, path, etc) can just be printed. Similarly, there's no need for flags to send the output to a file, to reverse or filter it, to choose a separator, or to add a date. Shells already have tools for all those things. And I'm not even sure what the comparison flag does.

In fact, I would lean towards not adding this api at all. Everything it does can be accomplished through file_list and a JSON-parsing tool such as jq. I could be wrong on this, so let me know what it does that jq cannot do and that's important to add.

If we do end up keeping it, here are the changes I'd make:

  • print the output in standard format. I suggest CSV.
  • drop all flags except sort. Everything else can be accomplished by the shell and by always printing all the fields.

Finally, next time you want to add a new API call, it might be helpful to run your idea by us first. I can help you design the API and avoid doing extra work.

@lyoshenka lyoshenka assigned belikor and unassigned lyoshenka Sep 21, 2021
@belikor
Copy link
Contributor Author

belikor commented Sep 22, 2021

What's the purpose of this summary output? Is this for humans to read, or for another program to ingest?

Both. Humans can share their lists of downloaded content, particularly if they have few videos downloaded (fewer than 100, perhaps); and programs can read (#3423) a long list of claims in order to do things with them (redownload, reflect, move, delete, copy blobs, etc.).

For example, the boolean flags to this call can be dropped and the extra info (channel, title, path, etc) can just be printed.

No. Too much information to read by a human, which is why they are optional.

Similarly, there's no need for flags to send the output to a file, to reverse or filter it, to choose a separator, or to add a date. Shells already have tools for all those things. And I'm not even sure what the comparison flag does.

Since this method is derived from file_list, the sort, reverse, and comparison flags are just passed to it. So comparison does whatever file_list --comparison already does.

Sending the output to a file is a must. That's how we'd be able to transmit information quickly in a way that is not JSON, which is hard to read by humans.

In fact, I would lean towards not adding this api at all. Everything it does can be accomplished through file_list and a JSON-parsing tool such as jq. I could be wrong on this, so let me know what it does that jq cannot do and that's important to add.

Probably everything can be done with file_list + jq, but why bother if we can already provide a small summary using the SDK itself (after all Python can parse JSON just fine). External applications that use the SDK can rely solely on the SDK, and not need a lot of extra libraries. Maybe this small summary is sufficient. People who need more information, can go the file_list + jq route.

  • print the output in standard format. I suggest CSV.

The reason I avoid the comma by default is because many claims include a comma as part of their claim name so using another separator, like the semicolon, seems better to me.

Finally, next time you want to add a new API call, it might be helpful to run your idea by us first. I can help you design the API and avoid doing extra work.

All these additions are inspired by my library, lbrytools, which I wrote to have more flexibility in downloading and managing multiple claims. If most of that functionality was in the SDK already, then I wouldn't need to write my own library on top of the SDK. So, I would prefer to integrate as many useful methods as possible to the SDK (batteries included) to reduce the amount of extra code that external applications need to write.

@lyoshenka
Copy link
Member

So, I would prefer to integrate as many useful methods as possible to the SDK (batteries included) to reduce the amount of extra code that external applications need to write.

I think this is the core difference in our approaches. We're not aiming to make the SDK be batteries-included. The more features you include, the harder it is to maintain the code. That said, since this is the main project implementing the LBRY protocol, we try to strike a balance between minimalism and usefulness. You could argue that the ideal approach is a very minimal library to implement the protocol, plus a batteries-included wrapper around it. I'd be open to such a design.

I do think printing a compact list of downloaded content can be useful, especially for human consumption (programs can ingest file_list output). But I'd prefer to keep the options on the API minimal. If someone wants something more specific, then file_list or an external tool such as lbrytools is the way to go. It makes a lot of sense to maintain external tools that interact with the SDK. We don't have to merge everything.

many claims include a comma as part of their claim

You'll have the same problem with claims that have a semicolon in the name. They are much less common but if you want scripts to ingest this output, you shouldn't count on that.

CSV handles commas just fine if you quote the field. The csv library will do it for you.

comparison does whatever file_list --comparison already does

file_list has params like added_on, where it makes sense to use a greater-than or less-than comparison. file_summary does not have such fields. can you give me an example of how you'd use --comparison with file_summary?

Reversing and saving output to a file are already handled by your shell. Here's how to reverse (at least on linux and mac):

lbrynet file summary --sort=claim_name | tac

And here's saving to a file

lbrynet file summary --sort=claim_name > my_claims.txt

There's no need to add these as flags.

@lyoshenka lyoshenka assigned belikor and unassigned lyoshenka Sep 28, 2021
This allows printing a list of all claim streams that were downloaded
to the system.
The list is printed to the terminal or to a specific file.

It accepts some parameters to control the information that is printed.
```
lbrynet file summary --blobs --show_channel --title --stream_type --path
lbrynet file summary --show=incomplete --start=10 --end=40
lbrynet file summary --sort=claim_name --reverse --sep=' ;.;'
```

The `--file` option writes the list of claims to a file
which then can be shared with other users of LBRY in order
to download the same claims and contribute to seeding that content.
```
lbrynet file summary --channel=@somechannel --file=summary.txt --fdate
```

By default it will print the date of the claim which is based on the `'claim_height'`,
when the claim was registered in the blockchain, the `'claim_id'`, the `'claim_name'`,
and whether the media is present or not in the download directory.
```
 1/42; 20200610_10:23:37-0500; b231714456ee832daeba4b8356803e7591126dff; "07-S"; no-media
 2/42; 20200610_10:27:06-0500; 31700ff11f900429d742f2f137ba25393bdb3b0a; "09-S"; media
 3/42; 20200609_23:14:47-0500; 70dfefa510ca6eee7023a2a927e34d385b5a18bd; "04-S"; no-media
```
@kauffj kauffj requested a review from lyoshenka October 4, 2021 15:47
@lyoshenka lyoshenka marked this pull request as draft October 5, 2021 13:05
@lyoshenka lyoshenka removed their request for review October 5, 2021 13:05
@kauffj kauffj closed this Oct 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants