Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Declare input and output types of commands #6796

Merged
merged 8 commits into from
Nov 9, 2022

Conversation

dandavison
Copy link
Contributor

@dandavison dandavison commented Oct 18, 2022

This PR adds input_output_types: Vec<(Type, Type)> to the command Signature struct. This is a collection of all the different input-output type signature variants.

The declared input and output types (shapes) are enforced by the test suite: they are checked when executing every example.

There are two main commits:

  • 3b5a6cb sets things up so that we can declare input-output type signatures and verifies them in the tests of examples

  • f47ba52 is a big commit that adds signatures to all commands that have automatically-tested examples (and adds/changes command examples along the way so that they cause the tests to pass)

The bottom line is that once this merges, we'll have the following rule:

  • If a command has examples, every example must match one of the input-output type signatures.
  • If the command has declared input-output type signatures, then every signature variant must have at least one example matching that signature.

Subsequently, we can add this information to the help commands and $nu.scope.commands tables, so that users can query nushell commands by their input and output types.

We can also use this to add signatures to the text help pages, peraps like #6754.

Being able to query commands by type will also provide a way to study the way that builtin commands are named, to check that we're keeping things reasonably consistent (e.g. to help with things like #6744).

image

  • Fail the test if any example did not match any input-output type pair
  • Fail the test if any input-output type pair did not match any example
  • Check example does appear to be testing the right command
  • Query: commands accepting table but not record

@sholderbach sholderbach marked this pull request as draft October 18, 2022 14:43
@fdncred
Copy link
Collaborator

fdncred commented Oct 18, 2022

this looks very cool. nice work dan! can't wait to play with it.

@fdncred
Copy link
Collaborator

fdncred commented Oct 18, 2022

It's clearly known that this is not complete yet but I was trying it out and saw that it was puking on plugins so I wanted to add it to the list of TODOs

Error:
  × Signature deserialization error
   ╭─[C:\Users\username\AppData\Roaming\nushell\plugin.nu:1:1]
 1 │ register C:\Users\username\.cargo\bin\nu_plugin_custom_values.exe  {
   · ────┬───
   ·     ╰── unable to deserialize signature: missing field `input_shape` at line 25 column 1
 2 │   "name": "custom-value generate",
   ╰────

@kubouch
Copy link
Contributor

kubouch commented Oct 19, 2022

I think this is a good information to add to the help table. My main concern is the table becomes very big. An alternative might be to add it to $nu.scope.commands instead.

Also, the input/output types should be shown in the help text itself when you type help foo, but that's a slightly different topic and can be a separate PR.

@merelymyself
Copy link
Contributor

How might this work out when you have a command which has multiple possible outputs? first x returns a list but first returns a value, for instance. Will it be marked as any?

@dandavison
Copy link
Contributor Author

My main concern is the table becomes very big. An alternative might be to add it to $nu.scope.commands instead.

Thanks @kubouch. I agree that we don't want the table to get too big. I think that the input and output type of a command is the most important information after name and usage, and so it should be in both help commands and $nu.scope.commands. However, I think there might be other things we could do to make the help commands table narrower, for example adding a --long flag to hide some extra information (maybe then we could combine help commands and $nu.scope.commands?) Also perhaps we could combine is_plugin is_custom is_keyword into a single enum-like column (I'm thinking they're mutually exclusive?)

 〉help commands | select is_plugin is_custom is_keyword | uniq -c | table --expand
╭────────────────────────────────────────┬───────╮
│                 value                  │ count │
├────────────────────────────────────────┼───────┤
│ ╭───────────┬───────────┬────────────╮ │    78 │
│ │ is_plugin │ is_custom │ is_keyword │ │       │
│ ├───────────┼───────────┼────────────┤ │       │
│ │ false     │ true      │ false      │ │       │
│ ╰───────────┴───────────┴────────────╯ │       │
│ ╭───────────┬───────────┬────────────╮ │   437 │
│ │ is_plugin │ is_custom │ is_keyword │ │       │
│ ├───────────┼───────────┼────────────┤ │       │
│ │ false     │ false     │ false      │ │       │
│ ╰───────────┴───────────┴────────────╯ │       │
│ ╭───────────┬───────────┬────────────╮ │    22 │
│ │ is_plugin │ is_custom │ is_keyword │ │       │
│ ├───────────┼───────────┼────────────┤ │       │
│ │ false     │ false     │ true       │ │       │
│ ╰───────────┴───────────┴────────────╯ │       │
╰────────────────────────────────────────┴───────╯

Also, the input/output types should be shown in the help text itself when you type help foo, but that's a slightly different topic and can be a separate PR.

Definitely, will do that as part of this work.

@dandavison
Copy link
Contributor Author

dandavison commented Oct 19, 2022

Thanks for thinking about this @merelymyself;

How might this work out when you have a command which has multiple possible outputs? first x returns a list but first returns a value, for instance. Will it be marked as any?

I would like to suggest that, if we go with this PR, we make return type consistent for a single command, i.e. we no longer permit flags and positional args to alter the return type of a command (some discussion in discord). So in the case of first, it will be changed so that first n no longer exists: that functionality is provided by take n.

I do agree with the discussion between you and @extemporalgenome in #6616 which points out that it's even worse for the return type to alter according to the value (as opposed to presence) of a positional. E.g. the way that first 1 and first 2 were returning different output types (which you fixed in #6616, and which someone will fix for take -- @Dorumin would you like to as you have a patch already?).

Related to that, I am not convinced that we want to retain the way that some commands automatically map over their input, e.g.

 〉[a b] | str upcase | to nuon
[A, B]

The language provides [a b] | each { str upcase } for that, and usage of constructs like each in a pipeline is an absolutely central part of the language, so it must be convenient and discoverable for users to do that, in which case there is only downside to having an unpredictable subset of commands provide the "map-over-input" behavior automagically.

I think that there are lots of advantages to having the return type remain consistent regardless of flags/positionals, as well as input value. The most important might be that it gives us a natural definition of what a nushell command is: it's something that transform this type of input into that type of output. So the rule I'm proposing will answer the question of whether functionality should be exposed as one command vs two separate commands, and help prevent commands from becoming complex and obscure to understand, like transpose (see e.g. @webbedspace's #6735).

One remaining thing that needs to be solved in this PR is how to handle "column paths". These are special positionals that, contrary to the proposed rule, exist specifically for the purpose of altering the output type. So the rule I'm imagining we introduce would be something like:

A nushell command always returns the same output type, for all combinations of flags and positional arguments. However if column paths are supplied, then the output type is the same as the input type.

For anyone who's not familiar with column paths (e.g. me until very recently), they behave like this for the commands that support them:

 〉'a' | str upcase | to nuon
"A"
 〉{'a': 'a', 'b': 'b'} | str upcase 'a' | to nuon
{a: A, b: b}

So concretely, what this PR needs to do, is detect when examples are using a column path, and alter the output type expectation accordingly. Or should the language have a more explicit way of distinguishing column paths from other positional arguments?

@fdncred
Copy link
Collaborator

fdncred commented Oct 19, 2022

maybe then we could combine help commands and $nu.scope.commands ...

I'm not sold on this idea yet but one vote for it is that I'm betting startup would be a little bit faster. I could be wrong but I think $scope is populated on startup, so if that didn't have to happen, nushell should startup faster.

@fdncred
Copy link
Collaborator

fdncred commented Oct 19, 2022

We talked about this PR in our team meeting today. We'd like to move forward with this PR and like the direction it's going. Thanks for your help. I think there are a few things left to do but once we get there we'll land this.

@rgwood
Copy link
Contributor

rgwood commented Oct 20, 2022

Thanks for the PR!

I noticed that after this change, we would be tracking both types and shapes:

    pub input_type: Type,
    pub output_type: Type,
    pub input_shape: SyntaxShape,
    pub output_shape: SyntaxShape,

Aside: What's the difference between shapes and types?

Types are more specific than shapes (thanks Dan for the reminder). For example, the "Record" type includes field names+types:

 〉{ foo: 1, bar: 2 } | describe
record<foo: int, bar: int>

Shapes are more general; you can say that something has the shape of a record without specifying exactly which fields it has.

OK, back to this PR

I would like to understand whether we need to track both type and shape.

This may come down the intended uses. Input+output types are currently used for enabling command overloading (like how we have 2 to csv commands, and Nu automatically uses the dataframe version if the input is a dataframe). I personally think we should remove overloading (sorry for being a broken record on this), but some people like the feature and as I understand it overloading was the initial motivation for input+output types.

I'm less clear on how shapes will be used (primarily as a UX/documentation improvement?) and unsure whether both shapes+types are needed.

@fdncred
Copy link
Collaborator

fdncred commented Oct 20, 2022

@rgwood We also have another type called FlatShapes that are used in syntax highlighting in the repl. (see syntax_highlighting.rs). So, I wonder if all 3 of these are really necessary?

@rgwood
Copy link
Contributor

rgwood commented Oct 20, 2022

How might this work out when you have a command which has multiple possible outputs?

I'm also wondering about the inverse of this; does this approach rule out commands with multiple possible input types? I guess those would need to be Any with a switch at runtime?

@dandavison
Copy link
Contributor Author

I'm also wondering about the inverse of this; does this approach rule out commands with multiple possible input types?

The way I'm thinking about this is that we can divide this question up into:

  1. Multiple input types are supported because the command supports cell paths.
  2. Multiple input types are supported but not due to cell paths.

I'm not sure yet how to handle (1). Putting that aside temporarily, what are examples of (2)? Do we even want (2)?

As I mentioned above, I'm aware of

〉[a b] | str upcase | to nuon
[A, B]

and I'm not convinced that it should behave like this since mapping over input is provided by [a b] | each { str upcase }.

@fdncred
Copy link
Collaborator

fdncred commented Oct 20, 2022

what are examples of (2) ...

I think there must be several commands that do this because I recall a match on value type in several places. One example may be sort. The sort command should work with ints, strings, other types? sort-by uses cell paths I believe but it's still similar. I bet there are more use cases, if I'm understanding properly.

  • fmt - matches on int and filesize

maybe these into commands should be exempt from type because they're really meant to convert one thing to another

  • into binary - matches on binary and stream
  • into decimal - matches on string, int, bool
  • into duration - matches on a few types
  • into blah ...
  • skip / take - i think they do something different depending on input, specifically if the input is binary

@dandavison
Copy link
Contributor Author

One example may be sort. Sort should work with ints, strings, other types?

Sort is a bit different -- it works on list<T> for various sortable T, and that's as expected.

However, it does also seem to accept an integer or a string, but it's not immediately obvious to me why it accepts those since it doesn't sort them in any sense:

〉'ba za' | sort
ba za
〉7 | sort
7

@webbedspace
Copy link
Contributor

webbedspace commented Oct 27, 2022

Problem I just found with this PR: input and output should be on $nu.scope.commands too, because that's supposed to be a superset of help commands's output.

@rgwood
Copy link
Contributor

rgwood commented Oct 28, 2022

I noticed that after this change, we would be tracking both types and shapes:

After giving this more thought, I think this redundancy will be temporary and doesn't need to be addressed in this PR. We haven't decided on the future of command overloading, but:

  • If we remove command overloading: we remove input/output types
  • If we keep command overloading: we can probably rewrite it to use shapes instead of types

@dandavison
Copy link
Contributor Author

We haven't decided on the future of command overloading, but:
If we remove command overloading: we remove input/output types

@rgwood I think that the "cell paths" feature in nushell is a form of command overloading, so in that sense we won't remove command overloading. Would you agree with that?

@dandavison
Copy link
Contributor Author

dandavison commented Oct 30, 2022

The current approach I'm taking here is:

  • Add a field operates_on_cell_paths to Signature (currently that's a new field on the struct, but actually I think we may be able to compute this by inspecting the rest positionals)
  • Allow input and output shapes to be declared as a map, allowing for the possibility of multiple input shapes (but each must map consistently to a single output shape). This currently handles commands that automatically map over a list, but as I've said elsewhere I'm not sure any commands should do that (that's what each is for).
  • If an example input does not match a declared input shape, and if operates_on_cell_paths=true, then require that the input shape is record|list|table and assert that the output shape is the same as input shape.

@dandavison dandavison force-pushed the input-output-syntax-shape branch 4 times, most recently from 42ba0e5 to 4ca8c68 Compare October 31, 2022 04:14
@rgwood
Copy link
Contributor

rgwood commented Oct 31, 2022

@rgwood I think that the "cell paths" feature in nushell is a form of command overloading

I’m not sure I understand, can you give an example?

@fdncred
Copy link
Collaborator

fdncred commented Oct 31, 2022

I didn't really follow how cell-paths are a form of overloading either? I just see the traditional nushell concept where we have the same command name in several contexts like nushell, dfr, db. The other overloading I'm familiar with is c# method overloading where you can have the same method name that take different inputs. I'm not sure that fits here either though, does it?

@dandavison
Copy link
Contributor Author

dandavison commented Oct 31, 2022

I’m not sure I understand, can you give an example?

AIUI "command overloading" means that the "same command" exists in a number of variant forms which differ in the number and types of inputs they accept, and in the type of output that a given input signature produces.

So if we take str upcase as an example. At its core, it is simply

〉'a' | str upcase
A

This form has the type (String) => String.

But as with many nushell commands there is a second variant:

〉[{k1: a k2: b}] | str upcase k1
╭────┬────╮
│ k1 │ k2 │
├────┼────┤
│ A  │ b  │
╰────┴────╯

Here 'k1' is a "cell path". This form has the type (Table, String) => Table (I'm using the first slot in the type signature to refer to input, and subsequent slots to refer to positional arguments).

So, it seems that the cell path feature is a form of command overloading.

And cell-paths actually give rise to a third variant (Record, String) => Record:

〉{k1: a k2: b} | str upcase k1
╭────┬───╮
│ k1 │ A │
│ k2 │ b │
╰────┴───╯

Incidentally, there also another form which many nushell commands have: List<String> => List<String> which I am calling "vectorize-over-list":

〉[a b] | str upcase
╭───╮
│ A │
│ B │
╰───╯

I'm thinking that all these polymorphisms are examples of "command overloading". If so, then since cell paths seem well established in the language, it seems that command overloading isn't going away. And the cell paths are very useful I think -- addressing and operating on deeply nested cells is not something all languages make easy.

The vectorization behavior also seems well-established, but it's less obvious that it should be since nushell also has each and par-each.

Incidentally, I've suggested an alternative way that we could expose cell paths:

〉[{k1: a k2: b}] | at k1 | str upcase
╭────┬────╮
│ k1 │ k2 │
├────┼────┤
│ A  │ b  │
╰────┴────╯

This would also be command overloading: the variant of str upcase dispatched to in this case would have the type

(TableAugmentedWithCellPath) => Table

or maybe

(TableAugmentedWithCellPath) => TableAugmentedWithCellPath

@dandavison
Copy link
Contributor Author

It's clearly known that this is not complete yet but I was trying it out and saw that it was puking on plugins so I wanted to add it to the list of TODOs

@fdncred This sounds like a good catch. I haven't done anything about it yet (would anyone be up for doing that?)

@dandavison dandavison marked this pull request as ready for review November 7, 2022 03:45
@sophiajt
Copy link
Contributor

sophiajt commented Nov 7, 2022

Let's wait to land until after this week's release. That hopefully gives us time to fix the plugins and then we can land it with a few weeks to test it until the next release.

@rgwood
Copy link
Contributor

rgwood commented Nov 7, 2022

Hmm, I'm having trouble checking out this PR locally:
image

The commit history for the PR looks a little strange (seems to have some unrelated commits), not sure if that's related.

@dandavison dandavison changed the title Suggestion: input and output syntax shapes Declare input and output types of commands Nov 7, 2022
@dandavison
Copy link
Contributor Author

@rgwood thanks I've rebase on main now. (I do force-push to my feature branches -- not sure if that causes a problem with the gh tool if you've previously checked out the branch?)

@fdncred
Copy link
Collaborator

fdncred commented Nov 7, 2022

I was going to try this morning too but can't. Maybe there's something wonky with gh?

gh pr checkout 6796
remote: Enumerating objects: 296, done.
remote: Counting objects: 100% (296/296), done.
remote: Compressing objects: 100% (173/173), done.
Receiving objects: 100% (296/296), 342.87 KiB | 1.52 MiB/s, done.d 0

Resolving deltas: 100% (161/161), completed with 5 local objects.
From https://github.com/nushell/nushell
 ! [rejected]            refs/pull/6796/head -> input-output-syntax-shape  (non-fast-forward)
exit status 1

Update: I used the "Github Pull Requests and Issues" plugin and was able to check it out. Apparently it works differently than gh.

@dandavison
Copy link
Contributor Author

I force-push to feature branches, so I'm guessing you'll need the --force flag if you're using gh and you previously pulled the branch. Can you try

gh pr checkout --force 6796

Otherwise just the usual vanilla git routine, e.g.

git remote add dandavison git@github.com:dandavison/nushell.git
git fetch dandavison input-output-syntax-shape
git checkout input-output-syntax-shape
git reset --hard dandavison/input-output-syntax-shape

@rgwood
Copy link
Contributor

rgwood commented Nov 7, 2022

--force worked! However, I'm not seeing the types in help commands (after a simple cargo run). Am I missing something obvious?

image

@dandavison
Copy link
Contributor Author

Ah sorry that screenshot dates from an initial draft version. The current version of the PR doesn't add them to the help commands or nu.scope.commands tables yet. (I was thinking of doing the latter first)

@dandavison
Copy link
Contributor Author

I've pushed a commit that adds the signatures in a simple way to the help commands table as formatted strings (with embedded newlines, so a command with multiple signatures takes up multiple lines). However, these "signatures" only show the input type, not the positionals or flags -- this needs more thought, as does the related question of how exactly to add them to $nu.scope.commands.

image

@fdncred
Copy link
Collaborator

fdncred commented Nov 8, 2022

IMO, the input/output type doesn't belong in help commands. I believe help commands came first way back when, as a simple listing of commands that was easily searched, and then we added $nu.scope.commands to add all the nitty gritty details about commands. I think this belongs in the signature block of $nu.scope.commands and if someone is interested enough to look for this, they can do $nu.scope.commands | where name == debug | get signature.0 and see an in => out column.

I think it just adds confusion.
Screenshot 2022-11-08 at 5 57 36 AM

However, I absolutely think the input/output should be part of the help <command_name> text. (see #6754) It may be more important to add it to each command's help than to have it in the rollup of help commands where it's difficult to search.

@sophiajt sophiajt merged commit df94052 into nushell:main Nov 9, 2022
@fdncred
Copy link
Collaborator

fdncred commented Nov 10, 2022

@dandavison I just noticed this issue with our nu_plugin_custom_values plugin. I think 🤔 it's related to this PR. Any thoughts on how to fix it if it is?

Error: 
  × Signature deserialization error
   ╭─[/Users/fdncred/Library/Application Support/nushell/plugin.nu:1:1]
 1 │ register /Users/fdncred/.cargo/bin/nu_plugin_custom_values  {
   · ────┬───
   ·     ╰── unable to deserialize signature: missing field `vectorizes_over_list` at line 25 column 1
 2 │   "name": "custom-value generate",
   ╰────

@dandavison
Copy link
Contributor Author

@fdncred I haven't started using plugins myself but from the error message it looks like some serialized plugin state needs to be regenerated -- can you blow it away and reinstall the plugin?

@fdncred
Copy link
Collaborator

fdncred commented Nov 10, 2022

I did that, rebuilt everything, got the same error registering it but it registered somehow. And I can execute the plugin. I have no idea what's going on.

Oh, it just occurred to me that when the signature changes we have to delete the $nu.plugin-path file. I think your PR changed the signature. I'll try that tomorrow and report back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants