Declare input and output types of commands #6796

dandavison · 2022-10-18T14:34:25Z

This PR adds input_output_types: Vec<(Type, Type)> to the command Signature struct. This is a collection of all the different input-output type signature variants.

The declared input and output types (shapes) are enforced by the test suite: they are checked when executing every example.

There are two main commits:

3b5a6cb sets things up so that we can declare input-output type signatures and verifies them in the tests of examples
f47ba52 is a big commit that adds signatures to all commands that have automatically-tested examples (and adds/changes command examples along the way so that they cause the tests to pass)

The bottom line is that once this merges, we'll have the following rule:

If a command has examples, every example must match one of the input-output type signatures.
If the command has declared input-output type signatures, then every signature variant must have at least one example matching that signature.

Subsequently, we can add this information to the help commands and $nu.scope.commands tables, so that users can query nushell commands by their input and output types.

We can also use this to add signatures to the text help pages, peraps like #6754.

Being able to query commands by type will also provide a way to study the way that builtin commands are named, to check that we're keeping things reasonably consistent (e.g. to help with things like #6744).

Fail the test if any example did not match any input-output type pair
Fail the test if any input-output type pair did not match any example
Check example does appear to be testing the right command
Query: commands accepting table but not record

fdncred · 2022-10-18T14:51:19Z

this looks very cool. nice work dan! can't wait to play with it.

fdncred · 2022-10-18T15:31:38Z

It's clearly known that this is not complete yet but I was trying it out and saw that it was puking on plugins so I wanted to add it to the list of TODOs

Error:
  × Signature deserialization error
   ╭─[C:\Users\username\AppData\Roaming\nushell\plugin.nu:1:1]
 1 │ register C:\Users\username\.cargo\bin\nu_plugin_custom_values.exe  {
   · ────┬───
   ·     ╰── unable to deserialize signature: missing field `input_shape` at line 25 column 1
 2 │   "name": "custom-value generate",
   ╰────

kubouch · 2022-10-19T10:13:05Z

I think this is a good information to add to the help table. My main concern is the table becomes very big. An alternative might be to add it to $nu.scope.commands instead.

Also, the input/output types should be shown in the help text itself when you type help foo, but that's a slightly different topic and can be a separate PR.

merelymyself · 2022-10-19T10:40:33Z

How might this work out when you have a command which has multiple possible outputs? first x returns a list but first returns a value, for instance. Will it be marked as any?

dandavison · 2022-10-19T13:27:39Z

My main concern is the table becomes very big. An alternative might be to add it to $nu.scope.commands instead.

Thanks @kubouch. I agree that we don't want the table to get too big. I think that the input and output type of a command is the most important information after name and usage, and so it should be in both help commands and $nu.scope.commands. However, I think there might be other things we could do to make the help commands table narrower, for example adding a --long flag to hide some extra information (maybe then we could combine help commands and $nu.scope.commands?) Also perhaps we could combine is_plugin is_custom is_keyword into a single enum-like column (I'm thinking they're mutually exclusive?)

 〉help commands | select is_plugin is_custom is_keyword | uniq -c | table --expand
╭────────────────────────────────────────┬───────╮
│                 value                  │ count │
├────────────────────────────────────────┼───────┤
│ ╭───────────┬───────────┬────────────╮ │    78 │
│ │ is_plugin │ is_custom │ is_keyword │ │       │
│ ├───────────┼───────────┼────────────┤ │       │
│ │ false     │ true      │ false      │ │       │
│ ╰───────────┴───────────┴────────────╯ │       │
│ ╭───────────┬───────────┬────────────╮ │   437 │
│ │ is_plugin │ is_custom │ is_keyword │ │       │
│ ├───────────┼───────────┼────────────┤ │       │
│ │ false     │ false     │ false      │ │       │
│ ╰───────────┴───────────┴────────────╯ │       │
│ ╭───────────┬───────────┬────────────╮ │    22 │
│ │ is_plugin │ is_custom │ is_keyword │ │       │
│ ├───────────┼───────────┼────────────┤ │       │
│ │ false     │ false     │ true       │ │       │
│ ╰───────────┴───────────┴────────────╯ │       │
╰────────────────────────────────────────┴───────╯

Also, the input/output types should be shown in the help text itself when you type help foo, but that's a slightly different topic and can be a separate PR.

Definitely, will do that as part of this work.

dandavison · 2022-10-19T14:08:10Z

Thanks for thinking about this @merelymyself;

How might this work out when you have a command which has multiple possible outputs? first x returns a list but first returns a value, for instance. Will it be marked as any?

I would like to suggest that, if we go with this PR, we make return type consistent for a single command, i.e. we no longer permit flags and positional args to alter the return type of a command (some discussion in discord). So in the case of first, it will be changed so that first n no longer exists: that functionality is provided by take n.

I do agree with the discussion between you and @extemporalgenome in #6616 which points out that it's even worse for the return type to alter according to the value (as opposed to presence) of a positional. E.g. the way that first 1 and first 2 were returning different output types (which you fixed in #6616, and which someone will fix for take -- @Dorumin would you like to as you have a patch already?).

Related to that, I am not convinced that we want to retain the way that some commands automatically map over their input, e.g.

 〉[a b] | str upcase | to nuon
[A, B]

The language provides [a b] | each { str upcase } for that, and usage of constructs like each in a pipeline is an absolutely central part of the language, so it must be convenient and discoverable for users to do that, in which case there is only downside to having an unpredictable subset of commands provide the "map-over-input" behavior automagically.

I think that there are lots of advantages to having the return type remain consistent regardless of flags/positionals, as well as input value. The most important might be that it gives us a natural definition of what a nushell command is: it's something that transform this type of input into that type of output. So the rule I'm proposing will answer the question of whether functionality should be exposed as one command vs two separate commands, and help prevent commands from becoming complex and obscure to understand, like transpose (see e.g. @webbedspace's #6735).

One remaining thing that needs to be solved in this PR is how to handle "column paths". These are special positionals that, contrary to the proposed rule, exist specifically for the purpose of altering the output type. So the rule I'm imagining we introduce would be something like:

A nushell command always returns the same output type, for all combinations of flags and positional arguments. However if column paths are supplied, then the output type is the same as the input type.

For anyone who's not familiar with column paths (e.g. me until very recently), they behave like this for the commands that support them:

 〉'a' | str upcase | to nuon
"A"
 〉{'a': 'a', 'b': 'b'} | str upcase 'a' | to nuon
{a: A, b: b}

So concretely, what this PR needs to do, is detect when examples are using a column path, and alter the output type expectation accordingly. Or should the language have a more explicit way of distinguishing column paths from other positional arguments?

fdncred · 2022-10-19T16:33:04Z

maybe then we could combine help commands and $nu.scope.commands ...

I'm not sold on this idea yet but one vote for it is that I'm betting startup would be a little bit faster. I could be wrong but I think $scope is populated on startup, so if that didn't have to happen, nushell should startup faster.

fdncred · 2022-10-19T23:36:24Z

We talked about this PR in our team meeting today. We'd like to move forward with this PR and like the direction it's going. Thanks for your help. I think there are a few things left to do but once we get there we'll land this.

rgwood · 2022-10-20T17:10:23Z

Thanks for the PR!

I noticed that after this change, we would be tracking both types and shapes:

    pub input_type: Type,
    pub output_type: Type,
    pub input_shape: SyntaxShape,
    pub output_shape: SyntaxShape,

Aside: What's the difference between shapes and types?

Types are more specific than shapes (thanks Dan for the reminder). For example, the "Record" type includes field names+types:

 〉{ foo: 1, bar: 2 } | describe
record<foo: int, bar: int>

Shapes are more general; you can say that something has the shape of a record without specifying exactly which fields it has.

OK, back to this PR

I would like to understand whether we need to track both type and shape.

This may come down the intended uses. Input+output types are currently used for enabling command overloading (like how we have 2 to csv commands, and Nu automatically uses the dataframe version if the input is a dataframe). I personally think we should remove overloading (sorry for being a broken record on this), but some people like the feature and as I understand it overloading was the initial motivation for input+output types.

I'm less clear on how shapes will be used (primarily as a UX/documentation improvement?) and unsure whether both shapes+types are needed.

fdncred · 2022-10-20T17:35:23Z

@rgwood We also have another type called FlatShapes that are used in syntax highlighting in the repl. (see syntax_highlighting.rs). So, I wonder if all 3 of these are really necessary?

rgwood · 2022-10-20T18:11:14Z

How might this work out when you have a command which has multiple possible outputs?

I'm also wondering about the inverse of this; does this approach rule out commands with multiple possible input types? I guess those would need to be Any with a switch at runtime?

dandavison · 2022-10-20T19:27:45Z

I'm also wondering about the inverse of this; does this approach rule out commands with multiple possible input types?

The way I'm thinking about this is that we can divide this question up into:

Multiple input types are supported because the command supports cell paths.
Multiple input types are supported but not due to cell paths.

I'm not sure yet how to handle (1). Putting that aside temporarily, what are examples of (2)? Do we even want (2)?

As I mentioned above, I'm aware of

〉[a b] | str upcase | to nuon
[A, B]

and I'm not convinced that it should behave like this since mapping over input is provided by [a b] | each { str upcase }.

fdncred · 2022-10-20T19:41:05Z

what are examples of (2) ...

I think there must be several commands that do this because I recall a match on value type in several places. One example may be sort. The sort command should work with ints, strings, other types? sort-by uses cell paths I believe but it's still similar. I bet there are more use cases, if I'm understanding properly.

fmt - matches on int and filesize

maybe these into commands should be exempt from type because they're really meant to convert one thing to another

into binary - matches on binary and stream
into decimal - matches on string, int, bool
into duration - matches on a few types
into blah ...
skip / take - i think they do something different depending on input, specifically if the input is binary

dandavison · 2022-10-20T19:47:50Z

One example may be sort. Sort should work with ints, strings, other types?

Sort is a bit different -- it works on list<T> for various sortable T, and that's as expected.

However, it does also seem to accept an integer or a string, but it's not immediately obvious to me why it accepts those since it doesn't sort them in any sense:

〉'ba za' | sort
ba za
〉7 | sort
7

webbedspace · 2022-10-27T16:26:47Z

Problem I just found with this PR: input and output should be on $nu.scope.commands too, because that's supposed to be a superset of help commands's output.

rgwood · 2022-10-28T16:33:02Z

I noticed that after this change, we would be tracking both types and shapes:

After giving this more thought, I think this redundancy will be temporary and doesn't need to be addressed in this PR. We haven't decided on the future of command overloading, but:

If we remove command overloading: we remove input/output types
If we keep command overloading: we can probably rewrite it to use shapes instead of types

dandavison · 2022-10-30T14:45:24Z

We haven't decided on the future of command overloading, but:
If we remove command overloading: we remove input/output types

@rgwood I think that the "cell paths" feature in nushell is a form of command overloading, so in that sense we won't remove command overloading. Would you agree with that?

dandavison · 2022-10-30T14:49:10Z

The current approach I'm taking here is:

Add a field operates_on_cell_paths to Signature (currently that's a new field on the struct, but actually I think we may be able to compute this by inspecting the rest positionals)
Allow input and output shapes to be declared as a map, allowing for the possibility of multiple input shapes (but each must map consistently to a single output shape). This currently handles commands that automatically map over a list, but as I've said elsewhere I'm not sure any commands should do that (that's what each is for).
If an example input does not match a declared input shape, and if operates_on_cell_paths=true, then require that the input shape is record|list|table and assert that the output shape is the same as input shape.

rgwood · 2022-10-31T15:19:35Z

@rgwood I think that the "cell paths" feature in nushell is a form of command overloading

I’m not sure I understand, can you give an example?

fdncred · 2022-10-31T15:39:51Z

I didn't really follow how cell-paths are a form of overloading either? I just see the traditional nushell concept where we have the same command name in several contexts like nushell, dfr, db. The other overloading I'm familiar with is c# method overloading where you can have the same method name that take different inputs. I'm not sure that fits here either though, does it?

dandavison · 2022-10-31T15:48:08Z

I’m not sure I understand, can you give an example?

AIUI "command overloading" means that the "same command" exists in a number of variant forms which differ in the number and types of inputs they accept, and in the type of output that a given input signature produces.

So if we take str upcase as an example. At its core, it is simply

〉'a' | str upcase
A

This form has the type (String) => String.

But as with many nushell commands there is a second variant:

〉[{k1: a k2: b}] | str upcase k1
╭────┬────╮
│ k1 │ k2 │
├────┼────┤
│ A  │ b  │
╰────┴────╯

Here 'k1' is a "cell path". This form has the type (Table, String) => Table (I'm using the first slot in the type signature to refer to input, and subsequent slots to refer to positional arguments).

So, it seems that the cell path feature is a form of command overloading.

And cell-paths actually give rise to a third variant (Record, String) => Record:

〉{k1: a k2: b} | str upcase k1
╭────┬───╮
│ k1 │ A │
│ k2 │ b │
╰────┴───╯

Incidentally, there also another form which many nushell commands have: List<String> => List<String> which I am calling "vectorize-over-list":

〉[a b] | str upcase
╭───╮
│ A │
│ B │
╰───╯

I'm thinking that all these polymorphisms are examples of "command overloading". If so, then since cell paths seem well established in the language, it seems that command overloading isn't going away. And the cell paths are very useful I think -- addressing and operating on deeply nested cells is not something all languages make easy.

The vectorization behavior also seems well-established, but it's less obvious that it should be since nushell also has each and par-each.

Incidentally, I've suggested an alternative way that we could expose cell paths:

〉[{k1: a k2: b}] | at k1 | str upcase
╭────┬────╮
│ k1 │ k2 │
├────┼────┤
│ A  │ b  │
╰────┴────╯

This would also be command overloading: the variant of str upcase dispatched to in this case would have the type

(TableAugmentedWithCellPath) => Table

or maybe

(TableAugmentedWithCellPath) => TableAugmentedWithCellPath

dandavison · 2022-11-07T03:43:42Z

It's clearly known that this is not complete yet but I was trying it out and saw that it was puking on plugins so I wanted to add it to the list of TODOs

@fdncred This sounds like a good catch. I haven't done anything about it yet (would anyone be up for doing that?)

sophiajt · 2022-11-07T04:46:23Z

Let's wait to land until after this week's release. That hopefully gives us time to fix the plugins and then we can land it with a few weeks to test it until the next release.

rgwood · 2022-11-07T05:10:58Z

Hmm, I'm having trouble checking out this PR locally:

The commit history for the PR looks a little strange (seems to have some unrelated commits), not sure if that's related.

- Enforce them in tests

dandavison · 2022-11-07T11:42:34Z

@rgwood thanks I've rebase on main now. (I do force-push to my feature branches -- not sure if that causes a problem with the gh tool if you've previously checked out the branch?)

fdncred · 2022-11-07T14:47:27Z

I was going to try this morning too but can't. Maybe there's something wonky with gh?

gh pr checkout 6796
remote: Enumerating objects: 296, done.
remote: Counting objects: 100% (296/296), done.
remote: Compressing objects: 100% (173/173), done.
Receiving objects: 100% (296/296), 342.87 KiB | 1.52 MiB/s, done.d 0

Resolving deltas: 100% (161/161), completed with 5 local objects.
From https://github.com/nushell/nushell
 ! [rejected]            refs/pull/6796/head -> input-output-syntax-shape  (non-fast-forward)
exit status 1

Update: I used the "Github Pull Requests and Issues" plugin and was able to check it out. Apparently it works differently than gh.

dandavison · 2022-11-07T14:54:04Z

I force-push to feature branches, so I'm guessing you'll need the --force flag if you're using gh and you previously pulled the branch. Can you try

gh pr checkout --force 6796

Otherwise just the usual vanilla git routine, e.g.

git remote add dandavison git@github.com:dandavison/nushell.git
git fetch dandavison input-output-syntax-shape
git checkout input-output-syntax-shape
git reset --hard dandavison/input-output-syntax-shape

rgwood · 2022-11-07T21:12:52Z

--force worked! However, I'm not seeing the types in help commands (after a simple cargo run). Am I missing something obvious?

dandavison · 2022-11-07T21:49:29Z

Ah sorry that screenshot dates from an initial draft version. The current version of the PR doesn't add them to the help commands or nu.scope.commands tables yet. (I was thinking of doing the latter first)

dandavison · 2022-11-08T04:18:44Z

I've pushed a commit that adds the signatures in a simple way to the help commands table as formatted strings (with embedded newlines, so a command with multiple signatures takes up multiple lines). However, these "signatures" only show the input type, not the positionals or flags -- this needs more thought, as does the related question of how exactly to add them to $nu.scope.commands.

fdncred · 2022-11-08T12:13:11Z

IMO, the input/output type doesn't belong in help commands. I believe help commands came first way back when, as a simple listing of commands that was easily searched, and then we added $nu.scope.commands to add all the nitty gritty details about commands. I think this belongs in the signature block of $nu.scope.commands and if someone is interested enough to look for this, they can do $nu.scope.commands | where name == debug | get signature.0 and see an in => out column.

I think it just adds confusion.

However, I absolutely think the input/output should be part of the help <command_name> text. (see #6754) It may be more important to add it to each command's help than to have it in the rollup of help commands where it's difficult to search.

fdncred · 2022-11-10T01:08:31Z

@dandavison I just noticed this issue with our nu_plugin_custom_values plugin. I think 🤔 it's related to this PR. Any thoughts on how to fix it if it is?

Error: 
  × Signature deserialization error
   ╭─[/Users/fdncred/Library/Application Support/nushell/plugin.nu:1:1]
 1 │ register /Users/fdncred/.cargo/bin/nu_plugin_custom_values  {
   · ────┬───
   ·     ╰── unable to deserialize signature: missing field `vectorizes_over_list` at line 25 column 1
 2 │   "name": "custom-value generate",
   ╰────

dandavison · 2022-11-10T01:59:33Z

@fdncred I haven't started using plugins myself but from the error message it looks like some serialized plugin state needs to be regenerated -- can you blow it away and reinstall the plugin?

fdncred · 2022-11-10T02:06:29Z

I did that, rebuilt everything, got the same error registering it but it registered somehow. And I can execute the plugin. I have no idea what's going on.

Oh, it just occurred to me that when the signature changes we have to delete the $nu.plugin-path file. I think your PR changed the signature. I'll try that tomorrow and report back.

sholderbach marked this pull request as draft October 18, 2022 14:43

dandavison mentioned this pull request Oct 20, 2022

help pages should unambiguously state the type signatures of each command #6754

Closed

dandavison mentioned this pull request Oct 23, 2022

Shouldn't split column and split chars be str subcommands? #6744

Open

sholderbach mentioned this pull request Oct 28, 2022

Refactor: introduce general operate commands to reduce duplicate code #6879

Merged

7 tasks

dandavison force-pushed the input-output-syntax-shape branch from 94f147a to 3403d0c Compare October 29, 2022 15:14

dandavison force-pushed the input-output-syntax-shape branch 4 times, most recently from 42ba0e5 to 4ca8c68 Compare October 31, 2022 04:14

dandavison marked this pull request as ready for review November 7, 2022 03:45

dandavison added 4 commits November 7, 2022 06:37

Add failing test that list of ints and floats is List<Number>

912ffa3

Start defining subtype relation

223826f

Make it possible to declare input and output types for commands

16d48b3

- Enforce them in tests

Declare input and output types of commands

15b2e83

dandavison force-pushed the input-output-syntax-shape branch from c975f60 to 15b2e83 Compare November 7, 2022 11:38

dandavison changed the title ~~Suggestion: input and output syntax shapes~~ Declare input and output types of commands Nov 7, 2022

Add formatted signatures to help commands table

7400a4a

dandavison added 2 commits November 8, 2022 08:39

Revert SyntaxShape::Table -> Type::Table change

96456e7

Revert unnecessary derive(Hash) on SyntaxShape

fcd6a03

dandavison mentioned this pull request Nov 9, 2022

Replace all instances of 'column path' in help messages with 'cell path' #7063

Merged

Merge branch 'main' into input-output-syntax-shape

becfd33

sophiajt merged commit df94052 into nushell:main Nov 9, 2022

rgwood mentioned this pull request Nov 9, 2022

Fix CI failures after PR merge conflicts #7072

Merged

hbt mentioned this pull request Nov 11, 2022

into datetime not parsed correctly when testing examples #7032

Closed

sholderbach mentioned this pull request Dec 2, 2022

Missing type information for builtins to be filled out #7320

Closed

94 tasks

sholderbach mentioned this pull request Jul 19, 2023

Signature::vectorizes_over_list not useful at runtime only relevant to test. #9734

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Declare input and output types of commands #6796

Declare input and output types of commands #6796

dandavison commented Oct 18, 2022 •

edited

Loading

fdncred commented Oct 18, 2022

fdncred commented Oct 18, 2022

kubouch commented Oct 19, 2022

merelymyself commented Oct 19, 2022

dandavison commented Oct 19, 2022

dandavison commented Oct 19, 2022 •

edited

Loading

fdncred commented Oct 19, 2022

fdncred commented Oct 19, 2022

rgwood commented Oct 20, 2022

fdncred commented Oct 20, 2022

rgwood commented Oct 20, 2022

dandavison commented Oct 20, 2022

fdncred commented Oct 20, 2022 •

edited

Loading

dandavison commented Oct 20, 2022

webbedspace commented Oct 27, 2022 •

edited

Loading

rgwood commented Oct 28, 2022

dandavison commented Oct 30, 2022

dandavison commented Oct 30, 2022 •

edited

Loading

rgwood commented Oct 31, 2022

fdncred commented Oct 31, 2022

dandavison commented Oct 31, 2022 •

edited

Loading

dandavison commented Nov 7, 2022

sophiajt commented Nov 7, 2022

rgwood commented Nov 7, 2022

dandavison commented Nov 7, 2022

fdncred commented Nov 7, 2022 •

edited

Loading

dandavison commented Nov 7, 2022

rgwood commented Nov 7, 2022 •

edited

Loading

dandavison commented Nov 7, 2022

dandavison commented Nov 8, 2022

fdncred commented Nov 8, 2022

fdncred commented Nov 10, 2022

dandavison commented Nov 10, 2022

fdncred commented Nov 10, 2022

Declare input and output types of commands #6796

Declare input and output types of commands #6796

Conversation

dandavison commented Oct 18, 2022 • edited Loading

fdncred commented Oct 18, 2022

fdncred commented Oct 18, 2022

kubouch commented Oct 19, 2022

merelymyself commented Oct 19, 2022

dandavison commented Oct 19, 2022

dandavison commented Oct 19, 2022 • edited Loading

fdncred commented Oct 19, 2022

fdncred commented Oct 19, 2022

rgwood commented Oct 20, 2022

Aside: What's the difference between shapes and types?

OK, back to this PR

fdncred commented Oct 20, 2022

rgwood commented Oct 20, 2022

dandavison commented Oct 20, 2022

fdncred commented Oct 20, 2022 • edited Loading

dandavison commented Oct 20, 2022

webbedspace commented Oct 27, 2022 • edited Loading

rgwood commented Oct 28, 2022

dandavison commented Oct 30, 2022

dandavison commented Oct 30, 2022 • edited Loading

rgwood commented Oct 31, 2022

fdncred commented Oct 31, 2022

dandavison commented Oct 31, 2022 • edited Loading

dandavison commented Nov 7, 2022

sophiajt commented Nov 7, 2022

rgwood commented Nov 7, 2022

dandavison commented Nov 7, 2022

fdncred commented Nov 7, 2022 • edited Loading

dandavison commented Nov 7, 2022

rgwood commented Nov 7, 2022 • edited Loading

dandavison commented Nov 7, 2022

dandavison commented Nov 8, 2022

fdncred commented Nov 8, 2022

fdncred commented Nov 10, 2022

dandavison commented Nov 10, 2022

fdncred commented Nov 10, 2022

dandavison commented Oct 18, 2022 •

edited

Loading

dandavison commented Oct 19, 2022 •

edited

Loading

fdncred commented Oct 20, 2022 •

edited

Loading

webbedspace commented Oct 27, 2022 •

edited

Loading

dandavison commented Oct 30, 2022 •

edited

Loading

dandavison commented Oct 31, 2022 •

edited

Loading

fdncred commented Nov 7, 2022 •

edited

Loading

rgwood commented Nov 7, 2022 •

edited

Loading