Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify reject algorithm for identical elements #8446

Merged
merged 2 commits into from
Mar 14, 2023

Conversation

tcoratger
Copy link
Contributor

Description

The correction made here concerns the issue #8431. Indeed, the algorithm initially proposed to remove elements of a vector performed a loop with remove and an incident therefore appeared when several values were equal because the deletion was done outside the length of the vector:

let mut found = false;
for (i, col) in cols.clone().iter().enumerate() {
    if col == col_name {
        cols.remove(i);
        vals.remove(i);
        found = true;
    }
}

Then, [[a, a]; [1, 2]] | reject a: gave thread 'main' panicked at 'removal index (is 1) should be < len (is 1)', crates/nu-protocol/src/value/mod.rs:1213:54.

The proposed correction is therefore the implementation of the retain_mut utility dedicated to this functionality.

let mut found = false;
let mut index = 0;
cols.retain_mut(|col| {
    if col == col_name {
        found = true;
        vals.remove(index);
        false
    } else {
        index += 1;
        true
    }
});

@codecov
Copy link

codecov bot commented Mar 14, 2023

Codecov Report

Merging #8446 (3c11558) into main (c7583ec) will increase coverage by 0.37%.
The diff coverage is 100.00%.

❗ Current head 3c11558 differs from pull request most recent head fe11ff4. Consider uploading reports for the commit fe11ff4 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #8446      +/-   ##
==========================================
+ Coverage   68.13%   68.50%   +0.37%     
==========================================
  Files         620      620              
  Lines       99723    99726       +3     
==========================================
+ Hits        67950    68322     +372     
+ Misses      31773    31404     -369     
Impacted Files Coverage Δ
crates/nu-protocol/src/value/mod.rs 56.57% <100.00%> (+0.04%) ⬆️

... and 2 files with indirect coverage changes

@fdncred
Copy link
Collaborator

fdncred commented Mar 14, 2023

Seems reasonable. Thanks!

Copy link
Member

@sholderbach sholderbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for stopping the panic! Would you mind adding a test that reproduces the old panic so we can be sure that we don't run into that again?

@tcoratger
Copy link
Contributor Author

@sholderbach Tests are added :)

@sholderbach
Copy link
Member

Awesome!

@sholderbach sholderbach merged commit 0bd4d27 into nushell:main Mar 14, 2023
sholderbach added a commit to sholderbach/nushell that referenced this pull request Oct 29, 2023
For this we make the assumption that the keys are unique.

Note: this does not hold universally, yet. Implementations in
`nu-protocol` use an all key removal on the basis of retain in certain
places. (e.g. used by `reject` under the hood)

See nushell#8446 for this defensiveness
sholderbach added a commit that referenced this pull request Nov 1, 2023
# Description
Pretty much all operations/commands in Nushell assume that the column
names/keys in a record and thus also in a table (which consists of a
list of records) are unique.
Access through a string-like cell path should refer to a single column
or key/value pair and our output through `table` will only show the last
mention of a repeated column name.

```nu
[[a a]; [1 2]]
╭─#─┬─a─╮
│ 0 │ 2 │
╰───┴───╯
```

While the record parsing already either errors with the
`ShellError::ColumnDefinedTwice` or silently overwrites the first
occurence with the second occurence, the table literal syntax `[[header
columns]; [val1 val2]]` currently still allowed the creation of tables
(and internally records with more than one entry with the same name.

This is not only confusing, but also breaks some assumptions around how
we can efficiently perform operations or in the past lead to outright
bugs (e.g. #8431 fixed by #8446).

This PR proposes to make this an error.
After this change another hole which allowed the construction of records
with non-unique column names will be plugged.

## Parts
- Fix `SE::ColumnDefinedTwice` error code
- Remove previous tests permitting duplicate columns
- Deny duplicate column in table literal eval
- Deny duplicate column in const eval
- Deny duplicate column in `from nuon`

# User-Facing Changes
`[[a a]; [1 2]]` will now return an error:

```
Error: nu::shell::column_defined_twice

  × Record field or table column used twice
   ╭─[entry #2:1:1]
 1 │ [[a a]; [1 2]]
   ·   ┬ ┬
   ·   │ ╰── field redefined here
   ·   ╰── field first defined here
   ╰────
```

this may under rare circumstances block code from evaluating.

Furthermore this makes some NUON files invalid if they previously
contained tables with repeated column names.

# Tests + Formatting
Added tests for each of the different evaluation paths that materialize
tables.
sholderbach added a commit to sholderbach/nushell that referenced this pull request Nov 2, 2023
This breaks two tests

Using `Record::remove` assumes that the keys are all unique. The current
implementation since nushell#8446 uses the less efficient way of going over the
whole record with `.retain` to deal with repeated keys.

Culprit in the example is the table literal
hardfau1t pushed a commit to hardfau1t/nushell that referenced this pull request Dec 14, 2023
# Description
Pretty much all operations/commands in Nushell assume that the column
names/keys in a record and thus also in a table (which consists of a
list of records) are unique.
Access through a string-like cell path should refer to a single column
or key/value pair and our output through `table` will only show the last
mention of a repeated column name.

```nu
[[a a]; [1 2]]
╭─#─┬─a─╮
│ 0 │ 2 │
╰───┴───╯
```

While the record parsing already either errors with the
`ShellError::ColumnDefinedTwice` or silently overwrites the first
occurence with the second occurence, the table literal syntax `[[header
columns]; [val1 val2]]` currently still allowed the creation of tables
(and internally records with more than one entry with the same name.

This is not only confusing, but also breaks some assumptions around how
we can efficiently perform operations or in the past lead to outright
bugs (e.g. nushell#8431 fixed by nushell#8446).

This PR proposes to make this an error.
After this change another hole which allowed the construction of records
with non-unique column names will be plugged.

## Parts
- Fix `SE::ColumnDefinedTwice` error code
- Remove previous tests permitting duplicate columns
- Deny duplicate column in table literal eval
- Deny duplicate column in const eval
- Deny duplicate column in `from nuon`

# User-Facing Changes
`[[a a]; [1 2]]` will now return an error:

```
Error: nu::shell::column_defined_twice

  × Record field or table column used twice
   ╭─[entry nushell#2:1:1]
 1 │ [[a a]; [1 2]]
   ·   ┬ ┬
   ·   │ ╰── field redefined here
   ·   ╰── field first defined here
   ╰────
```

this may under rare circumstances block code from evaluating.

Furthermore this makes some NUON files invalid if they previously
contained tables with repeated column names.

# Tests + Formatting
Added tests for each of the different evaluation paths that materialize
tables.
dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
# Description
Pretty much all operations/commands in Nushell assume that the column
names/keys in a record and thus also in a table (which consists of a
list of records) are unique.
Access through a string-like cell path should refer to a single column
or key/value pair and our output through `table` will only show the last
mention of a repeated column name.

```nu
[[a a]; [1 2]]
╭─#─┬─a─╮
│ 0 │ 2 │
╰───┴───╯
```

While the record parsing already either errors with the
`ShellError::ColumnDefinedTwice` or silently overwrites the first
occurence with the second occurence, the table literal syntax `[[header
columns]; [val1 val2]]` currently still allowed the creation of tables
(and internally records with more than one entry with the same name.

This is not only confusing, but also breaks some assumptions around how
we can efficiently perform operations or in the past lead to outright
bugs (e.g. nushell#8431 fixed by nushell#8446).

This PR proposes to make this an error.
After this change another hole which allowed the construction of records
with non-unique column names will be plugged.

## Parts
- Fix `SE::ColumnDefinedTwice` error code
- Remove previous tests permitting duplicate columns
- Deny duplicate column in table literal eval
- Deny duplicate column in const eval
- Deny duplicate column in `from nuon`

# User-Facing Changes
`[[a a]; [1 2]]` will now return an error:

```
Error: nu::shell::column_defined_twice

  × Record field or table column used twice
   ╭─[entry nushell#2:1:1]
 1 │ [[a a]; [1 2]]
   ·   ┬ ┬
   ·   │ ╰── field redefined here
   ·   ╰── field first defined here
   ╰────
```

this may under rare circumstances block code from evaluating.

Furthermore this makes some NUON files invalid if they previously
contained tables with repeated column names.

# Tests + Formatting
Added tests for each of the different evaluation paths that materialize
tables.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reject panic when given columns with the same name
3 participants