accounting for columns #9

danielecook · 2015-10-31T22:27:44Z

It doesn't look like you can run diffs when you've added or removed a column from a CSV, unless I am missing something. Perhaps this would be useful to implement? (Happy to help!)

larsyencken · 2015-11-02T21:27:46Z

Hi Daniel, well picked up! It's assumed both files have the same columns.

If they didn't, then I wonder how we'd want to represent it in the diff. Perhaps we could say that values changed from None to something, for every row (or vice versa).

Do you have a use case yourself for this?

danielecook · 2015-11-02T21:45:47Z

More just kind of playing around. The biggest case that I can think of would be where someone adds or removes a column in a CSV file (or renames one). I wound up including the module and made a few minor changes (I hope you don't mind!) in a tool I am developing. It's a little utility for viewing CSV diffs in git repos:

Right now it's not very sophisticated. In the screenshot there, it's treating a column rename as the addition and removal of a bunch of cells.

larsyencken · 2015-11-03T22:39:43Z

Nice! I think treating column renames as addition and removal wouldn't be too difficult. Detecting renames automatically might be more expensive though.
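
To illustrate why automatic rename detection costs more than treating a rename as a removal plus an addition, here is a rough sketch. It is not csvdiff code: `detect_renames` is a hypothetical helper, rows are assumed to be plain dicts (as `csv.DictReader` would produce), and both files are assumed to hold the same rows in the same order. A removed column is paired with an added column only when their values match in every row, so the work grows with removed columns × added columns × rows.

```python
def detect_renames(lhs_rows, rhs_rows):
    """Guess column renames between two lists of row dicts.

    Hypothetical helper: assumes both lists hold the same rows in the
    same order, and that every row on a side has the same columns.
    """
    if not lhs_rows or not rhs_rows or len(lhs_rows) != len(rhs_rows):
        return {}
    removed = [c for c in lhs_rows[0] if c not in rhs_rows[0]]
    added = [c for c in rhs_rows[0] if c not in lhs_rows[0]]
    renames = {}
    for old in removed:
        old_values = [row[old] for row in lhs_rows]
        for new in added:
            if new in renames.values():
                continue  # already matched to another removed column
            # Full column comparison: this is the expensive part.
            if old_values == [row[new] for row in rhs_rows]:
                renames[old] = new
                break
    return renames


lhs_rows = [{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Grace'}]
rhs_rows = [{'id': '1', 'full_name': 'Ada'}, {'id': '2', 'full_name': 'Grace'}]
print(detect_renames(lhs_rows, rhs_rows))  # {'name': 'full_name'}
```

A column that was renamed and edited in the same change would not be matched by this check, and would still show up as a removal plus an addition.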

I don't have time to put together a patch this week, but contributions are more than welcome.

danielecook · 2015-11-03T23:13:59Z

Great - I'm a little flooded as well, but I'll see if I can find some time in the next few weeks. In terms of the patch data structure, how would you modify it to account for added / removed columns? Setting the "to" to None?

larsyencken · 2015-11-09T21:05:03Z

Yeah, I reckon added and deleted columns go to and from None for every row,
and let's say you can't change the index columns for now. That keeps the
patch format basically the same.

Thinking about it, I've realised that the whole project is doing a
row-based diff, which is why operations on whole columns aren't so natural.
But row-level is pretty useful for tons of applications.
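
A minimal sketch of the "to and from None" idea, for illustration only. `record_changes` is a hypothetical function, not csvdiff's actual `record_diff`, and the `'from'` / `'to'` field names are assumed from the wording of this thread. A column missing on one side is read as None, so the per-row change format stays the same.

```python
def record_changes(lhs, rhs):
    """Return {field: {'from': old, 'to': new}} for fields that differ.

    Hypothetical sketch: a column missing from one side is treated as
    None there, so an added column appears as None -> value and a
    removed column as value -> None.
    """
    changes = {}
    # Union of columns: lhs order first, then columns new in rhs.
    fields = list(lhs) + [f for f in rhs if f not in lhs]
    for field in fields:
        old = lhs.get(field)  # .get() avoids a KeyError on missing columns
        new = rhs.get(field)
        if old != new:
            changes[field] = {'from': old, 'to': new}
    return changes


before = {'id': '1', 'name': 'Ada', 'age': '36'}
after = {'id': '1', 'name': 'Ada', 'email': 'ada@example.com'}
print(record_changes(before, after))
# {'age': {'from': '36', 'to': None}, 'email': {'from': None, 'to': 'ada@example.com'}}
```

Because values read from a CSV are strings, None cannot be confused with an empty cell, which would be the empty string.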

larsyencken · 2017-05-08T21:42:32Z

Gonna close this, and accept that we're row-based instead of column-based.

friederschueler · 2018-04-20T11:54:52Z

Hi,
I would like to reopen this ticket as currently csvdiff will raise an exception (KeyError in patch.py, record_diff, line 264) when your rhs (new file) has a column that does not exist in the lhs (base file) and vice versa.

I am using csvdiff to analyze the output of some database tests and there are some rare occasions where columns will be added, removed or renamed. As I am only comparing files I don't need a patch file to convert my files.

I was thinking about checking the header line of the CSV for added and removed columns (renamed columns are removed and added under a new name) and, if there are any changes, just skipping the csvdiff analysis completely. But then I discovered that with only a little rewriting you can fix the missing key error, and the output is exactly what I was looking for.
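
The header check described above could look roughly like this. It is a sketch using only the standard `csv` module; `column_changes` is a hypothetical name and is not part of csvdiff or of the pull request.

```python
import csv
import io


def column_changes(lhs_file, rhs_file):
    """Compare the header rows of two open CSV files.

    Returns (added, removed): columns only in rhs, and columns only in
    lhs. A renamed column appears in both lists, as the old name removed
    and the new name added.
    """
    lhs_header = next(csv.reader(lhs_file), [])
    rhs_header = next(csv.reader(rhs_file), [])
    added = [c for c in rhs_header if c not in lhs_header]
    removed = [c for c in lhs_header if c not in rhs_header]
    return added, removed


# In-memory files stand in for the base (lhs) and new (rhs) CSVs.
lhs = io.StringIO("id,name,age\n1,Ada,36\n")
rhs = io.StringIO("id,full_name,age,email\n1,Ada,36,ada@example.com\n")
added, removed = column_changes(lhs, rhs)
print(added, removed)  # ['full_name', 'email'] ['name']
```

If both lists come back empty, the files share the same columns and the row-level comparison can proceed as usual.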

I accept that csvdiff is row-based, but there still shouldn't be a Python error when you compare files with different columns. What do you think?

I opened pull request #34, and so far all the tests on the CI server still pass 😀

halsafar · 2019-06-25T18:49:49Z

Just ran into the same KeyError as @friederschueler explains. I understand the solution he proposes might not apply but a Python traceback is hardly a graceful way to die.

catarak mentioned this issue Mar 16, 2017

csvdiff is broken if you change number of columns BarnesFoundation/barnes-tms-extract#9

Closed

larsyencken closed this as completed May 8, 2017

friederschueler mentioned this issue Apr 20, 2018

Added a quick fix for comparing files with added, removed or changed columns #34

Closed
