This repository has been archived by the owner on Feb 18, 2021. It is now read-only.

accounting for columns #9

Closed
danielecook opened this issue Oct 31, 2015 · 8 comments

Comments

@danielecook

It doesn't look like you can execute diffs when you've added or removed a column from a csv unless I am missing something. Perhaps this would be useful to implement? (Happy to help!)

@larsyencken
Owner

Hi Daniel, well picked up! It's assumed both files have the same columns.

If they didn't, then I wonder how we'd want to represent it in the diff. Perhaps we could say that values changed from None to something, for every row (or vice versa).

Do you have a use case yourself for this?

@danielecook
Author

More just kind of playing around. The biggest case that I can think of would be where someone adds or removes a column in a CSV file (or renames one). I wound up including the module, with a few minor changes (I hope you don't mind!), in a tool I am developing. It's a little utility for viewing CSV diffs in git repos:

[Screenshot of the utility, taken 2015-11-02 at 3:37:45 pm]

Right now it's not very sophisticated. In the screenshot, it's treating a column rename as the addition and removal of a bunch of cells.

@larsyencken
Owner

Nice! I think treating column renames as addition and removal wouldn't be too difficult. Detecting renames automatically might be more expensive though.
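
To give a feel for why automatic detection costs more, here is a rough sketch, not anything csvdiff does today. It assumes rows are held as dicts keyed by an index value, and that a rename means a removed column and an added column carry identical values for every shared row. The function name `guess_renames` and the data layout are hypothetical.

```python
def guess_renames(lhs_rows, rhs_rows, removed, added):
    """Pair each removed column with an added column holding identical values.

    lhs_rows and rhs_rows map an index key to a row dict (an assumed layout).
    """
    shared_keys = lhs_rows.keys() & rhs_rows.keys()
    renames = {}
    for old in removed:
        for new in added:
            if new in renames.values():
                continue  # this added column is already matched
            # Every shared row has to be compared, which is the expensive part.
            if shared_keys and all(
                lhs_rows[k][old] == rhs_rows[k][new] for k in shared_keys
            ):
                renames[old] = new
                break
    return renames


lhs = {"1": {"id": "1", "name": "alice"}, "2": {"id": "2", "name": "bob"}}
rhs = {"1": {"id": "1", "full_name": "alice"}, "2": {"id": "2", "full_name": "bob"}}
renames = guess_renames(lhs, rhs, removed=["name"], added=["full_name"])
```

In the worst case the work grows with (removed columns) x (added columns) x (rows), and a rename combined with edited values would not be caught at all.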

I don't have time to put together a patch this week, but contributions are more than welcome.

@danielecook
Author

Great - I'm a little flooded as well, but I'll see if I can find some time in the next few weeks. In terms of the patch data structure, how would you modify it to account for added / removed columns? Setting the "to" to None?

@larsyencken
Owner

Yeah, I reckon added and deleted columns go to and from None for every row,
and let's say you can't change the index columns for now. That keeps the
patch format basically the same.
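
A minimal sketch of that idea, assuming each row is a dict and each changed field is recorded as a "from"/"to" pair. The function name `record_diff_with_none` and the exact shape of the result are illustrative assumptions, not the project's actual code.

```python
def record_diff_with_none(lhs, rhs):
    """Diff two row dicts, treating a column missing on one side as None."""
    delta = {}
    # Walk the union of column names so neither side can raise a KeyError.
    for field in sorted(set(lhs) | set(rhs)):
        old, new = lhs.get(field), rhs.get(field)
        if old != new:
            delta[field] = {"from": old, "to": new}
    return delta


old_row = {"id": "1", "name": "alice", "age": "30"}
new_row = {"id": "1", "name": "alice", "email": "a@example.com"}
changes = record_diff_with_none(old_row, new_row)
```

Here the removed "age" column goes from "30" to None and the added "email" column goes from None to its new value. One trade-off of this approach: a column that is absent cannot be told apart from a column that is present but holds None.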

Thinking about it, I've realised that the whole project is doing a
row-based diff, which is why operations on whole columns aren't so natural.
But row-level is pretty useful for tons of applications.


@larsyencken
Owner

Gonna close this, and accept that we're row-based instead of column-based.

@friederschueler

friederschueler commented Apr 20, 2018

Hi,
I would like to reopen this ticket, as csvdiff currently raises an exception (KeyError in patch.py, record_diff, line 264) when the rhs (new file) has a column that does not exist in the lhs (base file), or vice versa.

I am using csvdiff to analyze the output of some database tests and there are some rare occasions where columns will be added, removed or renamed. As I am only comparing files I don't need a patch file to convert my files.

I was thinking about checking the header line of the CSV for added and removed columns (renamed columns are removed and added under a new name) and, if there are any changes, just skipping the csvdiff analysis completely. But then I discovered that, with only a little rewriting, you can fix the missing-key error, and the output is exactly what I was looking for.
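
For reference, the header check described above could look roughly like this. It is only a sketch using the standard `csv` module, and the helper name `column_changes` is made up for illustration.

```python
import csv
import io


def column_changes(lhs_stream, rhs_stream):
    """Return (added, removed) column names by comparing the two header rows."""
    lhs_cols = next(csv.reader(lhs_stream), [])
    rhs_cols = next(csv.reader(rhs_stream), [])
    added = [c for c in rhs_cols if c not in lhs_cols]
    removed = [c for c in lhs_cols if c not in rhs_cols]
    return added, removed


base = io.StringIO("id,name,age\n1,alice,30\n")
new = io.StringIO("id,full_name,age,email\n1,alice,30,a@example.com\n")
added, removed = column_changes(base, new)
```

With these inputs a rename shows up as one removed column ("name") plus one added column ("full_name"), alongside the genuinely new "email" column.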

I accept that csvcompare is row-based, but there still shouldn't be a Python error when you compare files with different columns. What do you think?

I opened pull request #34, and so far all the tests on the CI server still pass 😀

@halsafar

Just ran into the same KeyError as @friederschueler explains. I understand the solution he proposes might not apply but a Python traceback is hardly a graceful way to die.


4 participants