Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can list changes be more intelligent #97

Closed
ctmay4 opened this issue Feb 9, 2015 · 16 comments
Closed

Can list changes be more intelligent #97

ctmay4 opened this issue Feb 9, 2015 · 16 comments
Assignees

Comments

@ctmay4
Copy link

@ctmay4 ctmay4 commented Feb 9, 2015

I have a an entity with a List<List> that I am tracking changes on. I just tested an Object which has 837 List object

commit 3.0, author: me@here.com, Feb 9, 2015 2:25:48 PM
  changed object: StagingTable/54d909bb102cbbfdf54855ea
    list changed on 'rawRows' property: [
        (10).'[T0, N1, M0, SX, VALUE:IINOS]'>>'[T0, N1, M0, ERROR, VALUE:ERROR]', 
        (11).'[T0, N1, M0, ERROR, VALUE:ERROR]'>>'[T0, N2, M0, S0, VALUE:IIB]', 
        (12).'[T0, N2, M0, S0, VALUE:IIB]'>>'[T0, N2, M0, S1, VALUE:IIB]', 
        (13).'[T0, N2, M0, S1, VALUE:IIB]'>>'[T0, N2, M0, S2, VALUE:IIIB]', 
        (14).'[T0, N2, M0, S2, VALUE:IIIB]'>>'[T0, N2, M0, S3, VALUE:IIIC]', 
        (15).'[T0, N2, M0, S3, VALUE:IIIC]'>>'[T0, N2, M0, SX, VALUE:IINOS]', 
        (16).'[T0, N2, M0, SX, VALUE:IINOS]'>>'[T0, N2, M0, ERROR, VALUE:ERROR]',
        (17).'[T0, N2, M0, ERROR, VALUE:ERROR]'>>'[T0, N3, M0, S0, VALUE:IIC]', 
        (18).'[T0, N3, M0, S0, VALUE:IIC]'>>'[T0, N3, M0, S1, VALUE:IIC]', 
        (19).'[T0, N3, M0, S1, VALUE:IIC]'>>'[T0, N3, M0, S2, VALUE:IIIB]', 
        ...
        <SNIP>
        ...
        (837).'[TX, NX, M1NOS, S3, VALUE:IIINOS]'>>'[TX, NX, M1NOS, SX, VALUE:IIINOS]',
        (838).'[TX, NX, M1NOS, SX, VALUE:IIINOS]'>>'[TX, NX, M1NOS, ERROR, VALUE:ERROR]',
        (839).removed:'[TX, NX, M1NOS, ERROR, VALUE:ERROR]'
    ]

So removing a single line from the List causes an audit trail of 800+ entries. Instead, wouldn't it be better to have this be the result:

commit 3.0, author: me@here.com, Feb 9, 2015 2:25:48 PM
  changed object: StagingTable/54d909bb102cbbfdf54855ea
    list changed on 'rawRows' property: [
        (10).removed:'[T0, N1, M0, SX, VALUE:IINOS]'
    ]

For example, if I put the raw JSON out and delete the line, a command-line diff looks like this:

$ diff before.json after.json
44d43
<         [ "T0", "N1", "M0", "S3", "VALUE:IIIC" ],

Not sure how easy this would be to accomplish, but I do worry that if small changes are producing so much noise, what if I had a table with a very large list. The growth of the changelog would be too quick.

@bartoszwalacik
Copy link
Member

@bartoszwalacik bartoszwalacik commented Feb 9, 2015

well, this is the issue concerning nested lists?
I'm aware that JaVers has only basic support for these structures.
You are right that it could produce noisy diffs.

The problem is that I don't know how to model these nested changes in the terms of Change.class
(this is a kind of core JaVers concept).
Maybe you have some suggestions?
What if some user creates Set<List<String>> or List<Set<String>>? It's not possible to cover all corner cases...

I think that the better way to solving this problem is to alter the data model and get rid of nested lists. I know that it could be painfull but honestly I don't have good idea how to model this kind of Changes. Take a look at Change.class hierarchy and maybe you come up with some solution.

@ctmay4
Copy link
Author

@ctmay4 ctmay4 commented Feb 11, 2015

I don't think altering the data model would help. I put together another Gist which I track changes on a List to get the nesting out of the test. The results were the same.

https://gist.github.com/ctmay4/1a328c800b459101d1c4

I basically create list of 100 Strings. Then I remove the 5th element and add new new elements at the 90th position.

For JaVers, I get the following output:

commit 2.0, author: me@here.com, Feb 11, 2015 2:54:32 PM
  changed object: com.imsweb.seerapi.DiffTesting$TestObject/1
    list changed on '_table' property: [(5).'A5'>>'A6', (6).'A6'>>'A7', (7).'A7'>>'A8', (8).'A8'>>'A9', (9).'A9'>>'A10', (10).'A10'>>'A11', (11).'A11'>>'A12', (12).'A12'>>'A13', (13).'A13'>>'A14', (14).'A14'>>'A15', (15).'A15'>>'A16', (16).'A16'>>'A17', (17).'A17'>>'A18', (18).'A18'>>'A19', (19).'A19'>>'A20', (20).'A20'>>'A21', (21).'A21'>>'A22', (22).'A22'>>'A23', (23).'A23'>>'A24', (24).'A24'>>'A25', (25).'A25'>>'A26', (26).'A26'>>'A27', (27).'A27'>>'A28', (28).'A28'>>'A29', (29).'A29'>>'A30', (30).'A30'>>'A31', (31).'A31'>>'A32', (32).'A32'>>'A33', (33).'A33'>>'A34', (34).'A34'>>'A35', (35).'A35'>>'A36', (36).'A36'>>'A37', (37).'A37'>>'A38', (38).'A38'>>'A39', (39).'A39'>>'A40', (40).'A40'>>'A41', (41).'A41'>>'A42', (42).'A42'>>'A43', (43).'A43'>>'A44', (44).'A44'>>'A45', (45).'A45'>>'A46', (46).'A46'>>'A47', (47).'A47'>>'A48', (48).'A48'>>'A49', (49).'A49'>>'A50', (50).'A50'>>'A51', (51).'A51'>>'A52', (52).'A52'>>'A53', (53).'A53'>>'A54', (54).'A54'>>'A55', (55).'A55'>>'A56', (56).'A56'>>'A57', (57).'A57'>>'A58', (58).'A58'>>'A59', (59).'A59'>>'A60', (60).'A60'>>'A61', (61).'A61'>>'A62', (62).'A62'>>'A63', (63).'A63'>>'A64', (64).'A64'>>'A65', (65).'A65'>>'A66', (66).'A66'>>'A67', (67).'A67'>>'A68', (68).'A68'>>'A69', (69).'A69'>>'A70', (70).'A70'>>'A71', (71).'A71'>>'A72', (72).'A72'>>'A73', (73).'A73'>>'A74', (74).'A74'>>'A75', (75).'A75'>>'A76', (76).'A76'>>'A77', (77).'A77'>>'A78', (78).'A78'>>'A79', (79).'A79'>>'A80', (80).'A80'>>'A81', (81).'A81'>>'A82', (82).'A82'>>'A83', (83).'A83'>>'A84', (84).'A84'>>'A85', (85).'A85'>>'A86', (86).'A86'>>'A87', (87).'A87'>>'A88', (88).'A88'>>'A89', (89).'A89'>>'A90', (90).'A90'>>'INSERT_A1', (91).'A91'>>'INSERT_A2', (92).'A92'>>'A91', (93).'A93'>>'A92', (94).'A94'>>'A93', (95).'A95'>>'A94', (96).'A96'>>'A95', (97).'A97'>>'A96', (98).'A98'>>'A97', (99).'A99'>>'A98', (100).added:'A99']
commit 1.0, author: me@here.com, Feb 11, 2015 2:54:32 PM
    new object: com.imsweb.seerapi.DiffTesting$TestObject/1

It is still listing almost every row since it doesn't "detect" a deletion.

My gist also includes an example from java-object-diff. Here is the output:

/ changed from com.imsweb.seerapi.DiffTesting$TestObject@d041cf to com.imsweb.seerapi.DiffTesting$TestObject@129a8472
/table changed from [A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25, A26, A27, A28, A29, A30, A31, A32, A33, A34, A35, A36, A37, A38, A39, A40, A41, A42, A43, A44, A45, A46, A47, A48, A49, A50, A51, A52, A53, A54, A55, A56, A57, A58, A59, A60, A61, A62, A63, A64, A65, A66, A67, A68, A69, A70, A71, A72, A73, A74, A75, A76, A77, A78, A79, A80, A81, A82, A83, A84, A85, A86, A87, A88, A89, A90, A91, A92, A93, A94, A95, A96, A97, A98, A99] to [A0, A1, A2, A3, A4, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25, A26, A27, A28, A29, A30, A31, A32, A33, A34, A35, A36, A37, A38, A39, A40, A41, A42, A43, A44, A45, A46, A47, A48, A49, A50, A51, A52, A53, A54, A55, A56, A57, A58, A59, A60, A61, A62, A63, A64, A65, A66, A67, A68, A69, A70, A71, A72, A73, A74, A75, A76, A77, A78, A79, A80, A81, A82, A83, A84, A85, A86, A87, A88, A89, A90, INSERT_A1, INSERT_A2, A91, A92, A93, A94, A95, A96, A97, A98, A99]
/table[INSERT_A1] changed from null to INSERT_A1
/table[INSERT_A2] changed from null to INSERT_A2
/table[A5] changed from A5 to null

It returns the entire before/after for the table, but also detects I added 2 rows and deleted one. Interestingly, it doesn't track which rows were added/deleted, just the values of those rows. I believe that library is comparing the values of the rows and assuming they are unique.

I just wanted to point out that this is not only an issue with complicated nested structures. A simple List exhibits the same behavior.

@bartoszwalacik bartoszwalacik self-assigned this Feb 12, 2015
@bartoszwalacik
Copy link
Member

@bartoszwalacik bartoszwalacik commented Feb 15, 2015

interesting, I'll investigate this issue shortly. Now we are in the middle of huge development in Javers, we are working on SQL repository, so our response times are bit slower, please be patient.

@bartoszwalacik
Copy link
Member

@bartoszwalacik bartoszwalacik commented Feb 15, 2015

Think I got it. The reason why you get such big change list is:

// delete the 5th row
        base.getTable().remove(5);

Since we remove 5'th row, the rows with index 5 and higher are shifted down.
So all rows with index greater than 5 got new index (n-1).
When Javers compares Lists, it pays attention at item index ...

@bartoszwalacik
Copy link
Member

@bartoszwalacik bartoszwalacik commented Feb 15, 2015

You are right, that JaVers could detect such shift and handle it in a more intelligent way

@bartoszwalacik
Copy link
Member

@bartoszwalacik bartoszwalacik commented Feb 16, 2015

as a quick win solution, I can implement treatListsAsSets feature.
With this flag enabled, JaVers will compare Lists without paying attention to the item indexes.

Smart list comparing (including shift detection) requires more work.

So would this treatListsAsSets feature be suitable for your case?

@ctmay4
Copy link
Author

@ctmay4 ctmay4 commented Feb 17, 2015

It would possibly help, but I have not seen how Sets are handled. I assume this would be based on element equality? What if I have a Set with values

{ "1", "1", "1", "1", "2" }

How would it report the second "1" being removed? For my personal use-case there may be duplicates.

Once I get back to my office I will take a list at Set handling and let you know. Thanks again.

@bartoszwalacik
Copy link
Member

@bartoszwalacik bartoszwalacik commented Feb 17, 2015

Yes, this simple treatListsAsSets solution would work well only for lists without duplicates.

I'm working on more intelligent algorithm for comparing lists, but it may take a while.
There would be performacne issues.
The main idea is to treat lists as queues, preserve ordering, but forget about exact values of item indexes.

@Kornel
Copy link
Member

@Kornel Kornel commented Feb 20, 2015

I'll add an implementation based on finding the min Levenshtein distance. I've got an POC algorithm ready which is both space and running time O(nm), where n and m is the length of list1 and list2 respectively. This can be optimised to be O(n) space and O(nm) time, will do a PR soon.

Kornel pushed a commit that referenced this issue Feb 20, 2015
Kornel Kielczewski
 - Added a smart comparison for lists based on the Levenshtein
  edit distance problem, see package-info for brief description
pszymczyk added a commit that referenced this issue Mar 11, 2015
pszymczyk added a commit that referenced this issue Mar 11, 2015
pszymczyk added a commit that referenced this issue Mar 11, 2015
@bartoszwalacik
Copy link
Member

@bartoszwalacik bartoszwalacik commented Mar 11, 2015

@ctmay4, we have created smart list compara algorithm, based on Levenshtein distance.
It calculates smart diffs but could be slow for huge lists, how many elements you have in your lists usually?

@ctmay4
Copy link
Author

@ctmay4 ctmay4 commented Mar 12, 2015

The tables are usually really small (< 50 rows). We have a few tables that get as high as 4000 rows.

@bartoszwalacik
Copy link
Member

@bartoszwalacik bartoszwalacik commented Mar 14, 2015

great, so the new algorithm will be fine for you, i'am doing the release ...

Kornel pushed a commit that referenced this issue Mar 15, 2015
bartoszwalacik added a commit that referenced this issue Mar 16, 2015
#97 smare list compare
@bartoszwalacik
Copy link
Member

@bartoszwalacik bartoszwalacik commented Mar 17, 2015

JaVers 1.1.1 is released, use the new Levenshtein algorithm for comparing lists,
http://javers.org/documentation/diff-configuration/#list-algorithms

@shivusajjan
Copy link

@shivusajjan shivusajjan commented Feb 26, 2020

I have a class Node and it has list of devices. For example
class Node {
String nodeCode;
String name;
List devices;
}

class Device {
String deviceCode;
String name;
}

I am comparing two Node objects using LEVENSHTEIN_DISTANCE algorithm and scenarios are mentioned below :

Senario - 1. Remove device1:

 Old Version:
         var device1 = Device("d1", "device1")
         var device2 = Device("d2", "device2")
         var devices = mutableListOf<Device>()
         devices.add(device1)
         devices.add(device2)
         val before = Node("n1", "node1", devices)  // Old Version

    
 Current Version:
       var newDevice1 = Device("d1", "device1")
       var newDevices = mutableListOf<Device>()
       newDevices.add(newDevice1)
       var after = Node("n1", "node1", newDevices) // New Version

 Compare both version:
 var diff = javers.compare(before, after)
 println("$diff")

Output:

  • changes on com.example.audit.Node/ :
    • 'devices' collection changes :
      1. 'com.example.audit.Node/#devices/1' removed

Senario - 2. Add device2:

  Old Version:
       var newDevice1 = Device("d1", "device1")
       var newDevices = mutableListOf<Device>()
       newDevices.add(newDevice1)
       var after = Node("n1", "node1", newDevices) // Old Version

  Current Version:
       var device1 = Device("d1", "device1") 
       var device2 = Device("d2", "device2")
       var devices = mutableListOf<Device>()
       devices.add(device1)
       devices.add(device2)
       val after1 = Node("n1", "node1", devices)  // New Version

  Compare both version:
  var diff = javers.compare(after, after1)

Output:

  • changes on com.example.audit.Node/ :
    • 'devices' collection changes :
      1. 'com.example.audit.Node/#devices/1' added

Senario - 3. Remove device1 from the position 0 in device list:

  Old Version:
       var device1 = Device("d1", "device1") 
       var device2 = Device("d2", "device2")
       var devices = mutableListOf<Device>()
       devices.add(device1)
       devices.add(device2)
       val after1 = Node("n1", "node1", devices)  //Old Version

 Current Version:
      device2 = Device("d2", "device2") 
      var newDevices2 = mutableListOf<Device>()
      newDevices2.add(device2)
      val after2 = Node("n1", "node1", newDevices2) // New Version

    Compare both version:
    diff = javers.compare(after1, after2)

Output:

  • changes on com.example.audit.Node/ :
    • 'devices' collection changes :
      1. 'com.example.audit.Node/#devices/1' removed

3rd scenario is giving wrong output because I removed device1 from the 0th position in the list but it is giving removed from the 1st position in the device list.

I want 3rd scenario output like:

  • changes on com.example.audit.Node/ :
    • 'devices' collection changes :
      0. 'com.example.audit.Node/#devices/0' removed

Is JaVers having limitations for list of custom object?

I missed 3rd scenario in my prototype. I went and integrated into my project and changes are moved to stage. This is blocked me to release. Could you please provide your thought on this? Thanks a lot

@bartoszwalacik
Copy link
Member

@bartoszwalacik bartoszwalacik commented Feb 26, 2020

@shivusajjan this issue was closed 5 years ago, why are you commenting it?

@shivusajjan
Copy link

@shivusajjan shivusajjan commented Feb 26, 2020

@shivusajjan this issue was closed 5 years ago, why are you commenting it?

LEVENSHTEIN_DISTANCE algorithm did not work for custom object list. For example List<Employee>. Still Index order getting changed when I removed object from starting or middle of the list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants