Handle 'moved' files #4
[@mjherzog comment] In most move cases, I would expect many lower-level path elements to be the same, so we are really looking at path similarities from the file name back up the tree.
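One way to read this suggestion is as a score that counts matching path segments starting from the file name and walking back up the tree. The sketch below assumes POSIX-style relative paths; `common_suffix_length` is a hypothetical helper, not part of DeltaCode.

```python
from pathlib import PurePosixPath


def common_suffix_length(old_path: str, new_path: str) -> int:
    """Count trailing path segments (file name first, then parent
    directories) shared by two paths. Hypothetical helper for illustration."""
    old_parts = PurePosixPath(old_path).parts
    new_parts = PurePosixPath(new_path).parts
    count = 0
    # Walk both paths from the file name back up toward the root.
    for old_seg, new_seg in zip(reversed(old_parts), reversed(new_parts)):
        if old_seg != new_seg:
            break
        count += 1
    return count
```

For example, `a/src/util/io.py` and `b/lib/util/io.py` share two trailing segments (`util/io.py`), which would rank them as a closer move candidate than two files that share only a name.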
[@MaJuRG comment] If a file has the same filename and sha1 but not the same path, it should be considered moved. We will have to consider collisions in this case, though.
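The rule stated here (same filename and sha1, different path) can be sketched as a simple predicate. The record shape, a mapping with `path` and `sha1` keys where directories may carry `sha1=None`, is an assumption for illustration, as is the function name `looks_moved`. In practice the "collisions" mostly come from identical content legitimately shared by several files (empty files, copied license texts), so a match is a candidate move rather than proof of one.

```python
import posixpath


def looks_moved(old: dict, new: dict) -> bool:
    """Return True when old and new appear to be the same file at a new path.

    Hypothetical record shape: {'path': str, 'sha1': str or None}.
    """
    if old["sha1"] is None or old["sha1"] != new["sha1"]:
        return False  # no content match, or no checksum to compare
    same_name = posixpath.basename(old["path"]) == posixpath.basename(new["path"])
    # Same content and same file name, but a different location.
    return same_name and old["path"] != new["path"]
```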
Moved files may be in either the Added or Removed buckets, so this would be a good place to start when thinking about a possible fix.
* Added new method to recategorize 'added' and 'removed' files as 'moved'.
* Modified related code to accommodate new category, including in 'Delta.to_dict()', which changes the JSON output.
* CSV output will be modified once we’ve settled on the column headings and data to display for the 'moved' files.
* Fixed failing tests, added three new tests.
Signed-off-by: John M. Horan <johnmhoran@gmail.com>
@MaJuRG I’ve deleted the modified code in The
Perhaps a better approach would be a single new attribute denoting added/removed or new/old. I haven’t yet modified the CSV-generation code, but will once we’ve settled on the column headings and data to display for the In addition to fixing a group of failing tests, I’ve added three new tests. I also plan to add some CSV tests once we’ve settled on what to display for the I’m available for an uberconf whenever it’s convenient for you.
OK, let me play around with this on my end.
@johnmhoran After playing around with this, one of the first things I noticed is that we should have a single 'moved' object for each added/removed pair. It looks like two separate 'moved' objects are being created at the moment, which is not ideal given how we generate output. Our 'moved' Delta object should have both the Hopefully that makes sense.
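The idea of one 'moved' object per added/removed pair can be sketched as collapsing the two Deltas into a single Delta that carries both the new and the old file. The `File` and `Delta` classes below are simplified, hypothetical stand-ins that borrow DeltaCode's class names; their fields and this `to_dict()` layout are assumptions, not the project's actual structure.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class File:
    path: str
    sha1: Optional[str] = None


@dataclass
class Delta:
    category: str  # e.g. 'added', 'removed', 'modified', 'moved'
    new_file: Optional[File] = None
    old_file: Optional[File] = None

    def to_dict(self) -> dict:
        # A 'moved' Delta serializes both sides in one JSON object.
        return {
            "category": self.category,
            "new": asdict(self.new_file) if self.new_file is not None else None,
            "old": asdict(self.old_file) if self.old_file is not None else None,
        }


def merge_as_moved(added: Delta, removed: Delta) -> Delta:
    """Collapse one 'added' and one 'removed' Delta into a single 'moved' Delta."""
    if added.category != "added" or removed.category != "removed":
        raise ValueError("expected one 'added' and one 'removed' Delta")
    return Delta("moved", new_file=added.new_file, old_file=removed.old_file)
```

Keeping both files on one object means JSON and CSV output can emit a single row per move instead of reconciling two half-records.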
@MaJuRG That was my original goal, but after working with my 3 test codebases I realized that pairing an It was in light of this potential complexity that I chose to treat each
We should probably just focus on the easiest cases for now, and pair/create 'moved' objects for those. This means, after we index, we should only look at places where Once we can reliably calculate moves for the simplest cases, we can think about how to handle these other scenarios.
In other words, because we are indexing by So if we only look at the places where Yes, this will not catch all the possible moves, but it will get a good number of them.
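A minimal sketch of "start with the easiest cases": group the added and removed files by sha1 and treat a sha1 as a move only when exactly one added file and exactly one removed file share it; everything else stays in its original bucket. The `(path, sha1)` tuple shape and the name `pair_simple_moves` are assumptions for illustration.

```python
from collections import defaultdict


def pair_simple_moves(added, removed):
    """Pair 'added' and 'removed' files that share a sha1, 1:1 case only.

    added, removed: lists of (path, sha1) tuples (hypothetical shape).
    Returns (moves, leftover_added, leftover_removed); each move is
    (old_path, new_path, sha1).
    """
    added_by_sha1 = defaultdict(list)
    removed_by_sha1 = defaultdict(list)
    for path, sha1 in added:
        if sha1 is not None:
            added_by_sha1[sha1].append(path)
    for path, sha1 in removed:
        if sha1 is not None:
            removed_by_sha1[sha1].append(path)

    moves = []
    for sha1, new_paths in added_by_sha1.items():
        old_paths = removed_by_sha1.get(sha1, [])
        # Only the simplest case counts as a move; ambiguous groups are skipped.
        if len(new_paths) == 1 and len(old_paths) == 1:
            moves.append((old_paths[0], new_paths[0], sha1))

    moved = {sha1 for _, _, sha1 in moves}
    leftover_added = [(p, s) for p, s in added if s not in moved]
    leftover_removed = [(p, s) for p, s in removed if s not in moved]
    return moves, leftover_added, leftover_removed
```

With two added copies of one file and a single removed copy, the sha1 is left unpaired: it might be a move plus a copy, and resolving that is exactly the harder case deferred here.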
I'll refactor accordingly.
@MaJuRG What data do you want to display in the CSV output for the Here's a sample excerpt from the current JSON output for
We can treat Honestly, all we really need is the
just to keep it consistent for now.
And corresponding columns for the CSV output?
You can add it to the end for now, so it doesn't clutter up the majority of the results.
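Appending the move-specific columns at the end can be sketched with the standard library's `csv.DictWriter`: non-moved rows simply leave those trailing cells empty. The column names here are hypothetical placeholders, not DeltaCode's actual CSV headings.

```python
import csv
import io

# Hypothetical column names; the real headings may differ.
BASE_COLUMNS = ["category", "path", "sha1"]
MOVED_COLUMNS = ["old_path"]  # appended last so they don't clutter the main columns


def deltas_to_csv(rows):
    """Write row dicts to CSV text; rows without 'old_path' get an empty cell."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=BASE_COLUMNS + MOVED_COLUMNS,
        restval="",  # fill missing keys (e.g. old_path on non-moved rows)
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return out.getvalue()
```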
OK.
* For now, we ignore 'added'/'removed' matches comprising more than one 'removed' and one 'added' file.
* Modified structure of 'moved' Delta object.
* Modified CSV output generation.
* Fixed failing tests, added new tests.
Signed-off-by: John M. Horan <johnmhoran@gmail.com>
* Convert utility function 'index_delta_files()' to DeltaCode method 'index_deltas()'.
* Index Delta objects rather than their respective File objects.
* Create new test codebases and add new tests.
* Modify existing tests to include new 'moved' Delta as necessary.
Signed-off-by: John M. Horan <johnmhoran@gmail.com>
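One practical benefit of indexing Delta objects rather than their File objects is that the index values are the very objects held in the delta list, so a match found through the index can be recategorized or replaced directly. The sketch below illustrates that with a hypothetical standalone function, `index_deltas_by_sha1`, and a simplified `Delta`; it is not the `index_deltas()` method from the commit.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Delta:
    # Simplified, hypothetical stand-in for DeltaCode's Delta.
    category: str
    path: str
    sha1: Optional[str] = None


def index_deltas_by_sha1(deltas: List[Delta], category: str) -> Dict[str, List[Delta]]:
    """Map sha1 -> Delta objects of the given category (skipping sha1=None)."""
    index: Dict[str, List[Delta]] = defaultdict(list)
    for delta in deltas:
        if delta.category == category and delta.sha1 is not None:
            index[delta.sha1].append(delta)
    return dict(index)


deltas = [
    Delta("added", "new/a.c", "s1"),
    Delta("removed", "old/a.c", "s1"),
    Delta("modified", "src/b.c", "s2"),
]
added_index = index_deltas_by_sha1(deltas, "added")
removed_index = index_deltas_by_sha1(deltas, "removed")
```

Because `added_index["s1"][0] is deltas[0]`, the matched pair can be removed from `deltas` by identity and replaced with a single 'moved' Delta without searching the list again.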
Signed-off-by: John M. Horan <johnmhoran@gmail.com>
@MaJuRG FYI, working on the dummy Have tried several approaches. For example, the following approach throws an error:
I think I'm stumbling on how to handle the
@MaJuRG I've moved
Current test structure (similar errors with the alternative constructions of
@johnmhoran You are using dicts instead of Delta objects. You have to create actual Delta objects in order to access
This test, for example, creates a single Delta object:
Thanks @MaJuRG, I thought I needed to create a "hand-made"
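The advice to build actual Delta objects rather than dicts comes down to attribute access: `delta.new_file.sha1` works on objects, while a dict only supports `delta["new_file"]`. The minimal sketch below uses hypothetical `File` and `Delta` stand-ins whose constructors differ from DeltaCode's real classes; the test functions run under pytest or directly.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins; the real DeltaCode classes take different arguments.
@dataclass
class File:
    path: str
    sha1: Optional[str] = None


@dataclass
class Delta:
    new_file: Optional[File] = None
    old_file: Optional[File] = None
    category: Optional[str] = None


def test_delta_from_real_objects():
    delta = Delta(
        new_file=File(path="new/src/a.c", sha1="s1"),
        old_file=File(path="old/src/a.c", sha1="s1"),
        category="moved",
    )
    # Attribute access works because these are objects, not dicts.
    assert delta.new_file.sha1 == delta.old_file.sha1
    assert delta.old_file.path != delta.new_file.path


def test_dict_does_not_support_attribute_access():
    hand_made = {"new_file": {"sha1": "s1"}}
    try:
        hand_made.new_file  # dicts only support hand_made["new_file"]
    except AttributeError:
        return
    raise AssertionError("expected AttributeError")


if __name__ == "__main__":
    test_delta_from_real_objects()
    test_dict_does_not_support_attribute_access()
```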
Signed-off-by: John M. Horan <johnmhoran@gmail.com>
* Converted 'check_moved()' from DeltaCode method to a utility.
* Added four tests for 'check_moved()'.
Signed-off-by: John M. Horan <johnmhoran@gmail.com>
The basics of this have been implemented with the merge of #42. Closing.
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
This will need some thinking, but we will want some way to tell whether a file has been 'moved' between the old and new scans of a codebase.
This means the sha1 would match, but the path would not. There are also cases where the same file is present in multiple locations.
We may need to index the files by sha1, similar to what we did in determine_delta.
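A sketch in the spirit of this proposal (it does not reproduce `determine_delta`): index each scan's files by sha1 into sets of paths, then compare the two sets per sha1. Using sets handles the "same file in multiple locations" case, and requiring a path to both vanish and appear separates a likely move from a plain copy. The `(path, sha1)` input shape and both function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple


def paths_by_sha1(files: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index (path, sha1) pairs as sha1 -> set of paths, skipping empty sha1s."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, sha1 in files:
        if sha1:
            index[sha1].add(path)
    return index


def candidate_moves(old_files, new_files) -> Iterator[Tuple[str, Set[str], Set[str]]]:
    """Yield (sha1, vanished_paths, appeared_paths) for content present in both
    scans whose locations changed on both sides."""
    old_index = paths_by_sha1(old_files)
    new_index = paths_by_sha1(new_files)
    for sha1 in old_index.keys() & new_index.keys():
        vanished = old_index[sha1] - new_index[sha1]
        appeared = new_index[sha1] - old_index[sha1]
        # Only appeared paths (nothing vanished) looks like a copy, not a move.
        if vanished and appeared:
            yield sha1, vanished, appeared
```

A LICENSE file that is still at its old path but also shows up under a new directory yields nothing here, because no path vanished.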