[WIP] Add reinsert functionality #11

NickLaMuro · 2019-11-13T02:43:39Z

VERY MUCH [WIP] AT THE MOMENT

I expect that I will be moving some commits around here and there, possibly into other branches to reduce this branch's size, and hopefully make the code look a lot better... but this is just a starting place for some feedback and discussion to get started.

Also of note: The first couple of commits are currently based on my previous unmerged PRs:

Simpler viewing can be found here:

https://github.com/NickLaMuro/code-extractor/compare/convert-to-module..extract-to-existing-repo

Changes

This is effectively the "undo" feature of the original project. By that, it allows a user to take a portion (or all) of a previously extracted project, and re-insert it back into the original project from which it came from.

This becomes tricky, as we need to filter out the existing commits that existed previously in the old ("target") repository. As a result, 3 filter-branch passes are done to achieve this change:

After determining all file names that existed for the current extractions, filter commits to down to ones that only include the files we wish to commit. In addition, create a script to move any of those files into a temporary directory, maintaining the directory structure as we go.
Take the temporary directory that was created and do a second filter that makes the new root of the project the temporary directory we created in the previous filter.
After adding the target repository as a remote, filter the commits out that already exist in that remote. Also in this step, a commit is re-written that is shared between both repos that will be the "re-inject" commit, which all of the injected commits will be based off of.

From there, cherry-pick commits onto a new branch that is based off the existing HEAD of the current target remote's branch (most likely master) that will be receiving the "injected" commits. This allows the target repo to have the commits that were created "post extraction" that are based off a commit that re-adds the code in a state where it was originally extracted from.

NickLaMuro · 2019-11-14T21:51:21Z

Just an FYI: I tried running this on bits of the manageiq-gems-pending code that I planned on running this on, and unfortunately part of it doesn't work since that project was extracted prior to this project existing, and some assumptions were made to filter out existing commits that doesn't work without the commit messages including the (transferred from ManageIQ/manageiq@-----) bit.

I have an idea for a fallback, but it is taking me a bit to find a good way to handle that. I will be adding that as a follow up commit as well with a different test case to support the fallback, which will add to the time it will take to impilent.

Fryguy · 2019-11-14T22:06:15Z

@NickLaMuro hard to review because it's too many things in one PR (though I understand as you said you'd be moving things around and things are based on different PRs), so I'll break down my thoughts:

I am 👍 on gemifying...I'll review the other PR.
I am 👍 on testing, but that should be a standalone PR (also hard to tell which tests are for the old stuff or the new stuff)
I am 〰️ on refactoring the original code. It was pretty simple (77 lines) to begin with, and very easy to follow, so moving things out into classes seems like overkill for a simple tool, unless those classes are going to be reused for the new functionality.
I am not sure about reinsert functionality. I can't think of a use-case for it, and even so I'm not sure why we need the complicated logic of matching the tree history vs just grafting in the repo's master branch (or master branch plus a commit to moves things into the right directory locations). I'm more trying to understand what problem you are trying to solve by introducing this direction.

Separately, I'm not sure this feature belongs in this repo. The code-extractor is very single responsibility at the moment. I could see a separate code-inserter tool in the same vein as this repo.

NickLaMuro · 2019-11-14T22:29:39Z

I am 👍 on testing, but that should be a standalone PR (also hard to tell which tests are for the old stuff or the new stuff)

@Fryguy Just an FYI the testing and even part of the "extract into classes" bit were part of the other two PRs mentioned. Best to to use the compare link that I provided in the PR description to see what this PR specifically adds (which admittedly, is still a frick'n lot...)

As I got going with the main purpose of this PR, I realized I had some more refactoring to do to share code between the two processes, but decided not to make too much more "PR-ception" than I already had before I started getting some form of feedback.

I am not sure about reinsert functionality. I can't think of a use-case for it...

As mentioned here, I have one with manageiq-gems-pending and putting some code back into ManageIQ/manageiq. But more on that below.

Separately, I'm not sure this feature belongs in this repo. The code-extractor is very single responsibility at the moment. I could see a separate code-inserter tool in the same vein as this repo.

I am actually fine with that, though I wouldn't mind if you then just scrap all three of these since I don't really see a point in even Gemifying (aka: "making it more complex") if we don't want to add any further scope to the project.

Honestly, the main reason I did this is I wanted a place to put the process that I would be taking with the effort of moving out he "object storage" and miq_ftp_lib.rb that should not exist in the manageiq-gems-pending repo, but don't really deserve standalone gems either. That is the current use case I am working on, but I also attempted something similar past in this PR:

Extract vmdb-logger code from manageiq-gems-pending and insert it into manageiq ManageIQ/manageiq#15358

And even documented it in one of the commits:

ManageIQ/manageiq@3b79b79

But it had a poor result, and was eventually scrapped. Besides, eventually ManageIQ/managiq-loggers came into being, but that was almost 1.5 years later. However, I do realize this feature has very limited uses, but I really don't want others to have to try and figure this out for themselves like I did if we ever have to do it again, and codifying it is the easiest way to do that I think.

Fryguy · 2019-11-14T22:33:44Z

the main reason I did this is I wanted a place to put the process that I would be taking with the effort of moving out he "object storage" and miq_ftp_lib.rb that should not exist in the manageiq-gems-pending repo, but don't really deserve standalone gems either. That is the current use case I am working on

Ah ok. Thanks for the history behind this. That makes more sense to me now. I still think that it deserves a separate tool/repo, and I still like gemifying and testing, and will likely merge those in some form.

I really don't want others to have to try and figure this out for themselves like I did if we ever have to do it again, and codifying it is the easiest way to do that I think.

That's cool. I'm interested to see where this goes, so feel free to rev on it in this PR / repo space.

NickLaMuro · 2019-11-20T00:21:46Z

So a few force pushes later, and I have I think got this working... (still need to test it on my use case though)

Worth noting: Since this is based off of the #9 and #10 PRs, I found a few mistakes in those that I have since fixed and updated in those respective branches, and did some (not all) of the moving around that I said I would with this PR. A few commits were moved into other branches, and I might make an entirely new PR for some of the test_helper.rb/Rakefile changes I have done and re-ordered already. Hard to see with GitHub, since they sort commits by date and not ancestor order, but here is a pretty git log of the current order:

$ git log ...
* 8e0fc61 Nick LaMuro - (HEAD -> extract-to-existing-repo, origin/extract-to-existing-repo) [CodeExtractor] Support merge commits (23 hours ago)
* 52d2a40 Nick LaMuro - [CodeExtractorReinsertTest] Support custom commits (23 hours ago)
* 2059b75 Nick LaMuro - Add Fallback for when code_extractor wasn't used (30 hours ago)
* cef82a2 Nick LaMuro - Add code_extractor_reinsert_test.rb (5 days ago)
* dcc269f Nick LaMuro - [code_extractor_test.rb] Add .update_extraction_hash (5 days ago)
* c20177b Nick LaMuro - [code_extractor_test] Add .apply_new_commits_on_extracted_repo (5 days ago)
* d4c2e80 Nick LaMuro - [code_extractor_test] Add .perform_merges_of_extracted_code (5 days ago)
* a77c492 Nick LaMuro - [code_extractor_test.rb] Add .create_base_repo (5 days ago)
* 2943b5b Nick LaMuro - [CodeExtractor] Add reinsert functionality (7 days ago)
* 06f02bf Nick LaMuro - [test_helper.rb] Add TestRepo.merge (24 hours ago)
* 49d5de8 Nick LaMuro - [test_helper.rb] Add TestRepo#execute (24 hours ago)
* 5b35c07 Nick LaMuro - [test_helper.rb] Add TestRepo#checkout_b (7 days ago)
* cacf177 Nick LaMuro - [test_helper.rb] Add TestRepo.clone_at helper (7 days ago)
* 5351667 Nick LaMuro - [test_helper.rb] Add .set_destination_dir (7 days ago)
* eaaca5c Nick LaMuro - [test_helper.rb] Add assert_commits helper (7 days ago)
* e55817d Nick LaMuro - [Rakefile] Add test:debug task (7 days ago)
* 19ad15f Nick LaMuro - (origin/convert-to-module, convert-to-module) [CodeExtractor] Break out GitProject (3 weeks ago)
* 6020bdf Nick LaMuro - [CodeExtractor] Break out Config to it's own class (3 weeks ago)
* 12da249 Nick LaMuro - [CodeExtractor] Convert to module (3 weeks ago)
* 75534dc Nick LaMuro - (origin/gemify, gemify) [README.md] Update install/usage instructions (3 weeks ago)

(note: commits with [Rakefile] and [test_helper.rb] would get their own PR)

As another aside: All of this digging into git I have done for this PR has resulted in me finding out a better way to rebase "onto" branches I am working off of and moving commits between the two (the tl;dr for the lengthy blog post I would write about it is git rebase --onto).

Anyway, I will be testing out what I currently have against manageiq-gems-pending to so how that goes. Hopefully I have covered all the bases at this point.

This could have been done as part of the "Gemify" effort, but the intent of that branch was to make as few changes to the core script before starting to make sweeping changes. By converting `CodeExtractor` to a `module`, that file and `lib/code_extractor/version.rb` can be loaded at the same time (yes, they couldn't before...), and it is just a better practice for dealing with namespaces in general. The `Runner` class was a necessary side effect, but I think it is mostly understandable what it's role is and should be a straight forward change. Fun Fact -------- The `attr_reader :extraction` is actually never used. Kept it as a part of `CodeExtractor::Runner`, but really, it is never used.

This is just a refactor commit to start the process of breaking out the pieces of the code so that we can more easily distinguish certain pieces of the script.

This commit separates the concerns of the runner and the GitProject to define methods that make sense for a generic GitProject. In the future, the intent is that this lib should be able to move code between two separate and already existing `GitProject` instances, and having this abstraction layer in place will allow code to be shared more easily. This also still attempts to retain the "script-like" nature of the original project in the `Runner`, but hopefully creating this generic `GitProject` construct will facilitate future improvements.

A convenience task for running specs and do some diagnosis on the generated repos following the test run. Basically avoids cleaning up the sandbox git directory after each test.

Used to test certain commits exist in the destination repo for the current branch.

Used to reset the location of the destination_repo directory and clear out any related variables.

Used to clone a repo (first arg) to a new directory (second arg). `TestRepo#clone` is added as well to assit with this as well.

A instance method helper for TestRepo which performs a checkout out of a new branch based off another (or the current branch if no second arg is passed). Mirrors the usage of `git checkout -b [new_branch] [base_branch]`

Shared method for running DSL methods (basically calls just calls `instance_eval`)

Adds a helper function for handling merges via rugged. Of note: A `.checkout_head` call was added in addition to what was there previously. After doing a `commit` in rugged with a merge, the git tree is left in a odd state (probably since the merge functionality isn't "porcelain"), so this makes sure the git tree state is on HEAD.

This is effectively the "undo" feature of the original project. By that, it allows a user to take a portion (or all) of a previously extracted project, and re-insert it back into the original project from which it came from. This becomes tricky, as we need to filter out the existing commits that existed previously in the old ("target") repository. As a result, 3 `filter-branch` passes are done to achieve this change: 1. After determining all file names that existed for the current extractions, filter commits to down to ones that only include the files we wish to commit. In addition, create a script to move any of those files into a temporary directory, maintaining the directory structure as we go. 2. Take the temporary directory that was created and do a second filter that makes the new root of the project the temporary directory we created in the previous filter. 3. After adding the target repository as a remote, filter the commits out that already exist in that remote. Also in this step, a commit is re-written that is shared between both repos that will be the "re-inject" commit, which all of the injected commits will be based off of. From there, cherry-pick commits onto a new branch that is based off the existing HEAD of the current target remote's branch (most likely `master`) that will be receiving the "injected" commits. This allows the target repo to have the commits that were created "post extraction" that are based off a commit that re-adds the code in a state where it was originally extracted from.

Meant as a seed method for building up the same sample repo, and reducing boiler plate in a few of the tests. Currently isn't made terribly flexible, but if that is required, it can be added in the future.

Another helper method for breaking up some of the setup code for the "unextract" test, which handles the portion of the setup that replicates the process of merging the original extraction. Not the best implementation, since `@bare_repo_dir` gets set as part of the new method, and is used later in the test itself as the `target_remote` in the `extraction_hash`, but for now it is a quick solution.

Another helper method that includes boilerplate for updating the newly extracted repo with some new commits for use when "unextracting" Once again, instance variables were used here in a less than graceful fashion, but it allowed for moving a bit more code out of the main testing method that was mostly trivial.

Another helper methods for updating common changes to the extraction_hash when doing a re-insert. This, and the previous commits, will make more sense in a follow up commit handling a different edge case with `re-insert`.

This extracts the "unextract" (aka: "reinsert") test into it's own test, with the intent to later fill out other test cases in future commits. Note: The `create_base_repo` needed to be added to the `test_helper.rb` since it was needed by both test files. Not a huge change, but worth nothing.

Allows reinsert to work when code_extractor wasn't used to originally extract the commits

In .apply_new_commits_on_extracted_repo, add the ability to pass a block, which will then instead be used instead of what was used by default in the method. This allows for taking advantage of the new `.merge` DSL for representing so more real-world extracted repos

When using cherry-pick, it wouldn't handle merges properly, since it requires a `-m 1` argument for it to work properly. By switching to using `git rebase --root --onto`, we can take the branch that now is a "orphan" branch, and apply the commits onto the target remote's target branch. Left extra code in there to show other possibilities (since this isn't getting merged in away)

Improves the fallback approach for `.previously_extracted_commits` to check against the entirety of the commit message. For those cases where it is "fixes tests" or "Review Comments".

When updating the commit that will be the root commit for the injected commits, update the committer and author values so that it is more obvious who re-added the commit. As a result, also include the original committer/author info in the updated commit message. Formatting has been tweaked in this commit as well to make it a little easier to see what is being done.

Makes it easier to try test things out from post "pruning" step.

This was a mistake to have this. Since the commits will work just fine as they are, and don't need to have the content of master to do so, this can be removed entirely. This step is much simpler now, as it is just updating the commits we are bringing over and then applying them with a `git rebase --onto ...`.

The upstream_name should be used here, and the target name should be used for the filter.

This could be a "[FIXUP]" commit, but I think it is worth pointing out what I did here. Since there is multiple `git-filter-branch` calls, the original commit SHAs that existed are re-written after each. So delaying writing the updated commit message will only produce references to commits that don't exist since they were temporary. So moving `--msg-filter` to the first `git-filter-branch` was the first logical step. However, this caused another issue. Since we allow a fallback for finding the commit messages that existed in the previous repository, that would now be incorrect (in it's previous form) since it relied on matching the exact messages created. To handle this in most cases, and lines of `(transferred from ...@` are filtered out from the commit messages before comparing. * * * The remaining changes are just to support this change. - Adjust some method signatures to support this change. - Update `assert_commits` to confirm the commit messages are pointing to a proper commit - Add a `reference_repo_dir` and `reference_repo` to help with the `assert_commits` changes (set in one of the helper methods)

Prior, when doing `build_prune_script`, the branch was left at what ever state the previous step had left it at. However, this caused and issue when running `build_prune_script` as it was attempting to find the history for files that were just deleted in the `extract_branch` step (the branch that will be pushed to the `upstream_repo` that "deletes code"). This step ensures a `git checkout #{source_branch}` prior to building the script to ensure the files and their previous versions still exist. And additional tweak was also added to split up the initialization of the `file_and_ancestors` variable to make it easier to debug.

In a method above, "3 steps" was used to described how the prune script would be used. This change updates the second explaination to match.

This is a test to also include grafting as a feature (currently disabled). The intent is that a `graft_parent_commit` could be provided in the config yaml, and that would be used in the changes here. The steps in exampled in order: 1. Create a new branch with a HEAD at the "graft" point 2. Get back on the "inject_branch" (possibly unnecessary) 3. Rebase ("graft") the inject_branch onto the "old_branch" 4. Update the now grafted inject_branch to have the latest upstream changes Step two is only "possibly unnecessary" if making use of another argument of `git rebase` which I have yet to test. Besides that though, the changes are pretty minimal to work with, so this ends up being a pretty simple section to wrap your head around (once you wrap your head around getting to this point in the first place...). Of note: Since `inject_branch` is a branch with no parent commit for it's root commit, the `--root` option is used here for `git rebase`, which is important since the second arg is then the tip of `inject_branch` (the branch name), and not the base of the branch that you might normally provide when using `git rebase`. This is important since providing the base commit otherwise would "drop" that commit as part of the rebase, and cause some other issues as a result (merge conflicts).

juliancheal · 2020-02-07T00:11:20Z

👍 @NickLaMuro this looks like an awesome feature!

NickLaMuro force-pushed the extract-to-existing-repo branch from 72d319c to 8e0fc61 Compare November 20, 2019 00:06

NickLaMuro mentioned this pull request Nov 22, 2019

Re-insert MiqFileStorage and MiqFTPLib ManageIQ/manageiq#19547

Closed

NickLaMuro added 3 commits December 16, 2019 17:17

[CodeExtractor] Break out Config to it's own class

72ba047

This is just a refactor commit to start the process of breaking out the pieces of the code so that we can more easily distinguish certain pieces of the script.

NickLaMuro force-pushed the extract-to-existing-repo branch from 0bbc2c1 to eabcfa0 Compare December 17, 2019 00:52

NickLaMuro added 19 commits December 16, 2019 18:54

[Rakefile] Add test:debug task

eecb699

A convenience task for running specs and do some diagnosis on the generated repos following the test run. Basically avoids cleaning up the sandbox git directory after each test.

[test_helper.rb] Add assert_commits helper

8754a0e

Used to test certain commits exist in the destination repo for the current branch.

[test_helper.rb] Add .set_destination_dir

fbfac90

Used to reset the location of the destination_repo directory and clear out any related variables.

[test_helper.rb] Add TestRepo.clone_at helper

f909737

Used to clone a repo (first arg) to a new directory (second arg). `TestRepo#clone` is added as well to assit with this as well.

[test_helper.rb] Add TestRepo#checkout_b

365eff2

A instance method helper for TestRepo which performs a checkout out of a new branch based off another (or the current branch if no second arg is passed). Mirrors the usage of `git checkout -b [new_branch] [base_branch]`

[test_helper.rb] Add TestRepo#execute

3fbac22

Shared method for running DSL methods (basically calls just calls `instance_eval`)

[code_extractor_test.rb] Add .create_base_repo

75bbfb6

Meant as a seed method for building up the same sample repo, and reducing boiler plate in a few of the tests. Currently isn't made terribly flexible, but if that is required, it can be added in the future.

[code_extractor_test.rb] Add .update_extraction_hash

c50bfd0

Another helper methods for updating common changes to the extraction_hash when doing a re-insert. This, and the previous commits, will make more sense in a follow up commit handling a different edge case with `re-insert`.

Add Fallback for when code_extractor wasn't used

5665198

Allows reinsert to work when code_extractor wasn't used to originally extract the commits

[CodeExtractor] Better .previously_extracted_commits

361b72c

Improves the fallback approach for `.previously_extracted_commits` to check against the entirety of the commit message. For those cases where it is "fixes tests" or "Review Comments".

[FIXUP][CodeExtractor] spacing for git-filter-branch cmd

7de0f9d

NickLaMuro added 10 commits December 16, 2019 18:54

[CodeExtractor] Use inject_branch for injecting commits

f0eed26

Makes it easier to try test things out from post "pruning" step.

[FIXUP] Fix inject "initial commit"

f7b6e3f

The upstream_name should be used here, and the target name should be used for the filter.

[FIXUP] Clean up debugging puts

af8f2f5

[FIXUP] Moar better commit_msg_filter

73fd210

[CodeExtractor] Better/Consistent documentation

bd0b361

In a method above, "3 steps" was used to described how the prune script would be used. This change updates the second explaination to match.

[CodeExtractor] Document committer_data info

b6720f5

NickLaMuro mentioned this pull request Mar 5, 2021

Inject postgres admin into appliance console ManageIQ/manageiq-appliance_console#153

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Add reinsert functionality #11

[WIP] Add reinsert functionality #11

NickLaMuro commented Nov 13, 2019 •

edited

Loading

NickLaMuro commented Nov 14, 2019

Fryguy commented Nov 14, 2019

NickLaMuro commented Nov 14, 2019 •

edited

Loading

Fryguy commented Nov 14, 2019

NickLaMuro commented Nov 20, 2019

juliancheal commented Feb 7, 2020

[WIP] Add reinsert functionality #11

Are you sure you want to change the base?

[WIP] Add reinsert functionality #11

Conversation

NickLaMuro commented Nov 13, 2019 • edited Loading

Changes

NickLaMuro commented Nov 14, 2019

Fryguy commented Nov 14, 2019

NickLaMuro commented Nov 14, 2019 • edited Loading

Fryguy commented Nov 14, 2019

NickLaMuro commented Nov 20, 2019

juliancheal commented Feb 7, 2020

NickLaMuro commented Nov 13, 2019 •

edited

Loading

NickLaMuro commented Nov 14, 2019 •

edited

Loading