-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add reinsert functionality #11
base: master
Are you sure you want to change the base?
[WIP] Add reinsert functionality #11
Conversation
Just an FYI: I tried running this on bits of the I have an idea for a fallback, but it is taking me a bit to find a good way to handle that. I will be adding that as a follow up commit as well with a different test case to support the fallback, which will add to the time it will take to impilent. |
@NickLaMuro hard to review because it's too many things in one PR (though I understand as you said you'd be moving things around and things are based on different PRs), so I'll break down my thoughts:
|
@Fryguy Just an FYI the testing and even part of the "extract into classes" bit were part of the other two PRs mentioned. Best to to use the compare link that I provided in the PR description to see what this PR specifically adds (which admittedly, is still a frick'n lot...) As I got going with the main purpose of this PR, I realized I had some more refactoring to do to share code between the two processes, but decided not to make too much more "PR-ception" than I already had before I started getting some form of feedback.
As mentioned here, I have one with
I am actually fine with that, though I wouldn't mind if you then just scrap all three of these since I don't really see a point in even Gemifying (aka: "making it more complex") if we don't want to add any further scope to the project. Honestly, the main reason I did this is I wanted a place to put the process that I would be taking with the effort of moving out he "object storage" and And even documented it in one of the commits: But it had a poor result, and was eventually scrapped. Besides, eventually |
Ah ok. Thanks for the history behind this. That makes more sense to me now. I still think that it deserves a separate tool/repo, and I still like gemifying and testing, and will likely merge those in some form.
That's cool. I'm interested to see where this goes, so feel free to rev on it in this PR / repo space. |
72d319c
to
8e0fc61
Compare
So a few force pushes later, and I have I think got this working... (still need to test it on my use case though) Worth noting: Since this is based off of the #9 and #10 PRs, I found a few mistakes in those that I have since fixed and updated in those respective branches, and did some (not all) of the moving around that I said I would with this PR. A few commits were moved into other branches, and I might make an entirely new PR for some of the
(note: commits with As another aside: All of this digging into Anyway, I will be testing out what I currently have against |
This could have been done as part of the "Gemify" effort, but the intent of that branch was to make as few changes to the core script before starting to make sweeping changes. By converting `CodeExtractor` to a `module`, that file and `lib/code_extractor/version.rb` can be loaded at the same time (yes, they couldn't before...), and it is just a better practice for dealing with namespaces in general. The `Runner` class was a necessary side effect, but I think it is mostly understandable what it's role is and should be a straight forward change. Fun Fact -------- The `attr_reader :extraction` is actually never used. Kept it as a part of `CodeExtractor::Runner`, but really, it is never used.
This is just a refactor commit to start the process of breaking out the pieces of the code so that we can more easily distinguish certain pieces of the script.
This commit separates the concerns of the runner and the GitProject to define methods that make sense for a generic GitProject. In the future, the intent is that this lib should be able to move code between two separate and already existing `GitProject` instances, and having this abstraction layer in place will allow code to be shared more easily. This also still attempts to retain the "script-like" nature of the original project in the `Runner`, but hopefully creating this generic `GitProject` construct will facilitate future improvements.
0bbc2c1
to
eabcfa0
Compare
A convenience task for running specs and do some diagnosis on the generated repos following the test run. Basically avoids cleaning up the sandbox git directory after each test.
Used to test certain commits exist in the destination repo for the current branch.
Used to reset the location of the destination_repo directory and clear out any related variables.
Used to clone a repo (first arg) to a new directory (second arg). `TestRepo#clone` is added as well to assit with this as well.
A instance method helper for TestRepo which performs a checkout out of a new branch based off another (or the current branch if no second arg is passed). Mirrors the usage of `git checkout -b [new_branch] [base_branch]`
Shared method for running DSL methods (basically calls just calls `instance_eval`)
Adds a helper function for handling merges via rugged. Of note: A `.checkout_head` call was added in addition to what was there previously. After doing a `commit` in rugged with a merge, the git tree is left in a odd state (probably since the merge functionality isn't "porcelain"), so this makes sure the git tree state is on HEAD.
This is effectively the "undo" feature of the original project. By that, it allows a user to take a portion (or all) of a previously extracted project, and re-insert it back into the original project from which it came from. This becomes tricky, as we need to filter out the existing commits that existed previously in the old ("target") repository. As a result, 3 `filter-branch` passes are done to achieve this change: 1. After determining all file names that existed for the current extractions, filter commits to down to ones that only include the files we wish to commit. In addition, create a script to move any of those files into a temporary directory, maintaining the directory structure as we go. 2. Take the temporary directory that was created and do a second filter that makes the new root of the project the temporary directory we created in the previous filter. 3. After adding the target repository as a remote, filter the commits out that already exist in that remote. Also in this step, a commit is re-written that is shared between both repos that will be the "re-inject" commit, which all of the injected commits will be based off of. From there, cherry-pick commits onto a new branch that is based off the existing HEAD of the current target remote's branch (most likely `master`) that will be receiving the "injected" commits. This allows the target repo to have the commits that were created "post extraction" that are based off a commit that re-adds the code in a state where it was originally extracted from.
Meant as a seed method for building up the same sample repo, and reducing boiler plate in a few of the tests. Currently isn't made terribly flexible, but if that is required, it can be added in the future.
Another helper method for breaking up some of the setup code for the "unextract" test, which handles the portion of the setup that replicates the process of merging the original extraction. Not the best implementation, since `@bare_repo_dir` gets set as part of the new method, and is used later in the test itself as the `target_remote` in the `extraction_hash`, but for now it is a quick solution.
Another helper method that includes boilerplate for updating the newly extracted repo with some new commits for use when "unextracting" Once again, instance variables were used here in a less than graceful fashion, but it allowed for moving a bit more code out of the main testing method that was mostly trivial.
Another helper methods for updating common changes to the extraction_hash when doing a re-insert. This, and the previous commits, will make more sense in a follow up commit handling a different edge case with `re-insert`.
This extracts the "unextract" (aka: "reinsert") test into it's own test, with the intent to later fill out other test cases in future commits. Note: The `create_base_repo` needed to be added to the `test_helper.rb` since it was needed by both test files. Not a huge change, but worth nothing.
Allows reinsert to work when code_extractor wasn't used to originally extract the commits
In .apply_new_commits_on_extracted_repo, add the ability to pass a block, which will then instead be used instead of what was used by default in the method. This allows for taking advantage of the new `.merge` DSL for representing so more real-world extracted repos
When using cherry-pick, it wouldn't handle merges properly, since it requires a `-m 1` argument for it to work properly. By switching to using `git rebase --root --onto`, we can take the branch that now is a "orphan" branch, and apply the commits onto the target remote's target branch. Left extra code in there to show other possibilities (since this isn't getting merged in away)
Improves the fallback approach for `.previously_extracted_commits` to check against the entirety of the commit message. For those cases where it is "fixes tests" or "Review Comments".
When updating the commit that will be the root commit for the injected commits, update the committer and author values so that it is more obvious who re-added the commit. As a result, also include the original committer/author info in the updated commit message. Formatting has been tweaked in this commit as well to make it a little easier to see what is being done.
Makes it easier to try test things out from post "pruning" step.
This was a mistake to have this. Since the commits will work just fine as they are, and don't need to have the content of master to do so, this can be removed entirely. This step is much simpler now, as it is just updating the commits we are bringing over and then applying them with a `git rebase --onto ...`.
The upstream_name should be used here, and the target name should be used for the filter.
This could be a "[FIXUP]" commit, but I think it is worth pointing out what I did here. Since there is multiple `git-filter-branch` calls, the original commit SHAs that existed are re-written after each. So delaying writing the updated commit message will only produce references to commits that don't exist since they were temporary. So moving `--msg-filter` to the first `git-filter-branch` was the first logical step. However, this caused another issue. Since we allow a fallback for finding the commit messages that existed in the previous repository, that would now be incorrect (in it's previous form) since it relied on matching the exact messages created. To handle this in most cases, and lines of `(transferred from ...@` are filtered out from the commit messages before comparing. * * * The remaining changes are just to support this change. - Adjust some method signatures to support this change. - Update `assert_commits` to confirm the commit messages are pointing to a proper commit - Add a `reference_repo_dir` and `reference_repo` to help with the `assert_commits` changes (set in one of the helper methods)
Prior, when doing `build_prune_script`, the branch was left at what ever state the previous step had left it at. However, this caused and issue when running `build_prune_script` as it was attempting to find the history for files that were just deleted in the `extract_branch` step (the branch that will be pushed to the `upstream_repo` that "deletes code"). This step ensures a `git checkout #{source_branch}` prior to building the script to ensure the files and their previous versions still exist. And additional tweak was also added to split up the initialization of the `file_and_ancestors` variable to make it easier to debug.
In a method above, "3 steps" was used to described how the prune script would be used. This change updates the second explaination to match.
This is a test to also include grafting as a feature (currently disabled). The intent is that a `graft_parent_commit` could be provided in the config yaml, and that would be used in the changes here. The steps in exampled in order: 1. Create a new branch with a HEAD at the "graft" point 2. Get back on the "inject_branch" (possibly unnecessary) 3. Rebase ("graft") the inject_branch onto the "old_branch" 4. Update the now grafted inject_branch to have the latest upstream changes Step two is only "possibly unnecessary" if making use of another argument of `git rebase` which I have yet to test. Besides that though, the changes are pretty minimal to work with, so this ends up being a pretty simple section to wrap your head around (once you wrap your head around getting to this point in the first place...). Of note: Since `inject_branch` is a branch with no parent commit for it's root commit, the `--root` option is used here for `git rebase`, which is important since the second arg is then the tip of `inject_branch` (the branch name), and not the base of the branch that you might normally provide when using `git rebase`. This is important since providing the base commit otherwise would "drop" that commit as part of the rebase, and cause some other issues as a result (merge conflicts).
👍 @NickLaMuro this looks like an awesome feature! |
VERY MUCH [WIP] AT THE MOMENT
I expect that I will be moving some commits around here and there, possibly into other branches to reduce this branch's size, and hopefully make the code look a lot better... but this is just a starting place for some feedback and discussion to get started.
Also of note: The first couple of commits are currently based on my previous unmerged PRs:
Simpler viewing can be found here:
https://github.com/NickLaMuro/code-extractor/compare/convert-to-module..extract-to-existing-repo
Changes
This is effectively the "undo" feature of the original project. By that, it allows a user to take a portion (or all) of a previously extracted project, and re-insert it back into the original project from which it came from.
This becomes tricky, as we need to filter out the existing commits that existed previously in the old ("target") repository. As a result, 3
filter-branch
passes are done to achieve this change:From there, cherry-pick commits onto a new branch that is based off the existing HEAD of the current target remote's branch (most likely
master
) that will be receiving the "injected" commits. This allows the target repo to have the commits that were created "post extraction" that are based off a commit that re-adds the code in a state where it was originally extracted from.