Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using one repo per thread results in errors #355

Closed
taquitos opened this issue Mar 9, 2018 · 20 comments · Fixed by #673
Closed

Using one repo per thread results in errors #355

taquitos opened this issue Mar 9, 2018 · 20 comments · Fixed by #673

Comments

@taquitos
Copy link
Contributor

taquitos commented Mar 9, 2018

Lots of warning: conflicting chdir during another chdir block when I try to do anything with individual git objects.

I have multiple repos that I've opened using Git.open() and I can only operate on one at a time or else I get warnings

@rvodden
Copy link
Contributor

rvodden commented Mar 29, 2018

Could you expand a bit please? What are the warnings you are getting? Could you perhaps share your code (or an extract) so that we can replicate?

@taquitos
Copy link
Contributor Author

Yup! We're currently checking out a number of git repositories, using Git.open() on them, and doing read-only type actions on them eg: getting a list of branches, or the most recent commit sha. These action are happening on multiple threads at various times, and we seem to be encountering a lot of warning: conflicting chdir during another chdir block. We've added a mutex so that only one git repo can have an action performed on it at a time, and that has solved the problem, but we're really hoping to benefit from some concurrency.

Taking a look at the code in ruby-git it seems like Dir.chdir is used a lot, and so it makes sense we might get this warning. I was wondering, what would you suggest is for best practices when it comes to working with multiple git repositories concurrently with ruby-git?

@perlun
Copy link
Contributor

perlun commented Apr 1, 2018

Taking a look at the code in ruby-git it seems like Dir.chdir is used a lot, and so it makes sense we might get this warning.

This is bad practice on our behalf. We should try to get rid of the Dir.chdir dependencies in our code. Could also be one of the reasons why we're seeing #358; I didn't see any chdir occurrences there but I might have not been looking carefully enough.

@rvodden
Copy link
Contributor

rvodden commented Apr 1, 2018 via email

@taquitos
Copy link
Contributor Author

Heyo @rvodden, I'm not seeing an issue yet for chdirs, would you mind pointing me in the right direction?

@perlun
Copy link
Contributor

perlun commented May 30, 2018

Ping @rvodden - any updates?

@stale
Copy link

stale bot commented Jul 29, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 29, 2018
@taquitos
Copy link
Contributor Author

stale bot should be updated to check linked PRs 😄

@stale stale bot removed the wontfix label Jul 30, 2018
@perlun perlun added the pinned label Aug 13, 2018
@perlun
Copy link
Contributor

perlun commented Aug 13, 2018

@taquitos File a GitHub issue at https://github.com/probot/stale 😉 I pinned this issue for now.

@xdmx
Copy link

xdmx commented Nov 20, 2022

I've just got hit by this. I clone different repositories concurrently in Sidekiq jobs, when there is only one job running it works without any problem, but when there are 2+ jobs everything starts breaking down. Over the last 8 years there have been various attempts (#195, #372, #401, #383) to remove the use of chdir and make this gem thread safe. Is there any chance that will be done? 🙏 /cc @jcouball

If not, it might be worth writing a "red big warning" in the readme to prevent others hitting this limit

@jcouball
Copy link
Member

jcouball commented Nov 20, 2022

Thank you for the update @xdmx!

I am working on an update that I think will fix this. I plan on running the git subprocess with Process.spawn instead of using backticks. This makes collecting the process output more difficult, but it has the following advantages:

  • The environment between different processes (and the process running the command) would be isolated.
  • The current directly would be isolated between processes.
  • stdout and stderr can be separated. Spurious stderr output currently breaks parsing because stdout and stderr are captured together.
  • There are issues with command line escaping that cause git commands to fail (or worse). This can happen when filenames have weird characters.

I looked at less complex alternatives like those found in the Open3 module, but they didn't offer flexibility currently used in this gem.

The one downside to this approach is that Process.spawn on JRuby is just different enough not to work in a few cases and I haven't been able to figure out why. My approach is get a version working via MRI on Mac/Linux/Windows first and worry about JRuby later.

I hope to have something done by the end of the year (no promises -- working in my free time). The changeset will have to touch a LOT of things so I will try to slow roll the update by having a patch release and major version bump.

@xdmx
Copy link

xdmx commented Nov 20, 2022

Thank you @jcouball for your quick answer and for your continue work on this gem, I truly appreciate that.

Feel free to share here when you have something and I'll be happy to try it out at my end. For now I've solved it by putting those jobs in a non-concurrent queue/capsule and it's working (with the obvious limit on throughput).

There are issues with command line escaping that cause git commands to fail (or worse). This can happen when filenames have weird characters.

Just a curiosity about this one, do you think it's safe to run clone/branch/checkout/rebase/apply/commit commands using this gem on untrusty repositories? Given your point I'm afraid that there could be a risk of possible remote executions 😬

@fxposter
Copy link
Contributor

@jcouball I saw that you were working on restructuring cmd calls/etc, but decided to make my PR (#673) anyway. The goal is pretty simple - try to remove Dir.chdir calls without doing any changes to public APIs and doing very little changes to internal ones. Given the fact that Dir.chdir was used not only in places related to shell executions, maybe we can merge it while you are continuing your refactorings?

@a3626a
Copy link

a3626a commented Oct 24, 2023

Here's my another failure example to be considered.
( I decided not to use this gem :( )

In my projects, some parts are using this gem, and some other parts are using Open3 to run git command.
Unfortunately, ruby-git's with_custom_env_variables has some side-effect on other process, because it overwrites environment variables. so it affects any other forked process using Open3

Here's a minimal example I found.

# it takes 2-3 minutes to reproduce the issue.
# there 'b' raises error like:
# "error: cannot open '{---}/temp/git_clone_LV2qU29d-gz36Ae-81gk3w/.git/FETCH_HEAD': No such file or directory\nfatal: not a git repository: '{---}/temp/git_service_clone_LV2qU29d-gz36Ae-81gk3w/.git'\n"

a = Thread.new do
  loop do
    url = 'https://github.com/ruby-git/ruby-git.git'
    branch = 'master'

    dir_name = "temp/git_clone_#{SecureRandom.urlsafe_base64}"

    Git.clone(url, dir_name).checkout(branch) do
      # do nothing
    end
  ensure
    FileUtils.rm_rf(dir_name)
  end
end

b = Thread.new do
  loop do
    dir = '/home/leechanghwan/repo/jce-class-rails'
    # git fetch sometimes fails!
    stdout_and_stderr, status = Open3.capture2e('git fetch', chdir: dir)
    # Environment Variables of the child process corrupted.
    # stdout_and_stderr, status = Open3.capture2e('echo $GIT_DIR', chdir: dir)

    puts stdout_and_stderr unless status.success?
  end
end

a.join
b.join

@fxposter
Copy link
Contributor

Well, with the way it's written - it's obviously the case. And I believe this part can be fixed in the same way I fixed the chdir - by passing env explicitly into the new spawned process.

@fxposter
Copy link
Contributor

@a3626a but the example is a bit contrived still. ;) in most of the cases you won't do shellout to git in one place and using this gem in another place in the same program.

@fxposter
Copy link
Contributor

ca869fe this commit of the #675 that is built on top of the one that I mentioned above should fix this problem as well.

@a3626a
Copy link

a3626a commented Oct 26, 2023

@fxposter

#675 will fix the problem, thank you.


Here is a brief background of my case.

As you mentioned, Yes, it is not a good practice. I translated python code to ruby. And the original code just uses command line interface, so I translated into Open3.

I tried to use ruby-git for the other git-related part of my program.

And the "shared environment variable" problem really caused tons of errors to my users. Most of the errors are not serious, but one of my user actually showed me a serious fault case. That's why I started to investigate this problem deeply.

@fxposter
Copy link
Contributor

Well, the DCO was finally accepted for whatever reason in my PRs and now I see failures on jruby-windows, where IO.popen doesn't support chdir option.

@perlun
Copy link
Contributor

perlun commented Nov 1, 2023

For reference, discussion seems to be ongoing in #673 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants