Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop attempting to close nil Listener #4753

Merged
merged 1 commit into from
Aug 4, 2017

Conversation

ebozduman
Copy link
Contributor

Description

If HTTP server is not up and running and if the Listener is attempted to be closed when the server could not even started up, then the offending panic/crash happens.
To prevent crash from happening, this fix introduces a new HTTP server variable upAndRunning to keep track of the server's state. When we try to shut down the server, we just check if it was up before moving ahead with cleaning things up. This prevents the panic causing close attempt on the non-existing Listener.

Motivation and Context

Crash in FS mode when Erasure coded mode is already running #4715

How Has This Been Tested?

Started Minio in distributed mode and tried to start a 2nd Minio with one of the disks/resources that has been already used by the other Minio instance.
Also tested the opposite case.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@codecov-io
Copy link

codecov-io commented Aug 3, 2017

Codecov Report

Merging #4753 into master will increase coverage by 0.44%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #4753      +/-   ##
=========================================
+ Coverage   63.45%   63.9%   +0.44%     
=========================================
  Files         187     187              
  Lines       27472   27737     +265     
=========================================
+ Hits        17433   17725     +292     
+ Misses       8883    8856      -27     
  Partials     1156    1156
Impacted Files Coverage Δ
pkg/http/server.go 24.73% <0%> (-0.83%) ⬇️
cmd/xl-v1.go 87.23% <0%> (+0.27%) ⬆️
cmd/credential.go 96.05% <0%> (+3.59%) ⬆️
cmd/posix.go 81.94% <0%> (+7.19%) ⬆️
cmd/fs-v1-helpers.go 67.65% <0%> (+7.21%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 108decf...66dc3cf. Read the comment docs.

@ebozduman ebozduman force-pushed the crashFSmode4715 branch 2 times, most recently from b5f36d9 to dae8e02 Compare August 3, 2017 00:32
if atomic.LoadInt32(&srv.upAndRunning) <= 0 {
return nil
}

// Close underneath HTTP listener.
srv.listenerMutex.Lock()
err := srv.listener.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could check srv.listener == nil than bringing additional field to the structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking Listener for nil was my first approach, but then I discussed it with @harshavardhana and he suggested to implement the fix this way by introducing additional field in the structure.
As a continuation, the goal is to refactor the code more. So, the plan is to get rid of the mutex logic from Listener code/design. We can discuss it further on the phone if you'd like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now @ebozduman checking for listener as nil is fine if we are not resetting or reading the value elsewhere for this fix.

The refactor discussions requires lot more work we can defer that discussion for later date.

Copy link
Contributor Author

@ebozduman ebozduman Aug 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
@balamurugana , @harshavardhana , @donatello ,
Please review it again.
Thanks.

@@ -123,6 +123,10 @@ func (srv *Server) Shutdown() error {
return errors.New("http server already in shutdown")
}

if srv.listener == nil {
return nil
Copy link
Member

@balamurugana balamurugana Aug 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. A racy Start() and Stop() would set srv.inShutdown incremented. It would look like the server is in shutdown. You could move this check above if atomic.AddUint32(... block.

  2. How about returning an error server not initialized than nil?

@balamurugana
Copy link
Member

commit message like Stop attempting to close nil Listener much clearer.

@deekoder deekoder merged commit 0aca2ab into minio:master Aug 4, 2017
@balamurugana balamurugana changed the title Stops attempting to close Listener if HTTP server is down Stop attempting to close nil Listener Aug 4, 2017
@ebozduman ebozduman deleted the crashFSmode4715 branch August 4, 2017 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants