app_offline.htm file not being consistently removed by Kudu #2949

Closed
codecadwallader opened this issue Mar 19, 2019 · 30 comments

@codecadwallader

We started noticing a couple weeks ago that occasionally the app_offline.htm file is not being removed at the end of the Kudu process. This results in the site remaining offline with a 503 Site under construction error.

We mentioned it over at Azure/app-service-announcements#137, and several other people chimed in that they were seeing it as well.

The issue is not happening consistently, but it is happening fairly regularly for us (~20% of the time). We've temporarily disabled auto-swap and are manually checking for the app_offline.htm file after a deploy, manually deleting it when present and proceeding with a swap.
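The manual check described above could be scripted. What follows is a minimal sketch, not an official tool: it assumes the Kudu VFS REST endpoint (`/api/vfs/...` on the site's `.scm.azurewebsites.net` host), basic-auth deployment credentials, and that a successful delete answers with HTTP 200 or 204. The site name, credentials, and helper names are hypothetical, and the HTTP call is injectable so the logic can be exercised without a network.

```python
import base64
import urllib.error
import urllib.request

APP_OFFLINE_PATH = "site/wwwroot/app_offline.htm"


def vfs_url(site_name, path=APP_OFFLINE_PATH):
    """Build the Kudu VFS URL for a file (site_name is a placeholder)."""
    return f"https://{site_name}.scm.azurewebsites.net/api/vfs/{path}"


def http_send(method, url, user, password):
    """Send one request with basic auth and return the HTTP status code."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, method=method)
    request.add_header("Authorization", f"Basic {token}")
    if method == "DELETE":
        # Assumption: the VFS API wants If-Match on DELETE; "*" skips the ETag check.
        request.add_header("If-Match", "*")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def remove_app_offline(site_name, user, password, send=http_send):
    """Delete app_offline.htm if present; return True when a file was removed."""
    url = vfs_url(site_name)
    status = send("GET", url, user, password)
    if status == 404:
        return False  # nothing left behind, safe to proceed with the swap
    if status != 200:
        raise RuntimeError(f"could not check app_offline.htm (HTTP {status})")
    status = send("DELETE", url, user, password)
    if status not in (200, 204):
        raise RuntimeError(f"could not delete app_offline.htm (HTTP {status})")
    return True
```

A release pipeline could call `remove_app_offline(...)` on the staging slot after the deploy step and only start the swap once it returns without raising.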

Please let me know what information you would like for me to provide (the template didn't seem completely applicable) and thanks for any help you can offer!

codecadwallader changed the title from "app_offline.htm file not being removed by Kudu" to "app_offline.htm file not being consistently removed by Kudu" on Mar 19, 2019

@maiqbal11

@ahmelsayed, @suwatch, can we take a look at this? We also received a CRI about this and multiple customers have experienced it.

@jonsagara

Count me as another customer affected by this.

@suwatch
Member

suwatch commented Mar 22, 2019

We will investigate and update.

@suwatch
Member

suwatch commented Mar 22, 2019

The code that creates and removes app_offline during deployment is here and here. There should be a trace in the deployment log showing that we create and remove the app_offline file. There is one condition under which we may not remove the file: if its content has changed (which should be rare).

If you have a site currently in that state, we would like to take a look and get to the bottom of it. If you don't need app_offline, you can opt out by adding this app setting.

@codecadwallader
Author

Thanks for the response and code samples. I'll be sure to report in when we next have a site actively in that state (Saturday morning now, but should be early next week).

I have considered avoiding the app_offline file altogether, but its big value is stopping the staging site to reduce file lock errors during deployment. We still get those sometimes even with the app_offline present (perhaps graceful shutdowns are not completed first?), but I'm eager to learn more if there's a better suggestion for increasing consistency of the builds.

@codecadwallader
Author

FWIW we haven't seen the issue this week, so we're going back to auto-swap for production. I'd like to leave this issue open since others have seen it too, in case anybody else can fulfill suwatch's request to see a site that is actively in that state. I'll certainly come back to report it when/if we next see the issue.

@TomStoryAI

Hello there,

I am trying to deploy an Angular app to Azure and I am having difficulties with the app_offline.htm file as well. I feel like there is a little bit of magic going on here. Can someone elaborate on how app_offline fits into existing Kudu scripts, and where it is declared and created?

The angular app I created resides here, as well as the kudu script: https://github.com/BigBubbus/beraterboard-test

How can I instruct Kudu to remove app_offline after deployment? For now I still delete it manually.

Thanks for every tip here :)
Thomas

@suwatch
Member

suwatch commented Apr 19, 2019

The app_offline creation and removal is part of KuduSync.Net. The file is supposed to be removed once we are done deploying bits to the destination. You can opt out by setting the app setting SCM_CREATE_APP_OFFLINE=0.

We are interested to see why it was left there. Please provide the UTC time of the incident and the web app name, and we can help take a look. @ankitkumarr FYI.

@celkinc

celkinc commented Jul 9, 2019

We had this issue last night on our production instance. Even after multiple deploys, the file still was not removed until we deleted it manually. Regrettably, this put us in a severe outage for hours before technical teams thought to check for the file. What can be done to find out why it wasn't removed and to prevent this in the future? Is someone actively looking into this issue?

@mmckechney

A customer of mine has also encountered this issue. It took several days of a premier ticket to figure out the cause, but thankfully it was in a pre-production environment. What is the resolution/status of this issue?

@frellis

frellis commented Aug 9, 2019

To help guide others here to the root cause:

Eventlog.xml messages that appear over and over:
Global configuration changed. Change path: MACHINE/WEBROOT/APPHOST/, ApplicationManager: ####### [TID ####] [PID ####]
Recycling application MACHINE/WEBROOT/APPHOST/ [TID ####] [PID ####]
Change in global configuration [TID ####] [PID ####]

FREB Log message:
Module: AspNetInitializationExceptionModule, Notification: BEGIN_REQUEST, Status: 503, SubStatus: 0, HttpReason: Service Unavailable, Win32ErrorCode: 0x00000000, ConfigException:

@mcdurdin

We have experienced this as well with an automatic deployment from GitHub, just now.

@ankitkumarr
Member

I apologize that this is still an issue. Usually when this happens, it is likely that the deployment itself failed because we were unable to remove that app_offline.htm. But I understand that this leaves the app in the offline state without the user actually being informed.
We have a work item to improve this experience. I will be taking a look at it next week. We will update this thread with an ETA.

@mcdurdin

@ankitkumarr Thank you for such a quick and helpful response :)

@jslaybaugh

Are there any updates on this? This is causing major issues for us every time we deploy our .NET Core production site. We've tried deploying from VS for Mac, VS for Windows, and through DevOps (including swapping slots), and every time we have this issue. MS support is recommending we set SCM_CREATE_APP_OFFLINE to 0. But as mentioned in the initial post at Azure/app-service-announcements#137, this seems like it could have issues of its own? Also, we're having issues where changing an App Setting causes the site to be temporarily unavailable. Is this related as well?

@beaubgosse

We're also seeing this issue.

@ankitkumarr
Member

I apologize for how long it took me to get back. We are starting to work on a fix to avoid such issues and this will be completed by the end of this month.

As Suwat mentioned above, the way Kudu deployments work today is that before deployment we create an app_offline.htm to make sure your app goes offline and releases any files that may need to be removed. Then, after deployment is complete (failure or success), Kudu removes the app_offline.htm (with retries). If the file was not removed, it indicates that something else could have gone wrong. We saw a case previously where there was a storage fail-over. I would even argue that in some cases keeping the app offline may be the right thing to do.
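The create, deploy, remove-with-retries flow described above can be sketched as follows. This is an illustration of the pattern on a local directory only, not Kudu's actual implementation; the file body, retry count, delay, and function names are all assumptions made for the example.

```python
import time
import warnings
from pathlib import Path

OFFLINE_NAME = "app_offline.htm"
OFFLINE_BODY = "<html><body>Site under construction</body></html>"  # placeholder


def remove_with_retries(path, attempts=3, delay_seconds=0.5):
    """Try to delete path a few times; return True once it is gone."""
    for attempt in range(attempts):
        try:
            path.unlink()
            return True
        except FileNotFoundError:
            return True  # already gone, nothing to clean up
        except OSError:
            # Typically a lock or a storage hiccup; wait and try again.
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    return False


def deploy_with_app_offline(wwwroot, copy_files):
    """Run copy_files(wwwroot) while app_offline.htm is present, then clean up."""
    marker = Path(wwwroot) / OFFLINE_NAME
    marker.write_text(OFFLINE_BODY)
    try:
        copy_files(wwwroot)
    finally:
        # Runs on success and on failure, as in the description above.
        if not remove_with_retries(marker):
            warnings.warn(f"{marker} was left behind; the site stays offline")
```

The sketch shows where a leftover file can come from: if every retry in the cleanup step fails, the marker survives even though the copy step itself may have finished.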

We will do our best to make this resilient and work on ways to ensure the file is always cleaned up, but it would help to look, case by case, at why this file was not removed in the first place. If you do see this issue, please share with us the app name and a time frame so we can investigate.

Setting SCM_CREATE_APP_OFFLINE to 0 wouldn't cause offline-related issues. It means that Kudu will not create the app_offline.htm. So, if your files have a lock on them such that they cannot be removed while the site is up, the deployment may fail.

@nicholastic

Me too, please fix soon.

@ankitkumarr
Member

ankitkumarr commented Dec 12, 2019

@nicholastic and @badescuga, can you share your app name and the time period in which you saw the issue?

@djsowa

djsowa commented Dec 13, 2019

This is a critical issue that has not been solved for months!
There is no sense in adding the app_offline file when deploying to a slot that receives 0% of traffic, so there is no chance of files being locked.

@suwatch
Member

suwatch commented Dec 13, 2019

@djsowa Do consider SCM_CREATE_APP_OFFLINE = 0 on the staging slot, and make it a slot setting.
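One way to apply that suggestion is through the Azure CLI. The sketch below only builds and runs the `az webapp config appsettings set` call with `--slot-settings`; it assumes the Azure CLI is installed and logged in, and the resource group, app, and slot names are hypothetical placeholders. The runner is injectable so the command can be inspected without touching Azure.

```python
import subprocess


def slot_setting_command(resource_group, app_name, slot):
    """Build the az CLI call that sets SCM_CREATE_APP_OFFLINE=0 as a slot setting."""
    return [
        "az", "webapp", "config", "appsettings", "set",
        "--resource-group", resource_group,
        "--name", app_name,
        "--slot", slot,
        # --slot-settings keeps the value pinned to this slot across swaps.
        "--slot-settings", "SCM_CREATE_APP_OFFLINE=0",
    ]


def apply_to_apps(apps, run=subprocess.run):
    """Apply the setting to each (resource_group, app_name, slot) tuple."""
    for resource_group, app_name, slot in apps:
        run(slot_setting_command(resource_group, app_name, slot), check=True)
```

For example, `apply_to_apps([("my-rg", "my-app", "staging")])` would run the command once for the `staging` slot of a hypothetical `my-app`. Teams that manage settings through infrastructure-as-code would make the equivalent change there instead, so the two do not drift apart.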

@djsowa

djsowa commented Dec 27, 2019

@suwatch Yes, we are making this change with SCM_CREATE_APP_OFFLINE.
But my clients have more than 100 App Services, each managed by Terraform.
This means I have to make more than 100 pull requests :(

@mcdurdin

This just happened again on help-keyman.azurewebsites.net (I was notified at 2020-01-11T01:16:11.0610722Z) after a deployment started at 12:09:32 PM GMT+11 (which succeeded) (commit id 2b836a8).

I have manually removed app_offline.htm in order to make sure the site is available, but nothing else has currently changed on the site.

Nothing visible in the logs.

@jonorogers

This happened to us on two separate web app services last weekend. Both times were Kudu Azure CI builds from a Bitbucket repo. The issue is still happening. Nothing is visible in the logs/UI; the UI indicates the build has succeeded as per normal.

Hair-pulling for sure.

@ankitkumarr
Member

The fix for this is completed and merged in. The release will be starting this week and should be deployed by the end of next week.
Thank you all for your patience.

@beaubgosse

@ankitkumarr Is this fix released now? If so, in what version, or how can we check that we got the fix? Thanks!

@ankitkumarr
Member

Yes, the deployment was just completed today. If you go to your Kudu site, you should see a version equal to or greater than "85.11226.4297.0".

That means you should have the fix, in which an "app_offline.htm" left over from a failed deployment is automatically deleted by Kudu.
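Because the version string has several dotted parts, it needs a numeric comparison rather than a plain string comparison. A minimal sketch, assuming the version has already been copied by hand from the Kudu site; the helper names are hypothetical:

```python
FIXED_VERSION = "85.11226.4297.0"  # version quoted in the comment above


def parse_version(text):
    """Turn a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(kudu_version, minimum=FIXED_VERSION):
    """True when kudu_version is equal to or newer than the minimum."""
    return parse_version(kudu_version) >= parse_version(minimum)
```

With this, `has_fix("85.11226.4297.0")` is `True`, while `has_fix("85.9.0.0")` is `False` even though a plain string comparison would rank `"85.9.0.0"` higher.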

@ankitkumarr
Member

Closing as this is resolved and deployed. If you do see this again, please feel free to "@" me here with the necessary information (site name, timestamp) and we can take a look.

@larryeby

@ankitkumarr I am seeing this issue still on an Azure Functions app being deployed through ADO with the "Take app offline" box selected.
The Kudu version is currently 85.11226.4297.0.

Our setup is to deploy to a slot, then swap the slots.

@IndifferentDisdain

@ankitkumarr This happened for me today. Azure App Service, fl-cc-am-beta.azurewebsites.net. Deployed through CI/CD for years at this point w/o issue. Last update on CI pipeline was 5/21/21, and the last release (ran as part of PR check) did fail w/ ERROR_FILE_IN_USE. Kudu build 94.30524.5227.0, so maybe this was left over when the deployment failed?
