Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gitea interface unavailable after deleting member of Org "Owners" team #4362

Closed
2 of 7 tasks
leepfrog-ger opened this issue Jul 4, 2018 · 8 comments
Closed
2 of 7 tasks
Labels

Comments

@leepfrog-ger
Copy link

leepfrog-ger commented Jul 4, 2018

  • Gitea version (or commit ref): 1.4.2
  • Git version: not relevant
  • Operating system: Windows Server 2012 R2
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
    • Not relevant
  • Log gist:
2018/07/04 15:35:31 [...routers/org/teams.go:107 TeamsAction()] [E] Action(remove): read tcp GITEA_SERVER_IP:49445->MSSQL_SERVER_IP:1433: i/o timeout
2018/07/04 15:35:31 [...routers/org/teams.go:219 TeamMembers()] [E] GetMembers: get user '0': user does not exist [uid: 0, name: , keyid: 0]
2018/07/04 15:35:31 [D] Template: status/500
...
2018/07/04 15:36:47 [...routers/user/home.go:49 getDashboardContextUser()] [E] GetOrganizations: user does not exist [uid: 0, name: , keyid: 0]

Description

I am using Gitea on Windows with MSSQL as DB backend. By chance I found out that removing the first owner of an organization brings down the whole gitea web interface for us.

Steps to reproduce in our environment:

  1. Create a new Org using UserA
  2. Go into the Owners Team and add UserB as owner
  3. Remove UserA from the owners team

What happes is that the page does not refresh for about 30 seconds and then shows an 500 error.
All subsequent requests to the gitea webinterface then fail with this 500 error, no matter if using an existing session or accessing it without being logged in.

I was only able to get it to work again by restoring a DB backup previous to the deletion.

When deleting UserA the following entries are written to the gitea.log:

2018/07/04 15:35:31 [...routers/org/teams.go:107 TeamsAction()] [E] Action(remove): read tcp GITEA_SERVER_IP:49445->MSSQL_SERVER_IP:1433: i/o timeout
2018/07/04 15:35:31 [...routers/org/teams.go:219 TeamMembers()] [E] GetMembers: get user '0': user does not exist [uid: 0, name: , keyid: 0]
2018/07/04 15:35:31 [D] Template: status/500

each subsequent request to the web interface generates an entry similar to this in the log file:

2018/07/04 15:36:47 [...routers/user/home.go:49 getDashboardContextUser()] [E] GetOrganizations: user does not exist [uid: 0, name: , keyid: 0]
2018/07/04 15:36:47 [D] Template: status/500

I've used SQL profiler to see which queries are executed against the DB when the user is deleted, and there are only two queries that modify the database (all others are SELECT statements):

exec sp_executesql N'DELETE FROM "team_user" WHERE "org_id"=@p1 AND "team_id"=@p2 AND "uid"=@p3',N'@p1 bigint,@p2 bigint,@p3 bigint',@p1=6,@p2=1,@p3=2

exec sp_executesql N'UPDATE "team" SET "num_members" = @p1 WHERE "id"=@p2',N'@p1 bigint,@p2 bigint',@p1=1,@p2=1

The SQL profiler does not pick up any queries with errors or warnings, timeouts or high load when the deletion is performed. As all other queries are exeuted in in less than 10 milliseconds and given that I can reproduce this reliably on each try I do not think this is in any way load or network related, the i/o timeout message might be more of a red herring.

Let me know if you need anything else.

@jonasfranz
Copy link
Member

Could you please post xorm.log if existing. If not set log LEVEL in your app.ini to Info as described here: https://docs.gitea.io/en-us/config-cheat-sheet/#log-log and restart your gitea instance. Now the SQL queries should be shown in console or log.

@leepfrog-ger
Copy link
Author

I already was running in LEVEL = Trace as loglevel but no xorm.log is present.
I've just tried again with Info Level but no avail.
In both cases the gitea.log shows the XORM log level is set correctly:

2018/07/05 11:37:55 [I] Log Mode: File(Info)
2018/07/05 11:37:55 [I] XORM Log Mode: File(Info)

@jonasfranz
Copy link
Member

Change log MODE to console and submit the console output. Maybe this will give us more information.

@jonasfranz
Copy link
Member

Maybe #4282 can help us finding the problem.

@leepfrog-ger
Copy link
Author

leepfrog-ger commented Jul 5, 2018

Some new information: Changing to MODE = file,console indeed leads to the console printing more details which are not present in the log files written to disk (maybe this is an issue in itself?)

The moment the error ocurrs:

[Macaron] 2018-07-05 14:50:06: Started GET /org/testorg23/teams/owners/action/remove?uid=11 for 10.10.10.10
2018/07/05 14:50:06 [I] [SQL] SELECT  TOP 1 "id", "lower_name", "name", "full_name", "email", "keep_email_private", "passwd", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "description", "num_teams", "num_members", "diff_view_style" FROM "user" WHERE "id"=? []interface {}{11}
2018/07/05 14:50:06 [I] [SQL] SELECT count(*) FROM "notification" WHERE (user_id = ?) AND (status = ?) []interface {}{11, 0x1}
2018/07/05 14:50:06 [I] [SQL] SELECT  TOP 1 "id", "lower_name", "name", "full_name", "email", "keep_email_private", "passwd", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "description", "num_teams", "num_members", "diff_view_style" FROM "user" WHERE "lower_name"=? []interface {}{"testorg23"}
2018/07/05 14:50:06 [I] [SQL] SELECT "id", "org_id", "lower_name", "name", "description", "authorize", "num_repos", "num_members", "unit_types" FROM "team" WHERE (org_id=?) ORDER BY CASE WHEN name LIKE 'Owners' THEN '' ELSE name END []interface {}{10092}
2018/07/05 14:50:06 [I] [SQL] BEGIN TRANSACTION
2018/07/05 14:50:06 [I] [SQL] SELECT top 1 * FROM team_user WHERE (org_id=?) AND (team_id=?) AND (uid=?) []interface {}{10092, 10005, 11}
2018/07/05 14:50:06 [I] [SQL] SELECT * FROM "repository" INNER JOIN "team_repo" ON repository.id = team_repo.repo_id WHERE (team_repo.team_id=?) []interface {}{10005}
2018/07/05 14:50:06 [I] [SQL] DELETE FROM "team_user" WHERE "org_id"=? AND "team_id"=? AND "uid"=? []interface {}{10092, 10005, 11}
2018/07/05 14:50:06 [I] [SQL] UPDATE "team" SET "num_members" = ? WHERE "id"=? []interface {}{1, 10005}
2018/07/05 14:50:07 [I] [SQL] SELECT count(*) FROM "team_user" WHERE "org_id"=? AND "uid"=? []interface {}{10092, 11}
2018/07/05 14:50:07 [I] [SQL] SELECT  TOP 1 "id", "uid", "org_id", "is_public" FROM "org_user" WHERE (uid=?) AND (org_id=?) []interface {}{11, 10092}
2018/07/05 14:50:07 [I] [SQL] SELECT  TOP 1 "id", "lower_name", "name", "full_name", "email", "keep_email_private", "passwd", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "description", "num_teams", "num_members", "diff_view_style" FROM "user" WHERE "id"=? []interface {}{10092}
2018/07/05 14:50:07 [I] [SQL] SELECT  TOP 1 "id", "org_id", "lower_name", "name", "description", "authorize", "num_repos", "num_members", "unit_types" FROM "team" WHERE "org_id"=? AND "name"=? []interface {}{10092, "Owners"}
2018/07/05 14:50:37 [I] [SQL] ROLL BACK
[Macaron] 2018-07-05 14:50:37: Completed GET /org/testorg23/teams/owners/action/remove?uid=11 200 OK in 30.1075173s
2018/07/05 14:50:37 [...routers/org/teams.go:107 TeamsAction()] [E] Action(remove): read tcp GITEA_SERVER_IP:50522->MSSQL_SERVER_IP:1433: i/o timeout

It seems to rollback a transaction after 30 seconds which leaves the database in a non-functional state. After that, pretty much every request to the web interface will lead to the HTTP 500.
E.g. below is the output when trying to browse to explore:

[Macaron] 2018-07-05 14:54:02: Started GET /explore/organizations for 10.10.10.10
2018/07/05 14:54:02 [I] [SQL] SELECT  TOP 1 "id", "lower_name", "name", "full_name", "email", "keep_email_private", "passwd", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "description", "num_teams", "num_members", "diff_view_style" FROM "user" WHERE "id"=? []interface {}{2}
2018/07/05 14:54:02 [I] [SQL] SELECT count(*) FROM "notification" WHERE (user_id = ?) AND (status = ?) []interface {}{42, 0x1}
2018/07/05 14:54:02 [I] [SQL] SELECT count(*) FROM "user" WHERE type=? []interface {}{1}
2018/07/05 14:54:02 [...itea/routers/home.go:188 RenderUserSearch()] [E] SearchUsers: Count: sql: expected 37 destination arguments in Scan, not 1
2018/07/05 14:54:02 [D] Template: status/500

@leepfrog-ger
Copy link
Author

I was puzzled how a transaction that is cleanly rolled back both transactions can cause this issue and did the obvious test I missed before: Restarting the service after the error ocurred.

And sure enough the server seems to come up without any errors. So the issue is "runtime only". It seems that some internal component does not handle the rolled back transaction gracefully, maybe some kind of cache corruption.

The question remains of course why the Transaction timeout/rollback is ocurring in the first place.

@leepfrog-ger
Copy link
Author

And another update: I talked to someone more familiar with MSSQL transaction handling than myself and it seems that the reason for this is the following:

Default transaction isolation level for MSSQL is SERIALIZABLE (https://docs.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-2016).

Statements cannot read data that has been modified but not yet committed by other transactions.

But Gita tries to do that:

2018/07/05 14:50:06 [I] [SQL] BEGIN TRANSACTION
...
2018/07/05 14:50:06 [I] [SQL] UPDATE "team" SET "num_members" = ? WHERE "id"=? []interface {}{1, 10005}
...
2018/07/05 14:50:07 [I] [SQL] SELECT  TOP 1 "id", "org_id", "lower_name", "name", "description", "authorize", "num_repos", "num_members", "unit_types" FROM "team" WHERE "org_id"=? AND "name"=? []interface {}{10092, "Owners"}
2018/07/05 14:50:37 [I] [SQL] ROLL BACK

The row for "Owners" in the team table is updated and then read within the same transaction. It cannot be read due to the lock on the record which leads the whole transaction to time out and being rolled back.

So it seems there are two issues:

  1. Gitea is unable to handle a transaction rollback gracefully in that situation
  2. The code needs to be changed so that after the delete and update query a commit takes place OR the transaction isolation level needs set accordingly for those queries

@lunny
Copy link
Member

lunny commented Nov 7, 2018

This is duplicated with #4972 and should be closed by #5214. Please feel free to reopen it.

@lunny lunny closed this as completed Nov 7, 2018
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants