Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heavy usage of information schema periodically causing server performance issue #569

Closed
badbod99 opened this issue Oct 10, 2018 · 18 comments
Assignees

Comments

@badbod99
Copy link

Every hour or so (sorry, despite watching all day not been able to identify exact pattern), we are seeing our MySQL Server Load Average hit 70+ (32CPU box). From looking at innotop, I can see a massive queue of these sort of queries

SELECT ORDINAL_POSITION, PARAMETER_MODE, PARAMETER_NAME, DATA_TYPE, DTD_IDENTIFIER FROM information_schema.parameters WHERE SPECIFIC_SCHEMA = 'plus' AND SPECIFIC_NAME = 'mysp' ORDER BY ORDINAL_POSITION;

These seem to be triggered by the stored procedure cache logic. My original thought was that the Web Server was recycling the app pool, thus disposing the memory area used by the connection pool and thus the stored procedure cache. So I set the Web Server to AlwaysOn and the IdleTimeout to zero. The issue still occurs.

Is there something which triggers a connection pool clean up or recycle on interval? Monitoring reports that the app pool is not restarting within IIS, so seems it's something on interval within the software.

This is going to be very difficult to simulate in a simple package as we have 8000+ stored procedures and thousands of users on the system at any one point. I wouldn't know where to start to replicate as this issue has only started occurring recently.

@badbod99
Copy link
Author

Also worth noting, we are seeing queries like the above for the same sp in the same schema, where that sp is called very regularly by the application.

@bgrainger
Copy link
Member

The stored procedure cache is stored with the connection pool and is only cleared if MySqlConnection.ClearPool or ClearAllPools is called; are you ever calling that method?

There is no stored procedure cache if Pooling=false is set in the connection string; is this a possibility in your application?

Are you only seeing WHERE SPECIFIC_SCHEMA = 'plus' AND SPECIFIC_NAME = 'mysp' in your query log, or are lots of different stored procedure names being logged (and this is just one example)? If it is just this one name, what does this query return?

SELECT COUNT(*)
FROM information_schema.routines
WHERE ROUTINE_SCHEMA = 'plus' AND ROUTINE_NAME = 'mysp';

@badbod99
Copy link
Author

Not calling ClearPool or ClearAllPools. We have Pooling=True. Connection string is:

datasource=10.0.0.5;username=*****;password=*****;database=plus;Pooling=true;Max Pool Size=40;Min Pool Size=0;default command timeout=60;Convert Zero Datetime=true;

Running 2 web servers each with this same connection string. mysp was just an example. We are seeing many different sps named in the query. Always from one of the 2 web servers and the same application (same DB user).

SELECT COUNT(*)
FROM information_schema.routines
WHERE ROUTINE_SCHEMA = 'plus';

Returns 2995. We have SPs in other Schemas, but the application mainly uses this one and this is where we are seeing these cache queries.

Is the cache per connection? Looking at the code, should be for the whole pool. One of our SPs is called for every page on the application for every user, but we are seeing that turn up in this cache query during a normal working day.

@bgrainger
Copy link
Member

Thanks; that's helpful information.

The cache should be for the whole pool, not per connection. I'll review the code and see if there's something that might be causing cache eviction (or failure to populate the cache, which might be more likely). I can also add some logging in this code path, which may provide additional clues (if you're able to update to the next release once it comes out).

@badbod99
Copy link
Author

We have a test system which we can run a new release on with logging. We won't see the load or backlog of queries on there, but if the cache isn't being populated correctly, should show up in the logging.

@bgrainger
Copy link
Member

I added logging for cached procedures in 0.46.0. Please update to this on a test system and use one of the logging options or MySqlConnectorLogManager.Provider = new ConsoleLoggerProvider(MySqlConnectorLogLevel.Debug);. Make sure to enable Debug level logging.

@badbod99
Copy link
Author

mysqlconnector.log.zip

Find attached logs from our test system. We're not seeing the exact same behavior on this version on test. However you might be able to glean some information from the logs.

We'll have this release out on live towards end of next week and I'll provide a live log also which hopefully will show the issue more clearly.

@bgrainger
Copy link
Member

I see that plus.reportrequestparam_put has 6 queries for SELECT COUNT(*) FROM information_schema.routines ... because this stored procedure gets executed for the "first" time simultaneously 6 times around 17 Oct 2018 07:58:19.

A similar thing happens for plus.template_fields_get (requested simultaneously 4 times) around 17 Oct 2018 05:46:48.

MySqlConnector eagerly fetches information_schema information for a stored procedure, even if it's already being requested on another thread. We could fix this by adding more locks in the client. Depending on how this was done, it might affect the scalability of the stored procedure cache. It would also be nice to know that this definitely causes a problem for MySQL Server and is something worth fixing; does looking up the same schema information really cause problems for MySQL Server, or is it able to effectively cache and return the data?

@bgrainger
Copy link
Member

BTW, you have six connection pools in your application; you might want to try to get this down to 2, to improve the efficiency of client-side caching.

plussystem

ConnectionString: Server=10.0.8.4;User Id=plussystem;Database=plus;Pooling=true;Maximum Pool Size=50;Minimum Pool Size=5;Default Command Timeout=60;Convert Zero Datetime=True

plususer

ConnectionString: Server=10.0.8.4;User Id=plususer;Database=plus;Pooling=true;Maximum Pool Size=200;Minimum Pool Size=50;Default Command Timeout=120;Convert Zero Datetime=true
ConnectionString: Server=10.0.8.4;User Id=plususer;Database=plus;Pooling=true;Maximum Pool Size=80;Minimum Pool Size=50;Default Command Timeout=60;Convert Zero Datetime=true
ConnectionString: Server=10.0.8.4;User Id=plususer;Database=plus;Pooling=true;Maximum Pool Size=80;Minimum Pool Size=50;Default Command Timeout=900;Convert Zero Datetime=true
ConnectionString: Server=10.0.8.4;User Id=plususer;Database=plus;Pooling=true;Maximum Pool Size=80;Minimum Pool Size=50;Default Command Timeout=90;Convert Zero Datetime=true
ConnectionString: Server=10.0.8.4;User Id=plususer;Database=plus;Pooling=true;Maximum Pool Size=200;Minimum Pool Size=50;Default Command Timeout=900;Convert Zero Datetime=true

The differences are in Maximum Pool Size and Default Command Timeout.

@badbod99
Copy link
Author

Thanks for the feedback on connection pools. I'll look into what's causing this.

I noticed the the simultaneous execution of the same SP triggered the call the information_schema. It does seem to be that the multiple simulateous hits on information_schema cause a performance impact.
It might be that this is only seen in a high load scenario. There is an article covering information_schema optimisation at https://dev.mysql.com/doc/refman/5.6/en/information-schema-optimization.html. In short though, information_schema is not a regular set of MySQL tables.

I think it would be an improvement to synchronize the loading of cached tables as a minor performance hit on the client once is preferable to a performance hit on the database server.

We are seeing on live calls for SPs which should have been long since cached. Should have the logging on live from tomorrow night and will provide soon.

@badbod99
Copy link
Author

badbod99 commented Nov 14, 2018

Right, I've managed to get a log from live during the time this issue was occurring. This time it maxed out our MySql server for a few minutes. The site has been running without restart for days so there should be no new SPs to cache, yet we are still seeing these schema queries. See attached. These are from 2 front end webservers.

After a while, these queries stopped appearing as frequently. Certainly the problem is that these queries seem to perform poorly when many are run at once. This is a 32 CPU server. More of the problem though is that these schema queries are being run when those sps should already be in the cache.

mysqlconnector1.log.zip
mysqlconnector2.log.zip

@bgrainger
Copy link
Member

I'm seeing evidence of seven different connection pools in these logs (but I don't have logs for when they're created, so I don't know what the different connection strings are). As before, I would strongly recommend trying to get this down to 2 (one per UserId).

As far as I can tell, most sprocs in the log are fetched from information_schema.routines just once (112 sprocs). There are 35 that are fetched twice or more, with the most being seven, across two connection pools. Based on the timestamps, it appears that a flurry of requests come in very close together, and all threads servicing those requests individually cache the SP. I mentioned this limitation in a previous comment:

MySqlConnector eagerly fetches information_schema information for a stored procedure, even if it's already being requested on another thread.

I'm unable to evaluate your claim that these "sps should already be in the cache." Given that you have 3000 sprocs (in the plus schema), it may take quite some time for the cache to be populated. I'll add some logging that dumps the size of the cache; then we could verify that its size is monotonically increasing and estimate the cache hit rate.

Finally, from this blog post it sounds like "INFORMATION_SCHEMA is slow" is a known issue in MySQL Server that they're starting to address in MySQL Server 8.0.

When we look at the history of INFORMATION_SCHEMA there have been a number of complaints about the performance of certain queries, particularly in the case that there are many database objects (schemas, tables etc).

There's also this "fun" comment in the Connector/NET source:

// we want to avoid using IS if  we can as it is painfully slow

https://github.com/mysql/mysql-connector-net/blob/68c54371821c87ff40a773acc127ce357b46a5ae/Source/MySql.Data/ISSchemaProvider.cs#L356

It looks like Connector/NET falls back to parsing the source code of a SHOW CREATE statement to extract the parameters, because that's faster than using INFORMATION_SCHEMA. We may have to fall back to using mysql.proc and parsing SQL for MySQL versions prior to 8.0 if this is an unavoidable performance bottleneck in MySQL Server.

@bgrainger
Copy link
Member

I timed the following on a local MySQL Server 5.7 that has 21 user-defined routines:

MySqlConnector current code

SELECT COUNT(*) FROM information_schema.routines WHERE ROUTINE_SCHEMA = @s AND ROUTINE_NAME = @n;
SELECT ORDINAL_POSITION, PARAMETER_MODE, PARAMETER_NAME, DATA_TYPE, DTD_IDENTIFIER FROM information_schema.parameters WHERE SPECIFIC_SCHEMA = @s AND SPECIFIC_NAME = @n ORDER BY ORDINAL_POSITION;

8ms

Using mysql.proc

select * from mysql.proc WHERE db = @s AND specific_name = @n;
SHOW CREATE PROCEDURE @s.@n;

0.5ms

It appears that bypassing INFORMATION_SCHEMA is at least 10× faster.

In MySQL Server 8.0, INFORMATION_SCHEMA takes about 1.2ms. Still not as fast as the old way, but mysql.proc is no longer available in 8.0.

@badbod99
Copy link
Author

badbod99 commented Nov 15, 2018

I've been reviewing the Event log for the servers in question around the time of the issue. I think the cache was cleared due to an App Pool recycle. That makes sense. I've reconfigured the recycling interval so hopefully the cache will get time to "warm up". So looks like not an issue with the cache behavior itself, just the performance of these information_schema queries.

Our previous code (with Connector/Net) used mysql.proc to cache, and this did perform well. So I agree that switching over to that for <MySql 8.0 would be a good move. Like many others, there is little prospect of us upgrading to MySql 8.0 in foreseeable.

One other thing we used to do was load the whole cache on start rather than on demand. Even with 3000 sps, this was very quick against mysql.proc. This way you won't get uncached behaviour during operational hours.

You don't actually need to use SHOW CREATE PROCURE, as the parameters list is available in mysql.proc. Our previous code just used:

SELECT name, param_list as params FROM mysql.proc WHERE db='plus' AND type = 'PROCEDURE';

This code handled it:

private void SetupStoredProcs(HttpContextBase context)
{
	if (factory.Config.DataCache)
	{
		log.Debug("Set up stored procs from cache");
		object cachedData = context.Cache[storedProcedureCacheName]; // This as SQL above

		if (cachedData != null)
		{
			Sps = cachedData as Dictionary<string, SP>;
		}
	}

	if (Sps == null)
	{
		log.Debug("Set up stored procs from db");
		Sps = new Dictionary<string, SP>();
		DataSet ds = GetSystemData(storedProcedureStoredProcedureName, null);
		if (ds != null && ds.Tables.Count > 0 && ds.Tables[0].Rows.Count > 0)
		{
			foreach (DataRow row in ds.Tables[0].Rows)
			{
				Dictionary<string, Param> dictParams = new Dictionary<string, Param>();
				string spName = row[0].ToString();
				log.DebugFormat("Adding stored procedure: {0}", spName);
				
				DataColumn column = ds.Tables[0].Columns[1];
				string spParamsData;
				if (column.DataType == typeof(string))
					spParamsData = row[1].ToString();
				else
					spParamsData = Encoding.UTF8.GetString((byte[])(row[1]));

				// remove comment block (/* */) from string
				spParamsData = Regex.Replace(spParamsData, @"/\*(.*?)\*/", 
					me => { if (me.Value.StartsWith("/*") || me.Value.StartsWith("//")) return me.Value.StartsWith("//") ? Environment.NewLine : "";
						return me.Value;
					}, RegexOptions.Singleline);

				string[] spParams = spParamsData.Split(',');
				foreach (string param in spParams)
				{
					string spParam = param;

					// skip singleline comment (--) from param array
					if (spParam.Contains("--"))
						continue;

					spParam = Regex.Replace(spParam, @"\bINOUT\b", string.Empty);
					
					Regex regexParamParts = new Regex(@"\W*(\w+)\W+(\w+)\W*(\d*)\W*");
					Match match = regexParamParts.Match(spParam);
					if (match.Success)
					{
						string spParamName = match.Groups[1].Value;
						string spParamType = match.Groups[2].Value;
						int spParamLength = 0;
						if (match.Groups.Count > 3)
							Int32.TryParse(match.Groups[3].Value, out spParamLength);
						Param p = new Param();
						p.Name = spParamName;
						p.Type = spParamType.ToLowerInvariant();
						p.Length = spParamLength;
						dictParams[p.Name] = p;
					}
				}

				SP sp = new SP();
				sp.Name = spName;
				sp.Params = dictParams;
				Sps.Add(sp.Name, sp);
			}
		}
		if (factory.Config.DataCache)
			context.Cache.Insert(storedProcedureCacheName, Sps, null, DateTime.MaxValue, TimeSpan.Zero, CacheItemPriority.NotRemovable, null);
	}
}

@bgrainger
Copy link
Member

I'm working on some code that will use the mysql.proc table to get stored procedure information. This will involve writing parsing code that needs to handle arbitrary user input. It occurs to me that your database could provide a large amount of real-world testing data. If you don't mind, could you export the results of this query (CSV or any machine-readable format should be fine) and email them to me privately?

SELECT name, type, CONVERT(param_list USING utf8mb4), CONVERT(returns USING utf8mb4) FROM mysql.proc;

I assume the CONVERT( USING ) should work fine; if it doesn't, I will need to find a workaround for that; please let me know.

@bgrainger
Copy link
Member

I pushed a commit that prefers mysql.proc if it's available: c3ef3fe

If you're willing to run a beta version of the library on one of your web nodes (this would basically be the only change since 0.47.1), I could release a beta on NuGet for your testing and feedback.

@bgrainger bgrainger self-assigned this Nov 17, 2018
@bgrainger
Copy link
Member

I have a beta version that uses mysql.proc for procedure metadata; please test it and let me know if it works, and reduces load: https://www.nuget.org/packages/MySqlConnector/0.48.0-beta1

@bgrainger
Copy link
Member

Fixed in 0.48.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants