Performance problem when testing instance with a lot of databases #316
To reproduce try:
Run a single database test against a remote instance with some network latency, then wait for what seems like forever.
I think there might be two issues causing it. First of all the
It might be that Get-DbaDatabase is not as efficient as it could be, or there is a reason for this complexity, but either way it is perhaps too heavy to use in simple tests in dbachecks.
The problem is with SMO and the Databases collection. When it is used in
I can see the value of having an SMO Database object returned by Get-DbaDatabase in dbatools; it makes tests easy to write and look pretty, but the same thing makes the tests painfully slow at scale and over the network.
It would probably be possible to get somewhat better performance by using
I would prefer to avoid using SMO altogether and instead have something like
Let me know what you think. In the meantime I'll follow this path and see where it takes me.
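To make that direction concrete, here is a minimal sketch of the kind of lightweight helper I have in mind; the function name, query and columns are illustrative, not existing dbachecks code:

```powershell
# Hypothetical helper - Get-DbcDatabaseInfo is not an existing command.
# One cheap T-SQL round trip instead of materialising SMO Database objects.
function Get-DbcDatabaseInfo {
    param ([string]$SqlInstance)

    $server = Connect-DbaInstance -SqlInstance $SqlInstance
    $query = "SELECT name, collation_name, state_desc FROM sys.databases"

    # ExecuteWithResults never touches the lazily enumerated Databases
    # collection, so the cost stays flat as the database count grows
    $server.ConnectionContext.ExecuteWithResults($query).Tables[0]
}
```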
I feared that was the case.
We do use
We should probably switch to Connect-DbaInstance and then execute a T-SQL command.
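Something along these lines, as a sketch (the instance name and query are placeholders):

```powershell
# Connect once, then run plain T-SQL rather than enumerating $server.Databases
$server = Connect-DbaInstance -SqlInstance 'sql2017'
$result = $server.ConnectionContext.ExecuteWithResults(
    "SELECT name FROM sys.databases WHERE database_id > 4")
$result.Tables[0].Rows | ForEach-Object { $_.name }
```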
With hundreds of databases it looks like I'm 'severely slowed' in every test I tried. So here is what I have done:
Created a function in
I have updated only three tests so far: DatabaseCollation, SuspectPage, ValidDatabaseOwner. Previously the longest I waited for a single test to finish was ~90 minutes before I killed it. With the changes, the three tests were done against all 700 databases in 43 seconds.
Please have a look and let me know what you think.
Thank you @michalporeba
I would not want the function in the test; we would move that to the internal functions folder. But this is a great start.
I would be interested to know your thoughts as to the better-performing (at larger scale) option:
1. Replace calls to Get-DbaDatabase with an internal helper function returning an object with verifiable properties that we can test.
2. Call Get-DbaDatabase at the top of the script, providing an object that can be tested throughout the tests.
This (number 2 above) is probably my least favourite.
3. An internal helper function (using get-sqlinstance) to get the names of the databases on the server; each test then uses that to gather the correct information it needs.
Pros - enables lightweight calls to each instance and a quicker response.
3a. If we were to use 3, create helper functions for gathering the information for each test. These could use SMO, T-SQL or magic to get the required information; the only requirement is that it is accurate and performant (see the sketch after this list).
Pros - enables each test to be improved in its own right for performance.
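A rough sketch of what options 3/3a might look like; every function name here is hypothetical:

```powershell
# One lightweight call for the database list (option 3) ...
function Get-DatabaseName {
    param ($Server)  # server object from Connect-DbaInstance
    $Server.ConnectionContext.ExecuteWithResults(
        "SELECT name FROM sys.databases").Tables[0].Rows |
        ForEach-Object { $_.name }
}

# ... plus a small per-check gatherer (option 3a) that is free to use
# T-SQL, SMO or magic, as long as it is accurate and performant
function Get-DatabaseCollation {
    param ($Server)
    $Server.ConnectionContext.ExecuteWithResults(
        "SELECT name, collation_name FROM sys.databases").Tables[0]
}
```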
I suffered this problem in Get-DbaDatabaseState and already tried a workaround; it worked well.
I was thinking of adding the same to Get-DbaDatabase as well, because more and more commands are impacted: if the targeted instance has, let's say, an offline database and you target a single database, you still see SMO trying to enumerate something on the non-targeted databases.
The workaround would filter based on the query and build the SMO info just for the required databases.
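To illustrate the idea (this is an assumption about the shape of the workaround, not the actual dbatools patch):

```powershell
# Preselect names cheaply with T-SQL, then materialise SMO objects only for
# those databases via the collection indexer, instead of enumerating them all
$server = Connect-DbaInstance -SqlInstance 'sql2017'   # placeholder instance
$names = $server.ConnectionContext.ExecuteWithResults(
    "SELECT name FROM sys.databases WHERE state = 0"   # skip offline databases
).Tables[0].Rows | ForEach-Object { $_.name }

$selected = foreach ($name in $names) { $server.Databases[$name] }
```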
In the end,
That being said, Get-DbaDatabase is going to get a speedup when "preselecting" databases, but if, downlevel, you need the full SMO object for each and every database, it's not going to be that much faster.
So for me option #2 doesn't solve much, as even a single execution of
Personally I would strongly prefer option #1 with caching: get the information once and run all the necessary tests locally. (Perhaps a couple of times for different tests, if it makes sense to have multiple functions.)
As to the placement of the helper function, I started with internal/functions, but then I realised it is really part of the test and might need to be modified with every new test. That is why I moved it to the test file; I wanted to avoid changes to the 'internal' stuff with every new test we write. I see it as data collection for the test, nothing else. But we can move it back if you feel strongly about it.
I know the general advice is one change per branch, per PR, but here the change is in how the tests collect information, so for the existing tests I think it makes sense to make all the changes in one branch and one bigger PR.
I'll keep pushing today the way I started to see if I find any other issues.
Not to push this down, but maybe wait a day or two for that PR to be in.
Adding this in here as it was discussed in Slack.
Mostly we are looping through the databases and calling a dbatools command.
Which would be a start in reducing the time.
As @wsmelton says:
The only problem with that is commands being called that iterate over that Database class in SMO... which Find-DbaDuplicateIndex does. Anything that touches
So I think we will find that we will also want to look at the places where dbatools commands use that class.
So, having read all of your wise comments again and again, I'm getting a bit confused, so I'll explain my favourite solution without necessarily referencing the individual ideas we have discussed (and numbered) previously.
But here is a disclaimer first: I'm not looking for a 'quick fix'. I don't mind putting in the hours if that means the checks are accurate and fast. Also, any use of SMO objects in this context adds, in my opinion, unnecessary delays and memory requirements.
Problem 1: How to get the data for the checks
I hear you @SQLDBAWithABeard when you say that the tests should be independent. Why would we pull all that information in if we want to run only a single test? But let's think about it. What are the use cases for running a single test, say
In my testing against a large number of databases I find the performance benefit of such a caching approach significant, even when compared to T-SQL scripts for each test.
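A minimal sketch of the kind of caching I mean, assuming a script-scoped cache keyed by instance name (all names illustrative):

```powershell
$script:DatabaseInfoCache = @{}

function Get-CachedDatabaseInfo {
    param ([string]$SqlInstance)

    # gather once per instance; every subsequent check reuses the cached data
    if (-not $script:DatabaseInfoCache.ContainsKey($SqlInstance)) {
        $server = Connect-DbaInstance -SqlInstance $SqlInstance
        $script:DatabaseInfoCache[$SqlInstance] =
            $server.ConnectionContext.ExecuteWithResults(
                "SELECT name, collation_name FROM sys.databases").Tables[0]
    }
    $script:DatabaseInfoCache[$SqlInstance]
}
```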
Problem 2: Support for a wide range of SQL Server versions
That said, we will have to accept that there will probably be tests which we cannot perform efficiently with T-SQL on a specific version (DMVs are added with every version), and in those cases we should be smart enough (it is easy to check the version) to revert to SMO.
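For example, a version-gated check might look roughly like this; the cut-off and query are placeholders that each check would pick for itself:

```powershell
$server = Connect-DbaInstance -SqlInstance 'sql2008'   # placeholder instance
$fastQuery = "SELECT name, collation_name FROM sys.databases"

if ($server.VersionMajor -ge 11) {
    # fast path: the DMV or column exists on this version
    $result = $server.ConnectionContext.ExecuteWithResults($fastQuery).Tables[0]
}
else {
    # slow path: fall back to the SMO Databases collection
    $result = $server.Databases | Select-Object Name, Collation
}
```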
For me it would be very important to tag the tests which might be slow (
Problem 3: Checks for the latest features
I propose that in those cases we skip the test (using the
If we accept that, it is not difficult to imagine that we can further make a dynamic distinction as to whether the test on a specific version is implemented in a 'fast' or 'slow' way. We could have a configuration option, or a parameter (which would be useful only for big instances), where we could say
Problem 4: Beauty and Simplicity
And of course there are always ways to hide complexity from people who are just curious what the tests do, and would just like to read them without necessarily seeing all the details.
Next Steps (as I see it)
That is how I see it. At least the big picture, and a proposed way to move, quickly but with the necessary consideration and scrutiny, to a more flexible and better-performing solution.
TL;DR: Don't worry. Just let me do it, and we will figure out the details as and when we get there. One PR at a time.
I like your approach, and yes, I agree totally that using SMO is MUCH better for different versions.
If you are happy to run with this, I am very happy for you to do so. I agree with the direction that you are taking, and it is a much better solution overall than my suggestion.
Please carry on doing this awesome work, we are delighted that you are here (note - you may now become our test performance guru :-) )
I don't see that as the norm. If I've run the whole of dbachecks against my enterprise and gone through a maintenance window for many of the servers, the sole purpose would be to do a spot check to see if I've fixed the major issues. It would depend on how the data from running all the checks is handled, because I would not run all the checks just to verify a few items (that is what spot checks would be for).
dbatools is a tool for people to use in whichever way they wish.
dbachecks is a tool for spot checks,
These are just the ones that we know about right now, a couple of weeks in, and I am sure there will be other use cases as people think about how they can do this :-)
But let's get back to the issue at hand. We need to start to move forward with an identified issue, which is the poor performance of dbachecks against instances with many databases.
Following the main points from @michalporeba above, which I in general agree with, I would like us to begin to move this issue forward iteratively and in an agile way.
I propose that this can be done in this manner.
We can leave all of the database tests as they are, pick the one that is taking the longest, and alter that to use the new Get-DbcDatabase command; make sure it does fix the issue, commit, test, release, and move on to the next one.
I would much rather do this and get feedback quickly than refactor in a massive way, requiring a lot of regression testing for a lot of tests.
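To illustrate, an updated check built on the proposed Get-DbcDatabase helper might look something like this; Get-DbcDatabase does not exist yet, and $SqlInstances and the property names are assumptions:

```powershell
Describe "Database Collation" -Tag DatabaseCollation, Database {
    foreach ($instance in $SqlInstances) {
        Context "Testing database collation on $instance" {
            foreach ($db in (Get-DbcDatabase -SqlInstance $instance)) {
                It "$($db.Name) collation should match the server collation" {
                    $db.Collation | Should -Be $db.ServerCollation
                }
            }
        }
    }
}
```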
Annoyingly I can access GitHub from the plane but not Slack :-(
We also have this user voice opened, please feel free to upvote - https://feedback.azure.com/forums/908035-sql-server/suggestions/33535612-smo-enumerations-slow-with-hundreds-of-databases
This replaces Get-DbaDatabase with Connect-DbaInstance, where we just pass the database name to the dbatools command.
When I just get the database names, the difference is significant over 10 runs against a local instance:
Using Connect-DbaInstance - Total Milliseconds = 914.1239
When I test this branch by running
Invoke-DbcCheck -Check Database -ExcludeCheck TestLastBackupVerifyOnly -Show Fails
10 times against 4 local instances the results are not as significant but still worthwhile
With Get-DbaDatabase and Show Fails - Total Milliseconds 75709.562
With Connect-DbaInstance and Show Fails - Total Milliseconds 53653.0855
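A Measure-Command harness along these lines reproduces the comparison (a sketch with placeholder instance names, not the exact commands used for the numbers above):

```powershell
$withSmo = Measure-Command {
    1..10 | ForEach-Object {
        (Get-DbaDatabase -SqlInstance 'localhost').Name
    }
}
$withTsql = Measure-Command {
    1..10 | ForEach-Object {
        $server = Connect-DbaInstance -SqlInstance 'localhost'
        $server.ConnectionContext.ExecuteWithResults(
            "SELECT name FROM sys.databases").Tables[0].Rows |
            ForEach-Object { $_.name }
    }
}
"SMO: {0:n1} ms  T-SQL: {1:n1} ms" -f $withSmo.TotalMilliseconds, $withTsql.TotalMilliseconds
```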
The reason for this is to improve performance quickly, and also because I have been asked, both by clients and other users, how to run just one or a couple of checks. Some people are building configurations for just a couple of checks, so that if
"an incident like Issue 54321 occurs - Import-DbcConfig Thethingthatcaused54321test.json"
which only has 3 checks in it, for quick resolution.
I have just noticed how much more testing you have done. I am trying the Connect-DbaInstance approach, but so far it's been running for close to 10 minutes with still no results, while my approach delivers test results for all the updated checks in under 100 seconds.
Also, the last comment might be on the wrong issue, but it gave me an idea. I'll create a new issue for it.
EDIT: it has now finished. 12:55 (so 775 seconds)