fix: inconsistencies in sql*Requests queries #10553

Merged
merged 2 commits into from
Mar 30, 2022

Conversation

spaghettidba
Contributor

@spaghettidba spaghettidba commented Feb 1, 2022

Required for all PRs:

  1. unified and fixed inconsistencies for sql*Requests queries for all types of input (on-prem, sqlDB and sqlMI)
    • removed #blockingSessions and incorporated the blocking check into the main query with a COUNT() OVER(), to avoid inconsistencies between the main query and the snapshot saved in the temp table.
    • standardized the retrieval of session_db_name, blocking_session_id and request_id
  2. added more precise version check for query "sqlServerDatabaseIO" in sqlserverqueries.go
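
To illustrate the idea behind the COUNT() OVER() change, here is a minimal sketch run against an in-memory SQLite database. It is not the actual Telegraf query: the `requests` table, its columns, the `blocking_or_blocked` alias, and the sample rows are all assumptions made up for the example, and SQLite stands in for SQL Server (window functions need SQLite 3.25 or later). The point is only that one statement can both list sessions and decide whether each one takes part in a blocking chain, with no separate snapshot.

```python
import sqlite3

# Hypothetical stand-in for the session/request views; names are illustrative only.
# Columns: (session_id, blocking_session_id, status); 0 means "not blocked".
ROWS = [
    (51, 0, "sleeping"),    # idle, but holds the lock that blocks 52 and 53
    (52, 51, "suspended"),
    (53, 51, "suspended"),
    (54, 0, "sleeping"),    # idle and unrelated to any blocking chain
    (55, 0, "running"),
]

QUERY = """
SELECT session_id, blocking_session_id, status
FROM (
    SELECT *,
           -- Group each session with the session blocking it (or with itself
           -- when it is not blocked); a group larger than 1 means blocking.
           COUNT(*) OVER (
               PARTITION BY COALESCE(NULLIF(blocking_session_id, 0), session_id)
           ) AS blocking_or_blocked
    FROM requests
)
WHERE blocking_or_blocked > 1 OR status <> 'sleeping'
ORDER BY session_id
"""


def interesting_sessions(rows):
    """Return active sessions plus idle sessions that block someone, in one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE requests ("
        "session_id INTEGER, blocking_session_id INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in interesting_sessions(ROWS):
        print(row)
```

In this sketch the idle session 51 is kept because it shares a partition with the sessions it blocks, while the idle session 54 is filtered out. Both decisions come from the same read of the data.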

Fixes: #10741

@spaghettidba spaghettidba changed the title from "Fixed sql*Requests queries" to "fix: sql*Requests queries" Feb 2, 2022
@spaghettidba spaghettidba changed the title from "fix: sql*Requests queries" to "fix: inconsistencies in sql*Requests queries" Feb 2, 2022
@spaghettidba
Contributor Author

@powersj @srebhan @denzilribeiro @m82labs
Hey folks! How do we move this forward? I appreciate your feedback

@powersj
Contributor

powersj commented Feb 25, 2022

fixed inconsistencies
more precise

Do these changes in this PR make changes to the resulting metrics that these queries were creating?

@telegraf-tiger
Contributor

Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

🥳 This pull request decreases the Telegraf binary size by 3.56% for linux amd64 (new size: 137.8 MB, nightly size: 142.9 MB)

Artifact URLs

DEB: amd64.deb, arm64.deb, armel.deb, armhf.deb, i386.deb, mips.deb, mipsel.deb, ppc64el.deb, riscv64.deb, s390x.deb
RPM: aarch64.rpm, armel.rpm, armv6hl.rpm, i386.rpm, ppc64le.rpm, riscv64.rpm, s390x.rpm, x86_64.rpm
TAR GZ: darwin_amd64.tar.gz, darwin_arm64.tar.gz, freebsd_amd64.tar.gz, freebsd_armv7.tar.gz, freebsd_i386.tar.gz, linux_amd64.tar.gz, linux_arm64.tar.gz, linux_armel.tar.gz, linux_armhf.tar.gz, linux_i386.tar.gz, linux_mips.tar.gz, linux_mipsel.tar.gz, linux_ppc64le.tar.gz, linux_riscv64.tar.gz, linux_s390x.tar.gz, static_linux_amd64.tar.gz
ZIP: windows_amd64.zip, windows_i386.zip

@spaghettidba
Contributor Author

Do these changes in this PR make changes to the resulting metrics that these queries were creating?

Yes. In particular, the current version of the code saves blocking sessions to a #temp table, but by the time the main query runs, those sessions might no longer be executing or blocking. I got wrong results with this code multiple times.
The new version of the code evaluates blocking at the same time as the outer query.
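
The timing problem described here can be sketched with a small simulation. This is an illustration only, not the plugin's code: the dictionaries mapping a session id to its `blocking_session_id` (0 meaning "not blocked"), the function names, and the sample states are all hypothetical.

```python
def blockers(state):
    """Session ids that block at least one other session in this state."""
    return {blocker for blocker in state.values() if blocker != 0}


def two_step(state_at_snapshot, state_at_query):
    """Old shape: save the blockers first, then run the main query later."""
    # Step 1: the snapshot (the role played by the temp table).
    saved = blockers(state_at_snapshot)
    # Step 2: the main query sees the server as it is by then, but still
    # trusts the list saved in step 1.
    return sorted(
        sid for sid, blocker in state_at_query.items()
        if blocker != 0 or sid in saved
    )


def one_step(state):
    """New shape: blockers and blocked sessions come from the same state."""
    current = blockers(state)
    return sorted(
        sid for sid, blocker in state.items()
        if blocker != 0 or sid in current
    )


if __name__ == "__main__":
    blocked = {51: 0, 52: 51, 53: 0}   # 51 blocks 52
    released = {51: 0, 52: 0, 53: 0}   # the lock has been released

    # Blocking ends between the two steps: 51 is still reported.
    print(two_step(blocked, released), one_step(released))
    # Blocking starts between the two steps: the blocker 51 is missed.
    print(two_step(released, blocked), one_step(blocked))
```

The first case reports a session that no longer blocks anything, and the second misses the blocker entirely. Evaluating everything against a single state avoids both outcomes.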

@powersj
Contributor

powersj commented Feb 25, 2022

Would you please file a bug with an example?

The biggest hesitation in taking code changes like this is that they change how existing metrics work and can then break other users. If you can document how this improves the current situation, and preferably how it fixes an existing bug that users may be hitting, it is much more likely to get looked at.

@spaghettidba
Contributor Author

There you go #10741

@powersj
Contributor

powersj commented Feb 25, 2022

@dupuyjs - do you have any thoughts on making this change?

@spaghettidba
Contributor Author

Hi folks! Is there anything I can do to move this forward? Tests? Code reviews?

@Trovalo
Collaborator

Trovalo commented Mar 3, 2022

Hi @powersj, @spaghettidba

About those

Do these changes in this PR make changes to the resulting metrics that these queries were creating?

The biggest hesitation in taking code changes like this is that they change how existing metrics work and can then break other users. If you can document how this improves the current situation, and preferably how it fixes an existing bug that users may be hitting, it is much more likely to get looked at.

I had a look at it, and it's not a structurally breaking change: the structure of the final output does not change. The only difference is how the result is calculated, and users won't even notice it.
The change just achieves the desired result (tracking blocking sessions) in a better way.

Now:
There is a two-step process:

  1. a "list" is fetched and saved (#blockingSessions)
  2. the list is used to consider additional rows in the final result

The problem with having two steps is that the extraction is no longer atomic: the list (point 1) might be outdated even after 1 ms, meaning the final result can be imprecise (missing some rows or including too many).

PR:
The PR turns the extraction query into a single step, removing any timing-related issues.

I'm definitely in favor of this change

Contributor

@powersj powersj left a comment

@Trovalo thanks for taking a look, that breakdown gives me the confidence I need.

@powersj powersj added the "ready for final review" label Mar 3, 2022
@reimda
Contributor

reimda commented Mar 29, 2022

I don't have sqlserver set up so I can't run the integration tests or try this out. @Trovalo and @spaghettidba have you run the tests and/or run telegraf to exercise this new query code? Thanks!

@spaghettidba
Contributor Author

have you run the tests and/or run telegraf to exercise this new query code? Thanks!

This code has been in use for a couple of months here on over 200 SqlServer instances and it is more correct and more efficient than the previous code.
Thanks!

@reimda
Contributor

reimda commented Mar 30, 2022

This code has been in use for a couple of months here on over 200 SqlServer instances

Sounds like sufficient testing! Thanks!

Labels
ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Development

Successfully merging this pull request may close these issues.

SQL Server requests query is inconsistent regarding blocking
4 participants