
APPSERV-14 Slow SQL tracing views #4422

Merged
jbee merged 31 commits into payara:master on Jan 22, 2020

Conversation

@jbee (Contributor) commented Jan 10, 2020

Summary

On the server side, this PR:

  • adds a general data annotation mechanism to data collection (see details below)
  • adds an SQL max execution time metric based on SQL tracing
  • adds a watch for SQL max execution time whose threshold is taken from the pool configuration
  • extends watch conditions
  • adds series tag name wildcards, e.g. ?:foo (matching any tag name with value foo, including no tag at all), to allow explicit selection of a family of series where some members have further tags and others do not, as used for lists of health check alerts or a general alert list
  • extends the REST API for data requests so the client can select the type of data it needs (points, alerts, annotations, watches), avoiding client-server transfer of data the client does not use
  • extends the request tracing metric so that a useful watch can be added for it
  • adds a watch that raises alerts when request tracing time exceeds the configured threshold
  • adds "compaction" of alert frames to limit memory consumption and client-server data exchange

On the client side, this PR:

  • shows the thresholds of multiple watches in charts as stacked threshold indicator lines/areas
  • fixes threshold indicator gradient and line alignment
  • fixes alert condition formatting
  • adds annotations to alerts, with a setting to hide them
  • adds a new widget type Annotations that lists just the matching annotations in a list or table
  • adds a new widget setting Mode to switch between Table and List layout for Annotations widgets
  • gives widgets a page-wide unique id so that the same series (pattern) can be used multiple times on the same page, as done for SQL, request tracing and alerts
  • adds a new SQL preset page that shows slow SQL tracing information
  • fixes alert table vertical overflow by using a scrollbar when needed
  • adds a list of tracing data and a tracing metric with alerts to the request tracing page
  • adds a general Alerts page preset that shows all alerts organised in groups

Data Annotations

Annotations are a new concept in metric collection: they attach key-value data to a particular metric at a particular point in time. An annotation should only be made in situations that are in some way noteworthy, such as a metric's value exceeding a particular threshold. In general, annotations are event-like information stored per series in a queue of fixed size; newer annotations eventually replace older ones so that the number of annotations cannot grow out of control. The current limit is 20 annotations per metric series.

The annotation data is used by the client to enhance views, for example to enrich lists of alerts with additional data for the period of the alert, or to drive widgets of type Annotations that can, for instance, list the slow SQL metadata in a table.

Annotations provide a general mechanism that makes such values available for any metric and from all instances. To verify the general nature of the concept it has also been applied to request tracing, which allows the trace span attributes to be listed in a table as well.
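To make the fixed-size queue behaviour concrete, here is a minimal sketch of the idea; class, field and method names are illustrative and not the actual Payara implementation:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

/** Sketch only: a per-series annotation queue capped at 20 entries. */
final class SeriesAnnotationsSketch {

    static final int MAX_PER_SERIES = 20;

    static final class Annotation {
        final long time;                  // point in time the annotation refers to
        final long value;                 // metric value at that time
        final Map<String, String> attrs;  // key-value data, e.g. the SQL text and timings

        Annotation(long time, long value, Map<String, String> attrs) {
            this.time = time;
            this.value = value;
            this.attrs = attrs;
        }
    }

    private final Deque<Annotation> queue = new ArrayDeque<>();

    synchronized void add(Annotation annotation) {
        if (queue.size() >= MAX_PER_SERIES) {
            queue.removeFirst(); // the newest annotation eventually replaces the oldest
        }
        queue.addLast(annotation);
    }
}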

Testing

New unit tests were added for

  • extended variants of Condition logic
  • extended Series patterns using the added ? wild-card
  • Alert.Frame compaction

Testing Performed

Manual testing of various features. This PR changes general parts of the monitoring console and extends others, so it makes sense to also test the usual workflows even though they were not an explicit target of the changes.

Testing Instructions

General Setup:

  1. build, install and start the server
  2. use set-monitoring-console-configuration --enabled=true to deploy MC
  3. open MC at http://localhost:8080/monitoring-console/
  4. make sure browser cache for JS/CSS is cleared for MC's domain
  5. check that the following pages exist: SQL and Alerts (if not, it is most likely a browser cache issue; otherwise get in touch)

Testing slow SQL tracing:

  1. Open admin console at http://localhost:4848/
  2. navigate to Resources => JDBC => JDBC Connection Pools
  3. select H2Pool and open the Advanced tab
  4. change Slow Query Log Threshold from -1 (disabled) to 0.001 (1ms)
  5. open page SQL in MC
  6. deploy e.g. https://github.com/javaee/tutorial-examples/tree/master/persistence/order (some app using the default pool)
  7. the deployment alone should already cause alert(s) and annotations to be shown on the SQL page.
  8. use the order app a bit and see how "slow" queries cause alerts and how the executed SQL becomes available in the table on the SQL page

NOTE: the order app causes errors during deployment on server restart. Undeploy the app and redeploy it while the server is running. AFAIK this is a shortcoming of the app's setup/scripts.

Testing Request Tracing (again):

  1. Open admin console at http://localhost:4848/
  2. navigate to Configurations => server-config => Request Tracing
  3. check Enabled
  4. set Target Count to 2, Time Value to 20, Time Unit to SECONDS
  5. set Threshold Value to somewhere between 5 to 20, Threshold Unit to MILLISECONDS
  6. save changes
  7. open the MC Request Tracing page. The polling done by MC itself should be enough to make some data appear in the widgets

Testing Health Checks (again):

  1. Open admin console at http://localhost:4848/
  2. navigate to Configurations => server-config => HealthCheck
  3. open the CPU Usage tab, check Enabled and save the changes
  4. optionally open other tabs to enable more checks; the Enabled checkbox on the General tab does not have to be checked
  5. open the Health Checks page in MC and check that the graphs for the enabled checks show data and thresholds
  6. go back to the admin console configuration and change a threshold, e.g. set heap usage to a level between the low and high points currently seen over your GC cycle (see graph)
  7. check that the changed threshold is updated in the graph and that alerts are raised when heap usage exceeds it (note that the alert condition requires the threshold to be exceeded by the average of the last 15 points)

Further things in MC to test (again)

  • create a new page, name it, add some widgets, remove some widgets
  • reset preset pages after changes
  • change settings of widgets
  • change colour scheme
  • clear local storage for the page and reload the page (full reset of client side settings)

@jbee added the 3:DevInProgress and PR: DO NOT MERGE (Don't merge PR until further notice) labels on Jan 10, 2020
@jbee self-assigned this on Jan 10, 2020
@jbee (Contributor Author) commented Jan 15, 2020

jenkins test please

@jbee changed the title from "[WIP] APPSERV-14 Slow SQL tracing views" to "APPSERV-14 Slow SQL tracing views" on Jan 15, 2020
@jbee (Contributor Author) commented Jan 15, 2020

jenkins test please

public void setSlowQueryThresholdInSeconds(String seconds) {
    spec.setDetail(DataSourceSpec.SLOWSQLLOGTHRESHOLD, seconds);
    double threshold = Double.parseDouble(seconds);
    if (threshold > 0) {
        if (sqlTraceDelegator == null) {
            sqlTraceDelegator = new SQLTraceDelegator(getPoolName(), getApplicationName(), getModuleName());
        }
        sqlTraceDelegator.registerSQLTraceListener(new SlowSQLLogger((long)(threshold * 1000), TimeUnit.MILLISECONDS));
@jbee (Contributor Author) commented Jan 17, 2020:
NB. This was not needed and could even cause an outdated threshold to be used, because installed listeners are not replaced when a listener of the same class is attempted to be installed again (which would happen later when a connection is created).

@@ -506,7 +511,7 @@ public Object getConnection(Subject sub, javax.resource.spi.ConnectionRequestInfo

        if (sqlTraceDelegator == null) {
            if ((requestTracing != null && requestTracing.isRequestTracingEnabled())
-                    || (connectionPool != null && isSlowQueryLoggingEnabled())) {
+                    || (isSlowQueryLoggingEnabled())) {
@jbee (Contributor Author) commented:
NB. isSlowQueryLoggingEnabled includes the null check.
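For illustration only, a sketch of the shape such a guard typically has; the field and getter names below are made up, not the Payara API:

/** Sketch only: the helper itself performs the null check, so callers need not repeat it. */
final class SlowQueryGuardSketch {

    static final class PoolConfig {
        String slowQueryThresholdInSeconds = "-1"; // -1 means slow query logging is disabled
    }

    private PoolConfig connectionPool; // may be null when no pool configuration is bound

    boolean isSlowQueryLoggingEnabled() {
        return connectionPool != null
                && Double.parseDouble(connectionPool.slowQueryThresholdInSeconds) > 0;
    }
}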

*/
public class SQLTraceStoreAdapter implements SQLTraceListener {

    private static ThreadLocal<SQLQuery> currentQuery = new ThreadLocal<>();
@jbee (Contributor Author) commented:
NB. Sadly this cannot share code with SQLTraceLogger, which does almost the same thing: this class has to have its own thread local because otherwise the manipulation of the query would happen multiple times when both listeners are installed, which messes up the query.
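A minimal illustration of the pitfall, using simplified stand-ins rather than the real SQL tracing types: when two listeners share one thread-local accumulator, the same trace event is applied to it once per listener.

/** Sketch only: why each SQL trace listener needs its own thread-local state. */
final class ThreadLocalSharingPitfall {

    // Simplified stand-in for the per-query state both listeners would build up.
    private static final ThreadLocal<StringBuilder> sharedQuery =
            ThreadLocal.withInitial(StringBuilder::new);

    // Simplified stand-in for a SQLTraceListener callback.
    static void onSqlTrace(String fragment) {
        sharedQuery.get().append(fragment).append(';');
    }

    public static void main(String[] args) {
        // The delegator notifies every installed listener for the same event,
        // so with shared state the fragment is appended once per listener.
        onSqlTrace("SELECT * FROM ORDERS"); // "logger" listener
        onSqlTrace("SELECT * FROM ORDERS"); // "store adapter" listener
        System.out.println(sharedQuery.get()); // the query text is duplicated
    }
}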

@Pandrex247 (Member) commented:
Might be worth adding a simplified version of this comment to the code


@Service
@Singleton
public class SQLTraceStoreImpl implements SQLTraceStore, MonitoringDataSource, MonitoringWatchSource {
@jbee (Contributor Author) commented:
NB. The most interesting part is the class location: it had to be in a module that supports HK2 and that ideally already depends on the modules dealing with SQL tracing.

            .amber(600L, 3, true, 600L, 5, false)
            .green(-400L, 1, false, null, null, false);
        addWatch(watch);
        addWatch(new Watch("Metric Collection Duration", new Metric(new Series("ns:monitoring CollectionDuration")))
@jbee (Contributor Author) commented:
NB. The watch for a metric provided by InMemoryMonitoringDataRepository had to be added here, as adding it there would otherwise cause a cyclic dependency between the two.

@@ -123,12 +126,12 @@ void changedConfig(boolean enabled) {
        if (!enabled) {
            checker.stop();
        } else {
-            checker.start(executor, 2, SECONDS, this::checkWatches);
+            checker.start(executor, 1, SECONDS, this::checkWatches);
@jbee (Contributor Author) commented:
NB. Initially I thought evaluating alerts every two seconds would be good enough. While that is true in general, it can miss value spikes that should cause an alert when the most basic condition only looks at the latest value. So the interval was changed to 1 second to make sure each value is considered.
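To make the reasoning concrete, a tiny illustration with made-up numbers:

/** Illustration only: a 2-second evaluation of per-second samples skips every other value. */
final class EvaluationIntervalSketch {
    public static void main(String[] args) {
        long[] perSecondSamples = { 10, 950, 12, 11 }; // spike at t = 1s
        for (int t = 0; t < perSecondSamples.length; t += 2) {
            // A "latest value" condition checked at t = 0s and t = 2s never sees the 950 spike.
            System.out.println("checked value at t = " + t + "s: " + perSecondSamples[t]);
        }
    }
}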

@jbee (Contributor Author) commented Jan 17, 2020

jenkins test please

import org.jvnet.hk2.annotations.Contract;

@Contract
public interface SQLTraceStore {
@jbee (Contributor Author) commented:
NB. The main reason this interface is needed is module dependencies.

public MonitoringDataCollector annotate(CharSequence metric, long value, String... attrs) {
    prefixed.setLength(prefix.length());
    self.annotate(prefixed.append(metric), value, attrs);
    return this;
@jbee (Contributor Author) commented:
NB. It was a bug that none of the MonitoringDataCollector methods returned this: when the instance is used with chaining, it should be the wrapper that is called (so it does its prefixing), not the instance returned by self (which would not do the prefixing). In practice this bug never had an effect, as the only usage did not use chaining, but it now works properly for future use.
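A sketch of the chaining issue with a simplified interface (not the actual MonitoringDataCollector API):

interface Collector {
    Collector collect(String metric, long value);
}

/** Sketch only: a prefixing wrapper must return itself from chainable methods. */
final class PrefixingCollector implements Collector {

    private final Collector self;  // the underlying, un-prefixed collector
    private final String prefix;

    PrefixingCollector(Collector self, String prefix) {
        this.self = self;
        this.prefix = prefix;
    }

    @Override
    public Collector collect(String metric, long value) {
        self.collect(prefix + metric, value);
        return this; // returning the delegate instead would drop the prefix
                     // for all subsequent calls in the chain
    }
}

With return this, a chained call such as collector.collect("MaxExecutionTime", 42).collect("Count", 1) applies the prefix to both metrics; returning the delegate would silently drop it for the second call.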

@@ -102,7 +102,7 @@ private static MemoryUsage getMemoryUsage() {

    @Override
    public void collect(MonitoringWatchCollector collector) {
        collectUsage(collector, "ns:health HeapUsage", "Heap Usage", 10, true);
@jbee (Contributor Author) commented:
NB. 10 turned out to be too flaky so I increased it.

@jbee removed the 3:DevInProgress and PR: DO NOT MERGE (Don't merge PR until further notice) labels on Jan 17, 2020
@Pandrex247 (Member) left a review:

LGTM! 😃
Nothing jumps out to me as egregious, just a few nit-picks.

I ran the JPA tests in EE7 samples against it to trigger the slow SQL rather than your provided app.
I haven't checked it all works against an instance other than the DAS yet, I'll get round to that in a bit.


@jbee (Contributor Author) commented Jan 22, 2020

jenkins test please

@jbee (Contributor Author) commented Jan 22, 2020

@Pandrex247 addressed your comments.

I also fixed a potential memory leak I noticed: when SQL traces hit the new store while monitoring is disabled, the queues could grow without limit. I applied the same technique I used for request tracing, where the queue is limited to 50 entries; once it is at that size, adding a new entry throws away the oldest entry.
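A minimal sketch of that eviction technique; the class and field names are made up and this is not the actual store code:

import java.util.concurrent.ConcurrentLinkedQueue;

/** Sketch only: a queue capped at 50 entries that drops the oldest entry when full. */
final class BoundedTraceQueue<T> {

    static final int MAX_ENTRIES = 50;

    private final ConcurrentLinkedQueue<T> entries = new ConcurrentLinkedQueue<>();

    void add(T entry) {
        entries.add(entry);
        // If monitoring is disabled and never drains the queue, trim it so it cannot grow without limit.
        while (entries.size() > MAX_ENTRIES) {
            entries.poll(); // discard the oldest entry
        }
    }
}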

@jbee merged commit fe00e88 into payara:master on Jan 22, 2020
@jbee added this to the 5.201 milestone on Jan 23, 2020