Skip to content

Latest commit

 

History

History
170 lines (151 loc) · 19.6 KB

004.monitoring.SQLServer.DMV.with.elasticsearch.md

File metadata and controls

170 lines (151 loc) · 19.6 KB

Troubleshooting SQLServer with Elasticsearch: Send DMV query output to Elasticsearch

This example shows how ship to elasticsearch a set of linux system perfomance indicators using top, vmstat, iostat and netstat.

  • The information will be obtained from periodical execution of DMV queries.
  • Data will be indexed in Elasticsearch.
  • Insight on the data will be attained with Kibana

Something like this:

You will need to have installed:

We will now:

  1. Setup pertinent queries: Locks, Wait events, Clients, Performance counters, CPU usage, Query/trigger statistics
  2. Define query gather input
  3. Define elasticsearch output
  4. Run probespawner
  5. Gain database insight through kibana
1. Setup pertinent queries

Take a quick look at: https://msdn.microsoft.com/en-us/library/ms188754.aspx

Not much juice but with further reading you'll find the following resources very profitable: http://dba.stackexchange.com/questions/50899/how-do-you-see-which-database-on-a-server-uses-most-resources https://sqlserverperformance.wordpress.com/tag/dmv-queries/

In this example, we're going to setup queries to find out if locks occur, what wait events are going on (what is the database/some query waiting for), who is accessing the database, how to get performance counters from selecting the DMV, report on the database's CPU usage and setup the road for some query profiling.

This is just a subset of many queries used to troubleshoot and measure database performance but it'll hint you for many others.

1.1 Locks
SELECT CONVERT(nvarchar(30), GETUTCDATE(), 127) as '@timestamp', @@SERVERNAME as servername, @@SERVICENAME as servicename, 'locks' as class, t1.resource_type AS [locktype], DB_NAME(resource_database_id) AS [database], t1.resource_associated_entity_id AS [blkobject],t1.request_mode AS [lockreq], t1.request_session_id AS [waitersid], t2.wait_duration_ms AS [waittime], (SELECT REPLACE(REPLACE(coalesce([text],''), CHAR(13),'' ),CHAR(10),'') FROM sys.dm_exec_requests AS r WITH (NOLOCK) CROSS APPLY sys.dm_exec_sql_text(r.[sql_handle]) WHERE r.session_id = t1.request_session_id) AS [waiter_batch], (SELECT REPLACE(REPLACE(coalesce(SUBSTRING(qt.[text],r.statement_start_offset/2, (CASE WHEN r.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), qt.[text])) * 2 ELSE r.statement_end_offset END - r.statement_start_offset)/2),''), CHAR(13),'' ),CHAR(10),'') +CONVERT(varchar(255),r.[sql_handle],1) FROM sys.dm_exec_requests AS r WITH (NOLOCK) CROSS APPLY sys.dm_exec_sql_text(r.[sql_handle]) AS qt WHERE r.session_id = t1.request_session_id) AS [waiter_stmt], t2.blocking_session_id AS [blockersid], (SELECT REPLACE(REPLACE(coalesce([text],''), CHAR(13),'' ),CHAR(10),'') +CONVERT(varchar(255),p.[sql_handle],1) FROM sys.sysprocesses AS p CROSS APPLY sys.dm_exec_sql_text(p.[sql_handle]) WHERE p.spid = t2.blocking_session_id) AS [blockerstmt] FROM sys.dm_tran_locks AS t1 WITH (NOLOCK) INNER JOIN sys.dm_os_waiting_tasks AS t2 WITH (NOLOCK) ON t1.lock_owner_address = t2.resource_address OPTION (RECOMPILE)
1.2 Wait events
select CONVERT(nvarchar(30), GETUTCDATE(), 127) as '@timestamp', @@SERVERNAME as servername, @@SERVICENAME as servicename, 'waitevents' as class,  wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms, SUM (wait_time_ms) OVER() total_wait_time_ms FROM sys.dm_os_wait_stats WITH (NOLOCK)  WHERE [wait_type] NOT IN (N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP',N'BROKER_TO_FLUSH', N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',N'CHKPT', N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE', N'DBMIRROR_WORKER_QUEUE',N'DBMIRRORING_CMD', N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',N'EXECSYNC', N'FSAGENT', N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE', N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP', N'LOGMGR_QUEUE', N'ONDEMAND_TASK_QUEUE',N'PWAIT_ALL_COMPONENTS_INITIALIZED', N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP', N'REQUEST_FOR_DEADLOCK_SEARCH',N'RESOURCE_QUEUE', N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH', N'SLEEP_DBSTARTUP',N'SLEEP_DCOMSTARTUP', N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY', N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP', N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT', N'SP_SERVER_DIAGNOSTICS_SLEEP',N'SQLTRACE_BUFFER_FLUSH', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'SQLTRACE_WAIT_ENTRIES',N'WAIT_FOR_RESULTS', N'WAITFOR', N'WAITFOR_TASKSHUTDOWN', N'WAIT_XTP_HOST_WAIT',N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT') AND waiting_tasks_count > 0
1.3 Clients
SELECT CONVERT(nvarchar(30), GETUTCDATE(), 127) as '@timestamp', 'sessions' as class, c.session_id as session_id, c.net_transport as net_transport, c.encrypt_option as encrypt_option, c.auth_scheme as auth_scheme, s.host_name as hostname, s.program_name as program, s.client_interface_name as interface, s.login_name as login, s.nt_domain as domain, s.nt_user_name as username, s.original_login_name as loginname, CONVERT(nvarchar(30), c.connect_time, 127) as connecttime, CONVERT(nvarchar(30), s.login_time, 127) as logintime FROM sys.dm_exec_connections AS c JOIN sys.dm_exec_sessions AS s ON c.session_id = s.session_id
1.4 Performance counters
select CONVERT(nvarchar(30), GETUTCDATE(), 127) as '@timestamp', @@SERVERNAME as servername, @@SERVICENAME as servicename, 'perfcounters' as class, rtrim(coalesce(counter_name,' ')) as counter_name, cntr_value FROM sys.dm_os_performance_counters with (nolock) where counter_name in ('Checkpoint pages/sec','SQL Compilations/sec','SQL Re-Compilations/sec','Batch Requests/sec','Memory Grants Pending','User Connections','Lazy Writes/sec') or (OBJECT_NAME like '%Buffer Node%' AND counter_name in ('Page life expectancy')) union all select CONVERT(nvarchar(30), GETUTCDATE(), 127) as '@timestamp', @@SERVERNAME as servername, @@SERVICENAME as servicename, 'perfcounters' as class, 'Buffer Cache Hit Ratio', cast(p1.cntr_value/p2.cntr_value*100 as numeric(36,2))  FROM sys.dm_os_performance_counters p1 with (nolock), sys.dm_os_performance_counters p2 with (nolock) where p1.counter_name in ('Buffer cache hit ratio') and p2.counter_name in ('Buffer cache hit ratio base')
1.5 Cpu usage
select top(1) CONVERT(nvarchar(30), DATEADD(ms, -1 * ((SELECT cpu_ticks/(cpu_ticks/ms_ticks) FROM sys.dm_os_sys_info) - [timestamp]), GETUTCDATE()), 127) as '@timestamp',  @@SERVERNAME as servername, @@SERVICENAME as servicename, 'cpuusage' as class, SQLProcessUtilization as sqlserver_cpu_usage, SystemIdle as system_idle, (SystemIdle - SQLProcessUtilization) as system_other_cpu from (SELECT record.value('(./Record/@id)[1]', 'int') AS record_id, record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS [SystemIdle],record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]','int') AS [SQLProcessUtilization], [timestamp] FROM (SELECT [timestamp], CONVERT(xml, record) AS [record] FROM sys.dm_os_ring_buffers WITH (NOLOCK) WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' AND record LIKE N'%<SystemHealth>%') AS x) as y order by record_id DESC
1.6 Query trigger profiling

Here we show the JSON block for the queries (not the queries themselves).
Notice the parameters on these, at each interval we'll only get data for the dorment period of probespawner.

    {
    "statement": "select CONVERT(nvarchar(30), GETUTCDATE(), 127) as '@timestamp', @@SERVERNAME as servername, @@SERVICENAME as servicename, 'querystatistics' as class,  x.*, REPLACE(REPLACE(coalesce(SUBSTRING(qt.TEXT,x.statement_start_offset/2 +1, (CASE WHEN x.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX), qt.TEXT)) * 2 ELSE x.statement_end_offset END - x.statement_start_offset)/2),''), CHAR(13),'' ),CHAR(10),'') AS query_text from ( select qs.sql_handle,query_hash  ,query_plan_hash, statement_start_offset, statement_end_offset, sum(qs.execution_count) as execution_count, sum(qs.total_rows) as total_rows, sum(qs.total_elapsed_time) as total_elapsed_time, sum(total_worker_time) as total_worker_time, sum(total_logical_reads) as total_logical_reads,  sum(total_physical_reads) as total_physical_reads  from sys.dm_exec_query_stats qs with (nolock) where qs.last_execution_time > CONVERT(varchar(23),?) and qs.last_execution_time <= GETDATE()) group by qs.sql_handle,query_hash,query_plan_hash, statement_start_offset, statement_end_offset) as x CROSS APPLY sys.dm_exec_sql_text(x.sql_handle) AS qt",
    "parameter": [ "$cycle.laststartdt" ]
    },
    {
    "statement": "select CONVERT(nvarchar(30), GETUTCDATE(), 127) as '@timestamp', @@SERVERNAME as servername, @@SERVICENAME as servicename, 'processtriggerstats' as class,  x.*, REPLACE(REPLACE(qt.[text], CHAR(13),'' ),CHAR(10),'') AS query_text from ( select db_name(database_id) as dbname,type_desc, ps.sql_handle, ps.plan_handle, sum(ps.execution_count) as execution_count,sum(ps.total_elapsed_time) as total_elapsed_time, sum(total_worker_time) as total_worker_time, sum(total_logical_reads) as total_logical_reads,  sum(total_physical_reads) as total_physical_reads from sys.dm_exec_procedure_stats ps with (nolock) where ps.last_execution_time >  CONVERT(varchar(23),?) and ps.last_execution_time <= GETDATE() group by db_name(database_id),type_desc,ps.sql_handle,plan_handle) as x CROSS APPLY sys.dm_exec_sql_text(x.sql_handle) AS qt union all select CONVERT(nvarchar(30), GETUTCDATE(), 127) as '@timestamp', @@SERVERNAME as servername, @@SERVICENAME as servicename, 'processtriggerstats' as class,  x.*, REPLACE(REPLACE(qt.[text], CHAR(13),'' ),CHAR(10),'') AS query_text from ( select db_name(database_id) as dbname,type_desc, ps.sql_handle, ps.plan_handle, sum(ps.execution_count) as execution_count,sum(ps.total_elapsed_time) as total_elapsed_time, sum(total_worker_time) as total_worker_time, sum(total_logical_reads) as total_logical_reads,  sum(total_physical_reads) as total_physical_reads from sys.dm_exec_trigger_stats ps with (nolock) where ps.last_execution_time > CONVERT(varchar(23),?) and ps.last_execution_time <= GETDATE() group by db_name(database_id),type_desc,ps.sql_handle,plan_handle) as x CROSS APPLY sys.dm_exec_sql_text(x.sql_handle) AS qt",
    "parameter": [ "$cycle.laststartdt", "$cycle.laststartdt" ]
    }
2. Define query gather input

Refer to the JSON file here.
The queries stated above are all there. Please pay atention to the ones with the parameters set for the two prepared statements (on "query trigger profiling").

3. Define elasticsearch output
    "ElasticsearchOutput": {
      "class": "elasticsearch",
      "cluster" : "fuckup",
      "host" : "127.0.0.1",
      "port" : 9300,
      "bulkActions": 1000,
      "indexPrefix": "sqlserver",
      "indexSuffix": "@timestamp",
      "type": "jdbc",
      "index_settings": {
        "index" : {
          "analysis" : {
            "analyzer" : {
              "default" : {
                "type" : "standard",
                "filter": ["standard", "asciifolding","lowercase"]
              },
              "sortable": {
                "tokenizer": "keyword",
                "filter": ["lowercase"]
              }
            }
          }
        }
      },
      "type_mapping": {
        "jdbc": { 
          "properties" : { 
            "@timestamp" : { "type" : "date" },
            "class" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "query_hash" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "query_plan_hash" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "query_text" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "servername" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "servicename" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "sql_handle" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "auth_scheme" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "hostname" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "interface" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "login" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "loginname" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "net_transport" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "program" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "wait_type" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "dbname" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "plan_handle" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "type_desc" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "counter_name" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "blockerstmt" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "lockreq" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" },
            "locktype" : { "type" : "string", "index" : "not_analyzed", "omit_norms": "true" }
          } 
        }
      }
   }
4. Run probespawner

With the JSON configuration file assembled, and Microsoft's JDBC driver, run probespawner.
We've stored the JSON file here for your reference (file " microsoftsqlserver.mssql.performance.to.elasticsearch.json").

Run as usual:

[user@localhost probespawner-master]$ ./probespawner.sh microsoftsqlserver.mssql.performance.to.elasticsearch.json
2015-06-08 14:19:28,709 dummyprobe     INFO     Started cycle at 2015-06-08T14:19:28.708+01:00
2015-06-08 14:19:28,746 jelh           INFO     Setting Elasticsearch options: cluster.name = fuckup
Jun 08, 2015 2:19:29 PM org.elasticsearch.plugins.PluginsService <init>
INFO: [Aardwolf] loaded [], sites []
2015-06-08 14:19:32,848 cooldbprobe    INFO     SELECT CONVERT(nvarchar(30), GETUTCDATE(), 127) as '@timestamp', @@SERVERNAME as servername, @@SERVICENAME as servicename, 'locks' as class, t1.resource_type AS [locktype], DB_NAME(resource_database_id) AS [database], t1.resource_associated_entity_id AS [blkobject],t1.request_mode AS [lockreq], t1.request_session_id AS [waitersid], t2.wait_duration_ms AS [waittime], (SELECT REPLACE(REPLACE(coalesce([text],''), CHAR(13),'' ),CHAR(10),'') FROM sys.dm_exec_requests AS r WITH (NOLOCK) CROSS APPLY sys.dm_exec_sql_text(r.[sql_handle]) WHERE r.session_id = t1.request_session_id) AS [waiter_batch], (SELECT REPLACE(REPLACE(coalesce(SUBSTRING(qt.[text],r.statement_start_offset/2, (CASE WHEN r.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), qt.[text])) * 2 ELSE r.statement_end_offset END - r.statement_start_offset)/2),''), CHAR(13),'' ),CHAR(10),'') +CONVERT(varchar(255),r.[sql_handle],1) FROM sys.dm_exec_requests AS r WITH (NOLOCK) CROSS APPLY sys.dm_exec_sql_text(r.[sql_handle]) AS qt WHERE r.session_id = t1.request_session_id) AS [waiter_stmt], t2.blocking_session_id AS [blockersid], (SELECT REPLACE(REPLACE(coalesce([text],''), CHAR(13),'' ),CHAR(10),'') +CONVERT(varchar(255),p.[sql_handle],1) FROM sys.sysprocesses AS p CROSS APPLY sys.dm_exec_sql_text(p.[sql_handle]) WHERE p.spid = t2.blocking_session_id) AS [blockerstmt] FROM sys.dm_tran_locks AS t1 WITH (NOLOCK) INNER JOIN sys.dm_os_waiting_tasks AS t2 WITH (NOLOCK) ON t1.lock_owner_address = t2.resource_address OPTION (RECOMPILE)
2015-06-08 14:19:37,483 cooldbprobe    INFO     Finished query q0 cycle in 4633
2015-06-08 14:19:37,483 cooldbprobe    INFO     select CONVERT(nvarchar(30), GETUTCDATE(), 127) as '@timestamp', @@SERVERNAME as servername, @@SERVICENAME as servicename, 'waitevents' as class,  wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms, SUM (wait_time_ms) OVER() total_wait_time_ms FROM sys.dm_os_wait_stats WITH (NOLOCK)  WHERE [wait_type] NOT IN (N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP',N'BROKER_TO_FLUSH', N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',N'CHKPT', N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE', N'DBMIRROR_WORKER_QUEUE',N'DBMIRRORING_CMD', N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',N'EXECSYNC', N'FSAGENT', N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE', N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP', N'LOGMGR_QUEUE', N'ONDEMAND_TASK_QUEUE',N'PWAIT_ALL_COMPONENTS_INITIALIZED', N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP', N'REQUEST_FOR_DEADLOCK_SEARCH',N'RESOURCE_QUEUE', N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH', N'SLEEP_DBSTARTUP',N'SLEEP_DCOMSTARTUP', N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY', N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP', N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT', N'SP_SERVER_DIAGNOSTICS_SLEEP',N'SQLTRACE_BUFFER_FLUSH', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'SQLTRACE_WAIT_ENTRIES',N'WAIT_FOR_RESULTS', N'WAITFOR', N'WAITFOR_TASKSHUTDOWN', N'WAIT_XTP_HOST_WAIT',N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT') AND waiting_tasks_count > 0
2015-06-08 14:19:38,184 cooldbprobe    INFO     Finished query q1 cycle in 699
...
...
5. Gain database insight through kibana

Probespawner instance is indexing query data to some elasticsearch instance.
Configure kibana for your elasticsearch instance and setup a dashboard for index: [sqlserver-]YYYY.MM.DD Below a few kibana3 screenshots with the visualizations on MSSQL server's DMV data.

From here on, exploratory data analysys comes intuitivelly.
If you pay close atention to the dashboard above first row/first panel has the event histogram.
On the second panel we'll add terms for field "class" to select which domain (from our queries) we'll want to explore.
After this, add any histogram with the pertinent value field to gain awareness of your database condition.

As an example, consider the number of sessions as given from the "performancecounters" class. We select the given class and add another terms panel with the "counter_name".

Lastly, we add another histogram panel for "counter_value" field and we click on "user sessions" to see how many our database has:

From here on, it's just kibana proficiency.