sync public
Songmu committed Mar 15, 2019
1 parent a9db3cc commit 935613f
Showing 8 changed files with 570 additions and 0 deletions.
46 changes: 46 additions & 0 deletions content/blog/entry/anomaly-detection-for-roles/about.md
@@ -0,0 +1,46 @@
---
Title: 'New feature・How to use Anomaly Detection for roles '
Date: 2019-03-08T10:36:25+09:00
URL: https://mackerel.io/blog/entry/anomaly-detection-for-roles/about
EditURL: https://blog.hatena.ne.jp/mackerelio/mackerelio.hatenablog.mackerel.io/atom/entry/17680117126989848371
---

Hello. Mackerel Team Director id:daiksy:detail here.

The beta version of Mackerel’s new feature ‘Anomaly Detection for roles’ which uses machine learning is now being offered. You might have heard about the development of this feature at Meetup and other past events.

[https://mackerel.io/docs/entry/howto/anomaly-detection-for-roles:embed:cite]

Anomaly detection differs slightly from the way monitoring has been used up until now. In this article, we’ll take a look at what anomaly detection is and how it can be used.

### What is Anomaly Detection for roles?

‘Anomaly Detection for roles’ is a function that uses machine learning to detect abnormalities in the server without having to set special monitoring items for hosts within a role in Mackerel.

Until now, configuring monitors precisely required substantial experience and know-how in server monitoring. Say you want an alert issued when the CPU load gets high: it is actually quite difficult to determine what percentage of CPU usage counts as high load, or which thresholds to set on which items in order to detect application abnormalities. Making these kinds of decisions takes operational experience and technical knowledge. On top of this, the behavior of applications changes daily, and monitor configurations that are left alone can become obsolete, so regular maintenance is a must.

‘Anomaly Detection for roles’ can help with these types of monitoring complications.

With Mackerel, [it’s recommended that you organize your servers into roles](https://mackerel.io/docs/entry/howto/create-services-and-roles). A role is the part that a server plays in a service. By setting roles appropriately, you can group servers with similar load trends, such as "application servers" or "database servers". Mackerel's ‘Anomaly Detection for roles’ feature uses machine learning to learn a server's "normal state" from past metric trends across the entire role. Newly posted metrics are checked against the learned results, and anything outside the "normal state" is regarded as an anomaly and raises an alert. In other words, the ‘Anomaly Detection for roles’ feature detects server abnormalities without having to configure individual monitors.

### Role configuration is vital to improving detection precision

With "Anomaly Detection for roles", roles are specified as the monitoring target. Mackerel uses past system metrics from hosts that are registered in the specified role to learn trends. As previously mentioned, it is recommended that roles be categorized by the part a server plays in a service, such as application servers and database servers. Because of this, we can assume that a role contains servers with similar metric trends, and trends can be learned from the entire role. Conversely, if a role contains servers with significantly different trends or extremely different specifications, accuracy will fall. For example, when "active" and "standby" servers coexist for a long period of time, servers with different trends get mixed together in the role.

Therefore, in order to increase the precision of Mackerel's Anomaly Detection, it is important to first properly categorize servers by roles.

### How to use Anomaly Detection for roles

‘Anomaly Detection for roles’ learns trends from past metrics. If newly posted metrics are determined to be outside of those trends, an alert will occur. This alert notification will also display the metrics that were determined to be abnormal. From this information, the user who receives the alert can estimate what kind of anomaly is occurring in the server. For example, the alert may show an increase in memory usage outside of the normal trend.

![](https://cdn-ak.f.st-hatena.com/images/fotolife/a/andyyk/20190308/20190308102821.png)

However, even if the alert shows that the detected anomaly is based on memory usage metrics, there is no guarantee that the cause of the issue lies in memory. This point requires careful attention.

‘Anomaly Detection for roles’ learns and judges the trends of system metrics in combination for the hosts configured in a role. When an issue occurs in a server, more often than not several metrics are affected by the primary cause of the issue. For example, when the amount of data written to a disk increases, the network transfer load needed to send that data also increases at the same time, and as a result, this may also affect memory usage. Even if trends change in such a complex manner, only the one metric used as the basis of detection is recorded in alerts of ‘Anomaly Detection for roles’. So, when an alert occurs, you need to look across all of the role's graphs rather than at a single metric.

Due to the nature of this feature, it’s slightly difficult to define a typical troubleshooting response such as "restart this server when this monitoring alert occurs". We recommend using ‘Anomaly Detection for roles’ as a supplementary monitor that quickly detects rare anomalies, while also configuring threshold-based monitors from your past operational experience. You might also consider an operation cycle where you add a threshold-based monitor whenever you receive an anomaly detection alert.

### Current limitations with Anomaly Detection for roles

At the moment, ‘Anomaly Detection for roles’ is only supported for Linux environments with mackerel-agent installed. Windows and Integration environments are not currently supported.
81 changes: 81 additions & 0 deletions content/blog/entry/weekly/20190304.md
@@ -0,0 +1,81 @@
---
Title: Release of Anomaly Detection for roles and more
Date: 2019-03-08T18:54:05+09:00
URL: https://mackerel.io/blog/entry/weekly/20190304
EditURL: https://blog.hatena.ne.jp/mackerelio/mackerelio.hatenablog.mackerel.io/atom/entry/17680117126990058940
---

Mackerel team CRE Miura (id:missasan:detail) here.

Thank you to everyone who came out to the Meetup last weekend. I hope everyone had a good time. The event report will be coming out soon, so be sure to keep a lookout for that.

Last week we finally released Anomaly Detection for roles, our new feature that uses machine learning. The feature is scheduled for an official release in May, but until then, you can try out the Anomaly Detection for roles feature as part of our free promotion. Be sure to give it a try and let us know what you think.

Now on to this week’s update information.

## Release of Anomaly Detection for roles (with free promotional campaign)

Our new feature Anomaly Detection for roles (beta version), which uses machine learning, has been released. For more on how to use the feature, be sure to check out the page linked below.

[https://mackerel.io/blog/entry/anomaly-detection-for-roles/about:embed:cite]

For more details, refer to the help page linked below as well.

[https://mackerel.io/docs/entry/howto/anomaly-detection-for-roles:embed:cite]

#### We are currently offering a free promotional campaign!

During the feature’s beta period, Anomaly Detection for roles can be used for free (with no additional charges). This feature is for ‘Standard’ and ‘Trial’ plans. Be sure to take advantage of this free period and try out the new feature in a variety of different environments. We’re looking forward to your feedback!

Please note that once the feature is officially released, which is scheduled for May, environments that have Anomaly Detection for roles enabled will automatically switch over to being charged.

## Help and other Mackerel documents made open-source

Mackerel's Help pages and other documents are now open-source.

[https://github.com/mackerelio/documents:embed:cite]

If any part of the Help or FAQ needs correction, we are accepting pull requests. Corrections to Japanese-only pages and pull requests written in Japanese are also welcome.

[https://mackerel.io/docs/:embed:cite]

We look forward to your pull requests!

## check-log plugin now supports log read timeouts

With the release of go-check-plugins v0.28.0, the check-log plugin now supports log read timeouts. Up until now, a timeout that occurred while a log was being read would sometimes result in an error. With this release, timeouts during log reading are handled normally.

If the command in the Mackerel agent configuration file is specified as a character string rather than as an array, timeouts may not be handled normally depending on the environment. Therefore, it is recommended that you specify the command as an array.
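
As a rough illustration of the array form, the snippet below prints a sample entry that could be adapted for mackerel-agent.conf. The section name `access_log`, the log path, and the pattern are hypothetical placeholders and are not taken from the release notes.

```shell
# Print a sample check-log entry that uses the array form of "command".
# The section name, log path, and pattern are placeholders for illustration.
print_check_log_config() {
  cat <<'EOF'
[plugin.checks.access_log]
command = ["check-log", "--file", "/var/log/access.log", "--pattern", "FATAL"]
EOF
}

print_check_log_config
```

In the array form each argument is its own element, so the command is passed to the plugin as written rather than being parsed from a single string.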

## Operation Monitoring Solution Seminar with Cloud portal x SIOS Coati x Mackerel on March 12th (Tues)!

Together, Hatena, SIOS TECHNOLOGY, Inc., and Sony Network Communications, Inc. will be holding a seminar in Tokyo.

We’ve heard from quite a few companies that have had operational issues with the introduction of AWS. This seminar will introduce application management tools to help you automate as much as possible and ensure that AWS is on track! We’ll go over the best practices for managing AWS with Hatena's Mackerel, SIOS Technology’s SIOS Coati, and Sony Network Communications' Managed Cloud Portal.

#### Event Details
- Date and time: Tuesday, March 12, 2019 from 3:00 p.m. - 5:30 p.m. (Reception starts at 2:30 p.m.)
- Venue: Akihabara UDX 4F Next-2 (2 min. walk from Akihabara station) [[MAP](https://udx-s.jp/access/)]
- Admission: Free
- Sponsors: SIOS TECHNOLOGY, Inc., Sony Network Communications, Inc., and Hatena, Inc.

#### Apply here

[https://www.bit-drive.ne.jp/managed-cloud/news/seminar_20190312/:embed:cite] (Japanese only)

## DevOps Hands-on ~Building a safe and secure DevOps environment with AWS and Mackerel~ on March 15th (Fri)!

Hatena and Classmethod, Inc. will hold a hands-on seminar at the Shibuya Hikarie on Friday March 15th.

At this event, Hatena (Mackerel) and Classmethod, both of which have earned DevOps competency certifications in the AWS partner system, will explain hands-on how to build CI/CD pipeline environments that combine Mackerel and the AWS Code series. This is a great opportunity to learn more about the latest DevOps environments that combine monitoring and CI/CD pipelines.

#### Event Details
- Date and time: Friday, March 15, 2019 from 2:00 p.m. - 4:30 p.m. (Reception starts at 1:30 p.m.)
- Venue: Shibuya Hikarie 11th floor Sky Lobby Hikarie Conference Room C [[MAP](https://www.google.co.jp/maps/place/%E6%B8%8B%E8%B0%B7%E3%83%92%E3%82%AB%E3%83%AA%E3%82%A8/@35.6590249,139.7012843,17z/data=!3m2!4b1!5s0x60188b58f894f891:0x4ceb5b05dd6f9d0b!4m5!3m4!1s0x60188b5850e5a83f:0x70297507b32efce5!8m2!3d35.6590249!4d139.703473?hl=ja)]
- Capacity: 20 people
- Admission: Free
- Sponsors: Classmethod, Inc. and Hatena, Inc.

#### Apply here

[https://go.pardot.com/l/304611/2019-02-11/bfdhf:embed:cite] (Japanese only)
1 change: 1 addition & 0 deletions content/docs/entry/howto/container-agent.md
@@ -88,6 +88,7 @@ If not using the plugin, the agent can be used with just environment variable co
| MACKEREL_ROLES | Sets tasks, pod services, and roles. |
| MACKEREL_AGENT_CONFIG | Sets the agent configuration file. Details for this will be described later. |
| MACKEREL_IGNORE_CONTAINER | Sets the name of the container to be excluded from monitoring with regular expressions. |
| MACKEREL_HOST_STATUS_ON_START | When set, the host status changes to the specified value upon startup of the agent. Valid values are "standby", "working", "maintenance", and "poweroff".|
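
As a small sketch, a value for `MACKEREL_HOST_STATUS_ON_START` could be checked against the four valid values before it is exported. The helper name `is_valid_host_status` and the chosen value are assumptions made for illustration; they are not part of the agent.

```shell
# Succeed only for the four values the table lists as valid.
is_valid_host_status() {
  case $1 in
    standby|working|maintenance|poweroff) return 0 ;;
    *) return 1 ;;
  esac
}

status="working"   # hypothetical choice for this example
if is_valid_host_status "$status"; then
  export MACKEREL_HOST_STATUS_ON_START="$status"
  echo "MACKEREL_HOST_STATUS_ON_START=$MACKEREL_HOST_STATUS_ON_START"
else
  echo "invalid host status: $status" >&2
fi
```

How the variable actually reaches the container depends on your platform, for example through the environment section of a task or pod definition.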

### Using the configuration file

95 changes: 95 additions & 0 deletions content/docs/entry/howto/mkr/wrap.md
@@ -0,0 +1,95 @@
---
Title: Monitoring batch jobs that use cron etc. with mkr wrap
Date: 2019-02-19T12:22:14+09:00
URL: https://mackerel.io/docs/entry/howto/mkr/wrap
EditURL: https://blog.hatena.ne.jp/mackerelio/mackerelio-docs.hatenablog.mackerel.io/atom/entry/17680117126970795193
---

Using `mkr wrap`, you can monitor the success or failure of a program that is executed at fixed intervals by cron or a similar scheduler.

```
% mkr wrap -- /path/to/your-batch ...
```

When you run a command like the one above and it fails (exits with a non-zero status), an alert is generated in Mackerel.
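
To make "non-zero exit" concrete, here is a minimal sketch of how a shell tells success from failure. The function `failing_batch` is a hypothetical stand-in for `/path/to/your-batch`, and the block only mimics the success/failure decision; it does not call `mkr` or Mackerel.

```shell
# Hypothetical stand-in for /path/to/your-batch that always fails.
failing_batch() {
  return 3
}

# Capture the exit status without aborting the script.
status=0
failing_batch || status=$?

# Any non-zero status counts as a failure.
if [ "$status" -ne 0 ]; then
  outcome="failure (an alert would be raised)"
else
  outcome="success (no alert)"
fi
echo "exit status $status: $outcome"
```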

The alert is registered as a check monitoring alert of the execution host. The host ID and the API key needed to post information to Mackerel are automatically obtained from the configuration file (mackerel-agent.conf), but can be explicitly specified as well.

## Registering in crontab

For example, a batch job that runs at 11 minutes past every hour needs only the following entry.

```
11 * * * * mkr wrap -- /path/to/your-batch ...
```

## Alert status

If execution of a command fails, then an alert will occur in Mackerel as shown in the screenshot below.

![](https://cdn-ak.f.st-hatena.com/images/fotolife/m/mackerelio/20190219/20190219115516.png)

Here, the failed command and its contents will be displayed. For this reason, please be careful not to include any sensitive information that you wouldn't want leaked, such as passwords, in the command or its options.

## Detailed usage

As shown above, `mkr wrap` can be used without specifying any options.

```
% mkr wrap [options] -- /path/to/your-batch ...
```

Write the batch job command to be executed after `--`. If needed, several options can be used before `--`.

### `-n, --name` - Monitor name

```
% mkr wrap -n your-check-monitor -- /path/to/your-batch
```

Specify the check monitoring name for the job. If this option is not specified, the name will be automatically created from the argument (something along the lines of "mkrwrap - {{command name}} - {{hash 6 digit value}}"), but we recommend that you explicitly specify this for the sake of clarity.

Since this is treated as a check monitor, the name must be unique to the host. Be sure not to use a name that overlaps with other check monitor names.

### `-d, --detail` - Command output transmission

By default, command output is not sent to Mackerel when sending monitoring information, but with this option it can be transmitted to and checked in Mackerel. However, output of more than 1024 characters will be cut off.

When using this option, handle sensitive information in the output with the same care as in the command contents.

### `-N, --note` - Notes

Specify a note. When an alert occurs, the content of this note will be included in the alert message.

### `-H, --host` - Specify the Host ID

`mkr wrap` automatically obtains the host ID from the configuration file, but if you want to specify it explicitly, use this option.

### `-w, --warning` - Set alerts to WARNING

By default, `mkr wrap` alerts command failure as CRITICAL, but alerts can be set to WARNING by specifying this option.

### `-a, --auto-close` - Automatically close alerts upon command success

By default, alerts do not close automatically and must be closed manually. With this option, when a later run of a batch job with the same configuration succeeds, any previously reported alerts are closed automatically.

This option saves the result in a temporary area and reads it at the next execution, so be sure to use it in an environment with persistent disk space.

### `-I, --notification-interval` - Specify the notification retransmission interval

By default, notifications are only sent when an alert occurs or when the status of an alert changes, but you can have notifications sent at fixed intervals for open alerts using this option.

The interval is specified in the form of `15m`, `1h`, `1h30m`. The shortest interval that can be specified is `10m` (10 minutes). Any specification shorter than that will be set to 10 minutes.
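
The 10-minute floor can be sketched as follows. This is only an illustration of the rule described above, not how `mkr` itself parses the value; the helper names `interval_minutes` and `effective_interval` are hypothetical, and the parser handles only the `h` and `m` units shown in the examples.

```shell
# Convert an interval such as 15m, 1h, or 1h30m to minutes.
# Sketch only: handles the h and m units and nothing else.
interval_minutes() {
  spec=$1
  hours=0
  minutes=0
  case $spec in
    *h*) hours=${spec%%h*}; spec=${spec#*h} ;;
  esac
  case $spec in
    *m) minutes=${spec%m} ;;
  esac
  echo $(( hours * 60 + minutes ))
}

# Apply the documented minimum of 10 minutes.
effective_interval() {
  total=$(interval_minutes "$1")
  if [ "$total" -lt 10 ]; then
    total=10
  fi
  echo "$total"
}

for spec in 5m 15m 1h 1h30m; do
  echo "$spec -> $(effective_interval "$spec") minutes"
done
```

Under this sketch, `5m` is raised to 10 minutes, while `15m`, `1h`, and `1h30m` pass through as 15, 60, and 90 minutes.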

### `MACKEREL_APIKEY` environment variable - API key specification

Explicitly specify the API key. This cannot be specified with a command line option; it can only be set through the environment variable.

## Points of note

Although it does not occur in most cases, obtaining the API key or host ID from mackerel-agent.conf and from the id file that stores the host ID may fail due to file permissions.

This can only occur if you have independently set the permissions of those files as 0600 etc. This is because the execution user of the batch job is often different from the execution user of mackerel-agent.

In this case, adjust the permissions and user authorities of these files, or explicitly specify the API key and host ID in the command.
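
One way to check for this situation is to run a small script as the batch job's user and see whether the two files are readable. The paths below are assumed default locations on Linux and may differ in your installation; `check_readable` is a hypothetical helper.

```shell
# Report whether the current user can read a file.
check_readable() {
  if [ -r "$1" ]; then
    echo "readable: $1"
  else
    echo "NOT readable: $1"
  fi
}

# Assumed default locations; adjust them to your installation.
check_readable /etc/mackerel-agent/mackerel-agent.conf
check_readable /var/lib/mackerel-agent/id
```

If either file is reported as not readable, setting the `MACKEREL_APIKEY` environment variable and passing the `-H` option lets `mkr wrap` run without reading those files.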
