
Document new parsers and redo tags docs

anagrius committed Nov 17, 2018
1 parent 9197d52 commit ef4417967720f43d11befbb75f12d0c3b3a6b18f
@@ -55,7 +55,7 @@ The `fields` section is used to specify fields that should be added to each of t
It is possible to send events of different types in the same request. That is done by adding a new element to the outer array in the example above.
Tags can be specified in the parser pointed to by the `type` field.

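For illustration, a request with two event types might look like the sketch below. It is a minimal sketch assuming the request structure of the example above (an outer array whose elements each carry their own `type`, `fields`, and messages); the parser names, field values, and log lines are made up:

```json
[
  {
    "type": "accesslog",
    "fields": { "host": "webserver1" },
    "messages": [
      "192.168.1.21 - - [17/Nov/2018:09:32:12 +0000] \"GET /health HTTP/1.1\" 200 12"
    ]
  },
  {
    "type": "applog",
    "fields": { "host": "webserver1" },
    "messages": [
      "2018-11-17T09:32:12+00:00 [INFO] Application started"
    ]
  }
]
```

Each element is handled by the parser named in its `type` field, which is also where any tags for those events are assigned.
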
#### Events
#### Events {#events}

When sending events, you can set the following standard fields:

@@ -191,6 +191,7 @@ This request contains three events. The first two are tagged with `server1` and


#### Tags

Tags are key-value pairs.

Events are stored in data sources. A repository has a set of Data Sources.
@@ -199,7 +200,7 @@ matching its tags. If no data source with the exact tags exists it is created.
Tags are used as query boundaries when searching.
Tags are provided as a JSON object containing key-value pairs. Keys and values
must be strings, and at least one tag must be specified.
See the [tags documentation]({{< ref "tags.md" >}}) for more information.
See the [tags documentation]({{< ref "tagging.md" >}}) for more information.
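
For example, a tags object with two (made-up) tags could look like this:

```json
{
  "host": "webserver1",
  "source": "accesslog"
}
```

Events sent with this object are stored in the data source matching exactly these tags; if no such data source exists, it is created.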

#### Events

@@ -87,7 +87,7 @@ Name | Type | Required | Description
`dateTimeFields` | Array | Yes | Specifies the fields which contain the timestamp of the event. <br /><br />You can specify multiple fields, for example, a date field and a time field. The values of these fields are concatenated with whitespaces. <br /> <br /> Humio parses these fields with the format that you specify in the `dateTimeFormat` attribute.
`dateTimeFormat` | String | No | The format string that Humio should use to parse the fields identified by the `dateTimeFields` attribute. <br /><br />This attribute uses the [Java DateTimeFormatter syntax](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html). <br /><br />The default value is the ISO-8601 format, for example, `yyyy-MM-dd'T'HH:mm:ss.SSSZ`, with milliseconds as an optional addition.
`timezone` | String | No | This field is only used if the timestamp of the event is in localtime and does not have a timezone. <br /> <br />In that case, you can use it to set a timezone. <br /><br />Do not use this field if the timezone is part of the `dateTimeFormat`.<br /><br /> Examples: `UTC`, `Z`, or `Europe/Copenhagen`.
`tagFields` |Array | No | Specify fields in events generated by this parser that should be turned into [tags]({{< ref "tags.md" >}}).<br/> For example it could be specified that the host field in the events from this parser should be treated as a tag.
`tagFields` | Array | No | Specify fields in events generated by this parser that should be turned into [tags]({{< ref "tagging.md" >}}).<br/> For example, you could specify that the `host` field in events from this parser should be treated as a tag.

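As a sketch only, a parser definition using the attributes above might include something like the following; the field names and values are illustrative, and attributes not covered by the table (such as the parser's name or parse logic) are omitted:

```json
{
  "dateTimeFields": ["date", "time"],
  "dateTimeFormat": "yyyy-MM-dd HH:mm:ss.SSS",
  "timezone": "Europe/Copenhagen",
  "tagFields": ["host"]
}
```

Here the `date` and `time` fields are concatenated and parsed with the given format, the timestamps are interpreted as Europe/Copenhagen local time (since the format contains no timezone), and the `host` field is turned into a tag.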

**Response**
@@ -14,4 +14,4 @@ _Here is a suggested reading order for some core concepts:_
1. [Events]({{< ref "events.md" >}})
1. [Queries]({{< ref "queries.md" >}})
1. [Live Queries]({{< ref "live-queries.md" >}})
1. [Tags]({{< ref "tags.md" >}})
1. [Datasources]({{< ref "datasources.md" >}})
@@ -2,22 +2,37 @@
title: Datasources
---

A Data Source is a set of [Events]({{< ref "events.md" >}}) that have the same [Tags]({{< ref "tags.md" >}}).
Humio divides each [Repository]({{< ref "repositories.md" >}}) into more than one Data Source.
A _datasource_ is a set of [events]({{< ref "events.md" >}}) that share
the same [tags]({{< ref "tagging.md" >}}).

Humio creates Data Sources automatically when it encounters a new combination of Tags. Users cannot create Data Sources directly.
Each datasource is stored separately on disk, so restricting searches to a subset
of datasources means Humio does not have to traverse all the data, which can make searches much faster.

Data Sources are the smallest unit of data that you can delete.
You cannot delete individual Events in a Data Source beyond expiration.<!--GRW: I'm not sure what 'beyond expiration' means. -->
## Viewing a repository's datasources

Humio represents each Data Source internally as a dedicated directory within the Repository directory.
On the settings page of a repository you can see the list of datasources that have
been created during ingest.

{{% notice note %}}
We recommend that you _do not create more than 1,000 separate tags_, or combinations of tags.
If you need more combinations we recommend that you use attributes on individual
events to differentiate them and select them separately.
{{% /notice %}}
Datasources are created automatically based on the tags assigned to incoming
events; whenever a new combination of tags is encountered during ingest, a new datasource is created.
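
For example, with made-up tag values, events arriving with the following two tag combinations (one tags object per line) would be stored in two separate datasources, because the combinations differ:

```json
{ "host": "server1", "type": "accesslog" }
{ "host": "server2", "type": "accesslog" }
```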

Each Data Source requires Java heap for buffering while building the
next block of data to be persisted. This amount to roughly 5 MB each. If you have 1,000 Data Sources (across all repositories, in total) on your Humio server, you will need at least 5GB of heap for that on top of the other heap being used. In a clustered environment, only the share of Data sources that are being "digested" on the node need heap for buffers. So more servers can accommodate more Data Sources in the cluster.
## Deleting a datasource

Datasources are the smallest unit of data that you can delete in Humio.
You can delete an entire datasource from the repository's settings page.

## Memory usage

We recommend that you limit the number of datasources per Humio server to
a maximum of 1,000 distinct datasources across all repositories.

Each datasource requires Java heap space for buffering events while building the
blocks of data to be persisted to disk, roughly 5 MB of memory per datasource.

In a clustered environment, only the share of datasources being "digested" on a
particular node needs heap for buffers, so more servers can accommodate more
datasources in the cluster.

__Example:__
If you have 1,000 datasources (across all repositories, in total) on a single
Humio server, you will need at least 5 GB of heap just for buffering events.
@@ -55,7 +55,7 @@ described in detail below.

### Tag Fields {#tags}

[Tags]({{< ref "tags.md" >}}) fields that define how events are physical stored and indexed. They are
[Tags]({{< ref "tagging.md" >}}) are fields that define how events are physically stored and indexed. They are
used for speeding up queries.

Users can associate custom tags as part of the parsing and ingestion process.

This file was deleted.

@@ -1,6 +1,6 @@
---
title: "FAQ"
weight: 7
weight: 1000
aliases: ["appendix/faq", "ref/faq"]
---

@@ -95,12 +95,12 @@ You must make the following changes to the sample configuration:
If the log files use special, non-ASCII characters, then set the encoding here. For example, `utf-8` or `latin1`.

* If all your events are fairly small, you can increase `bulk_max_size` from the default of 200. The default of 200 is fine for most use cases.
The Humio server does not limit the size of the ingest request.
But keep `bulk_max_size` low, as requests may time out if they get too large. In case of timeouts Filebeat will back off, resulting in worse performance than with a lower `bulk_max_size`.
(Note: Humio Cloud on cloud.humio.com does limit requests to 32 MB, measured in bytes, not in number of events. If a request ends up being too large, you will get "Failed to perform any bulk index operations: 413 Request Entity Too Large".
If this happens, lower `bulk_max_size`, as Filebeat will otherwise keep retrying that request and not move on to other events.)

* You may want to increase the number of worker instances (`worker`) from the default of 1 to, say, 4 to achieve more throughput if Filebeat is not able to keep up with the inputs. If increasing `bulk_max_size` is possible, do that instead, or increase both.

## Running Filebeat {#running-filebeat}

@@ -138,9 +138,10 @@ For example, when sending a web server access log file to Humio, you can use the

### Parsing JSON data

Humio supports [JSON parsers]({{< ref "json-parsers.md" >}}).
We DO NOT recommend that you use the JSON parsing built into Filebeat, instead Humio has it's own JSON support.
Filebeat processes logs line by line, so JSON parsing will only work if there is one JSON object per line.
Customize a JSON parser in Humio (do not use the JSON parsing built into Filebeat).
By using the [built-in JSON parser]({{< ref "json.md" >}}) you can get JSON fields extracted during ingest.
You can also [create a custom JSON parser]({{< ref "creating-a-parser.md" >}}) to get more control over the fields that are created.


## Adding fields
@@ -158,7 +159,7 @@ To avoid having the `@host` and `@source` fields, specify `@host` and `@source`
## Tags

Humio saves data in Data Sources. You can provide a set of Tags to specify which Data Source the data is saved in.
See [the section on tags]({{< relref "concepts/tags.md" >}}) for more information about tags and Data Sources.
See [the section on tags]({{< ref "tagging.md" >}}) for more information about tags and Data Sources.
The `type` configured in Filebeat is always used as a tag. Other fields can be used
as tags as well by defining the fields as `tagFields` in the
[parser]({{< relref "parsers/_index.md" >}}) pointed to by the `type`.
@@ -242,4 +243,4 @@ default tag in Humio, if you do not provide a "@tags" field.
type: $NAME_OF_PARSER
"@tags": ["type", "@host"]
[...]
```
@@ -14,7 +14,7 @@ together. In this scenario, you use Logstash as the log collection and
parsing agent, and instruct it to send the data to Humio.

{{% notice tip %}}
__Humio supports the ElasticSearch bulk insertion API__ Just [point the Elastic outputter to Humio](#configuration).
__Humio supports the ElasticSearch bulk insertion API.__ Just [point the Elastic outputter at Humio](#configuration).
{{% /notice %}}


@@ -62,7 +62,9 @@ output{
```

### Adding tags to events
Please read [the section on tags]({{< relref "concepts/tags.md" >}}) before adding tags to your events. Add tags by including them in the "inputs/exec" section:

Please read [the section on tags]({{< ref "tagging.md" >}}) before adding tags
to your events. Add tags by including them in the "inputs/exec" section:

```
input{
@@ -19,22 +19,25 @@ The following pattern is used:

`REPOSITORY/YEAR/MONTH/DAY/TAG_KEY_1/TAG_VALUE_1/../TAG_KEY_N/TAG_VALUE_N/START_TIME-SEGMENT_ID.gz`

Read more about [Tags]({{< relref "concepts/tags.md" >}}).
Read more about [Tags]({{< ref "tagging.md" >}}).

## Format

The default archiving format is [NDJSON](http://ndjson.org), with raw log lines as an optional alternative. When using NDJSON, the parsed fields are available along with the raw log line. This incurs some extra storage cost compared to raw log lines, but makes the logs easier to process in an external system.
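
Purely as an illustration (the actual field names depend on the parser and the event), a single archived NDJSON line might look something like this, combining the parsed fields with the raw log line:

```json
{"@timestamp": "2018-11-17T09:32:12.000Z", "@rawstring": "2018-11-17T09:32:12+00:00 [INFO] Application started", "loglevel": "INFO", "message": "Application started"}
```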

## On-premise Setup

For an on-premise installation of Humio the following configuration is neccesary:
For an on-premise installation of Humio the following configuration is necessary:

```shell
S3_ARCHIVING_ACCESSKEY=$ACCESS_KEY
S3_ARCHIVING_SECRETKEY=$SECRET_KEY
```

The keys are used for authenticating against the S3 service. In other words the authenticated principal needs to have write access to the S3 buckets involved in the archiving. For guidance on how to retrieve S3 access keys see [AWS access keys](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys).
The keys are used for authenticating against the S3 service. In other words
the authenticated principal needs to have write access to the S3 buckets
involved in the archiving. For guidance on how to retrieve S3 access keys
see [AWS access keys](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys).


## Cloud Setup
@@ -46,15 +49,17 @@ Enabling Humio Cloud to write to your S3 bucket means setting up AWS cross-accou
3. Click on Permissions.
4. Click Add User.
5. Enter the canonical ID for Humio (`f2631ff87719416ac74f8d9d9a88b3e3b67dc4e7b1108902199dea13da892780`), check the Object Access Read and Write permission check boxes, and click Save.
6. In Humio go to the repository you want to archive and select S3 Archiving under Settings. Configure by giving bucket name, region etc. and Save.
6. In Humio, go to the repository you want to archive and select S3 Archiving
under Settings. Configure it by providing the bucket name, region, etc., and click Save.

## Tag Grouping

If tag grouping is defined for a repository the segment files will by split by each unique combination of tags present in a file. This results in a file in S3 per each unique combination of tags. The same layout pattern is used as in the normal case. The reason for doing this is to make it easier for a human operator to determine whether a log file is relevant.
If tag grouping is defined for a repository, the segment files will be split by
each unique combination of tags present in a file. This results in one file in S3
for each unique combination of tags. The same layout pattern is used as in the
normal case. The reason for doing this is to make it easier for a human operator
to determine whether a log file is relevant.

{{% notice note %}}
S3 archiving is only available in version >= 1.1.27.
{{% /notice %}}



@@ -164,9 +164,9 @@ that do not have the field `foo`.
### Tag Filters {#tag-filters}

Tag filters are a special kind of field filter. They behave in the same way as
regular [filters]({{< ref "#field-filters" >}}).
regular [filters]({{< relref "#field-filters" >}}).

In the example shown in the previous section ([Basic Query Components]({{< ref "#pipeline" >}})),
In the example shown in the previous section ([Basic Query Components]({{< relref "#pipeline" >}})),
we have separated the tag filters from the rest of the query by a pipe character `|`.

We recommend that you include the pipe character before tag filters in your
@@ -177,7 +177,7 @@ recognize tag filters, and use this
information to narrow down the number of data sources to search.
This feature decreases query time.

See the [tags documentation]({{< ref "tags.md" >}}) for more on tags.
See the [tags documentation]({{< ref "tagging.md" >}}) for more on tags.

## Logical Operators: `and`, `or`, `not`, and `!` {#logical-operators}
