
Template length exceeds flowset length, skipping #205

Closed
jlixfeld opened this issue Nov 3, 2018 · 22 comments

Comments

@jlixfeld

jlixfeld commented Nov 3, 2018

Hi there,

I don't think this is an EF issue, but I'd like to poke the crowd to see if anyone has any ideas before I bug the LS folks.

I started seeing my Elastiflow graphs jump up across all metrics from, for example, tens of Gbps to hundreds of Tbps. Obviously bad data.

I correlated it with these logs starting to appear:

[2018-10-31T11:24:17,297][WARN ][logstash.codecs.netflow  ] Template length exceeds flowset length, skipping {:template_id=>260, :template_length=>65, :record_length=>60}
[2018-10-31T11:24:47,739][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-10-31T11:24:47,740][INFO ][logstash.filters.translate] refreshing dictionary file
[2018-10-31T11:24:52,784][WARN ][logstash.codecs.netflow  ] Template length exceeds flowset length, skipping {:template_id=>266, :template_length=>109, :record_length=>104}
[2018-10-31T11:24:55,245][WARN ][logstash.codecs.netflow  ] Template length exceeds flowset length, skipping {:template_id=>260, :template_length=>65, :record_length=>60}
[2018-10-31T11:25:12,174][WARN ][logstash.codecs.netflow  ] Template length exceeds flowset length, skipping {:template_id=>260, :template_length=>65, :record_length=>60}
[2018-10-31T11:25:18,456][WARN ][logstash.codecs.netflow  ] Template length exceeds flowset length, skipping {:template_id=>266, :template_length=>109, :record_length=>104}
[2018-10-31T11:25:18,732][WARN ][logstash.codecs.netflow  ] Template length exceeds flowset length, skipping {:template_id=>266, :template_length=>109, :record_length=>104}
[2018-10-31T11:25:20,861][WARN ][logstash.codecs.netflow  ] Template length exceeds flowset length, skipping {:template_id=>266, :template_length=>109, :record_length=>104}
[2018-10-31T11:25:22,720][WARN ][logstash.codecs.netflow  ] Template length exceeds flowset length, skipping {:template_id=>266, :template_length=>109, :record_length=>104}
[2018-10-31T11:25:50,674][WARN ][logstash.codecs.netflow  ] Template length exceeds flowset length, skipping {:template_id=>266, :template_length=>109, :record_length=>104}
[2018-10-31T11:25:57,563][WARN ][logstash.codecs.netflow  ] Template length exceeds flowset length, skipping {:template_id=>260, :template_length=>65, :record_length=>60}

I've seen posts in various forums about bugs in the netflow codec, but those bugs are from months ago and all of them seem to have been fixed(?). I just verified that I'm running the latest codec:

root@ls1:/var/log/logstash# /usr/share/logstash/bin/logstash-plugin update logstash-codec-netflow
Updating logstash-codec-netflow
No plugin updated
root@ls1:/var/log/logstash#

Does this look familiar to anyone?

Thanks!

@robcowart
Owner

Most likely you have devices that are sending the same template ID (in this case 260) but the content of the flow records is different. The codec is trying to decode the flows from one device with the template from the other.

This is an issue with the codec. logstash-plugins/logstash-codec-netflow#9

@jlixfeld
Author

jlixfeld commented Nov 3, 2018

I think the subtext in the issues you cite suggests a workaround could be to use a different dport on the exporter. Am I reading that correctly, or no?

@jlixfeld
Author

jlixfeld commented Nov 5, 2018

Hi Rob,

So it seems the only workaround is to stand up multiple LS instances. To that end, I've duplicated the elastiflow directories and added them as additional pipelines in pipelines.yml per this link:

https://www.elastic.co/guide/en/logstash/6.2/multiple-pipelines.html

I changed the port in each instance of elastiflow${copy_number}/conf.d/10_input_netflow_ipv4.logstash.conf from port => "${ELASTIFLOW_NETFLOW_IPV4_PORT:2055}" to simply port => 205x (The port is unique across each of the 5 config files).
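For reference, a rough sketch of the pipelines.yml layout I mean (the pipeline IDs and paths below are just illustrative, not my exact values):

- pipeline.id: elastiflow1
  path.config: "/etc/logstash/elastiflow1/conf.d/*.conf"
- pipeline.id: elastiflow2
  path.config: "/etc/logstash/elastiflow2/conf.d/*.conf"
# ...one entry per copied ElastiFlow directory, each copy listening on its own UDP port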

The first time I tried to restart, I was getting Java out-of-memory errors, even though I had nearly tripled the amount of memory allocated to the VM, from 12GB to 32GB, so I then increased the heap to

-Xms20g
-Xmx20g

in /etc/logstash/jvm.options.

Now, logstash seems to be taking an hour to start, and it's still not done.

Last entry in the log was from an hour ago:

root@ls1:/etc/logstash# date
Mon Nov  5 11:33:25 EST 2018
root@ls1:/etc/logstash# tail -1 /var/log/logstash/logstash-plain.log
[2018-11-05T10:29:50,173][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.4.2"}
root@ls1:/etc/logstash#

Java is running:

root@ls1:/etc/logstash# ps xa|grep java
  840 ?        SNsl  70:51 /usr/bin/java -Xms20g -Xmx20g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.compile.invokedynamic=true -Djruby.jit.threshold=0 -XX:+HeapDumpOnOutOfMemoryError -Djava.security.egd=file:/dev/urandom -cp /usr/share/logstash/logstash-core/lib/jars/animal-sniffer-annotations-1.14.jar:/usr/share/logstash/logstash-core/lib/jars/commons-codec-1.11.jar:/usr/share/logstash/logstash-core/lib/jars/commons-compiler-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/error_prone_annotations-2.0.18.jar:/usr/share/logstash/logstash-core/lib/jars/google-java-format-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/gradle-license-report-0.7.1.jar:/usr/share/logstash/logstash-core/lib/jars/guava-22.0.jar:/usr/share/logstash/logstash-core/lib/jars/j2objc-annotations-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-annotations-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-core-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-databind-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-dataformat-cbor-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/janino-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/jruby-complete-9.1.13.0.jar:/usr/share/logstash/logstash-core/lib/jars/jsr305-1.3.9.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-api-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-core-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-slf4j-impl-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/logstash-core.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.commands-3.6.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.contenttype-3.4.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.expressions-3.4.300.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.filesystem-1.3.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.jobs-3.5.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.resources-3.7.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.runtime-3.7.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.app-1.3.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.common-3.6.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.preferences-3.4.1.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.registry-3.5.101.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.jdt.core-3.10.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.osgi-3.7.1.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.text-3.5.101.jar:/usr/share/logstash/logstash-core/lib/jars/slf4j-api-1.7.25.jar org.logstash.Logstash --path.settings /etc/logstash
root@ls1:/etc/logstash#
top - 11:34:15 up  1:23,  1 user,  load average: 1.04, 1.01, 1.00
Tasks: 158 total,   1 running, 157 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy, 25.0 ni, 75.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32951964 total, 22114460 free, 10656812 used,   180692 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 21934460 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  840 logstash  39  19 23.585g 0.010t  19716 S 100.3 31.9  70:26.07 java

I'm going to assume you've had to take similar measures in some of your other deployments for folks who have been running a homogeneous network with similar types of exports, similar SW versions, etc. Have you seen anything like this before, or do you just spin up entirely separate Logstash servers? Any pointers?

@robcowart
Owner

The way I would handle this is to break the pipeline into two parts: a "collector", which is basically the UDP input w/codec and a redis output, and a "processor", which uses a redis input and does all of the post-decoder work and sends to Elasticsearch. You can then have multiple instances of the simple "collector" pipeline, each listening on a different port, and a single instance of the "processor". It would look like this...

collector---\
collector-- redis -- processor -- Elasticsearch
collector---/

Redis is really easy to set up in this use-case and can actually help reduce packet loss when receiving flow data via UDP (that is another article to write). I have often thought about adding Redis as a requirement for ElastiFlow as it brings a lot of benefits.
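As a rough sketch only (the Redis host, key, and ports below are placeholders, and the processor keeps all of the existing ElastiFlow filters unchanged), the two pipeline types would look something like this:

Collector pipeline (one instance per listening port):

input {
  udp {
    port => 2055              # each collector instance gets its own port
    codec => netflow          # flows are decoded here, before queuing
  }
}
output {
  redis {
    host => "127.0.0.1"       # placeholder Redis host
    data_type => "list"
    key => "elastiflow"       # shared queue the processor reads from
  }
}

Processor pipeline (single instance):

input {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "elastiflow"
  }
}
# ...existing ElastiFlow filters go here unchanged...
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]   # placeholder
  }
}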

You could also try the new pipeline input and output instead of using redis. It should also work for your use-case, but I haven't tested it fully yet. Redis on the other hand is a proven technology for this scenario.
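If you go the pipeline-to-pipeline route, the rewiring is roughly this (a sketch only; the virtual address name is just an example): the collectors swap the redis output for a pipeline output, and the processor swaps the redis input for a pipeline input.

In each collector:

output {
  pipeline {
    send_to => ["elastiflow-processor"]   # example virtual address
  }
}

In the processor:

input {
  pipeline {
    address => "elastiflow-processor"
  }
}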

Either way, you will need to do a little restitching of the pipelines. It will however be worth it, as you will not have to run multiple instances of the heavy processing logic.

@geriko2019

The way I would handle this is to break the pipeline into two parts: a "collector", which is basically the UDP input w/codec and a redis output, and a "processor", which uses a redis input and does all of the post-decoder work and sends to Elasticsearch. You can then have multiple instances of the simple "collector" pipeline, each listening on a different port, and a single instance of the "processor". It would look like this...

collector---\
collector-- redis -- processor -- Elasticsearch
collector---/

Redis is really easy to set up in this use-case and can actually help reduce packet loss when receiving flow data via UDP (that is another article to write). I have often thought about adding Redis as a requirement for ElastiFlow as it brings a lot of benefits.

You could also try the new pipeline input and output instead of using redis. It should also work for your use-case, but I haven't tested it fully yet. Redis on the other hand is a proven technology for this scenario.

Either way, you will need to do a little restitching of the pipelines. It will however be worth it, as you will not have to run multiple instances of the heavy processing logic.

Hi Rob,
I am getting the same error with a single router configured.
How else can Redis be configured? By default, Redis listens for TCP traffic on port 6379.
My challenge is that the "logstash-codec-netflow" plugin can't be updated any further.

While using the following simple Logstash input & output configuration, I get this output:

input {
  udp {
    port => 2055
    codec => netflow
    type => "netflow"
    queue_size => 10000
    workers => 4
    tags => [ "cisco_logs" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test-%{+YYYY.MM.dd}"
    document_type => "netflow_logs"
  }
  stdout { codec => rubydebug }
}

      "host" => "XX.XX.XX.XX",
   "netflow" => {
        "l4_src_port" => 53,
          "direction" => 1,
            "in_pkts" => 1,
             "src_as" => 0,
       "flow_seq_num" => 4798,
           "in_bytes" => 88,
           "protocol" => 17,
    "flow_sampler_id" => 0,
      "ipv4_src_addr" => "XX.XX.XX.XX",
             "dst_as" => 0,
     "first_switched" => "2019-01-14T09:11:27.999Z",
      "ipv4_next_hop" => "XX.XX.XX.XX",
         "flowset_id" => 256,
        "l4_dst_port" => 54678,
            "version" => 9,
      "last_switched" => "2019-01-14T09:11:27.999Z",
           "src_mask" => 24,
        "output_snmp" => 1,
            "src_tos" => 0,
         "input_snmp" => 5,
          "tcp_flags" => 16,
           "dst_mask" => 26,
      "ipv4_dst_addr" => "XX.XX.XX.XX"
},
"@timestamp" => 2019-01-14T09:11:42.000Z,
      "type" => "netflow",
      "tags" => [
    [0] "cisco_logs"
],
  "@version" => "1"

How do I get my ElastiFlow graphs to work?

Note:
I used the following instructions to set up ElastiFlow:
(URL: https://pandaways.com/elastiflow-with-mikrotik-and-centos-7/)

@robcowart
Owner

If all that you have is one source of flows, and you are seeing this error, then something else is wrong. It is possible that the template the vendor is sending doesn't match the flow records. I would have to look at a PCAP which contains both the template and a flow record in order to investigate.
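Something like this should capture what I need, assuming the default port of 2055 and substituting your actual capture interface; let it run long enough to include at least one template refresh:

tcpdump -i eth0 -w flows.pcap udp port 2055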

@geriko2019

elastiflow.zip
I hope the PCAP is sufficient (find attached)

@robcowart
Owner

You have two different devices sending flows. Both send a template with ID 256, but the templates contain different fields. This will cause issues, as one template will be used to decode both sources, making at least one of them always wrong. The only workaround for this at the moment is to customize ElastiFlow for multiple collectors, with a common processing instance and some form of messaging queue in between.

@geriko2019

Thanks, Rob.

@geriko2019

Hi Rob,
I've re-installed ElastiFlow and configured one device. The following error seems to persist (latest PCAP file attached):

[2019-01-16T09:57:02,462][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.
[2019-01-16T09:57:13,469][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.
[2019-01-16T09:57:26,944][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.
[2019-01-16T09:57:38,409][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.
[2019-01-16T09:57:49,407][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.
[2019-01-16T09:57:56,394][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.
[2019-01-16T09:58:07,392][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.
[2019-01-16T09:58:19,388][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.
[2019-01-16T09:58:31,389][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.
[2019-01-16T09:58:43,387][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.

netflow-20190116.zip

@geriko2019

I'm using ElastiFlow v2.1.0

@robcowart
Owner

How long has it been running? Some devices are really slow at sending templates. Fortinet in particular can take 15-30 minutes.

@geriko2019

Over an hour.

@geriko2019

The issue was resolved. Thanks.

@ThatMorneGuy

The way I would handle this is to break the pipeline into two parts: a "collector", which is basically the UDP input w/codec and a redis output, and a "processor", which uses a redis input and does all of the post-decoder work and sends to Elasticsearch. You can then have multiple instances of the simple "collector" pipeline, each listening on a different port, and a single instance of the "processor". It would look like this...

collector---\
collector-- redis -- processor -- Elasticsearch
collector---/

Redis is really easy to set up in this use-case and can actually help reduce packet loss when receiving flow data via UDP (that is another article to write). I have often thought about adding Redis as a requirement for ElastiFlow as it brings a lot of benefits.

You could also try the new pipeline input and output instead of using redis. It should also work for your use-case, but I haven't tested it fully yet. Redis on the other hand is a proven technology for this scenario.

Either way, you will need to do a little restitching of the pipelines. It will however be worth it, as you will not have to run multiple instances of the heavy processing logic.

@robcowart Do you have any guides on how to implement this?

@odgeza

odgeza commented May 20, 2019

Out of interest, does the Elasticsearch ElastiFlow netflow plugin also have the same problem with overlapping templates?

@robcowart
Owner

The root cause is related to where input codecs (which decode the raw flow data) are located. So it is a Logstash-specific issue. Any solution based on Logstash will have the same challenge.

@sgreszcz

Most likely you have devices that are sending the same template ID (in this case 260) but the content of the flow records is different. The codec is trying to decode the flows from one device with the template from the other.

This is an issue with the codec. logstash-plugins/logstash-codec-netflow#9

@robcowart looks like there is a patch for this: logstash-plugins/logstash-codec-netflow#9 (comment)

But with Logstash 7.4.2 being broken with the netflow codec (#427), I had to roll back to 7.3.2 (for some reason 7.4.2 works with collectd over UDP). I'm wondering if I should start looking at something else for processing netflow....

Filebeat also has its problems, especially around packets seemingly not being processed - I'll need to look at this tomorrow...

I'm assuming that if I get stuff like this in my Logstash netflow codec logs, then my netflow data isn't being processed properly?

[2019-11-11T20:57:59,701][WARN ][logstash.codecs.netflow ] Template length exceeds flowset length, skipping {:template_id=>257, :template_length=>91, :record_length=>52}

@robcowart
Owner

This issue will be addressed once the following PRs are merged and released for the...

Logstash UDP Input: logstash-plugins/logstash-input-udp#46
Logstash Netflow Codec: logstash-plugins/logstash-codec-netflow#187

@robcowart robcowart reopened this Nov 14, 2019
@efbertos

efbertos commented Dec 2, 2019

The way I would handle this is to break the pipeline into two parts: a "collector", which is basically the UDP input w/codec and a redis output, and a "processor", which uses a redis input and does all of the post-decoder work and sends to Elasticsearch. You can then have multiple instances of the simple "collector" pipeline, each listening on a different port, and a single instance of the "processor". It would look like this...

collector---\
collector-- redis -- processor -- Elasticsearch
collector---/

Redis is really easy to set up in this use-case and can actually help reduce packet loss when receiving flow data via UDP (that is another article to write). I have often thought about adding Redis as a requirement for ElastiFlow as it brings a lot of benefits.
You could also try the new pipeline input and output instead of using redis. It should also work for your use-case, but I haven't tested it fully yet. Redis on the other hand is a proven technology for this scenario.
Either way, you will need to do a little restitching of the pipelines. It will however be worth it, as you will not have to run multiple instances of the heavy processing logic.

@robcowart Do you have any guides on how to implement this?

I am wondering the same.

@robcowart
Owner

Unfortunately the Elastic team declined to merge UDP input changes (see... logstash-plugins/logstash-input-udp#46). This leaves no other option than to continue to recommend the workaround of multiple instances of the ElastiFlow pipeline.

@robcowart
Owner

This is no longer a problem for the new ElastiFlow Unified Flow Collector. In particular the new collector properly handles templates across devices and observation domains. More details are available... HERE.
