Input netif always output value 0 if raw value is bigger than some value? #2366

Closed
HorseLuke opened this issue Jul 17, 2020 · 6 comments


@HorseLuke

Bug Report

Describe the bug

Some servers have been online for many years, and their network traffic counters have reached the terabyte range.

With the netif input, every value is always reported as 0.

I am not familiar with C. Does a raw value bigger than unsigned long cause a parse error in the strtoul call in plugins/in_netif/in_netif.c? If so, how can it be fixed?

A possible mitigation is to bring the network card down and up again, but I can't do that.
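
As background for the strtoul question: in standard C, strtoul does not return 0 on overflow. It returns ULONG_MAX and sets errno to ERANGE, and it returns 0 only when the text is "0" or when no digits could be converted at all. A minimal sketch of a range-checked conversion follows; parse_counter is a hypothetical helper written for this illustration, not a function from fluent-bit, and it uses strtoull so the result is 64 bits wide even where unsigned long is only 32 bits.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not fluent-bit code): converts a decimal counter.
 * Returns 0 on success, -1 if no digits were found or the value overflows. */
static int parse_counter(const char *text, unsigned long long *out)
{
    char *end = NULL;
    unsigned long long value;

    errno = 0;
    value = strtoull(text, &end, 10);

    if (end == text) {
        /* No conversion happened, e.g. the token starts with "eth0:" */
        return -1;
    }
    if (errno == ERANGE) {
        /* Value does not fit into unsigned long long */
        return -1;
    }

    *out = value;
    return 0;
}
```

With this check, a counter such as 3742334317097 converts normally, while a token that starts with letters is reported as an error instead of silently becoming 0.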

To Reproduce

  • Rubular link if applicable: N/A
  • Example log message if applicable:
[0] netif.0: [1595009328.000250499, {"eth0.rx.bytes"=>0, "eth0.rx.packets"=>0, "eth0.rx.errors"=>0, "eth0.tx.bytes"=>0, "eth0.tx.packets"=>0, "eth0.tx.errors"=>0}]
[1] netif.0: [1595009329.000214817, {"eth0.rx.bytes"=>0, "eth0.rx.packets"=>0, "eth0.rx.errors"=>0, "eth0.tx.bytes"=>0, "eth0.tx.packets"=>0, "eth0.tx.errors"=>0}]
[2] netif.0: [1595009330.000228601, {"eth0.rx.bytes"=>0, "eth0.rx.packets"=>0, "eth0.rx.errors"=>0, "eth0.tx.bytes"=>0, "eth0.tx.packets"=>0, "eth0.tx.errors"=>0}]
[0] netif.0: [1595009331.000356457, {"eth0.rx.bytes"=>0, "eth0.rx.packets"=>0, "eth0.rx.errors"=>0, "eth0.tx.bytes"=>0, "eth0.tx.packets"=>0, "eth0.tx.errors"=>0}]
[1] netif.0: [1595009332.000211857, {"eth0.rx.bytes"=>0, "eth0.rx.packets"=>0, "eth0.rx.errors"=>0, "eth0.tx.bytes"=>0, "eth0.tx.packets"=>0, "eth0.tx.errors"=>0}]
  • Steps to reproduce the problem:
    • Compile fluent-bit.
    • Get the raw data from the file that fluent-bit reads:
      cat /proc/net/dev
    • Run fluent-bit:
      /opt/fluent-bit-1.3.2/bin/fluent-bit -i netif -p interface=eth0 -o stdout

Expected behavior
The rx/tx byte and packet values should not be 0 when network traffic increases.

Screenshots
(Copied from the terminal, with some sensitive info deleted)

[Screenshot: WeChat screenshot 20200718025428]

Your Environment

  • Version used: 1.3.1
  • Configuration: Default
  • Environment name and version (e.g. Kubernetes? What version?): Bare Metal Server
  • Server type and version: Bare Metal Server
  • Operating System and version: CentOS 6.5 (But I think this problem is not from this)
  • Filters and plugins: netif

Additional context
N/A

@HorseLuke
Author

HorseLuke commented Jul 17, 2020

After modifying the source to debug, it seems that the problem occurs in plugins/in_netif/in_netif.c.

Modified source for debugging (printf calls added):

static int parse_proc_line(char *line,
                           struct flb_in_netif_config *ctx)
{
    struct mk_list *head = NULL;
    struct mk_list *split = NULL;
    struct flb_split_entry *sentry = NULL;

    printf("parse_proc_line -> %s\n",line);

    int i = 0;
    int entry_num;

    split = flb_utils_split(line, ' ', 256);
    entry_num = mk_list_size(split);

    printf("entry_num : %i vs ctx->entry_len + 1: %i\n", entry_num,ctx->entry_len + 1);


    if (entry_num != ctx->entry_len + 1) {
        flb_utils_split_free(split);
        printf("return -1 \n");
        return -1;
    }

    mk_list_foreach(head, split) {
        sentry = mk_list_entry(head, struct flb_split_entry ,_head);
        if (i==0) {

            printf("run is_specific_interface: %s\n", sentry->value);


            /* interface name */
            if( is_specific_interface(ctx, sentry->value)){
                i++;
                continue;
            }
            else {
                printf("run is_specific_interface faild, return -1\n");

                /* skip this line */
                flb_utils_split_free(split);
                return -1;
            }
        }
        ctx->entry[i-1].now = strtoul(sentry->value ,NULL ,10);
        printf("network value: %s - %lu\n", sentry->value, ctx->entry[i-1].now);
        i++;
    }

    flb_utils_split_free(split);

    return 0;
}

On a new virtual machine, the output is OK:

parse_proc_line ->  face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed

entry_num : 16 vs ctx->entry_len + 1: 17

......

parse_proc_line ->   eth1: 1092294   11658    0    0    0     0          0         0 50892300   40523    0    0    0     0       0          0

entry_num : 17 vs ctx->entry_len + 1: 17
run is_specific_interface: eth1:
network value: 1092294 - 1092294
network value: 11658 - 11658
network value: 0 - 0
network value: 0 - 0
network value: 0 - 0
network value: 0 - 0
network value: 0 - 0
network value: 0 - 0
network value: 50892300 - 50892300
network value: 40523 - 40523
network value: 0 - 0
network value: 0 - 0
network value: 0 - 0
network value: 0 - 0
network value: 0 - 0
network value: 0
 - 0


However, on the production server the entry_num != ctx->entry_len + 1 check rejects the line, because the interface name and the first counter in eth0:3742334317097 are not separated by whitespace:

parse_proc_line ->  face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed

entry_num : 16 vs ctx->entry_len + 1: 17
return -1

......

parse_proc_line ->  face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed

entry_num : 16 vs ctx->entry_len + 1: 17
return -1
parse_proc_line ->     lo:4632653363214 15404785252    0    0    0     0          0         0 4632653363214 15404785252    0    0    0     0       0          0

entry_num : 16 vs ctx->entry_len + 1: 17
return -1
parse_proc_line ->   eth0:3742334317097 43769887866    0    0    0     0          0         0 16850348309399 47266866606    0    0    0     0       0          0

entry_num : 16 vs ctx->entry_len + 1: 17
return -1
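
The difference between 16 and 17 entries can be reproduced outside fluent-bit with a small field counter. This is only a sketch: count_fields is a hypothetical helper, and it splits on any whitespace, whereas the plugin calls flb_utils_split with a single space as the separator.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not fluent-bit code): counts runs of
 * non-whitespace characters in a line without modifying it. */
static size_t count_fields(const char *line)
{
    size_t count = 0;
    int in_field = 0;

    for (; *line != '\0'; line++) {
        if (isspace((unsigned char) *line)) {
            in_field = 0;
        }
        else if (!in_field) {
            /* First character of a new field */
            in_field = 1;
            count++;
        }
    }

    return count;
}
```

For the eth1 line from the virtual machine this gives 17 fields, and for the eth0 line from the production server it gives 16, because eth0:3742334317097 counts as one field.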

@HorseLuke HorseLuke changed the title Input netif always output value 0 if raw value is bigger than unsigned long? Input netif always output value 0 if raw value is bigger than some value? Jul 17, 2020
@nokute78
Collaborator

nokute78 commented Jul 17, 2020

@HorseLuke Thank you for reporting this issue.

parse_proc_line -> eth0:3742334317097 43769887866 0 0 0 0 0 0 16850348309399 47266866606 0 0 0 0 0 0
This is a legacy kernel issue.
Currently, in_netif doesn't support this format.

CentOS 6.x uses kernel v2.6.32.xxx.
There, the format of /proc/net/dev, which is the data source of in_netif, is:

	seq_printf(seq, "%6s:%8lu %7lu %4lu %4lu %4lu %5lu %10lu %9lu "
		   "%8lu %7lu %4lu %4lu %4lu %5lu %7lu %10lu\n",

https://github.com/torvalds/linux/blob/v2.6.32/net/core/dev.c#L3055

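The key part is %6s:%8lu. There is no whitespace after the colon, so once the byte counter has 8 or more digits, the padding from %8lu disappears.
This produces eth0:3742334317097, and in_netif can't parse it.

In contrast, v3.10.x, the kernel version of CentOS 7, looks like this: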
https://github.com/torvalds/linux/blob/v3.10/net/core/net-procfs.c#L82

	seq_printf(seq, "%6s: %7llu %7llu %4llu %4llu %4llu %5llu %10llu %9llu "
		   "%8llu %7llu %4llu %4llu %4llu %5llu %7llu %10llu\n",

The format was updated to %6s: %7llu.
Whitespace was added after the colon.
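
To see the effect of the two format strings, the start of each line can be reproduced with snprintf in user space. This is a sketch, not kernel code: the function names are hypothetical, only the interface name and the first counter are printed, and %8llu stands in for the old kernel's %8lu so the example behaves the same on 32-bit and 64-bit systems.

```c
#include <stdio.h>

/* Hypothetical helper: start of a /proc/net/dev line in the v2.6.32 style.
 * The colon is followed directly by the width-8 counter. */
static int format_v2_6_32(char *buf, size_t size,
                          const char *name, unsigned long long rx_bytes)
{
    return snprintf(buf, size, "%6s:%8llu", name, rx_bytes);
}

/* Hypothetical helper: the same fields in the v3.10 style.
 * A literal space always follows the colon. */
static int format_v3_10(char *buf, size_t size,
                        const char *name, unsigned long long rx_bytes)
{
    return snprintf(buf, size, "%6s: %7llu", name, rx_bytes);
}
```

With a 7-digit counter such as 1092294, both variants produce the same text, because the width-8 padding supplies the space. With 3742334317097, only the v3.10 style keeps the name and the counter apart.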

@HorseLuke
Author

Thanks. I will find another way to do the same work. Closing as won't fix.

@HorseLuke
Author

HorseLuke commented Jul 18, 2020

Sorry to bother you, @nokute78.

In plugins/in_netif/in_netif.c, I added a function after the line #define LINE_LEN 256:

static int parse_proc_line_before_fix_colon_kernel_2_x(char *line)
{
    char *finded = NULL;
    uintptr_t finded_pos;
    size_t line_length;

    finded = strchr(line, ':');
    if(finded == NULL){
        //printf("Character not found\n");
        return -1;
    }

    line_length = strlen(line);

    //printf("parse_proc_line_before_fix_colon_kernel_2_x / sizeof line is %lu\n", sizeof(line));
    //printf("parse_proc_line_before_fix_colon_kernel_2_x / strlen line is %lu\n", line_length);

    if(line_length + 2 > LINE_LEN){
        //printf("strlen reach max\n");
        return -2;
    }

    finded_pos = finded - line;

    //printf("parse_proc_line_before_fix_colon_kernel_2_x / finded_pos is %lu\n", finded_pos);
    memmove(line + finded_pos + 1, line + finded_pos, line_length - finded_pos);
    line[line_length + 1] = '\0';
    line[finded_pos + 1] = ' ';

    return 0;

}

Then I changed the call parse_proc_line(line, ctx); in the function in_netif_collect_linux to:

        if(parse_proc_line_before_fix_colon_kernel_2_x(line) != 0){
             //printf("Character not found\n");
             continue;
        }

        parse_proc_line(line, ctx);

This patch solves the legacy kernel issue. But is it OK? Or does it have vulnerabilities such as a buffer overflow?

Because CentOS 6 will soon be EOL, I will not make a PR. I am just pasting the code for those who need it. Thanks.

HorseLuke added a commit to HorseLuke/fluent-bit that referenced this issue Jul 19, 2020
@nokute78
Collaborator

@HorseLuke

Because CentOS 6 will soon be EOL, I will not make a PR. I am just pasting the code for those who need it. Thanks.

I see. The patch LGTM.

memmove(line + finded_pos + 1, line + finded_pos, line_length - finded_pos);

Since line + finded_pos == finded, it can be written as:

memmove(finded + 1, finded, line_length - finded_pos);

I think this is easier to read.
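
For readers who want to try the pointer-based form on its own, here is a standalone sketch of the same idea. It is not the code from the patch: insert_space_after_colon and its cap parameter (the total size of the buffer that holds the line) are hypothetical names, and passing the buffer size explicitly is an assumption made so that the bounds check does not depend on LINE_LEN.

```c
#include <string.h>

/* Hypothetical helper (not fluent-bit code): inserts one space after the
 * first ':' in line. cap is the total size of the buffer holding line.
 * Returns 0 on success, -1 if there is no colon, -2 if there is no room. */
static int insert_space_after_colon(char *line, size_t cap)
{
    char *colon = strchr(line, ':');
    size_t len;

    if (colon == NULL) {
        return -1;
    }

    len = strlen(line);
    if (len + 2 > cap) {
        /* Need one extra byte for the space, plus the terminating NUL */
        return -2;
    }

    /* Shift everything after the colon, including the NUL, one byte right */
    memmove(colon + 2, colon + 1, len - (size_t) (colon - line));
    colon[1] = ' ';

    return 0;
}
```

Because the terminating NUL is moved together with the rest of the line, no separate write of '\0' is needed. On a line that already has a space after the colon this adds a second one, which is harmless as long as the later split treats repeated spaces as one separator.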

@HorseLuke
Author

@nokute78 Sorry for the slow reply. Thanks for the review.
