Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add serial numbers to chart names #12067

Merged
merged 3 commits into from
Feb 21, 2022
Merged

Conversation

vlvkobal
Copy link
Contributor

@vlvkobal vlvkobal commented Feb 1, 2022

Summary

When charts a created with different ids which differ only by symbols that are not '_' or alphanumeric and no unique names are provided, Netdata crashes with a segmentation fault.

#0  0x00007fe374de1515 in __strlen_avx2 () from /usr/lib/libc.so.6
#1  0x00005615a2bd5efd in compute_chart_hash (st=st@entry=0x7fe368e42000) at database/sqlite/sqlite_functions.c:1662
#2  0x00005615a2bcd7f2 in rrdset_create_custom (host=0x5615a501fb10, type=<optimized out>, id=0x5615a5337a53 "job_num_CT_S2000", name=<optimized out>, family=<optimized out>, context=<optimized out>, title=<optimized out>,
    units=<optimized out>, plugin=<optimized out>, module=<optimized out>, priority=<optimized out>, update_every=<optimized out>, chart_type=<optimized out>, memory_mode=<optimized out>, history_entries=<optimized out>)
    at database/rrdset.c:938
#3  0x00005615a2bb56ee in pluginsd_chart_action (user=0x5615a5336f90, type=<optimized out>, id=<optimized out>, name=<optimized out>, family=<optimized out>, context=<optimized out>,
    title=0x5615a5337a68 "Active job number of destination CT_S2000", units=0x5615a5337a93 "jobs", plugin=0x5615a530fe51 "mycups.plugin", module=0x0, priority=100006, update_every=1, chart_type=RRDSET_TYPE_STACKED, options=0x0)
    at collectors/plugins.d/pluginsd_parser.c:53
#4  0x00005615a2bb67c6 in pluginsd_chart (words=<optimized out>, user=0x5615a5336f90, plugins_action=<optimized out>) at collectors/plugins.d/pluginsd_parser.c:379
#5  0x00005615a2c9d1a9 in parser_action (parser=parser@entry=0x5615a53379f0, input=0x5615a5337a48 "CHART", input@entry=0x0) at parser/parser.c:298
#6  0x00005615a2bb7a18 in pluginsd_process (host=0x5615a501fb10, cd=cd@entry=0x5615a530fa50, fp=fp@entry=0x5615a5319300, trust_durations=trust_durations@entry=0) at collectors/plugins.d/pluginsd_parser.c:769
#7  0x00005615a2bb4ca5 in pluginsd_worker_thread (arg=0x5615a530fa50) at collectors/plugins.d/plugins_d.c:248
#8  0x00005615a2b82afb in thread_start (ptr=<optimized out>) at libnetdata/threads/threads.c:184
#9  0x00007fe374e56259 in start_thread () from /usr/lib/libpthread.so.0
#10 0x00007fe374d7f5e3 in clone () from /usr/lib/libc.so.6

We will attach serial numbers to the chart names in such cases.

Fixes #12059

Test Plan
  1. Create a mock plugin test-chart-names.plugin in /usr/libexec/netdata/plugins.d/, make it runnable by Netdata.
#!/bin/bash
cat << EOF
CHART cups.job_num_CT-S2000 '' 'Active job number of destination CT-S2000' jobs 'CT-S2000' job_num stacked 100004 1
DIMENSION pending '' absolute 1 1
DIMENSION held '' absolute 1 1
DIMENSION processing '' absolute 1 1
CHART cups.job_size_CT-S2000 '' 'Active job size of destination CT-S2000' KB 'CT-S2000' job_size stacked 100005 1
DIMENSION pending '' absolute 1 1
DIMENSION held '' absolute 1 1
DIMENSION processing '' absolute 1 1
CHART cups.job_num_CT_S2000 '' 'Active job number of destination CT_S2000' jobs 'CT_S2000' job_num stacked 100006 1
DIMENSION pending '' absolute 1 1
DIMENSION held '' absolute 1 1
DIMENSION processing '' absolute 1 1
CHART cups.job_size_CT_S2000 '' 'Active job size of destination CT_S2000' KB 'CT_S2000' job_size stacked 100007 1
DIMENSION pending '' absolute 1 1
DIMENSION held '' absolute 1 1
DIMENSION processing '' absolute 1 1
BEGIN cups.job_num_CT-S2000
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.job_size_CT-S2000
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.job_num_CT_S2000
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.job_size_CT_S2000
SET pending = 0
SET held = 0
SET processing = 0
END
CHART cups.job_num_CT/S2000 '' 'Active job number of destination CT_S2000' jobs 'CT/S2000' job_num stacked 100006 1
DIMENSION pending '' absolute 1 1
DIMENSION held '' absolute 1 1
DIMENSION processing '' absolute 1 1
CHART cups.job_size_CT/S2000 '' 'Active job size of destination CT_S2000' KB 'CT/S2000' job_size stacked 100007 1
DIMENSION pending '' absolute 1 1
DIMENSION held '' absolute 1 1
DIMENSION processing '' absolute 1 1
BEGIN cups.job_num_CT/S2000
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.job_size_CT/S2000
SET pending = 0
SET held = 0
SET processing = 0
END
CHART cups.dest_state '' 'Destinations by state' dests overview dests stacked 100000 1
DIMENSION idle '' absolute 1 1
DIMENSION printing '' absolute 1 1
DIMENSION stopped '' absolute 1 1
CHART cups.dest_option '' 'Destinations by option' dests overview dests line 100001 1
DIMENSION total '' absolute 1 1
DIMENSION acceptingjobs '' absolute 1 1
DIMENSION shared '' absolute 1 1
CHART cups.job_num '' 'Total active job number' jobs overview job_num stacked 100002 1
DIMENSION pending '' absolute 1 1
DIMENSION held '' absolute 1 1
DIMENSION processing '' absolute 1 1
CHART cups.job_size '' 'Total active job size' KB overview job_size stacked 100003 1
DIMENSION pending '' absolute 1 1
DIMENSION held '' absolute 1 1
DIMENSION processing '' absolute 1 1
BEGIN cups.dest_state
SET idle = 1
SET printing = 0
SET stopped = 1
END
BEGIN cups.dest_option
SET total = 3
SET acceptingjobs = 1
SET shared = 2
END
BEGIN cups.job_num
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.job_size
SET pending = 0
SET held = 0
SET processing = 0
END
EOF

while [ true ]
do

sleep 1

cat << EOF
BEGIN cups.job_num_CT-S2000
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.job_size_CT-S2000
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.job_num_CT_S2000
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.job_size_CT_S2000
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.job_num_CT/S2000
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.job_size_CT/S2000
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.dest_state
SET idle = 1
SET printing = 0
SET stopped = 1
END
BEGIN cups.dest_option
SET total = 3
SET acceptingjobs = 1
SET shared = 2
END
BEGIN cups.job_num
SET pending = 0
SET held = 0
SET processing = 0
END
BEGIN cups.job_size
SET pending = 0
SET held = 0
SET processing = 0
END
EOF

done
  1. Check if Netdata doesn't crash and assigns correct names to charts.

thiagoftsm
thiagoftsm previously approved these changes Feb 2, 2022
Copy link
Contributor

@thiagoftsm thiagoftsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I used the given plugin with current master, I could replicate the issue. After this I compiled this PR, I got the expected result:

2022-02-02 02:54:52: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: chart name 'cups.job_num_CT_S2000' on host 'hades' already exists.
2022-02-02 02:54:52: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: using name 'cups.job_num_CT_S2000_1' for chart 'cups.job_num_CT_S2000' on host 'hades'.
2022-02-02 02:54:52: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: chart name 'cups.job_size_CT_S2000' on host 'hades' already exists.
2022-02-02 02:54:52: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: using name 'cups.job_size_CT_S2000_1' for chart 'cups.job_size_CT_S2000' on host 'hades'.

and netdata did not crash. LGTM!

ilyam8
ilyam8 previously approved these changes Feb 2, 2022
@ilyam8 ilyam8 closed this Feb 2, 2022
@ilyam8 ilyam8 reopened this Feb 2, 2022
@vkalintiris
Copy link
Contributor

When charts a created with different ids which differ only by symbols that are not '_' or alphanumeric and no unique names are provided, Netdata crashes with a segmentation fault.

I don't think this is the root cause here, because commenting out compute_chart_hash works fine:

no-hash

@vlvkobal
Copy link
Contributor Author

vlvkobal commented Feb 2, 2022

I don't think this is the root cause here,

The root cause is that st->name is still NULL in this case.

Copy link
Contributor

@thiagoftsm thiagoftsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I retest after last commits and everything worked as expected:

2022-02-03 13:52:37: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: chart name 'cups.job_num_CT_S2000' on host 'hades' already exists.
2022-02-03 13:52:37: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: using name 'cups.job_num_CT_S2000_1' for chart 'cups.job_num_CT_S2000' on host 'hades'.
2022-02-03 13:52:37: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: chart name 'cups.job_size_CT_S2000' on host 'hades' already exists.
2022-02-03 13:52:37: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: using name 'cups.job_size_CT_S2000_1' for chart 'cups.job_size_CT_S2000' on host 'hades'.
2022-02-03 13:52:37: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: chart name 'cups.job_num_CT_S2000' on host 'hades' already exists.
2022-02-03 13:52:37: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: using name 'cups.job_num_CT_S2000_2' for chart 'cups.job_num_CT/S2000' on host 'hades'.
2022-02-03 13:52:38: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: chart name 'cups.job_size_CT_S2000' on host 'hades' already exists.
2022-02-03 13:52:38: netdata INFO  : PLUGINSD[test-chart-names] : RRDSET: using name 'cups.job_size_CT_S2000_2' for chart 'cups.job_size_CT/S2000' on host 'hades'.

@vkalintiris
Copy link
Contributor

@vlvkobal can we wait until tomorrow before we merge this? I'd like to understand exactly why name remains NULL.

@ilyam8
Copy link
Member

ilyam8 commented Feb 3, 2022

@vkalintiris we were not going to merge it w/o your approval

@vkalintiris
Copy link
Contributor

@vlvkobal @ilyam8

There's no guarantee that a collector (either internal or external) will create charts in a deterministic order and I think this PR will break the hashing optimization done in compute_chart_hash, because a different name will be assigned to a chart each time.

A couple of different ideas on how to address this:

  1. Allow - to appear in chart names (should be fine because the current set of allowed chars is too restrictive as it is). This would require us to update name handling inside backends/, exporters/ and one or two places under web/.
  2. Remove all usages of rrdset_find_byname, ie. start using a set's name only for presenting stuff to the user and use only type.id for searching.
  3. Fix the naming issue in the "offending" collector. ie. instead of assigning incremental names from within netdata, do so at the collector level.

@vlvkobal
Copy link
Contributor Author

vlvkobal commented Feb 9, 2022

this PR will break the hashing optimization

Does EVP_DigestUpdate(evpctx, st->name, strlen(st->name)); make a different unique context hash for every chart with different name? Then it isn't correct. Our protocol allows changing of chart names and a chart with a different name doesn't mean a different chart until it has the same id.

@vlvkobal
Copy link
Contributor Author

Allow - to appear in chart names

It will just lower a bit the probability of clashes which is already extremely low. It won't eliminate the possibility, despite it being very low, of a segmentation fault.

Remove all usages of rrdset_find_byname, ie. start using a set's name only for presenting stuff to the user and use only type.id for searching.

Why should we do that? We can just delete st->name from hash_id. @stelfrag what do you think?

Fix the naming issue in the "offending" collector.

It won't prevent getting the wrong output from any other collector. Netdata shouldn't crash if it gets the wrong chart name in the input.

@vlvkobal vlvkobal merged commit 32cd848 into netdata:master Feb 21, 2022
@vlvkobal vlvkobal deleted the fix-chart-names branch February 21, 2022 14:56
@vkalintiris
Copy link
Contributor

@vlvkobal I think we need to update name in the plugins.d docs as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Segmentation fault when cups server running
5 participants