libpcp internal error when recording metrics with pmrep #1327

Closed
myllynen opened this issue Jun 18, 2021 · 2 comments

@myllynen
Contributor

These three commands work:

pmrep -i 1 -o archive -F ./foo proc.memory.vmhwm,,,
pmrep -i 1 -o archive -F ./foo proc.memory.vmhwm,,,MB
pmrep -i 1,2 -o archive -F ./foo proc.memory.vmhwm,,,

But this fails with an unexpected error message:

pmrep -i 1,2 -o archive -F ./foo proc.memory.vmhwm,,,MB
Error: _pmi_stuff_value: vset realloc:: malloc(-197760) failed: Cannot allocate memory

I'm not sure whether pmrep(1) is doing the right thing here, but I suspect libpcp error handling could be improved. After that we could see whether changes are warranted on the pmrep side as well.

Thanks.

@natoscott
Member

Hi Marko,

I think there are two problems here: one in the C library (libpcp_import) and the other in the python wrapper. I'll look into the first, but I wonder if you could have a poke around the python library code for the latter in parallel?

The C library problem (and cause of the diagnostic message) appears to be a failure to handle an error code (PM_ERR_CONV) correctly when constructing the pmResult structure being written into the archive.

The python problem is that we're passing a floating point number like "16.78515625" (as a string) for a metric of type u32 via the pmiPutValue(3) routine. I expect this is because of the change in scale that the additional "MB" part of your pmrep invocation induces. This may have been a by-product of the changes in integer / floating point handling between python 2 and 3, but that's a wild guess.

$ pminfo 3.24.35 -d

proc.memory.vmhwm
    Data Type: 32-bit unsigned int  InDom: 3.9 0xc00009
    Semantics: instant  Units: Kbyte
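
To make the scale change concrete, here is a minimal sketch of how rescaling a Kbyte reading to MB yields a fractional string that does not suit a 32-bit unsigned metric. The helper names are hypothetical and are not part of pmrep, pmconfig.py or the PCP Python API. The input of 17188 Kbyte is chosen only because dividing it by 1024 reproduces the "16.78515625" quoted above. Rounding is just one possible policy, not necessarily what the eventual fix does.

```python
# Illustrative only: these helper names are hypothetical and are not part of
# pmrep, pmconfig.py or the PCP Python API.

KBYTE_PER_MBYTE = 1024  # proc.memory.vmhwm is reported in Kbyte


def kbyte_to_mbyte(kbytes):
    """Rescale a Kbyte reading to MB; the result is generally fractional."""
    return kbytes / KBYTE_PER_MBYTE


def format_for_u32(value):
    """Return a string that is safe to hand over for a 32-bit unsigned metric.

    Rounding is an assumed policy; rejecting the value, or recording in the
    metric's native units, are alternatives.
    """
    as_int = int(round(value))
    if not 0 <= as_int <= 0xFFFFFFFF:
        raise ValueError("value %r does not fit in u32" % (value,))
    return str(as_int)


scaled = kbyte_to_mbyte(17188)
print(str(scaled))             # a fractional string, unsuitable for a u32 metric
print(format_for_u32(scaled))  # an integer string
```

Rounding loses precision, so a caller that needs the exact reading would probably be better off recording the value in the metric's native Kbyte units.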

natoscott added a commit to natoscott/pcp that referenced this issue Jun 20, 2021
Marko encountered a double-fault failure condition where we
would attempt to use a prior error code in memory allocation
size calculations of a subsequent attempt to add an instance
value.

New C test program added to poke the exact failure condition
and we run it in both with/without valgrind configurations -
so two new tests.

Related to performancecopilot#1327
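
As a rough illustration of the failure condition the commit message describes, the sketch below shows how a negative error code that ends up in a count-like field turns a size calculation negative, which matches the shape of the malloc(-197760) report. It is a Python model of C arithmetic, not the libpcp_import code. The sizes, the error value and the function names are all hypothetical, and which field actually held the stale error code is an assumption.

```python
# Hypothetical sizes and error code, chosen for illustration only; they are
# not taken from the PCP headers.
HEADER_SIZE = 32   # assumed size of a value-set header
VALUE_SIZE = 16    # assumed size of each value slot
ERR_CONV = -1000   # stand-in for a negative error code such as PM_ERR_CONV


def grow_size_unchecked(numval):
    """Size needed for numval value slots, trusting numval blindly."""
    return HEADER_SIZE + numval * VALUE_SIZE


def grow_size_checked(numval):
    """Same calculation, but refuse a count that is really an error code."""
    if numval < 0:
        raise ValueError("numval holds an error code: %d" % numval)
    return HEADER_SIZE + numval * VALUE_SIZE


print(grow_size_unchecked(2))         # a sensible positive size
print(grow_size_unchecked(ERR_CONV))  # negative, so an allocation would fail
```

Checking the count before using it in the size calculation lets the first error surface directly, rather than as a confusing allocation failure on the next call.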
@myllynen
Contributor Author

Thanks for taking care of the libpcp part. I've fixed pmrep/pmconfig.py with:

3806df8
