Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenBSD Ports Patches (WIP) #4192

Closed
wants to merge 16 commits into from

Conversation

williamleuschner
Copy link

@williamleuschner williamleuschner commented Sep 14, 2018

This PR is to track my progress patching netdata for OpenBSD. I'll let you guys know when it's ready to merge, and I'll also comment on any changes that require explanation.

TODOs:

  • Compile without OS-specific plugins and errors
  • Cleanly package from ports Makefile
  • Install and run without issues
  • Compile without OS-specific plugins and warnings
  • Write OpenBSD OS plugin
  • Write PF plugin

src/health.c Outdated
@@ -369,7 +369,7 @@ void *health_main(void *ptr) {
if(unlikely(check_if_resumed_from_suspention())) {
apply_hibernation_delay = 1;

info("Postponing alarm checks for %ld seconds, because it seems that the system was just resumed from suspension."
info("Postponing alarm checks for %lld seconds, because it seems that the system was just resumed from suspension."
Copy link
Author

@williamleuschner williamleuschner Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these are compiler warning fixes about the size of time_t. You can reject them if you like.

Copy link
Member

@ktsaou ktsaou Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time_t is 64 bit on 64 bit systems, but I think they are 32 bit on 32 bit systems (they follow size_t). Can you check if %zu gives waning? If it does not, I think we will avoid warnings on 32 bit systems by using %zu.

Copy link
Author

@williamleuschner williamleuschner Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that z was a valid modifier for printf(3) format strings! I'll change all of those lls to zs.

Copy link
Author

@williamleuschner williamleuschner Sep 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: Interesting. On my test machine, ssize_t is not the same size as time_t.

health.c:372:15: warning: format specifies type 'ssize_t' (aka 'long') but the argument
      has type 'time_t' (aka 'long long') [-Wformat]
            , hibernation_delay
              ^~~~~~~~~~~~~~~~~
./log.h:76:71: note: expanded from macro 'info'
#define info(args...)    info_int(__FILE__, __FUNCTION__, __LINE__, ##args)
                                                                      ^~~~
health.c:386:24: warning: format specifies type 'ssize_t' (aka 'long') but the argument
      has type 'time_t' (aka 'long long') [-Wformat]
                     , hibernation_delay
                       ^~~~~~~~~~~~~~~~~
./log.h:76:71: note: expanded from macro 'info'
#define info(args...)    info_int(__FILE__, __FUNCTION__, __LINE__, ##args)
                                                                      ^~~~

I think OpenBSD defines time_t as 64 bits, regardless of platform size, to prevent issues in 2038. On my 64-bit Linode (running 64-bit OpenBSD), ssize_t is long, but time_t is long long.

src/ipc.c Outdated
*/
#ifndef HAVE_UNION_SEMUN
#if !defined(HAVE_UNION_SEMUN) && !defined(__OpenBSD__)
Copy link
Author

@williamleuschner williamleuschner Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenBSD doesn't define this macro, but it does have union semun {…} defined already.

Copy link
Author

@williamleuschner williamleuschner Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is the perfect thing to check for in a ./configure script. I’ve never used autotools from a developer’s perspective before, but there’s no time like the present!

src/log.c Outdated
@@ -337,7 +337,7 @@ static const char *strerror_result_string(const char *a, const char *b) { (void)

void error_int( const char *prefix, const char *file, const char *function, const unsigned long line, const char *fmt, ... ) {
// save a copy of errno - just in case this function generates a new error
int __errno = errno;
Copy link
Author

@williamleuschner williamleuschner Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally, OpenBSD defines an __errno function as part of the expansion for errno, so this name caused a compilation error.

src/main.c Outdated
@@ -57,13 +57,15 @@ struct netdata_static_thread static_threads[] = {
#elif defined(__APPLE__)
// macOS internal plugins
{"PLUGIN[macos]", CONFIG_SECTION_PLUGINS, "macos", 1, NULL, NULL, macos_main},
#elif defined(__OpenBSD__)
// OpenBSD internal plugins
Copy link
Author

@williamleuschner williamleuschner Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will populate this later, after I copy the FreeBSD plugin and see what needs changing.

src/popen.c Outdated
int options = WEXITED;
#endif
int proc_status;
if(waitpid(pid, &proc_status, options) > 0) {
Copy link
Author

@williamleuschner williamleuschner Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell of the FreeBSD and OpenBSD man pages, this is an appropriate replacement. It's confusing though, so I might not be correct.

Copy link
Member

@ktsaou ktsaou Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... I would be more comfortable if you move this waitpid() code to a new static function and override the whole function for OpenBSD, so that it runs as it was in Linux and FreeBSD.

Copy link
Author

@williamleuschner williamleuschner Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, OpenBSD tries to avoid patching things by adding #ifdefs, because they make the code harder to debug and more error-prone. waitpid() is available on every platform, so if I can confirm that it behaves the same way, it would be a better replacement. After work today, I’ll pull both versions of this function out into a test program and test every code path to make sure they behave the same way.

Copy link
Member

@ktsaou ktsaou Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the source of waitid(): https://code.woboq.org/userspace/glibc/sysdeps/posix/waitid.c.html
It calls waitpid(), but does a few more things:

options = WEXITED may not work for waitpid(). The code removes it explicitly at the waitpid() call:

child = __waitpid (pid, &status,
options
#ifdef WEXITED
&~ WEXITED
#endif
);

Then it uses many macros to set the result we need (more than WIFSIGNALED):

if (WIFEXITED (status))
{
infop->si_code = CLD_EXITED;
infop->si_status = WEXITSTATUS (status);
}
else if (WIFSIGNALED (status))
{
infop->si_code = WCOREDUMP (status) ? CLD_DUMPED : CLD_KILLED;
infop->si_status = WTERMSIG (status);
}
else if (WIFSTOPPED (status))
{
infop->si_code = CLD_STOPPED;
infop->si_status = WSTOPSIG (status);
}
...

Copy link
Author

@williamleuschner williamleuschner Sep 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely forgot that I could read the source code for waitid(2), which makes this way easier!

I based my patch on the original arguments to waitid(2). It wasn't passed anything in the options argument that would trigger the other conditions in that if/elseif/else chain, so I didn't include them. Was the original function meant to hit those code paths?

Copy link
Author

@williamleuschner williamleuschner Sep 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for options = WEXITED, that's my own fault. I realized as I was laying in bed last night that I had misunderstood the manual pages I was consulting, and it wasn't necessary on other platforms. Here's OpenBSD's man page.

src/threads.c Outdated
return (pid_t)syscall(SYS_gettid);

#endif /* __FreeBSD__, __APPLE__*/
return (pid_t)pthread_self;
Copy link
Author

@williamleuschner williamleuschner Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spelunked through old OS X header files to confirm that the pthread_self() function exists prior to 10.6. Assuming it works, pthread_self() is part of the POSIX standard, and this nest of #ifdefs can be entirely removed!

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Sep 14, 2018

Current stopping point:

ipc.c:103:24: error: variable has incomplete type 'struct seminfo'
        struct seminfo seminfo = {.semmni = 0};
                       ^
ipc.c:103:16: note: forward declaration of 'struct seminfo'
        struct seminfo seminfo = {.semmni = 0};
               ^
ipc.c:142:20: error: variable has incomplete type 'struct seminfo'
    struct seminfo seminfo;
                   ^
ipc.c:142:12: note: forward declaration of 'struct seminfo'
    struct seminfo seminfo;
           ^
ipc.c:147:31: error: use of undeclared identifier 'SEM_INFO'
    if(unlikely(semctl (0, 0, SEM_INFO, arg) < 0)) {
                              ^
3 errors generated.

I know less about semaphores than I do about signals and I'm tired, so I'm going to leave this one for me of tomorrow evening to deal with.

@ktsaou
Copy link
Member

@ktsaou ktsaou commented Sep 14, 2018

Please make sure your code compiles without errors/warnings on all systems, when compiled like this:

CFLAGS="-O1 -ggdb -Wall -Wextra -Wformat-signedness -fstack-protector-all -DNETDATA_INTERNAL_CHECKS=1 -D_FORTIFY_SOURCE=2 -DNETDATA_VERIFY_LOCKS=1"

Not all flags may be available on all systems. But try to use as many as possible.

So, on Linux, I expect this to produce no compiler warnings at all:

CFLAGS="-O1 -ggdb -Wall -Wextra -Wformat-signedness -fstack-protector-all -DNETDATA_INTERNAL_CHECKS=1 -D_FORTIFY_SOURCE=2 -DNETDATA_VERIFY_LOCKS=1" ./netdata-installer.sh --enable-plugin-nfacct --enable-plugin-freeipmi --disable-lto --dont-wait

@ktsaou
Copy link
Member

@ktsaou ktsaou commented Sep 14, 2018

also keep in mind, for reference I always use the latest version of gcc, usually on Arch or Manjaro Linux:

# gcc --version
gcc (GCC) 8.2.1 20180831
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and the latest version of clang:

# clang --version
clang version 6.0.1 (tags/RELEASE_601/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

clang does not support -Wformat-signedness. The rest are the same.

(replace format string arguments that substitute size_t or time_t types with %zd)
@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Sep 15, 2018

Please make sure your code compiles without errors/warnings on all systems

Of course! What kind of developer would I be if I didn't bother to check my code for regressions :P

when compiled like this

Thanks for the reference commands, I appreciate it.

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Sep 19, 2018

Current stopping point:

ipc.c causes netdata to fail to build on OpenBSD but succeed on FreeBSD because it isn't compiled on FreeBSD. It's a Linux-specific thing. I thought I found all of the places where an exception was made for FreeBSD, but I guess I missed one…

@CLAassistant
Copy link

@CLAassistant CLAassistant commented Sep 19, 2018

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Sep 19, 2018

It compiles now! It doesn't work yet, and there are no plugins for OpenBSD, so it's kinda useless, but that's still progress.

@ktsaou
Copy link
Member

@ktsaou ktsaou commented Sep 19, 2018

This pull request fixes 4 alerts when merging 32faf7f into e6c118d - view on LGTM.com

fixed alerts:

  • 4 for FIXME comment

Comment posted by LGTM.com

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Sep 20, 2018

One of my earlier comments got hidden by code review folding. OpenBSD defines ssize_t and time_t differently.

health.c:372:15: warning: format specifies type 'ssize_t' (aka 'long') but the argument
      has type 'time_t' (aka 'long long') [-Wformat]
            , hibernation_delay
              ^~~~~~~~~~~~~~~~~
./log.h:76:71: note: expanded from macro 'info'
#define info(args...)    info_int(__FILE__, __FUNCTION__, __LINE__, ##args)
                                                                      ^~~~
health.c:386:24: warning: format specifies type 'ssize_t' (aka 'long') but the argument
      has type 'time_t' (aka 'long long') [-Wformat]
                     , hibernation_delay
                       ^~~~~~~~~~~~~~~~~
./log.h:76:71: note: expanded from macro 'info'
#define info(args...)    info_int(__FILE__, __FUNCTION__, __LINE__, ##args)
                                                                      ^~~~

In order to ensure that there are no rollover problems in 2038, OpenBSD defines time_t as long long. The C standard doesn't define the minimum size of any other integral type large enough to hold 64 bits worth of data. So, although just plain long is large enough to hold 64 bits on a 64 bit platform, that leaves 32 bit systems open to problems (if they're still around in 2038).

While I could maintain a patch in the OpenBSD ports tree to convert the %zus to %llus, I think a better way might be to put %llus everywhere in the source code, and explicitly cast the argument to long long. It shouldn't cause any problems (I can't imagine a scenario where that would be a downcast), and the cast can be removed when other platforms get on board with properly-sized times :P

Thoughts?

@ktsaou
Copy link
Member

@ktsaou ktsaou commented Sep 20, 2018

I think a better way might be to put %llus everywhere in the source code

Yes, I agree.

Please also rebase your PR. netdata evolves and now we have conflicts...

# Conflicts:
#	src/rrd2json_api_old.c
#	src/web_api_old.c
@ktsaou
Copy link
Member

@ktsaou ktsaou commented Sep 21, 2018

This pull request fixes 4 alerts when merging 42feab4 into 9ca9d49 - view on LGTM.com

fixed alerts:

  • 4 for FIXME comment

Comment posted by LGTM.com

@ktsaou
Copy link
Member

@ktsaou ktsaou commented Sep 21, 2018

This pull request fixes 4 alerts when merging 5f1a469 into 9ca9d49 - view on LGTM.com

fixed alerts:

  • 4 for FIXME comment

Comment posted by LGTM.com

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Sep 21, 2018

Alright, so I've gotten netdata to successfully compile. However, I think I've messed something up, because the first three lines of output the compiled binary produces are:

2018-09-20 22:47:11: netdata ERROR : MAIN : PROCFILE: Cannot open file '/proc/stat' (errno 2, No such file or directory)
2018-09-20 22:47:11: netdata ERROR : MAIN : Cannot open file '/proc/stat'. Assuming system has 1 processors.
2018-09-20 22:47:11: netdata ERROR : MAIN : Cannot open file '/proc/sys/kernel/pid_max'. Assuming system supports 32768 pids. (errno 2, No such file or directory)

Is there someplace else where netdata looks at /proc? I thought I checked, and all of it was in the Linux-specific code I reconfigured it not to compile?

@ktsaou
Copy link
Member

@ktsaou ktsaou commented Sep 21, 2018

This pull request fixes 4 alerts when merging 9ec28ae into 9ca9d49 - view on LGTM.com

fixed alerts:

  • 4 for FIXME comment

Comment posted by LGTM.com

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Sep 22, 2018

Nope, I'm just a bit oblivious. Some functions in src/common.c have OS-switching macros.

@paulfantom paulfantom force-pushed the master branch 4 times, most recently from 9134341 to 70de864 Compare Nov 2, 2018
@vlvkobal vlvkobal added the area/packaging label Nov 19, 2018
@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Dec 12, 2018

Nope, this is still a work-in-progress. I don't know why GitHub thinks I want to request code review.

@Ferroin
Copy link
Member

@Ferroin Ferroin commented Dec 13, 2018

@williamleuschner It's because we've got GitHub configured to automatically request review from people responsible for maintaining the code being modified. You don't have to worry about it as you've made it clear that you're not ready for review yet, but when you are, the people GitHub automatically tagged for review should all ideally take a look at the code.

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Dec 22, 2018

I'm redoing my patches to work with netdata's new architecture. As part of that, I'm revisiting some of the changes I made before, like the one in thread.c:28. My original patch replaced the entire body of this function with just pthread_self(3), which is part of POSIX and is available on all platforms. However, this creates a type problem, since pthread_self(3) returns a pthread_t (struct pthread *), which is an opaque type and is not convertible to pid_t.

According to grep, the only places this function is used are in log messages, and in checking to ensure that the correct thread is accessing the web client cache. In my estimation, the best way to make this cross platform is to replace the gettid function with pthread_self(3), and change the web client cache to compare those values using pthread_equal(3). What are your thoughts?

@cakrit
Copy link
Contributor

@cakrit cakrit commented Dec 24, 2018

the best way to make this cross platform is to replace the gettid function with pthread_self(3), and change the web client cache to compare those values using pthread_equal(3). What are your thoughts?

I see that you can still do printf with a %d and the equality will work if it's changed, so I don't see any problems with this one. @ktsaou any reason it wasn't used before? @mfundul what do you think?

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Dec 27, 2018

I see that you can still do printf with a %d

Well, it's a pointer, so %zd, but yeah.

@ktsaou
Copy link
Member

@ktsaou ktsaou commented Dec 28, 2018

The PID returned by gettid() is useful in a few cases, this is why it is logged.

For example, a netdata thread may enter an infinite loop, or use a lot of CPU for some reason. Using top or htop we can find the offending PID and then grep netdata/error.log for that PID to see which netdata thread is the faulty one.

Of course we can replace gettid() with something better, but please allow the above to still happen. Otherwise, we will have a hard time finding problems like the above.

@cakrit
Copy link
Contributor

@cakrit cakrit commented Dec 28, 2018

How about using gettid() only when compiling in debug mode? We already have several ifdefs with NETDATA_INTERNAL_CHECKS.

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Dec 28, 2018

On OpenBSD, it is impossible to get the ID of a thread in any way other than pthread_self. top and htop can show threads, but the sysctl they use to get that information doesn’t help our situation, because there is no way to match the OS’s thread ID with the pthread_t.

Should I ifdef gettid to return 0 on OpenBSD, and replace the web cache usages with pthread_self?

@ktsaou
Copy link
Member

@ktsaou ktsaou commented Dec 28, 2018

Well, since gettid() is not implemented and the value returned by pthread_self() cannot be associated with the thread id reported by top or htop, I guess it does not matter what we would return. Somehow though, I would prefer to return a unique id per thread, so that people will not be confused while seeing the same ID on all threads.

I see various posts about getting a modulo pthread_self() or even truncating the result of pthread_self(). But I don't know if they are right.

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Dec 28, 2018

Modulo and truncation are probably both not right, but of the two, modulo would be better. It's a pointer value, so if everything else expects the value to be an int, pthread_self()%INT_MAX might work?

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Dec 29, 2018

I also don't know if there's any guarantee that the pointer value returned from pthread_self(3) is the same for every call in a given thread. We're supposed to compare it with pthread_equal(3), so it might not be…

@cakrit
Copy link
Contributor

@cakrit cakrit commented Dec 30, 2018

I also don't know if there's any guarantee that the pointer value returned from pthread_self(3) is the same for every call in a given thread. We're supposed to compare it with pthread_equal(3), so it might not be…

Well, the man page does say that

The pthread_self() function returns the ID of the calling thread. This is the same value that is returned in *thread in the pthread_create(3) call that created this thread.

So that shouldn't be an issue.

It sounds like truncation and modulo are just so that people don't have to log very large strings. You'd only use pthread_equals if you wanted to do anything important with it (which netdata doesn't). Doing a modulo with INT_MAX kind of defeats the purpose of keeping it small. I saw some alternative ways to get something printable here.

To summarize, I think we need the following:

  • A function to return something printable, regardless of whether pthread_self or gettid is used. It would have an ifdef to use pthread_self when we're not in debug mode OR when gettid is not defined. gettid will be preferred when we're in debug mode, unless if it's not defined.

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented Jan 18, 2019

I'm still working on this! My current hangup is figuring out how to mesh my limited understanding of OpenBSD's packing list format with the packaging system's opinionated design choices. The tools complain mightily at my current idea for installing/uninstalling the code-as-configuration thing that netdata does in /etc and /usr/local/lib/netdata.

After I figure out the packaging problems, I'll put together a patch to implement that debug function and remove the non-POSIX parts from critical codepaths.

@cakrit
Copy link
Contributor

@cakrit cakrit commented May 28, 2019

@williamleuschner will you be able to progress further? We could perhaps close this PR and you create a new one when you've progressed sufficiently.

@cakrit
Copy link
Contributor

@cakrit cakrit commented May 28, 2019

@paulkatsoulakis in case you need some help with the packaging part

@williamleuschner
Copy link
Author

@williamleuschner williamleuschner commented May 28, 2019

Yes, I’ll be able to make progress now! The semester ended for me fairly recently, so I'm back to having free time for fun projects. Sorry for the lack of updates…

# Conflicts:
#	database/rrdset.c
#	web/api/formatters/rrdset2json.c
@cakrit
Copy link
Contributor

@cakrit cakrit commented Sep 17, 2019

Closing due to inactivity. Please comment if you wish to continue with this and we'll reopen it.

@cakrit cakrit closed this Sep 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/packaging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants