Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial minimal h2o webserver integration #14585

Merged
merged 43 commits into from May 10, 2023

Conversation

underhood
Copy link
Contributor

@underhood underhood commented Feb 22, 2023

Summary

Uses h2o as webserver for netdata. Currently will run alongside (on port 19998) the old webserver (port 19999) if enabled.
By default disabled. Enable by netdata.conf:

[httpd]
  enable = yes

Recommended to use SSL with:

[httpd]
  ssl = yes

In such case if you do https://127.0.0.1:19998 you will get HTTP 2 support (which works only over SSL).

Currently biggest limitation is no support for streaming. I will have to investigate how to do this without modifying h2o.

TODO:

  • children support
  • strange bug with content type not correct after refresh
  • CI/packaging fixes - i redid it to not use external cmake and build libh2o.a ourselves instead

This also uncovered multiple bugs on local dashboard code e.g.:

  • sometimes there is double slash in URL 127.0.0.1//api/v1/info that is requested by dashboar
  • if child dashboard is requested by url host/uuid instead of host/uuid/ our new webserver handles both url correctly (by sending index page for a chile) but dashboard will send subsequent request wrong if former URL form is used.
Test Plan
Additional Information
For users: How does this change affect me?

@github-actions github-actions bot added area/build Build system (autotools and cmake). area/daemon area/packaging Packaging and operating systems support labels Feb 22, 2023
@stelfrag
Copy link
Collaborator

To enable it as I see its [httpd] section with enabled = yes

@underhood
Copy link
Contributor Author

Yep i have to also fix couple of bugs and child support

@stelfrag
Copy link
Collaborator

Yep i have to also fix couple of bugs and child support

Can you add a small list of pending tasks in the description so that we can follow along?

@underhood
Copy link
Contributor Author

Yep i have to also fix couple of bugs and child support

Can you add a small list of pending tasks in the description so that we can follow along?

added, child support is working now (I redid it with "uberhandler" as opposed to registering paths for every child as I had it working previously, this way it is more effective)

@underhood
Copy link
Contributor Author

underhood commented Feb 25, 2023

@Ferroin looks like I will need a bit of a help with packaging stuff here
I am changing the so that we build the libh2o ourselves instead of calling their cmake build.

@github-actions github-actions bot added area/ci and removed area/packaging Packaging and operating systems support labels Feb 27, 2023
@underhood underhood changed the title [WIP] introduce h2o webserver initial minimal h2o webserver integration Feb 27, 2023
@underhood underhood marked this pull request as ready for review February 27, 2023 06:52
@underhood
Copy link
Contributor Author

underhood commented Feb 27, 2023

the only remaining single failure in CI is Build / Build Static (ppc64le) (pull_request) I am ignoring it because when I install alpine in QEMU emulating ppc64le netdata builds successfuly.
Screenshot from 2023-02-27 18-06-30

@andrewm4894
Copy link
Contributor

@underhood would it make sense to add this to buildinfo or anything like that if there could be some period of time where both old and new webserver can be run in parallel by users?

@Ferroin
Copy link
Member

Ferroin commented Feb 27, 2023

the only remaining single failure in CI is Build / Build Static (ppc64le) (pull_request) I am ignoring it because when I install alpine in QEMU emulating ppc64le netdata builds successfuly.

It’s failing on 64-bit ARM due to undefined function references, not on POWER (the POWER build job is just getting cancelled due to the 64-bit ARM build failing). Link to the failing logs for the most recent build as-of this comment: https://github.com/netdata/netdata/actions/runs/4279728307/jobs/7455048379

That said, this needs to be fixed before this gets merged, period. Without the all the static builds working, we do not publish any static builds for nightlies or releases (or any source tarballs for that matter), due to limitations in how GitHub Actions dependency handling between jobs works.

Copy link
Member

@Ferroin Ferroin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Is there some reason the Makefile components need to be in the main Makefile instead of a sub-directory for the new web server?
  • Is there a particular reason that we are not even trying to support using a system copy of libh2o? I get that this may be a bit tricky, but I was under the impression that it was understood at this point that we need to have such support for any dependency that is both widely available and does not involve a custom patches (and AFAICT, libh2o fits both requirements) so that the few cases of third-party packaging that we do functionally support will actually build the relevant features. The fact that libh2o is going to ultimately be a mandatory dependency does not sidestep that issue with vendoring.

.gitmodules Show resolved Hide resolved
@underhood
Copy link
Contributor Author

underhood commented Feb 28, 2023

the only remaining single failure in CI is Build / Build Static (ppc64le) (pull_request) I am ignoring it because when I install alpine in QEMU emulating ppc64le netdata builds successfuly.

It’s failing on 64-bit ARM due to undefined function references, not on POWER (the POWER build job is just getting cancelled due to the 64-bit ARM build failing). Link to the failing logs for the most recent build as-of this comment: https://github.com/netdata/netdata/actions/runs/4279728307/jobs/7455048379

That said, this needs to be fixed before this gets merged, period. Without the all the static builds working, we do not publish any static builds for nightlies or releases (or any source tarballs for that matter), due to limitations in how GitHub Actions dependency handling between jobs works.

EDIT: just noticed it is failing 64-bit arm now, previously it was always failing ppc64le. so yep I will have to investigate this and fix.

Yes I see the failure in CI. That is why I installed Alpine in QEMU on ppc64le and tried to reproduce it. It built successfully. Also that error would indicate missing zlib however then we should fail at:

test "${with_zlib}" = "yes" -a "${have_zlib}" != "yes" && AC_MSG_ERROR([zlib required but not found. Try installing 'zlib1g-dev' or 'zlib-devel'.])

so I do not know why this fails and why only on PPC

@underhood
Copy link
Contributor Author

  • Is there a particular reason that we are not even trying to support using a system copy of libh2o? I get that this may be a bit tricky, but I was under the impression that it was understood at this point that we need to have such support for any dependency that is both widely available and does not involve a custom patches (and AFAICT, libh2o fits both requirements) so that the few cases of third-party packaging that we do functionally support will actually build the relevant features. The fact that libh2o is going to ultimately be a mandatory dependency does not sidestep that issue with vendoring.

I plan to add that in following PR. For now I went with what h2o itself recommends and that is static linking and intgrating h2o into the project

@underhood
Copy link
Contributor Author

underhood commented Mar 1, 2023

@Ferroin how to see here https://github.com/netdata/netdata/actions/runs/4279728307/jobs/7455048379
how netdata was configured (how the ./configure was called) and its output?
In past this was relatively easy to see, but now it is unclear to me what exactly this build step is doing/executing and also looking at .github files it is quite hard to follow.

@underhood
Copy link
Contributor Author

underhood commented Mar 1, 2023

after latest changes I seem to have correct behavior:

arch/distro zlib-dev installed --with-zlib (1) / without (0) build result netdata buildinfo HTTPD Correct result
aarch64(RPi4)
ubuntu
1 1 1 1 ✔️
aarch64(RPi4)
ubuntu
1 0 1 0 ✔️
aarch64(RPi4)
ubuntu
0 1 0
(configure fails with zlib req but not present - not link error)
N/A ✔️
aarch64(RPi4)
ubuntu
0 0 1 0 ✔️
ppc64le(qemu)
alpine-3.17.2
1 1 1 1 ✔️
ppc64le(qemu)
alpine-3.17.2
1 0 1 0 ✔️
ppc64le(qemu)
alpine-3.17.2
0 1 0
(configure fails with zlib req but not present - not link error)
N/A ✔️
ppc64le(qemu)
alpine-3.17.2
0 0 1 0 ✔️

Since I am not really sure what CI is doing here as it is using some caches etc. (by looking at the output it seems it is starting build from middle) and I dont see a way how to actually see commands used to build netdata and/or output of configure script I am not sure how to diagnose this.

@underhood
Copy link
Contributor Author

underhood commented Mar 1, 2023

@Ferroin @tkatsoulas
I am confused here. In qemu PPC64LE i get expected behaviour in all cases (see table above) with/without zlib yet it fails in CI.

I can't really follow the CI process here as it seems to use some kind of cache for the ppc64le. Can the problem be there, how was that cache generated? It looks like we are not doing full build here. How do I reproduce this locally?

@thiagoftsm
Copy link
Contributor

thiagoftsm commented Mar 1, 2023

Hello @underhood ,

I am starting testing your PR today, and I observed that we are having different warning when we compile with CFLAGS:

CFLAGS="-Og -ggdb -Wall -Wextra -fno-omit-frame-pointer -Wformat-signedness -fstack-protector-all -Wformat-truncation=2 -Wunused-result -DHAVE_BACKTRACE=1 -DNETDATA_TRACE_ALLOCATIONS=1 -DNETDATA_DEV_MODE=1 -DNETDATA_VERIFY_LOCKS=1"

In this file you have what was written in my stderr during compilation.

Of course the majority, if not all, are from H2O. Can we hide this?

@underhood
Copy link
Contributor Author

Hello @underhood ,

I am starting testing your PR today, and I observed that we are having different warning when we compile with CFLAGS:

CFLAGS="-Og -ggdb -Wall -Wextra -fno-omit-frame-pointer -Wformat-signedness -fstack-protector-all -Wformat-truncation=2 -Wunused-result -DHAVE_BACKTRACE=1 -DNETDATA_TRACE_ALLOCATIONS=1 -DNETDATA_DEV_MODE=1 -DNETDATA_VERIFY_LOCKS=1"

In this file you have what was written in my stderr during compilation.

Of course the majority, if not all, are from H2O. Can we hide this?

I am already hiding some (mostly ssl 3 deprecation warnings) i can add

httpd/http_server.c Outdated Show resolved Hide resolved
@underhood
Copy link
Contributor Author

@stelfrag fixed merge conflict caused by ML

BUT the WEBRTC PR went in meanwhile and changed the struct web_client so I will have to do some changes

@underhood
Copy link
Contributor Author

ok should be updated

Copy link
Contributor

@thiagoftsm thiagoftsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only could retest this PR after disable ML, but issue is not related to it, LGTM!

@underhood
Copy link
Contributor Author

thanks for review I will merge immediately after new release (to not delay current one in case of trouble)

@underhood underhood merged commit 66c4735 into netdata:master May 10, 2023
132 of 133 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/build Build system (autotools and cmake). area/ci area/daemon area/web
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants