Use of qemu PCIe to PCI bridges on x86 causes multiple problems #1726

Open
dgibson opened this issue Apr 22, 2021 · 18 comments
Labels: area/QEMU (issues specific to the qemu hypervisor), area/runtime (issues that impact the runtime, including shimv2), bug (incorrect behaviour), enhancement (improvement to an existing feature)

Comments

@dgibson
Contributor

dgibson commented Apr 22, 2021

Description of problem

On x86 systems, the Kata qemu backend always adds a PCI bridge. This is qemu's pci-bridge device type, which will be a PCI to PCI bridge on the pc machine type and PCIe to PCI on q35. Hotplugged block, net and vhost-user devices always go on this bridge. Hotplugged VFIO devices also go on the bridge by default (if hotplug_vfio_on_root_bus=true is not set).
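To make the current layout concrete, here is a minimal Go sketch of the QEMU arguments involved: a `pci-bridge` on the q35 root bus `pcie.0`, and the arguments of a QMP `device_add` that places a hotplugged device in one of the bridge's slots. The helper names, the `pci-bridge-0` id, the chassis number and the slot values are illustrative assumptions, not what the Kata runtime actually emits.

```go
package main

import "fmt"

// bridgeDeviceArg formats a QEMU -device value for a pci-bridge on the q35
// root bus pcie.0. On q35 this acts as a PCIe-to-PCI bridge; shpc=on enables
// its SHPC hotplug controller. All values are illustrative assumptions.
func bridgeDeviceArg(id string, chassisNr, slot int) string {
	return fmt.Sprintf("pci-bridge,bus=pcie.0,id=%s,chassis_nr=%d,shpc=on,addr=%#x",
		id, chassisNr, slot)
}

// hotplugArgs builds the arguments of a QMP device_add placing a device in
// one of the bridge's 32 slot numbers (0-31). Whether QEMU accepts hotplug
// into every slot number is not checked here; only the range is validated.
func hotplugArgs(driver, id, bridgeID string, slot int) (map[string]string, error) {
	if slot < 0 || slot > 31 {
		return nil, fmt.Errorf("slot %d out of range for a PCI bridge (0-31)", slot)
	}
	return map[string]string{
		"driver": driver,
		"id":     id,
		"bus":    bridgeID,
		"addr":   fmt.Sprintf("%#x", slot),
	}, nil
}

func main() {
	fmt.Println("-device", bridgeDeviceArg("pci-bridge-0", 1, 2))
	args, err := hotplugArgs("virtio-blk-pci", "drive-0", "pci-bridge-0", 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("device_add", args)
}
```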

Using a bridge like this conveniently provides 32 pluggable slots and makes pc and q35 behave more similarly. However, at least with the current qemu and guest kernel, it forces use of the SHPC hotplug protocol, which has some severe drawbacks.

SHPC is designed with physical devices and human operators in mind, and so has a 5s delay built into the protocol to allow accidental plugs to be reversed. Since a 5s delay in startup isn't acceptable, we work around this in the agent by forcing a PCI rescan, which bypasses the proper operation of SHPC and locates the device early. Unfortunately, that workaround causes other problems:

  • An SHPC hotplug can sometimes race with the rescan in such a way that the guest kernel misinterprets the SHPC interrupt as a request to unplug the device, so the device appears very briefly and is then removed again.
  • For passed through VFIO devices, an even more severe error can occur which can put the device into an unusable state until a host reboot.

For these reasons, #683 exists to remove the rescan from the agent; however, doing so means SHPC hotplugs will have an unacceptable delay.

For VFIO devices, the use of a PCI bridge has additional problems beyond SHPC.

  • If the passed through device is PCI-Express (essentially all likely use cases), then putting it under the bridge means the guest will see it as a plain PCI device instead. Depending on the device this may mean that the guest can't drive it properly.
  • Because pre-Express PCI bridges don't preserve Requester IDs, all devices behind a PCI bridge will always be in the same IOMMU group in the guest, even if they are in separate groups in the host. That means:
    • If the container was intending to use separate VFIO devices with separate userspace drivers (e.g. DPDK), they won't be able to.
    • Worse, if the container has both attached block and net devices (managed by the guest kernel) and vfio devices they intended to use with userspace drivers, they won't be able to at all (a whole IOMMU group must belong either to the kernel or to userspace).
@dgibson dgibson added bug Incorrect behaviour needs-review Needs to be assessed by the team. labels Apr 22, 2021
@dgibson dgibson self-assigned this Apr 22, 2021
@dgibson
Contributor Author

dgibson commented Apr 22, 2021

@marcel-apf @fidencio for your attention.

@marcel-apf
Contributor

As discussed offline with @dgibson, since ACPI hotplug for the PCIe-PCI bridge solves the problem only partially, the best approach seems to be to use a single slot with 8 PCIe root ports (as its functions) and use them for hotplug.
Two issues with this approach:

  • What if a user wants more devices? It can be made configurable; I assume that 8 ports would be OK for most use cases.
  • More root ports increase the boot time: we can try to tackle the issue and speed up the discovery; it will be faster than SHPC's 5 seconds anyway.

@dgibson
Contributor Author

dgibson commented Apr 22, 2021

Right. There are some other things that will have to be sorted out to follow that approach as well:

  • Currently, Kata supports (in fact, defaults to) the pc machine type, which can't take root ports, so we'll need to sort out what to do with that (#1727, "Kata qemu backend handles machine types poorly", is somewhat related).
  • Neither the runtime nor the agent currently handles non-zero functions properly. That at least isn't too hard to fix now that I've implemented the PCI path stuff.

@fidencio
Member

Right. There are some other things that will have to be sorted out to follow that approach as well:

* Currently, Kata supports (in fact, defaults to) the `pc` machine type, which can't take root ports, so we'll need to sort out what to do with that (#1727 is somewhat related)

@dgibson, @marcel-apf,
I don't think we should spend much time figuring out what to do with pc machine type, as I think we should rather stop using it in favour of q35.

/cc @devimc @kata-containers/architecture-committee for input on moving forward and caring mostly about the q35 machine type.

@devimc

devimc commented Apr 22, 2021

@fidencio +1 for q35

@dgibson when I implemented this (hotplug devices, etc.) in Kata a couple of years ago, I wrote a PoC for using root ports.
In addition to the limitations already mentioned by @marcel-apf, memory footprint was also impacted negatively, but the main reason for not using them (root ports) was the limitation in the number of devices supported: we don't know how many devices will be attached to the POD. So I suggest (if this is possible) continuing to use PCI(e) bridges for non-VFIO devices (e.g. block devices, virtiofs, etc.) and root ports for VFIO devices (GPUs, NICs, etc.). Since we don't know how many VFIO devices will be hot plugged into the POD, the default number of root ports should be 0 and the user should be able to change it through the configuration file and/or annotations. What do you think?

@marcel-apf
Contributor

marcel-apf commented Apr 22, 2021

@fidencio another +1 for q35. I don't see a reason to continue supporting multiple machine types in Kata, when q35 has features the pc machine doesn't support.

@devimc regarding the issue of not knowing how many devices will be hotplugged, we have the same problem with the pc machine where we have about 16 (20?) empty slots. What if the user needs more? We will need to add another PCI-PCI bridge to have more available slots (at configuration time).

In q35 the problem is that we have no empty slots by default, but what if we did? I would re-check the effects of adding a single device (having 8 PCI Root Ports as its PCI functions) on both memory footprint and boot time. This way q35 will behave somewhat similarly to the pc machine, but with fewer available slots.

About plugging the virtio devices into a PCI slot (behind a PCIe-PCI bridge): AFAIK we will lose the "Pure Virtio 1.X" mode (disable-legacy=on,disable-modern=off) and will support only transitional virtio (disable-legacy=off,disable-modern=off). I can't say I can measure the impact; however, the whole point of using q35 is to avoid legacy mode, and that would do the opposite.
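The two modes differ only in a pair of virtio-pci device properties. A minimal Go sketch, following the statement above that only transitional mode is available behind the bridge; the `parentBus` type, the helper names and the bus ids (`rp0`, `pci-bridge-0`) are hypothetical.

```go
package main

import "fmt"

// parentBus describes where a virtio-pci device is plugged on q35.
type parentBus int

const (
	onPCIeRootPort  parentBus = iota // PCI Express port: pure virtio 1.x possible
	behindPCIBridge                  // conventional PCI behind a PCIe-to-PCI bridge
)

// virtioModeProps returns the disable-legacy/disable-modern pair discussed
// above: modern-only on a PCIe port, transitional behind a PCI bridge.
func virtioModeProps(b parentBus) string {
	if b == onPCIeRootPort {
		return "disable-legacy=on,disable-modern=off"
	}
	return "disable-legacy=off,disable-modern=off"
}

// virtioDeviceArg composes a -device value for a virtio-pci device on bus.
func virtioDeviceArg(driver, id, bus string, b parentBus) string {
	return fmt.Sprintf("%s,id=%s,bus=%s,%s", driver, id, bus, virtioModeProps(b))
}

func main() {
	fmt.Println(virtioDeviceArg("virtio-net-pci", "net0", "rp0", onPCIeRootPort))
	fmt.Println(virtioDeviceArg("virtio-net-pci", "net1", "pci-bridge-0", behindPCIBridge))
}
```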

I like the configuration approach, but I do think we should have 8 (fewer? see below) pcie-root-ports available by default and let the user add more; however, I much prefer the dynamic approach based on annotations, if feasible.

One last note about the concern with using pcie-root-ports: we can actually plug more than one virtio NIC (or other device type) into a pcie-root-port. We can plug up to 8 virtio-pci-{} functions into a single pcie-root-port. If the virtio devices support ARI (do they?), we can have up to 256 virtio-pci-{} functions in that pcie-root-port. That also works for hotplug.

If we don't care about the hot-unplug granularity (the PCI functions hot-plugged together must be hot-unplugged together), something like this could work:

pcie-root-bus
 |
 +-- slot --+-- pcie-root-port --+-- func 1 virtio-pci...
            |                    +-- func 2 virtio-pci...
            |                    +-- ...
            +-- pcie-root-port ----- empty [can be used to hotplug several vfio/virtio PCI functions]
            +-- ...

In conclusion, 1 slot with 8 pcie-root-ports and up to 8 PCI functions per pcie-root-port can give us up to 64 PCI functions.
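The arithmetic behind the 64-function figure, as a small Go helper. The function name is hypothetical, and the ARI branch reflects the open question above: it only applies if the virtio devices actually support ARI.

```go
package main

import "fmt"

// maxFunctions returns how many PCI functions the layout above can host:
// slots on the root bus, each holding up to 8 pcie-root-ports as its
// functions, each root port hosting one device with funcsPerPort functions.
// Without ARI a device has at most 8 functions; with ARI, up to 256.
func maxFunctions(slots, portsPerSlot, funcsPerPort int, ari bool) (int, error) {
	if slots < 1 {
		return 0, fmt.Errorf("need at least one slot")
	}
	if portsPerSlot < 1 || portsPerSlot > 8 {
		return 0, fmt.Errorf("%d root ports per slot; a slot has 1-8 functions", portsPerSlot)
	}
	limit := 8
	if ari {
		limit = 256
	}
	if funcsPerPort < 1 || funcsPerPort > limit {
		return 0, fmt.Errorf("%d functions per root port exceeds the limit of %d", funcsPerPort, limit)
	}
	return slots * portsPerSlot * funcsPerPort, nil
}

func main() {
	n, err := maxFunctions(1, 8, 8, false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n) // 64
}
```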

@devimc

devimc commented Apr 22, 2021

@marcel-apf

About plugging the virtio devices into a PCI slot (behind a PCIe-PCI bridge): AFAIK we will lose the "Pure Virtio 1.X" mode (disable-legacy=on,disable-modern=off) and will support only transitional virtio (disable-legacy=off,disable-modern=off). I can't say I can measure the impact; however, the whole point of using q35 is to avoid legacy mode, and that would do the opposite.

it would be great to know/measure what's the impact of this (if any)

I like the configuration approach, but I do think we should have 8 (fewer? see below) pcie-root-ports available by default and let the user add more; however, I much prefer the dynamic approach based on annotations, if feasible.

we'll be wasting resources (CPU/Memory) if these root ports are only used for VFIO devices, since not all workloads need VFIO

If we don't care about the hot-unplug granularity (the PCI functions hot-plugged together must be hot-unplugged together),

we do care: devices must be hot-unplugged when the container is restarted or destroyed, so maybe we should have 1 root port per container?


I'm not against using root ports for hot plugging all the devices, but if we are going to follow that path, we should at least know the impact of that decision and evaluate it (are we willing to pay for it?)

@marcel-apf
Contributor

@devimc

@marcel-apf

About plugging the virtio devices into a PCI slot (behind a PCIe-PCI bridge): AFAIK we will lose the "Pure Virtio 1.X" mode (disable-legacy=on,disable-modern=off) and will support only transitional virtio (disable-legacy=off,disable-modern=off). I can't say I can measure the impact; however, the whole point of using q35 is to avoid legacy mode, and that would do the opposite.

it would be great to know/measure what's the impact of this (if any)

I suspect the impact would be related to some features that may or may not influence performance. For example, leaving disable-legacy=off (transitional mode) allows the guest to use IO instead of MMIO to communicate with the PCI devices. For Kata we control the guest, so I suppose we can check and "fix" it if necessary. In some cases IO access can actually be faster; the point remains that we need to be aware of it.

I totally agree we need to understand all the implications.

I like the configuration approach, but I do think we should have 8 (fewer? see below) pcie-root-ports available by default and let the user add more; however, I much prefer the dynamic approach based on annotations, if feasible.

we'll be wasting resources (CPU/Memory) if these root ports are only used for VFIO devices, since not all workloads need VFIO

CPU will be used only at system boot to configure the devices; after that we are OK. We do increase the memory footprint, as both QEMU and the guest will allocate resources for them. It would be really interesting to see how much.
We also waste the IO/MMIO ranges allocated to unused pcie-root-ports. IO resources tend to be scarce, but "modern" virtio devices and PCIe devices don't need IO in order to work.

However my intention was to use the pcie-root-ports for all devices, not only VFIO.

If we don't care about the hot-unplug granularity (the PCI functions hot-plugged together must be hot-unplugged together),

we do care: devices must be hot-unplugged when the container is restarted or destroyed, so maybe we should have 1 root port per container?

That could work! Sorry for the possibly dumb question, but do we know the pod configuration (containers count, sriov devices for each container, other pertinent information that can help us to compute the number of pcie-root-ports needed) by the time we start the kata vm? If so, we only need to know if we need only 1 pcie-root-port (8 PCI functions) or more for each container.

I'm not against using root ports for hot plugging all the devices, but if we are going to follow that path, we should at least know the impact of that decision and evaluate it (are we willing to pay for it?)

Agreed.

@devimc

devimc commented Apr 22, 2021

@marcel-apf

That could work! Sorry for the possibly dumb question, but do we know the pod configuration (containers count, sriov devices for each container, other pertinent information that can help us to compute the number of pcie-root-ports needed) by the time we start the kata vm? If so, we only need to know if we need only 1 pcie-root-port (8 PCI functions) or more for each container.

we don't; containers are created one at a time and the runtime (or kata-shim) only knows the information of the container that is being created. Sadly (I think) the user or cluster operator should provide that magic number (root ports) when the sandbox (POD) is created

@marcel-apf
Contributor

@devimc

@marcel-apf

That could work! Sorry for the possibly dumb question, but do we know the pod configuration (containers count, sriov devices for each container, other pertinent information that can help us to compute the number of pcie-root-ports needed) by the time we start the kata vm? If so, we only need to know if we need only 1 pcie-root-port (8 PCI functions) or more for each container.

we don't; containers are created one at a time and the runtime (or kata-shim) only knows the information of the container that is being created,

too bad

sadly (I think) the user or cluster operator should provide that magic number (root ports) when the sandbox (POD) is created

I think the pc machine "solves" the issue by having about 20 empty slots and plugging each PCI function into a different slot, seeing it as a "whole" PCI device. It will suffer the exact same problem if a pod has too many containers/devices.

For q35 we can assume we will not need more than 1 pcie-root-port per container (and make it configurable), but how many containers should we assume (and make it also configurable)? ...

I am back to my previous "configuration": one slot, 8 pcie-root-ports (assume max 8 containers), 8 PCI functions per container.
The cluster admin should be able to configure both values. I am not convinced yet a VFIO device will require a "whole" pcie-root-port.

All of the above pending the "cost" research.
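A minimal Go sketch of the per-container policy being discussed: a fixed pool of root ports (sized by the admin), one root port claimed per container, up to a configurable number of functions per port, and the whole port released when the container goes away. `RootPortPool` and its methods are hypothetical; the sketch only does the bookkeeping, and actually hotplugging several functions onto one port would still require batching the device additions.

```go
package main

import "fmt"

// portState tracks which container owns a root port and the next free
// function number on the device behind it.
type portState struct {
	owner  string
	nextFn int
}

// RootPortPool hands out one pcie-root-port per container (hypothetical).
type RootPortPool struct {
	maxFuncs int
	order    []string              // root port ids, in allocation order
	ports    map[string]*portState // root port id -> state
	owners   map[string]string     // container id -> root port id
}

func NewRootPortPool(portIDs []string, maxFuncs int) *RootPortPool {
	p := &RootPortPool{
		maxFuncs: maxFuncs,
		order:    append([]string(nil), portIDs...),
		ports:    make(map[string]*portState),
		owners:   make(map[string]string),
	}
	for _, id := range portIDs {
		p.ports[id] = &portState{}
	}
	return p
}

// AddFunction reserves the next function number on the container's root
// port, claiming the first free root port if the container has none yet.
func (p *RootPortPool) AddFunction(containerID string) (string, int, error) {
	if containerID == "" {
		return "", 0, fmt.Errorf("empty container id")
	}
	id, ok := p.owners[containerID]
	if !ok {
		for _, cand := range p.order {
			if p.ports[cand].owner == "" {
				id = cand
				break
			}
		}
		if id == "" {
			return "", 0, fmt.Errorf("no free root port for container %q", containerID)
		}
		p.ports[id].owner = containerID
		p.owners[containerID] = id
	}
	st := p.ports[id]
	if st.nextFn >= p.maxFuncs {
		return "", 0, fmt.Errorf("root port %s is full (%d functions)", id, p.maxFuncs)
	}
	fn := st.nextFn
	st.nextFn++
	return id, fn, nil
}

// Release frees the container's root port; all functions on it are
// unplugged together, matching the per-slot hotplug granularity.
func (p *RootPortPool) Release(containerID string) (string, error) {
	id, ok := p.owners[containerID]
	if !ok {
		return "", fmt.Errorf("container %q holds no root port", containerID)
	}
	delete(p.owners, containerID)
	p.ports[id] = &portState{}
	return id, nil
}

func main() {
	pool := NewRootPortPool([]string{"rp0", "rp1", "rp2", "rp3", "rp4", "rp5", "rp6", "rp7"}, 8)
	for _, ctr := range []string{"ctr-a", "ctr-a", "ctr-b"} {
		port, fn, err := pool.AddFunction(ctr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s function %d\n", ctr, port, fn)
	}
}
```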

@devimc

devimc commented Apr 22, 2021

@marcel-apf

how many containers should we assume (and make it also configurable)?

this is impossible

I am back to my previous "configuration": one slot, 8 pcie-root-ports (assume max 8 containers), 8 PCI functions per container.
The cluster admin should be able to configure both values.

sounds good to me, but this new "limitation" should be discussed (maybe in the next arch meeting? cc @kata-containers/architecture-committee) and, if accepted, very well documented.

I am not convinced yet a VFIO device will require a "whole" pcie-root-port.

what about PCI devices with large BARs (e.g. GPUs with BARs > 4GB)? Such devices may need the whole root port

@marcel-apf
Contributor

@devimc

@marcel-apf

I am back to my previous "configuration": one slot, 8 pcie-root-ports (assume max 8 containers), 8 PCI functions per container.
The cluster admin should be able to configure both values.

sounds good to me, but this new "limitation" should be discussed (maybe in the next arch meeting? cc @kata-containers/architecture-committee) and, if accepted, very well documented.

Sure.

I am not convinced yet a VFIO device will require a "whole" pcie-root-port.

what about PCI devices with large BARs (e.g. GPUs with BARs > 4GB)? Such devices may need the whole root port

In this case the guest OS will re-trigger resource allocation regardless of whether one or multiple PCI functions are hot-plugged into the pcie-root-port. The reason is that the firmware will not reserve enough MMIO (for devices with huge BARs) for an empty pcie-root-port. Once resource re-allocation is triggered, all the PCI functions under the pcie-root-port will be re-evaluated and MMIO space will be allocated.

BTW, not directly related: we do have a way to speed things up and give a hint to the firmware regarding how much 64-bit MMIO it should allocate to a pcie-root-port, so the guest OS will not need to re-allocate:
-device pcie-root-port,pref64-reserve=8G
The 64-bit address space of a VM is not that expensive, so we can preallocate "enough" MMIO to avoid triggering the guest OS MMIO re-allocation, speeding things up.
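Generating that hint programmatically only needs a byte count formatted as a QEMU size string. A minimal Go sketch; the helper names, the root port id and the chassis value are hypothetical, and 8G is just the example figure from above, not a recommendation.

```go
package main

import "fmt"

// qemuSize renders a byte count with the largest binary suffix (K, M, G, T)
// that represents it exactly, the form QEMU size properties accept.
func qemuSize(b uint64) string {
	units := []struct {
		suffix string
		shift  uint
	}{{"T", 40}, {"G", 30}, {"M", 20}, {"K", 10}}
	for _, u := range units {
		if b != 0 && b%(uint64(1)<<u.shift) == 0 {
			return fmt.Sprintf("%d%s", b>>u.shift, u.suffix)
		}
	}
	return fmt.Sprintf("%d", b)
}

// rootPortArg formats a pcie-root-port -device value asking the firmware to
// reserve pref64 bytes of 64-bit prefetchable MMIO window for the port.
func rootPortArg(id string, chassis int, pref64 uint64) string {
	return fmt.Sprintf("pcie-root-port,id=%s,bus=pcie.0,chassis=%d,pref64-reserve=%s",
		id, chassis, qemuSize(pref64))
}

func main() {
	fmt.Println("-device", rootPortArg("rp0", 1, 8<<30))
}
```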

@dgibson
Contributor Author

dgibson commented Apr 22, 2021

@dgibson, @marcel-apf,
I don't think we should spend much time figuring out what to do with pc machine type, as I think we should rather stop using it in favour of q35.

I was pretty much hoping that would be the opinion.

@dgibson
Contributor Author

dgibson commented Apr 22, 2021

@dgibson when I implemented this (hotplug devices, etc.) in Kata a couple of years ago, I wrote a PoC for using root ports.
In addition to the limitations already mentioned by @marcel-apf, memory footprint was also impacted negatively, but the main reason for not using them (root ports) was the limitation in the number of devices supported: we don't know how many devices will be attached to the POD. So I suggest (if this is possible) continuing to use PCI(e) bridges for non-VFIO devices (e.g. block devices, virtiofs, etc.) and root ports for VFIO devices (GPUs, NICs, etc.). Since we don't know how many VFIO devices will be hot plugged into the POD, the default number of root ports should be 0 and the user should be able to change it through the configuration file and/or annotations. What do you think?

I'm afraid that won't do what we need. The non-VFIO devices will still be using SHPC, which means using the rescan hack to cut down the startup time. The rescan is global (to a whole PCI domain), which means it can mess up VFIO devices even if they're on root ports.

@dgibson
Contributor Author

dgibson commented Apr 23, 2021

@marcel-apf using multiple root ports in a single slot via multifunction sounds good to me (though we will need Kata changes to accommodate it). I'm much more hesitant about placing different devices we want to attach as different functions of the same virtual PCI slot. For one thing, I thought qemu's hotplug granularity was effectively the whole slot, not just "plugged as a group must be unplugged as a group".

I'm even more hesitant about rearranging functions for VFIO devices. Here we're passing through a real PCI device that would be typically expected to occupy a slot, so I'd be concerned that userspace drivers that aren't expecting it to be a function in a multifunction slot will get confused.

@marcel-apf
Contributor

marcel-apf commented Apr 23, 2021

@dgibson

@marcel-apf using multiple root ports in a single slot via multifunction sounds good to me (though we will need Kata changes to accommodate it). I'm much more hesitant about placing different devices we want to attach as different functions of the same virtual PCI slot. For one thing, I thought QEMU's hotplug granularity was effectively the whole slot, not just "plugged as a group must be unplugged as a group".

Indeed, there is room for only one PCI device per slot, but a PCI device may have several PCI functions and QEMU fully supports hot-plug of multi-function PCI devices. The hotplug operations are per slot/device and this is the reason we need to hotplug/hot-unplug all the PCI functions as a single operation.

I see no reason why we shouldn't hotplug at least the virtio-pci-{type} as functions of the same device. Doing so will actually save PCI bus numbers, since each PCIe device uses a "whole" PCIe bus and we have only 256 PCIe buses in the system.

I'm even more hesitant about rearranging functions for VFIO devices. Here we're passing through a real PCI device that would be typically expected to occupy a slot, so I'd be concerned that userspace drivers that aren't expecting it to be a function in a multifunction slot will get confused.

I am always concerned when it comes to hot-plugging VFIO devices.
Regarding your specific concern, we are already re-arranging stuff: we take an SR-IOV Virtual Function, which is a PCI function belonging to a multi-function PCI device, and we pass it to the VM as a non-multi-function PCIe device.

However, if you think it is too risky, we can:

  • have the VFIO devices use a whole pcie-root-port (VFs belonging to the same host PCI device can be grouped together)
  • make it configurable and allow the admin to tweak kata configuration if there are issues.

@dgibson
Contributor Author

dgibson commented Apr 23, 2021

@dgibson

@marcel-apf using multiple root ports in a single slot via multifunction sounds good to me (though we will need Kata changes to accommodate it). I'm much more hesitant about placing different devices we want to attach as different functions of the same virtual PCI slot. For one thing, I thought QEMU's hotplug granularity was effectively the whole slot, not just "plugged as a group must be unplugged as a group".

Indeed, there is room for only one PCI device per slot, but a PCI device may have several PCI functions and QEMU fully supports hot-plug of multi-function PCI devices. The hotplug operations are per slot/device and this is the reason we need to hotplug/hot-unplug all the PCI functions as a single operation.

Ah, I see what you mean. That's probably a good end point to aim for, but I don't think it will be great for a first cut. At the moment Kata hotplugs the devices as they're discovered; to implement what you're suggesting we'd have to batch them up and collate them, which adds yet another significant reworking of the code to the several we already have here as prerequisites.

I see no reason why we shouldn't hotplug at least the virtio-pci-{type} as functions of the same device. Doing so will actually save PCI bus numbers, since each PCIe device uses a "whole" PCIe bus and we have only 256 PCIe buses in the system.

I'm even more hesitant about rearranging functions for VFIO devices. Here we're passing through a real PCI device that would be typically expected to occupy a slot, so I'd be concerned that userspace drivers that aren't expecting it to be a function in a multifunction slot will get confused.

I am always concerned when it comes to hot-plugging VFIO devices.
Regarding your specific concern, we are already re-arranging stuff: we take an SR-IOV Virtual Function, which is a PCI function belonging to a multi-function PCI device, and we pass it to the VM as a non-multi-function PCIe device.

Yeah, it's almost certainly going to be fine for SR-IOV virtual functions. But Kata doesn't really enforce or care whether we're passing a VF or a PF, and it's the case of PFs that I'm more worried about.

However, if you think it is too risky, we can:

* have the VFIO devices use a whole pcie-root-port (VFs belonging to the same host PCI device can be grouped together)

* make it configurable and allow the admin to tweak kata configuration if there are issues.

@dgibson dgibson closed this as completed Apr 23, 2021
Issue backlog automation moved this from To do to Done Apr 23, 2021
@dgibson dgibson reopened this Apr 23, 2021
Issue backlog automation moved this from Done to To do Apr 23, 2021
@fidencio fidencio added area/QEMU Issues specific to the qemu hypervisor area/runtime Issues that impact the runtime (including shimv2) enhancement Improvement to an existing feature and removed needs-review Needs to be assessed by the team. labels Apr 26, 2021
@marcel-apf
Contributor

@dgibson @fidencio @devimc
I am sorry but I cannot make it to today's architecture meeting. Anyway, everything that is being discussed here is pending research before considering it.

@ariel-adam ariel-adam moved this from To do to area agent/runtime in Issue backlog Apr 27, 2021
cmaf added a commit to cmaf/kata-containers that referenced this issue Sep 8, 2021
Update OpenTelemetry from v0.15.0 to v0.20.0.

    bb4c297e Pre release v0.18.0 (kata-containers#1635)
    712c3dcc Fix makefile ci target and coverage test packages (kata-containers#1634)
    841d2a58 Rename local var new to not collide with builtin (kata-containers#1610)
    13938ab5 Update SpanProcessor docs (kata-containers#1611)
    e25503a0 Add compatibility tests to CI (kata-containers#1567)
    1519d959 Use reasonable interval in sdktrace.WithBatchTimeout (kata-containers#1621)
    7d4496e0 Pass metric labels when transforming to gaugeArray (kata-containers#1570)
    6d4a5e0d Bump google.golang.org/grpc from 1.35.0 to 1.36.0 in /exporters/otlp (kata-containers#1619)
    a93393a0 Bump google.golang.org/grpc in /example/prom-collector (kata-containers#1620)
    e499ca86 Fix validation for tracestate with vendor and add tests (kata-containers#1581)
    43886e52 Make timestamps sequential in lastvalue agg check (kata-containers#1579)
    37688ef6 revent end-users from implementing some interfaces (kata-containers#1575)
    85e696d2 Updating documentation with an working example for creating NewExporter (kata-containers#1513)
    562eb28b Unify the Added sections of the unreleased changes (kata-containers#1580)
    c4cf1aff Fix Windows build of Jaeger tests (kata-containers#1577)
    4a163bea Fix stdout TestStdoutTimestamp failure with sleep (kata-containers#1572)
    bd4701eb Stagger timestamps in exact aggregator tests (kata-containers#1569)
    b94cd4b2 add code attributes to semconv package (kata-containers#1558)
    78c06cef Update docs from gitter to slack for communication (kata-containers#1554)
    1307c911 Remove vendor exclude from license-check (kata-containers#1552)
    5d2636e5 Bump github.com/golangci/golangci-lint in /internal/tools (kata-containers#1565)
    d7aff473 Vendor Thrift dependency (kata-containers#1551)
    298c5a14 Update span limits to conform with OpenTelemetry specification (kata-containers#1535)
    ecf65d79 Rename otel/label -> otel/attribute (kata-containers#1541)
    1b5b6621 Remove resampling on span.SetName (kata-containers#1545)
    8da52996 fix: grpc reconnection  (kata-containers#1521)
    3bce9c97 Add Keys() method to propagation.TextMapCarrier (kata-containers#1544)
    0b1a1c72 Make oteltest.SpanRecorder into a concrete type (kata-containers#1542)
    7d0e3e52 SDK span no modification after ended (kata-containers#1543)
    7de3b58c Remove extra labels types (kata-containers#1314)
    73194e44 Bump google.golang.org/api from 0.39.0 to 0.40.0 in /exporters/trace/jaeger (kata-containers#1536)
    8fae0a64 Create resource.Default() with required attributes/default values (kata-containers#1507)
    76f93422 Release v0.17.0 (kata-containers#1534)
    9b242bc4 Organize API into Go modules based on stability and dependencies (kata-containers#1528)
    e50a1c8c Bump actions/cache from v2 to v2.1.4 (kata-containers#1518)
    a6aa7f00 Bump google.golang.org/api from 0.38.0 to 0.39.0 in /exporters/trace/jaeger (kata-containers#1517)
    38efc875 Code Improvement - Error strings should not be capitalized (kata-containers#1488)
    6b340501 Update default branch name (kata-containers#1505)
    b39fd052 nit: Fix comment to be up-to-date (kata-containers#1510)
    186c2953 Fix golint error of package comment form (kata-containers#1487)
    9308d662 Bump google.golang.org/api from 0.37.0 to 0.38.0 in /exporters/trace/jaeger (kata-containers#1506)
    1952d7b6 Reverse order of attribute precedence when merging two Resources (kata-containers#1501)
    ad7b4715 Remove build flags for runtime/trace support (kata-containers#1498)
    4bf4b690 Remove inaccurate and unnecessary import comment (kata-containers#1481)
    7e19eb6a Bump google.golang.org/api from 0.36.0 to 0.37.0 in /exporters/trace/jaeger (kata-containers#1504)
    c6a4406a Bump github.com/golangci/golangci-lint in /internal/tools (kata-containers#1503)
    9524ac09 Update workflows to include main branch as trigger (kata-containers#1497)
    c066f15e Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2 in /internal/tools (kata-containers#1478)
    894e0240 Bump github.com/golangci/golangci-lint in /internal/tools (kata-containers#1477)
    71ffba39 Bump google.golang.org/grpc from 1.34.0 to 1.35.0 in /exporters/otlp (kata-containers#1471)
    515809a8 Bump github.com/itchyny/gojq from 0.12.0 to 0.12.1 in /internal/tools (kata-containers#1472)
    3e96ad1e gitignore: remove unused example path (kata-containers#1474)
    c5622777 Histogram aggregator functional options (kata-containers#1434)
    0df8cd62 Rename Makefile.proto to avoid interpretation as proto file (kata-containers#1468)
    979ff51f Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 (kata-containers#1453)
    1df8b3b8 Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2 in /exporters/otlp (kata-containers#1456)
    4c30a90a Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /sdk (kata-containers#1455)
    5a9f8f6e Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/stdout (kata-containers#1454)
    7786f34c Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/trace/zipkin (kata-containers#1457)
    4352a7a6 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/otlp (kata-containers#1460)
    6990b3b3 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/metric/prometheus (kata-containers#1461)
    7af40d22 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/trace/jaeger (kata-containers#1463)
    f16f1892 Bump google.golang.org/grpc in /example/otel-collector (kata-containers#1465)
    fe363be3 Move Span Event to API (kata-containers#1452)
    43922240 Bump google.golang.org/grpc in /example/prom-collector (kata-containers#1466)
    0aadfb27 Prepare release v0.16.0 (kata-containers#1464)
    207587b6 Metric histogram aggregator: Swap in SynchronizedMove to avoid allocations (kata-containers#1435)
    c29c6fd1 Shutdown underlying span exporter while shutting down BatchSpanProcessor (kata-containers#1443)
    dfece3d2 Combine the Push and Pull metric controllers (kata-containers#1378)
    74deeddd Handle tracestate in TraceContext propagator  (kata-containers#1447)
    49f699d6 Remove Quantile aggregation, DDSketch aggregator; add Exact timestamps (kata-containers#1412)
    9c949411 Rename internal/testing to internal/internaltest (kata-containers#1449)
    8d809814 Move gRPC driver to a subpackage and add an HTTP driver (kata-containers#1420)
    9332af1b Bump github.com/golangci/golangci-lint in /internal/tools (kata-containers#1445)
    5ed96e92 Update exporters/otlp Readme.md (kata-containers#1441)
    bc9cb5e3 Switch CircleCI badge to GitHub Actions (kata-containers#1440)
    716ad082 Remove CircleCI config (kata-containers#1439)
    0682db1e Adding Security Workflows to GitHub Actions (2/2): gosec workflow (kata-containers#1429)
    11f732b8 Adding Security Workflows to GitHub Actions (1/2): codeql workflow (kata-containers#1428)
    40f1c003 Add Tracestate into the SamplingResult struct (kata-containers#1432)
    db06c8d1 Flush metric events before shutdown in collector example (kata-containers#1438)
    f6f458e1 Fix golint issue caused by typo in trace.go (kata-containers#1436)
    fe9d1f7e Use uint64 Count consistently in metric aggregation (kata-containers#1430)
    3a337d0b Bump github.com/golangci/golangci-lint in /internal/tools (kata-containers#1433)
    1e4c8321 cleanup: drop the removed examples in gitignore (kata-containers#1427)
    5c9221cf Unify endpoint API that related to OTel exporter (kata-containers#1401)
    045c3ffe Build scripts: Replace mapfile with read loop for old bash versions (kata-containers#1425)
    2def8c3d Add Versioning Documentation (kata-containers#1388)
    6bcd1085 Bump github.com/itchyny/gojq from 0.11.2 to 0.12.0 in /internal/tools (kata-containers#1424)
    38e76efe Add a split protocol driver for otlp exporter (kata-containers#1418)
    439cd313 Add TraceState to SpanContext in API (kata-containers#1340)
    35215264 Split connection management away from exporter (kata-containers#1369)
    add9d933 Bump github.com/prometheus/client_golang from 1.8.0 to 1.9.0 in /exporters/metric/prometheus (kata-containers#1414)
    93d426a1 Add @dashpole as a project Approver (kata-containers#1410)
    6fe20ef3 Fix small typo (kata-containers#1409)
    b22d0d70 Mention the getting started guide (kata-containers#1406)
    3fb80fb2 Fix duplicate checkout action in GitHub workflow (kata-containers#1407)
    2051927b Correct CI workflow syntax (kata-containers#1403)
    f11a86f7 Fix typo in comment (kata-containers#1402)
    bdf87a78 Migrate CircleCI ci.yml workflow to GitHub Actions (kata-containers#1382)
    4e59dd1f Bump google.golang.org/grpc from 1.32.0 to 1.34.0 in /example/otel-collector (kata-containers#1400)
    83513f70 Bump google.golang.org/api from 0.32.0 to 0.36.0 in /exporters/trace/jaeger (kata-containers#1398)
    a354fc41 Bump github.com/prometheus/client_golang from 1.7.1 to 1.8.0 in /exporters/metric/prometheus (kata-containers#1397)
    3528e42c Bump google.golang.org/grpc from 1.32.0 to 1.34.0 in /exporters/otlp (kata-containers#1396)
    af114baf Call otel.Handle with non-nil errors (kata-containers#1384)
    c3c4273e Add RO/RW span interfaces (kata-containers#1360)

Fixes kata-containers#2591

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Projects
Issue backlog
  
area agent/runtime

4 participants