iPXE nic_drv - memory allocation failed in alloc_memblock #593

Closed
dwaddington opened this issue Dec 21, 2012 · 35 comments

@dwaddington

I am trying to use the iPXE NIC driver on a Dell Optiplex 990 workstation with an Intel 82579LM Gigabit NIC. I am running the l4linux test on 32-bit Fiasco.OC.

It seems to start but then goes into an endless cycle of reporting "memory allocation failed in alloc_memblock" (see below).

Any thoughts on how to go about debugging this?

[init -> nic_drv] --- init iPXE NIC
[init -> nic_drv] scan_pci(): Found: 00:19.0 8086:1502 (rev 04) IRQ 05
[init -> nic_drv] probe_pci_device(): using driver 82579lm
[init -> nic_drv] adjust_pci_device(): PCI device 00:19.0 latency timer is unreasonabl.
[init -> nic_drv] ioremap(): bus_addr = e1a00000 len = 20000
[init -> nic_drv] snprintf not implemented
[init -> nic_drv] number of devices: 1
[init -> nic_drv] --- init rx_callbacks
[init -> nic_drv] --- get MAC address
[init -> nic_drv] 18:03:73:28:fffffffa:62
Quota exceeded! amount=4096, size=4096, consumed=4096
[init -> l4linux] upgrade quota donation for SIGNAL session
[init -> nic_drv] memory allocation failed in alloc_memblock
[init -> nic_drv] memory allocation failed in alloc_memblock
[init -> nic_drv] memory allocation failed in alloc_memblock
[init -> nic_drv] memory allocation failed in alloc_memblock
[init -> nic_drv] memory allocation failed in alloc_memblock
[init -> nic_drv] memory allocation failed in alloc_memblock
[init -> nic_drv] memory allocation failed in alloc_memblock
[init -> nic_drv] memory allocation failed in alloc_memblock

@nfeske
Member

nfeske commented Dec 22, 2012

From your log output, I see that dde_ipxe selects the 82579lm driver for your platform. On qemu, where we regularly test nic_drv, the message states "using driver 82540em" instead. Maybe the 82579lm driver requires more memory? Have you tried extending the RAM quota for nic_drv (in the run script)?

Generally, for solving such problems, I would grep for the error message ("memory allocation failed in alloc_memblock") in the dde_ipxe repository and print the argument values the function was called with. Maybe the specified size or alignment values are bogus? Or, more likely, the driver ran out of memory? In either case, I would then track back the error from there.

@iloskutov
Contributor

I get the same message when I try to flood l4linux. After this message, l4linux in qemu hangs.
On real hardware I get a loop of the same messages, and the loop does not stop after the flood has finished.

@dwaddington
Author

Increasing the quota for nic_drv does not help.

@nfeske
Member

nfeske commented Jan 7, 2013

Maybe @chelmuth has a good idea of how to go about debugging this issue? (he is the original author of dde_ipxe)

@chelmuth
Member

chelmuth commented Jan 7, 2013

Please give the following patch a try - maybe the allocator is not able to fulfill alignment requirements?

diff --git a/dde_ipxe/src/lib/dde_ipxe/dde_support.cc b/dde_ipxe/src/lib/dde_ipxe/dde_support.cc
index 33ccecc..a315e0f 100644
--- a/dde_ipxe/src/lib/dde_ipxe/dde_support.cc
+++ b/dde_ipxe/src/lib/dde_ipxe/dde_support.cc
@@ -60,7 +60,8 @@ extern "C" void *alloc_memblock(size_t size, size_t align)
 {
        void *ptr;
        if (allocator()->alloc_aligned(size, &ptr, log2(align)).is_error()) {
-               PERR("memory allocation failed in alloc_memblock");
+               PERR("memory allocation failed in alloc_memblock (size=%zd, align=%zx)",
+                    size, align);
                return 0;
        };
        return ptr;

chelmuth added a commit that referenced this issue Jan 7, 2013
As iPXE header files are not C++ compatible, the implementation missed
proper include directives. For example, alloc_memblock() had a wrong
signature, which was not detected. Now, C wrapper functions are
implemented using a local API to the C++ backend.

Related to #593.
@iloskutov
Contributor

@chelmuth The last patch doesn't solve the issue. If I flood the network, I get an endless loop of messages again:

[init -> nic_drv.1] memory allocation failed in alloc_memblock (size=2048, align=800, offset=0)
[init -> nic_drv.1] memory allocation failed in alloc_memblock (size=2048, align=800, offset=0)
[init -> nic_drv.1] memory allocation failed in alloc_memblock (size=2048, align=800, offset=0)
...

@dwaddington
Author

The driver works OK with the lwIP stack, independent of L4Linux. Tested on the same physical machine.

@dwaddington
Author

I now think this has something to do with LWIP calling LWIP_PLATFORM_DIAG internally. I don't think this gets connected to printf or equivalent.

@nfeske
Member

nfeske commented Jan 12, 2013

@iloskutov Thanks for posting the output. The align values are clearly a problem. In alloc_iob_raw, the alignment is specified as a normal value whereas the dde_alloc_memblock expects a log2 value. In the output above, the driver wants to allocate a 2K-aligned block. But the allocator invoked by dde_alloc_memblock gets an alignment of 2^2K as argument. So it rightfully refuses to allocate. The align-argument should be converted to a log2 value, e.g. by using the log2 function in base/include/util/misc_math.h. Would you like to give this a try?

@chelmuth
Member

Am I missing something or isn't this done in the following line?

./src/lib/dde_ipxe/dde_support.cc:63:   if (allocator()->alloc_aligned(size, &ptr, log2(align)).is_error()) {
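
For illustration, the conversion discussed here can be checked by hand with the values from the log. The following is a minimal stand-alone sketch, not the code from base/include/util/misc_math.h: the helper names `is_power_of_two` and `log2_floor` are hypothetical, and it assumes the `align=800` value in the log is hexadecimal (0x800 = 2048), which matches the reading of a 2K-aligned block.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-alone helpers for illustration; not the Genode implementation.

// True if 'value' is a non-zero power of two (a plausible alignment).
inline bool is_power_of_two(std::size_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}

// Integer base-2 logarithm, rounded down. Requires value > 0.
inline unsigned log2_floor(std::size_t value)
{
    assert(value > 0);
    unsigned result = 0;
    while (value >>= 1)
        ++result;
    return result;
}
```

With these helpers, an alignment of 0x800 maps to a log2 value of 11, which is what an allocator taking a log2 alignment would expect for a 2K-aligned block. Passing 0x800 unconverted would instead request an alignment of 2^2048.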

@nfeske
Member

nfeske commented Jan 13, 2013

Uh, you are right. Sorry for the wrong track.. :-/

Looking at the file again, I see that the backing store is always 1M. Would it make sense to make it depend on the available RAM quota instead? With the current implementation, my initial proposal to @dwaddington to increase the quota of nic_drv has no effect on the amount of usable backing store.

@chelmuth
Member

Grmpf, so I missed that drawback of the implementation. With my current knowledge about DDE iPXE, I would go for a backing store that grows dynamically until the quota limit is reached: it uses the DDE kit allocator besides dde_alloc_memblock, and the usage ratio seems to depend on the driver in use. What do you think, @nfeske?

@nfeske
Member

nfeske commented Jan 13, 2013

That sounds like a good way to go. :-)
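
The idea of a backing store that grows until the quota limit is reached can be sketched roughly as follows. This is a bookkeeping model only, under stated assumptions: the class name `Growing_backing_store`, the chunk size, and the quota value are hypothetical, and no real memory or Genode API is involved. It is not the implementation that ended up in dde_ipxe.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping model: tracks sizes only, allocates no memory.
class Growing_backing_store
{
    std::size_t const _chunk_size;  // amount added per growth step
    std::size_t const _quota;       // upper bound on the total size
    std::size_t       _size = 0;    // current backing-store size
    std::size_t       _used = 0;    // bytes handed out so far

public:
    Growing_backing_store(std::size_t chunk_size, std::size_t quota)
    : _chunk_size(chunk_size), _quota(quota)
    {
        assert(chunk_size > 0);
    }

    // Reserve 'bytes', growing chunk by chunk as needed.
    // Returns false once another chunk would exceed the quota.
    bool reserve(std::size_t bytes)
    {
        while (_size - _used < bytes) {
            if (_chunk_size > _quota - _size)
                return false;
            _size += _chunk_size;
        }
        _used += bytes;
        return true;
    }

    // Give back 'bytes' that were reserved earlier.
    void release(std::size_t bytes)
    {
        assert(bytes <= _used);
        _used -= bytes;
    }

    std::size_t size() const { return _size; }
    std::size_t used() const { return _used; }
};
```

In this model, raising the quota directly raises the amount of usable backing store, which is the property the fixed 1M store lacks. A failed `reserve` may still leave the store grown; the added chunks simply stay available for smaller requests.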

@chelmuth
Member

@iloskutov and @dwaddington could you please give the branch above a try?

@iloskutov
Contributor

@chelmuth I'll be back at work in a few days and will try it then.

@iloskutov
Contributor

@chelmuth I have tested on the Genode master branch with your patch. My run script is https://gist.github.com/4546722
The flood command looks like this:

hping3 -V -c 1000000 -d 120 -S -w 64 -p 5555 -s 445 --flood --rand-source 10.76.6.251

After the flood against l4linux has finished, memory continues to grow in your new allocator and l4linux hangs. I tested this on qemu and on real hardware. Log: https://gist.github.com/4546771

@chelmuth
Member

That's really strange. Your log shows that the allocator grows above 320K. Does it stop growing eventually?

@iloskutov
Contributor

No, it doesn't stop. I didn't wait for it to finish.

@chelmuth
Member

So, I have to give your run script and stress test a try, which will not happen before Monday unfortunately.

@iloskutov
Contributor

Thank you.

@chelmuth
Member

I ran several tests over the last 3 hours. Here are my results:

  • The scenario is really stable when using only ping -f, e.g., sudo ping -c 100000 -s 1400 -f 10.0.2.16
  • Using TCP and, therefore, adding latency in L4Linux results in the stability problems mentioned before.
    • sudo hping3 -q -c 100000 -d 120 -w 64 -S -p 5555 -s 445 -i u1000 10.0.2.16 works pretty well and uses size=20000 + chunk=8000, also on consecutive calls.
    • If the interval is reduced further, e.g. -i u500, the driver seems to enter a state in which iobufs are not freed correctly. Therefore, a log like the one recorded by @iloskutov is produced.

I also tried to incorporate upstream fixes from the iPXE repository, which did not help. Unfortunately, I currently have no idea how to tackle this. One possible next step could be to build an iPXE ROM for Qemu and enter the command line on startup to interrupt the normal boot. Then, the original implementation could be stress-tested like our port.

@chelmuth
Member

chelmuth commented Mar 4, 2013

@alex-ab could you please have a look at my rebased branch above? I tested it only on OKL4, where it fails.

@alex-ab
Member

alex-ab commented Mar 5, 2013

@chelmuth please see commit message above

@alex-ab
Member

alex-ab commented Apr 15, 2013

With the commit c34bbe2caa86edc5ca61f9f0d92eb35a47f96892 I can't trigger the

memory allocation failed in alloc_memblock (size=2048, align=800, offset=0)

messages anymore. Of course, the root cause is not solved. A bad nic_session client can still cause the driver to fail if it just doesn't consume any packets ...

Does the commit work for you @dwaddington, @iloskutov ?
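
The failure mode described here, a client that never consumes packets, can be illustrated with a small model of a fixed receive-buffer pool. This is a sketch under stated assumptions, not the dde_ipxe code and not what the referenced commit does: the class `Rx_buffer_pool` and its methods are hypothetical and only track which buffers are held.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size pool of receive buffers (bookkeeping only).
template <std::size_t N>
class Rx_buffer_pool
{
    static_assert(N > 0, "pool needs at least one buffer");

    bool        _in_use[N] = {};  // true while the client holds the buffer
    std::size_t _free      = N;   // number of buffers currently available

public:
    // Returns a buffer index, or -1 if every buffer is still held.
    int alloc()
    {
        for (std::size_t i = 0; i < N; ++i) {
            if (!_in_use[i]) {
                _in_use[i] = true;
                --_free;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Called once the client has consumed the packet in buffer 'index'.
    void release(int index)
    {
        assert(index >= 0 && static_cast<std::size_t>(index) < N);
        assert(_in_use[index]);
        _in_use[index] = false;
        ++_free;
    }

    std::size_t free_buffers() const { return _free; }
};
```

If the client never calls `release`, every `alloc` after the N-th returns -1, no matter how large N is. A bigger pool or backing store only delays that point, which is why the root cause remains open.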

@nfeske
Member

nfeske commented Apr 15, 2013

@alex-ab Thanks for investigating. I am wondering, what will a bad NIC client be able to do besides cutting off its own network connection? For NIC bridge this would be fatal but is this really an issue for the driver?

@alex-ab
Member

alex-ab commented Apr 15, 2013

What about a NIC driver acting in promiscuous mode and serving more than one nic_session client? I assumed the nic_session interface doesn't restrict the number of clients.

@nfeske
Member

nfeske commented Apr 16, 2013

Indeed, the number of clients should not be restricted by the interface (I haven't yet understood how it would restrict the number of clients to a single one). The NIC bridge is a particular example for a NIC service with multiple clients. With my statement I referred to the NIC driver, which supports only one client anyway. With "root cause", are you pointing at the NIC driver implementation or the NIC session interface?

@alex-ab
Member

alex-ab commented Apr 16, 2013

I'm referring to the NIC driver implementation (dde_ipxe), where the number of nic_session clients is not restricted.

@chelmuth
Member

dde_ipxe uses nic/component.h, which implements Nic::Root as Genode::Root_component<Session_component, Genode::Single_client>. So, dde_ipxe implicitly restricts the number of clients by using the NIC-driver framework and just implementing the missing pieces.
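
The way a policy template argument can restrict the number of clients may be sketched in plain C++ as follows. This is a simplified stand-alone model, not Genode's Root_component: the names `Single_client_policy`, `Multiple_clients_policy`, and `Root_sketch` are hypothetical, and a refused session is reported with a boolean here for brevity.

```cpp
#include <cassert>

// Hypothetical policy: admits one session at a time.
struct Single_client_policy
{
    bool used = false;

    bool acquire()
    {
        if (used)
            return false;
        used = true;
        return true;
    }

    void release() { used = false; }
};

// Hypothetical policy: admits any number of sessions.
struct Multiple_clients_policy
{
    bool acquire() { return true; }
    void release() { }
};

// Hypothetical root that consults its policy before creating a session.
template <typename POLICY>
class Root_sketch
{
    POLICY   _policy;
    unsigned _sessions = 0;

public:
    // Returns true if the policy admitted a new session.
    bool create_session()
    {
        if (!_policy.acquire())
            return false;
        ++_sessions;
        return true;
    }

    void close_session()
    {
        assert(_sessions > 0);
        --_sessions;
        _policy.release();
    }

    unsigned sessions() const { return _sessions; }
};
```

With the single-client policy, a second `create_session` call is refused until the first session is closed. The driver code itself never has to count clients; the restriction comes entirely from the template argument.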

@alex-ab
Member

alex-ab commented Apr 16, 2013

Ok, I see, thanks for clarification.

@alex-ab
Member

alex-ab commented Apr 26, 2013

@chelmuth: With the last three commits, the 'memory allocation failed in alloc_memblock' messages don't appear anymore. Do we want to mark this issue as fixed, or should I create a new issue, since you still have the rewrite of the dde_ipxe memory allocation in the pipeline?

@chelmuth
Member

IMO all relevant parts are in your version and, no, I've no work in progress regarding this issue.

@ghost ghost assigned alex-ab Apr 26, 2013
alex-ab added a commit that referenced this issue Apr 26, 2013
alex-ab added a commit that referenced this issue Apr 26, 2013
@alex-ab alex-ab closed this as completed in f6d31d7 May 3, 2013
cproc pushed a commit to cproc/genode that referenced this issue May 12, 2014
As iPXE header files are not C++ compatible, the implementation missed
proper include directives. For example, alloc_memblock() had a wrong
signature, which was not detected. Now, C wrapper functions are
implemented using a local API to the C++ backend.

Related to genodelabs#593.
cproc pushed a commit to cproc/genode that referenced this issue May 12, 2014
cproc pushed a commit to cproc/genode that referenced this issue May 12, 2014
cproc pushed a commit to cproc/genode that referenced this issue May 12, 2014