mirage-www on ARM crash #274

Closed
avsm opened this issue Jul 21, 2014 · 10 comments

@avsm
Member

avsm commented Jul 21, 2014

Trying to get mirage-www working on ARM and I get:

Console is on port 2
Console ring is at mfn 10400015
Found GIC: gicd_base = ac401000, gicc_base = ac402000
Mirage: start_kernel
MM: Init
    _text: 00408000(VA)
    _etext: 0055b43c(VA)
    _erodata: 00596000(VA)
    _edata: 0096b000(VA)
    stack start: 00400000(VA)
    _end: 0097c688(VA)
Found memory at 80000000 (len 0x10000000)
Using pages 525693 to 589824 as free space for heap.
MM: Initialise page allocator for 97d000(8057d000)-103ff000(8ffff000)
MM: done
Initialising timer interface
Virtual Count register is 1278e6f, freq = 24000000 Hz
Initialising console ... done.
FDT suggests grant table base b0000000
gnttab_table mapped at 30400000.
xencaml: app_main_thread
getenv(OCAMLRUNPARAM) -> null
getenv(CAMLRUNPARAM) -> null
Unsupported function lseek called in Mini-OS kernel
Unsupported function lseek called in Mini-OS kernel
Unsupported function lseek called in Mini-OS kernel
getenv(OCAMLRUNPARAM) -> null
getenv(CAMLRUNPARAM) -> null
getenv(TMPDIR) -> null
getenv(TEMP) -> null
Netif: add resume hook
getenv(DEBUG) -> null
getenv(OMD_DEBUG) -> null
getenv(OMD_FIX) -> null
Netif.connect 0
Netfront.create: id=0 domid=0
MAC: 00:16:3e:01:48:7b
Manager: connect
Attempt to open(/dev/urandom)!
Manager: configuring
Manager: Interface to 10.0.0.2 nm 255.255.255.0 gw [10.0.0.1]

 sg:true gso_tcpv4:true rx_copy:true rx_flip:false smart_poll:false
ARP: sending gratuitous from 10.0.0.2
Manager: configuration done
Fault handler at 408184 called (prefetch_abort)
r0 = c55f0c
r1 = c740e4
r2 = 1
r3 = 8cee90
r4 = 1
r5 = c48664
r6 = 100f361c
r7 = 800
r8 = 403d58
r9 = 7fc00000
r10 = 100f3618
r11 = 10001000
r12 = c740e4
r13 = 3ffff8
r14 = 50f0d3
r15 = 50f0c6
CPSR = 200001f3

with the PC somewhere in Pervasives;

0050f0c0 <camlPervasives__$40_1135>:
  50f0c0:       b082            sub     sp, #8
  50f0c2:       f8cd e004       str.w   lr, [sp, #4]
  50f0c6:       2801            cmp     r0, #1
  50f0c8:       d016            beq.n   50f0f8 <.L251+0x26>
  50f0ca:       9000            str     r0, [sp, #0]
  50f0cc:       6840            ldr     r0, [r0, #4]
  50f0ce:       f7ff fff7       bl      50f0c0 <camlPervasives__$40_1135>

static_website works for me, so this is perhaps due to the size of the crunched mirage-www? I gave it 256MB which should be enough in the normal course of things.
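For reference, `camlPervasives__$40` appears to be the mangled name of the list append operator `Pervasives.(@)` (`$40` being the hex code for `@`), and the disassembly shows it calling itself with `bl` after reserving 8 bytes with `sub sp, #8`. The sketch below is a C analogue of that non-tail-recursive shape, not the OCaml runtime's code. The names `cell`, `make_list`, `append`, `list_length`, `append_stack_bytes` and `max_depth` are hypothetical, and the 8-bytes-per-frame figure is read off the disassembly above.

```c
#include <stddef.h>
#include <stdlib.h>

/* A singly linked list cell, standing in for an OCaml list block. */
struct cell {
    int head;
    struct cell *tail;
};

/* Deepest recursion level reached by append(); reset it before each run. */
size_t max_depth = 0;

/* Build the list [0; 1; ...; n-1]. Returns NULL for n == 0. */
struct cell *make_list(size_t n)
{
    struct cell *list = NULL;
    while (n-- > 0) {
        struct cell *c = malloc(sizeof *c);
        if (c == NULL)
            abort();
        c->head = (int)n;
        c->tail = list;
        list = c;
    }
    return list;
}

/* Count the cells in a list. */
size_t list_length(const struct cell *l)
{
    size_t n = 0;
    for (; l != NULL; l = l->tail)
        n++;
    return n;
}

/* Non-tail-recursive append: l1 @ l2 = hd :: (tl @ l2).
 * One call stays pending per element of l1 until the end of l1 is reached. */
struct cell *append(const struct cell *l1, struct cell *l2, size_t depth)
{
    if (depth > max_depth)
        max_depth = depth;
    if (l1 == NULL)
        return l2;
    struct cell *rest = append(l1->tail, l2, depth + 1);
    struct cell *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->head = l1->head;
    c->tail = rest;
    return c;
}

/* Stack bytes for a first list of n elements, assuming 8 bytes per frame
 * (the `sub sp, #8` above) and n + 1 nested calls. */
size_t append_stack_bytes(size_t n)
{
    return (n + 1) * 8;
}
```

Stack use grows linearly with the length of the left-hand list, so appending to a long enough list could exhaust a small stack regardless of how much heap the guest has.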

@talex5
Contributor

talex5 commented Jul 21, 2014

r13 (the stack pointer) is just before the start of RAM, so probably the stack is full.
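The address arithmetic behind this can be checked against the boot log. A minimal sketch, assuming a single linear VA-to-PA offset inferred from the page allocator line `97d000(8057d000)`; the function and macro names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the boot log in the first comment. */
#define RAM_BASE_PA   0x80000000u  /* "Found memory at 80000000" */
#define RAM_LEN       0x10000000u  /* "(len 0x10000000)" */
#define HEAP_START_VA 0x0097d000u  /* page allocator start, virtual */
#define HEAP_START_PA 0x8057d000u  /* page allocator start, physical */

/* Assumption: one linear VA->PA offset, inferred from the heap line. */
uint32_t va_to_pa(uint32_t va)
{
    return va + (HEAP_START_PA - HEAP_START_VA);
}

/* True when the physical address lies inside the RAM range reported at boot. */
bool pa_in_ram(uint32_t pa)
{
    return pa >= RAM_BASE_PA && pa - RAM_BASE_PA < RAM_LEN;
}
```

Under that assumption the logged stack start 00400000(VA) maps to 80000000, the first byte of RAM, while r13 = 3ffff8 maps to 7ffffff8, which is 8 bytes below it.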

@avsm
Member Author

avsm commented Jul 21, 2014

makes sense; where can i bump the stack size?

@talex5
Contributor

talex5 commented Jul 21, 2014

mini-os/arch/arm/include/arch_limits.h gives the thread stack size, but you'll also need to change the #if 1 in libxencaml/main.c so it uses a Mini-OS thread.

Alternatively, we could enlarge the boot stack by moving it to the .bss section in arm32.S. Let me see if I can reproduce the problem here.

@talex5
Contributor

talex5 commented Jul 21, 2014

Mine's still installing, but you might be able to just edit /home/mirage/.opam/system/lib/minios-xen/libminios.lds and relocate the stack by moving _boot_stack and _boot_stack_end inside .bss.

@talex5
Contributor

talex5 commented Jul 21, 2014

Actually, better to put it in .data. Otherwise C will overwrite part of its own stack! This lets it serve up some pages (getting lots of TCP retransmissions though, so something still isn't quite right):

.data : {                     /* Data */
      *(.data)

      _boot_stack      = .;
      . += 0x40000;
      _boot_stack_end  = .;
      }
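The fragment above reserves 0x40000 bytes (256 KiB) for the boot stack. One plausible reading of "C will overwrite part of its own stack" is that startup code zero-fills .bss while already running on the boot stack; that is an assumption here, not something the thread confirms. A small simulation of the effect, with hypothetical names and a toy two-section image:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy image layout: [ .data | .bss ], each SECTION_LEN bytes. */
enum { SECTION_LEN = 64 };

/* Put a saved value on a "stack" placed in either .data or .bss, run the
 * conventional zero-fill of .bss, and report whether the value survived. */
bool saved_value_survives(bool stack_in_bss)
{
    uint8_t image[2 * SECTION_LEN];
    uint8_t *data = image;
    uint8_t *bss = image + SECTION_LEN;

    memset(image, 0xAA, sizeof image);   /* arbitrary initial contents */

    uint8_t *stack_slot = (stack_in_bss ? bss : data) + 16;
    *stack_slot = 0x5A;                  /* value pushed before the clear */

    memset(bss, 0, SECTION_LEN);         /* startup zeroing of .bss */

    return *stack_slot == 0x5A;
}
```

With the stack in .bss the saved value is wiped by the clear; with the stack in .data it is left alone.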

@avsm
Member Author

avsm commented Jul 21, 2014

confirmed that it boots, and also got tcp retransmissions here. bad checksums? i don't seem to be seeing icmp loss from a ping flood:

$ sudo ping -f 192.168.2.8
PING 192.168.2.8 (192.168.2.8): 56 data bytes
.^C.
--- 192.168.2.8 ping statistics ---
8990 packets transmitted, 8989 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.468/0.580/2.373/0.074 ms


@talex5
Contributor

talex5 commented Jul 21, 2014

I don't know too much about TCP, but this is what came out of wireshark.

If I'm reading this right, Mirage sent two large data packets (Seq = 1527 and 2987) together, but Linux only ack'd the first one. 4 seconds later, Mirage retransmitted the second.

|Time           | 192.168.0.11                          |
|               |                     192.168.0.13      |                   
|0.000000000    |         SYN                           |Seq = 0
|               |(45961)  ------------------>  (80)     |
|0.000934000    |         SYN, ACK                      |Seq = 0 Ack = 1
|               |(45961)  <------------------  (80)     |
|0.001001000    |         ACK                           |Seq = 1 Ack = 1
|               |(45961)  ------------------>  (80)     |
|0.002235000    |         PSH, ACK - Len: 131           |Seq = 1 Ack = 1
|               |(45961)  ------------------>  (80)     |
|0.004012000    |         PSH, ACK - Len: 17            |Seq = 1 Ack = 132
|               |(45961)  <------------------  (80)     |
|0.004076000    |         ACK                           |Seq = 132 Ack = 18
|               |(45961)  ------------------>  (80)     |
|0.004500000    |         PSH, ACK - Len: 49            |Seq = 18 Ack = 132
|               |(45961)  <------------------  (80)     |
|0.004536000    |         ACK                           |Seq = 132 Ack = 67
|               |(45961)  ------------------>  (80)     |
|0.004735000    |         PSH, ACK - Len: 1460          |Seq = 67 Ack = 132
|               |(45961)  <------------------  (80)     |
|0.004767000    |         ACK                           |Seq = 132 Ack = 1527
|               |(45961)  ------------------>  (80)     |
|0.004987000    |         PSH, ACK - Len: 1460          |Seq = 1527 Ack = 132
|               |(45961)  <------------------  (80)     |
|0.005743000    |         PSH, ACK - Len: 458           |Seq = 2987 Ack = 132
|               |(45961)  <------------------  (80)     |
|0.005769000    |         ACK                           |Seq = 132 Ack = 1527
|               |(45961)  ------------------>  (80)     |
|4.001366000    |         PSH, ACK - Len: 1460          |Seq = 1527 Ack = 132
|               |(45961)  <------------------  (80)     |
|4.001439000    |         ACK                           |Seq = 132 Ack = 3445
|               |(45961)  ------------------>  (80)     |
|4.001743000    |         FIN, ACK                      |Seq = 132 Ack = 3445
|               |(45961)  ------------------>  (80)     |
|4.002666000    |         ACK                           |Seq = 3445 Ack = 133
|               |(45961)  <------------------  (80)     |

@mor1
Member

mor1 commented Jul 21, 2014

if you can open the trace in wireshark, it should tell you if checksums are broken.

tcp/ip checksums are 16 bit only though eg http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Checksum_computation
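For concreteness, here is a minimal C sketch of the 16-bit one's-complement Internet checksum described in RFC 1071. It is not the Mirage implementation; `inet_checksum` is a hypothetical name, and a real TCP checksum also covers a pseudo-header, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement Internet checksum over a byte buffer.
 * Bytes are combined as big-endian 16-bit words; an odd trailing byte is
 * padded with zero. The 32-bit accumulator is enough for packet-sized input. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The running sum is wider than 16 bits only to hold carries; after folding, the result always fits in 16 bits.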

looks to me like it's the first of the pair (seqno:1527--2987) that's retransmitted (the late ack covers up to seqno=3445). i can't immediately see why linux wouldn't ack the first copy of that, so perhaps there is something wrong with that segment...?
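The sequence arithmetic can be checked mechanically. A small sketch using the relative sequence numbers and lengths from the trace; `segment`, `seg_end` and `acked` are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* One TCP data segment: first sequence number and payload length. */
struct segment {
    uint32_t seq;
    uint32_t len;
};

/* Sequence number of the first byte after this segment. */
uint32_t seg_end(struct segment s)
{
    return s.seq + s.len;
}

/* True when a cumulative ACK covers the whole segment. The signed
 * difference keeps the comparison valid across sequence wraparound. */
bool acked(struct segment s, uint32_t ack)
{
    return (int32_t)(ack - seg_end(s)) >= 0;
}
```

With segments (1527, 1460) and (2987, 458), the first ends at 2987 and the second at 3445. Ack = 1527 covers neither of them, while the late Ack = 3445 covers both, so only the first needed to be resent.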

On 21 July 2014 16:30, Thomas Leonard notifications@github.com wrote:

Should the TCP checksum be 32-bit? If so, this isn't going to work:

val ones_complement_list: Cstruct.t list -> int


Richard Mortier
mort@cantab.net

@talex5
Contributor

talex5 commented Jul 22, 2014

Just to record the outcome here. Looks like a problem with the Linux net backend. Here's a work-around:

http://lists.xenproject.org/archives/html/mirageos-devel/2014-07/msg00145.html

@avsm
Member Author

avsm commented Nov 3, 2014

Fixed in mirage2

@avsm avsm closed this as completed Nov 3, 2014