Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[net] Ether/ne2k interrupt problem #133

Closed
Mellvik opened this issue Jun 6, 2017 · 64 comments
Closed

[net] Ether/ne2k interrupt problem #133

Mellvik opened this issue Jun 6, 2017 · 64 comments
Assignees
Labels
problem Problem while using the product solved Problem solved

Comments

@Mellvik
Copy link
Contributor

Mellvik commented Jun 6, 2017

My ne2k ISA card is set to IRQ11, and does not support IRQ9. Changed ne2k.h: #define NE2K_IRQ 11
Compile and test. No cigar. Run on QEMU w/IRQ set to 11. Still no cigar. Set QEMU IRQ to 9. Voila!

So - changing the IRQ in ne2k.h does not seem to have any effects. Is this a bug or did I miss something?

@mfld-fr
Copy link
Contributor

mfld-fr commented Jun 6, 2017

Not really a bug. After your change, the ETH / NE2K driver requests the IRQ11, but in irqtab.S, the interrupt for the board is still fixed to IRQ9 (IRQ11 is surrounded by #if 0). You are the first one to need to set the interrupt number else than to 9 😄.

Could you please change that piece of code and test again, just to validate the fix, before we submit a clean patch ?

! IRQ9 is used by the Ethernet device

#ifdef CONFIG_ETH
_irq9:
	push    ax
	mov     ax, #9
	br      _irqit
#endif

#if 0
_irq10:
	push	ax
	mov	ax,#10
	br	_irqit
_irq11:
	push	ax
	mov	ax,#11
	jmp	_irqit

@Mellvik
Copy link
Contributor Author

Mellvik commented Jun 6, 2017 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented Jun 6, 2017

The basic test for the ETH / NE2K driver is the eth command, that you could build in /elks/elkscmd/test/eth/. It tests the connection to the local network with ARP requests / replies. Don't forget to update the IP addresses in the code with the ones of your real local network, because the defaults are for QEMU as configured in qemu.sh.

@lithoxs
Copy link
Collaborator

lithoxs commented Jun 6, 2017

Actually, two changes are needed in irqtab.S. The one pointed to by mfld-fr and another where the pointer to the ISR is placed in the interrupt vector table.
The code for irqtab.S has changed recently. The relevant parts for you case should now be:

#ifdef CONFIG_ETH_NE2K
#define METH 0x800
#else
...

#if 0
_irq9: ! Ethernet device
call _irqit
.byte 9
_irq10: ! USB
call _irqit
.byte 10
#endif
#ifdef CONFIG_ETH_NE2K
_irq11: ! Sound
call _irqit
.byte 11
#endif
#if 0
_irq12: ! Mouse
call _irqit

@lithoxs
Copy link
Collaborator

lithoxs commented Jun 6, 2017

Regarding the FPU interrupt, ELKS does not know anything about the FPU.

@Mellvik
Copy link
Contributor Author

Mellvik commented Jun 7, 2017 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented Aug 26, 2017

Hello, any news on that issue ?

@Mellvik
Copy link
Contributor Author

Mellvik commented Aug 27, 2017 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented Sep 6, 2017

Mmm... what is your CPU model ? Because I just remember I used some 80186 instructions in the Ne2k driver, and this could explain the failure.

@Mellvik
Copy link
Contributor Author

Mellvik commented Sep 6, 2017 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented Sep 6, 2017

Forget my last idea if you have a 80286.
About "never stops sending", could you please detail ?
Did you catch an Ethernet dump before the power supply break ?

@Mellvik
Copy link
Contributor Author

Mellvik commented Sep 6, 2017 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented Apr 12, 2018

Hello, any progress in your testing ? Were you able to test the HW with another OS ?

@Mellvik
Copy link
Contributor Author

Mellvik commented Apr 12, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented Apr 12, 2018

OK, thanks for the update. So the problem is specific to ELKS.

If you could connect point-to-point your target with the NE2K card to a PC, run the eth test program, reboot the target, then run the telnet client program, and capture the packets with WireShark, it would help us to diagnose.

@mfld-fr mfld-fr added the problem Problem while using the product label Apr 17, 2018
@Mellvik
Copy link
Contributor Author

Mellvik commented May 5, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented May 5, 2018

Could you please take a photo of your ISA card, so that we could identify the main chip and check its datasheet against the current implementation ?

You could also disable all the network applets in /etc/rc.d/rc.sysinit (ktcp, telnetd, httpd, etc), only run the /bin/eth command and check the captured packets.

And last but not least, you could add some printk() in ne2k-main.c to try to find what is looping.

@Mellvik
Copy link
Contributor Author

Mellvik commented May 6, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented May 6, 2018

Sorry, but I see no picture in your latest reply.

About /bin/eth, it looks like you forgot to set your build environment as explained in the main README : in the top directory (here .../elks-master), run . tools/env.sh before trying to invoke any make command.

Do you still have the legacy dev86 package installed on your system ? Please uninstall it if stills there : now ELKS relies on the auto-build & latest dev86 of @jbruchon, not the legacy one that is rather outdated and not able to run "in-tree".

@mfld-fr
Copy link
Contributor

mfld-fr commented May 6, 2018

By the way, I just saw that there is a mistake in /elkscmd/Makefile(114) that prevents the test programs to be automatically built : please replace ifdef $(CONFIG_APP_TEST) by ifdef CONFIG_APP_TEST. Just fixed it for the next PR.

@Mellvik
Copy link
Contributor Author

Mellvik commented May 6, 2018 via email

@jbruchon
Copy link
Collaborator

jbruchon commented May 6, 2018

You cannot post attachments of any kind (other than plain text) to Github issues; they are discarded and you're posting by email. Post a link to the image on an image host such as Imgur instead.

@mfld-fr
Copy link
Contributor

mfld-fr commented May 6, 2018

Hey... I also just realized you are not capturing the packets on a point-to-point connection, but you are using a router. So if you are sniffing the packets behind that router, you won't be able to see the packets that are actually sent by ELKS, if they are malformed. Please make the capture directly on the NE2K side, not behind a router.

@Mellvik
Copy link
Contributor Author

Mellvik commented May 6, 2018 via email

@Mellvik
Copy link
Contributor Author

Mellvik commented May 6, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented May 6, 2018

About the switch / router : I am wondering why the traffic LED on your switch / router is blinking while you capture no packet except the ARP ones ? I cannot explain this else than thinking that your switch / router discards some packets. This is why I am suggesting to connect the NE2K socket directly to the host socket where you run the capture program.

@mfld-fr
Copy link
Contributor

mfld-fr commented May 6, 2018

Thanks for the photos. Unfortunately, I cannot find any datasheet for the W89C90, W89C901 or W89C902. I only found some references in the W89C94 one, but this latest is for PCI and has no detailed information on the NE2K implementation.

@Mellvik
Copy link
Contributor Author

Mellvik commented May 6, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented May 6, 2018

About /bin/eth, I checked on my side. After fixing the mistake as described above, and selecting the test applets in the configuration menu, that program is automatically built. Did you check you have no legacy dev86 in your path, just the one in /cross ?

@Mellvik
Copy link
Contributor Author

Mellvik commented May 14, 2018 via email

@Mellvik
Copy link
Contributor Author

Mellvik commented May 14, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented May 14, 2018

Let us forget the terminal stuff for now (that is reproducible in QEMU and so could be tested later without your real HW), and let us focus on the "no TX" problem with your ETH board, and only with inbound telnet to avoid any side effect of a key press in the ELKS console (select only the ktcp and telnetd applets in the configuration).

From what you describe since the beginning, I am starting to think that your board does not reset the TXP bit 2 in the command register (CR offset 00h) after transmitting the first and only packet (i.e. the ARP reply).

In current implementation, the driver puts the packet to send in the board TX buffer, then sets this bit to trigger the TX, and will block any more write / send until this bit is reset (see ne2k_tx_stat() call in ne2k_write()).

So I would add some traces in the ne2k_write() to confirm that this function completes on first call (= the observed ARP reply), and is blocked on second call (= the expected reply to TCP SYN packet from telnet), because ne2k_tx_stat() never returns NE2K_STAT_TX and ne2k_write() enters interruptible_sleep_on(), causing ktcp to hang.

@Mellvik
Copy link
Contributor Author

Mellvik commented May 14, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented May 14, 2018

OK, what is the meaning of R0 / R1, or W2 in that screenshot ?

@mfld-fr
Copy link
Contributor

mfld-fr commented May 14, 2018

Could you also activate the commented traces in the file ktcp/deveth.c, function deveth_send(), to see if at least ktcp tries to respond to the telnet SYNs ?

@Mellvik
Copy link
Contributor Author

Mellvik commented May 14, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented May 14, 2018

Sorry, but I still not understand your R0 / R1 / W2. The res variable could be a return value from a lower level function, or the size of the received / sent packet. What actual value are you displaying in your trace ?

@mfld-fr
Copy link
Contributor

mfld-fr commented May 14, 2018

And last but not least, what is the speed of this R0W2It endless loop ? Very fast or an iteration each second ? If the period is around one second, I think I understood the current behavior (there is a retry loop in ktcp_run() every second).

@Mellvik
Copy link
Contributor Author

Mellvik commented May 15, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented May 15, 2018

Thanks for these latest info. Forget my previous though about the TXP bit, it is working as expected. So ELKS actually responds to the TCP SYNs and the packets are really sent on the wire (I guess the router traffic LED is blinking at the same speed as that loop, right ?), as we get the TX interrupt after the write.

So why do you not capture any packet from ELKS ? I guess this is because these packets are too short, and are discarded by your host. Ethernet requires a minimal packet size of 64. QEMU is more tolerant and works with little packets.

Let me prepare a patch to confirm that analysis...

@Mellvik
Copy link
Contributor Author

Mellvik commented May 15, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented May 15, 2018

Yes, in addition to the invalid packet size, there is another problem in the select() that does not honour the timeout of 1 second before ktcp tries again to resend the SYN ACK. But let us fix that quick looping in a second step, it should not be a blocking point in the traffic.

@mfld-fr
Copy link
Contributor

mfld-fr commented May 15, 2018

Here is the patch to have an ETH packet minimum size of 64 bytes.

In ne2k_main.c, line 128:

		// Client should write packet once
		// otherwise end of packet will be lost

		if (len > MAX_PACKET_ETH) len = MAX_PACKET_ETH;
		memcpy_fromfs (send_buf, data, len);

		if (len < 64) len = 64;  /* issue #133 */
		res = ne2k_pack_put (send_buf, len);

Could you please apply that patch and test again the inbound telnet ?

@Mellvik
Copy link
Contributor Author

Mellvik commented May 15, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented May 15, 2018

Is this capture really made in P2P, not through the switch / router ?

@mfld-fr
Copy link
Contributor

mfld-fr commented May 15, 2018

Could you please clean and rebuild all ELKS ? I cannot understand how just adding a line to force the minimal size could cause such a big effect...

@Mellvik
Copy link
Contributor Author

Mellvik commented May 15, 2018 via email

@mfld-fr
Copy link
Contributor

mfld-fr commented May 15, 2018

So nice 😃 !

Yes, you are right for the switch nature, I don't know why I started to consider it as a router earlier in the discussion ?..

There is still some extra and useless traffic because your HW is rather fast, coupled to the select() ineffective timeout, but it looks like we are done to make the NE2K driver & board to basically work.

Let us then close this long issue by submitting the above patch, and focus on other defective parts in dedicated issues (#213 #211 #125 #124 and so on).

mfld-fr added a commit to mfld-fr/elks that referenced this issue May 15, 2018
@mfld-fr mfld-fr added the solved Problem solved label May 15, 2018
This was referenced May 15, 2018
@ghaerr ghaerr mentioned this issue May 31, 2020
@pawosm-arm
Copy link
Contributor

pawosm-arm commented Sep 18, 2020

Here is the patch to have an ETH packet minimum size of 64 bytes.

memcpy_fromfs(send_buf, data, len);
if (len < 64U) len = 64U;  /* issue #133 */
...
res = len;

^^ I don't think this is right. The value returned through res isn't telling how many bytes were put into send buffer.

IMO it should be like this:

                if (len > MAX_PACKET_ETH) len = MAX_PACKET_ETH;
                memcpy_fromfs(send_buf, data, len);

                if (ne2k_pack_put(send_buf,
                     (len < 64) ? 64 : len /* issue #133 */
                   )) {
                        res = -EIO;
                        break;
                }

                res = len;

@ghaerr
Copy link
Owner

ghaerr commented Sep 18, 2020

Hello @pawosm-arm,

I can see you've been intricately inspecting driver source :)

You are correct on the point that the write return value isn't what the user specified when len < 64, as it is returned larger, which is usually not a good thing, since the user's buffer wasn't actually advanced. Note that the len can be returned smaller (also never handled nor needed to be in ktcp) when len > MAX_PACKET_ETH. The driver has made it clear that multiple writes for a packet are not supported.

However, the current code does return the amount of bytes sent on the wire, if you include the garbage bytes required for proper operation. It could be said it would be better to return the larger len to inform the application of what happened.

That said, the code could be changed to what you're talking about with less code using:

if (len > MAX_PACKET_ETH) len = MAX_PACKET_ETH;
res = len [add this statement]
...
res = len; [remove this statement]

I'll have to think which return approach would be better. In general, ktcp can't do anything different either way, and the only real purpose in the return value is to report anomalies, which is currently only checked for read returns. Write return values are ignored because of the higher probability of packet loss anyways.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
problem Problem while using the product solved Problem solved
Projects
None yet
Development

No branches or pull requests

6 participants