Infinite loop in sockInit #37

Closed
phillipjohnston opened this issue Nov 13, 2019 · 26 comments · Fixed by #38

Comments

@phillipjohnston

I'm using an ATmega2560 and a W5500 shield. I can talk to the shield via the web server and reset server, and I've configured everything in the EEPROM. But I can't get past an infinite loop here, which triggers a watchdog reset. The socket is never opened and its status always reads "closed".

	do {
		// Write interrupt
		spiWriteReg(REG_S3_IR, S3_W_CB, 0xFF);
		// Write mode
		spiWriteReg(REG_S3_MR, S3_W_CB, MR_UDP);
		// Write TFTP Port
		spiWriteWord(REG_S3_PORT0, S3_W_CB, port);
		// Open Socket
		spiWriteReg(REG_S3_CR, S3_W_CB, CR_OPEN);
		while(spiReadReg(REG_S3_CR, S3_R_CB)) {
			//wait for command to complete
		}
		// Read Status
		if(spiReadReg(REG_S3_SR, S3_R_CB) != SOCK_UDP)
			// Close Socket if it wasn't initialized correctly
			spiWriteReg(REG_S3_CR, S3_W_CB, CR_CLOSE);

		// If socket correctly opened continue
	} while(spiReadReg(REG_S3_SR, S3_R_CB) != SOCK_UDP);

Any ideas on getting past that?
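For comparison, here is a minimal sketch of the same open sequence with bounded retries, so that a socket which never reaches SOCK_UDP produces an error instead of spinning until the watchdog fires. It is a host-side model built on assumptions, not the bootloader's code: the sock_* helpers are hypothetical stand-ins for the spiWriteReg()/spiReadReg() calls on socket 3, backed by a mock register file so it runs on a PC. The retry limits are arbitrary, and the interrupt-register write is omitted for brevity. The register values are the ones given in the W5500 datasheet. Bounding the loop only makes the failure visible; it does not fix whatever keeps the socket closed.

```c
#include <stdint.h>

/* W5500 Sn_MR / Sn_CR / Sn_SR values (per the W5500 datasheet). */
#define MR_UDP      0x02
#define CR_OPEN     0x01
#define CR_CLOSE    0x10
#define SOCK_CLOSED 0x00
#define SOCK_UDP    0x22

/* Hypothetical limits, not taken from the bootloader. */
#define MAX_OPEN_ATTEMPTS 8
#define MAX_CMD_POLLS     128

/* Hypothetical mock of socket 3, standing in for spiReadReg()/spiWriteReg(). */
static uint8_t mock_cr;               /* pending command register value */
static uint8_t mock_sr = SOCK_CLOSED; /* socket status register */
static int mock_socket_opens;         /* 0 models the bug: socket never opens */
static int mock_open_commands;        /* number of CR_OPEN commands issued */

static void sock_write_mr(uint8_t mode) { (void)mode; }
static void sock_write_port(uint16_t port) { (void)port; }

static void sock_write_cr(uint8_t cmd) {
    mock_cr = cmd;
    if (cmd == CR_OPEN) {
        mock_open_commands++;
        if (mock_socket_opens)
            mock_sr = SOCK_UDP;
    } else if (cmd == CR_CLOSE) {
        mock_sr = SOCK_CLOSED;
    }
}

/* The first read returns the pending command; later reads return 0,
   modelling the chip clearing Sn_CR once the command is accepted. */
static uint8_t sock_read_cr(void) {
    uint8_t v = mock_cr;
    mock_cr = 0;
    return v;
}

static uint8_t sock_read_sr(void) { return mock_sr; }

/* Poll Sn_CR until it clears; give up after MAX_CMD_POLLS reads. */
static int wait_cmd_done(void) {
    for (int polls = 0; polls < MAX_CMD_POLLS; polls++) {
        if (sock_read_cr() == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 once the socket reports SOCK_UDP, or 0 after
   MAX_OPEN_ATTEMPTS failed attempts (or a command that never completes). */
static int sock_open_udp_bounded(uint16_t port) {
    for (int attempt = 0; attempt < MAX_OPEN_ATTEMPTS; attempt++) {
        sock_write_mr(MR_UDP);
        sock_write_port(port);
        sock_write_cr(CR_OPEN);
        if (!wait_cmd_done())
            return 0;
        if (sock_read_sr() == SOCK_UDP)
            return 1;
        /* Close the socket before retrying, as the original loop does. */
        sock_write_cr(CR_CLOSE);
        if (!wait_cmd_done())
            return 0;
    }
    return 0;
}
```

With the mock in its default state (socket never opens), sock_open_udp_bounded(69) returns 0 after MAX_OPEN_ATTEMPTS open commands; setting mock_socket_opens = 1 makes it return 1 on the first attempt.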

@loathingKernel
Owner

I have not tested the W5500 myself; support for it was added by a contributor. Does this happen on a cold boot or after a watchdog reset?

If it is after a watchdog reset, can you close any open sockets before the watchdog reset?

@phillipjohnston
Author

phillipjohnston commented Nov 13, 2019 via email

@loathingKernel
Owner

loathingKernel commented Nov 13, 2019

W5500 has been tested by others and it was reported working. I don't think anything has changed in the meantime.

Could it be that the W5500 doesn't like the loop below? I can't test it, so your guess is probably better than mine.

	while(error) {
		//wait for command to complete
		error = spiReadReg(REG_S3_CR, S3_R_CB);
		err_count++;

		if(err_count > 128)
		{
			DBG_TFTP(tracePGMlnTftp(mDebugTftp_OPERR);)

			tftpInitError = TRUE;
			return;
		}
	}

@phillipjohnston
Author

phillipjohnston commented Nov 13, 2019

No, that's not the code that's looping infinitely. That loop does spin forever if you run the bootloader without an Ethernet shield attached, which is why I added a way to break out of it. Serial upload and boot-to-application still work in that case.

The infinite loop happens because the status read in while(spiReadReg(REG_S3_SR, S3_R_CB) != SOCK_UDP); never returns SOCK_UDP. The value is always SOCK_CLOSED. This happens both with and without the modified loop you highlighted.

I believe that others have used the W5500, I'm just a bit surprised that out-of-the-box with no modifications the socket is never opened successfully. Figured someone might have run into that before :)

@phillipjohnston
Author

I also tried removing the loop you highlighted, along with the other socket close calls, but I can't get the status to read anything other than SOCK_CLOSED.

@phillipjohnston
Author

Looking over the W5500 documentation, the process seems generally correct. I'm wondering if there's a reset or some other initialization step that's needed (just thinking out loud). But if that were the case, I'd expect everyone to hit this problem.

@loathingKernel
Owner

loathingKernel commented Nov 13, 2019

I also don't see why it shouldn't work out of the box now. My initial thought was that the W5500 didn't like having its register polled that fast and hung, but that doesn't seem to happen.

The other thing that I am thinking is that it might not be completely initialized by the time it gets to that check. That check is the first time a result from the shield is read. Up to that point, it is just writing things.

@phillipjohnston
Author

I purchased an Uno to run with the W5500 shield. That works perfectly, and I can upload a binary.

So it seems like the ATmega2560 + W5500 combination is the problem. It could still be timing related.

@loathingKernel
Owner

loathingKernel commented Nov 16, 2019

I think I have an idea what might be going on. Can you edit spi.h and set the ETH_SS pin to PINB0?

@phillipjohnston
Author

That causes an infinite loop when printing out the network settings, for some strange reason.

Looking at the schematics, ETH CS is connected to Pin 10, which is PB4. That still fails in the same way.

@phillipjohnston
Author

Strangest thing. I went to capture some SPI traces with my logic analyzer, and now the previously working Arduino Uno fails in the same way, repeatedly.

@phillipjohnston
Author

When the bootloader runs, I see nothing happening on the SCK pin:

[Screenshot: logic analyzer capture, 2019-11-18 15:25:30]

When I run an app with the ethernet shield, SCK is active:

[Screenshot: logic analyzer capture, 2019-11-18 15:25:07]

@phillipjohnston
Author

The Arduino Uno problem was caused by an errant pin definition change; it still works.

I realize now that the problem is that the SPI pins for the Mega are mapped to a different header. I see many solutions around the 'net that refer to jumper wires for these pins... But the sketches work even without jumper wires. Still digging.

@phillipjohnston
Author

I realize now that it's mapping consistently through the debug header. I tried setting the shield pins to inputs to rule out a conflict on the SS line. No change.

@loathingKernel
Owner

The Uno failing out of the blue had me confused; glad you worked it out.

Checking the pin definitions in the Arduino core, they seem to be the same. IIRC pins 50-53 (PB0-PB3) are on the header on the side opposite the USB port, but those pins are also connected to the ICSP header, with the exception of SS, which is still on pin 10 (PB4). The definitions in spi.h seem to check out.

Could it be a timing issue, as you suggested before? I think I will buy a shield and a Mega myself so I can help out better than with the guesswork I have been doing so far.

@phillipjohnston
Author

Well, given that the SCK pin doesn't toggle at all, I don't know if I still think there's a timing problem. It feels like there's something obvious I'm missing...

@phillipjohnston
Author

Spoke too soon. Maybe it is timing related. I disabled setting SCK high (still setting it to output):

[Screenshot: logic analyzer capture, 2019-11-18 17:26:39]

@phillipjohnston
Author

Adding delays shortens the gap, but even a 2 s delay doesn't get the SCK activity to overlap with MISO/MOSI. That could point to a configuration problem instead; perhaps I need to adjust the SPI mode settings.

When I restore the code that sets SCK high, the toggling on that line disappears altogether.

@phillipjohnston
Author

I booted up a working application and recorded register values:

SPCR: 80 (0x50)
SPSR: 1
DDRB: 151 (0x97)
PORTB: 29 (0x1D)

Those match what is being set in the bootloader, except that the bootloader also sets SCK as an output driven high.
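To make the register dump easier to compare against the bootloader's setup, here is a small host-side sketch that decodes it. It assumes the standard AVR SPCR/SPSR bit layout (SPE, DORD, MSTR, CPOL, CPHA, SPR1:0, SPI2X) and the Arduino Mega's PORTB-to-pin mapping (PB0-PB3 = D53-D50, PB4-PB7 = D10-D13); the function names are made up for illustration and are not part of the bootloader.

```c
#include <stdint.h>
#include <stdio.h>

/* SPCR / SPSR bit positions, as named in the AVR datasheets. */
#define SPE   6
#define DORD  5
#define MSTR  4
#define CPOL  3
#define CPHA  2
#define SPR1  1
#define SPR0  0
#define SPI2X 0

/* SPI mode 0..3 is CPOL:CPHA. */
static int spi_mode(uint8_t spcr) {
    return (((spcr >> CPOL) & 1) << 1) | ((spcr >> CPHA) & 1);
}

/* SCK = F_CPU / divisor; SPI2X halves the SPR1:SPR0 divisor. */
static int spi_clock_divisor(uint8_t spcr, uint8_t spsr) {
    static const int base[4] = {4, 16, 64, 128};
    int idx = (((spcr >> SPR1) & 1) << 1) | ((spcr >> SPR0) & 1);
    return ((spsr >> SPI2X) & 1) ? base[idx] / 2 : base[idx];
}

/* Assumed ATmega2560 PORTB pin names on the Arduino Mega. */
static const char *const mega_portb[8] = {
    "PB0 (D53, SS)", "PB1 (D52, SCK)", "PB2 (D51, MOSI)", "PB3 (D50, MISO)",
    "PB4 (D10)",     "PB5 (D11)",      "PB6 (D12)",       "PB7 (D13, LED)"
};

/* For inputs, a PORTB bit set means the internal pull-up is enabled. */
static void print_portb(uint8_t ddrb, uint8_t portb) {
    for (int bit = 0; bit < 8; bit++) {
        int out = (ddrb >> bit) & 1;
        int high = (portb >> bit) & 1;
        printf("%-16s %s %s\n", mega_portb[bit], out ? "output" : "input ",
               out ? (high ? "high" : "low") : (high ? "pull-up" : "hi-Z"));
    }
}

static void decode(uint8_t spcr, uint8_t spsr, uint8_t ddrb, uint8_t portb) {
    printf("SPI enabled: %d, master: %d, LSB first: %d\n",
           (spcr >> SPE) & 1, (spcr >> MSTR) & 1, (spcr >> DORD) & 1);
    printf("SPI mode %d, SCK = F_CPU/%d\n",
           spi_mode(spcr), spi_clock_divisor(spcr, spsr));
    print_portb(ddrb, portb);
}
```

Calling decode(0x50, 0x01, 0x97, 0x1D) with the values recorded above reports an enabled SPI master in mode 0 at F_CPU/2, with PB1 (D52, SCK) configured as an output driven low and PB4 (D10, the W5500 chip select) as an output driven high.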

Then there's a curious case. If I remove the tftpInit() call, there is clearly correct SPI output in the polling loop. But the netInit() call still runs without proper SCK output.

[Screenshot: logic analyzer capture, 2019-11-19 16:30:26]

@phillipjohnston
Author

Now, the curious thing is that when I remove both netInit() and tftpInit(), SPI works immediately, with no delay needed. But neither of those calls works correctly.

@phillipjohnston
Author

Ok, I can force the failing scenario in the polling loop by removing updateLed(). Lucky guess; it seems like that call is unsticking something.

Here's the previous loop data with that call removed:

[Screenshot: logic analyzer capture, 2019-11-19 16:50:10]

@phillipjohnston
Author

phillipjohnston commented Nov 20, 2019

AHHHHH. The LED on the Arduino Mega is mapped to digital pin 13?! That conflicts with the SCK pin! The reason this isn't detected in the Mega case is that the LED pin is actually defined as a different pin from SCK on the Mega, but due to the routing of the shield, they overlap.

@phillipjohnston
Author

In spiInit(), I made this change:

#if !defined(__AVR_ATmega1280__) && !defined(__AVR_ATmega2560__)
#if (LED != SCK)
	/** Set up pins to flash the onboard led */
	/** Set led pin to high */
	LED_PORT |= _BV(LED);
	/** Set led pin as output */
	LED_DDR |= _BV(LED);
#endif
#endif

And then in updateLed():

void updateLed(void)
{
	uint16_t next_timer_1 = TCNT1;

#if !defined(__AVR_ATmega1280__) && !defined(__AVR_ATmega2560__)
	if(next_timer_1 & 0x1000) LED_PORT ^= _BV(LED); // Led pin high
	else LED_PORT &= ~_BV(LED); // Led pin low
#endif

	if(next_timer_1 < last_timer_1) {
		tick++;
		DBG_UTIL(
		    tracePGMlnUtil(mDebugUtil_TICK);
		    tracenum(tick);
		    tracePGMlnUtil(mDebugUtil_NEXT);
		    tracenum(next_timer_1);
		    tracePGMlnUtil(mDebugUtil_LAST);
		    tracenum(last_timer_1);
		)
	}

	last_timer_1 = next_timer_1;
}

@loathingKernel
Owner

Ok, what? Sorry, it is really late here.

Let me see if I have understood correctly. First of all, if I am reading the schematic correctly, the LED is driven by PINB7 (pin 13) and SCK is on PINB1 (pin 52). But because the pins on the ICSP header are shorted to the pins on the side, the LED is driving the SCK pin. Is that correct?

@phillipjohnston
Author

Yes, that is correct.

@loathingKernel
Owner

Heh, that is one hell of a rabbit hole. Thank you for looking into it and documenting it so thoroughly.
