Ethernet stop working with dm9051 (IDFGH-4598) #6414

Closed
fabiolino2416 opened this issue Jan 16, 2021 · 9 comments
Labels: Resolution: Done, Status: Done

Comments

@fabiolino2416

When I run the iperf example for a long time (600 seconds), the Ethernet interface stops working after about 100 to 300 seconds. No error or warning appears in the console, and only a restart solves the problem. The SPI frequency is 20 MHz. The test runs on two demo boards: a DM9051 demo board and a WROOM-32D kit. I will run the test again on a production board to rule out any problem related to the demo hardware.

@github-actions github-actions bot changed the title Ethernet stop working with dm9051 Ethernet stop working with dm9051 (IDFGH-4598) Jan 16, 2021
@MadsHHLund

I have a similar problem.
I use a wiz850:
Connects nicely, gets IP.
But if I want to do "HTTP client connect", then it goes wrong every other time.

In the debug log:
D (6134) spi_master: Allocate RX buffer for DMA
D (6134) spi_master: Allocate RX buffer for DMA
D (6134) spi_master: Allocate RX buffer for DMA
D (6134) spi_master: Allocate RX buffer for DMA
D (6134) spi_master: Allocate RX buffer for DMA
D (6134)

At the same time, I can no longer ping the device.

I have tried every combination of clock speed and queue size.

It seems that the SPI communication stops: "emac_w5500_task" receives no more interrupts.

Does anyone have a suggestion on what I should try?

@MadsHHLund

I forgot to write ...

With a Wifi connection it runs perfectly.

@fabiolino2416
Author

The problem seems to be missing flow control in the driver, esp_eth_mac_dm9051.c. We are checking with IDF v4.3 by taking DM9051 memory dumps (with the invaluable help of the great professionals at Davicom). I also believe that the bug IDFGH-4569 may be involved, corrupting the flow, but it is difficult to locate.

@MadsHHLund

I think I have found the error in my system.

I have tried switching to another DNS server, after which it seems to run more stably.
Now I want to investigate why DNS does not provide consistent return responses.

@MadsHHLund

What a mystery ...

I try to do a "HTTP_CLIENT post"

On the wired network, in my private home office, wiz850io - w5500 runs perfectly, so does wifi.

At work, the wired network fails with DNS resolve error:
esp-tls: couldn't get hostname for xxx

After which - I can no longer "ping" the device.

At work, the same code runs perfectly on WIFI ...

I have tried switching the DNS server, and much more.

Can anyone give me ideas to explore?

@MadsHHLund

What a mystery ...

When the IP_EVENT_ETH_GOT_IP event occurs, I call esp_netif_set_dns_info() and set the secondary DNS to 0.0.0.0.

Then everything runs perfectly.

@david-cermak
Collaborator

Hi @fabiolino2416

Sorry for not replying sooner. It seems that the problem you reported is only loosely related to the recent discussion, is it not?
Any update from your end? Do you still see the Ethernet interface stop responding?

@MadsHHLund Are you using the wired and wireless configuration at the same time? lwip keeps an array of DNS servers common for all interfaces, couldn't the WiFi rewrite the DNS server somehow?

"When event IP_EVENT_ETH_GOT_IP occurs, I call esp_netif_set_dns_info() and set the secondary DNS to 0.0.0.0"

Could you please check what the value was before, and where it was set?
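
To make the shared-table question above concrete, here is a toy model in plain C. It is only a sketch of the hypothesis, not lwIP or ESP-IDF code: the names (model_dns_table_t, model_set_dns, model_ip4), the two-slot table size, and all addresses are assumptions made up for illustration. It shows how a backup DNS entry written for one interface could be replaced when a second interface writes to the same table.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_DNS_SLOTS 2   /* slot 0 = main, slot 1 = backup (model only) */

/* One table for the whole system: entries carry no interface owner. */
typedef struct {
    uint32_t server[MODEL_DNS_SLOTS];   /* IPv4 address, 0 means "unset" */
} model_dns_table_t;

/* Pack four octets into a single 32-bit value for easy comparison. */
static uint32_t model_ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Any interface may write any slot; the last writer wins. */
static void model_set_dns(model_dns_table_t *table, int slot, uint32_t addr)
{
    assert(slot >= 0 && slot < MODEL_DNS_SLOTS);
    table->server[slot] = addr;
}

/* Replay the hypothesised sequence and return the resulting backup server. */
static uint32_t model_backup_after_both_leases(void)
{
    model_dns_table_t table = { { 0, 0 } };

    /* Ethernet obtains its lease and records both servers (made-up values). */
    model_set_dns(&table, 0, model_ip4(192, 168, 1, 1));
    model_set_dns(&table, 1, model_ip4(192, 168, 1, 2));

    /* A second interface obtains a lease later and writes the backup slot. */
    model_set_dns(&table, 1, model_ip4(10, 0, 0, 53));

    return table.server[1];
}
```

In this model the backup slot ends up holding the second interface's server. Whether that actually happens on the device is exactly what reading the value back before overwriting it would show.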

@david-cermak
Collaborator

@fabiolino2416 We've had a similar problem with a missed GPIO interrupt on another device. Could you please check whether this change fixes the issue?

diff --git a/components/esp_eth/src/esp_eth_mac_dm9051.c b/components/esp_eth/src/esp_eth_mac_dm9051.c
@@ -397,7 +397,7 @@ static void emac_dm9051_task(void *arg)
     uint32_t length = 0;
     while (1) {
         // block indefinitely until some task notifies me
-        ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
+        ulTaskNotifyTake(pdTRUE, pdMS_TO_TICKS(1000));
         /* clear interrupt status */
         dm9051_register_read(emac, DM9051_ISR, &status);
         dm9051_register_write(emac, DM9051_ISR, status);

@espressif-bot espressif-bot added the Status: In Progress Work is in progress label Sep 3, 2021
@david-cermak
Collaborator

Hi @fabiolino2416

We were able to reproduce the issue under stress testing. The patch above works around the problem, because it makes the Rx task check periodically for any new Rx packets (it doesn't address the root cause, though).
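
The effect of that workaround can be sketched with a small single-threaded simulation in plain C. This is not the driver code and not the patch: sim_rx_task, the tick units, and the packet period are hypothetical, and the model assumes an edge-triggered interrupt line that stays asserted until the task clears the status, so one lost edge silences all later ones.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_TICKS     5000  /* length of the simulated run */
#define PACKET_PERIOD 100   /* one packet arrives every 100 ticks */

/* Returns how many packets the simulated Rx task drains.
 * wait_timeout == 0 models a wait with no timeout (block forever);
 * a positive value models a bounded wait of that many ticks.
 * lost_edge_tick is the arrival tick whose notification is dropped
 * (use -1 for "none"). */
static int sim_rx_task(int wait_timeout, int lost_edge_tick)
{
    int  pending   = 0;      /* packets sitting in the chip's buffer */
    int  drained   = 0;      /* packets the task has processed */
    int  waited    = 0;      /* ticks since the task last woke up */
    bool line_high = false;  /* level of the interrupt pin */
    bool notified  = false;  /* task notification from the ISR */

    assert(wait_timeout >= 0);

    for (int tick = 1; tick <= SIM_TICKS; tick++) {
        if (tick % PACKET_PERIOD == 0) {
            pending++;
            if (!line_high) {
                /* Rising edge: the ISR normally notifies the task. */
                line_high = true;
                if (tick != lost_edge_tick) {
                    notified = true;
                }
            }
            /* If the line is already high there is no new edge. */
        }

        waited++;
        bool timed_out = (wait_timeout > 0 && waited >= wait_timeout);

        if (notified || timed_out) {
            /* Wake up: clear the status, drain the buffer, wait again. */
            notified  = false;
            line_high = false;
            drained  += pending;
            pending   = 0;
            waited    = 0;
        }
    }
    return drained;
}
```

With no timeout and the edge at tick 300 dropped, the task drains only the two packets that arrived before the loss and then stalls. With a timeout of 1000 ticks it catches up on the backlog and drains all 50 packets. The root cause is still there in the second case, which matches the comment above: the timeout only bounds how long the stall lasts.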

Here's a preliminary patch that fixes the issue:
esp_eth-Fix-dm9051-Rx-interrupt-processing.patch.txt

@espressif-bot espressif-bot added Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: In Progress Work is in progress Resolution: NA Issue resolution is unavailable labels Sep 8, 2021
espressif-bot pushed a commit that referenced this issue Oct 12, 2021
* Disable Tx interrupts to fix race condition of missing Rx interrupt
* Check if GPIO interrupt is asserted periodically if the ISR event missed

Closes #6414

4 participants