Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

st-util should exit on LIBUSB_ERROR_NO_DEVICE #780

Closed
ceremcem opened this issue Mar 17, 2019 · 12 comments · Fixed by #913
Closed

st-util should exit on LIBUSB_ERROR_NO_DEVICE #780

ceremcem opened this issue Mar 17, 2019 · 12 comments · Fixed by #913

Comments

@ceremcem
Copy link
Contributor

ceremcem commented Mar 17, 2019

When I first unplug the USB debugger and then try to issue a command via GDB, st-util gives the following errors very rapidly:

  core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE                      
[!] send_recv STLINK_DEBUG_GETSTATUS
  core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE                      
[!] send_recv STLINK_DEBUG_GETSTATUS
  core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE                      
[!] send_recv STLINK_DEBUG_GETSTATUS
  core status: unknown
[!] send_recv send request failed: LIBUSB_ERROR_NO_DEVICE                      
[!] send_recv STLINK_DEBUG_GETSTATUS
  core status: unknown
2019-03-17T14:26:42 ERROR src/flash_loader.c: flash loader run error           
2019-03-17T14:26:42 ERROR src/common.c: stlink_flash_loader_run(0x8000000) failed! == -1

thus occupying the CPU nearly 100%. This results in serious waste of time (restarting the virtual machine, etc.).

The kind of error message is quite normal. However, stutil should exit immediately on this error so that I could handle the rest of the issue by restarting stutil or taking some other measure.

@ceremcem
Copy link
Contributor Author

I think stlink should terminate at this point: https://github.com/texane/stlink/blob/df3c2b02867db03fb82f6faaad71300398965e85/src/usb.c#L54

@xor-gate
Copy link
Member

I think stlink should terminate at this point:

https://github.com/texane/stlink/blob/df3c2b02867db03fb82f6faaad71300398965e85/src/usb.c#L54

Yeah I think you are right, an call to libusb should be checked appropriate. If you want you can patch and send a PR. It should also be done for the other calls to libusb in the usb.c file. Thanks for reporting!

@xor-gate xor-gate added this to the Unplanned (Contributions Welcome) milestone Jun 25, 2019
ceremcem pushed a commit to ceremcem/stlink that referenced this issue Jun 25, 2019
@ceremcem
Copy link
Contributor Author

ceremcem commented Jun 25, 2019

If you want you can patch and send a PR.

I would love to. However, I'm not sure that I understand the project structure so a plain exit(1) suffice here or not.

It should also be done for the other calls to libusb in the usb.c file.

I might not understand this very well. Do you mean "send_recv make the application terminate wherever it returns -1"?

@xor-gate
Copy link
Member

xor-gate commented Jun 26, 2019

I have placed a comment in the PR, there are more places where it can go wrong when calls are made to libusb.

The project is organised in the following way:

  • libstlink (usb.c is part of it and should never use exit in a library) files of the library are located in the src folder, most important is usb.c and common.c (and of course flash_loader.c)
  • st-util is located in src/gdbserver
  • st-info is located in src/tools/info.c
  • st-flash is located in src/tools/flash.c

@Nightwalker-87 Nightwalker-87 modified the milestones: Unplanned (Contributions Welcome), Next, General, v1.6.1 Feb 19, 2020
@Nightwalker-87 Nightwalker-87 linked a pull request Mar 17, 2020 that will close this issue
@Nightwalker-87
Copy link
Member

Nightwalker-87 commented Mar 27, 2020

I think this could be related to #888 and #445.
@ceremcem and @rewolff: What about the idea of discussing this together?

@Nightwalker-87
Copy link
Member

@chenguokai: Do you have an idea on how to proceed here?

@chenguokai
Copy link
Collaborator

chenguokai commented Apr 8, 2020

Cannot reproduce this case.
I followed the procedure on my macOS machine. The error message appears, but no sign of entering any dead loops.
image
image

Edit: I issued command n after setting up a breakpoint on main function and unplugging st-link

There does exist one issue that the program ought to exit rather than unlimitedly retry sending or receiving packets.

https://github.com/texane/stlink/blob/0082e488579077762627dba2cee54ffca64dad17/src/gdbserver/gdb-server.c#L1121

This function call does not return a proper value to end the while(1) loop. Problems may lie inside the function or the while(1) mechanism.

@chenguokai
Copy link
Collaborator

After checked the error handling process with lldb, I probably figured out the fault code.

https://github.com/texane/stlink/blob/bf840a1ae82ffc3ef9929f243bb8050d6856f698/src/gdbserver/gdb-server.c#L1448

When _stlink_usb_step encounters an error initially triggered by the failure of a libusb call, it returns a non-zero return value (-1 in this case). However this return value is not handled by the recv-handle-send loop in gdb-server:

https://github.com/texane/stlink/blob/bf840a1ae82ffc3ef9929f243bb8050d6856f698/src/gdbserver/gdb-server.c#L1119

I come up with two possible deal strategies:

Common part: Check the return value of stlink_step in the switch case.

  1. If any error is presented, try to notify GDB and exit.
  2. Redo this function call for restricted times, if failure remains, notify GDB and exit, otherwise proceed.

Analysis:

Option 1 is acceptable since we cannot predict what will happen after a communication error. The draw back is that st-util may fail more frequently on some unstable usb ports or something alike.
Option 2 will give unstable devices another chance but may cause packet duplication, which makes debugging slightly more unpredictable.

@Nightwalker-87
Copy link
Member

Thx for the detailed analysis. I'd favour option 1 here. If people run into problems related to specific local hardware instabilities, there is nothing we can do to it anyway and this is also nothing what the stlink tools should considerate. It seems more important that we do any compromise on debugging. Are you able to fix this straight away?

@chenguokai
Copy link
Collaborator

I will check the documentation of gdb to send packet(s) properly. Not too difficult I guess.
After I am done, I will raise a PR.

@rewolff
Copy link
Contributor

rewolff commented Apr 8, 2020

I didn't say it, but I was going for option 2. But after your explanation: You're right. You've got me convinced: Unless there is a valid reason to believe an error is temporary an error should be considered fatal immediately.

Retrying without informing the user will lead to sudden surprises. Suppose 1/100th or 1/1000th of the transferred bits gets corrupted (at whatever level). If that results in a command being "invalid" in 99% of the time, the user in that situation would on average get plenty warnings (i.e. stlink suddenly quitting) that his hardware is unstable before he gets the silent data corruption that causes problems....

The "easy way out" is to just "exit ()". That will suddenly close the connection to gdb and it will handle that sufficiently gracefully.

@chenguokai
Copy link
Collaborator

chenguokai commented Apr 8, 2020

Basic functions run well under the PR with a stm32f401 board.
The dead loop is resolved during my local test. Now st-util will exit. gdb will receive a failure reply and disconnect.

@Nightwalker-87 Nightwalker-87 linked a pull request Apr 8, 2020 that will close this issue
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.