
coral edge TPU issues on RPI4 #46

Closed

joshdurbin opened this issue Nov 23, 2020 · 18 comments

@joshdurbin

Howdy. I'm not sure this is strictly a DOODS issue, but I thought you might have some insight into what's going on between DOODS and the Coral. I'm running the Coral attached to an RPi 4B that has USB boot enabled and uses an external USB disk as its system drive (no SD card at all). The OS is Debian/Raspberry Pi OS Lite, arm64.

I'm trying to transition from CPU-based detection to the Coral TFLite models. To do so I used the script to download all the models and then, with the following config:

doods:
  detectors:
    - name: default
      type: tflite
      modelFile: models/coco_ssd_mobilenet_v1_1.0_quant.tflite
      labelFile: models/coco_labels0.txt
      numThreads: 4
      numConcurrent: 4
      hwAccel: false
      timeout: 2m
    - name: edgetpu
      type: tflite
      modelFile: models/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite
      labelFile: models/coco_labels0.txt
      numThreads: 0
      numConcurrent: 4
      hwAccel: true
    - name: tensorflow
      type: tensorflow
      modelFile: models/faster_rcnn_inception_v2_coco_2018_01_28.pb
      labelFile: models/coco_labels1.txt
      numThreads: 4
      numConcurrent: 4
      hwAccel: false
      timeout: 2m

...I'm running: docker run -it --device /dev/bus/usb -e logger.level='debug' -v /home/pi/models/:/opt/doods/models -v /home/pi/asdf.config:/opt/doods/config.yaml -p 8080:8080 snowzach/doods:latest. The first time through it errors out, stating it can't initialize the Edge TPU.

2020-11-23T06:40:45.233Z	INFO	detector/detector.go:79	Configured Detector	{"package": "detector", "name": "default", "type": "tflite", "model": "models/coco_ssd_mobilenet_v1_1.0_quant.tflite", "labels": 80, "width": 300, "height": 300}
2020-11-23T06:41:16.772Z	ERROR	detector/detector.go:74	Could not initialize detector edgetpu: could not initialize edgetpu /sys/bus/usb/devices/2-2	{"package": "detector"}
2020-11-23T06:41:17.357Z	ERROR	detector/detector.go:74	Could not initialize detector tensorflow: Could not import model: Node 'SecondStageFeatureExtractor/InceptionV2/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm': Unknown input node 'SecondStageFeatureExtractor/InceptionV2/Mixed_5c/Branch_0/Conv2d_0a_1x1/Conv2D_bn_offset'	{"package": "detector"}
2020-11-23T06:41:17.359Z	INFO	server/server.go:284	API Listening	{"package": "server", "address": ":8080", "tls": false, "version": "v0.2.5-0-gbf6d7a1-dirty"}
^C2020-11-23T06:42:28.809Z	INFO	conf/signal.go:45	Received Interrupt...

Prior to this run, lsusb returns:

pi@rpi-4b-3:~ $ lsusb
Bus 002 Device 003: ID 1a6e:089a Global Unichip Corp. 
Bus 002 Device 002: ID 174c:55aa ASMedia Technology Inc. Name: ASM1051E SATA 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge, ASM1153E SATA 6Gb/s bridge
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

...after the run, though, Bus 002 Device 003 re-enumerates as:

pi@rpi-4b-3:~ $ lsusb
Bus 002 Device 004: ID 18d1:9302 Google Inc. 
Bus 002 Device 002: ID 174c:55aa ASMedia Technology Inc. Name: ASM1051E SATA 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge, ASM1153E SATA 6Gb/s bridge
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
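
For what it's worth, the ID change itself looks like expected behavior: the accelerator enumerates as 1a6e:089a (Global Unichip Corp.) before the runtime uploads its firmware and as 18d1:9302 (Google Inc.) afterwards. As a rough sketch, the udev rules that libedgetpu installs match both vendor IDs (the plugdev group here is an assumption about the stock packaging):

# /etc/udev/rules.d/99-edgetpu-accelerator.rules (sketch)
SUBSYSTEM=="usb", ATTRS{idVendor}=="1a6e", GROUP="plugdev"
SUBSYSTEM=="usb", ATTRS{idVendor}=="18d1", GROUP="plugdev"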

A second pass at running the container yields the following:

pi@rpi-4b-3:~ $ docker run -it --device /dev/bus/usb -e logger.level='debug' -v /home/pi/models/:/opt/doods/models -v /home/pi/asdf.config:/opt/doods/config.yaml -p 8080:8080 snowzach/doods:latest
2020-11-23T07:11:18.487Z	INFO	detector/detector.go:79	Configured Detector	{"package": "detector", "name": "default", "type": "tflite", "model": "models/coco_ssd_mobilenet_v1_1.0_quant.tflite", "labels": 80, "width": 300, "height": 300}
2020-11-23T07:11:21.200Z	INFO	detector/detector.go:79	Configured Detector	{"package": "detector", "name": "edgetpu", "type": "tflite-edgetpu", "model": "models/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite", "labels": 80, "width": 300, "height": 300}
2020-11-23T07:11:21.673Z	ERROR	detector/detector.go:74	Could not initialize detector tensorflow: Could not import model: Node 'SecondStageFeatureExtractor/InceptionV2/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/FusedBatchNorm': Unknown input node 'SecondStageFeatureExtractor/InceptionV2/Mixed_5c/Branch_0/Conv2d_0a_1x1/Conv2D_bn_offset'	{"package": "detector"}
2020-11-23T07:11:21.674Z	INFO	server/server.go:284	API Listening	{"package": "server", "address": ":8080", "tls": false, "version": "v0.2.5-0-gbf6d7a1-dirty"}

The container remains up and online for about another 8-10 seconds before the host OS crashes; the USB host controller appears to die and hang. Specifically:

[28882.276448] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
[28882.292499] xhci_hcd 0000:01:00.0: Host halt failed, -110
[28882.292508] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead

and:

[28882.293319] xhci_hcd 0000:01:00.0: HC died; cleaning up
[28882.293860] usb 1-1: USB disconnect, device number 2
[28882.296644] usb 2-1: USB disconnect, device number 2

Full log:

[28866.415880] docker0: port 1(vethec1d83d) entered disabled state
[28866.416026] vethda2c989: renamed from eth0
[28866.482708] docker0: port 1(vethec1d83d) entered disabled state
[28866.495637] device vethec1d83d left promiscuous mode
[28866.495648] docker0: port 1(vethec1d83d) entered disabled state
[28867.639223] docker0: port 1(vethd25a02b) entered blocking state
[28867.639232] docker0: port 1(vethd25a02b) entered disabled state
[28867.639371] device vethd25a02b entered promiscuous mode
[28867.639552] docker0: port 1(vethd25a02b) entered blocking state
[28867.639558] docker0: port 1(vethd25a02b) entered forwarding state
[28867.639850] docker0: port 1(vethd25a02b) entered disabled state
[28867.707442] IPv6: ADDRCONF(NETDEV_CHANGE): vethd25a02b: link becomes ready
[28867.707518] docker0: port 1(vethd25a02b) entered blocking state
[28867.707524] docker0: port 1(vethd25a02b) entered forwarding state
[28868.148606] docker0: port 1(vethd25a02b) entered disabled state
[28868.148983] eth0: renamed from veth65474e7
[28868.165092] docker0: port 1(vethd25a02b) entered blocking state
[28868.165102] docker0: port 1(vethd25a02b) entered forwarding state
[28869.984866] usb 2-2: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[28870.004932] usb 2-2: LPM exit latency is zeroed, disabling LPM.
[28882.276448] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
[28882.292499] xhci_hcd 0000:01:00.0: Host halt failed, -110
[28882.292508] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[28882.292804] usb 2-1: cmd cmplt err -108
[28882.292817] usb 2-1: cmd cmplt err -108
[28882.292829] usb 2-1: cmd cmplt err -108
[28882.292840] usb 2-1: cmd cmplt err -108
[28882.292851] usb 2-1: cmd cmplt err -108
[28882.292863] usb 2-1: cmd cmplt err -108
[28882.292873] usb 2-1: cmd cmplt err -108
[28882.292884] usb 2-1: cmd cmplt err -108
[28882.292894] usb 2-1: cmd cmplt err -108
[28882.292905] usb 2-1: cmd cmplt err -108
[28882.292924] usb 2-1: cmd cmplt err -108
[28882.292935] usb 2-1: cmd cmplt err -108
[28882.292945] usb 2-1: cmd cmplt err -108
[28882.292955] usb 2-1: cmd cmplt err -108
[28882.292965] usb 2-1: cmd cmplt err -108
[28882.292975] usb 2-1: cmd cmplt err -108
[28882.292988] usb 2-1: cmd cmplt err -108
[28882.292998] usb 2-1: cmd cmplt err -108
[28882.293008] usb 2-1: cmd cmplt err -108
[28882.293019] usb 2-1: cmd cmplt err -108
[28882.293030] usb 2-1: cmd cmplt err -108
[28882.293042] usb 2-1: cmd cmplt err -108
[28882.293053] usb 2-1: cmd cmplt err -108
[28882.293063] usb 2-1: cmd cmplt err -108
[28882.293074] usb 2-1: cmd cmplt err -108
[28882.293084] usb 2-1: cmd cmplt err -108
[28882.293097] usb 2-1: cmd cmplt err -108
[28882.293107] usb 2-1: cmd cmplt err -108
[28882.293116] usb 2-1: cmd cmplt err -108
[28882.293319] xhci_hcd 0000:01:00.0: HC died; cleaning up
[28882.293860] usb 1-1: USB disconnect, device number 2
[28882.296644] usb 2-1: USB disconnect, device number 2
[28882.297038] sd 0:0:0:0: [sda] tag#6 uas_zap_pending 0 uas-tag 1 inflight: CMD 
[28882.297054] sd 0:0:0:0: [sda] tag#6 CDB: opcode=0x2a 2a 00 00 65 80 d8 00 00 08 00
[28882.297081] sd 0:0:0:0: [sda] tag#7 uas_zap_pending 0 uas-tag 2 inflight: CMD 
[28882.297093] sd 0:0:0:0: [sda] tag#7 CDB: opcode=0x2a 2a 00 00 65 80 e8 00 00 10 00
[28882.297106] sd 0:0:0:0: [sda] tag#4 uas_zap_pending 0 uas-tag 3 inflight: CMD 
[28882.297117] sd 0:0:0:0: [sda] tag#4 CDB: opcode=0x2a 2a 00 00 65 81 38 00 00 30 00
[28882.297130] sd 0:0:0:0: [sda] tag#3 uas_zap_pending 0 uas-tag 4 inflight: CMD 
[28882.297141] sd 0:0:0:0: [sda] tag#3 CDB: opcode=0x2a 2a 00 02 09 34 28 00 00 10 00
[28882.297156] sd 0:0:0:0: [sda] tag#0 uas_zap_pending 0 uas-tag 5 inflight: CMD 
[28882.297166] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x2a 2a 00 02 13 6c 70 00 00 08 00
[28882.297179] sd 0:0:0:0: [sda] tag#1 uas_zap_pending 0 uas-tag 6 inflight: CMD 
[28882.297189] sd 0:0:0:0: [sda] tag#1 CDB: opcode=0x2a 2a 00 02 48 20 00 00 00 08 00
[28882.297195] sd 0:0:0:0: [sda] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[28882.297209] sd 0:0:0:0: [sda] tag#2 uas_zap_pending 0 uas-tag 7 inflight: CMD 
[28882.297218] sd 0:0:0:0: [sda] tag#3 CDB: opcode=0x2a 2a 00 02 09 34 28 00 00 10 00
[28882.297220] sd 0:0:0:0: [sda] tag#2 CDB: opcode=0x2a 2a 00 02 48 20 80 00 00 08 00
[28882.297233] sd 0:0:0:0: [sda] tag#5 uas_zap_pending 0 uas-tag 8 inflight: CMD 
[28882.297248] blk_update_request: I/O error, dev sda, sector 34157608 op 0x1:(WRITE) flags 0x103000 phys_seg 2 prio class 0
[28882.297251] sd 0:0:0:0: [sda] tag#5 CDB: opcode=0x2a 2a 00 02 48 21 28 00 00 08 00
[28882.297263] sd 0:0:0:0: [sda] tag#8 uas_zap_pending 0 uas-tag 9 inflight: CMD 
[28882.297271] Buffer I/O error on dev sda2, logical block 4203141, lost async page write
[28882.297281] sd 0:0:0:0: [sda] tag#8 CDB: opcode=0x2a 2a 00 02 48 21 40 00 00 08 00
[28882.297295] sd 0:0:0:0: [sda] tag#9 uas_zap_pending 0 uas-tag 10 inflight: CMD 
[28882.297304] sd 0:0:0:0: [sda] tag#9 CDB: opcode=0x2a 2a 00 00 e1 12 80 00 00 18 00
[28882.297318] sd 0:0:0:0: [sda] tag#10 uas_zap_pending 0 uas-tag 11 inflight: CMD 
[28882.297323] Buffer I/O error on dev sda2, logical block 4203142, lost async page write
[28882.297332] sd 0:0:0:0: [sda] tag#10 CDB: opcode=0x2a 2a 00 00 e1 51 30 00 00 18 00
[28882.297347] sd 0:0:0:0: [sda] tag#11 uas_zap_pending 0 uas-tag 12 inflight: CMD 
[28882.297357] sd 0:0:0:0: [sda] tag#11 CDB: opcode=0x2a 2a 00 00 e0 9f f8 00 00 08 00
[28882.297370] sd 0:0:0:0: [sda] tag#12 uas_zap_pending 0 uas-tag 13 inflight: CMD 
[28882.297380] sd 0:0:0:0: [sda] tag#12 CDB: opcode=0x2a 2a 00 00 e0 c4 f8 00 00 08 00
[28882.297393] sd 0:0:0:0: [sda] tag#13 uas_zap_pending 0 uas-tag 14 inflight: CMD 
[28882.297403] sd 0:0:0:0: [sda] tag#13 CDB: opcode=0x2a 2a 00 00 11 21 68 00 00 08 00
[28882.297416] sd 0:0:0:0: [sda] tag#14 uas_zap_pending 0 uas-tag 15 inflight: CMD 
[28882.297425] sd 0:0:0:0: [sda] tag#14 CDB: opcode=0x2a 2a 00 00 88 20 08 00 00 08 00
[28882.297438] sd 0:0:0:0: [sda] tag#15 uas_zap_pending 0 uas-tag 16 inflight: CMD 
[28882.297448] sd 0:0:0:0: [sda] tag#15 CDB: opcode=0x2a 2a 00 00 08 20 00 00 00 08 00
[28882.297461] sd 0:0:0:0: [sda] tag#16 uas_zap_pending 0 uas-tag 17 inflight: CMD 
[28882.297470] sd 0:0:0:0: [sda] tag#16 CDB: opcode=0x2a 2a 00 00 08 2e c8 00 00 10 00
[28882.297483] sd 0:0:0:0: [sda] tag#17 uas_zap_pending 0 uas-tag 18 inflight: CMD 
[28882.297493] sd 0:0:0:0: [sda] tag#17 CDB: opcode=0x2a 2a 00 00 08 a3 10 00 00 08 00
[28882.297506] sd 0:0:0:0: [sda] tag#18 uas_zap_pending 0 uas-tag 19 inflight: CMD 
[28882.297515] sd 0:0:0:0: [sda] tag#18 CDB: opcode=0x2a 2a 00 00 08 f3 18 00 00 08 00
[28882.297528] sd 0:0:0:0: [sda] tag#19 uas_zap_pending 0 uas-tag 20 inflight: CMD 
[28882.297537] sd 0:0:0:0: [sda] tag#19 CDB: opcode=0x2a 2a 00 00 88 25 a0 00 00 08 00
[28882.297549] sd 0:0:0:0: [sda] tag#20 uas_zap_pending 0 uas-tag 21 inflight: CMD 
[28882.297559] sd 0:0:0:0: [sda] tag#20 CDB: opcode=0x2a 2a 00 00 88 25 b0 00 00 08 00
[28882.297571] sd 0:0:0:0: [sda] tag#21 uas_zap_pending 0 uas-tag 22 inflight: CMD 
[28882.297581] sd 0:0:0:0: [sda] tag#21 CDB: opcode=0x2a 2a 00 00 88 29 88 00 00 08 00
[28882.297593] sd 0:0:0:0: [sda] tag#22 uas_zap_pending 0 uas-tag 23 inflight: CMD 
[28882.297603] sd 0:0:0:0: [sda] tag#22 CDB: opcode=0x2a 2a 00 00 89 35 70 00 00 08 00
[28882.297615] sd 0:0:0:0: [sda] tag#23 uas_zap_pending 0 uas-tag 24 inflight: CMD 
[28882.297625] sd 0:0:0:0: [sda] tag#23 CDB: opcode=0x2a 2a 00 01 48 20 10 00 00 08 00
[28882.297638] sd 0:0:0:0: [sda] tag#24 uas_zap_pending 0 uas-tag 25 inflight: CMD 
[28882.297648] sd 0:0:0:0: [sda] tag#24 CDB: opcode=0x2a 2a 00 01 c8 2f a8 00 00 08 00
[28882.297661] sd 0:0:0:0: [sda] tag#25 uas_zap_pending 0 uas-tag 26 inflight: CMD 
[28882.297670] sd 0:0:0:0: [sda] tag#25 CDB: opcode=0x2a 2a 00 01 c9 34 f0 00 00 08 00
[28882.297682] sd 0:0:0:0: [sda] tag#26 uas_zap_pending 0 uas-tag 27 inflight: CMD 
[28882.297692] sd 0:0:0:0: [sda] tag#26 CDB: opcode=0x2a 2a 00 02 08 20 00 00 00 08 00
[28882.297704] sd 0:0:0:0: [sda] tag#27 uas_zap_pending 0 uas-tag 28 inflight: CMD 
[28882.297714] sd 0:0:0:0: [sda] tag#27 CDB: opcode=0x2a 2a 00 02 08 20 80 00 00 08 00
[28882.297726] sd 0:0:0:0: [sda] tag#28 uas_zap_pending 0 uas-tag 29 inflight: CMD 
[28882.297736] sd 0:0:0:0: [sda] tag#28 CDB: opcode=0x2a 2a 00 02 08 29 50 00 00 08 00
[28882.297809] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[28882.297828] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x2a 2a 00 02 13 6c 70 00 00 08 00
[28882.297844] blk_update_request: I/O error, dev sda, sector 34827376 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
[28882.297863] EXT4-fs warning (device sda2): ext4_end_bio:315: I/O error 10 writing to inode 1044678 (offset 0 size 4096 starting block 4353423)
[28882.297880] Buffer I/O error on device sda2, logical block 4286862
[28882.297887] sd 0:0:0:0: [sda] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[28882.297903] sd 0:0:0:0: [sda] tag#6 CDB: opcode=0x2a 2a 00 00 65 80 d8 00 00 08 00
[28882.297919] blk_update_request: I/O error, dev sda, sector 6652120 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[28882.297932] sd 0:0:0:0: [sda] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[28882.297945] EXT4-fs warning (device sda2): ext4_end_bio:315: I/O error 10 writing to inode 260101 (offset 0 size 0 starting block 831516)
[28882.297948] sd 0:0:0:0: [sda] tag#1 CDB: opcode=0x2a 2a 00 02 48 20 00 00 00 08 00
[28882.297958] Buffer I/O error on device sda2, logical block 764955
[28882.297964] blk_update_request: I/O error, dev sda, sector 38281216 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
[28882.297977] Buffer I/O error on dev sda2, logical block 4718592, lost async page write
[28882.298019] sd 0:0:0:0: [sda] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[28882.298032] sd 0:0:0:0: [sda] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[28882.298035] sd 0:0:0:0: [sda] tag#2 CDB: opcode=0x2a 2a 00 02 48 20 80 00 00 08 00
[28882.298048] blk_update_request: I/O error, dev sda, sector 38281344 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
[28882.298051] sd 0:0:0:0: [sda] tag#7 CDB: opcode=0x2a 2a 00 00 65 80 e8 00 00 10 00
[28882.298061] Buffer I/O error on dev sda2, logical block 4718608, lost async page write
[28882.298067] blk_update_request: I/O error, dev sda, sector 6652136 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[28882.298083] EXT4-fs warning (device sda2): ext4_end_bio:315: I/O error 10 writing to inode 260101 (offset 0 size 0 starting block 831519)
[28882.298095] Buffer I/O error on device sda2, logical block 764957
[28882.298102] sd 0:0:0:0: [sda] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[28882.298109] Buffer I/O error on device sda2, logical block 764958
[28882.298116] sd 0:0:0:0: [sda] tag#5 CDB: opcode=0x2a 2a 00 02 48 21 28 00 00 08 00
[28882.298129] blk_update_request: I/O error, dev sda, sector 38281512 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
[28882.298141] Buffer I/O error on dev sda2, logical block 4718629, lost async page write
[28882.298145] sd 0:0:0:0: [sda] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[28882.298159] sd 0:0:0:0: [sda] tag#4 CDB: opcode=0x2a 2a 00 00 65 81 38 00 00 30 00
[28882.298173] blk_update_request: I/O error, dev sda, sector 6652216 op 0x1:(WRITE) flags 0x800 phys_seg 6 prio class 0
[28882.298179] sd 0:0:0:0: [sda] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[28882.298191] EXT4-fs warning (device sda2): ext4_end_bio:315: I/O error 10 writing to inode 260101 (offset 0 size 0 starting block 831533)
[28882.298194] sd 0:0:0:0: [sda] tag#8 CDB: opcode=0x2a 2a 00 02 48 21 40 00 00 08 00
[28882.298203] Buffer I/O error on device sda2, logical block 764967
[28882.298208] blk_update_request: I/O error, dev sda, sector 38281536 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
[28882.298220] Buffer I/O error on dev sda2, logical block 4718632, lost async page write
[28882.298223] Buffer I/O error on device sda2, logical block 764968
[28882.298235] Buffer I/O error on device sda2, logical block 764969
[28882.298248] Buffer I/O error on device sda2, logical block 764970
[28882.298257] sd 0:0:0:0: [sda] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[28882.298261] Buffer I/O error on device sda2, logical block 764971
[28882.298272] sd 0:0:0:0: [sda] tag#9 CDB: opcode=0x2a 2a 00 00 e1 12 80 00 00 18 00
[28882.298274] Buffer I/O error on device sda2, logical block 764972
[28882.298285] blk_update_request: I/O error, dev sda, sector 14750336 op 0x1:(WRITE) flags 0x0 phys_seg 3 prio class 0
[28882.298301] EXT4-fs warning (device sda2): ext4_end_bio:315: I/O error 10 writing to inode 702 (offset 0 size 0 starting block 1843793)
[28882.298329] EXT4-fs warning (device sda2): ext4_end_bio:315: I/O error 10 writing to inode 702 (offset 6623232 size 8192 starting block 1843795)
[28882.298335] Buffer I/O error on dev sda2, logical block 4203130, lost async page write
[28882.298360] Buffer I/O error on dev sda2, logical block 4203131, lost async page write
[28882.298382] EXT4-fs warning (device sda2): ext4_end_bio:315: I/O error 10 writing to inode 5480 (offset 0 size 0 starting block 1845799)
[28882.298385] Buffer I/O error on dev sda2, logical block 4203132, lost async page write
[28882.298407] EXT4-fs warning (device sda2): ext4_end_bio:315: I/O error 10 writing to inode 5480 (offset 6451200 size 8192 starting block 1845801)
[28882.298462] EXT4-fs warning (device sda2): ext4_end_bio:315: I/O error 10 writing to inode 5504 (offset 0 size 0 starting block 1840128)
[28882.298498] EXT4-fs warning (device sda2): ext4_end_bio:315: I/O error 10 writing to inode 5511 (offset 0 size 0 starting block 1841312)
[28882.298569] Buffer I/O error on dev sda2, logical block 1048577, lost async page write
[28882.307151] JBD2: Detected IO errors while flushing file data on sda2-8
[28882.307196] Aborting journal on device sda2-8.
[28882.307337] JBD2: Error -5 detected when updating journal superblock for sda2-8.
[28882.307442] JBD2: Detected IO errors while flushing file data on sda2-8
[28882.318324] EXT4-fs (sda2): previous I/O error to superblock detected
[28882.324787] EXT4-fs (sda2): I/O error while writing superblock
[28882.324801] EXT4-fs error (device sda2): ext4_journal_check_start:61: Detected aborted journal
[28882.335182] EXT4-fs (sda2): Remounting filesystem read-only
[28882.341249] EXT4-fs (sda2): I/O error while writing superblock
[28882.467747] sd 0:0:0:0: [sda] Synchronizing SCSI cache

This requires a hard reset of the Pi 4. Any idea what could be going on here?

I've tried the Coral device on the two other RPi 4Bs that I have and it produces the same problem.

@snowzach
Owner

That's a new one for me. Do you think it's possible the USB flash drive is bad? Maybe it starts swapping and that causes the issue? I have an RPi 4 with a microSD and have never had an issue. I thought I saw some comment about power... did you delete that? Could the power be sagging? I believe the RPi 4 needs a 3A power supply...

@joshdurbin
Author

Nah, I don't think it's an issue with the USB drive -- they're 2.5" Western Digital SSDs attached to UASP/USB3-to-SATA3 adapters, and I've got the exact same setup running on 4x RPi 4Bs -- all of which have been up and stable for months running various workloads. I think a more relevant question is whether it's something with the firmware of the Coral device or of the RPi 4B. It's a bit interesting that the device ID changes once the device is loaded or probed by the app, or whatever is happening there.

The PIs all have 3.5A power supplies.

I did write a thing about power earlier. I had installed the libedgetpu1-max package and replaced it with libedgetpu1-std thinking it would help, but it didn't.
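
For reference, swapping between the two runtimes is just an apt install (a sketch; it assumes the Coral apt repository is already configured, and the two packages replace each other):

sudo apt-get install libedgetpu1-std   # default clock, lower power draw and heat
sudo apt-get install libedgetpu1-max   # maximum clock, faster but hotter and draws more current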

@joshdurbin
Author

I'm running these workloads as part of a HashiCorp Nomad, Consul, and Vault cluster, with jobs having an affinity to land on a particular node that has the Coral Edge TPU. What I can do, maybe, is move that workload to the one RPi 3B+ I have that's got an SD card as its boot/primary volume to see if it's able to do its thing. I'll try doing that later today.

@joshdurbin
Author

joshdurbin commented Nov 23, 2020

The PI3B+ doesn't have the same issues.

pi@rpi-3bplus-1:~ $ cat asdf.config 
doods:
  detectors:
    - name: default
      type: tflite
      modelFile: models/ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite
      labelFile: models/coco_labels.txt
      numThreads: 0
      numConcurrent: 4
      hwAccel: true
pi@rpi-3bplus-1:~ $ ls -lah models/
total 6.7M
drwxr-xr-x 2 pi docker 4.0K Nov 23 00:38 .
drwxr-xr-x 7 pi docker 4.0K Nov 23 09:41 ..
-rw-r--r-- 1 pi docker  930 Nov 23 00:38 coco_labels.txt
-rw-r--r-- 1 pi docker 6.7M Nov 23 00:38 ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite

It looks like we're handling requests pretty quickly with the TPU:

pi@rpi-3bplus-1:~ $ docker run -it --device /dev/bus/usb -v /home/pi/models/:/opt/doods/models -v /home/pi/asdf.config:/opt/doods/config.yaml -p 8080:8080 snowzach/doods:latest
2020-11-23T18:01:34.701Z	INFO	detector/detector.go:79	Configured Detector	{"package": "detector", "name": "default", "type": "tflite-edgetpu", "model": "models/ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite", "labels": 80, "width": 300, "height": 300}
2020-11-23T18:01:34.707Z	INFO	server/server.go:284	API Listening	{"package": "server", "address": ":8080", "tls": false, "version": "v0.2.5-0-gbf6d7a1-dirty"}

Logs like:

2020-11-23T18:02:41.632Z	INFO	detector/detector.go:163	Stream Request	{"package": "detector"}
2020-11-23T18:02:41.695Z	INFO	tflite/detector.go:393	Detection Complete	{"package": "detector.tflite", "name": "default", "id": "", "duration": 0.060813247, "detections": 20, "device": {"Type":1,"Path":"/sys/bus/usb/devices/1-1.3"}}

... and a load average close to 1.

The same request stream using the non-TPU model is significantly slower.

pi@rpi-3bplus-1:~ $ docker run -it -p 8080:8080 snowzach/doods:latest
2020-11-23T18:04:00.308Z	INFO	detector/detector.go:79	Configured Detector	{"package": "detector", "name": "default", "type": "tflite", "model": "models/coco_ssd_mobilenet_v1_1.0_quant.tflite", "labels": 80, "width": 300, "height": 300}
2020-11-23T18:04:03.725Z	INFO	detector/detector.go:79	Configured Detector	{"package": "detector", "name": "tensorflow", "type": "tensorflow", "model": "models/faster_rcnn_inception_v2_coco_2018_01_28.pb", "labels": 65, "width": -1, "height": -1}
2020-11-23T18:04:03.728Z	INFO	server/server.go:284	API Listening	{"package": "server", "address": ":8080", "tls": false, "version": "v0.2.5-0-gbf6d7a1-dirty"}

log entries:

2020-11-23T18:04:54.708Z	INFO	detector/detector.go:163	Stream Request	{"package": "detector"}
2020-11-23T18:04:54.827Z	INFO	tflite/detector.go:393	Detection Complete	{"package": "detector.tflite", "name": "default", "id": "", "duration": 0.117880229, "detections": 10, "device": null}

...with a CPU load average closer to 3, showing that the TPU is roughly 2x as fast (about 0.06 s vs 0.12 s per detection). That's better than I expected, considering the Pi 3B+ only has USB 2.

I'm using this example to send 1080p H.264 streams to DOODS. There might be a penalty for downscaling the video frames too. Perhaps I should switch to a lower resolution.
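
Something along these lines is what I have in mind for downscaling before the frames ever reach DOODS (a rough sketch; the camera URL, frame rate, and output size are placeholders, not what I'm actually running):

mkdir -p frames
ffmpeg -rtsp_transport tcp -i rtsp://camera.local/stream \
    -vf fps=5,scale=640:360 -q:v 5 frames/img-%04d.jpg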

@1ubuntuuser

Man these look good.... shame you can't buy them in Australia!

Hope you get it going!

@joshdurbin
Author

Currently looking at a lead here; I'm thinking it might be related to power-save issues with the firmware.

@snowzach
Owner

Hmmm.. That's interesting. Sorry, I don't have a lot of other suggestions. Most problems I have seen with the EdgeTPU generally relate to power or the USB connection: it either works perfectly or has random crashes on a particular hardware platform. I have a Raspberry Pi 4 and I don't have issues with it, though. There is another issue with resizing causing it to be really slow; I am working on that one. It should be even faster soon.

@mmatesic01

mmatesic01 commented Dec 26, 2020

@snowzach
Hi,
I also run DOODS on an RPi 4 + Coral USB (separate hardware from Home Assistant) without any major issues, apart from recurring warnings/errors in image processing seen in the HA logs:
"Updating doods image_processing took longer than the scheduled update interval 0:00:03"
"Update for image_processing.doods_terasa fails"

It's working, but sometimes the processing delays are longer.
process_time is in the 0.2-0.35 s range, every 3 s.

How is the resizing issue coming along?
Thanks.

@snowzach
Owner

@mmatesic01 I fixed the issue with resizing making it slow; that shaved off a good 500 ms or so, so that part is definitely fixed. For the models using the EdgeTPU, make sure you have numThreads set to 1 and numConcurrent also set to 1. It might be trying to send more than one detection at a time through the TPU and causing issues. I think I also upgraded the EdgeTPU library as well; I'll have to check, it's been a while since I looked. Are you definitely on the latest version? If so, I can check for an EdgeTPU library upgrade.
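
Applied to the edgetpu detector from the config earlier in this thread, that would look something like this (model and label paths unchanged from that post):

doods:
  detectors:
    - name: edgetpu
      type: tflite
      modelFile: models/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite
      labelFile: models/coco_labels0.txt
      numThreads: 1
      numConcurrent: 1
      hwAccel: true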

@mmatesic01

@snowzach
Mine were numThreads: 0 and numConcurrent: 4. I've corrected that now.
My version is: {"version":"v0.2.6-0-gb2a1c53-dirty"}

@mmatesic01

mmatesic01 commented Dec 27, 2020 via email

@snowzach
Owner

snowzach commented Dec 27, 2020

@mmatesic01 it doesn't matter which one you use... It will be resized to 300x300 because that's what the model requires. That's the latest version as well. Curious if those changes to the config help?

@mmatesic01

The changes you suggested did not do much to silence the errors in HA, but in general they don't affect the expected functionality.

@sebirdman

sebirdman commented Jan 21, 2021

I'm seeing a very similar issue to @joshdurbin here. I'm running an RPi 4 off of an SSD with a Coral Edge TPU. The Edge TPU is plugged into an unpowered USB hub.

HOWEVER, after the first run my system seems stable so far. For whatever reason, the first time I start DOODS I get stuck on this error:

2021-01-21T22:33:57.270Z	ERROR	detector/detector.go:74	Could not initialize detector edgetpu: could not initialize edgetpu /sys/bus/usb/devices/2-1.2	{"package": "detector"}

Then after that, all is well and I haven't seen any crashes. We'll see if the stability continues. Maybe if DOODS retried initialization a few times, it would work without hitting this on the first run?
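
One workaround I might try in the meantime: do a throwaway first run so the accelerator re-enumerates with its runtime firmware before the real container starts (an untested sketch; the sleep duration and container name are guesses):

docker run -d --rm --device /dev/bus/usb --name doods-warmup snowzach/doods:latest
sleep 30 && docker stop doods-warmup
# then start the real container with the usual model/config mounts
docker run -d --device /dev/bus/usb -p 8080:8080 snowzach/doods:latest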

EDIT: As I've run out of ports on my current USB hub, I'm going to grab a larger one that is powered separately. Perhaps that'll help resolve this for me.

EDIT 2: Even after switching to a powered USB hub, I still see the same issues.

@mmatesic01

@sebirdman
Do you use the RPi 4 exclusively for Coral USB/DOODS?
If so, I advise connecting the Coral directly to a USB 3.0 port on the RPi with the USB cable that the Coral comes with. That way the RPi 4 has no trouble powering the Coral, provided the RPi 4 itself has a proper power supply.
I saw lower performance when the Coral was on a USB 2.0 port or behind a USB 3.0 hub on a USB 3.0 port.
If the Coral heats up a bit, that means it is using the full USB bandwidth and working at close to 100% performance.

@sebirdman

sebirdman commented Jan 23, 2021

@mmatesic01 I've got this RPi 4 running a whole bunch of things, including Coral USB/Doods.

I used to run a NUT server for my UPS on it, but noticed that the UPS USB was frequently disconnecting from my RPi. After removing that device, so far all of my issues have gone away, even the strange error on first start of the DOODS server.

Thanks for the suggestion. I've got another RPi 4 that is currently unused; if I continue to have issues I'll report them here and try switching to a dedicated RPi 4 for Coral USB/DOODS.

Edit: Also, do you have your dedicated RPi on wireless or wired networking? I'm curious how the network connection would affect the processing time from HA to DOODS. I've got a few RPis already set up around my home doing other things that I could try as well, but they're all WiFi.

@snowzach
Owner

snowzach commented Jan 2, 2022

Please try with the new Python DOODS if this is still an issue: https://github.com/snowzach/doods2

@snowzach snowzach closed this as completed Jan 2, 2022
@mmatesic01

mmatesic01 commented Jan 2, 2022 via email
