This repository has been archived by the owner on May 3, 2021. It is now read-only.

Windows 7x64 Slow Orientation Tracking #18

Open
Ixstala opened this issue Dec 31, 2015 · 69 comments

Comments

@Ixstala

Ixstala commented Dec 31, 2015

Hi,
Great work getting all of this together and working rather easily on Windows. I've got your Psmove libraries working in my graphics framework and the tracker seems to be working fine (even great) and reports >400FPS tracking a single move.

The problem arises when I try to get orientation estimates using psmove_fusion_get_projection_matrix & psmove_fusion_get_modelview_matrix; the orientation updates very slowly, ~1 or more seconds of lag. This affects psmove_get_accelerometer_frame as well.

I checked the raw accelerometer, magnetometer, and gyro data, and it is streaming fast, so I'm not sure why the orientation estimate is so slow. The orientation tracking works perfectly with your prebuilt binaries. I've run the magnetometer calibration too, and both files are in my AppData directory.

Any ideas?

If I can't get it working I'll probably just do my own implementation of the sensor fusion bit, although I would like to avoid that - it's so close to working!

Thanks!

@cboulay
Collaborator

cboulay commented Dec 31, 2015

So, to be clear, you're not using UE4, correct?
The UE4 plugin and the provided binaries are all set to use lowpass filtering and not Kalman filtering.
See here and here.

The Kalman filtering never worked right for me, even after doing the smoothing_calibration. I'd get very noticeable lag, but nothing on the order of 1 second; I'm guessing it was closer to 100 ms.

If you initialize the psmove tracker using the settings as in the above examples, does it improve your orientation performance?

@brendanwalker
Contributor

The orientation smoothing doesn't use Kalman filtering; that's only for position smoothing. The default orientation smoothing is done in psmoveapi in psmove_orientation_update and uses a modified Madgwick filter. You could try switching to the original Madgwick filter, but it has drift issues. If you want to try that, you can call psmove_set_orientation_fusion_type with a different filter after the call to psmove_enable_orientation in FPSMoveWorker.cpp.

What does the output of your magnetometer file look like? Mine looks like this:

mx,my,mz
-0.218764,0.922553,0.316330
axis,min,max
x,-211.000000,160.000000
y,-280.000000,87.000000
z,-44.000000,303.000000

There was someone else, living in the southern hemisphere (Indonesia), who was having weird controller drift issues when they laid their controller flat. I was going to look into that tonight, as I suspect there might be some assumption my filtering code makes that works fine in the northern hemisphere but breaks in the southern hemisphere (i.e. the 'my' value of the calibration direction is negative instead of positive).
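The calibration file shown above is plain comma-separated text, so its 'my' sign is easy to inspect programmatically. Below is a minimal sketch, assuming the exact layout shown (a header, one mx,my,mz row, a second header, then x, y and z rows in that order, each with min and max). MagCalibration and parse_mag_calibration are hypothetical names, not psmoveapi functions, and the midpoint center() is a common hard-iron offset estimate, not a confirmed description of what psmoveapi does with these numbers.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of the magnetometer calibration file.
struct MagCalibration {
    std::array<double, 3> direction{};  // mx, my, mz
    std::array<double, 3> range_min{};  // raw per-axis minimum (rows assumed x, y, z)
    std::array<double, 3> range_max{};  // raw per-axis maximum

    // Midpoint of the observed range: a common hard-iron offset estimate
    // (assumption, not necessarily how psmoveapi uses these values).
    double center(int axis) const {
        assert(axis >= 0 && axis < 3);
        return 0.5 * (range_min[axis] + range_max[axis]);
    }
};

// Splits a comma-separated line into numbers, optionally skipping a leading label.
static std::vector<double> parse_numbers(const std::string& line, bool skip_label) {
    std::stringstream ss(line);
    std::string field;
    std::vector<double> out;
    while (std::getline(ss, field, ',')) {
        if (skip_label) { skip_label = false; continue; }  // drop "x", "y" or "z"
        out.push_back(std::stod(field));
    }
    return out;
}

// Parses the layout shown in this thread; returns false on a missing or short row.
bool parse_mag_calibration(std::istream& in, MagCalibration& cal) {
    std::string line;
    if (!std::getline(in, line) || !std::getline(in, line)) return false;  // header + values
    std::vector<double> dir = parse_numbers(line, false);
    if (dir.size() != 3) return false;
    for (int i = 0; i < 3; ++i) cal.direction[i] = dir[i];
    if (!std::getline(in, line)) return false;  // "axis,min,max" header
    for (int axis = 0; axis < 3; ++axis) {
        if (!std::getline(in, line)) return false;
        std::vector<double> range = parse_numbers(line, true);
        if (range.size() != 2) return false;
        cal.range_min[axis] = range[0];
        cal.range_max[axis] = range[1];
    }
    return true;
}
```

For Brendan's file, direction[1] (my) is positive and the x-axis midpoint is -25.5. If the southern-hemisphere hunch is right, a calibration recorded there would show a negative my.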

@Ixstala
Author

Ixstala commented Dec 31, 2015

Hi Thanks for the responses. You are correct I am not using UE4 for this project.

I went in and added the additional smoothing and tracker settings, but it didn't seem to help. I just noticed that my raw accelerometer values are also quite laggy; initially I thought they were fine. For example, when I flip the controller end over end, the output lags by 0.5 seconds or more. I'm thinking I might have a problem with my polling frequency.

My magnetometer file looks like this:

mx,my,mz
-0.33063,0.928896,-0.1634
axis,min,max
x,-59,271
y,-72,250
z,-38,278

So not that different from yours.

@Ixstala
Author

Ixstala commented Dec 31, 2015

I just checked and I'm polling at ~30FPS, so roughly 33ms between calls to my update function. This does seem a bit slow.

After looking at the output a bit more the orientation does track well, but it seems to be buffered so my motions occur some time after the fact.

Thoughts?

Update:
I'm making headway.
If I comment out the tracker_update_image and tracker_update the orientation estimate becomes speedy and stable. Maybe my camera is only operating at 30FPS? Is there some way to set that?

@brendanwalker
Contributor

The framerate and resolution settings are part of the PSMoveTrackerSettings structure that gets passed into psmove_tracker_new_with_settings. The default should be 640x480@60fps.

Just out of curiosity, do you have the same update latency issue when you run test_opengl.exe?

Also, if you are using the libusb camera drivers, they often have trouble running faster than 30fps in multi-threaded apps. The Windows implementation of the libusb driver doesn't support isochronous transfers and instead uses an event-based bulk transfer that ultimately blocks using WaitForSingleObject(), which has a hard time responding faster than about 20-30ms. You could try compiling against the commercial CLEye drivers instead of the libusb ones. Are you building psmoveapi directly or just using the library included with psmove-ue4? And is your app a 32-bit application or a 64-bit one? I ask because the CLEye DLL is only available as a 32-bit DLL.

@Ixstala
Author

Ixstala commented Jan 1, 2016

Happy New Year!

I seem to have got this sorted now. I had to run a separate thread just for polling the controller and then move the tracker update calls into my drawing thread.
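The fix described here, polling the controller on its own thread so IMU reads never wait behind a camera update that may block for tens of milliseconds, can be sketched as below. This is a minimal illustration, not code from psmoveapi or psmove-ue4: ControllerPoller, Sample and the injected poll callback are hypothetical stand-ins for whatever wraps the real psmoveapi polling call. The render thread only ever reads the newest sample, so a slow frame cannot build up a backlog of stale orientation data.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sensor sample; a real app would store psmoveapi readings here.
struct Sample {
    long sequence = 0;
    float gyro[3] = {0.f, 0.f, 0.f};
};

// Runs the poll callback on a dedicated thread and keeps only the latest sample.
class ControllerPoller {
public:
    explicit ControllerPoller(std::function<bool(Sample&)> poll)
        : poll_(std::move(poll)), worker_([this] { run(); }) {}

    ~ControllerPoller() {
        running_ = false;
        worker_.join();
    }

    ControllerPoller(const ControllerPoller&) = delete;
    ControllerPoller& operator=(const ControllerPoller&) = delete;

    // Cheap, non-blocking read for the drawing/tracker thread.
    Sample latest() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return latest_;
    }

private:
    void run() {
        while (running_) {
            Sample s;
            if (poll_(s)) {  // callback returns true when new data arrived
                std::lock_guard<std::mutex> lock(mutex_);
                latest_ = s;
            } else {
                std::this_thread::sleep_for(std::chrono::microseconds(500));
            }
        }
    }

    std::function<bool(Sample&)> poll_;
    std::atomic<bool> running_{true};
    mutable std::mutex mutex_;
    Sample latest_;
    std::thread worker_;  // declared last so all members above exist before it starts
};
```

The tracker update calls would then stay on the drawing thread, which calls latest() once per frame.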

test_opengl.exe works just fine for me, no issues.
I'm using the psmove libraries from the psmove-ue4 project and targeting a 64-bit platform.

Now my only issue is libusb flakiness. Occasionally the camera will drop out for a split second causing the controller to de-illuminate and the video feed to freeze. Depending on which USB port I'm plugged into this can get worse, fun!

@brendanwalker
Contributor

Yeah this libusb flakiness is not uncommon. The advice I usually give to people is as follows:

  • Another USB device on the same hub may be interfering with the bandwidth available to the PS3Eye. The test camera app runs at 320x240, but psmoveapi uses 640x480 to get better tracking range (at a reduced frame rate). This probably pushes it over a bandwidth threshold in some instances. Moving the PS3Eye to a different USB port not occupied by another camera (like the DK2 camera) or the Bluetooth dongle may help. See the comments on issue #13, "Windows 10 - No light on controller when DK2 is on".
  • Sometimes Windows gets confused about which USB driver to use when the camera has been unplugged and replugged many times (it saves a bunch of mappings from USB port to driver). You can clean these up using USBDeview. Take a look at the "Test_Tracker or Test_OpenGL stall out right after starting" section of the troubleshooting wiki, which walks through how to clean up old driver mappings.

Since your application is 64-bit, using the CLEye driver is a bit more complicated. There is a way to spin up a host exe to load the CLEye DLL and then pull over video frames using shared memory. I have a modified version of psmoveapi that supports this, which I used in my psmove-unity5 project, but I have had mixed success with it. Sometimes it works great and other times it stalls out completely. See the psmove-unity5 wiki for more details if you're interested.

Finally, I should mention that cboulay and I are actively working on a Windows service / OS X daemon replacement for psmoveapi that manages PS Move controllers, trackers, calibration, etc. Applications connect to the service over a socket to get the controller tracking data (the socket management is hidden behind a simple API). This will allow multiple apps to share access to the controller and remove the need to do blink calibration every time you start your app. I mention this because it may affect how you build PS Move support into your app, depending on your time frame and use case.

@Ixstala
Author

Ixstala commented Jan 2, 2016

I'm only building 64-bit because that seemed to be the current magic combination of available built libraries, working/available camera drivers, and a working Bluetooth pairing process (sans MotioninJoy) to get something done quick and dirty.

I tried running FRAPS on the test_opengl example and I get a locked 30FPS, which seems to be what I'm seeing in my own app (I have Vsync off). Before the tracker initializes my drawing loop is >2000fps.

CL Labs now advertises that their driver is compatible with 64-bit Windows, although you have to pay. Have you guys tested that out?

I'd be interested in testing out the CLeye drivers with a 32 bit build if my tracking speed would increase over 30FPS, but I couldn't find the .libs for psmove to link against (I'm using visual studio). I did buy the CLeye driver some weeks back, but it seems to be even flakier than libusb - and that's just in their video test software. Maybe I have a PSeye with some loose wires, it was bought used.

If you have a repository for the windows service I'd be glad to help test it out. Sounds like a much needed improvement.

@cboulay
Collaborator

cboulay commented Jan 2, 2016

You can still use 64-bit built utilities for bluetooth pairing, camera calibration, etc, and then use 32-bit binaries for tracking. PS3EYEDriver (and libusb) are open source so you can build them with 32- or 64-bit architectures.

CL Labs now advertises that their driver is compatible with 64-bit Windows, although you have to pay.

Unfortunately in this case it only means that they run on Windows 32 or 64-bit. The binaries are still 32-bit.

I'd be interested in testing out the CLeye drivers with a 32 bit build if my tracking speed would increase over 30FPS, but I couldn't find the .libs for psmove to link against (I'm using visual studio).

There's "CL Eye Driver" and "CL Eye (Multicam) SDK". CL Eye Driver doesn't come with a header or .lib file; you communicate with it through Windows APIs (DirectShow) and modify its parameters through the registry. CL Eye SDK does come with a header and .lib, but programs built with this can only be used by people that have 'activated' their cameras. I'm not sure if there's a way to 'activate' without buying credits, and that seems to be even more expensive. (Users can also use 'Multicam' programs if they purchase the SDK, choose to install the 'developer' library files, and make sure the 'distributable' dll is not on the path).

Make sure when playing around with different drivers that you properly uninstall all of the old drivers before installing a new one. I think it's possible to get into a state where you have libusb finding the CL Eye camera, and you can get video, but none of the parameter settings will work.

@cboulay
Collaborator

cboulay commented Jan 2, 2016

I tried running FRAPS on the test_opengl example and I get a locked 30FPS, which seems to be what I'm seeing in my own app (I have Vsync off). Before the tracker initializes my drawing loop is >2000fps.

In the UE4 plugin, we're communicating with psmoveapi in a secondary thread. That'll be necessary in any VR application until PSMoveService is ready (and maybe even then).

But, I think there might be a problem in PS3EYEDriver/libusb. I've found that switching the framerate between 30- and 60-fps doesn't make much of a difference, and PS3EYEDriver 60 fps is obviously lower than CL Eye 60 fps. There might be an easy win here if we can find the problem.

PS3EYEDriver was originally mac only. It was me that modified it to work in Windows, but I didn't check that all of the commands worked as expected. It seemed to work, so that was enough at the time. An easy place to test would be in the sdl test app.

@cboulay
Collaborator

cboulay commented Jan 2, 2016

In reply to my last comment: I just re-read what Brendan wrote above.

The Windows implementation of libusb driver doesn't support isochronous transfers and instead uses an event-based bulk transfer that ultimately blocks using WaitForSingleObject(), which has a hard time responding faster than about 20-30ms.

That would certainly explain why changing the framerate from 30 to 60 appears to do nothing. The 'easy win' is probably not so easy. Maybe libusb on Windows can be fixed? This would benefit a lot of people.

A quick search found this. The linked pull request at the beginning of that thread is 404'd but the two commits can be found here.

@brendanwalker
Contributor

A quick search found this. The linked pull request at the beginning of that thread is 404'd but the two commits can be found here.

That's super interesting. JoshBlake's commits adding isochronous transfers to libusb look pretty straightforward to incorporate. I did some searching to see who had used isochronous transfers in libusb, to get a guide for how to proceed. It turns out there are two interesting cases.

In both examples it just looks like you fill the transfer request a bit differently (more in flight packets) and the callback is a bit different (iterate over the iso packets). Other than that the ps3eye.cpp code wouldn't have to change too much.

The only thing that makes me worried is that even with isochronous transfers, the same wait-for-event function, libusb_handle_events_timeout_completed (which calls WaitForSingleObject internally), is still used. However, after reading a bit about how isochronous transfers work here, it sounds like they lock in a latency (as compared to bulk transfers, which have no latency guarantee). So I think this approach is certainly worth a try.

@cboulay
Collaborator

cboulay commented Jan 2, 2016

Maybe unrelated, I was reading other libusb issues and I noticed that one libusb contributor kept saying "that'll be fixed with an upcoming event abstraction merge". Here is his fork that contains those changes. I suspect that this abstraction would make it easier to support isoc in Windows.

I also noticed that the isoc support seems to be backend-dependent. JoshBlake's commits linked above seem to be only for the libusbK backend. There's another issue that references a fork that implemented a libusbdk backend, which supposedly supports isoc OOTB. I couldn't find anything else about libusbdk or how to install it.

@Ixstala
Author

Ixstala commented Jan 2, 2016

This sounds promising. Do you think you guys will be able to figure out how to add the ISOC support to speed up the video transfer?

@cboulay
Collaborator

cboulay commented Jan 3, 2016

This issue is worth reading, especially the comments by Timmmm. I'm going to ping him and see if he had any success.

@Ixstala
Author

Ixstala commented Jan 3, 2016

I was able to build your fork of psmoveapi with the CL eye SDK for x86 target in visual studio. I'm getting a solid 60FPS from the camera now, tracking is great - virtually no drift. Still some lag, but it's likely just a handful of frames.

For some reason CLEyeMultiCam.dll is missing from code laboratories latest SDK - maybe it's an oversight. Luckily I had it from a different API that I was exploring. There are two versions floating around, a 38k and 45k file. The 38k version worked.

@Timmmm

Timmmm commented Jan 3, 2016

Hey guys, I never had any success with UsbDk or libusbK. At the time I wasn't sure my device firmware was working correctly though, so maybe the host software was fine. For the device I'm using an Atmel SAM3X8E on an Arduino Due - one of the very few Cortex M chips that supports USB High Speed.

I did get both patches building though so I could send you the code if you like. One of the main things that scared me off UsbDk was that it totally broke USB when I installed it on my work laptop!

I eventually gave up on libusb entirely and just waited for Windows 10 to come out. I now have USB High Speed isochronous transfers working perfectly using the native WinUsb driver (example code here - scroll to Step 4). I'm using 1024-byte packets, with one packet per microframe so I get around 8 MB/s transfer speed. You can have up to 3 packets per microframe so you can get up to about 24 MB/s (exactly 24576000 B/s) but I haven't tested that.

I was planning to add native WinUsb ISO support to libusb after getting it to work, but I haven't got around to it, and probably won't because WinUsb has a much easier to use API (and probably someone else will do it eventually anyway).

I can send you any of the following code if you like:

  1. Working WinUsb code that sends ISO IN requests. It's very similar to Microsoft's example code.
  2. Working Arduino Due firmware that just sends dummy data to an ISO IN endpoint.
  3. Non-working (AFAIK) code that uses libusb, patched to support isochronous transfers via libusbK ("libusb-winiso").
  4. Non-working (AFAIK) code that uses libusb, patched to support isochronous transfers via UsbDk ("libusb-usbdk-backend-v3").

By the way another reason to avoid UsbDk or libusbK is that the driver signing has got more difficult in Windows 10. I'm not exactly sure of the situation - there are discussions on the libusb mailing list.

@cboulay
Collaborator

cboulay commented Jan 4, 2016

@Ixstala The CL SDK installer puts the DLL into C:\Windows\SysWOW64. During setup, the installer will ask you if you want to install the "distributable" or the "developer" library. The "distributable" one won't work unless your camera is "activated". The "developer" one should work without any driver installed, but you can't package a project with this library. We have the latest "distributable" DLL in our (still private) PSMoveService repo and it's ~80k.

@Timmmm This is going to be deprioritized for us for now but it is something that we are ultimately interested in fixing. I expect it will be useful in the future, so please send me (3) either to the e-mail I contacted you from or a link to a repo. As for (4), they are now up to v5... I wonder if there have been any improvements.

I'm going to ask someone else to take a look at this so the conversation might not be dead just yet.

@cboulay
Collaborator

cboulay commented Jan 4, 2016

@rovarma Based on your interest in psmoveapi for Windows, I thought you might be interested in this thread.

@rovarma

rovarma commented Jan 4, 2016

@cboulay Thanks for bringing this to my attention; this explains some of the issues I've been seeing as well (30 vs 60 FPS). I was planning on having a look at the problem in WPA, but haven't gotten around to it yet. I didn't realize libusb was using WaitForSingleObject internally, but that may indeed explain it.

When used with a timeout, WaitForSingleObject/WaitForMultipleObjects are dependent on the system timer resolution, which by default is 15.6ms, so depending on when the event is signalled (and the specific timeout used for the wait) within a system tick, it may wait up to ~15-30ms, which seems to fit with the delays @brendanwalker is seeing.
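To see why a 15.6 ms tick could pin a 60 fps camera near 30 fps, here is a deliberately simplified model of this hypothesis: assume every frame wait can only complete on a timer-tick boundary, so the frame interval is rounded up to whole ticks. Real waits are messier, and a later test in this thread did not confirm the expected gain; effective_fps is a hypothetical helper for illustration only.

```cpp
#include <cassert>
#include <cmath>

// Simplified model (assumption): a wait that should last frame_ms only returns on
// the next timer tick, so the observed frame interval is a whole number of ticks.
double effective_fps(double camera_fps, double tick_ms) {
    assert(camera_fps > 0.0 && tick_ms > 0.0);
    const double frame_ms = 1000.0 / camera_fps;          // ideal interval between frames
    const double ticks = std::ceil(frame_ms / tick_ms);   // waits can't end between ticks
    return 1000.0 / (ticks * tick_ms);                    // frames per second actually seen
}
```

Under this model, effective_fps(60, 15.625) gives 32 fps (each 16.7 ms frame wait stretches to two ticks, 31.25 ms), while a 1 ms tick gives about 58.8 fps.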

This problem (and related problems such as Sleep(1) sleeping for more than 1ms) are usually fixed by calling timeBeginPeriod(1) somewhere during application startup, which increases the timer resolution to 1ms (don't forget to call timeEndPeriod when done!).
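One way to make sure the matching timeEndPeriod call is never forgotten is an RAII guard. A minimal sketch, assuming you want 1 ms resolution for a whole scope such as main(): TimerResolutionGuard is a hypothetical helper, while timeBeginPeriod, timeEndPeriod and TIMERR_NOERROR are the real Win32 multimedia-timer API (link against winmm.lib). On non-Windows builds the guard compiles to a no-op.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#include <timeapi.h>  // timeBeginPeriod / timeEndPeriod; link against winmm.lib
#endif

// Raises the system timer resolution for the lifetime of the object and restores
// it on destruction, even on early returns. Does nothing on other platforms.
class TimerResolutionGuard {
public:
    explicit TimerResolutionGuard(unsigned period_ms) : period_ms_(period_ms) {
        assert(period_ms_ > 0);
#ifdef _WIN32
        active_ = (timeBeginPeriod(period_ms_) == TIMERR_NOERROR);
#endif
    }

    ~TimerResolutionGuard() {
#ifdef _WIN32
        if (active_) timeEndPeriod(period_ms_);  // must pass the same period as above
#endif
    }

    TimerResolutionGuard(const TimerResolutionGuard&) = delete;
    TimerResolutionGuard& operator=(const TimerResolutionGuard&) = delete;

    unsigned period_ms() const { return period_ms_; }

private:
    unsigned period_ms_;
    bool active_ = false;  // true only if timeBeginPeriod succeeded
};
```

Declaring TimerResolutionGuard timer_guard(1); as the first line of a test app's main() would be one way to run the experiment suggested here.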

I can't test it right now (it's late), but perhaps somebody in this thread can test calling timeBeginPeriod(1) in the main() of one of the test apps and see if that improves things? If it does, libusb may be just fine and we won't need to look at replacements (perhaps timeBeginPeriod(1) should be part of PSMoveAPI's init in that case?).

@Timmmm & @cboulay I'm not particularly familiar with the inner workings of USB, libusb or isochronous transfers, but @Timmmm mentions an upper limit of 24.576 MB/s. Is that configurable? If not, that's clearly insufficient for our purposes, since we need at least 640 * 480 * 2 bytes per pixel * 60 FPS = 36.864 MB/s to drive the PS3 camera at 60 FPS. Nevertheless, @Timmmm, I'd be interested in your changes for (1), in case my timeBeginPeriod suggestion above doesn't pan out.
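The comparison above is plain arithmetic and can be checked at compile time. A small sketch, assuming uncompressed 2-byte-per-pixel frames and decimal megabytes (10^6 bytes); video_bytes_per_sec, kIsoHighSpeedMaxBytesPerSec and fits_in_budget are hypothetical helpers, and the isochronous ceiling is the 3 x 1024-byte-per-microframe figure Timmmm quotes.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed video bandwidth in bytes per second.
constexpr std::uint64_t video_bytes_per_sec(std::uint64_t width, std::uint64_t height,
                                            std::uint64_t bytes_per_pixel, std::uint64_t fps) {
    return width * height * bytes_per_pixel * fps;
}

// High-speed isochronous ceiling: 3 packets * 1024 bytes per microframe,
// 8000 microframes per second.
constexpr std::uint64_t kIsoHighSpeedMaxBytesPerSec = 3ULL * 1024 * 8000;

// True if a stream needing `required` bytes/s fits within `budget` bytes/s.
bool fits_in_budget(std::uint64_t required, std::uint64_t budget) {
    assert(budget > 0);
    return required <= budget;
}

static_assert(video_bytes_per_sec(640, 480, 2, 60) == 36864000ULL,
              "640x480, 2 bytes/pixel, 60 fps = 36.864 MB/s");
static_assert(kIsoHighSpeedMaxBytesPerSec == 24576000ULL, "24.576 MB/s");
```

So 640x480@60 needs about 1.5 times the isochronous maximum, which is why this line of thinking shifts toward bulk transfers.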

@cboulay
Collaborator

cboulay commented Jan 4, 2016

@rovarma Can you also send me an e-mail chadwick.boulay at gmail.com ? I'd like to ask you about something else.

@Timmmm

Timmmm commented Jan 5, 2016

@rovarma Quick explanation of USB 2 speeds:

USB 2 "High Speed" (480 Mb/s) transfers occur in 1 ms frames, which are divided into 8 microframes (125 us each). With an isochronous transfer you can send up to 3 packets per microframe, and the packets can be up to 1024 bytes each. That gives you a maximum speed of 1000 * 8 * 3 * 1024 = 24,576,000 bytes/s, or about 24 MB/s.

Bulk transfers can be faster because they can theoretically have up to 13 512-byte packets per microframe, but the bandwidth is not guaranteed (for iso transfers it is). In practice, bulk transfers can get up to around 40 MB/s.

The packet size and number of packets per microframe are decided by the device, although I believe some devices provide several alternate interfaces with different sizes, so you may be able to choose. It is easy to see if this is the case by plugging in the device and using UsbView (on Windows; lsusb on Linux). Look at the wMaxPacketSize field (it also encodes the number of packets per microframe) and bInterval (if >1 then not every microframe is used).
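The two descriptor fields mentioned here can be decoded mechanically. A sketch following the USB 2.0 encoding for high-speed isochronous endpoints: bits 10..0 of wMaxPacketSize hold the packet size, bits 12..11 hold the number of additional transactions per microframe, and the service period is 2^(bInterval-1) microframes. IsoEndpointBudget and decode_high_speed_iso are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct IsoEndpointBudget {
    unsigned packet_bytes;            // bits 10..0 of wMaxPacketSize
    unsigned packets_per_microframe;  // 1 + bits 12..11 of wMaxPacketSize
    unsigned period_microframes;      // 2^(bInterval - 1) for high-speed iso endpoints

    // A high-speed bus has 8000 microframes per second (result rounded down).
    std::uint64_t bytes_per_sec() const {
        return std::uint64_t(packet_bytes) * packets_per_microframe * 8000 / period_microframes;
    }
};

IsoEndpointBudget decode_high_speed_iso(std::uint16_t wMaxPacketSize, std::uint8_t bInterval) {
    assert(bInterval >= 1 && bInterval <= 16);  // valid range for high-speed iso
    IsoEndpointBudget b;
    b.packet_bytes = wMaxPacketSize & 0x07FF;
    b.packets_per_microframe = 1 + ((wMaxPacketSize >> 11) & 0x3);
    b.period_microframes = 1u << (bInterval - 1);
    return b;
}
```

Feeding in the PS3 Eye values reported later in this thread (wMaxPacketSize 0x0300, bInterval 4) yields 768-byte packets every 8 microframes, i.e. 768,000 bytes/s, while 0x1400 with bInterval 1 reproduces the 24,576,000 bytes/s maximum.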

More info here: https://msdn.microsoft.com/en-us/library/windows/hardware/ff539317%28v=vs.85%29.aspx

If the PS3 camera can actually output 640x480x2@60fps I'd guess they are using some form of compression (or a non-standard USB implementation). Modern USB webcams all use on-board H.264 encoders to allow them to output HD video over USB 2.

@rovarma

rovarma commented Jan 6, 2016

@Timmmm Thanks for the information. I've been doing some reading about isochronous transfers (in addition to your information) and I think I have a better understanding now.

Looking at the configuration of the isochronous endpoint on the PS3 Eye:

  • wMaxPacketSize is 0x0300 (ie. 768 bytes) with 1 packet per microframe
  • bInterval is 4

If I understand the meaning of bInterval correctly, this means that the driver will poll the camera every 8 microframes for new data (ie. every 1ms), leading to a theoretical max bandwidth of 768 bytes * 8 microframes * 1000ms = 6.144 MB/s on the isochronous endpoint.

This is clearly not enough to drive the video stream at 640x480@60 FPS (by far), which leads me to suspect that the isochronous endpoint on the PS3 Eye is intended to stream audio data (it also has a microphone array built in).

Please let me know if I misunderstood something; I am not an expert on USB.

Regarding the output of the video stream: the PS3 Eye outputs YUV422 (i.e. 2 bytes per pixel), so 36.864 MB/s of bandwidth is definitely needed to stream 640x480@60 FPS.

@cboulay @brendanwalker Given all of the above, I don't think usage of isochronous transfers will work for the video output and bulk transfer will need to keep being used. It's again late, so I'll try to have a look at whether using timeBeginPeriod in one of the test apps makes a difference tomorrow.

@brendanwalker
Contributor

@rovarma Ahh I bet you are totally right about the iso endpoint being for the mic. I just did a quick test with timeBeginPeriod in psmoveapi and ran locally with the test_tracker app. It /seems/ snappier but I don't have any hard numbers yet. I have some watchdog timers implemented in psmove-ue4 on the psmove worker thread (in particular around the camera update call). I can do a before and after test to see if update perf is helped there tonight.

@cboulay
Copy link
Collaborator

cboulay commented Jan 7, 2016

https://github.com/cboulay/psmoveapi now has its PS3EYEDriver submodule pointing to the modified version. Brendan, if you'd like to do it tonight, can you build the dll's then drop them into psmove-ue4 and create a pull request? Otherwise I should have time to do it tomorrow.

@brendanwalker
Contributor

Yup I was just working on that now. I'll update with a link when I get that in.

@brendanwalker
Contributor

I just did a test in both psmove-ue4 and with psmoveapi with timeBeginPeriod(1) added to ps3eye.cpp. Sadly I'm still getting frame update rates between 30-50fps. I added some timing code in psmove_tracker_update_image() where we read the frame from the ps3eye code just to be sure. At first I thought the issue might be that we forgot to drop the timeout value from 50ms to something lower on the call to libusb_handle_events_timeout_completed (it was 50ms) but lowering it to 5ms did nothing. So unless I'm testing this wrong, this fix doesn't appear to give the perf gains we hoped.
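Timing of the kind described here can be as simple as timestamping each frame the camera read actually returns. A minimal sketch: FrameRateMeter is a hypothetical helper, not part of psmoveapi, and you would call on_frame(Clock::now()) right after a new frame is fetched in psmove_tracker_update_image() or an equivalent spot. Averaging over a window smooths out the jitter of individual frames.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Rolling frame-rate meter over the last `window` frame timestamps.
class FrameRateMeter {
public:
    using Clock = std::chrono::steady_clock;

    explicit FrameRateMeter(std::size_t window) : window_(window) { assert(window >= 2); }

    // Record the arrival time of one new camera frame.
    void on_frame(Clock::time_point t) {
        stamps_.push_back(t);
        if (stamps_.size() > window_) stamps_.pop_front();
    }

    // Average frames per second across the window; 0 until two frames have arrived.
    double fps() const {
        if (stamps_.size() < 2) return 0.0;
        const std::chrono::duration<double> span = stamps_.back() - stamps_.front();
        if (span.count() <= 0.0) return 0.0;
        return (stamps_.size() - 1) / span.count();
    }

private:
    std::size_t window_;
    std::deque<Clock::time_point> stamps_;
};
```

Logging fps() once per second, before and after a change such as timeBeginPeriod(1), gives a number to compare instead of a subjective "snappier".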

@rovarma

rovarma commented Jan 7, 2016

Ah...that's unfortunate. I'll do some digging with WPA tonight to see if I can find something. Thanks for testing.

@Ixstala
Author

Ixstala commented Jan 9, 2016

How does the CL Eye driver achieve 60FPS if there isn't an ISOC endpoint for the video camera? Maybe the camera frame rate/video mode is not being set properly with libusb?

@Timmmm

Timmmm commented Jan 10, 2016

@rovarma Yep that sounds right to me, although if bInterval is 4, and there is 1 packet per microframe it actually means that it sends 1 packet per 8 microframes, giving a data rate of 768 000 bytes/second.

Sounds about right for audio - two channels at 48 kHz, 16 bit is 192 000 bytes/second.

@Ixstala If they use bulk endpoints you can get up to about 40 MB/s. 640 x 480 x 2 x 60 is about 36 MB/s so they could do it that way (especially if they control the hardware and can put the camera on its own bus).

@cboulay
Collaborator

cboulay commented Jan 25, 2016

About that previous comment, can you also change main.cpp to use 640x480@60fps instead of its current 320x240@187fps?

Another thought: I'm guessing it's impossible to make the changes you outline without changing the PS3EYEDriver API. Is that correct? If so, then can you open an issue on the Inspirit PS3EYEDriver repository outlining your proposed changes? If getting the high throughput is worth it then they might consider changing the API in the upstream repository. That way, more people will benefit from your work and, more importantly, the burden of supporting these changes will be shared by more people.

@brendanwalker
Contributor

@cboulay I just tried building ps3eye_sdl last night with the latest from your rovarma-optimization branch (I used manually set up MSVC project files rather than MinGW). I got 640x480@60fps (which is awesome!) but am seeing a lot of flickering artifacts on my Win10 home machine. I did get these flickering artifacts before, but they were less frequent. I'll upload a video tonight and update this post so you can see what I mean. Is this something you guys have seen? I suspect this might be an issue with my local USB setup (too many devices on the same root hub).
@rovarma What is the usb hardware sniffer you are using? I'd love to try and analyze what other traffic might be interfering with my camera's usb packets.

@cboulay
Collaborator

cboulay commented Jan 25, 2016

I decided to setup a MinGW build system anyway. I was able to get the original PS3EYEDriver SDL program to compile (no rovarma optimizations). That went @ 60fps (!) but with frequent flickering. It's a bit surprising that it went at 60 fps, but this was on a high-spec desktop (i7, very high specs for 1 yr ago). I also tested on my Macbook Pro (2 yrs old), and it also went at 60 fps. I don't know whether to attribute that to better libusb handling in Mac or to the computer specs. But, what I'm getting at is that I can't reproduce < 60 fps performance on either of my computers except when I use OpenCV to display images (or maybe when I'm using more resources with psmoveapi and a game engine, but I never profiled that).

I tried to build the version in rovarma-optimization. I wanted to see if there was an improvement in the flickering. But I was getting build errors. @rovarma , have you tried a mingw32-make of the sdl example in your optimization branch?

Oh, and I answered my previous questions about the SDL2 include. Yes, if we remove the README instructions to move the include files around then both Windows and OS X can #include <SDL.h> and not <SDL2/SDL.h>. I made a pull request to inspirit's repo with these changes.

@rovarma

rovarma commented Jan 25, 2016

Oof, a lot of things to respond to here! Let me know if I missed something :)


Another thought: I'm guessing it's impossible to make the changes you outline without changing the PS3EYEDriver API. Is that correct?

Yes, the API changes are unfortunately required. The reason is that in order to associate a specific libusb_context with a specific libusb_device, the libusb_context passed to libusb_get_device_list must match the libusb_context you wish to associate the device with.

The old API was using a single context to enumerate all the devices once, and then returning that as an array of PS3EYERef, which caused all created devices to share the same context. This is normally fine, until you want to use multiple cameras simultaneously, then it breaks down. Internally all libusb_contexts share locking state, send/receive descriptors etc; this is very poorly documented (like most of libusb, really quite frustrating).

If so, then can you open an issue on the Inspirit PS3EYEDriver repository outlining your proposed changes? If getting the high throughput is worth it then they might consider changing the API in the upstream repository. That way, more people will benefit from your work and, more importantly, the burden of supporting these changes will be shared by more people.

Yes, I was planning to do that anyway, but I wanted to wait until it had been further tested, before pushing this to any mainline branch.

I did get these flickering artifacts before, but they were less frequent. I'll upload a video tonight and update this post so you can see what I mean. Is this something you guys have seen? I suspect this might be an issue with my local USB setup (too many devices on the same root hub).

It depends on what you mean by 'flickering'. Is it actual frame corruption? Or does it look like your frame is having the right/left sides flipped (basically flip around the vertical axis) every other frame? I suppose your video will illustrate it more clearly, but if it's the second case, I have in fact seen that before and that was due to a bug in the software (psmoveapi) rather than in the driver/camera/usb hardware.

As for the USB hardware sniffer, I used this thing, mostly because I could borrow one from work. But I am not sure it will help with multiple devices on a single controller interfering with each other; it's an inline analyzer that you put between your device and the port it's connected to, and as such it will only capture data on that line.

Do note that I am far from an expert on USB performance analysis; about the only thing I got from the hardware analyzer was that the device did not seem to be sending packets at the rate I would expect, which led me to read about how the USB protocol actually works in the first place. The optimization to increase the number of simultaneous transfers was more a reasoned guess based on the protocol documentation than anything the analyzer proved.

Before you go the analyzer route, here's some things you might try:

  1. You mention that you have a lot of devices attached to the same hub. Is it possible to just disconnect everything but the camera and see if it goes away?
  2. I used this USB software analyzer to generate the USB bandwidth graphs in my original post. It has a free trial, and running it should at least give you a good idea of where the problem lies; if it's reporting a bandwidth of ~36 MB/s, then it's a fair assumption that there's actually nothing wrong with your controller (it's getting 60 FPS from the camera at that rate). It also shows the USB protocol packets, so that may even give you an idea about interference from other devices.
  3. Are you using any USB extension cables between the camera and your PC by any chance? If so, did you try plugging it in directly to see if that fixes any issues? I did have some flakiness with extension cables a while ago, which were solved by switching to active USB extension cables.

That went @ 60fps (!) but with frequent flickering. It's a bit surprising that it went at 60 fps, but this was on a high-spec desktop (i7, very high specs for 1 yr ago). I also tested on my Macbook Pro (2 yrs old), and it also went at 60 fps. I don't know whether to attribute that to better libusb handling in Mac or to the computer specs. But, what I'm getting at is that I can't reproduce < 60 fps performance on either of my computers except when I use OpenCV to display images (or maybe when I'm using more resources with psmoveapi and a game engine, but I never profiled that).

Yes, the bad performance is far from reproducible. My main desktop has zero issues running at 60 FPS, but it's a pretty beefy machine. I suspect that's because the 'outer' loop (the one that calls psmove_tracker_update) is so fast that it can actually queue the 2 transfers back to back, so no bandwidth is lost, but I am not 100% sure about that. My two other test machines get nowhere near the required bandwidth, even though they're not 10-year-old PCs or anything.

Bottom line: I don't think my optimizations are strictly required in all cases, but they do make it more reliable in the general, wildly-varying-hardware case.

have you tried a mingw32-make of the sdl example in your optimization branch?

No, I have not. I exclusively work with MSVC, but I can try setting up a MingW buildsystem. Do you happen to have some instructions somewhere that I can follow? I'm pretty stuck in my happy windows workflow :)

@cboulay
Collaborator

cboulay commented Jan 25, 2016

No, I have not. I exclusively work with MSVC, but I can try setting up a MingW buildsystem. Do you happen to have some instructions somewhere that I can follow? I'm pretty stuck in my happy windows workflow :)

Instructions are here. But, as your Windows builds use MS Win APIs, you'd have to change your platform checks to distinguish MSVC vs. everything else, instead of WIN32 vs. everything else.
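As a rough illustration of that kind of check: `_MSC_VER` is defined by MSVC (and MSVC-compatible compilers such as clang-cl) but not by MinGW's GCC, whereas `_WIN32` is defined by MinGW as well, which is why a plain WIN32 guard routes MinGW builds into MSVC-only code. The `platform_branch()` helper below is hypothetical and only reports which branch was compiled; real driver code would put its platform-specific primitives in those branches.

```cpp
// Hypothetical helper, for illustration only: reports which branch of a
// compiler-based platform check this translation unit was built with.
inline const char *platform_branch() {
#if defined(_MSC_VER)
    // MSVC (or an MSVC-compatible compiler): MSVC-specific code is safe here.
    return "msvc";
#else
    // GCC, Clang and MinGW take this branch. MinGW also defines _WIN32,
    // so a WIN32-based check would have sent it down the MSVC path.
    return "other";
#endif
}
```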

Alternatively, I think you can save a lot of cross-platform work by using std::mutex. Recent versions of MSVC have good C++11 support, as do Xcode and some freely available MinGW builds. Here is an example of semaphores in C++11.
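The linked example isn't reproduced here, but a minimal sketch of a counting semaphore built from `std::mutex` and `std::condition_variable` looks roughly like this. The class name and method names are illustrative assumptions, not the driver's own API; a standard `std::counting_semaphore` only arrived in C++20.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Minimal counting semaphore using only C++11 primitives (illustrative sketch).
class Semaphore {
public:
    explicit Semaphore(int initial = 0) : count_(initial) { assert(initial >= 0); }

    // Increment the count and wake one waiting thread, if any.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }

    // Block until the count is positive, then decrement it.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // Non-blocking variant: decrement and return true only if the count was positive.
    bool try_wait() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_;
};
```

The predicate overload of `wait` handles spurious wakeups, so no hand-written retry loop is needed.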

@rovarma

rovarma commented Jan 26, 2016

Thanks, I'll have a look at MingW tonight.

I actually originally tried using the std threading support, but it wouldn't compile under MSVC. I was getting a lot of compile errors in <ratio> about not being able to find types that should've been defined in <stdint.h> but somehow weren't.

I tried to get that to work for several hours before I gave up.

@brendanwalker
Contributor

@rovarma I gave Device Monitoring Studio a try while running my local version of ps3eye_sdl with various USB cable configurations. I originally had my PS3 Eye plugged into a USB3 hub with nothing else on it. To make sure I wasn't getting interference from other devices, I pulled out everything except my keyboard and mouse and plugged the PS3 Eye directly into the back of my PC. I was still getting between 28 and 36 MB/s from the camera when running at 640x480x60fps:

[screenshot: USB bandwidth graph]

I tried the other resolutions and frame rates (320x240x60, 320x240x187, and 640x480x30) to see if they showed the same variability, but they were always stable (at 8, 28, and 18 MB/s respectively).
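These numbers line up with a back-of-the-envelope calculation, assuming the camera streams uncompressed YUV422 (2 bytes per pixel, matching the yuv422 buffers mentioned elsewhere in this thread) and ignoring USB protocol overhead. The function names are illustrative.

```cpp
#include <cstdint>

// Raw payload rate in bytes per second for an uncompressed video stream.
// Assumes YUV422 (2 bytes per pixel) by default; USB overhead is ignored.
constexpr std::uint64_t stream_bytes_per_sec(std::uint64_t width, std::uint64_t height,
                                             std::uint64_t fps,
                                             std::uint64_t bytes_per_pixel = 2) {
    return width * height * bytes_per_pixel * fps;
}

// The same rate expressed in MiB/s (1 MiB = 1024 * 1024 bytes).
constexpr double stream_mib_per_sec(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t fps) {
    return static_cast<double>(stream_bytes_per_sec(width, height, fps)) / (1024.0 * 1024.0);
}
```

For 640x480 at 60 fps this gives 36,864,000 bytes/s (about 35.2 MiB/s); 320x240 at 187 fps, 640x480 at 30 fps and 320x240 at 60 fps come out to roughly 27.4, 17.6 and 8.8 MiB/s, close to the ~28, 18 and 8 MB/s the analyzer reported.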

Then after removing a sleep statement in the frame polling loop I got a stable 36 MB/s:

[screenshot: USB bandwidth graph]

So it seems to me that either the sleep was randomly taking longer than 10 ms, or falling behind on reading frames somehow chokes up the event polling?

@rovarma

rovarma commented Jan 26, 2016

@brendanwalker The fork you linked to appears to contain the vanilla version of PS3EYEDriver (i.e., without my changes). Is that the version you ran the USB monitor against?

If so, then what you're seeing makes complete sense: in that version, the FPS you get depends directly on how frequently you call PS3EYECam::updateDevices, since that drives the libusb event loop. If you don't call it frequently enough, libusb will not queue its transfers back to back, so you lose bandwidth (the camera will not be sending data).

Part of the change I made in my fork was specifically to cut this dependency; the libusb event loop is now driven from a separate thread, so the program consuming the frames does not have to worry about these details and can simply get frames as fast (or as slowly) as needed.
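The decoupling described above can be sketched as a small class that owns one background thread and calls a "pump" function in a loop until it is destroyed. This is a hypothetical illustration, not the code in the fork: in a real driver the pump would presumably be a libusb call with a timeout, such as `libusb_handle_events_timeout_completed()`, which waits for and dispatches completed transfers; here it is a plain callable so the sketch runs on its own.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical sketch: runs `pump` repeatedly on a dedicated thread from
// construction until destruction, so frame consumers never have to drive
// the USB event loop themselves.
class EventPumpThread {
public:
    explicit EventPumpThread(std::function<void()> pump)
        : pump_(std::move(pump)), thread_([this] { run(); }) {}

    ~EventPumpThread() {
        exit_requested_ = true;  // ask the loop to finish its current iteration
        thread_.join();
    }

    EventPumpThread(const EventPumpThread &) = delete;
    EventPumpThread &operator=(const EventPumpThread &) = delete;

private:
    void run() {
        while (!exit_requested_) {
            pump_();  // must return periodically (e.g. a call with a timeout)
        }
    }

    // Declaration order matters: pump_ and exit_requested_ are initialized
    // before thread_ starts executing run().
    std::function<void()> pump_;
    std::atomic<bool> exit_requested_{false};
    std::thread thread_;
};
```

Because the pump has its own thread, a consumer that sleeps or renders slowly only affects how often it picks up frames, not how often the USB transfers are serviced.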

Regarding the sleep resolution: sleep on Windows is actually pretty precise, but only as precise as the system timer resolution (the time you specify is quantized to the nearest system timer interval).
You can set the system timer resolution to 1ms by using timeBeginPeriod(1) as I described earlier in this thread. After setting it, you should not be seeing the variation you were seeing before anymore.
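A quick way to observe this is to time a nominal 10 ms sleep (the same value as the SDL_Delay(10) discussed in this thread). The sketch below is illustrative: on Windows it wraps the measurement in `timeBeginPeriod(1)` / `timeEndPeriod(1)` (declared via `<mmsystem.h>`, linked from winmm), and elsewhere it simply measures the sleep. Without the 1 ms period, Windows sleeps are typically rounded to the default timer interval of about 15.6 ms.

```cpp
#include <chrono>
#include <thread>
#if defined(_WIN32)
#include <windows.h>
#include <mmsystem.h>  // timeBeginPeriod / timeEndPeriod; link against winmm
#endif

// Measures how long a requested sleep actually takes, in milliseconds.
inline double measure_sleep_ms(int requested_ms) {
#if defined(_WIN32)
    timeBeginPeriod(1);  // request 1 ms system timer resolution
#endif
    auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(requested_ms));
    auto elapsed = std::chrono::steady_clock::now() - start;
#if defined(_WIN32)
    timeEndPeriod(1);    // every timeBeginPeriod needs a matching timeEndPeriod
#endif
    return std::chrono::duration<double, std::milli>(elapsed).count();
}
```

Calling `measure_sleep_ms(10)` in a loop with and without the `timeBeginPeriod` call is a simple way to see whether the variation disappears on a given machine.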

Bruce Dawson has a pretty good post explaining the sleep behaviour over here.

Does removing the sleep also fix the flickering issue you were seeing? Or did that only show up when using my fork?

@cboulay
Collaborator

cboulay commented Jan 26, 2016

@rovarma Did you try MSVC 2015? (It's actually supposed to be there as of MSVC 2012, though buggier in older versions).
Also, what time zone are you in? I'm EST (UTC-5) and Brendan is in PST (UTC-8) but he likes to work late.

@rovarma

rovarma commented Jan 26, 2016

Nope, I'm using 2013 at home. I've been considering the switch to 2015 (have been using it at work for about a year now), just haven't gotten around to it yet.

However, I believe it indeed should work with 2013; it's just some weird include-ordering issue. Googling the error will turn up some people who 'fixed' it by fudging with the include ordering, but I couldn't get it to work. I can't remember the exact error I had off the top of my head, but if you #include <thread> in ps3eye.cpp you should see it.

I'm in GMT+1 (Europe).

@brendanwalker
Contributor

@rovarma Some updates on my testing

The fork you linked to appears to contain the vanilla version of PS3EYEDriver (ie. without my changes). Is that the version you ran the usb monitor against?

Ah oops, I meant to link to my "rovarma-optimization" branch that includes your changes:

https://github.com/brendanwalker/PS3EYEDriver/tree/rovarma-optimization

This version has the updateDevices call commented out in the update loop of the test app as well as your other changes I brought over:

https://github.com/brendanwalker/PS3EYEDriver/blob/rovarma-optimization/sdl/main.cpp#L114

This was the version that I used to get those graphs from. I also finally got around to recording me running my build of ps3eye_sdl:

https://youtu.be/2cVb4bHNKuQ

In addition to watching me be a spaz, you can see there are a few minor hitches in the video, but overall it holds at a solid 60fps, with that sleep() commented out:

https://github.com/brendanwalker/PS3EYEDriver/blob/rovarma-optimization/sdl/main.cpp#L206

Independently of this, I tried incorporating your ps3eye changes into my branch of psmoveapi used for psmove-unity5. The test_tracker and test_opengl tools in that fork are definitely snappier than they were before. I then incorporated that new version of psmoveapi into a branch of psmove-unity5 to test, and it's looking good there too:

https://youtu.be/qI_lyUydsbc

On a side note, I noticed that if I closed the ps3eye_sdl test I was getting a consistent double-delete crash in the Semaphore::Destroy() call here:

https://github.com/brendanwalker/PS3EYEDriver/blob/rovarma-optimization/src/ps3eye.cpp#L141

I added a check to make sure it wouldn't crash if you call that method twice, but I'm not sure that's the right place for the check. It seemed to fix the crash for me, though.
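One common way to make a cleanup method safe to call twice is to clear the handle after the first release, so any later call is a no-op. The class below is a hypothetical stand-in (an `int*` plays the role of the OS semaphore handle), not the driver's `Semaphore` class. It hides the double call rather than explaining it, so finding out who calls `Destroy()` twice may still be worthwhile.

```cpp
// Hypothetical wrapper: Destroy() is idempotent because the handle is
// reset to nullptr after it has been released once.
class OwnedHandle {
public:
    OwnedHandle() : handle_(new int(42)) {}
    ~OwnedHandle() { Destroy(); }

    OwnedHandle(const OwnedHandle &) = delete;
    OwnedHandle &operator=(const OwnedHandle &) = delete;

    void Destroy() {
        if (handle_ == nullptr) return;  // already destroyed: nothing to do
        delete handle_;
        handle_ = nullptr;               // later calls become no-ops
    }

    bool valid() const { return handle_ != nullptr; }

private:
    int *handle_;
};
```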

@rovarma

rovarma commented Jan 28, 2016

Hmm. Thanks for the info. That's really weird to me.

If you're using my version, I really wouldn't expect the sleep in the main loop to impact the bandwidth of the camera at all. The entire point of my change is that the data transfer from the camera is decoupled from the rate at which frames are actually consumed.

With that sleep there, what I would expect is that you'd potentially present at < 60 FPS, but that the driver would continue grabbing at 60 FPS, so maintaining a steady 36 MB/s. That's clearly not what's happening for you though.

I do notice that the SDL example is also polling the frames from a separate thread, which is then polled again by the main thread. Perhaps there's some weird interaction going on there? I'll try to have a look at your branch tonight and see if I can reproduce anything.

About the frame corruption: Does the frame corruption appear only in the SDL test app? Or does it also appear in, say, test_tracker? If it's only in the SDL app, I would imagine it's simply due to the fact that the code is not thread safe.

There is a thread memcpy'ing data from the driver into the ctx->buffers data structure, while the main thread is simultaneously memcpy'ing data from ctx->buffers into the framebuffer. This is all completely unsynchronized, so it's entirely possible for the main thread to present a half-copied buffer (or other random garbage), which would look like corrupted frames.

I would recommend removing that separate polling thread in the SDL app: simply get the frames directly from the driver (you don't even need the various yuv422_buffer_t structures anymore) and present those. If you make this change, I wonder if the USB bandwidth drop is still present when using SDL_Delay(10). Can you test this (if you're still awake)?
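If the extra polling thread were kept instead, the minimal fix would be to put both memcpys behind the same lock so the reader can never see a half-written frame. The `SharedFrame` class below is a hypothetical sketch of that idea, not code from the SDL app.

```cpp
#include <cstdint>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical shared frame slot: the capture thread publishes complete frames
// and the render thread copies them out, both under one mutex, so a torn
// (half-copied) frame can never be observed.
class SharedFrame {
public:
    explicit SharedFrame(std::size_t frame_bytes) : data_(frame_bytes) {}

    // Producer side: copy a complete frame in.
    void publish(const std::uint8_t *src) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::memcpy(data_.data(), src, data_.size());
        ++sequence_;
    }

    // Consumer side: copy the latest complete frame out; returns its sequence number.
    std::uint64_t read(std::uint8_t *dst) const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::memcpy(dst, data_.data(), data_.size());
        return sequence_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint8_t> data_;
    std::uint64_t sequence_ = 0;
};
```

The returned sequence number lets the consumer skip re-presenting a frame it has already shown. Removing the extra thread, as suggested above, avoids the need for any of this.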

Glad to hear that the FPS is otherwise nice and steady at 60 FPS though!

@rovarma

rovarma commented Jan 28, 2016

BTW @cboulay @brendanwalker I had a think about the code structure of the new driver some more and I think that with the change that the producer thread will no longer block if the buffer is full, I can simplify the change quite a bit, keeping it very close to the original driver API. I'll try and test this tonight in a new branch.

@rovarma

rovarma commented Jan 28, 2016

@cboulay Just tried including <thread> in ps3eye.cpp again. These are the errors I'm seeing:

C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(122): error C2065: 'INTMAX_MAX' : undeclared identifier
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(133) : see reference to class template instantiation 'std::ratio<_Nx,_Dx>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(124): error C2065: 'INTMAX_MAX' : undeclared identifier
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(44): error C2065: 'INTMAX_MAX' : undeclared identifier
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(217) : see reference to class template instantiation 'std::_Safe_mult<0x01,0x01>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(36): error C2338: integer arithmetic overflow
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(44) : see reference to class template instantiation 'std::_Safe_multX<0x01,0x01,false>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(44): error C2039: 'value' : is not a member of 'std::_Safe_multX<0x01,0x01,false>'
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(44): error C2065: 'value' : undeclared identifier
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(44): error C2057: expected constant expression
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(44): error C2039: 'value' : is not a member of 'std::_Safe_multX<0x01,0x0989680,false>'
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(219): error C2975: '_Nx' : invalid template argument for 'std::ratio', expected compile-time constant expression
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(116) : see declaration of '_Nx'
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(219): error C2975: '_Dx' : invalid template argument for 'std::ratio', expected compile-time constant expression
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\ratio(117) : see declaration of '_Dx'

Including <thread> in a different solution/project works just fine, however, so it has to be something specific to either ps3eye.cpp or the psmoveapi project configuration. I'll try to get this to work so I can ditch the homegrown threading primitives.

In addition, as I mentioned in my post above, I think with the change of FrameQueue::Enqueue no longer blocking if the buffer is full, I can get rid of the thread-per-camera approach and just run the event loop in a single thread. I need to test this performance-wise (especially in a multi-camera scenario), but if it works, it should simplify the change quite a lot; the API would remain largely the same. That, in turn, will probably help with upstream adoption.
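A non-blocking Enqueue could look roughly like the bounded ring buffer below: the USB side never waits (when the queue is full it drops the oldest frame), while Dequeue blocks until a frame is available. This is a hypothetical sketch; the drop-oldest policy is an assumption, and dropping the newest frame instead would also keep the producer from blocking.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical bounded frame queue: Enqueue never blocks, Dequeue waits.
class FrameQueue {
public:
    FrameQueue(std::size_t capacity, std::size_t frame_bytes)
        : frames_(capacity, std::vector<std::uint8_t>(frame_bytes)), frame_bytes_(frame_bytes) {
        assert(capacity > 0);
    }

    // Producer (USB thread): copy a frame in without ever waiting.
    void Enqueue(const std::uint8_t *src) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (count_ == frames_.size()) {  // full: drop the oldest frame
                head_ = (head_ + 1) % frames_.size();
                --count_;
            }
            std::size_t tail = (head_ + count_) % frames_.size();
            std::memcpy(frames_[tail].data(), src, frame_bytes_);
            ++count_;
        }
        available_.notify_one();
    }

    // Consumer: wait until a frame is queued, then copy the oldest one out.
    void Dequeue(std::uint8_t *dst) {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return count_ > 0; });
        std::memcpy(dst, frames_[head_].data(), frame_bytes_);
        head_ = (head_ + 1) % frames_.size();
        --count_;
    }

private:
    std::vector<std::vector<std::uint8_t>> frames_;
    std::size_t frame_bytes_;
    std::size_t head_ = 0;   // index of the oldest queued frame
    std::size_t count_ = 0;  // number of queued frames
    std::mutex mutex_;
    std::condition_variable available_;
};
```

Since no camera's Enqueue can stall, one event thread can service several cameras without a slow consumer on one camera holding up the others.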

I hope to be able to test this over the weekend. I'd recommend holding off on merging my current optimization branch into any of your projects until I get this part sorted; the resulting code/change is likely to be much smaller and cleaner.

@brendanwalker I tried out your fork of ps3eye_sdl (with my optimizations) and I cannot reproduce the USB bandwidth drops when putting the SDL_Delay(10) in. However, I do get some occasional frame corruption. Turning off the thread polling the camera in main.cpp and just polling the camera in the mainloop directly makes this go away for me.

The fixed main loop looks like this:

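while (ctx.running) {
    // Drain pending SDL events so the window stays responsive and can be closed.
    while (SDL_PollEvent(&e)) {
        if (e.type == SDL_QUIT) {
            ctx.running = false;
        }
    }

    // TODO: Proper thread signalling to wait for next available buffer
    //SDL_Delay(10);

    // Fetch the next frame directly from the driver on the main thread;
    // the returned buffer is heap-allocated and is freed below.
    uint8_t *new_pixels = ctx.eye->getFrame();

    // Copy the frame into the streaming texture. This assumes the texture
    // pitch equals getRowBytes(); otherwise copy row by row using `pitch`.
    SDL_LockTexture(video_tex, NULL, &video_tex_pixels, &pitch);
    memcpy(video_tex_pixels, new_pixels, ctx.eye->getRowBytes() * ctx.eye->getHeight());
    SDL_UnlockTexture(video_tex);

    free(new_pixels);

    SDL_RenderCopy(renderer, video_tex, NULL, NULL);
    SDL_RenderPresent(renderer);
}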

Can you try and see if this also fixes the corruption you're seeing? I'm also curious if the USB bandwidth drops still happen if you switch to this main loop and put back the SDL_Delay.

@brendanwalker
Contributor

@rovarma I updated my version of ps3eye_sdl to rip out the thread polling in main.cpp and just call ctx.eye->getFrame() directly, as you suggested. That completely fixes the frame corruption I was seeing. I also added back the SDL_Delay(10), and it doesn't seem to be a problem anymore. I should note that since I last tested this, I added a new USB 3.0 expansion card that the PS3 Eye is now connected to. Not sure whether that should make a difference (I was plugged into another USB 3.0 port on my motherboard previously). In any event, everything seems to be running well now. These changes have been checked into my branch.

@rovarma

rovarma commented Jan 30, 2016

@brendanwalker Cool, that's good to hear. I'm currently working on simplifying the change, should be done soon.

BTW, you probably want to free(new_pixels); in the loop somewhere; you're leaking 36 MB/s now ;)
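An alternative to remembering the free() call is to wrap the buffer in `std::unique_ptr` with a free()-calling deleter, so it is released automatically at the end of each loop iteration. This sketch assumes, as the free(new_pixels) call in the loop above implies, that getFrame() returns malloc'd memory; `fake_get_frame` is a hypothetical stand-in for it.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter that releases malloc'd memory with free().
struct FreeDeleter {
    void operator()(void *p) const { std::free(p); }
};

// Owning pointer for a malloc'd frame buffer.
using MallocBuffer = std::unique_ptr<std::uint8_t, FreeDeleter>;

// Hypothetical stand-in for a getFrame()-style call returning a malloc'd copy.
inline std::uint8_t *fake_get_frame(std::size_t bytes) {
    auto *p = static_cast<std::uint8_t *>(std::malloc(bytes));
    if (p != nullptr) std::memset(p, 0x80, bytes);
    return p;
}

// Takes ownership immediately, so the buffer cannot leak on any exit path.
inline MallocBuffer take_frame(std::size_t bytes) {
    return MallocBuffer(fake_get_frame(bytes));
}
```

Inside the render loop, `frame.get()` would then be passed to memcpy, and the buffer is freed when `frame` goes out of scope.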

@rovarma

rovarma commented Jan 31, 2016

@cboulay @brendanwalker I've finished the clean version of my changes. It's here.

  • The API is now mostly unchanged:
    • PS3EYECam::handleEvents has been removed
    • PS3EYECam::isNewFrame() has been removed
    • PS3EYECam::getLastFramePointer has been renamed to PS3EYECam::getFrame
  • Now using C++11 threading primitives instead of my own.
  • There is now a single updater thread for all cameras, instead of one per camera.
  • I've updated the SDL test app to remove the background thread (similar to the changes @brendanwalker already made)
  • Also builds with MinGW now

Can you guys do another test to see if performance is still solid for you?
@cboulay Can you try building this with OSX and/or Linux again to see if anything is missing?

If everything still works okay and builds on other platforms, I'll make a pull request to inspirit's repo for all of this.

@cboulay
Collaborator

cboulay commented Jan 31, 2016

Looks great. I won't have time to try it until tonight (about 11 hrs from now).

@brendanwalker
Contributor

@rovarma I just merged your latest changes from your "optimization-clean" branch. The only differences I have now are that I left the FPS counter in main.cpp and kept my MSVC project files (since I don't have MinGW installed). My forks of ps3eye_sdl and psmoveapi built and ran great. I also updated my "rovarma-optimization" branch of psmove-unity5, and everything runs great there too.

@cboulay
Copy link
Collaborator

cboulay commented Feb 1, 2016

I knew I shouldn't have promised a time. I was asked to sub for my old ultimate team. I'll do this when I get back.

@cboulay
Collaborator

cboulay commented Feb 1, 2016

@rovarma The ps3eyedriver.h/.cpp files do not exist in the inspirit upstream. Those files were created by thp to provide a C API. It's not a bad idea for inspirit to provide a C API, so it might get merged upstream, but I think it would be more readily accepted if the files were renamed, e.g., ps3eye_capi.h/.cpp or something similar. This was a point of confusion for me when I first started with ps3eyedriver. I think both repos would be better off with this change, but it's not a major issue.

+1 for adding the FPS counter

I had to modify psmoveapi/src/CMakeLists.txt so Xcode would enable C++11, which is necessary for <atomic> and <thread>.

With those changes, the camera worked well for me. I tested ps3eye_sdl and test_opengl. I don't have a PSMove with me so I can't comment on the effect on tracking.

@rovarma

rovarma commented Feb 1, 2016

Thanks for testing.

I'll add the FPS counter and make the required changes for C++11 support on Xcode tonight, then see about opening a pull request to inspirit/thp.

@rovarma

rovarma commented Feb 1, 2016

@cboulay @brendanwalker I've added the FPS counter to the SDL app + renamed the ps3eyedriver.h/cpp here.
Also modified psmoveapi/src/CMakeLists.txt as @cboulay said to make it build under Xcode.

I've opened PRs in all relevant places for all of this (inspirit/PS3EYEDriver#21, https://github.com/thp/PS3EYEDriver/pull/15, thp/psmoveapi#200).

Thanks for testing and helping out.

@brendanwalker
Contributor

@rovarma Thanks for implementing this optimization! It's been nice using the snappier camera tracking here at work. And it's allowed me to rip out my janky implementation of the CLEye driver support I had in psmove-unity5 (since the ps3eye driver now delivers frames just as fast).

@rovarma

rovarma commented Feb 2, 2016

No problem! Glad it works for you.

@KRSteinke

Awesome work rovarma! Your work is greatly appreciated! Even by us lurkers.

@brendanwalker
Contributor

@rovarma @cboulay FYI, I just found one small issue with the latest camera driver that only manifests if you close and re-open a camera in the same process. None of our test apps do this (they open the camera on start-up and close it on shut-down). However, in both psmove-ue4 and psmove-unity5 we set up and tear down the tracker context whenever play is hit in the game. The USBMgr singleton has an exit_signaled flag used to tell the camera thread to exit. Since USBMgr stays around for the lifetime of the process, the exit_signaled flag wasn't getting reset after the camera thread finished shutting down. The next time you start up the camera thread, it exits right away, and FrameQueue::Dequeue() then blocks forever waiting for a new frame that will never come. I made a fix in my fork of PS3EYEDriver here that resets the flag.
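The pattern behind that fix can be sketched as a long-lived manager that clears its exit flag every time it (re)starts its worker thread. `WorkerManager` and its members are hypothetical, only loosely modelled on the USBMgr described above; the worker just counts iterations in place of servicing USB events.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical process-lifetime manager: because it outlives individual
// start/stop cycles, its exit flag must be reset on every start, otherwise
// a restarted worker would see the stale flag and exit immediately.
class WorkerManager {
public:
    static WorkerManager &instance() {
        static WorkerManager mgr;  // lives for the whole process
        return mgr;
    }

    void start() {
        if (worker_.joinable()) return;  // already running
        exit_signaled_ = false;          // the fix: clear the stale flag first
        worker_ = std::thread([this] {
            while (!exit_signaled_) {
                ++iterations_;           // stands in for servicing USB events
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    void stop() {
        exit_signaled_ = true;
        if (worker_.joinable()) worker_.join();
    }

    unsigned long iterations() const { return iterations_.load(); }

private:
    WorkerManager() = default;
    ~WorkerManager() { stop(); }

    std::atomic<bool> exit_signaled_{false};
    std::atomic<unsigned long> iterations_{0};
    std::thread worker_;
};
```

Without the reset in `start()`, a second start/stop cycle would launch a thread that exits at once, which is exactly the situation that left Dequeue() waiting forever.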

@rovarma
Copy link

rovarma commented Feb 5, 2016

@brendanwalker Ah, good catch. Thanks for the fix. Can you open a PR for that to inspirit's repo, or shall I?

@brendanwalker
Contributor

I can open a PR tonight. I've been meaning to bring my fork of PS3EyeDriver in line with inspirit's anyway.
EDIT: PR submitted to inspirit/PS3EYEDriver:master: inspirit/PS3EYEDriver#22
