-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes three things Hangprinter needs #680
Fixes three things Hangprinter needs #680
Conversation
... follow this pull request to see when stock ODriveFirmware can be used again. odriverobotics/ODrive#680
For my step/dir usecase to work properly, I set these configuration values:
The requirement of the "extremely critical" line is kind of mentioned as part of the name of the variable, but it wasn't immediately clear to me either from the docs or from Oskar's heads up yesterday. It was also very confusing that |
You said you fixed an underflow bug, but I think that's not right. |
Ok. Is there a third variable at play here? My motor is able to run both forwards and backwards relative to zero, when driven by step/dir. It doesn't suddenly spin wildly when I step below zero, like my CAN code in this commit did (before I "fixed" the underflow by allowing negative I mocked up the Any ideas how we avoid the sudden wild spin if |
So the "underflow flix" only appeared to be needed because I had used step/dir together with |
I put |
When I cross below
I tested dropping below the wrap limit at very low motor speed. Sending a few step-signals forward, to get the |
I'm suspecting that the bug comes from here: |
At least, since the speed stays constant, we know that |
So the 0.5.4 fw indeed has an underflow bug. I'm not able to fix it, so I will go back to 0.5.1 I think. |
After reducing circular range from 10000 down to 1000 it works as expected across 0. The setting of input pos via can also seems to work. However, my application reads out encoder estimate first, and then uses that to set input pos. This doesn't work when we're using circular setpoints. I get an offset between the encoder readout and the setpoint required for the machine to stand completely still upon switching to pos mode. |
In my opinion the I pushed a suggestion for how my use case may be supported. By making the motor to stand still upon enabling position control via CAN, regardless of what happened before that. If users depend on the feature that the motor immediately chases some setpoint upon entering position control mode, I'm sure their use case can be supported by setting their desired input pos immediately after enabling position control mode. It's much safer that way. |
The latest fw changes have been tested on hardware and seems to work well. |
... because otherwise, the effect of the callback is canceled by an assignment in the main update() loop, that calculates input_pos based on steps_, if we're in step/dir mode. - Fixes a steps_ underflow/wrapping bug
6b0fd23
to
87633b8
Compare
Rebased, fixed some issues, and force pushed. Copying my comments so you get the whole picture: "Hi, I've made a build of the devel branch, and can't seem to get the motors into closed loop control anymore. dump_errors(odrv0) reports no errors, and the motors do the full calibration sequence just fine I got it working again by reverting the commit called "add back subcomponent error flags" Before the "add back subcomponent error flags"-commit, I could get my system back to working by doing a full calibration sequence, setting motor -and encoder-pre_calibrated flags to True and rebooting. After the "add back subcomponent error flags"-commit I can still do the full calibration sequence, and set my pre_calibrated flags, but the system does not get back to working after saving and rebooting. dump_errors reports no errors, but the motors refuse to enter closed loop mode. Also, after the "Fix get_iq_callbacks function"-commit, the CANSimple::get_iq_callback(axis)-function only sends a response if axis==0. Setting the get_iq_callback() back to the state it was in 0.5.1 gives me my axis1 readings back, most of the time.
The only thing I can see that is different is that this version might be slightly faster. I could get rid of the last few failing CAN calls by disabling the heartbeat messages: odrv0.axis0and1.config.can.heartbeat_rate_ms = 0 |
This corresponds to
Generally we want to keep this commit but this sounds like a bug. I don't have an absolute encoder setup handy so I cannot easily try. What happens if you run encoder offset calibration after reboot? Does it enter closed loop control after that? Maybe you can also check with the debugger to see where it goes wrong? I don't really know what's wrong with CAN. Maybe you can swap the node IDs of axis0 and axis1 and see if the dropped messages are still on axis1 or moved to axis0. |
I was running odrivetool from the git repository.
I use the 8192 cpr indexed encoder from the ODrive webshop. Not an absolute one.
The problem persists.
Sorry, I don't have time for this now. If you detail how I would do it maybe I get time in the future. With the can, it's a timing issue. Left unchanged, the axis1 fails (response is too slow, so my other board times out waiting for a response) 100% of the time. If I change the code around slightly, so make it faster, then the problem disappears from axis1, but appears 10% of the time in axis0 instead. If I make the code even faster, the 10% reduces to 5%. If I remove the heartbeat messages, the problem disappears completely. I tried to provoke the same error using the get_encoder_estimate message but failed. The encoder messages never time out. |
I'll look at your response next week but in the meantime you could try to compile in release mode instead of debug mode, that could free up some CPU time. |
All my builds have been in release mode |
Can you confirm that this is what you're doing:
And step 4 succeeds before cc7bbd4 and fails after cc7bbd4? Regarding CAN, it sounds like in both cases the ODrive sends back the correct response, just in one case it's a bit slower so your receiving end times out? |
Yes, can confirm that step 4 succeeds before cc7bbd4 and fails after. Regarding CAN, I haven't looked at the signal, or tried extending the Duet3 timeout, so I can't say for sure. But I believe what you describe is what happens. |
I tried the following on the latest
Regarding the CAN delay we currently don't make any real-time guarantees for the responses but we can look into that. Do you know by any chance what the timeout of Duet3 is? Note that on ODrive Pro we have much a more generous CPU budget so it should probably not be a problem there. |
Hi, I'm back, had a few days off.
It looks like it waits 200 ms. But given the names of the variables, it looks like it has used the wrong variable, so it should really have waited 1000 ms. I'll check if this will get a fix on the Duet side, and if the potential fix would solve the problem.
I'll try to reproduce the bug with your minimal config. |
I made an experiment with the CAN.
Results were:
I guess that's good enough for me. I'll make a PR on the Duet side and remove my CAN fixes from this PR. |
I ran the minimal config example posted above and the motor indeed entered closed loop control. |
I made an experiment regarding the closed loop state:
After this, the axis0 of the board booted into a working closed loop control as expected. |
On axis1, after the
The motor on axis1 wakes up after the reboot with Suspecting that
After this, the motor on axis1 also woke up in a working closed loop mode as expected. So after commit daa089c, the Given the name of the full calibration sequence, I'd consider this a bug. Luckily a very easy one to work around, so I'll revert my revert and let you decide instead. |
^That message has not appeared during any of my experiments today. |
Timeout fix on the Duet side, for reference: Duet3D/RepRapFirmware#568 |
I did a third experiment where I dropped the
The axis0 woke up and entered closed loop control as expected. So there's no bug. Just a new variable ( I was a bit surprised by this result since I thought I had done exactly this experiment before my vacation. I guess I must have made a typo last time. It's clearly working now at least. |
87633b8
to
58a6567
Compare
I removed the two unnecessary commits (CAN timing fix and enum commit revert) from this PR and force pushed. I built a new hex with only the three relevant commits included. I tested it on hardware, including jumping back and forth between pos mode and torque mode a few times and jogging a few rotations via step/dir. It works as expected. |
Looks like |
Yeah we probably should have the same behavior for all the protocols. I can't do those hw tests though, so I won't include in this PR. Feel free to add if you have the time (or if you dare to do it without testing on hw). In case we don't get any voluntary Fibre coders/testers, may we merge my PR anyways, and do the Fibre change when need arises? As I understand it, there's a chance that this firmware's development gets completely sunset before an explicit need arises anyways. |
Actually, I took 0.5.4 + these three commits out for a full scale print test today. Turns out, the motor wanders about quite a lot (many tens of degrees). It doesn't sit completely still when it's supposed to, and it doesn't return to the same point after jogging it back and forth. |
Motors are random-waliking about 20 degrees back and at a speed varying from 1 deg/sec to 0.00001 deg/sec. If I send movement commands, the errors get much worse much faster. Up to several hundred degrees+. I had a similar error before, when ``circular_setpoint_range` was 10000. I've since lowered it to 1000. Should I lower it even more? |
I did an experiment where I lowered |
I'm not sure I understand what I should put Any reason why I wouldn't put it all the way down to 2? (I tried 1, which didn't work) It seems to carry a trade-off between setpoint range size, and precision. A trade-off I don't really want to have to make. |
I tried lowering from 100 to 10. I wasn't able to detect any additional accuracy or precision when at 10 compared to when I was at |
Today the board is back to drifting those motors. -8 degrees now and still drifting. Gets worse if I step the motors. Also gets worse when motors are under load. That's probably the reason why the problem appeared solved yesterday. There were less pretension on the motors during the later (seemingly working) tests. |
Today I did another experiment to see if the drifting problem exists in 0.5.1 as well:
Result: No drift beyond an error margin of +-0.2 degrees. Since the 0.5.5 drift got worse when I moved the motors, I commanded the Hangprinter to move its effector around. This jogs the motors and varies their load. The result was still the same. No significant drift in 0.5.1. |
Here's a issue describing the drift bug, which has nothing to do with this PR: #688 |
Having shown that the drift issue is not related to this PR's commits I hope we can merge this PR soon. 😉 |
.. because circular_setpoints was broken beyond repair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, thank you. We just need to make the behavior consistent on the other (non-CAN) interfaces. See my other note.
if (mode == Controller::CONTROL_MODE_POSITION_CONTROL) { | ||
float estimate = 0.0f; | ||
if (axis.controller_.config_.circular_setpoints) { | ||
estimate = axis.encoder_.pos_circular_.any().value_or(0.0f); | ||
} else { | ||
estimate = axis.encoder_.pos_estimate_.any().value_or(0.0f); | ||
} | ||
|
||
axis.controller_.set_input_pos_and_steps(estimate); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reaction on writing the control mode is appropriate, but we need the behavior to be consistent when set from other sources than CAN. My suggestion is:
- Merge to a reusable function this snippet with the "// To avoid any transient on startup" section in
axis.cpp start_closed_loop_control
. You could call itset_setpoint_to_estimate
. - Add a
c_setter
tag to thecontrol_mode
endpoint inodrive-interface.yaml
, so this can get executed when we write it on USB or UART. - Add an entry in the changelog about this reaction on control mode change.
Defines and uses set_setpoint_to_estimate() reusable function Uses it in three places: - axis.cpp to avoid transient on startup - can_simple.cpp to avoid sudden movement upon mode change - odrive-interface.yaml just to make it executable via USB and UART Swaps set_input_pos() to set_input_pos_and_steps() so that we get the same behavior when changing input_pos over USB and UART as we get when we change it via CAN.
Thanks for the comments. I have no experience with editing odrive-interface.yaml and took a guess at how it might work. |
Hi, you didn't turn on allowing push access to maintainers of destination of the PR so I can't push updates. I made some changes and moved the PR to here: |
Hi thanks for reviewing, making the change and moving the PR. Didn't know that setting, sorry. |
... follow this pull request to see when stock ODriveFirmware can be used again. odriverobotics/ODrive#680
This PR was wholly superseded by #689. I suggest reverting the merge of this PR. |
Never mind my previous comment. Everything's fine, the right commits are on top. No action is needed. Sry for the back-and-forth. I'm super happy about Hangprinter fixes having been merged. Thanks. |
... because otherwise, the effect of the callback is canceled by an
assignment in the main update() loop, that calculates input_pos
based on steps_, if we're in step/dir mode.