EBB Firmware: Add command for pre-computed stepper moves #73

Closed
oskay opened this Issue Jan 8, 2017 · 19 comments

4 participants
oskay (Contributor) commented Jan 8, 2017

Right now there is a lower limit on SM stepper move duration, 16 ms, below which a brief delay may be introduced between subsequent movements. This is partially due to the time required to compute the move.

Following on #71, perhaps there should be a similar command that offloads some of the planning to the computer. This command could be a little less "user friendly" (for example, phrased in terms of the number of interrupt cycles per forward step rather than total steps), but it would allow subsequent moves to be placed closer together.

Here is a proposed first pass at such a command description:

LM – Low-level Move

  • Command: LM,duration,axis1,axis2<CR>
  • Response: OK<NL><CR>
  • Firmware versions: 2.5.0 and newer
  • Execution: Added to FIFO motion queue
  • Arguments:
    • duration is an integer in the range from 1 to 16777215 which gives the number of 25 kHz interrupt cycles that this movement takes to complete
    • axis1 and axis2 are integers, each in the range from -32767 to 32767, giving movement rates in interrupt cycles per step for the two motors.
  • Description:
    Use this command to make the motors draw a straight line at constant velocity, or to add a delay to the motion queue.
    (This is the same basic function as the (easier to use) SM command. However, LM is a "low-level" alternative, where the computation of step rate is offloaded to the computer, requiring less on-board computation time, and allowing for shorter total move times without delays between them.)
    The signs of axis1 and axis2 represent the direction each motor should turn.
    A value of 0 for either axis1 or axis2 means that the given motor will stay at a fixed position until the move is complete. If both axis1 and axis2 are zero, then a delay lasting duration interrupt cycles will be executed.
    The total duration of a move executed by the LM command is limited to 16777215/25000, or about 671 seconds. Longer moves should be broken into separate chunks, or made with the SM command instead.
    The step rate of each motor is limited to a certain range. At one end, a value of 1 for axis1 or axis2 gives one step at each interrupt, or 25,000 steps per second. At the other end, a value of 32767 gives one step per 32767/25000 = 1.31 seconds.
  • Example: LM,25000,1000,0<CR> Execute a stepper move lasting for 1 second (25000 interrupt cycles), during which axis1 moves forward by one step every 1000 interrupt cycles, or 25 steps total. The second motor, axis2, does not move.
  • Example: LM,1000,-750,2000<CR> Execute a stepper move lasting for 40 ms (1000 interrupt cycles), during which axis1 moves backward by one step every 750 interrupt cycles, or 1 step total. The second motor, axis2, does not move, since the number of interrupt cycles never reaches as high as 2000.
  • Example: LM,12500,5,-7<CR> Execute a stepper move lasting for 500 ms (12500 interrupt cycles), during which axis1 moves forward by one step every 5 interrupt cycles, or 5000 times per second, 2500 steps total. The second motor, axis2, moves backwards by one step every 7 interrupt cycles, or about 3571.4 times per second. It moves 1785 steps while executing this command.
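As a quick check of the examples above, here is a minimal Python sketch of the step counts implied by this first-pass proposal. It assumes each axis takes one step every |rate| interrupt cycles and that a rate of 0 means no motion; the function names are hypothetical, and this models the proposal only, not the LM command that was eventually implemented.

```python
INTERRUPT_HZ = 25_000  # 25 kHz interrupt rate from the proposal


def proposed_lm_steps(duration, axis1, axis2):
    """Signed step counts for the *proposed* LM,duration,axis1,axis2.

    Assumes one step every |rate| interrupt cycles (sign = direction)
    and that a rate of 0 leaves the motor stationary.
    """
    if not 1 <= duration <= 16_777_215:
        raise ValueError("duration must be 1..16777215 interrupt cycles")
    counts = []
    for rate in (axis1, axis2):
        if not -32_767 <= rate <= 32_767:
            raise ValueError("rate must be -32767..32767")
        if rate == 0:
            counts.append(0)
            continue
        steps = duration // abs(rate)  # whole steps that fit in the move
        counts.append(steps if rate > 0 else -steps)
    return tuple(counts)


def move_seconds(duration):
    """Wall-clock length of a move given in interrupt cycles."""
    return duration / INTERRUPT_HZ


# The three examples from the proposal:
print(proposed_lm_steps(25000, 1000, 0), move_seconds(25000))   # (25, 0) 1.0
print(proposed_lm_steps(1000, -750, 2000), move_seconds(1000))  # (-1, 0) 0.04
print(proposed_lm_steps(12500, 5, -7), move_seconds(12500))     # (2500, -1785) 0.5
```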

Again, this is a first pass approach to the design. Comments and changes are welcome.

EmbeddedMan (Contributor) commented Jan 9, 2017

Well, it's a bit more 'raw' than that, unfortunately.

What we need to send the EBB are six values, three for each axis.

They are:
StepAdd : 32 bit unsigned value. Described below. This is the crazy one.
StepsCounter : 24 bit signed value - the number of steps for this axis for this move
StepAddInc : 32 bit signed value - for now, set to zero (more later)

The only other piece of information that the EBB needs is the direction bits, which it can compute based on the sign of the StepsCounter value for each axis.

So, here's what StepAdd is: Internally there is a 32 bit accumulator. Every 40uS (25KHz), StepAdd is added to this accumulator. When the MSb of the accumulator gets set, 0x80000000 is subtracted from the accumulator and a step is taken. The accumulator is cleared at the beginning of each move. Each time a step is taken, StepsCounter is decremented, and when it reaches zero on both axes, the move is complete.
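To make the accumulator behaviour concrete, here is a minimal Python model of one axis as described above. It is a sketch, not the firmware: the name `simulate_axis` is hypothetical, and applying StepAddInc after the accumulate-and-compare on each tick is an assumption.

```python
MSB = 0x8000_0000  # accumulator bit that triggers a step


def simulate_axis(step_add, steps_counter, step_add_inc=0, max_ticks=1_000_000):
    """Return the 25 kHz tick numbers (1-based) on which steps are taken.

    Models the 32-bit accumulator described above; applying StepAddInc
    after the accumulate/compare on each tick is an assumption.
    """
    acc = 0                         # accumulator is cleared at the start of a move
    remaining = abs(steps_counter)  # the sign only selects direction
    step_ticks = []
    tick = 0
    while remaining > 0 and tick < max_ticks:
        tick += 1
        acc = (acc + step_add) & 0xFFFF_FFFF  # add StepAdd every 40 us
        if acc & MSB:                         # MSb set: take a step
            acc -= MSB
            remaining -= 1
            step_ticks.append(tick)
        step_add += step_add_inc              # zero means constant rate
    return step_ticks


# 85899346 is roughly 2**31 / 25, i.e. one step every 25 ticks (1 ms).
print(simulate_axis(85899346, 2))  # [25, 50]
```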

The math you can use to compute StepAdd for a given move is as follows:
(Assume StepsCounter is positive and Duration is in milliseconds.)

StepAdd = (StepsCounter << 31)/(25 * Duration)
You'll want to do that either in floating point or as a 64 bit division. Not so bad on a PC.
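Here is that calculation as a small Python sketch. Python integers are arbitrary precision, so the shift and division behave like the 64 bit integer division suggested above. The name `step_add_for` is hypothetical, integer milliseconds are assumed, and rounding up is a choice made here rather than something stated in the thread: it guarantees the accumulator receives at least StepsCounter × 2^31 within 25 × Duration ticks, so every step lands within the requested duration.

```python
def step_add_for(steps_counter, duration_ms):
    """StepAdd = (StepsCounter << 31) / (25 * Duration), rounded up.

    duration_ms is assumed to be a positive integer number of milliseconds.
    """
    steps = abs(steps_counter)  # direction comes from the sign, not StepAdd
    if steps == 0 or duration_ms <= 0:
        raise ValueError("need a nonzero step count and a positive duration")
    ticks = 25 * duration_ms               # 25 ticks of 40 us per millisecond
    step_add = -(-(steps << 31) // ticks)  # ceiling integer division
    if step_add > 0x8000_0000:
        raise ValueError("faster than one step per tick")
    return step_add


print(step_add_for(2, 2))  # 85899346
print(step_add_for(3, 3))  # 85899346
```

Both results match the StepAdd value used in the 2 ms and 3 ms test commands later in this thread.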

The other value, StepAddInc, is what gets added to StepAdd every 40uS, and this is how we implement acceleration/deceleration during a move. Simply set it to zero for no accel/decel.
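One possible way to pick StepAddInc for a linear change in speed is sketched below in Python. This is an illustration under stated assumptions, not the method used by the AM command or the LM documentation: it converts rates in steps per second to StepAdd values (StepAdd = rate × 2^31 / 25000), spreads the difference evenly over the move's ticks, and estimates StepsCounter from the average rate. The names are hypothetical.

```python
TICK_HZ = 25_000  # one tick every 40 us


def step_add_from_rate(steps_per_sec):
    """StepAdd value that produces the given step rate."""
    return round(steps_per_sec * (1 << 31) / TICK_HZ)


def linear_ramp(start_rate, end_rate, duration_ticks):
    """Return (StepAdd, StepAddInc, approx. StepsCounter) for a linear ramp."""
    start = step_add_from_rate(start_rate)
    end = step_add_from_rate(end_rate)
    step_add_inc = round((end - start) / duration_ticks)  # added every tick
    # Distance is average rate times time; rounding makes this approximate.
    steps = round((start_rate + end_rate) / 2 * duration_ticks / TICK_HZ)
    return start, step_add_inc, steps


print(linear_ramp(0, 1000, 25_000))  # (0, 3436, 500)
```

Because a move only ends when StepsCounter reaches zero, an estimate that is too high would let the move (and the changing StepAdd) run past the intended ramp, so the step count is worth checking against a simulation of the accumulator.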

So the command I envision would be something like:
LM,<StepAdd1>,<StepsCounter1>,<StepAddInc1>,<StepAdd2>,<StepsCounter2>,<StepAddInc2>
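A minimal sketch of building that command string on the PC side. The function name and the per-axis tuple layout are hypothetical; the field order follows the format above, and the trailing carriage return follows the `<CR>` terminator used in the command descriptions earlier in this issue.

```python
def lm_command(axis1, axis2):
    """Format LM,<StepAdd1>,<StepsCounter1>,<StepAddInc1>,<StepAdd2>,...

    Each axis is a (step_add, steps_counter, step_add_inc) tuple.
    """
    fields = [str(int(v)) for axis in (axis1, axis2) for v in axis]
    return "LM," + ",".join(fields) + "\r"


print(repr(lm_command((85899346, 2, 0), (85899346, 1, 0))))
# 'LM,85899346,2,0,85899346,1,0\r'
```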

I'm going to quickly prototype this and do some timing tests to see how much time this saves us. My guess right now is that we'll be able to get pretty close to 1 or 2 ms per command if we do this. If so, that would be really cool.

oskay (Contributor) commented Jan 9, 2017

I would infer that we might be able to use this as a stepping stone to an improved version of the AM command, too.

EmbeddedMan (Contributor) commented Jan 9, 2017

Oh, and I should add that no limit checking will be done on the values for the LM command. This means you should be careful: neither StepsCounter may be zero, each StepsCounter should be less than 0xFFFFFF, and each StepAdd should be less than or equal to 0x80000000.
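Since the firmware does no limit checking, a host program may want to check values itself before sending them. This hypothetical helper applies the three rules stated above; treating the StepsCounter limit as a limit on its magnitude (since StepsCounter is signed) is an assumption.

```python
def lm_axis_problems(step_add, steps_counter):
    """Return a list of rule violations for one LM axis (empty if OK)."""
    problems = []
    if steps_counter == 0:
        problems.append("StepsCounter must not be zero")
    if abs(steps_counter) >= 0xFFFFFF:
        problems.append("StepsCounter must be less than 0xFFFFFF")
    if not 0 <= step_add <= 0x8000_0000:  # StepAdd is a 32 bit unsigned value
        problems.append("StepAdd must be 0..0x80000000 (unsigned)")
    return problems


print(lm_axis_problems(85899346, 2))     # []
print(lm_axis_problems(0x8000_0001, 0))  # two problems reported
```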

And yes! This is, in fact, everything necessary to do accel/decel. The AM command is simply the front end math that computes these three values for each axis. So with this new LM command, you can do full accel/decel by using the PC.

EmbeddedMan (Contributor) commented Jan 9, 2017

Hmm. Well, initial tests are not looking good. I clearly don't understand exactly what takes a lot of time in this system. I think I'm doing something wrong. With the LM command, it takes about 17ms to process the command, so that's our shortest move. Hmmm.

I need to dig into this much further before I can figure out what's taking so long. Unfortunately I'm going to be tied up this week, but I'll try to get back to it next weekend.

ShelMi commented Jan 9, 2017

Another country heard from - and please let me know if you'd like me to butt out. 8^)

As I understand the current situation ( running version 2.4.5 ), we expect a pause of ~15-20 msec. between each segment of a line. The thing is, I can't seem to actually observe that pause in practice.

Here's my test setup, using eggbot:

  1. I set down-speed to 200 steps/sec.
  2. I prepare a horizontal line comprising 1 segment of length 10000 steps.
  3. I expect a plot to take 10000 steps / 200 steps/sec = 50 seconds to complete.
  4. Actual plot time is observed to be 50 sec., good!
    BUT
  5. I prepare a horizontal line comprising 2000 connected segments of 5 horizontal steps each. I observe the time required for the line to be plotted, at a down-speed of 200 steps/sec.
  6. I expect plot time to be ( 5 steps/seg * 2000 segs / 200 steps/sec ) + 0.015 sec/seg * 2000 segs = 50 + 30 = 80 sec.
  7. Instead, I observe that plot time is still only 50 sec.!!

Have I misunderstood the phenomenon under discussion, or is there something else going on?

oskay (Contributor) commented Jan 9, 2017

The issue is that very short moves (moves less than 16 ms, with 2.4.6) may have gaps between them. 2000 segments in 50 s is 25 ms per segment; you should not see an issue there.

If you make that 4000 segments, on the other hand....
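A simplified model of this point, as a Python sketch: if each command needs roughly a fixed time before it is ready (about 16 ms with 2.4.6, per the comment above), each segment effectively takes the longer of its own move time and that overhead. The pipeline model, the 16 ms constant, and the function name are assumptions for illustration; 4000 segments of a 10000-step line would be 2.5 steps each, so that case is idealized.

```python
def estimated_plot_seconds(total_steps, segments, steps_per_sec, min_move_s=0.016):
    """Rough plot time when each segment takes max(move time, overhead)."""
    move_s = total_steps / segments / steps_per_sec  # time per segment
    return segments * max(move_s, min_move_s)


print(estimated_plot_seconds(10_000, 2000, 200))  # about 50 s: 25 ms segments, no gaps
print(estimated_plot_seconds(10_000, 4000, 200))  # about 64 s: 12.5 ms segments stretch to 16 ms
```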

EmbeddedMan (Contributor) commented Jan 9, 2017

ShelMi commented Jan 9, 2017

OK, thanks, I should have known better - over and out! 8^)

oskay (Contributor) commented Jan 9, 2017

These are pretty esoteric corner cases, which is the only reason that they haven't been nailed down long ago.

EmbeddedMan (Contributor) commented Jan 14, 2017

OK, great news. I'm stupid, and that's the reason it looked like 15ms or 17ms was the fastest that we could execute moves.

Bottom line - in reality, all versions of EBB firmware (including these latest couple) can run at 3ms move times all day long with no gaps (as long as you have a dedicated USB host port to the EBB - thus no bus contention). And even 2ms moves are very close to being gapless. This is true with SM, XM, or LM commands (it turns out the math doesn't really add much to our command processing overhead).

Explanation:

My big error was in simply copy/pasting large blocks of command text into a terminal emulator. The terminal emulator takes each character of a copy/paste and puts it in its own USB packet. This makes sense when you are just typing characters into a terminal emulator: you want each character to be transmitted across the USB as soon as you type it. But because USB has a 1KHz maximum packet rate, a 28 byte command (like "LM,85899346,2,0,85899346,2,0") takes a really long time to send. (It turns out that the terminal emulator does sometimes put more than one byte of data into each USB packet, so it would take about 17ms to send that command, not 28ms.)
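The arithmetic behind this, as a small Python sketch: at the 1 kHz packet rate quoted above, send time is just the number of packets needed. The function name is hypothetical, the command string omits its terminator exactly as written above, and real host stacks will not match this idealized one-packet-per-millisecond model exactly.

```python
import math

PACKETS_PER_SEC = 1000  # the 1 kHz USB packet rate quoted above


def send_time_ms(command, bytes_per_packet):
    """Idealized time to send `command` at one packet per millisecond."""
    packets = math.ceil(len(command.encode("ascii")) / bytes_per_packet)
    return packets * 1000 / PACKETS_PER_SEC


cmd = "LM,85899346,2,0,85899346,2,0"
print(len(cmd), send_time_ms(cmd, 1))  # 28 28.0: one character per packet
print(send_time_ms(cmd, 64))           # 1.0: the whole command in one 64-byte packet
```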

The solution was simply to save a long list of commands as a text file, and send the whole file to the EBB. This is then sent as large (32KB maybe?) chunks to the USB host code on the PC, which can then very efficiently make each USB packet hold as many bytes as possible, maximizing the available bandwidth.

Lesson To Be Learned : always make sure your custom PC application code is putting as many bytes (or commands) together as it can as a single 'write' command to the lower level USB stack.
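A minimal sketch of that advice in Python: join a batch of commands into one buffer and hand it to a single write() call instead of writing character by character. `port` stands in for whatever serial object your application uses (anything with a `write(bytes)` method); here a hypothetical in-memory stand-in counts the calls, and `send_batched` is likewise a hypothetical name.

```python
import io


class CountingPort(io.BytesIO):
    """In-memory stand-in for a serial port that counts write() calls."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def write(self, data):
        self.write_calls += 1
        return super().write(data)


def send_batched(port, commands):
    """Send many CR-terminated commands with a single write() call."""
    payload = "".join(cmd + "\r" for cmd in commands).encode("ascii")
    port.write(payload)


port = CountingPort()
send_batched(port, ["LM,85899346,3,0,85899346,3,0",
                    "LM,85899346,3,0,85899346,1,0"] * 100)
print(port.write_calls, len(port.getvalue()))  # 1 5800
```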

Now, here's some timing data to back up the above conclusions:

For each of the following logic analyzer captures-

  • 'Step1' is the pulse output to the axis 1 stepper motor driver
  • 'Step2' is the pulse output to the axis 2 stepper motor driver
  • 'CmdProce' goes high when there is USB data waiting to process into a command
  • 'LM cmd' goes high when the LM command function is running
  • 'Move Ready' goes high when the LM command has all of its data converted and is ready to place data into the FIFO
  • 'FIFO Wait' goes high while we are waiting for the FIFO to empty before we stuff the next command into it
  • 'StepsISR' goes high during the part of each 25KHz ISR where we check to see if we should take a step
  • '25KHz ISR' goes high during each 25KHz ISR

[Image: lm_2ms_wide]
This image shows a zoomed-out view of multiple LM commands:

LM,85899346,2,0,85899346,2,0
LM,85899346,2,0,85899346,1,0

repeated over and over, sent as a file. Each of these commands takes 2ms for the step movement to complete. As you can see if you look closely at the StepsISR signal, there are gaps. These indicate times when the stepping of one command is complete but the next command isn't ready to start stepping yet, and thus there are little gaps (the longest ones are only about 750uS long).

[Image: lm_2ms]
Here is a zoomed-in view of the 2ms LM commands. Again, the gaps in the StepsISR trace are visible, as are unequal distances between step pulses (when there are gaps in StepsISR).

[Image: lm_3ms_wide]
Here is the same thing, but with 3ms LM commands:

LM,85899346,3,0,85899346,3,0
LM,85899346,3,0,85899346,1,0

What you should notice here is that there are no gaps. Even if I let this run for tens of seconds, there are zero gaps in the StepsISR trace, which indicates that the next command is always ready before the previous one finishes (just barely in many cases).

The other good sign here is that, after a couple commands, the FIFO Wait signal is going high for significant periods. That indicates that our next command is ready before the previous one finished, and so we have to wait for the FIFO to empty before we can send the next command into it.

[Image: lm_3ms]
This is a zoomed-in view of the 3ms LM commands. No StepsISR gaps means that we can support commands at this rate without gaps in the step pulses.

Fun facts:

When no step command is running, our ISR takes about 15% of our CPU time. When a step command is executing, the ISR takes 43% of our CPU time.

It takes about 1ms for the LM command to read in its parameters and prepare them to be written to the FIFO.

The traces for the SM and XM commands are virtually identical to the above traces - we have no trouble supporting constant 3ms commands with either of those commands.

So - what's next?

Well, the LM command is working as intended. I'm going to release this version with the new LM command. The original use for this command isn't really relevant anymore (since SM and XM can do 3ms gapless commands as well), but somebody could use it to do their own accel/decel if they wanted to.

oskay (Contributor) commented Jan 14, 2017

"The solution was simply to save a long list of commands as a text file, and send the whole file to the EBB."

Can you explain how this can work with the "one command deep" FIFO? We have, thus far, been sending one command at a time.

EmbeddedMan (Contributor) commented Jan 14, 2017

Absolutely. There are many different layers of the stack that the data passes through to get from your PC app to the EBB.

Let's say you perform a single write() call (or whatever the equivalent is in your chosen PC language) with 32KB of serial data (commands) to go to the EBB. First it goes through the serial stack layer, then into the USB stack layer. There, the USB stack code, in conjunction with the USB host controller chip on your PC, will take that data and add it to an outgoing data buffer (which is probably much larger than 32KB). Then, each time the host controller chip asks the EBB USB hardware peripheral "can you accept any new data from me right now" and the EBB answers back "yes", the host controller will send down one packet's worth of USB data. That then gets stored in a RAM buffer in the EBB by the USB peripheral. Once that data is sitting in the RAM in the EBB, the EBB will then answer back "no" when the host asks if it's OK to send more data. (The flow control.)

Then the processor on the EBB gets around to noticing that there is some new USB data sitting in RAM. It then goes through that data, byte by byte, looking for the <CR> that ends each command, and when it finds one, sends that single command string to the proper command parser function (parse_LM() in this case). Once that data has been pulled out of the USB RAM buffer, the USB peripheral starts saying "yes" to more data from the PC.
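Here is a host-side Python model of that framing step, which can be handy for testing what a PC application sends. It is not the EBB firmware; it assumes commands are terminated by a carriage return, as in the command descriptions above, and `split_commands` and `parse_lm` are hypothetical names.

```python
def split_commands(buffer):
    """Remove and return every complete CR-terminated command in `buffer`.

    `buffer` is a bytearray; a trailing partial command is left in place
    so that it can be completed by the next chunk of USB data.
    """
    commands = []
    while (end := buffer.find(b"\r")) >= 0:
        commands.append(bytes(buffer[:end]).decode("ascii"))
        del buffer[:end + 1]
    return commands


def parse_lm(command):
    """Split an LM command into its six integer parameters."""
    name, *args = command.split(",")
    if name != "LM" or len(args) != 6:
        raise ValueError(f"not a six-parameter LM command: {command!r}")
    return tuple(int(a) for a in args)


buf = bytearray(b"LM,85899346,2,0,85899346,2,0\rLM,8589")
print(split_commands(buf))  # ['LM,85899346,2,0,85899346,2,0']
print(buf)                  # bytearray(b'LM,8589')
```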

So you can send a command down, and it can get processed and get put in the FIFO and then get pulled out of the FIFO and then the motors start to move. Then you send down another command and it gets processed and gets put into the FIFO, but since the first command isn't done yet, the second command sits in the FIFO. Then you send down a third command, and it gets processed, but the parse_LM() function has to sit and wait until the FIFO is empty before it can place the 3rd command into the FIFO. This prevents other code from running, which prevents the USB peripheral on the EBB from answering back 'yes' to more data from the PC. In that case, the PC simply just keeps asking until it gets a 'yes' before sending more data.
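The back-pressure described here can be modelled on the PC with Python's `queue.Queue(maxsize=1)`, whose `put()` blocks while its single slot is occupied, much as parse_LM() waits for the FIFO to empty. This is an analogy under simplifying assumptions (one worker thread plays the motion engine, and `time.sleep` stands in for stepping), not the EBB's code; `run_pipeline` is a hypothetical name.

```python
import queue
import threading
import time


def run_pipeline(moves, move_seconds=0.002):
    """Feed moves through a one-deep FIFO to a simulated motion engine."""
    fifo = queue.Queue(maxsize=1)  # one command deep, like the EBB FIFO
    executed = []

    def motion_engine():
        while True:
            move = fifo.get()         # take the next move out of the FIFO
            if move is None:          # sentinel: no more moves
                return
            time.sleep(move_seconds)  # stand-in for stepping the motors
            executed.append(move)

    engine = threading.Thread(target=motion_engine)
    engine.start()
    for move in moves:
        fifo.put(move)  # blocks while the FIFO still holds an unstarted move
    fifo.put(None)
    engine.join()
    return executed


print(run_pipeline(["move1", "move2", "move3"]))  # ['move1', 'move2', 'move3']
```

With three moves, the first is taken by the engine immediately, the second waits in the FIFO, and the third `put()` blocks until the engine finishes the first move and pulls the second.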

So even though you're sending one command at a time, after the first couple of commands (which each fill a different place in this overly complicated pipe), they start to fill up the host's USB buffer. At that point, your write() call will block (if it's of a blocking type) in the PC app because the USB device (EBB) isn't accepting more data, and so the host USB stack doesn't allow more data to be put into its buffer.

Bottom line - if, instead of sending one big file, I had sent individual files which just contained a single command each (in essence like you are doing in your app), everything still would have worked fine. This would have sent a full command in each write() call, and the efficiencies gained there would have allowed things to run smoothly. The problem I had was sending one character at a time.

oskay (Contributor) commented Jan 14, 2017

Got it.

Can you please write a few example commands for the LM command, showing how it can be used for constant-velocity movements as well as accelerated/decelerated moves?

EmbeddedMan (Contributor) commented Jan 14, 2017

EmbeddedMan (Contributor) commented Jan 16, 2017

I'm in the process of writing up some halfway decent documentation for the LM command, which includes acceleration examples. I'm also checking each one with a real EBB and my logic analyzer, and I'm seeing some things that I don't quite understand yet. One is an error in the 25KHz ISR rate: it's off by 0.5%, when it should be off by at most 0.15% (the maximum internal oscillator error), and I believe I found the problem there.

The other issue has to do with the way to calculate the accel/decel values. Once I have this all sorted out, I'll get the examples up.

EmbeddedMan (Contributor) commented Jan 18, 2017

OK, there's a new version (2.5.1) up now. While developing the examples and testing them, I found that there was a bug in 2.5.0 that didn't allow negative values for StepAddInc parameters to the LM command. This is now fixed in 2.5.1. Also the documentation for LM is up and it has equations and everything.

oskay (Contributor) commented Jan 18, 2017

Fantastic. I think we're done here, with the exception of getting the Mac version of the firmware updater up.

oskay closed this Jan 18, 2017

nornagon commented Aug 31, 2017

I wrote a tool that uses this command on the AxiDraw: nornagon/saxi. It works great :)

EmbeddedMan (Contributor) commented Aug 31, 2017

Woah. That's really cool! I'm glad the LM command is getting a good workout.
