Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for g-code compression/packing over serial connection #2955

Closed
wants to merge 245 commits into from

Conversation

scottmudge
Copy link
Contributor

@scottmudge scottmudge commented Jan 9, 2021

MeatPack - Easy, fast, effective g-code compression/packing

I've devised and implemented a g-code packing method which is very simple and has very little processing overhead on the MCU, but manages to reduce the size of g-code by a ratio of ~0.61. This is on-par with binary encoding, but it does not require any modifications to the parsing code, g-code files, or anything else. The MeatPack code/module is entirely self-contained, and it only required a few lines to be changed in cmdqueue.c to be integrated.

I'm not sure why the file-diff on github makes cmdqueue.c look so significantly modified (perhaps white-space?), but the changes are effectively limited to this (+ a closing bracket farther down):

image

This effectively increases the baud rate from 115,200 to 188,850, improving print quality for parts with very dense g-code.

I've also written a functional and tested OctoPrint plugin to work with this modification here:

https://github.com/scottmudge/OctoPrint-MeatPack/

The plugin repository better explains how the packing algorithm works. It's fairly simple and requires very little processing power, but leverages the limited nature of the character set used by most g-code.

The implementation with the firmware leaves the firmware functionally identical to how it already exists. The data packing mode is only enabled if a particular set of command bytes (well outside the ASCII range used by standard g-code) is sent to the MeatPack module by the associated OctoPrint plugin. Otherwise the firmware behaves exactly the same as it always does.

The plugin handles everything automatically. G-code files can remain completely unchanged. The plugin extends the standard PySerial Serial() object with additional functionality, packing whatever g-code data produced by OctoPrint right before it is sent to the firmware.

It also manages and synchronizes the MeatPack "state" of the firmware based on the active configuration of the plugin. If the user enables or disables g-code compression/packing, the plugin module will automatically send associated command bytes to the firmware, where the packing mode is dynamically enabled or disabled. The firmware also sends back confirmation of the enable/disable state. Overall the synchronization is very stable, and I have encountered no issues.

If the plugin is not installed or the user has disabled g-code compression in the plugin settings, the firmware will simply behave as it normally does.

How well has it been tested?

I've fully tested both firmware modifications and OctoPrint plugin. I've printed about 100-200 MB of various g-code (produced by both PrusaSlicer and Cura), which was effectively packed to a size of ~60 - 120 MB.

No errors or issues were encountered. And very dense g-code files which normally produced quality issues (zits, stuttering) when being printed at a high feed-rate over the serial connection no longer encountered the same quality issues when the data was packed with this module.

More Information:

Plugin Repository: https://github.com/scottmudge/OctoPrint-MeatPack
Reddit Post: https://www.reddit.com/r/prusa3d/comments/ksuywv/i_created_a_new_octoprint_plugin_and_firmware_mod/

@Cjkeenan
Copy link

Cjkeenan commented Jan 9, 2021

This would be a very cool feature and very nice to extend the life of OctoPrint users that are on a Zero W as they are especially prone to microstutters. How would this work in relation to #2657 ? They seem like a very good combo for super efficient and compressed GCode.

@scottmudge
Copy link
Contributor Author

scottmudge commented Jan 9, 2021

How would this work in relation to #2657 ? They seem like a very good combo for super efficient and compressed GCode.

Both would work together nicely, reducing g-code dramatically. MeatPack doesn't really care what's being sent -- so long as it uses the g-code character set, it will pack it just fine. I think ArcWelder would behave as the OctoPrint-equivalent to the "MeatPack-OctoPrint" plugin for #2657, but some firmwares/printers have a hard time linearizing or planning arc-based motion (floating point arithmetic can be quite expensive). The nice thing about MeatPack is that it only requires a few operations. One or two bit shifts and an access to a short lookup table.

Regardless, there's nothing incompatible between these two.

@wavexx
Copy link
Collaborator

wavexx commented Jan 11, 2021

@scottmudge did you try to submit this (even just as a proposal to marlin) to 1.1.x/2?
I love this in principle, but it would be a shame if this was ad-hoc just for the Prusa FW.

I'd also be curious to get some feedback from some marlin veterans to see if there's anything we could improve.

@DRracer
Copy link
Collaborator

DRracer commented Jan 12, 2021

@scottmudge that's a very interesting and simple idea, which could improve the throughput of serial printing - ~30-40% less bytes sent over serial can really make a difference.
I still haven't studied the code of yours, however several questions come to my mind:

  • what is the code size increase on the AVR and memory requirements?
  • how do you distinguish between "regular" G-code and the packed one?
  • do you know any "standard" compression algorithms which behave in a similar way?
  • any known limitations of your approach?


// State variables
// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
volatile uint8_t mp_active = 0; // Defines if MeatPack is active
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these vars volatile? Are you planning to have them accessed from an ISR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

99% of my embedded C experience has been with Cortex processors, and I rarely ever used the volatile keyword outside of its traditional uses. However, for the Arduino compiler, documentation noted that using "volatile" was an expected and frequently used way to force the variables out of registers, which Arduino had noted often behave unpredictably (due to compiler optimization? or architectural limitations of MCUs? it wasn't clear). Given the critical nature of command parsing, I didn't find "unpredictable behavior" tolerable, and found yielding data integrity to "luck of the compiler" was not an option.

If I misinterpreted the Arduino documentation, feel free to modify the variable declarations. The Arduino documentation was particularly terse, so perhaps to avoid confusion between ISR-related memory access and standard memory access, they simply phrased it as "registers are unpredictable", and assumed the performance cost wasn't a concern for most Arduino applications. I did find it very vague.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, after reading up more on this, the general online Arduino documentation was simply being overly vague for the sake of not confusing junior developers.

volatile is indeed treated as it normally is, so I've corrected the variable qualifiers. I've also removed the public interfaces to any methods which could have been potentially used in ISRs to modify internal variables. They weren't used anyway. Will commit shortly.

register uint8_t out = 0;

#ifdef USE_LOOKUP_TABLE
// If lower 4 bytes is 0b1111, the higher 4 are unused, and next char is full.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you do some comparison in terms of performance gain of either of these methods? Lookup table vs. direct computation?

Please note the AVR cannot shift by 4 -> these will be 4 separate instructions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I loaded up both methods in both Ghidra and IDA, and the disassembly was byte-for-byte identical.

The compiler optimizes part of the array away completely anyway, and figured out some math to compute an output byte. For the switch statement, it ends up substituting the case jumps for a memory access. In both cases, the end result looks to be faster than either explicit approach. The assembly indicates there is an array access, but not in the way intended by the c-code, and definitely not in the way described by conditional jumps.

The compiler also "mashed" together all of my separate mp_ methods into a single, large method. Overall the ability for the compiler to optimize was quite impressive, and indicated being too granular with c-level optimization is likely redundant or unnecessary.

@scottmudge
Copy link
Contributor Author

scottmudge commented Jan 12, 2021

  • what is the code size increase on the AVR and memory requirements?

It's minor, less than 1 KB static memory, and only a few bytes of dynamic memory. But I will get you specific byte count.

  • how do you distinguish between "regular" G-code and the packed one?

A state variable is maintained inside the unpacking module. By default unpacking is turned off until a control byte sequence is received to enable unpacking. There is nothing about the packed g-code (i.e., no bit mask or periodic indicator) that indicates it is packed, beyond the received bytes often lying outside of the standard ASCII range. The unpacker knows to unpack the data based on the value of the state variable.

A particular byte (0xFF) is used to indicate to the unpacker that a control command is expected. 0xFF has two purposes -- if received once in sequence while packing is enabled, the following two bytes are treated as full-width. If received once in sequence while packing is disabled, it is simply parsed as it normally is by the firmware (i.e., it is ignored, as it is not standard ASCII). But if it received twice consecutively at any point (i.e., regardless of the state of the packing enabled variable), the following byte received is treated as a command. Currently command bytes are very high value - ~248-254. This prevents standard g-code for being mistaken as commands.

The commands are parsed inside the unpacking module, and perform the following functions:

  • Set packing state variable to "enable"/1.
  • Set packing state variable to "disable"/0.
  • Toggle state packing variable from 1 to 0, or visa-versa.
  • Send the state of the packing variable back to the PC over the serial connection.

Synchronization of the packing state is done over the serial line with these command sequences. The firmware module sends the state of the packing enabled variable (along with any other MeatPack-specific information) over the outgoing serial connection with an "[MP]" header on the outgoing line. The OctoPrint module parses incoming messages, looking for this "[MP]" header. It then looks for "on" or "off" to determine the state of the the "packing enabled" variable inside the firmware.

In this way, the OctoPrint plugin can safely synchronize the packing state between the PC and the device firmware.

If the state is inconclusive (i.e., the time between when the OctoPrint plugin is initialized and when the "[MP] on/off" message is received, or the time between toggling the Enable/Disable parameter inside the plugin and also receiving the "[MP] on/off" message in return from the firmware to indicate the state has been synchronized), the OctoPrint plugin will buffer outgoing data and send it as soon as the state is safely synchronized.

In this way, there is no chance packed data is received while the firmware is expecting unpacked, and visa-versa.

I've had no issues with state synchronization with this strategy.

  • do you know any "standard" compression algorithms which behave in a similar way?

Most standard compression uses some form of dictionary, but a vast majority use pattern recognition rather than standard data packing. This means the dictionary is dynamic.

In theory the plugin could analyze the g-code file and perform histogram analysis on it, and then send a custom lookup table to the firmware, but in my experience this is unnecessary. The compression ratio is universally ~0.61, regardless of g-code file.

Because the library is dynamic with standard compression approaches, and because pattern recognition is so computationally expensive with these approaches, standard compression approaches would not be compatible with low-power AVR processors.

  • any known limitations of your approach?

Nothing directly related to this particular approach. The computational overhead is so minimal, it should not impact the timing of any other process.

In terms of it solving my original problem of stuttering while printing over the serial connection, serial packing has helped, but I still encounter stuttering at very high print-speeds for prints with very dense curvature. But at this point I do not think it is due to limitations with the serial connection. Instead it is due to limitations with the cmdqueue.c module... checksum validation, a (universal) lack of multithreading, etc. Reading from the SD card, regardless of the speed of the bus being used (SPI), is simply faster because it's one-way during printing, while serial is bi-directional.

I did find I could alleviate some of this stuttering by removing the MYSERIAL.checkRx() calls inside stepper.cpp. It looks like these calls were added into the Marlin code-base 9 years ago (https://git.notfound.dk/mikkel/marlinfw/commit/f75f426dfae5190d3e637b247030d3a244968c2a), and the reasoning for doing so doesn't really make sense anymore. OctoPrint wasn't really a thing back then, and interrupting stepper routines with serial comms when printing at high speeds appears to cause issues. Removing these calls also significantly reduced the static memory requirements of the firmware (over 1 KB). Given the definition of this method was inside MarlinSerial.h rather than the associated source file, it looked like the compiler was implicitly treating it as an inlined method, meaning the code was unnecessarily duplicated.

Limiting the cmdqueue.c serial processing loop to to a certain number of iterations based on the number of full-commands received also seems to help a bit. In this way, it prevents a continually full serial buffer (indicated by MYSERIAL.available()) from preventing the cmdqueue.c loop from yielding control back to the primary loop() function. Limiting the serial parsing loop inside cmdqueue.c to 2 or 3 full serial commands seems to be sufficient. Normally it would yield control back when the command queue gets full, but on my experimental branch I've also increased the command buffer size from 4 to 8. So adding a separate loop-break condition was necessary.

@scottmudge
Copy link
Contributor Author

scottmudge commented Jan 12, 2021

  • what is the code size increase on the AVR and memory requirements?

Here are the final stats. I've also moved the ENABLE_MEATPACK preprocessor directive to "Configuration.h". Where it probably should be.

WITHOUT MeatPack:

Sketch uses 234030 bytes (92%) of program storage space. Maximum is 253952 bytes.
Global variables use 6437 bytes of dynamic memory.

WITH MeatPack:

Sketch uses 234780 bytes (92%) of program storage space. Maximum is 253952 bytes.
Global variables use 6479 bytes of dynamic memory.

I should note this is the MK3S build w/ EN_ONLY language, but given MeatPack is unaffected by language, the byte-count is going to be the same across all build configurations.

- Removed unneeded 'volatile' qualifiers.
- Removed unused public interfaces.
- Moved ENABLE_MEATPACK to Configuration.h
- Corrected inlining with AVR-specific macro.
@DRracer
Copy link
Collaborator

DRracer commented Jan 14, 2021

@scottmudge I can see you really care about your proposal, that's great 👍 . Generally, the idea is simple and we find your implementation very good and considering it for testing now. The code size requirements are acceptable.

Another simple question - do you have some simple command line tool which would compress an existing G-code file? I know the algorithm is simple, but we'd like to perform some offline testing as well.

Thank you.

@ellensp
Copy link

ellensp commented Jan 14, 2021

I note your packing spaces. Why don't you just strip them out? I don't believe they are needed unless inside quotes.
Then you could add in 'Y'

@scottmudge
Copy link
Contributor Author

I note your packing spaces. Why don't you just strip them out? I don't believe they are needed unless inside quotes.
Then you could add in 'Y'

I may make that an extra option, as that would increase data throughput significantly more. The RS-274 standards indeed note spaces should be ignored, but there were cases of parsing not fully respecting these standards:

For instance:
MarlinFirmware/Marlin#2715

I'll add an optional dictionary shift and dictionary version command into the firmware to make this an optional change, just in case some firmwares encounter issues.

@scottmudge
Copy link
Contributor Author

Another simple question - do you have some simple command line tool which would compress an existing G-code file? I know the algorithm is simple, but we'd like to perform some offline testing as well.

This was how I first tested my implementation. I did write a python utility (converted to an executable w/ PyInstaller for easier global scripting) which packed g-code, but found I needed to make more modifications to the firmware to ensure that the file-seek behavior performed when selecting a gcode file from the SD card performs correctly.

Because of the explicit requirement for sequential unpacking, by jumping to some random byte in a packed stream, the unpacker must traverse the previous 2 bytes to ensure none of the bytes at the current index are full-width (and thus not misinterpreted).

I implemented this behavior into the firmware where the SD card file was read from a certain offset from the end (if it was longer than a pre-determined amount) to check for an M84 command. After these modifications, it could read packed g-code form the sd-card just fine.

However, I removed this code from the PR, as I figured it would be less desirable to modify a separate module in the code which would require further testing, and currently the sd-card interface does not present any transmission rate bottlenecks, like the serial port does.

But if you plan to simply analyze the way bytes are organized, I will provide you this commandline utility.

I should note (it may not be explained well in the original repo) that some of the characters in the byte stream are re-arranged to ensure no 4-bit window for packed data is wasted. I explain this here (scottmudge/OctoPrint-MeatPack#3 (comment)) but just in case you see some unexpected data organization in the gcode files, this is why.

- Refactored config sync output.
- Added support for whitespace omission.
- Removed debug output unless preprocessor directive is enabled.
wavexx and others added 28 commits January 27, 2023 14:55
TM: Check for PC parameters more carefully (3.12)
Add prusa.io/tm-cal link
and update all TM related translations
PFW-1479: MK3_3.12 Change Temp model to Thermal model
Use "echo:" for thermal model error reporting to avoid octoprint
automatically sending a M112 kill.

Keep using "error:" instead for other thermal errors (MAXTEMP/etc).

This should allow resuming a thermal mode pause with the default
octoprint settings.
- Expose TEMP_MODEL_fS and TEMP_MODEL_LAG as D and L respectively,
  initializing the default values based on the previous hard-coded
  values.
- Always round L to the effective sample lag to future-proof model
  upgrades or changes (the stored value _is_ the effective value).
- Introduce UV as a new linear term for PTC heaters, defaulting
  to an identity for model backward-compatibility.
Model UV as power-invariant, so that scaling P doesn't change the
intercept location (that is, the zero point remains at the same
temperature even for more powerful heaters).

NOTE: It's not clear to me whether this is generally true (couldn't
find a datasheet from the same heater in diffent power variants
showing the efficiency loss)
Save about 20 bytes by rewriting the sample count check
Bump up build number and FW version
Introduce a model version. This is initialized at "1" and doesn't
require any upgrade/downgrade checks since it's currently
retro-compatible.
Preparation to support multiple default model parameter sets
PFW-1499 TM: PTC model support (3.12)
…d_timeout

MK3_3.12: Set `Is filament loaded? Yes|No` default to `No` without a timeout.
@scottmudge
Copy link
Contributor Author

Updated PR from new branch equal w/ 3.12.2, cleaned up whitespace changes to make it easier to review.

#4067

@scottmudge scottmudge closed this Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet