Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactors system describe json generation to reduce size, removes invalid factory modules from it #2347

Merged
merged 13 commits into from Aug 24, 2021

Conversation

avtolstoy
Copy link
Member

Problem

  1. If factory module (whether valid or seemingly valid) is flashed onto an Electron and it is running < Device OS 3.0, such module configuration will overflow the maximum supported CoAP message size and will cause an assertion failure, which causes a reset.
  2. There is no automated way of catching system describe overflows under such conditions

Solution

  1. Refactor system_module_info() to use JSONWriter, which should save few valuable bytes and also make generation a lot cleaner. Factory modules that are not valid are also no longer included in the describe.
  2. Add a set of automated tests to wiring/no_fixture that put valid and seemingly valid user part binaries into the factory location and perform cloud connection. If overflow occurs, we should be able to easily catch it during the release process as the device will reset due to assertion failure and the tests will fail.
  3. Add unit tests for system_module_info() to make sure that the new implementation is comparable to and old one and works as intended
  4. system_module_info() and other functions that deal with describe generation or system module info are decoupled into a separate source file system_info.cpp
  5. In order for the on-device tests to be able to write binaries into factory location a new minimal storage hal was added and exposed through dynalib
  6. (Unrelated) build/create_module.py script updated to generate product binaries
  7. Some other bugs resolved in CRC and suffix access with modules located in non-standard locations (e.g. factory)
  8. wiring/no_fixture tests overflow on Electron, so some cellular tests have been moved out into wiring/no_fixture_cellular
  9. Some other minor refactoring to achieve everything above

Steps to Test

  1. Run unit tests
  2. Run wiring/no_fixture and specifically SYSTEM_06 and SYSTEM_07 tests
  3. Run the same tests with these changes rebased onto 2.x branch (I'll push ready-to-use one later)

Example App

N/A

References

  • [CH84019]

Completeness

  • User is totes amazing for contributing!
  • Contributor has signed CLA (Info here)
  • Problem and Solution clearly stated
  • Run unit/integration/application tests on device
  • Added documentation
  • Added to CHANGELOG.md after merging (add links to docs and issues)

char tmp[4] = {};
snprintf(tmp, sizeof(tmp), "%u", (unsigned)module_index(info));
// FIXME: is this really supposed to be a string?
// json.name("n").value(module_index(info));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sergeuz Can we make this an integer instead of a string? Does DS care about this? CLI seems to take it okay. As of right now the describe with factory module on an Electron takes 800 bytes which is exactly MAX_CONTENT_LEN on 2.x, with this turned into an int we end up with 776 instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(some notes from the discussion)
Looks like we'd better not, so instead when backporting to 2.x, we'll increase MBEDTLS_SSL_MAX_CONTENT_LEN and PROTOCOL_BUFFER_SIZE to 900 on Electron specifically.

info->modules[i].info.module_index = 1;
hybrid_module_found = true;
}
#endif // HYBRID_BUILD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍 Not in this PR, but we should remove everything related to hybrid builds from Device OS altogether.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big thumbs up.

@@ -56,7 +56,7 @@ int get_module_crc_suffix(const module_bounds_t* bounds, const module_info_t* in
// This should be done in fetch_module().
return SYSTEM_ERROR_NOT_SUPPORTED;
} else {
return FLASH_ModuleCrcSuffix(crc, suffix, bounds->location == MODULE_BOUNDS_LOC_INTERNAL_FLASH ? FLASH_INTERNAL : FLASH_SERIAL, (uint32_t)info->module_end_address);
return FLASH_ModuleCrcSuffix(crc, suffix, bounds->location == MODULE_BOUNDS_LOC_INTERNAL_FLASH ? FLASH_INTERNAL : FLASH_SERIAL, bounds->start_address + module_length(info));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch.

#endif // __cplusplus

typedef enum hal_storage_id {
HAL_STORAGE_ID_DEFAULT = 0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename this to HAL_STORAGE_ID_INVALID as the actual implementation in storage_hal.cpp has no notion of the default storage device.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}

return SYSTEM_ERROR_NOT_FOUND;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If feels like this function should either return an error if the address is not aligned to the sector size and the destination storage device requires that, or adjust the address and number of sectors accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to use flash_common.cpp, no longer a concern.

/* Data received are Word multiple */
FLASH_Status status = FLASH_COMPLETE;
for (size_t pos = 0; pos < size && status == FLASH_COMPLETE; pos += sizeof(uint32_t)) {
uint32_t tmp = 0xffffff;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to initialize this with the current contents of the flash at the destination address. Also, does STM32 allow unaligned writes? I'm wondering if this is what is causing the issue with the in place modification of the OTA binary on Gen 2 in my OTA test that is based on this branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right on the unaligned writes. As this was only for tests I was mainly focusing on whatever is required there in the context of this PR, but since you are running into issues in your tests, I'll quickly refactor to use existing functionality in flash_common.cpp for Gen 3 which tackles alignment.

Copy link
Member

@sergeuz sergeuz Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For visibility: the issue in my tests was caused by a bug in this code that uses 0xffffff instead 0xffffffff to initialize tmp, not by an unaligned write. Big thumbs up for still refactoring the code to reuse the functionality from flash_common.cpp 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to use flash_common.cpp

va_start(args, fmt);
const auto r = vsnprintf(buf, sizeof(buf), fmt, args);
va_end(args);
if (r > (int)sizeof(buf)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>= should be used here as the output is always null-terminated. In fact, this method has a default implementation in JSONWriter which is good enough for most cases including this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}
private:
bool state_;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 👍 👍

json.name("u").value(bytes2hexbuf(module->suffix.sha, sizeof(module->suffix.sha), buf), sizeof(buf));
}
// FIXME?
if (info) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info can no longer be null in this code. I'd make it a const reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

json.name("f").value(module_function_string(module_function(info)));
char tmp[4] = {};
snprintf(tmp, sizeof(tmp), "%u", (unsigned)module_index(info));
// FIXME: is this really supposed to be a string?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically, yes. It's worth mentioning this in the comment to avoid confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note comment.

int system_info_get(hal_system_info_t* info, uint32_t flags, void* reserved) {
CHECK_TRUE(info, SYSTEM_ERROR_INVALID_ARGUMENT);
return HAL_System_Info(info, true, nullptr);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one important suggestion: let's clearly mark system_info_get() and system_info_free() as unstable APIs that should not be used in customer applications, perhaps even add an _unstable suffix to both of them, and remove all precaution comments added in this PR about the fact that certain structures that used to be internal can't be easily changed anymore. Making this change in the scope of this PR would be too much, but we should really have a proper system wrapper for HAL_System_Info() that would dynamically allocate all the nested structures of hal_system_info_t so that we could extend any of them individually in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, let's add a comment and add _unstable suffix 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed with _unstable suffix.

@avtolstoy avtolstoy force-pushed the feature/refactor-system-info-generation branch from 0d2538c to f3f7ca4 Compare August 24, 2021 17:48
@avtolstoy avtolstoy merged commit fcc76ac into develop Aug 24, 2021
@avtolstoy avtolstoy deleted the feature/refactor-system-info-generation branch August 24, 2021 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants