Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow task pin to core, helper for spi_flash_* operations in task without set core affinity #258

Closed
wants to merge 1 commit into from
Closed

Allow task pin to core, helper for spi_flash_* operations in task without set core affinity #258

wants to merge 1 commit into from

Conversation

nathanjel
Copy link
Contributor

@nathanjel nathanjel commented Jan 17, 2017

As I was experimenting with esp-idf in mutlicore mode, I have noticed that certain issues are more/less likely to occur when the task affinity is being set to a specific core, rather to be allowing task run on any core. As I did not want my application tasks to be permanently tied to specific core, due to huge workload on the system, I have created this small component that allows to pin a running task to specific core, and macros to enter/exit a block during which task is guaranteed to be pinned to specific core.

This adds a menuconfig option to enable/disable these functions.

This was a must to be able to run the spi_flash functions frequently in a task with no core affinity, as these functions are likely to fail otherwise (they have a check that aborts if they started and stopped on different cores). I got such issues prior to implementing this.

Example:

  1. Include, considering this might not be available at all in given esp-idf clone
#if CONFIG_PINCORE_MODULE_ENABLED
#include "pincore.h"
#else
#define PINCORE_START_CORE_PINNED_SECTION() \
	do {
#define PINCORE_STOP_CORE_PINNED_SECTION() \
	} while(0)
#define PINCORE_RETURN_FROM_SECTION(x) \
	return x
#define PINCORE_EXIT_FROM_SECTION() \
	return
#endif
  1. Actual usage in module
	PINCORE_START_CORE_PINNED_SECTION();
	if (mptrs[fd].partition_ptr == NULL) {
		r = spi_flash_erase_range(file_start_address_page_start, FFS_ESP_FLASH_WRITE_BOUNDARY);
	} else {
		r = esp_partition_erase_range(mptrs[fd].partition_ptr,
		                              file_start_address_page_start, FFS_ESP_FLASH_WRITE_BOUNDARY);
	}
	PINCORE_STOP_CORE_PINNED_SECTION();
	FILE_CHECK_IF_SPI_OK(r)
	// write
	ESP_LOGV("FFS", "Writing %x bytes at %x%s%s",
	         FFS_ESP_FLASH_WRITE_BOUNDARY, file_start_address_page_start,
	         (mptrs[fd].partition_ptr == NULL) ? "" : " partition ",
	         ffs_file_metadata[fd].plabel);
	PINCORE_START_CORE_PINNED_SECTION();
	if (mptrs[fd].partition_ptr == NULL) {
		r = spi_flash_write(file_start_address_page_start, tmp, FFS_ESP_FLASH_WRITE_BOUNDARY);
	} else {
		r = esp_partition_write(mptrs[fd].partition_ptr,
		                        file_start_address_page_start, tmp, FFS_ESP_FLASH_WRITE_BOUNDARY);
	}
	PINCORE_STOP_CORE_PINNED_SECTION();

@igrr
Copy link
Member

igrr commented Jan 18, 2017

spi_flash routines suspend the scheduler for the duration of the flash operation, so they can be used in non-pinned tasks.
It appears that you are encountering a problem in theis case though; but this PR helps mask this problem by pinning tasks for the duration of flash operation. I feel it would be more useful to get down to the root cause of this.

@nathanjel
Copy link
Contributor Author

Sorry, I just realized I did not add the docs to this component. Please comment if that is anywhere close to esp-idf team interest, and I'll be happy to update the submission.

@nathanjel
Copy link
Contributor Author

Hi igrr,

This is my code that uses flash (esp-idf-compatibile vfs over spi_flash_* functions) - https://github.com/nathanjel/ffs

If I disable the commited module, which basically empties the macros, i will get my project to do this boo-boo (just tested)

  1. This is repeatable
abort() was called at PC 0x40081192
Guru Meditation Error: Core  1 panic'ed (abort)

Backtrace: 0x40085661:0x3ffcde00 0x40080fb6:0x3ffcde20 0x40082603:0x3ffcde40 0x40081e9c:0x3ffcde60 0x400f7b9a:0x3ffcdeb0 0x40101cce:0x3ffcded0 0x400f8530:0x3ffcdf00 0x4000bd86:0x3ffcdf20 0x40001180:00
  1. This is repeatable
nathan@ldev:~/xnode/SNESP32$ addr2line -e ./build/stripnode.elf 
0x40081192
/home/nathan/xnode/esp-idf/components/esp32/./intr_alloc.c:632
  1. This comes a bit random
xTaskResumeAll () at /home/nathan/xnode/esp-idf/components/freertos/./tasks.c:2204
2204	}
(gdb) bt
#0  xTaskResumeAll () at /home/nathan/xnode/esp-idf/components/freertos/./tasks.c:2204
#1  0x40080fb6 in ipc_task (arg=0x1) at /home/nathan/xnode/esp-idf/components/esp32/./ipc.c:63
#2  0x40082603 in spi_flash_enable_interrupts_caches_and_other_cpu () at /home/nathan/xnode/esp-idf/components/spi_flash/./cache_utils.c:141
#3  0x40081e9c in spi_flash_guard_end () at /home/nathan/xnode/esp-idf/components/spi_flash/./flash_ops.c:115
#4  spi_flash_write (dst=3215360, srcv=0x3fff0418, size=<optimized out>) at /home/nathan/xnode/esp-idf/components/spi_flash/./flash_ops.c:212
#5  0x400f7b9a in esp_partition_write (partition=0x3ffbb1d4, dst_offset=<optimized out>, src=0x3fff0418, size=4096) at /home/nathan/xnode/esp-idf/components/spi_flash/./partition.c:247
#6  0x40101cce in ffs_interface_write (fd=1, data=<optimized out>, size=<optimized out>) at /home/nathan/xnode/esp-idf/components/ffs/./ffs.c:232
#7  0x400f8530 in esp_vfs_write (r=<optimized out>, fd=<optimized out>, data=0x3ffb23b8 <MN_Configuration>, size=640) at /home/nathan/xnode/esp-idf/components/vfs/./vfs.c:214
#8  0x4000bd86 in ?? ()
#9  0x40001180 in ?? ()
#10 0x40058af4 in ?? ()
#11 0x401090e0 in _fwrite_r (ptr=0x3ffce150, buf=<optimized out>, size=700, count=1, fp=0x3ffce470) at ../../../.././newlib/libc/stdio/fwrite.c:170
#12 0x4010912c in fwrite (buf=0x3ffb23b8 <MN_Configuration>, size=700, count=1, fp=0x3ffce470) at ../../../.././newlib/libc/stdio/fwrite.c:211
#13 0x400fb862 in configuration_save () at /home/nathan/xnode/SNESP32/main/./status.c:170
#14 0x400fbf44 in StatusTask (pvParameters=<optimized out>) at /home/nathan/xnode/SNESP32/main/./status.c:420
(gdb) 

  1. or this
0x40008155 in ?? ()
(gdb) bt
#0  0x40008155 in ?? ()
#1  0x4008205e in spi_flash_read (src=<optimized out>, dstv=0x3ffb5fc0 <xYieldPending>, size=<optimized out>) at /home/nathan/xnode/esp-idf/components/spi_flash/./flash_ops.c:343
#2  0x40082603 in spi_flash_enable_interrupts_caches_and_other_cpu () at /home/nathan/xnode/esp-idf/components/spi_flash/./cache_utils.c:141
#3  0x400820d4 in spi_flash_guard_end () at /home/nathan/xnode/esp-idf/components/spi_flash/./flash_ops.c:115
#4  spi_flash_read (src=<optimized out>, dstv=<optimized out>, size=<optimized out>) at /home/nathan/xnode/esp-idf/components/spi_flash/./flash_ops.c:377
#5  0x400f7c92 in esp_partition_read (partition=0x3ffbb1d4, src_offset=4096, dst=0x3fff02f4, size=4096) at /home/nathan/xnode/esp-idf/components/spi_flash/./partition.c:211
#6  0x40101b6a in ffs_interface_write (fd=1, data=0x3ffb23b8 <MN_Configuration>, size=<optimized out>) at /home/nathan/xnode/esp-idf/components/ffs/./ffs.c:193
#7  0x400f8530 in esp_vfs_write (r=<optimized out>, fd=<optimized out>, data=0x3ffb23b8 <MN_Configuration>, size=640) at /home/nathan/xnode/esp-idf/components/vfs/./vfs.c:214
#8  0x4000bd86 in ?? ()
#9  0x40001180 in ?? ()
#10 0x40058af4 in ?? ()
#11 0x401090e0 in _fwrite_r (ptr=0x3ffce154, buf=<optimized out>, size=700, count=1, fp=0x3ffce474) at ../../../.././newlib/libc/stdio/fwrite.c:170
#12 0x4010912c in fwrite (buf=0x3ffb23b8 <MN_Configuration>, size=700, count=1, fp=0x3ffce474) at ../../../.././newlib/libc/stdio/fwrite.c:211
#13 0x400fb862 in configuration_save () at /home/nathan/xnode/SNESP32/main/./status.c:170
#14 0x400fbf44 in StatusTask (pvParameters=<optimized out>) at /home/nathan/xnode/SNESP32/main/./status.c:420
(gdb) 

@igrr
Copy link
Member

igrr commented Jan 18, 2017

I have a fix for the underlying issue, this fix doesn't require pinning the task during flash operations. It will land in master branch once code review is finished.

@nathanjel
Copy link
Contributor Author

Could You share the draft? I'm tight on timelines.

@igrr
Copy link
Member

igrr commented Jan 18, 2017

@nathanjel
Copy link
Contributor Author

Thank You very much. This helps. It seems I can drop my pull request now.

@nathanjel nathanjel closed this Jan 18, 2017
igrr added a commit that referenced this pull request Jan 19, 2017
spi_flash_enable_interrupts_caches_and_other_cpu function used to enable
non-IRAM interrupts after giving up flash operation lock, which would
cause problems if another task was waiting on the lock to start a flash
operation. In fact, non-IRAM interrupts should be re-enabled before the
task scheduler is resumed. Otherwise non-pinned task can be moved to the
other CPU due to preemption, causing esp_intr_noniram_enable to be
called on the other CPU, causing an abort to be triggered.

Fixes the issue reported in
#258
igrr added a commit that referenced this pull request Jan 19, 2017
fixes for issues observed when using spi_flash

This MR fixes three unrelated issues:

- Race condition in spi_flash_enable_interrupts_caches_and_other_cpu
  when operations on unpinned tasks are performed.
  The issue is reported in #258

- esp_intr_noniram_disable doesn’t disable interrupts when compiled in
  release mode. This issue manifested itself with an illegal instruction
  exception when task WDT ISR was called at the time when flash was
  disabled.
  Fixes #263.

- Tick hooks on CPU1 were not called if CPU0 scheduler was disabled for
  significant amount of time (which could happen when doing flash erase).
  The issue manifested itself as “INT WDT timeout on core 1” error.
  Fixes #219.

See merge request !441
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants