New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kernel: mtk_bmt: refactor to avoid deep recursion #12494
Closed
kempniu
wants to merge
1
commit into
openwrt:master
from
kempniu:mtk_bmt-refactor-to-avoid-deep-recursion
Closed
kernel: mtk_bmt: refactor to avoid deep recursion #12494
kempniu
wants to merge
1
commit into
openwrt:master
from
kempniu:mtk_bmt-refactor-to-avoid-deep-recursion
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
A Linksys E8450 (mt7622) device running current master has recently started crashing: [ 0.562900] mtk-ecc 1100e000.ecc: probed [ 0.570254] spi-nand spi2.0: Fidelix SPI NAND was found. [ 0.575576] spi-nand spi2.0: 128 MiB, block size: 128 KiB, page size: 2048, OOB size: 64 [ 0.583780] mtk-snand 1100d000.spi: ECC strength: 4 bits per 512 bytes [ 0.682930] Insufficient stack space to handle exception! [ 0.682939] ESR: 0x0000000096000047 -- DABT (current EL) [ 0.682946] FAR: 0xffffffc008c47fe0 [ 0.682948] Task stack: [0xffffffc008c48000..0xffffffc008c4c000] [ 0.682951] IRQ stack: [0xffffffc008008000..0xffffffc00800c000] [ 0.682954] Overflow stack: [0xffffff801feb00a0..0xffffff801feb10a0] [ 0.682959] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G S 5.15.107 #0 [ 0.682966] Hardware name: Linksys E8450 (DT) [ 0.682969] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.682975] pc : dequeue_entity+0x0/0x250 [ 0.682988] lr : dequeue_task_fair+0x98/0x290 [ 0.682992] sp : ffffffc008c48030 [ 0.682994] x29: ffffffc008c48030 x28: 0000000000000001 x27: ffffff801feb6380 [ 0.683004] x26: 0000000000000001 x25: ffffff801feb6300 x24: ffffff8000068000 [ 0.683011] x23: 0000000000000001 x22: 0000000000000009 x21: 0000000000000000 [ 0.683017] x20: ffffff801feb6380 x19: ffffff8000068080 x18: 0000000017a740a6 [ 0.683024] x17: ffffffc008bae748 x16: ffffffc008bae6d8 x15: ffffffffffffffff [ 0.683031] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000f00000101 [ 0.683038] x11: 0000000000000449 x10: 0000000000000127 x9 : 0000000000000000 [ 0.683044] x8 : 0000000000000125 x7 : 0000000000116da1 x6 : 0000000000116da1 [ 0.683051] x5 : 00000000001165a1 x4 : ffffff801feb6e00 x3 : 0000000000000000 [ 0.683058] x2 : 0000000000000009 x1 : ffffff8000068080 x0 : ffffff801feb6380 [ 0.683066] Kernel panic - not syncing: kernel stack overflow [ 0.683069] SMP: stopping secondary CPUs [ 1.648361] SMP: failed to stop secondary CPUs 0-1 [ 1.648366] Kernel Offset: disabled [ 1.648368] CPU features: 0x00003000,00000802 [ 1.648372] Memory Limit: none Several factors contributed to this issue: 1. The mtk_bmt driver recursively calls its scan_bmt() helper function during device initialization, while looking for a valid block mapping table (BMT). 2. Commit fa4dc86 ("kernel: backport MEMREAD ioctl"): - increased the size of some stack-allocated structures (like struct mtd_oob_ops, used in bbt_nand_read(), which is indirectly called from scan_bmt()), - increased the stack size for some functions (for example, spinand_mtd_read(), which is indirectly called from scan_bmt(), now uses an extra stack-allocated struct mtd_ecc_stats). 3. OpenWRT currently compiles the kernel with the -fno-optimize-sibling-calls flag, which prevents tail-call optimization. Collectively, all of these factors caused stack usage in the mtk_bmt driver to grow excessively large, triggering stack overflows. Recursion is not really necessary in scan_bmt() as it simply iterates over flash memory blocks in reverse order, looking for a valid BMT. Refactor the logic contained in the scan_bmt() and read_bmt() functions in target/linux/generic/files/drivers/mtd/nand/mtk_bmt_v2.c so that deep recursion is prevented (and therefore also any potential stack overflows it may cause). Link: https://lists.openwrt.org/pipermail/openwrt-devel/2023-April/040872.html Signed-off-by: Michał Kępień <openwrt@kempniu.pl>
github-actions
bot
added
the
kernel
pull request/issue with Linux kernel related changes
label
Apr 29, 2023
f00b4r0
approved these changes
Apr 29, 2023
mark! |
I also encountered this problem a few days ago |
neheb
approved these changes
Apr 29, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
A Linksys E8450 (mt7622) device running current master has recently started crashing:
Several factors contributed to this issue:
The mtk_bmt driver recursively calls its scan_bmt() helper function during device initialization, while looking for a valid block mapping table (BMT).
Commit fa4dc86 ("kernel: backport MEMREAD ioctl"):
increased the size of some stack-allocated structures (like struct mtd_oob_ops, used in bbt_nand_read(), which is indirectly called from scan_bmt()),
increased the stack size for some functions (for example, spinand_mtd_read(), which is indirectly called from scan_bmt(), now uses an extra stack-allocated struct mtd_ecc_stats).
OpenWRT currently compiles the kernel with the -fno-optimize-sibling-calls flag, which prevents tail-call optimization.
Collectively, all of these factors caused stack usage in the mtk_bmt driver to grow excessively large, triggering stack overflows.
Recursion is not really necessary in scan_bmt() as it simply iterates over flash memory blocks in reverse order, looking for a valid BMT. Refactor the logic contained in the scan_bmt() and read_bmt() functions in target/linux/generic/files/drivers/mtd/nand/mtk_bmt_v2.c so that deep recursion is prevented (and therefore also any potential stack overflows it may cause).
Link: https://lists.openwrt.org/pipermail/openwrt-devel/2023-April/040872.html
See #12472 for the previous (ineffective) attempt at fixing the same issue. One of the comments on that PR confirms that the revised approach proposed here appears to work.