Teensy 4.1: extra flash, optional external RAM not addressable #86

Open
mciantyre opened this issue Nov 30, 2020 · 12 comments

@mciantyre
Owner

The teensy4-bsp supports both Teensy 4.0 and 4.1 boards. We achieve this with a single linker script. However, the common support means that we are not using the Teensy 4.1's

  • larger flash
  • (optional) external RAM
  • (optional) extra flash

Users who need more than ~2MB flash for their Teensy 4.1 programs, or who want to use the pads for extra RAM and flash, may find that today's BSP doesn't support these features.

This issue tracks support for extra storage on the Teensy 4.1.

@mciantyre
Owner Author

Note that this issue doesn't affect usage of the SD card on the Teensy 4.1. I believe using the SD card will need a uSDHC driver.

@tyalie

tyalie commented Dec 12, 2021

How would one go about enabling this? I'm actually running into hard memory limits whilst speaking on the Teensy as I had assumed to have at least 1mb of RAM available for the Heap implementation.

@mciantyre
Owner Author

mciantyre commented Dec 14, 2021

If you're interested in hardware modifications that increase RAM, the Teensy 4.1 supports external RAM. Here are the recommended parts and installation instructions. Otherwise, the on-chip RAM (OCRAM) is the same for both Teensy 4 models.

> I had assumed to have at least 1mb of RAM available for the Heap implementation.

Sorry, but I'm not sure we can provide 1MB of RAM solely for heap. The official Teensy 4 runtime provides up to 512KB of OCRAM for the heap. See the Teensy 4.1's memory map, here. #110 proposes something similar for this implementation.

@tyalie

tyalie commented Dec 14, 2021

Ah, thank you. Now I understand the title better. Sorry for posting it in the wrong thread. I assumed it was different to the Teensy 4.0 as the flash has increased immensely in comparison.

@cstrahan
Contributor

cstrahan commented Feb 13, 2023

@mciantyre Should it suffice to replace this line

RuntimeBuilder::from_flexspi(Family::Imxrt1060, 1984 * 1024)

with

RuntimeBuilder::from_flexspi(Family::Imxrt1060, 16384 * 1024)

for the Teensy MicroMod?

(EDIT: I suppose the FlexSPI Configuration Block in teensy4-fcb would also need to be modified to reflect the different flash_size?)
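
For reference, here is just the arithmetic behind the two size arguments quoted above. This is a minimal sketch; `kib` and the constant names are hypothetical and are not part of imxrt-rt.

```rust
/// One MiB in bytes.
const MIB: usize = 1024 * 1024;

/// Hypothetical helper: convert a KiB count to bytes.
const fn kib(n: usize) -> usize {
    n * 1024
}

/// The size currently passed to `RuntimeBuilder::from_flexspi`.
const CURRENT_FLASH_BYTES: usize = kib(1984);

/// The size proposed above for the Teensy MicroMod.
const PROPOSED_FLASH_BYTES: usize = kib(16384);

/// How many bytes a size falls short of a given number of whole MiB.
fn shortfall_from(mib_count: usize, bytes: usize) -> usize {
    mib_count * MIB - bytes
}
```

So `1984 * 1024` is 2,031,616 bytes, which is 64 KiB short of 2 MiB, while `16384 * 1024` is exactly 16 MiB. Whether the MicroMod also needs some flash held back at the top is not settled in this comment.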

Also, since we're already talking about RAM here...

I got my FlexIO+DMA LCD driver working, and implemented a driver for Slint, and it works:

[Image: teensy-micromod-slint]

But that's just running the small demo here: https://github.com/slint-ui/slint-mcu-rust-template/blob/225ab463511411ffb4b26f71b77e2215717e8667/ui/appwindow.slint

I wanted to try running their printer UI demo, but I run into linking issues:

error: linking with `rust-lld` failed: exit status: 1
  |
  = note: LC_ALL="C" PATH="/home/cstrahan/.rustup/toolchains/nightly-2023-02-13-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin:/home/cstrahan/.deno/bin:/run/user/1000/fnm_multishells/15290_1676178469019/bin:/home/cstrahan/.fnm:/home/cstrahan/.local/share/pnpm:/home/cstrahan/.asdf/shims:/home/cstrahan/.asdf/bin:/home/cstrahan/.local/bin:/usr/local/bin:/run/user/1000/fnm_multishells/6622_1676176825985/bin:/home/cstrahan/.cargo/bin:/usr/local/sbin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/cstrahan/go/bin:/usr/local/go/bin:/home/cstrahan/.fzf/bin" VSLANG="1033" "rust-lld" "-flavor" "gnu" "/tmp/rustcWpV0mK/symbols.o" "/home/cstrahan/src/hello-teensy/target/thumbv7em-none-eabihf/release/deps/hello_teensy-b04e3cee815147c1.hello_teensy.09139a8b-cgu.0.rcgu.o" "--as-needed" "-L" "/home/cstrahan/src/hello-teensy/target/thumbv7em-none-eabihf/release/deps" "-L" "/home/cstrahan/src/hello-teensy/target/release/deps" "-L" "/home/cstrahan/src/hello-teensy/target/thumbv7em-none-eabihf/release/build/cortex-m-090e74d85d092cc7/out" "-L" "/home/cstrahan/src/hello-teensy/target/thumbv7em-none-eabihf/release/build/cortex-m-rt-def1062c5ff082b2/out" "-L" "/home/cstrahan/src/hello-teensy/target/thumbv7em-none-eabihf/release/build/defmt-475418a5914e0621/out" "-L" "/home/cstrahan/src/hello-teensy/target/thumbv7em-none-eabihf/release/build/imxrt-ral-72f39907aa255519/out" "-L" "/home/cstrahan/src/hello-teensy/target/thumbv7em-none-eabihf/release/build/teensy4-bsp-2c6edf65dc997284/out" "-L" "/home/cstrahan/.rustup/toolchains/nightly-2023-02-13-x86_64-unknown-linux-gnu/lib/rustlib/thumbv7em-none-eabihf/lib" "-Bstatic" "/tmp/rustcWpV0mK/libcortex_m-51776dd45413cd5e.rlib" "/home/cstrahan/.rustup/toolchains/nightly-2023-02-13-x86_64-unknown-linux-gnu/lib/rustlib/thumbv7em-none-eabihf/lib/libcompiler_builtins-cb44e1eeaeda502e.rlib" "-Bdynamic" "--eh-frame-hdr" "-znoexecstack" "-L" 
"/home/cstrahan/.rustup/toolchains/nightly-2023-02-13-x86_64-unknown-linux-gnu/lib/rustlib/thumbv7em-none-eabihf/lib" "-o" "/home/cstrahan/src/hello-teensy/target/thumbv7em-none-eabihf/release/deps/hello_teensy-b04e3cee815147c1" "--gc-sections" "-Tt4link.x" "-Tdefmt.x"
  = note: rust-lld: warning: section type mismatch for .uninit.defmt-rtt.BUFFER
          >>> /home/cstrahan/src/hello-teensy/target/thumbv7em-none-eabihf/release/deps/hello_teensy-b04e3cee815147c1.hello_teensy.09139a8b-cgu.0.rcgu.o:(.uninit.defmt-rtt.BUFFER): SHT_PROGBITS
          >>> output section .uninit: SHT_NOBITS
          
          rust-lld: warning: section type mismatch for .got
          >>> <internal>:(.got): SHT_PROGBITS
          >>> output section .got: SHT_NOBITS
          
          rust-lld: warning: section type mismatch for .got.plt
          >>> <internal>:(.got.plt): SHT_PROGBITS
          >>> output section .got: SHT_NOBITS
          
          rust-lld: warning: section type mismatch for .got
          >>> <internal>:(.got): SHT_PROGBITS
          >>> output section .got: SHT_NOBITS
          
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 220 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7408 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7484 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7488 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7492 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7494 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7500 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7502 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7776 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7860 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7874 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7932 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 7940 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 8188 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 8216 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 9828 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 9900 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 9952 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 10004 bytes
          rust-lld: error: section '.text' will not fit in region 'ITCM': overflowed by 10044 bytes
          rust-lld: error: too many errors emitted, stopping now (use --error-limit=0 to see all errors)
          

warning: `hello-teensy` (bin "hello-teensy") generated 38 warnings
error: could not compile `hello-teensy` due to previous error; 38 warnings emitted

Tried adding this to Cargo.toml:

[profile.release]
opt-level = "z" 
lto = true

but still no luck. Any advice?

@cstrahan
Contributor

They run the demo on a STM32H735IGK6U MCU (564kB of SRAM) and a Raspberry Pi Pico (264kB SRAM), so it seems like there should be some way to pull this off on the TMM. Certainly they aren't doing something like keeping code on flash instead of SRAM 🤔

I figured I'd do something like cargo nm --release -- --print-size --size-sort | grep ' \(t\|T\) ' to see the worst offenders, but I can't do that if the whole thing fails to link 😅.
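
Once a build does link, the same ranking could also be done on saved `nm` output. A rough sketch, assuming each line has the form `<address> <size> <type> <name>` with hexadecimal address and size (what `--print-size` is expected to produce); the `Symbol` type and function names are made up for illustration.

```rust
/// One symbol parsed from `nm --print-size` style output.
#[derive(Debug, PartialEq)]
struct Symbol {
    size: u64,
    kind: char,
    name: String,
}

/// Parse a single line; returns None for lines that do not match
/// the assumed `<address> <size> <type> <name>` layout.
fn parse_line(line: &str) -> Option<Symbol> {
    let mut fields = line.split_whitespace();
    let _address = fields.next()?;
    let size = u64::from_str_radix(fields.next()?, 16).ok()?;
    let kind = fields.next()?.chars().next()?;
    // Demangled names may contain spaces, so rejoin the remainder.
    let name = fields.collect::<Vec<_>>().join(" ");
    if name.is_empty() {
        return None;
    }
    Some(Symbol { size, kind, name })
}

/// Keep only text symbols (`t` or `T`), largest first.
fn largest_text_symbols(nm_output: &str) -> Vec<Symbol> {
    let mut symbols: Vec<Symbol> = nm_output
        .lines()
        .filter_map(parse_line)
        .filter(|s| matches!(s.kind, 't' | 'T'))
        .collect();
    symbols.sort_by(|a, b| b.size.cmp(&a.size));
    symbols
}
```

Summing the `size` fields of the result gives a rough idea of how much of `.text` the listed functions account for.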

@cstrahan
Contributor

diff --git a/build.rs b/build.rs
index 201ff96..3f74126 100644
--- a/build.rs
+++ b/build.rs
@@ -13,7 +13,7 @@ fn main() {
         .stack(Memory::Dtcm)
         .stack_size(16 * 1024)
         .vectors(Memory::Dtcm)
-        .text(Memory::Itcm)
+        .text(Memory::Flash)
         .data(Memory::Dtcm)
         .bss(Memory::Dtcm)
         .uninit(Memory::Ocram)

This lets me compile and link without error, but flashing the device doesn't give any indication that anything is happening (no RTT output, and nothing on screen). Dunno if I'm either missing something there, or if the RT1062 needs to be configured differently to run from flash, or something else.

@mciantyre
Owner Author

mciantyre commented Feb 13, 2023

Well done 🎉 very cool to see a UI framework running on these MCUs!

Yup, I'd expect changes to the flash size in the RuntimeBuilder::from_flexspi call, and also the FCB. However, I'm not expecting this to be necessary until we see a linker warning indicate we're running out of FLASH. (The previous linker error indicates we're out of ITCM; increasing the flash size won't fix that.)


From the unmodified runtime:

teensy4-rs/build.rs

Lines 6 to 10 in 6f38334

.flexram_banks(FlexRamBanks {
    ocram: 0,
    itcm: 6,
    dtcm: 10,
})

Every FlexRAM bank is 32 KiB of some kind of RAM. If you add a bank to ITCM, you'll get an additional 32 KiB for instructions. But, the chip only has 16 banks. So if you add a bank to ITCM, you'll need to take a bank away from DTCM.

I'm hoping that there's a balance of FlexRAM banks where those "section '.text' will not fit in region 'ITCM'" errors go away.
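
To make the bank trade-off concrete, here is a small sketch of the arithmetic. It only uses the two facts stated above (32 KiB per bank, 16 banks in total). `BankSplit` is a hypothetical stand-in, not the imxrt-rt `FlexRamBanks` type, and the sketch ignores any other constraint the hardware or runtime may place on the split.

```rust
/// Size of one FlexRAM bank in KiB, per the comment above.
const BANK_KIB: u32 = 32;
/// Total number of FlexRAM banks on the chip, per the comment above.
const TOTAL_BANKS: u32 = 16;

/// Hypothetical stand-in for a FlexRAM bank allocation.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BankSplit {
    ocram: u32,
    itcm: u32,
    dtcm: u32,
}

impl BankSplit {
    /// Every bank must be assigned to exactly one kind of RAM.
    fn is_valid(&self) -> bool {
        self.ocram + self.itcm + self.dtcm == TOTAL_BANKS
    }
    fn itcm_kib(&self) -> u32 {
        self.itcm * BANK_KIB
    }
    fn dtcm_kib(&self) -> u32 {
        self.dtcm * BANK_KIB
    }
}

/// Number of banks needed to hold `bytes`, rounding up.
fn banks_for(bytes: u32) -> u32 {
    let bank_bytes = BANK_KIB * 1024;
    (bytes + bank_bytes - 1) / bank_bytes
}

/// Give ITCM just enough banks for `text_bytes` and the rest to DTCM.
/// Returns None if DTCM could then no longer hold `dtcm_bytes`.
fn rebalance(text_bytes: u32, dtcm_bytes: u32) -> Option<BankSplit> {
    let itcm = banks_for(text_bytes);
    let dtcm = TOTAL_BANKS.checked_sub(itcm)?;
    if dtcm < banks_for(dtcm_bytes) {
        return None;
    }
    Some(BankSplit { ocram: 0, itcm, dtcm })
}
```

With the default split (`itcm: 6`, `dtcm: 10`), ITCM is 192 KiB and DTCM is 320 KiB. The linker stopped reporting early, so the real `.text` size is unknown; the largest overflow it did print (10044 bytes) would already need a seventh ITCM bank.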


An extreme alternative approach: How about putting everything into OCRAM? We can express that with the RuntimeBuilder. This should be the way to give ourselves 1024 KiB of contiguous RAM.

RuntimeBuilder::from_flexspi(Family::Imxrt1060, 1984 * 1024)
    .flexram_banks(FlexRamBanks {
        ocram: 16,
        itcm: 0,
        dtcm: 0,
    })
    .heap(Memory::Ocram)
    .heap_size(16 * 1024)
    .stack(Memory::Ocram)
    .stack_size(16 * 1024)
    .vectors(Memory::Ocram)
    .text(Memory::Ocram)
    .data(Memory::Ocram)
    .rodata(Memory::Ocram)
    .bss(Memory::Ocram)
    .uninit(Memory::Ocram)
    .linker_script_name("t4link.x")
    .build()
    .unwrap();

Naively, I expect this XIP configuration would work. But, I haven't played around enough with XIP to know its proper setup and limitations. (If I remember correctly, whenever I tried XIP on my 1010EVK, I'd eventually fault with undefined instructions.)
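
As a sanity check on the 1024 KiB figure for the all-OCRAM layout, the sum can be sketched as below. The 512 KiB of dedicated OCRAM outside FlexRAM is an assumption about the i.MX RT1060 family that is not spelled out in this thread; the names are hypothetical.

```rust
/// Size of one FlexRAM bank in KiB.
const BANK_KIB: u32 = 32;

/// Assumption (not stated in this thread): OCRAM that exists
/// independently of FlexRAM on the i.MX RT1060 family, in KiB.
const DEDICATED_OCRAM_KIB: u32 = 512;

/// Total OCRAM in KiB when `ocram_banks` FlexRAM banks are assigned to OCRAM.
fn total_ocram_kib(ocram_banks: u32) -> u32 {
    DEDICATED_OCRAM_KIB + ocram_banks * BANK_KIB
}
```

Under that assumption, `ocram: 16` yields 512 + 16 × 32 = 1024 KiB, and `ocram: 0` leaves only the 512 KiB of dedicated OCRAM.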

@cstrahan
Contributor

Thanks @mciantyre for your help thus far!

I tried again with the XIP config, but using the simple demo UI, and that did work.

So I decided to put aside the big printer UI demo and see if I could reproduce my problems by just duplicating the text and button field on that UI; sure enough, past a certain number of elements (~10 text boxes or buttons) the program crashes again.

Attached a debugger and found that when it does fail, it happens right when making a function call. Execution jumps straight to the HardFault handler. So my hunch is something like I can't call a function if the address is too high.

My OOM handler isn't called, so it's not that (and besides, this is really early on in the initialization of the program -- not much has been or will be allocated at this point).

In case it was a problem with XIP, I also tried with this config (only slightly modified from your suggestion):

#[cfg(feature = "rt")]
fn main() {
    use imxrt_rt::{Family, FlexRamBanks, Memory, RuntimeBuilder};

    RuntimeBuilder::from_flexspi(Family::Imxrt1060, 1984 * 1024)
        .flexram_banks(FlexRamBanks {
            ocram: 16,
            itcm: 0,
            dtcm: 0,
        })
        .heap(Memory::Ocram)
        .heap_size(16 * 1024)
        .stack(Memory::Ocram)
        .stack_size(16 * 1024)
        .vectors(Memory::Ocram)
        .text(Memory::Ocram)
        .rodata(Memory::Flash) // <--- rodata still won't fit, so put it in flash
        .data(Memory::Ocram)
        .bss(Memory::Ocram)
        .uninit(Memory::Ocram)
        .linker_script_name("t4link.x")
        .build()
        .unwrap();
}

I feel like it's sooo close!

If you have any other suggestions, please do let me know. Thanks!

@cstrahan
Contributor

Realized it could be a stack overflow, so I tried doubling up on stack (.stack_size(32 * 1024)) and what wasn't working is working now using the last config I mentioned!

So my next order of business is figuring out how I can get a heads-up when the stack overflows, because that's super not fun to debug.
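
One common software technique for this is stack painting: fill the stack with a known pattern at startup, then later count how much of the pattern survives. Nobody in this thread proposed it. The sketch below simulates it on a plain slice. On real hardware, the bounds would have to come from the linker script and the live part of the stack must not be painted, neither of which is handled here. All names are hypothetical.

```rust
/// Pattern written into unused stack words. The value itself is arbitrary.
const PAINT: u32 = 0xDEAD_BEEF;

/// Fill the whole (simulated) stack region with the paint pattern.
fn paint(stack: &mut [u32]) {
    stack.fill(PAINT);
}

/// Count words that still hold the pattern, scanning from the lowest
/// address (index 0). The stack is assumed to grow down toward index 0.
fn untouched_words(stack: &[u32]) -> usize {
    stack.iter().take_while(|&&word| word == PAINT).count()
}

/// True when fewer than `min_free_words` words of headroom remain.
fn nearly_overflowed(stack: &[u32], min_free_words: usize) -> bool {
    untouched_words(stack) < min_free_words
}
```

Polling `nearly_overflowed` from a periodic task gives a warning before the overflow, not at it, and it can miss a frame that skips over the painted words without writing to them.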

@mciantyre
Owner Author

Awesome! Good call putting read-only data into flash. I've generally had better luck fetching data over FlexSPI than fetching instructions over FlexSPI.


One stack overflow detection strategy that comes to mind is to use the MPU. A MemManage fault could be that heads-up. But for the MPU to be effective, I think we'd need to define all of the accessible memory regions.

I'd been thinking about exposing all the memory regions through imxrt-rt for this purpose. The crate already has APIs for the heap start and end; it could also make available the endpoints for stack, text, data, uninit, etc. This might let users implement their own MPU policy. We could also provide a default policy in teensy4-bsp or imxrt-rt.

(My suggestion ignores a potential problem of "what does a helpful MemManage / HardFault response look like when we're out of stack?")
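
Any MPU policy built from those region endpoints would have to respect the ARMv7-M MPU's size rules: a region covers 2^(SIZE + 1) bytes, the smallest region is 32 bytes, and the base address must be aligned to the region size. Below is a minimal sketch of those checks, with hypothetical function names. It does not touch hardware and is not an imxrt-rt or teensy4-bsp API.

```rust
/// Encode a region size in bytes as the ARMv7-M MPU RASR `SIZE` field,
/// where the region covers 2^(SIZE + 1) bytes. Returns None unless the
/// size is a power of two between 32 bytes and 4 GiB.
fn rasr_size_field(bytes: u64) -> Option<u8> {
    if bytes < 32 || bytes > (1u64 << 32) || !bytes.is_power_of_two() {
        return None;
    }
    Some((bytes.trailing_zeros() - 1) as u8)
}

/// A region is only expressible if its size is encodable and its base
/// address is aligned to that size.
fn is_valid_region(base: u32, bytes: u64) -> bool {
    rasr_size_field(bytes).is_some() && u64::from(base) % bytes == 0
}
```

For a stack guard, this means the no-access region at the low end of the stack has to be a power-of-two size and sit on a matching boundary, which may constrain where the linker script places the stack.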

@cstrahan
Contributor

cstrahan commented Feb 27, 2023

@mciantyre I'll have to read up on the MPU -- thanks for the suggestion!

Also, since you might be interested, a video of me running their printer demo UI: https://twitter.com/charlesstrahan/status/1630026224356474881

Just recently wrapped up the last pieces I needed to get everything working :). Hoping to soon tease apart the 8080 bus, Slint backend, Goodix touchscreen and RA8876 (and ILI9486) controller code into separate crates and open-source everything. After that, I'm hoping to design a PCB to facilitate direct connection to this display (and maybe a couple of others), so that someone wanting to write fancy (and responsive) UIs for embedded projects can hit the ground running from day 1.
