Skip to content

Conversation

@ArcaneNibble
Copy link
Collaborator

This depends on the in-progress USB and MMU work, but this enables the caches on the EV3 (caching the main system RAM).

@ArcaneNibble
Copy link
Collaborator Author

fwiw this seems to improve system performance so much that pybricks/support#2303 doesn't manifest anymore (at least, not in a way that is visible to the naked eye)

@coveralls
Copy link

coveralls commented Aug 2, 2025

Coverage Status

coverage: 56.973%. remained the same
when pulling 2000de2 on ArcaneNibble:cache
into 1a71f8e on pybricks:master.

@ArcaneNibble ArcaneNibble force-pushed the cache branch 5 times, most recently from 04240ad to 22c31be Compare August 2, 2025 13:06
@laurensvalk
Copy link
Member

laurensvalk commented Aug 4, 2025

fwiw this seems to improve system performance so much that pybricks/support#2303 doesn't manifest anymore (at least, not in a way that is visible to the naked eye)

Can confirm visually that this is way better!

If you look carefully you can still see it with the NXT color sensor but there should be other ways to smooth it out if we want to. (EDIT: if I apply another pending patch on top of yours to use pbio_os_timer_extend in the ADC loop it evens out even more.)

EDIT2: Woah, this cuts the ADC read time from 2100 us to 86 us 😄

@laurensvalk
Copy link
Member

The new 3 commits rebase cleanly. Let me know if you'd like me to rebase and merge this yet or refine further.

This function flushes the D-cache, not the I-cache.
This allows the main memory to be write-back cached.

Special note regarding DMA buffers and the uncached alias mapping:
DMA buffers and uncached data cannot safely share cache lines
with unrelated data! We use a special macro to align these buffers
so that issues do not occur.

If DMA RX buffers share cache lines with other data, it can lead to
race conditions which corrupt data. If we flush and invalidate
the cache before performing the DMA, reading the unrelated data
can cause the cache line to be loaded back into cache, cancelling
out the invalidation. If we wait until after performing the DMA
and perform a flush and invalidate, the flush can corrupt part of
the data which was written by the DMA. If we only perform an invalidate,
writes to the unrelated data can be lost.

A similar race condition can happen if an uncached alias is used
to read/write to data which sits next to unrelated data.
This currently applies to the ADC RX buffer.

The USB driver's CPPI descriptors are safe as they happen to already
be cache line aligned.

The solution we use here is to place each buffer in its own
cache-line-aligned block of memory and only perform an invalidate
upon DMA completion. This makes sure that no unrelated data can
get caught up in the same cache lines.

This issue does not affect DMA TX buffers written using the
normal cached mapping. Those are handled by flushing the cache.
@ArcaneNibble ArcaneNibble marked this pull request as ready for review August 5, 2025 14:30
@ArcaneNibble
Copy link
Collaborator Author

I think this is fine. I just wanted someone else to look over it in case I missed a location where a driver depended on DMA.

I also noticed that the PRU soft UART doesn't use DMA, which is potential room for improvement in the future.

@laurensvalk laurensvalk merged commit 2000de2 into pybricks:master Aug 5, 2025
17 checks passed
@laurensvalk
Copy link
Member

Yeah, and the PRU writes only one byte at the time. This was copied from other existing implementations (and stripped down by about 75% to drop all kinds of Linux wrappers), but it should be able to write more at once.

            // Attempt to write one byte with the PRU UART. We could be sending
            // more than one byte at a time, but we keep it consistent with the
            // current PRU API. We don't send much data to EV3 sensors anyway.
            if (pbdrv_uart_ev3_pru_write_bytes(pdata->peripheral_id, &uart->write_buf[uart->write_pos], 1)) {
                uart->write_pos++;
            }

Another issue is that it has two ringbuffers due to how the APIs are glued together, which is also redundant.


Thanks, merged too!

void pbdrv_cache_prepare_after_dma(const void *buf, size_t sz);

// Accesses a variable via the uncached memory alias
#define PBDRV_UNCACHED(x) (*(volatile __typeof__(x) *)((uint32_t)(&(x)) + 0x10000000))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very specific to EV3, so we should rename things accordingly. Or just put this in the tiam1808 library.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not in tiam1808, since it's specific to the way we've specifically programmed the MMU. PBDRV_EV3_UNCACHED?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK. Then I would expect a macro for the 0x10000000 value then that is shared between this and where we program the MMU.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wondering about the uint32_t. Would it make more sense to use uintptr_t?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants