Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major Refactoring - SmartMatrix 3.x #25

Closed
embedded-creations opened this issue Jul 15, 2015 · 119 comments

Comments

@embedded-creations
Copy link
Contributor

@embedded-creations embedded-creations commented Jul 15, 2015

https://github.com/pixelmatix/SmartMatrix/tree/sm3.0

I'm working on a major revision to the SmartMatrix Library that simplifies the SmartMatrix Class into just refreshing the display, and moves the foreground and background layers into separate Layer classes. Instead of using a unique header file to describe the hardware, the configuration is set in the user's sketch, so you can have one version of the library and use it with different sizes of displays. You can create your own Layer outside of the library - for example I made a Fadecandy Layer class that I used to convert the buffers from a port of Fadecandy to SmartMatrix - and you can make your own stackup of layers, with multiple Foreground layers for example.

This requires a lot more code to set up the classes in the sketch, which I was trying to minimize with the initial release of the library, but I think this more flexible approach supports more use cases. The code is hidden away by using a couple #defines that support the most common use cases, and we can add more. I chose to declare buffers in the sketch instead of using malloc in the classes as I wanted the large buffers to show up in memory usage at the end of compilation, and there's no malloc support for creating DMAMEM buffers.

After separating into classes, refactoring to be more generic (no dependencies on MATRIX_WIDTH/HEIGHT defines) and testing on both 16x32 and 32x32 panels, I found the classes worked with 32x64 with barely any changes. I didn't get a chance to use any code from the >32x32 size pull requests from @ncortot, @GaryBoone, and @mrwastl, though maybe I will when adding support for C-shape chaining.

I have the library in a decent state, and FeatureDemo has been tested on 16x32, 32x32, and 32x64 panels. There's an occasional flicker in the 32x64 I haven't tracked down yet. Only the FeatureDemo example is updated to use the new library, though it should be easy to update the rest. (Except for FastLED_Controller, which will need an update to the FastLED Library with SmartMatrix 3.x support)

It's probably a month or more away from an official release as I'm going to be traveling without a laptop for a couple weeks, and want to add support for more display configurations (especially C-shape paneling), to do more testing, and make some efficiency improvements after I get back.

If there are any other features you'd like to see in the next major release, let's start discussing.

Not yet supported:

  • C-shape paneling for >32-pixel height
  • 24-bit (or fewer) color - this is probably best handled through C++ templates which I haven't used before, and isn't a high priority for me, let me know if it is for you (@mrwastl)
  • FastLED_Controller example
This was referenced Jul 15, 2015
@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Jul 18, 2015

24-bit or fewer: yes, it is essential because of the limited memory that is available when using a teensy 3.1. 3x8bit per pixel (including foreground-functionality) is rather on the brink when using 96x64. 2x8bit or even 1x8bit (with or without foreground-functionality) is for testing sketches that include other memory consuming libraries (these are compromises with lower performance anyways - but at least 3x8bit should be supported).

maybe the whole colour stuff could be made a little bit more colour-depth agnostic.

c-shape vs. z-shape: the difference is very small: see SmartMatrix::convertToHardwareXY() at MatrixGraphics.cpp at my fork. the question is: do you want to integrate this seemlessly (for the user) or like in my fork (that originates from GaryBoone's work) that uses an extra call to convert real x/y to internal x/y coordinates?

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Jul 18, 2015

@mrwastl Long term I definitely want to add in support for lower color depth, I'm just on the fence as to if it's a requirement for the initial 3.0 release.

maybe the whole colour stuff could be made a little bit more colour-depth agnostic.

Agreed, but I think the right solution to that is to use C++ templates, which I've never used before, so this could be a big project for me. If you have ideas of an alternate way, please let me know.

Thinking right now... Maybe it could be done using different constructors for the SmartMatrix and Layer classes. You could pass in a buffer with rgb24 or rgb565 type for the Layer class and that would change behavior and obviously the buffer would be smaller. That doesn't seem too bad. The classes would still need code to handle all types of color data, so it wouldn't be the most efficient, and (I think) templates would be an improvement as it would only compile the code needed for the data type chosen by the user, but that's an improvement that could be added later.

the question is: do you want to integrate this seemlessly (for the user)

I was thinking the Layer classes would work on the actual coordinates e.g. 96x64, and the user would have to tell the SmartMatrix class (through a parameter in the constructor or a function call in setup()) to translate the data before shifting it out to the panels. It makes more sense to me that the SmartMatrix class which deals with the hardware should take care of the translation, not the Layer class which should be hardware agnostic. Once they setup the classes, the user shouldn't need to do any translation in their sketch.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Jul 19, 2015

i just did some tests with my panels (i've ordered some cheap panels from china so now i can easily fool around with different sizes):
'old' big display: 96x64, and a quick and dirty assembled one with 96x32.
cpu speed: 144 mhz.
24-bit colour-depth doesn't seem to work at all: feature demo: after some seconds: hang up, some random leds, game over (cpu set to other speeds: no change), reset teensy: same again
36/48-bits: works fine with 96x32 (but it crashes when using height=64).
flickering: yes, i seem to have this problem too (or, it is more a 'glitch' than flickering), but only very, very seldom.

a 'wish' for the feature demo: could you make the durations of the scroll texts depending on the display width? because the test is ended before it has finished in many cases (when using width=96).

do you plan to make foreground-layer optional (eg: by using a #define) - to save memory?
(in your code i see SMARTMATRIX_SETUP_DEFAULT_LAYERS (allocates back/foreground layers) and SMARTMATRIX_ALLOCATE_FOREGROUND_LAYER, but no 'SMARTMATRIX_SETUP_BACKGROUND_LAYER or so). additionally, the foreground-specific functions could be made unavailable).

a question: memory usage seems to be noticable lower when using the new version of your library (tested with FeatureDemo (old lib vs. new lib, same width/height, same colour-depth)). but i don't understand why (foreground layers are already using optmised size in my fork) - what do i miss?

EDIT1:
c++-templates: well, i'm not very familiar with these, but i'm always willing to learn ...

seemless conversion of coordinates: this sounds very good indeed.

EDIT2:
tested width width > 3_32: crash (height=32). so it seems that using > 3 panels crashes the library (memory should be no problem according to compile note (eg. compiling for width=32_4: "Global variables use 43,864 bytes (66%) of dynamic memory")

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Jul 20, 2015

I think I broke 24-bit color as there wasn't a quick hack to make it work after the refactoring . I'll work on that before the release, and maybe getting 24-bit color will open the doors to the other lower color options.

could you make the durations of the scroll texts depending on the display width?

Yes, I noticed that as well on a 64-bit display.

do you plan to make foreground-layer optional (eg: by using a #define) - to save memory?

Yes, I only made SETUP() #defines for the minimum use cases I needed for my examples. We can add more, and you don't need to use the defines, just put the code in your sketch if you have a use case that's not covered. If you want to use just the background layer, I think you can just delete the three lines related to the foreground in SMARTMATRIX_SETUP_DEFAULT_LAYERS(), and I think useDefaultLayers() should work as expected though I didn't intend for useDefaultLayers() to be used with just the background layer. I'll fix it if it's broken.

additionally, the foreground-specific functions could be made unavailable).

I believe the linker should remove them if the sketch doesn't call those function, I could be wrong.

memory usage seems to be noticable lower

That's surprising, I expected the opposite, though not a significant change. I didn't compare memory usage myself.

tested width width > 332: crash (height=32)

OK, I'll test this myself at some point. You might try increasing this #define in hardware.h, and if you have a scope, check out the timing of the data relative to latch. It could be something completely different though.
MIN_BLOCK_PERIOD_PER_PIXEL_NS

Try not use asterisks to indicate multiplication in your comments as they are interpreted as italics by the GitHub Markdown parser. e.g. it looks like you said width "332" when you said 3*32. Not a big deal but it keeps confusing me until I realize what happened.

Thanks for testing out the latest code! I'm unfortunately running out of time to work on things before my trip, so I won't be able to make any changes or do any testing for a few weeks, but I made a note of your feedback and test results.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Jul 20, 2015

arrrgh. i overlooked the markdown-effect on "x * 32". but it's easier if you are testing different widths (and i'm a lazy arse :)

have fun at your travel! i'll try to find the > 3 * 32 problem in the meantime.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Aug 3, 2015

Some short 'interim report':

  • 6 panels work when modifying refresh rate (setRefreshRate()) and/or MIN_BLOCK_PERIOD_PER_PIXEL_NS
  • memory usage IS noticably lower (i could even configure 36 bit colour-depth AND foreground-layer together when driving 6 panels)
  • 24 bit colour is broken (colours do not fit at all - louis is aware of this)
  • 36 bit colour seems to be considerably slower than with v2
  • 24 bit colour is REALLY slow (with v2 it is quite the opposite: 24bit is rather fast, all other colour depths are slower than 24bit)
  • at the moment, only height<=32 is working (i did the real x/y to internal x/y coordinates conversion in my test-sketch and 'faked' a height=64 display back to my library)
  • tests:
    • test-patterns generated by my library 'serdisplib', transmitted via my alpha-stage protocol 'VSSDCP'
    • music videos streamed by mplayer via VSSDCP/serdisplib -> teensy.

when fooling around i've made an observation that i don't understand (maybe someone has an idea):

when i write

const uint8_t height = 32; 
const uint8_t width = (32*6);
const uint8_t depth = 36;
const uint8_t rows = 4;
static DMAMEM uint32_t matrixUpdateData[rows * width * (depth/3 / sizeof(uint32_t)) * 2];
static DMAMEM uint8_t matrixUpdateBlocks[(sizeof(matrixUpdateBlock) * rows * depth/3) + (sizeof(addresspair) * height/2) + (sizeof(timerpair) * depth/3)];

in the global variables definition part and than in setup():

matrix = new SmartMatrix(width, height, depth, rows, matrixUpdateData, matrixUpdateBlocks);
/* some initialisation stuff */
matrix->begin();

initialisation of SmartMatrix only works once (than I've to call another working sketch, eg Louis' FeatureDemo, than i can successfully call my sketch - but only once. and so on - even recompilation and disconnect/reconnect doesn't change that)

but if i write:

const uint8_t height = 32; 
const uint8_t width = (32*6);
const uint8_t depth = 36;
const uint8_t rows = 4;
static DMAMEM uint32_t matrixUpdateData[rows * width * (depth/3 / sizeof(uint32_t)) * 2];
static DMAMEM uint8_t matrixUpdateBlocks[(sizeof(matrixUpdateBlock) * rows * depth/3) + (sizeof(addresspair) * height/2) + (sizeof(timerpair) * depth/3)];
SmartMatrix matrix_inst(width, height, depth, rows, matrixUpdateData, matrixUpdateBlocks);

and in setup():

matrix = &matrix_inst;
/* some initialisation stuff */
matrix->begin();

everything works as expected. race-condition? or s'thing else that i don't see?

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 6, 2015

@mrwastl Thanks for testing and the report! I'm back from my trip but probably won't have time to get back into code until next week.

36 bit colour seems to be considerably slower than with v2

What do you mean by "slower", and how are you measuring?

initialisation of SmartMatrix only works once

Interesting, I haven't tried using new with Arduino/Teensy. It will be using malloc to allocate memory, and I'm not sure of the side effects of if that's a problem. I'll think about this more when I look at the code later. Why do you want to use new in setup() instead of using matrix_inst as a global variable?

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Aug 8, 2015

it seems that i've found better refreshrate / MIN_BLOCK_PERIOD_PER_PIXEL_NS combinations. the result are much better now. but i will observe this furthermore.

my measuring:
my sketch which gets the data via my protocol contains a simple fps-calculation that may be enabled if required. the result (fps) is painted directly into the display content.
calculation:
ten frames are taken for an FPS-calculation. the time required for these frame updates (swapBuffers) is measured and divided by 10.

testing is done by streaming a video to the teensy (see prev. posting).

here you can have a look at the sketch (this one is for SmartMatrix v2): https://github.com/mrwastl/VSSDCP/blob/master/VSSDCP_base/examples/VSSDCP_SmartMatrix.ino

DEBUG_FPS enables the fps-calculation.

this sketch also shows why i wanted matrix to be an instance reference (because it fits better into the OO-structure of my VSSDCP-protocol stuff. but yes, it isn't really necessary. but more beautiful (well. on the other side: the memory stuff is initialised outside of any class anyways ...)

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Aug 10, 2015

i've just seen that you're enhancing sm3 step by step (waiting for hardware mapping :)

my erraneous assumption that sm3 is slower than sm2 seems to be based on this interesting discovery that i've just made and that i don't understand at all:
MIN_BLOCK_PERIOD_PER_PIXEL_NS is set to: (10000/40)

setRefreshRate(X) -> FPS

X FPS
50 26.7 (but unusable, flickering as hell)
60 22.9
65 24.2
68 25.0
69 25.3
70 25.5 (best compromise between flickering and FPS)
71 20.6 (why the drop?)
72 20.8
73 21.0
74 21.2
75 21.4
76 21.6
77 21.7
78 21.9
79 18.4 (drop again)
80 18.6

the measured FPS-values seem plausible to what i 'detect' with my eyes. and the values in question have been measured more than once (to exclude CPU peeks a.s.o.)

maybe you have any idea why there're these drops between 70, 71 and 78,79? i expected the FPS values to be linear or a curve.

EDIT: i forgot the display-size: 96x64

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 10, 2015

@mrwastl Thanks for the data, that's really useful! I don't have any idea why there would be sudden drops. I'll look into it and let you know.

The actual FPS will be some percentage lower than what you pass into setRefreshRate() because there are some delays added to the refresh code to make sure each bit meets MIN_BLOCK_PERIOD_PER_PIXEL_NS. I think it's around 10-20% off now. That's on the list to fix.

I'm still finishing up all the support for layers, there are still some bugs, then I will move on to supporting the code that refreshes and supports larger displays. Thanks for your patience!

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 10, 2015

Just a hunch, but those small FPS changes could be straddling the threshold for where another bit of the color depth needs to be padded to meet the minimum MIN_BLOCK_PERIOD_PER_PIXEL_NS. I'll look into it.

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 14, 2015

SM3 is slower than SM2, I added a countFPS() function to FastLED_Functions and saw 62fps with SM2, 48fps for SM3 with a 32x32 panel. 64x32 only got 5 fps. I traced the biggest slowdown to the code translating from layers to the refresh buffer on each pixel, they're taking 40% longer. I just changed the refresh code to read in a full row of pixels instead of one at a time, and got the frame rate up to 55 fps. I'll continue on this (and check in my code) on Monday.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Aug 16, 2015

@SLOWER sm3: i'm curious about the improvements on monday :)

@dithering: i'm very curious about what dithering can improve in a true-colour environment? maybe it blurs colour-step effects or so that are created by our visual centre?

@different colour-spaces support: i'm a little bit confused: your class SMLayerBackground uses rgb24-background buffers - no matter what SmartMatrix(w,h,depth, ...) is set to (the background layer is created outside of SmartMatrix()). so i assume this colour depth setting is only relevant for colour-mixing?
if i want to save memory i've to touch SMLayerBackground and play around with the background buffer and how to read from / write to it.
if i want to gain speed i've to play around with colour mixing (rgb48 vs. rgb24). am i getting it right or do i understand it all wrong?

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 16, 2015

@dithering - it's unnecessary, Greg realized there was 48-bit color support and is using that now.

@different - the bitmap is 24-bit, but when after color correction, the lowest bits drop out unless using 36 or 48-bit color. Also there's an option to dim the background layer where color will be lost unless using a higher color depth. I want to allow the user to select a color depth for each layer as well as for refreshing. I'll be working on this next week and it should go quick after Greg's help with #27.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Aug 16, 2015

wow. that really sounds very impressive and promising. can't wait to play around with these additions.
i guess that after the addition of hardware-coordinate correction all my 'hacks' will no longer be required (and your rotation-code is already better working than mine :)

i will continue with my serdisplib-protocol stuff. maybe i'll finally get network-based communication working in a useful way. one 'milestone': being able to stream video-output to my rgb matrix-panel via network.

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 16, 2015

@mrwastl not sure if you know, but there are two other streaming protocols that work with SmartMatrix: TPM2 and Fadecandy. TPM2 is included in our Aurora firmware, and there are details on compatible software on the Aurora wiki. Fadecandy works just like the Fadecandy project and has interpolation and a very efficient USB protocol. You can find two repos for Fadecandy under Pixematix in GitHub. The firmware is a separate repo.

I'm going to be focusing on the Fadecandy USB protocol and OPC for projects in the future. I'm not happy with the currently available software that uses TPM2.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Aug 16, 2015

@embedded-creations: no, didn't know these. but i think that they have a very different focus than my protocol has / will have: as far as i've seen by having a short look at these protocols they support small pixel amounts and one direction, main focus on intelligent LEDs (be it octo/neopixel-stuff) and rgb matrix led panels. but i'm thinking about streaming the content of my library (serdisplib) to an arbitrary display module/device (via full frame updates but also region updates) AND also including support for GPIs/GPOs (general purpose inputs/outputs like touchscreens, buttons, ...) - controlling a 480x320x24 display that is controlled by a teensy 3.1 is already working using my protocol (but only via USB - and without support for the touchpanel :(

another usage for the protcol: in the (far) future it should completely replace the existing remote-driver found in my library (i'm not happy with this driver).

what this protocol will NOT support: different layers (because my library doesn't support it :) - that's why i only need the background layer when combining this protocol and SmartMatrix.

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 17, 2015

@mrwastl Yeah, these protocols don't seem suited for a larger display. I did a project using the RFB protocol used in VNC to send updates over USB to a larger display but the project is incomplete. Here are some links in case anything might be useful to you:
http://www.embedded-creations.com/projects/microvnc-revisited
https://github.com/embedded-creations/Networked-Display

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 18, 2015

@mrwastl Take a look at the updates that Greg made that I merged into sm3.0, you can now independently select color depths for the layers and refresh. I think the templates can be expanded to include the <24 bit color that you want. Please take a look and let me know what you think. I spent all my time today working on this, and I'll get back to efficiency updates later in the week. At Greg's suggestion I merged his pull request first and I'm glad I did because there were changes to almost every file in the repo, and a merge conflict would be a nightmare.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Aug 19, 2015

@embedded-creations before the backgroundbrightness-patch there was a big problem with colours when using COLOR_DEPTH 24 (colours seemed to be extremely 'overdriven'). but now everything seems to be fine again (i've testet COLOR_DEPTH 24 and REFRESH_DEPTH 36, cc24 and cc48).
refreshrate seems to be much better now, FPS-tests will follow (seems that some side-effect broke my FPS-test/display code - i've adapted your countFPS to display the result on-screen (and not via serial)).

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 19, 2015

I've yet to commit the efficiency improvements, so expect major FPS increases tomorrow after I do that

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 19, 2015

OK, first major efficiency improvement is committed, still slower than SmartMatrix 2.x.

The color correction was broken, adding background brightness control didn't in itself fix it, but hid the problem. Now fixed.

I'm thinking about color correction in general and how it relates to different depths for storage in the layer and the color depth of refresh. Right now you can choose several options for ccmode including none, but is on/off good enough? The color correction table could be automatically selected based on the refresh color depth. I can't think of any real use for applying a 12-bit color correction table to a rgb24 image except to demonstrate how much better the 24-bit or 48-bit color correction table looks. Conversely, applying a 24-bit or 48-bit table when refreshing with 12-bit color isn't going to improve the image.

New to this release is support for 48-bit image storage in layers, which @gregfriedland added for his use case of sending already (heavily) gamma/color corrected video to the display. There's no need to apply color correction to rgb48 pixels, and there's no good way to do it anyway as the lookup table would be huge and computing on the fly would be too slow.

I plan on simplifying color correction to either enabled/disabled, and it will be automatically disabled for rgb48 layers. Let me know if this seems ok, I won't start on this until tomorrow at the earliest.

I'm also thinking about how to better handle refresh color depth. Currently 24, 36, and 48-bit color depths are supported, but an rgb48 array is used to collect data from the layers regardless. 24-bit color should use a rgb24 array, and maybe in the future supporting 12-bit or lower color depths with a smaller data type might make sense. Refresh color depth is set at compile time, but there are still comparisons and shifting done as each pixel is loaded into the refresh buffer, which could be eliminated if there were unique loadMatrixBuffers() functions for each supported refresh color depth. I'll look into how this could be done using templates as I think it's a good improvement for both memory and CPU usage.

I'm hoping that the templates Greg used to enable 48-bit color also will work to enable RGB565 and RGB332, please let me know if you take a look.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Aug 19, 2015

slower maybe (have to do comparable measurements yet), but already very, verrry usable.

one note to FeatureDemo: maybe you could add a define at the beginning (MAX_REFRESH_RATE or so), set the initial rr to this and also maxRefreshRate in DEMO_REFRESH_RATE to this value.

when using a resolution that is too big my teensy crashes (well, teensy and driving 6 panels is always a little bit borderline :)

adding other colour depths: to be honest: i must first understand the new code (the whole template stuff). but then it shouldnt't be that hard i guess.
in my sm2-fork 16-bit colour-depth looked quite ok, 8-bit not so good ... but i added this to sm2 to be able to test some examples that needed a lot of space - and the support wasn't speed-efficient at all.
i guess when using greg's template-base this should no longer be the case.

colour-correction: unfortunately i can't test it with my 6 panel display (out of mem), but i can test the other combinations.
maybe i'll set up a 4 panel display with my other panels. than i should be able to test 48-bit col.depth.

btw: a perfect video (in my opinion) for testing colours and colour correction is 'Kylie Minogue - Put Yourself in My Place' (https://www.youtube.com/watch?v=q9t6wef5xlE) because it contains very 'shiny' colours. it's my favourite testvideo to reveal misconfiguration or colour problems in general (pink vs. blue vs. orange, scenes with fading from dark to glowing white, ...).

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 19, 2015

For your setup, do you need to set refresh rate before calling matrix.begin()? Maybe I should make the refresh rate part of the constructor instead of hardcoding a default.

If you just want to test something with a different resolution, you can run it on a larger (or smaller) physical setup. I've been testing 32x32 resolution on my 32x64 panel that happens to be plugged in at the moment. The second panel will show something that's a bit off, and it probably gets worse down the chain, but it's not going to break anything.

Packing and unpacking bits and/or shifting data into place with 8-bit and 16-bit color may be inefficient.

I'll give that video a try, thanks!

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Aug 19, 2015

i also thought that this could be a problem but it seems not to be. my usual setup is:

  • matrix.begin()
  • matrix.setBrightness()
  • matrix.setColorCorrection()
  • matrix.setRefreshRate()

yep, testing with different resolutions works fine (doing that quite often), but it looks crappy if you simulate a 2x2 panel display on a 3x2 one (because the chain-ordering stands in the way). but for quick tests and a little imagination it's ok :)
i can also rewire the chaining to get a perfect 2x2 panel output.

(2x edit because of wrong method-name / description)

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Aug 19, 2015

Maybe I misread your message but is the refresh rate demo in FeatureDemo causing your Teensy to crash? Where do you set refresh rate to change it from the default?

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 4, 2015

@mrwastl OK, this is weird, putting static rgb24 tempRow2[0]; before or after the other tempRows works, but not a uint8_t array of small or large size. Compiler weirdness, or something else? Need to figure out how to generate a .map file and take a look at it.

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 4, 2015

@mrwastl Got the .map file (open platforms.txt inside Arduino, add to this line: teensy31.build.flags.ld=-Wl,
-Map={build.path}/output.map,
(or another directory if you don't want it in the frequently changing build.path

There's 4 bytes (pointer?) allocated for tempRow2, so it does shift memory around. Given that shifting memory seems to be key to making things work I tried various sizes for tempRow2. 0 and 1 work, 2 doesn't, 3, 4, and 5 do, 6 doesn't. Now I'm thinking it's not bad pointer math, but bad alignment of something, probably DMA wants its addresses to be aligned in some way. Will dig into that later.

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 5, 2015

@mrwastl That was a difficult bug to track down. I didn't realize timerPairIdle needed to be aligned in RAM for DMA to work. It's now stored as DMAMEM and FeatureDemo (At least DEMO_INTRO) at 96x64 is working for me with 24 and 36-bit refreshDepth.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Sep 5, 2015

hmm. maybe there's still at least one side effect:

feature demo now works (with kRefreshDepth = 24 and kRefreshDepth = 36), but my streaming stuff immediately crashes with all 24 and 36 vs. kRefreshDepth=2/3/4-combinations.

but: it works with kRefreshDepth = 48 and kDmaBufferRows = 3 !
but kRefreshDepth = 48 and kDmaBufferRows = 2 -> crash.

the initialisation is more or less the same as in feature demo (except that i don't allocate and add indexedLayer and scrollingLayer).

the hint with the .map-file sounds interesting. might help for tracing problems like the one that i've described above (variable order).

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Sep 6, 2015

did some experiments (or better 'some poking around in the dark' :) to isolate the crashes and made the following observation (maybe it gives a hint where to search for the source of the crashes):

i switched back to an older git-revision (where my streaming stuff was working w/o crashes and also faster than now) but where FeatureDemo crashed with most kRefreshDepth/kDmaBufferRows-combinations.

when using kRefreshDepth = 24 FeatureDemo was working for some seconds but crashed then. i played around with the demos (enable/disable, ...) and noticed, that fillScreen() seemed to crash the sketch (the 1st fillScreen() worked, but then -> crash). i tried fillRectangle(0, 1, w, h-1, col) instead -> no more crash (with kRefreshDepth = 24). i disabled all drawHardware * () functions (by inserting an early return): many patterns were processed w/o crash. also kRefreshDepth = 36 succeeded with some demos.

i switched back to the current sm3.0 revision (where FeatureDemo works but my streaming-stuff doesn't): i deactivated all fullScreen()-calls in my streaming-stuff -> no more crash (the only drawing-method that is now used: drawPixel()). the video is finished without a crash (though: the speed-up that i've mentioned at sept 3rd is still gone...).

note: for all tests i've used the FeatureDemo-sketch that belonged to the respective sm3.0 revision.
(with width/height/kRefreshDepth/kDmaBufferRows adapted)

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 7, 2015

@mrwastl Thanks for troubleshooting, I haven't looked at this at all as I've been spending all my time getting 64x64-pixel animated GIFs to play (not an easy problem, requires a lot of RAM). Got it working now.

I'll take a look at those drawHardware functions. I did notice fillScreen wasn't working well in the demo, but didn't realize it was causing crashes.

Try playing around with different settings of refreshRate, keeping in mind that the old refreshRate is somewhere around 10% slower than the more accurate refreshRate now (e.g. your old 100Hz refreshRate setting might produce the same 90Hz as setting refreshRate to 90Hz now). For 64x64-pixel AnimatedGIFs I saw very poor performance in GIF decoding when depending on the SmartMatrix library to lower the refresh rate. It would lower it just enough so there was some CPU time for the application, but not enough to decode GIFs that looked good. I set the refreshRate manually to 90 Hz and now the GIFs run smoothly. In particular I noticed GIFs were playing with lower FPS at 24-bit color than 36-bit color, but when manually setting the frameRate at 90Hz they both looked the same.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Sep 7, 2015

yep, you're right. when using refreshrate = 67 instead of 70 with the current version i get the same results (higher FPS-value, no whining mplayer, playing time = 4:13) as described above.

note: i try to add the git ref to every reference to a certain sm3.0 version to make things clearer (e.g. from now 'current' links to the reference that was used for testing). hope that helps.

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 7, 2015

@mrwastl Definitely helps. So aside from fixing the drawHardware methods, is there anything else you think needs to be in a pre-release version of SM 3.0? I still have improving FeatureDemo for larger displays on my list, but that might have to wait for the actual release of SM 3.0.

@wangnick

This comment has been minimized.

Copy link

@wangnick wangnick commented Sep 7, 2015

@mrwastl @embedded-creations I just ran the sm3_FeatureDemo on a teensy3.1@144MHz onto a 64x64 3mm pitch panel. With kRefreshDepth = 36 I'm seemingly achieving 83Hz refresh rate. Impressive. Very rarely I observe some tearing (just now once during DEMO_DRAWING_ROUNDRECT), but I'm known for being quite sensitive to flicker, and this might also be a hardware issue of my temporary setup (http://sebastian.wangnick.de/20150907_205114.jpg).
Is there any way to tweak the timings so as to get a still higher refresh rate, more in the 120Hz range or so (without compromising the 36 bit depth that is needed so that very lowly lit pixels dont start to posterize)?

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 7, 2015

@wangnick Nice setup! I have an old Logic 8, and want to get one of the new Logic Pros at some point.

As you have a logic analyzer setup, you can turn on the debug pins using DEBUG_PINS_ENABLED in MatrixHardware_KitV1.h. With DEBUG_PIN_2 (Teensy pin 17) you can see how much time is being spent in the ISR calculating data to shift out, and how much time is left for the sketch to run. That's the code that is limiting the refresh rate. I haven't spent a ton of time optimizing that code so I'm sure there's room for it to run faster.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Sep 7, 2015

@embedded-creations the only little small thing that comes into my mind: rotation 90 / 270 scrolling doesn't go very smooth.. (don't know how to describe it: it looks like 'stepping' or a little bit like tearing or so).

it's noticeable but i wouldn't count it as a 'showstopper bug'.

if you're done with defining the API i will try to add different colour spaces (16bit and 8bit) - if i get used to the template-stuff ... and also port the extended spectrum analyzer and my teensy-rtc based MatrixClock3 to SM3.0 (don't know yet if i should create a separate git-project for each of these or a tools-project that includes all these sketches)

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 8, 2015

@mrwastl Yes, I'm done defining the API.

I think now that templates are in place adding more color spaces I hope should be pretty easy, though I may be overlooking something. MatrixCommon.h has the definitions for color spaces. There's a struct, operators to convert to other color spaces, and a colorCorrection function that need to be created for each new color space. The color correction in SMLayerBackground::fillRefreshRow may cause some trouble. Consider replacing lookups in backgroundColorCorrectionLUT with one of the static lightPowerMap tables for now, and I will think more about how to improve background color correction in a future release so it can be used with other color spaces.

If you would like the extended spectrum analyzer and teensy-rtc based MatrixClock sketches to be included in a future SmartMatrix release as replacements/improvements to the existing examples, then you could work on them as branches of SmartMatrix. If not, separate repos would probably be better than one repo.

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 8, 2015

@mrwastl I unfortunately can't reproduce the drawHardware* crashing. I did see a crash during DEMO_DRAWING_INTRO but I'm not sure what caused that as so many drawing methods are tested there. Can you upload a GIST of your FeatureDemo.ino that is crashing? If possible, just enable one or a few of the demos that are causing crashing so it's quick to reproduce.

I reviewed all the code related to fillScreen, drawFast*Line and drawHardware and only found one mistake: fillScreen tries to fill the size of the screen plus one line on each side, but drawFastHLine limits the size to the actual screen coordinates, so that's not a source of crashing.

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 8, 2015

@mrwastl

rotation 90 / 270 scrolling doesn't go very smooth

I didn't see any obvious bugs when I changed featureDemo to run at 90/270 rotation, but the text did look like it was jumping a bit.

I think this is related to the movement of the scrolling and the multiplexing of the panels. As the text is moving, the scanlines are also moving in either the same or opposite direction (depending on 90/270). At the middle of each panel, the top half and bottom half meet. Row 15 is refreshed at the end of the frame, row 16 is refreshed at the beginning of the frame, so text may have moved in the very short time between row 15 being visible and row 16 being visible. There's not much I can do about this. I tried randomizing or using a pattern to go through the rows instead of just scanning from top to bottom, but this looked much worse, especially when moving your head or at lower refresh rates.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Sep 8, 2015

@embedded-creations
jumping text: yes, that's what i meant. only a 'visual' flaw, nothing more.

crashes: now THAT was a WTF-experience:

i played around with my fillScreen-method and narrowed the crash to this (with kRefreshDepth = 24):

void VSSDCP_smartmatrix::fillScreen (uint32_t col) {
  col2RGB(col);
  //backgroundLayer.fillScreen(tempcol);
  backgroundLayer.fillRectangle(0, 0, width-1, height-5, tempcol);
  backgroundLayer.drawFastHLine(0, width/2+24, height-4, tempcol);
  backgroundLayer.drawFastHLine(width/2+26, width-1, height-4, tempcol);
  backgroundLayer.fillRectangle(0, height-3, width-1, height-1, tempcol);
  // backgroundLayer.drawPixel(width/2+25, height-4, tempcol);   // enabling this one: crash
}

(= the whole screen is filled except one pixel)

kRefreshDepth = 36 seems to crash on a different position.

that was a real big WTF!

so i scanned through my protocol library again: there is one malloc (it allocates 256 bytes) that assignes an array to a member variable that is defined in the super-class.
so i added a static array to the class and assigned this one to the super-class member var. (to avoid malloc) -> no more crash.

memory shortage should not be a problem (around 12k still free).
and the malloc was there for quite a long time ...
but suddenly (from one version to another): crash.
i don't get it. some DMA / malloc interference?

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 8, 2015

@mrwastl the call to malloc is only made once? at what point in the program flow?

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Sep 9, 2015

@embedded-creations yes, in the class-constructor of the super-class (VSSDCP_base -> VSSDCP_serial -> VSSDCP_smartmatrix, malloc was in the class-constructor of VSSDCP_serial.

update:
to be more precisely: the buffers / layer are 'allocated' outside of any class directly in the main sketch (through macros SMARTMATRIX_ALLOCATE_BUFFERS and SMARTMATRIX_ALLOCATE_BACKGROUND_LAYER: therefore
layer/buffer-memory -> class-constructors -> initialisation of SmartMatrix -> matrix.begin()

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 9, 2015

@mrwastl Unfortunatlely I don't have any insight here. It sounds like a bug in either the Teensy code, maybe in the linker script, or possibly GCC, but I think you'd want to have a simplified example to prove to someone that there's a problem that needs to be fixed. Maybe you could make a simple sketch that allocates some large DMAMEM buffers, fills them with a pattern, uses malloc, and shows that the pattern is changed after. I'm not sure what to suggest right now other than don't use malloc.

edit: changed "don't think" to "think"

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 9, 2015

@mrwastl I'm currently tracking down crashing from fillScreen in the Bitmaps example at 64x64 resolution. It seems to crash when fillScreen on the second half of backgroundBuffer. If I replace the drawHardwareHLine direct buffer writes with calls to drawPixel, it still crashes. If I have fillScreen call memset() it doesn't crash. Not sure what's going on, it's hard to track down, but it's likely this is what's causing your crashes, not a malloc/DMA conflict.

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 9, 2015

@mrwastl Tracing it down to the rgb24 assignment operator.
After modifying drawHardwareHLine to call drawPixel, and this modification to drawPixel, fillScreen works without crashing:

    currentDrawBufferPtr[(hwy * this->matrixWidth) + hwx].red = color.red;
    currentDrawBufferPtr[(hwy * this->matrixWidth) + hwx].green = color.green;
    currentDrawBufferPtr[(hwy * this->matrixWidth) + hwx].blue = color.blue;

This original code crashes:

    currentDrawBufferPtr[(hwy * this->matrixWidth) + hwx] = color;

I'm guessing the assignment operator is stepping on other memory. Need to figure out how assignment operators work...

edit: not copy constructor, assignment operator

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 9, 2015

@mrwastl Pushed the fix - I can't explain it but the implicit assignment operator for rgb24 seems to be stepping on other memory. Added my own definition and it works. Hopefully you can verify this fixes your issue with malloc.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Sep 9, 2015

@embedded-creations yep, this one did the trick. no more crash with fillScreen() and malloc.

@mrwastl

This comment has been minimized.

Copy link

@mrwastl mrwastl commented Sep 10, 2015

@embedded-creations as you're polishing the code for release: one tiny typo: the comments for width and height seem to be swapped.

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 10, 2015

@mrwastl Good catch, thanks!

@gregfriedland

This comment has been minimized.

Copy link
Contributor

@gregfriedland gregfriedland commented Sep 11, 2015

I haven't been following this discussion fully so apologies if the
following question is obvious.

I'm trying to see how many frames per second I can get with a 64x32 array
(with latest on sm3.0) and am trying various parameters. COLOR_DEPTH of 48
is slower than 24 which makes sense but changing kRefreshDepth from 24 to
48 seems to have no effect. Does that make sense and if so, then what's the
advantage of the lower refresh depths?

On Thu, Sep 10, 2015 at 11:18 AM, Louis Beaudoin notifications@github.com
wrote:

@mrwastl https://github.com/mrwastl Good catch, thanks!


Reply to this email directly or view it on GitHub
#25 (comment)
.

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 12, 2015

@gregfriedland It's not obvious. The difference is CPU time (though it sounds like this is minor), and memory usage: 24 uses a smaller temp buffer than 36 and 48 and the buffers set by kDmaBufferRows scale with refreshDepth.

The more bits you have the dimmer the display at 100% brightness. This comment might explain more:
#25 (comment)

@gregfriedland

This comment has been minimized.

Copy link
Contributor

@gregfriedland gregfriedland commented Sep 13, 2015

Those other differences make sense, although I'm curious why the cpu time isn't drastically different. I think you are probably using Bit Angle Modulation (is that right?), so shouldn't it be possible to refresh the display at refreshDepth = 24 bits twice as fast as refreshDepth = 48 bits?

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 14, 2015

Yes, using Binary Code Modulation (BCM) (aka BAM), but the limitation on refresh rate is CPU time not bit depth. Each row is computed as needed, once per frame, and after a certain point the CPU is spending 100% of the time just preparing rows for refresh.

shouldn't it be possible to refresh the display at refreshDepth = 24 bits twice as fast as refreshDepth = 48 bits

SmartMatrix assigns a number of timer ticks per row based on the refresh rate and number of rows to refresh. It takes the row time and divides it up into the number of BCM bits, keeping in mind there's a minimum time to shift out data, so each bit has a minimum time. It's a little extra CPU time to prep 36 or 48-bit vs 24-bit, loading a little more data into the buffers, but the rest is taken care of by DMA with little effect on CPU. Here's an example of refreshing focused on a single row with 36-bit color. You can see 12-bits all updated sequentially. If it were 24-bit color, the time between the "1" and "2" flags would be the same, but there would only be 8 latches and each bit would be a bit wider. Sorry this probably isn't a good explanation, I need to do a long writeup on how everything works, it's hard to explain in a single paragraph with one picture.
qfyhb2p5tkwirwpaigjyjmnwo9ihuadmt2zciizrchg

@embedded-creations

This comment has been minimized.

Copy link
Contributor Author

@embedded-creations embedded-creations commented Sep 16, 2015

Beta version is released:
https://github.com/pixelmatix/SmartMatrix/releases/tag/3.0-b1

I'm going to close this conversation as it's gotten extremely long, thanks for all the contributions, feedback, and testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.