Support for plots with millions of points to render, without requirement of 32bit index support in imgui #41

Merged (7 commits) on Jun 7, 2020

sergeyn
Contributor

@sergeyn sergeyn commented May 21, 2020

No description provided.

@sergeyn
Contributor Author

sergeyn commented May 21, 2020

Test code:

ImPlot::PlotLine("spiral",
    [](void*, int idx) {
        const float r = 0.9f;            // outer radius
        const float a = 0.0f;            // inner radius
        const float b = 0.05f;           // radius increment per revolution
        const float n = (r - a) / b;     // number of revolutions
        const double th = 2 * n * M_PI;  // total angle swept by the spiral
        const float Th = float(th * idx / (1000000 - 1));     // angle at this point
        const float rad = a + b * Th / (2.0f * (float)M_PI);  // radius at this point
        return ImVec2(0.5f + rad * cosf(Th), 0.5f + rad * sinf(Th));
    }, nullptr, 1000000);

This version is also slightly faster than current master (around 3% faster).

This code uses knowledge of how PrimReserve function works, which is not a good thing and probably ImGui itself needs to get refactored in this area to better support scenarios when widgets push a ton of stuff to render.

Edit:
You also need ImGui's PR 3232 ocornut/imgui#3232 for this to render without issues

Edit:
Code Formatting

@epezent
Owner

epezent commented May 21, 2020

At a high level, can you explain what this is doing before I dig into the code? Also, if constexpr is C++17, which does not conform to ImGui's standard.

@sergeyn sergeyn changed the title from "Support for plots with millions of points to render, without requirement of 32bit index support in imgpui" to "Support for plots with millions of points to render, without requirement of 32bit index support in imgui" on May 21, 2020
@sergeyn
Contributor Author

sergeyn commented May 21, 2020

The problem with the original code is with PrimReserve: you can't use it for more points than the current index size can address. If you try, you get a mess of triangles on the screen. In short, the new logic checks how many points fit in the current draw command and makes as many separate reservations as needed to add more points without violating the index size restriction.
In addition, there are a few minor perf tweaks, and the RenderLineStrip function is now templated on the various features it has (again, for speed).
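
Editorial sketch of the batching idea described above, not the PR's actual code. The helper name `SplitIntoBatches` and its parameters are hypothetical; the only facts taken from the thread are that each segment consumes 4 vertices and that a fresh draw command starts with a vertex index of 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split `total_segments` line segments into batches so
// that no batch needs a vertex index beyond what an index type of
// `index_bits` bits can address. Each segment uses 4 vertices.
std::vector<std::size_t> SplitIntoBatches(std::size_t total_segments,
                                          unsigned index_bits,
                                          std::size_t vtx_current_idx) {
    assert(index_bits >= 8 && index_bits < sizeof(std::size_t) * 8);
    const std::size_t max_index = (std::size_t(1) << index_bits) - 1;
    assert(vtx_current_idx <= max_index);
    std::vector<std::size_t> batches;
    // The first batch shares the current draw command with earlier vertices.
    std::size_t capacity = (max_index - vtx_current_idx) / 4;
    while (total_segments > 0) {
        const std::size_t n = total_segments < capacity ? total_segments : capacity;
        if (n > 0) {
            batches.push_back(n);
            total_segments -= n;
        }
        // Every later batch starts a fresh draw command at vertex index 0.
        capacity = max_index / 4;
    }
    return batches;
}
```

With 16-bit indices and an empty draw command, one million segments would be split into batches of at most 16383 segments each.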

I'll fix the constexpr; it's only there to silence the Visual Studio compiler warning about constant conditions.

@epezent
Owner

epezent commented May 21, 2020

Got it, sounds like a great addition. I'll get around to merging this in a few days. Thank you for the work!

@epezent
Owner

epezent commented May 29, 2020

> This code uses knowledge of how PrimReserve function works, which is not a good thing and probably ImGui itself needs to get refactored in this area to better support scenarios when widgets push a ton of stuff to render.
> Edit:
> You also need ImGui's PR 3232 ocornut/imgui#3232 for this to render without issues

I hadn't noticed this before. We can't merge code that relies on an open ImGui PR, so I suggest we wait and see what their response to the issue is. Otherwise, my initial feedback is to:

  1. remove if constexpr entirely (could template specializations be used here?)
  2. break up RenderLineStrip into smaller parts
  3. this will be difficult to follow and maintain:
static void(*render_fn[32])(Getter & getter, ImDrawList & DrawList, int count, int offset, float line_weight, ImU32 col_line, int y_axis) =
{
    &RenderLineStrip<0>,&RenderLineStrip<1>,&RenderLineStrip<2>,&RenderLineStrip<3>, &RenderLineStrip<4>,&RenderLineStrip<5>,&RenderLineStrip<6>,&RenderLineStrip<7>,
    &RenderLineStrip<8>,&RenderLineStrip<9>,&RenderLineStrip<10>,&RenderLineStrip<11>, &RenderLineStrip<12>,&RenderLineStrip<13>,&RenderLineStrip<14>,&RenderLineStrip<15>,
    &RenderLineStrip<16>,&RenderLineStrip<17>,&RenderLineStrip<18>,&RenderLineStrip<19>,&RenderLineStrip<20>,&RenderLineStrip<21>,&RenderLineStrip<22>,&RenderLineStrip<23>,
    &RenderLineStrip<24>,&RenderLineStrip<25>,&RenderLineStrip<26>,&RenderLineStrip<27>,&RenderLineStrip<28>,&RenderLineStrip<29>,&RenderLineStrip<30>,&RenderLineStrip<31>
};

int which_fn_to_use = (RenderLineStrip_Offset & -(offset != 0))
    | (RenderLineStrip_Extents & -(gp.FitThisFrame != false))
    | (RenderLineStrip_LogX & -(HasFlag(plot->XAxis.Flags, ImPlotAxisFlags_LogScale) != false))
    | (RenderLineStrip_LogY & -(HasFlag(plot->YAxis[y_axis].Flags, ImPlotAxisFlags_LogScale) != false))
    | (RenderLineStrip_Cull & -(cull != false));
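
Editorial sketch of the dispatch pattern quoted above, reduced to two flags. The names `Flag_LogX`, `Flag_LogY`, `RenderSketch`, `SelectBit`, and `Dispatch` are hypothetical stand-ins and the flag values are assumptions. It illustrates how `bit & -(cond)` builds a table index without branching.

```cpp
#include <cassert>

// Hypothetical feature bits; two flags give four combinations.
constexpr int Flag_LogX = 1;
constexpr int Flag_LogY = 2;

// Stand-in for a renderer templated on its feature set. It only reports the
// flags it was instantiated with, so the dispatch can be checked.
template <int Flags>
int RenderSketch() { return Flags; }

// -(int)cond is 0 when cond is false and -1 (all bits set) when it is true,
// so the AND keeps `bit` only if cond holds. No branch is needed.
inline int SelectBit(int bit, bool cond) { return bit & -static_cast<int>(cond); }

int Dispatch(bool log_x, bool log_y) {
    // One entry per flag combination; the table index is the flag mask itself.
    static int (*const table[4])() = {
        &RenderSketch<0>, &RenderSketch<Flag_LogX>,
        &RenderSketch<Flag_LogY>, &RenderSketch<Flag_LogX | Flag_LogY>
    };
    const int which = SelectBit(Flag_LogX, log_x) | SelectBit(Flag_LogY, log_y);
    assert(which >= 0 && which < 4);
    return table[which]();
}
```

The table must list the instantiations in mask order, which is the part that becomes hard to maintain as the number of flags grows (five flags need 32 entries).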

@sergeyn
Contributor Author

sergeyn commented May 29, 2020 via email

@epezent
Owner

epezent commented May 29, 2020

> 1. Regarding constexpr: without it the compiler generates a bunch of useless warnings. I don't like warnings, and I don't like muting warnings either. I do empty it out with a define for pre-C++17 code. Is there a problem with constexpr now?

I saw the empty define, and I'm not a fan of it. I'm sure there are other ways to write this code warning free without constexpr.

> Then a few flags I consider relatively useless: cull - just always use culling, what's the purpose of not using it?

Fair point. This was previously in place because our culling scheme was imperfect and didn't always look right. That has been fixed, so we can probably just always cull now.

> Offset flag is completely useless imho, and if it is needed, it can be achieved using the callback interface of ImPlot.

Are you suggesting we remove offset from the API? It's not useless -- it's the primary mechanism that allows realtime plots to work efficiently with circular buffers.

> "Fit this frame" flag is relatively inexpensive and can always be computed (and then results discarded when not needed).

Agreed.

> Then you'd end up with 4 combinations (LogX, LogY) - sounds ok?

Sounds better, but I'm still not seeing the point entirely. Is the goal to move all the Transformer operations into the body of the function?

@sergeyn
Contributor Author

sergeyn commented May 29, 2020 via email

@sergeyn
Contributor Author

sergeyn commented May 30, 2020

@epezent - are the refactoring steps I described enough to get this merged? Or would you rather not let go of the Transformers in RenderLineStrip?

@epezent
Owner

epezent commented May 31, 2020

I would prefer to keep Transformers separate for ease of maintenance and understanding. However, I don't want to do redundant calculations either.

I'm looking at TransformerLinLin, TransformerLogLin, TransformerLinLog, and TransformerLogLog, and I think they are each optimal. I don't see what calculations could be saved on all plot points. Can you elaborate on what you meant by this? If it's possible to save operations, then the results could be cached in member variables of these structs, computed on construction and used in subsequent calls to operator().

@sergeyn
Contributor Author

sergeyn commented May 31, 2020

In one of your 'log' transformers, for example:
'float t = log10(y / gp.CurrentPlot->YAxis[y_axis].Range.Min) / gp.LogDenY[y_axis];'
This line has 2 divisions and 5 extra memory fetches, 3 of which will (maybe) get optimized out. It can probably be refactored to the point where the compiler doesn't generate any extra code per iteration. At some point, though, you will want vectorization for a further perf increase, which means another refactor of the transformers, and you end up needing much more code to support the transformer abstraction, even though transformers should technically be called only once when rendering any type of graph (you do not want to call your getters and then apply logarithms again to render markers, for instance). So, given that they are used in one place only, it makes sense to encode the logic they support in one configurable function, which is more vectorization-friendly.
All of that only applies if you care a lot about performance. Code-wise, I'm on the more extreme side of things when it comes to performance.
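
Editorial sketch of the caching idea raised in this exchange, not ImPlot's transformer code. The struct `CachedLogT` is hypothetical, and it assumes that the log denominator equals log10(max / min), which the thread does not state. It hoists the two divisions out of the per-point path by using log10(y / min) = log10(y) - log10(min).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: computes the normalized log position
//   t = log10(y / y_min) / log10(y_max / y_min)
// with the per-plot constants computed once at construction.
struct CachedLogT {
    double log_min;      // log10(y_min), computed once
    double inv_log_den;  // 1 / log10(y_max / y_min), computed once

    CachedLogT(double y_min, double y_max)
        : log_min(std::log10(y_min)),
          inv_log_den(1.0 / std::log10(y_max / y_min)) {
        assert(y_min > 0.0 && y_max > y_min);
    }

    // Per point: one log10, one subtraction, one multiplication, no division
    // and no reads from global plot state.
    double operator()(double y) const {
        return (std::log10(y) - log_min) * inv_log_den;
    }
};
```

Whether this is measurably faster than the original expression depends on what the compiler already hoists, so it would need benchmarking.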

@epezent
Owner

epezent commented May 31, 2020

Would caching the memory fetches into member variables of the structs improve performance and vectorization in this case?

Also, is it not the case that the compiler will inline the operator() into the RenderLineStrip function since Transformers are templated arguments?

@sergeyn
Contributor Author

sergeyn commented May 31, 2020 via email

@epezent
Owner

epezent commented May 31, 2020

Would you care to try some of this (caching inside of Transform structs, etc.) in a separate PR?

@sergeyn
Contributor Author

sergeyn commented May 31, 2020 via email

@sergeyn
Contributor Author

sergeyn commented Jun 1, 2020

Transformers are back. I also noticed you have added a RenderLineFill function, which should suffer from the same issues as RenderLineStrip; I assume that, based on this commit, you can figure out how to fix RenderLineFill as well.

@epezent
Owner

epezent commented Jun 1, 2020

Before I merge, is this dependent on a particular version of ImGui or PR?

@sergeyn
Contributor Author

sergeyn commented Jun 1, 2020 via email

@epezent
Owner

epezent commented Jun 3, 2020

I have a couple questions:

1)

int cnt = (int)ImMin(size_t(prims), (((size_t(1) << sizeof(ImDrawIdx) * 8) - 1 - DrawList._VtxCurrentIdx) / 4));

Is the 4 at the very end the vertex count per segment, or is it related to something else?

2)

if (cnt >= ImMin(64, prims))

What is the meaning of the number 64 here?

3)

cnt = (int)ImMin(size_t(prims), (((size_t(1) << sizeof(ImDrawIdx) * 8) - 1 - 0/*DrawList._VtxCurrentIdx*/) / 4));

For 0/*DrawList._VtxCurrentIdx*/, is this hard coded because you can assume that DrawList._VtxCurrentIdx is 0?

@sergeyn
Contributor Author

sergeyn commented Jun 3, 2020

1 - yes, as each new segment will increase the index by 4.
3 - yes, this branch always allocates a new draw command, which comes with _VtxCurrentIdx being 0. An assert would be a nice addition here.
2 - when the current draw command can fit at least 64 segments (which only matters when we want to render more than 64), reuse the current draw command. If it can't, and we want to render more than 64, allocate a new draw command. The logic with 64 elements is there to prevent the following case:
Imagine that the current draw command is at the end of the buffer, with space for only 1 more segment. Imagine also that the plot you are about to render is a) huge and b) not visible, i.e. it will be culled. If you only checked whether 1 segment fits, you would always take the slow code path of checking whether a new command needs to be allocated, because you would always be left with space for 1 segment only. With 64, you make sure that for big plots the inner loop is invoked with at least 64 entries, with 64x fewer checks for available space. This basically reduces the cost if the draw command happens to be almost full by the time you render a plot.
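
Editorial sketch of the decision described in answers 1 to 3, restated as a standalone function. `PlanReservation`, `Reservation`, and the `DrawIdx` alias (standing in for a 16-bit ImDrawIdx) are hypothetical names; the capacity formula and the threshold of 64 come from the snippets quoted in the questions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using DrawIdx = std::uint16_t;  // assumption: a 16-bit index build

struct Reservation {
    int  count;        // segments to reserve in this step
    bool new_command;  // true if a fresh draw command (vertex index 0) is needed
};

// Hypothetical restatement of the reuse-or-allocate decision.
Reservation PlanReservation(int prims, std::size_t vtx_current_idx) {
    const std::size_t max_index = (std::size_t(1) << (sizeof(DrawIdx) * 8)) - 1;
    assert(prims > 0 && vtx_current_idx <= max_index);
    const std::size_t want = static_cast<std::size_t>(prims);
    const int min_batch = prims < 64 ? prims : 64;
    // Segments that still fit in the current draw command (4 vertices each).
    const std::size_t fit_now = (max_index - vtx_current_idx) / 4;
    int cnt = static_cast<int>(fit_now < want ? fit_now : want);
    if (cnt >= min_batch)
        return {cnt, false};  // enough room: reuse the current draw command
    // Too little room: start a new command, where the vertex index restarts at 0.
    const std::size_t fit_new = max_index / 4;
    cnt = static_cast<int>(fit_new < want ? fit_new : want);
    return {cnt, true};
}
```

For example, with only one free slot left (vertex index 65531) and 100000 segments requested, the sketch asks for a new draw command and a batch of 16383 segments rather than a batch of 1.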

@epezent
Owner

epezent commented Jun 3, 2020

Thanks for the clarification. This is resulting in some strange artifacting when zooming in on the benchmark plot from the demo. Can you confirm if this also happens for you?

image

image

@epezent
Owner

epezent commented Jun 3, 2020

Also, please check "Allow edits from maintainers" so that I can push my edits so far.

@sergeyn
Contributor Author

sergeyn commented Jun 3, 2020

Ok, how do I do both of these things :) ? Zooming doesn't work for me on a benchmark plot

@epezent
Owner

epezent commented Jun 3, 2020

Comment out this line:

SetNextPlotLimits(0,1,0,1, ImGuiCond_Always);

For the second, it should be on the right side of this page for you.

@epezent
Owner

epezent commented Jun 3, 2020

I should note that when I have ImPlot_AntiAliased enabled, everything renders fine with 16-bit indices. So the issue seems to be isolated to this new code and not ImGui.

@sergeyn
Contributor Author

sergeyn commented Jun 3, 2020

I can see these problems, but with the fix applied they go away. You should also see the same problem without any of my changes: with your original code, try panning the plot so that it's completely hidden (culled out), and you'll see the same garbage lines.
ImPlotAntialiased does not do PrimUnreserve, so it doesn't trigger that ImGui bug.

I've enabled the checkbox to allow you to commit.

@epezent
Owner

epezent commented Jun 3, 2020

I see. So, essentially this PR is incomplete without the fix on ImGui's side.

I see two short term solutions:

  1. disable culling so that PrimUnreserve isn't needed, or
  2. have a check for 16-bit indices and default to the AntiAliased method

@sergeyn
Contributor Author

sergeyn commented Jun 3, 2020

A third option would be to apply the same fix as the ImGui one, but on the ImPlot side. I'm not sure which of the three solutions is the ugliest.

I don't understand why that two-line fix doesn't get merged, either.

@epezent
Owner

epezent commented Jun 3, 2020

They are all pretty ugly. 😄

Option 4 is to somehow count the number of culled segments before PrimReserve. The naive approach would be to have two loops: one that counts the segments that survive culling, to size the PrimReserve call, and one that does the rendering. That would require two calls to Getter/Transformer for every point, though. The best alternative I can come up with is to apply an inverse Transformer operation to the plot bounds instead, and then check the untransformed points against the result. That saves one call to Transformer per point, but still requires two calls to Getter (which isn't that expensive). Thoughts?
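
Editorial sketch of the inverse-transform idea, assuming linear axes only (a log axis would need its own inverse). `Vec2`, `Box`, `PixelsToPlot`, and `SegmentMayBeVisible` are hypothetical names, not ImPlot API. The pixel-space cull rectangle is mapped back to plot space once, and each raw segment is then tested against it without calling a Transformer.

```cpp
#include <algorithm>
#include <cassert>

struct Vec2 { double x, y; };
struct Box  { Vec2 min, max; };

// Hypothetical inverse of a linear plot-to-pixel mapping: maps a pixel-space
// box back to plot space.
Box PixelsToPlot(const Box& pix, const Box& pix_area, const Box& plot_range) {
    const double w = pix_area.max.x - pix_area.min.x;
    const double h = pix_area.max.y - pix_area.min.y;
    assert(w > 0.0 && h > 0.0);
    auto inv_x = [&](double px) {
        return plot_range.min.x + (px - pix_area.min.x) / w * (plot_range.max.x - plot_range.min.x);
    };
    // Pixel y grows downward, so the bottom pixel row maps to the plot minimum.
    auto inv_y = [&](double py) {
        return plot_range.min.y + (pix_area.max.y - py) / h * (plot_range.max.y - plot_range.min.y);
    };
    const double x0 = inv_x(pix.min.x), x1 = inv_x(pix.max.x);
    const double y0 = inv_y(pix.min.y), y1 = inv_y(pix.max.y);
    return { { std::min(x0, x1), std::min(y0, y1) }, { std::max(x0, x1), std::max(y0, y1) } };
}

// Bounding-box test on untransformed points; no Transformer call is needed.
bool SegmentMayBeVisible(Vec2 p1, Vec2 p2, const Box& cull) {
    const double lo_x = std::min(p1.x, p2.x), hi_x = std::max(p1.x, p2.x);
    const double lo_y = std::min(p1.y, p2.y), hi_y = std::max(p1.y, p2.y);
    return lo_x <= cull.max.x && hi_x >= cull.min.x &&
           lo_y <= cull.max.y && hi_y >= cull.min.y;
}
```

Because each axis mapping is monotonic, a bounding-box test in plot space should select the same segments as the equivalent test in pixel space, up to rounding at the edges.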

@sergeyn
Contributor Author

sergeyn commented Jun 3, 2020

I suggest waiting until my PR fixing this issue gets merged or the issue gets fixed some other way. I don't see any way to raise the priority of a PR other than complaining in the comment section. I've poked them once again, and at this point I don't feel like spending any more time on other workarounds.

@epezent
Owner

epezent commented Jun 4, 2020

I'm curious what the cost of doing a PrimReserve for each point that passes the cull check would be. It might have overhead the first time through, but not too bad thereafter.

@sergeyn
Contributor Author

sergeyn commented Jun 4, 2020

It's not difficult to try it out and see how much the FPS drops, though doing it per segment is not a scalable approach. I like the current idea of preallocating space because it has a lot of optimization potential.

@epezent
Owner

epezent commented Jun 4, 2020

| Method | FPS (Steady State) | Artifacting |
|---|---|---|
| DrawList.AddLine w/ ImGui AntiAliasing (ImPlotAntiAliased) | 150 | No |
| DrawList.AddLine w/o ImGui AntiAliasing | 225 | No |
| One call to PrimReserve + PrimUnreserve | 430 | Yes |
| Per Point call to PrimReserve | 360 | No |
| Two loops + one call to PrimReserve | 380 | No |

Though not optimal, I think we could temporarily go with either the 4th or 5th option until ImGui is fixed.

@epezent
Owner

epezent commented Jun 4, 2020

Two Loops (based off master branch code):

const ImVec2 uv = DrawList._Data->TexUvWhitePixel;
// Pass 1: count the segments that survive culling.
int will_render = 0;
ImVec2 p1 = transformer(getter(offset));
for (int i1 = i_start; i1 != i_end; i1 = i1 + 1 < count ? i1 + 1 : i1 + 1 - count) {
    ImVec2 p2 = transformer(getter(i1));
    if (!cull || gp.BB_Grid.Overlaps(ImRect(ImMin(p1, p2), ImMax(p1, p2))))
        will_render++;
    p1 = p2;
}
// Reserve exactly what pass 2 will emit: 6 indices and 4 vertices per segment.
DrawList.PrimReserve(will_render * 6, will_render * 4);
// Pass 2: restart from the first point and render the surviving segments.
p1 = transformer(getter(offset));
for (int i1 = i_start; i1 != i_end; i1 = i1 + 1 < count ? i1 + 1 : i1 + 1 - count) {
    ImVec2 p2 = transformer(getter(i1));
    if (!cull || gp.BB_Grid.Overlaps(ImRect(ImMin(p1, p2), ImMax(p1, p2))))
        RenderLine(DrawList, p1, p2, line_weight, col_line, uv);
    p1 = p2;
}
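
Editorial sketch of why the reservation above uses 6 indices and 4 vertices per segment: a thick line segment can be drawn as a quad made of two triangles. This is not ImPlot's RenderLine; `V2`, `Mesh`, and `AppendSegment` are hypothetical, and plain vectors stand in for the draw list buffers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct V2 { float x, y; };

// Hypothetical mesh buffers standing in for a draw list with 16-bit indices.
struct Mesh {
    std::vector<V2> vertices;
    std::vector<std::uint16_t> indices;
};

// Append one line segment as a quad: 4 vertices and 2 triangles (6 indices).
void AppendSegment(Mesh& m, V2 p1, V2 p2, float weight) {
    float dx = p2.x - p1.x, dy = p2.y - p1.y;
    const float len = std::sqrt(dx * dx + dy * dy);
    if (len > 0.0f) { dx /= len; dy /= len; }
    dx *= weight * 0.5f;  // half-width offset along the segment direction
    dy *= weight * 0.5f;
    const std::size_t base = m.vertices.size();
    assert(base + 4 <= 65536);  // all four new indices must fit in 16 bits
    // Offset both endpoints along the segment normal (dy, -dx) and its negation.
    m.vertices.push_back(V2{p1.x + dy, p1.y - dx});
    m.vertices.push_back(V2{p2.x + dy, p2.y - dx});
    m.vertices.push_back(V2{p2.x - dy, p2.y + dx});
    m.vertices.push_back(V2{p1.x - dy, p1.y + dx});
    static const int offs[6] = {0, 1, 2, 0, 2, 3};  // two triangles sharing a diagonal
    for (int o : offs)
        m.indices.push_back(static_cast<std::uint16_t>(base + o));
}
```

Each call advances the vertex count by 4, which is why a 16-bit index type runs out after roughly 16 thousand segments in a single draw command.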

@sergeyn
Contributor Author

sergeyn commented Jun 6, 2020

I've tested the latest master and the problem seems to have been fixed.

@epezent
Owner

epezent commented Jun 6, 2020

Sounds good. I need to rework it a bit to accommodate the other plots which use PrimReserve directly, and then I will merge this.

@epezent
Owner

epezent commented Jun 7, 2020

@sergeyn, I made two substantial edits:

  1. Abstracted RenderLineStrip into RenderPrimitives. This now takes a Renderer type, such as LineRenderer or FillRenderer. This was done so that I can use your code for rendering other plot types. I plan on migrating some of the other plot functions to use this in favor of DrawList.AddXX.
  2. I moved "offsetting" into the Getters. I think we discussed this before, and it makes the most sense for that to be handled there. As such, RenderPrimitives just iterates from 0 to getter.Count now. All of the index offsetting happens at the Getter level.
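
Editorial sketch of a getter that owns the circular-buffer offset, as described in item 2. `RingGetter` and `Pt` are hypothetical and not ImPlot's actual getter types. The offset is normalized once in the constructor so that the per-point lookup needs no modulo.

```cpp
#include <cassert>

struct Pt { double x, y; };

// Hypothetical getter: it handles the ring-buffer offset itself, so a render
// loop can simply iterate idx = 0 .. count - 1.
struct RingGetter {
    const double* ys;
    int count;
    int offset;  // index of the oldest sample in the ring buffer

    RingGetter(const double* ys_, int count_, int offset_)
        : ys(ys_), count(count_), offset(0) {
        assert(ys_ != nullptr && count_ > 0);
        // Normalize once into [0, count), which also handles negative offsets.
        offset = ((offset_ % count_) + count_) % count_;
    }

    // Valid for idx in [0, count). Wraps with a compare instead of a modulo.
    Pt operator()(int idx) const {
        int i = idx + offset;
        if (i >= count) i -= count;
        return { static_cast<double>(idx), ys[i] };
    }
};
```

With this shape, a realtime plot can keep overwriting the oldest sample in place and only update the offset each frame.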

I think everything still works as expected in the demo, and indeed this fixed 16-bit indices when used with Omar's latest commit to ImGui. Would you mind giving it a thorough review one last time before we merge?

Thanks again for your hard work!

PS: Compared with master, I see an increase in FPS on the benchmark plot from 390 to 430 when using 16-bit indices and your rendering algorithm!

@sergeyn
Contributor Author

sergeyn commented Jun 7, 2020

I've added 'static' as you don't want to have integer division when not necessary.

@epezent epezent merged commit c0bea59 into epezent:master Jun 7, 2020
charlesdaniels added a commit to charlesdaniels/giu that referenced this pull request Jun 10, 2020
This may reduce performance, but I was able to get implot to run out of
vertices with a relatively small dataset of a few tens of thousands of
points.

The upstream is working on this:

epezent/implot#41