very slow on initializing gpmodel for non-consecutive group_data #4

aprilffff · 2020-06-12T08:38:03Z

As in #3, initialization speed for consecutive group_data tests great, but non-consecutive data does not seem to be handled properly.
I tested this on a very fast machine: initialization takes 0.4s for consecutive group data and 1800+s for non-consecutive group data.

Please help with this, because the lgb model requires the data in its query order, which may be very different from the group_data ordering used for the Gaussian model.

fabsig · 2020-06-12T08:52:03Z

Just to double check, by non-consecutive you mean e.g. [1, 1, 5, 60, 60] instead of [1, 1, 2, 3, 3]? Can you provide a minimal working example?

aprilffff · 2020-06-12T08:53:04Z

like [1,2,3,1,2,3]

fabsig · 2020-06-12T08:54:59Z

Which setting do you use? Number of samples and number of different groups?

aprilffff · 2020-06-12T09:03:12Z

What do you mean by setting?
[1,2,3,1,2,3] means 6 samples: the first and fourth belong to group 1, the second and fifth to group 2, and the third and last to group 3.
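
A minimal sketch of this reading of group_data, in plain Python. The helper name `group_members` is hypothetical and not part of gpboost; it only shows which sample positions (0-based) end up in which group:

```python
from collections import defaultdict


def group_members(group_data):
    """Map each group label to the 0-based positions of its samples.

    Hypothetical helper for illustration only; not a gpboost function.
    """
    members = defaultdict(list)
    for position, label in enumerate(group_data):
        members[label].append(position)
    return dict(members)


groups = group_members([1, 2, 3, 1, 2, 3])
print(groups)  # {1: [0, 3], 2: [1, 4], 3: [2, 5]}
```

The grouping is built in a single pass over the labels, so the result does not depend on whether samples of the same group sit next to each other.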

fabsig · 2020-06-12T09:07:02Z

[1,2,3,1,2,3] means a total of 6 samples and 3 different groups. I guess this gives no problem :-). When do you experience performance issues?

aprilffff · 2020-06-12T09:14:48Z

That happens every time I initialize gpboost. My dataset has about 10 million samples and 2000+ groups, and the samples in each group are not consecutive, just like the example above.
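
A scaled-down reproduction of the two orderings could be set up along these lines. This is only a sketch: `make_labels` and `is_consecutive` are hypothetical helper names, the sizes are placeholders, and the gpboost call itself is left as a comment because the exact setup used in this report is not shown in the thread.

```python
def make_labels(n_samples, n_groups):
    """Return (consecutive, interleaved) label lists with identical group sizes.

    Hypothetical helper: interleaved labels cycle 1, 2, ..., n_groups, 1, 2, ...
    and the consecutive variant is the same labels sorted.
    """
    interleaved = [i % n_groups + 1 for i in range(n_samples)]
    consecutive = sorted(interleaved)
    return consecutive, interleaved


def is_consecutive(labels):
    """True if every label occupies exactly one contiguous run."""
    seen = set()
    previous = object()  # sentinel that equals no real label
    for label in labels:
        if label != previous:
            if label in seen:  # label reappears after a different one
                return False
            seen.add(label)
            previous = label
    return True


consecutive, interleaved = make_labels(n_samples=6, n_groups=3)
print(interleaved, is_consecutive(interleaved))  # [1, 2, 3, 1, 2, 3] False
print(consecutive, is_consecutive(consecutive))  # [1, 1, 2, 2, 3, 3] True

# Assumption: each list would then be passed as group_data when constructing
# the GPModel in the Python wrapper, timing the two constructions separately.
```

Increasing `n_samples` and `n_groups` toward the sizes reported above keeps the group composition identical between the two lists, so any timing difference comes from ordering alone.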

fabsig · 2020-06-12T09:28:20Z

I see. 10'000'000 samples and 2'000 groups. I will investigate this.

aprilffff · 2020-06-12T09:57:48Z

btw, I cannot save the gp_model either. I'm using the Python wrapper of gpboost, but gpb.save_model() only saves the GBDT part, not the Gaussian model.

fabsig · 2020-06-12T10:03:54Z

Yes, that is correct. Saving of the GPModel is not implemented. But this is another topic. Please open another issue if this feature is desirable for you.

fabsig · 2020-06-12T14:37:55Z

I have fixed this now. Initializing a GPModel with group_data that is not ordered now takes approximately the same time as in the ordered case (see #3 (comment)).

@aprilffff : Many thanks for raising this issue!

aprilffff mentioned this issue Jun 12, 2020

cannot save gp_model #5

Closed

fabsig closed this as completed Jun 16, 2020
