Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try harder to plot all areas of overlap #84

Open
larseggert opened this issue Aug 26, 2021 · 4 comments
Open

Try harder to plot all areas of overlap #84

larseggert opened this issue Aug 26, 2021 · 4 comments

Comments

@larseggert
Copy link

larseggert commented Aug 26, 2021

It seems that eulerr simply tries to minimize the residuals when laying out the plot. For plot with many areas of overlap, some of which can be small, eulerr thus doesn't plot some of these small areas of overlap, which can be problematic.

It might make sense to add a mode where plotting all areas of overlap is given priority over making all area sizes as close as possible to the data?

Here's an example:

        original fitted residuals regionError
A            534    534         0       0.000
B             22     22         0       0.000
C            346    346         0       0.000
D              0      0         0       0.000
A&B           57     57         0       0.000
A&C          132    132         0       0.000
A&D           41     41         0       0.000
B&C           23     23         0       0.000
B&D            0      0         0       0.000
C&D            2      2         0       0.000
A&B&C         86     86         0       0.000
A&B&D         13     13         0       0.000
A&C&D         23     23         0       0.000
B&C&D          1      0         1       0.001
A&B&C&D       36     36         0       0.000

The B&C&D overlap is completely missing from the plot:
Rplots
(A is Path1, B=2, C=3, D=Volunteers)

@jolars
Copy link
Owner

jolars commented Aug 26, 2021

Yes, this is basically related to #67, and I'm definitely open to alternatives to the least-squares function that exists. It is, howver, often impossible to plot all intersections, so it's not possible to achieve a loss function that always ensures all overlaps exist in the plot. Do you have any particular loss function in mind? Maybe just some combination of proportional and absolute loss I guess.

@jmw86069
Copy link

I just came here to look for a similar workaround, wondering if I could circumvent the loss function to "push harder" to display overlaps. It looks like the magic happens inside eulerr:::fit_diagram? I'm not sure how to edit the algorithm but it could be useful to do so at some point.

@jolars
Copy link
Owner

jolars commented Sep 28, 2021

The part in the code where this is computed is here:

eulerr/src/optim_final.cpp

Lines 125 to 131 in 625d7ff

// return sums of squared errors
return std::inner_product(fit.begin(),
fit.end(),
areas.begin(),
0.0,
std::plus<double>(),
[](double a, double b) { return (a - b)*(a -b); });

So adding new loss functions should be quite easy.

@GDCCP
Copy link

GDCCP commented Oct 4, 2021

With this:

// compute loss between the actual and desired areas
//          original is minimizing sum of square of area difference
//          modification:
//             Let x_ be the estimate of x -- the region area
//             given 0 < x <= 1, x_ >= 0, with sum(all x) == 1
//          A perfect fit would have
//             x = some constant * x_ for every x_,x pair
//          Rewrite to
//             x / x_ = some constant r, r > 0
//          The estimated r_ is sum(x)/sum(x_)
//          If we minimize sum of square of (x/x_ - r_) is not good as each fit
//          has different r_. So better use the expression (x/x_ - r_)/r_,
//          So loss is:  sum of square ((x/x_ - r_)/r_)
//
//          Note that x_ can be zero. In that case, a small value will be used instead
//          This tweat is by design to discourage/remove empty region in the
//          final solution. This is why x/x_ is used instead of x_/x
//
// [[Rcpp::export]]
double optim_final_loss(const std::vector<double>& par,
                        const std::vector<double>& areas,
                        const bool circle)
{
  const auto small_value = 1e-10/areas.size();
  auto fit = intersect_ellipses(par, circle, false);
  auto sum_areas = std::accumulate(areas.begin(),areas.end(),0.0);
  auto sum_fit = std::accumulate(fit.begin(),fit.end(),0.0);
  auto x = areas; std::transform(x.begin(),x.end(),x.begin(),[sum_areas](double x){ return x/sum_areas; });
  auto x_ = fit; std::transform(x_.begin(),x_.end(),x_.begin(),[sum_fit](double x){ return x/sum_fit; });
  auto r_ = std::accumulate(x.begin(),x.end(),0.0)/std::accumulate(x_.begin(),x_.end(),0.0);
  // now adjust the tiny values in x to small_value * r_ so that if the we get r_ when x_ is close to 0
  // Also keep this loss function continuous
  std::transform(x.begin(),x.end(),x.begin(),[small_value,r_](double a)->double{return std::max(a,small_value * r_);});
  // now adjust x_
  std::transform(x_.begin(),x_.end(),x_.begin(),[small_value](double a)->double{return std::max(a,small_value);});
  auto ratios = x; std::transform(ratios.begin(),ratios.end(),x_.begin(),ratios.begin(),std::divides<double>());
  std::transform(ratios.begin(),ratios.end(),ratios.begin(),[r_](double r)->double{auto diff=(r-r_)/r_; return diff*diff;});
  
  return std::accumulate(ratios.begin(),ratios.end(),0.0);
}

it seems to improve on plots with 4 sets with respect to plotting all areas, but no good results with 5 or more sets. My test cases are randomly generated. I am not sure if there is no much better solution or the solver cannot find it.

                            original fitted orig_area fitted_area residuals
data03                            28 78.049     0.168       0.167    -0.001
data01&data03                     23 50.110     0.138       0.107    -0.031
data01&data03&data04              19 39.643     0.114       0.085    -0.029
data03&data04                     17 39.789     0.102       0.085    -0.017
data01                            13 49.783     0.078       0.106     0.029
data01&data04                      9 25.432     0.054       0.054     0.000
data01&data02&data03               9 34.934     0.054       0.075     0.021
data02&data03&data04               9 24.660     0.054       0.053    -0.001
data04                             8 23.370     0.048       0.050     0.002
data02&data03                      8 17.020     0.048       0.036    -0.012
data01&data02&data03&data04        7 37.344     0.042       0.080     0.038
data02                             6 21.075     0.036       0.045     0.009
data01&data02                      6 13.565     0.036       0.029    -0.007
data02&data04                      3  8.613     0.018       0.018     0.000
data01&data02&data04               2  4.725     0.012       0.010    -0.002
                                   original fitted orig_area fitted_area
data03                                   17  9.817     0.098       0.057
data01&data03&data05                     16 20.604     0.092       0.119
data03&data05                            11 16.887     0.064       0.097
data01&data03&data04                     11  0.000     0.064       0.000
data03&data04&data05                     10  9.598     0.058       0.055
data01                                    8 19.694     0.046       0.113
data01&data03&data04&data05               8 11.174     0.046       0.064
data01&data03                             7  0.479     0.040       0.003
data01&data04                             7  0.000     0.040       0.000
data03&data04                             7 19.592     0.040       0.113
data05                                    6  5.569     0.035       0.032
data01&data02&data03                      6  4.185     0.035       0.024
data01&data05                             5  3.441     0.029       0.020
data04&data05                             5  0.000     0.029       0.000
data02&data03&data04                      5  5.981     0.029       0.034
data01&data02&data03&data04               5  6.955     0.029       0.040
data01&data02                             4  6.790     0.023       0.039
data02&data03                             4  0.000     0.023       0.000
data02&data03&data05                      4  0.000     0.023       0.000
data02&data03&data04&data05               4  0.050     0.023       0.000
data02                                    3  9.593     0.017       0.055
data04                                    3  6.743     0.017       0.039
data02&data04                             3  1.768     0.017       0.010
data02&data05                             3  0.000     0.017       0.000
data01&data02&data03&data05               3  2.539     0.017       0.015
data01&data02&data05                      2  0.000     0.012       0.000
data01&data04&data05                      2  0.000     0.012       0.000
data01&data02&data03&data04&data05        2 12.137     0.012       0.070
data01&data02&data04                      1  0.001     0.006       0.000
data01&data02&data04&data05               1  0.000     0.006       0.000
data02&data04&data05                      0  0.000     0.000       0.000
                                   original fitted orig_area fitted_area
data03&data05                            20 32.289     0.120       0.203
data03&data04&data05                     15 20.102     0.090       0.127
data03&data04                            14 10.133     0.084       0.064
data03                                   13  3.671     0.078       0.023
data03&data06                            11  6.609     0.066       0.042
data05                                    9  1.262     0.054       0.008
data04                                    8 14.441     0.048       0.091
data02&data03&data04                      8  9.972     0.048       0.063
data02                                    7 12.443     0.042       0.078
data03&data05&data06                      7 11.153     0.042       0.070
data02&data03                             6  1.253     0.036       0.008
data04&data05                             6  0.000     0.036       0.000
data02&data03&data05                      5  0.000     0.030       0.000
data02&data04                             4  2.819     0.024       0.018
data02&data03&data06                      4  6.088     0.024       0.038
data03&data04&data06                      4  0.000     0.024       0.000
data02&data03&data04&data05               4  5.331     0.024       0.034
data02&data05                             3  0.000     0.018       0.000
data03&data04&data05&data06               3  2.356     0.018       0.015
data06                                    2  6.701     0.012       0.042
data04&data06                             2  0.000     0.012       0.000
data05&data06                             2  0.000     0.012       0.000
data02&data05&data06                      2  0.000     0.012       0.000
data02&data03&data04&data06               2  1.536     0.012       0.010
data02&data03&data05&data06               2  2.196     0.012       0.014
data02&data03&data04&data05&data06        2  7.310     0.012       0.046
data04&data05&data06                      1  0.000     0.006       0.000
data02&data04&data05&data06               1  0.000     0.006       0.000
data02&data06                             0  1.052     0.000       0.007
data02&data04&data05                      0  0.000     0.000       0.000
data02&data04&data06                      0  0.000     0.000       0.000

I am working on this as a low priority task, so I am not ready to go deep into the solver yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants