
Difference between anon function results and normal function results: anon function gives 0 whereas normal function result is a higher-magnitude value (nowhere close to 0) #17

Closed
AbhishekNalamothu opened this issue Feb 10, 2020 · 8 comments

Comments

@AbhishekNalamothu commented Feb 10, 2020

Anon function is giving 0, whereas the normal function result is a higher-magnitude value (for example: 20, 30, -25, -40).

Example: difference between anon function and normal function results
anon function: anon_F(D) -> 0
normal function: F(D) -> 20

Providing differentially private aggregated data with the above difference (as in the example) to an end user might mislead them during their analysis.

Is there a way to handle this?

@celiayz (Contributor) commented Feb 18, 2020

Hi Abhishek,

Differential privacy hides the contribution of any single user. If the original function and the anon function give dramatically different results, so different that analysis is misleading, then the original data did not contain enough contributing users to make anonymized analysis useful.

@AbhishekNalamothu (Author)

Thanks @celiayz.
How can I avoid this zero problem? Do you have any suggestions?

The modified number_of_carrots_eaten data and the following query on it reproduce this zero problem.

select d.animal_group,
       count(1),
       sum(case when count_carrots_eaten = 0 then 1 else 0 end) as zero_counts,
       ((1 - avg(d.count_carrots_eaten)) * 100) as Zero_percent,
       avg(d.count_carrots_eaten),
       sum(d.COUNT_CARROTS_EATEN) as carrots_eaten,
       anon_sum(d.COUNT_CARROTS_EATEN, 5) as anon_carrots_eaten
from animals_and_carrots_bin_new d
group by d.animal_group
order by carrots_eaten;

[image: results of the query above, grouped by animal_group]

In the animal data set, groups in which 70, 80, or 90 percent of the values are zero show the 0 problem.

From other experiments, I realized that the 0 problem depends on several factors: the number of contributing users and how many of those users have zeros in that group.

I would like to know how I can avoid this problem.

Once again thank you so much.

@celiayz (Contributor) commented Feb 20, 2020

The best way to avoid the problem is to add more data. You can also try increasing the value of epsilon and using manually specified bounds (e.g., ANON_SUM(column, lower, upper, epsilon)).
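
For example, on the carrots data above, the query could pass explicit bounds and an epsilon along these lines (a sketch only; the argument order follows the ANON_SUM(column, lower, upper, epsilon) form mentioned here, and the bounds 0 and 100 are placeholder values, not recommendations):

select d.animal_group,
       anon_sum(d.COUNT_CARROTS_EATEN, 0, 100, 5) as anon_carrots_eaten
from animals_and_carrots_bin_new d
group by d.animal_group;

With manually specified bounds, none of the privacy budget is spent on automatic bounds detection, which can help accuracy on small groups.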

@AbhishekNalamothu (Author)

Thanks @celiayz for your prompt response.

Suppose we have a larger dataset aggregated into 'n' groups, and 'm' of those groups have very few data points compared to the remaining (n-m) groups. Do we expect those 'm' groups to have a 0 value upon aggregation?

What if I do not have more data to add?
My use case requires not providing bounds; I want to use the approx_bounds provided by Google to detect the bounds automatically.
Also, I am afraid increasing epsilon may cause a security problem.

@celiayz (Contributor) commented Feb 20, 2020

Yes, for the 'm' groups that have fewer contributing users, we expect that the query could return null or 0 for those groups.

If there is no more data to add, then unfortunately the data set is too small to statistically hide the contribution of a single user, and differentially private analysis is probably not appropriate for that data set.

@AbhishekNalamothu (Author)

@celiayz, returning null would be fine for analysis, but returning 0 misleads the analysts. Is there a way to make it return null instead of 0?
Also, if there is an error, or not enough data to process, isn't it better to return an "error" rather than 0, since 0 is not an error value?

Thank you @celiayz

@celiayz (Contributor) commented Feb 20, 2020

I see. The fact that it returns 0 is likely an implementation detail of the noising + snapping mechanisms: when the value is close enough to 0, the answer gets snapped down to 0. Since the aggregation functions do not know that there isn't enough data, and there is no error, they return 0 instead of null. Therefore, I don't see any meaningful way to get the library to return null instead of 0.
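
If a purely cosmetic, query-side workaround is acceptable, the result could be wrapped so that an exact 0 is reported as null (a sketch only, not a library feature, and it cannot distinguish a genuine zero from a snapped one):

select d.animal_group,
       nullif(anon_sum(d.COUNT_CARROTS_EATEN, 5), 0) as anon_carrots_eaten
from animals_and_carrots_bin_new d
group by d.animal_group;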

@dibakch (Collaborator) commented Jul 14, 2021

Closing this for now. Feel free to re-open.

dibakch closed this as completed on Jul 14, 2021