[Feature Request] Frequency merge operations #263
Comments
A difficulty that I'm not sure how to tackle is the required bandwidth equality on continuous data. Suppose you want to add the following histograms:

```fsharp
let a =
    [(0.1,1);(0.2,1);(0.3,1)] // bandwidth = 0.1
    |> Map.ofList
let b =
    [(0.15,1);(0.3,1)] // bandwidth = 0.15 or 0.05, nobody knows...
    |> Map.ofList
merge a b
// result: [(0.1,1);(0.15,1);(0.2,1);(0.3,2)] is not valid!!
```

Histograms (regardless if they are [...]
For now I decided to go with an unsatisfactory hybrid of (B) and (C). I added a parameter that requires the user to specify whether the maps are based on equal binning or contain categorical data. If it's continuous data with unequal binning, the merge fails with a description explaining the issue. In the future, a procedure could be implemented that dissects both maps and creates a new one with a new binning. If my understanding is correct, the new bandwidth must be double the maximal bandwidth that is observed in the input maps.
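The guard described above could look roughly like the following. This is only a sketch: the function name `mergeAdd`, the parameter name `equalBandwidthOrNominal`, and the error message are assumptions made for illustration, not the library's confirmed API.

```fsharp
// Hypothetical sketch: the caller declares whether both maps share the same
// binning (or hold categorical data). If not, the merge is refused.
let mergeAdd (equalBandwidthOrNominal: bool) (mapA: Map<'k, int>) (mapB: Map<'k, int>) : Map<'k, int> =
    if not equalBandwidthOrNominal then
        failwith "Cannot merge continuous data with unequal binning: counts from overlapping bins would be mixed."
    // Add the count from mapB to the count from mapA (0 if the key is missing).
    let folder (acc: Map<'k, int>) (key: 'k) (valueB: int) =
        let valueA = Map.tryFind key acc |> Option.defaultValue 0
        Map.add key (valueA + valueB) acc
    Map.fold folder mapA mapB

// The two histograms from the comment above, with different bandwidths.
let a = Map.ofList [(0.1, 1); (0.2, 1); (0.3, 1)]
let b = Map.ofList [(0.15, 1); (0.3, 1)]

// The caller states that the binning is not equal, so the merge is rejected.
let outcome =
    try
        Ok (mergeAdd false a b)
    with ex ->
        Error ex.Message
```

Note that the flag is a promise made by the caller; this sketch does not try to detect the bandwidth from the keys, since (as noted above) it cannot be recovered unambiguously from a sparse map.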
Merge operations for Maps
Data can be sorted into bins of predefined width using the `Frequency` or `EmpiricalDistribution` module. If two datasets are binned and should be merged afterwards, several merging strategies are possible. A simple merge of `freqA` and `freqB` is straightforward: values of keys that are present in both `freqA` and `freqB` are replaced with the values of `freqB`. For two maps `a` and `b`, this results in a combination with `("k2",3)` from `a` being replaced by `("k2",1)` from `b`.
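A minimal sketch of such a replacing merge is shown here. The helper name `mergeReplace` is hypothetical, and the contents of `a` and `b` are inferred from the results quoted elsewhere in this issue rather than taken from the original snippet.

```fsharp
// Example maps; values inferred from the results quoted in this issue.
let a = Map.ofList [("k1", 1); ("k2", 3)]
let b = Map.ofList [("k2", 1); ("k3", 4)]

// Hypothetical helper: fold every entry of mapB into mapA,
// overwriting the value for keys that already exist in mapA.
let mergeReplace (mapA: Map<'k, 'v>) (mapB: Map<'k, 'v>) : Map<'k, 'v> =
    Map.fold (fun acc key value -> Map.add key value acc) mapA mapB

let merged = mergeReplace a b
// merged = map [("k1", 1); ("k2", 1); ("k3", 4)]
```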
Generic formulation of merge operations
I'm in the process of adding a generic function that takes an additional function to handle key duplicates. For example, passing an addition results in the combination of `a` and `b` with `("k2",3)` from `a` being added to `("k2",1)` from `b`.
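One way such a generic merge might be written is sketched below. The name `mergeBy` and its argument order are assumptions for illustration, and the example maps are again inferred from the results quoted in this issue.

```fsharp
// Example maps; values inferred from the results quoted in this issue.
let a = Map.ofList [("k1", 1); ("k2", 3)]
let b = Map.ofList [("k2", 1); ("k3", 4)]

// Hypothetical generic merge: `onDuplicate` receives the value from mapA and
// the value from mapB whenever a key occurs in both maps. Keys that occur in
// only one map are carried over unchanged.
let mergeBy (onDuplicate: 'v -> 'v -> 'v) (mapA: Map<'k, 'v>) (mapB: Map<'k, 'v>) : Map<'k, 'v> =
    let folder (acc: Map<'k, 'v>) (key: 'k) (valueB: 'v) =
        match Map.tryFind key acc with
        | Some valueA -> Map.add key (onDuplicate valueA valueB) acc
        | None -> Map.add key valueB acc
    Map.fold folder mapA mapB

// Addition on duplicates: ("k2", 3) and ("k2", 1) become ("k2", 4).
let added = mergeBy (+) a b

// The simple replacing merge is the special case that always keeps mapB's value.
let replaced = mergeBy (fun _ valueB -> valueB) a b
```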
While this is trivial, I'm not sure how to handle a subtraction. Should `subtract a b` result in:

(a) `val it: Map<string,int> = map [("k1", 1); ("k2", 2); ("k3", 4)]`
- values from `a` are subtracted by the corresponding values from `b` if keys are present in both maps
- keys from `a` that are not present in `b` are untouched

(b) `val it: Map<string,int> = map [("k1", 1); ("k2", 2); ("k3", -4)]`
- values from `a` are subtracted by the values from `b`, even for keys that are not present in `a`
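To make the difference between the two options concrete, here is a sketch of both. The function names `subtractShared` and `subtractAll` are hypothetical, and the input maps are inferred from the two results listed above.

```fsharp
// Example maps; values inferred from the results quoted in this issue.
let a = Map.ofList [("k1", 1); ("k2", 3)]
let b = Map.ofList [("k2", 1); ("k3", 4)]

// Option (a): subtract only where the key exists in both maps;
// keys found in just one map keep their value.
let subtractShared (mapA: Map<'k, int>) (mapB: Map<'k, int>) : Map<'k, int> =
    let folder (acc: Map<'k, int>) (key: 'k) (valueB: int) =
        match Map.tryFind key acc with
        | Some valueA -> Map.add key (valueA - valueB) acc
        | None -> Map.add key valueB acc
    Map.fold folder mapA mapB

// Option (b): treat a key missing from mapA as a count of 0,
// so negative counts can appear in the result.
let subtractAll (mapA: Map<'k, int>) (mapB: Map<'k, int>) : Map<'k, int> =
    let folder (acc: Map<'k, int>) (key: 'k) (valueB: int) =
        let valueA = Map.tryFind key acc |> Option.defaultValue 0
        Map.add key (valueA - valueB) acc
    Map.fold folder mapA mapB

let resultA = subtractShared a b // map [("k1", 1); ("k2", 2); ("k3", 4)]
let resultB = subtractAll a b    // map [("k1", 1); ("k2", 2); ("k3", -4)]
```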
The latter option (b) makes no sense to me since frequency counts should not be negative, but I also cannot think of applications in which the result of (a) makes any sense. Maybe the subtract function is not the best one to start with, because in this post they implemented (a) with addition and multiplication examples. For addition in particular, both (a) and (b) would give the correct result, and I think it is intuitive to apply the function only to values of keys that are present in both maps.
@HarryMcCarney, do you know use cases that use `subtract`? Do you have any thoughts about this? I would suggest adding version (a) to `Frequency` as well as `EmpiricalDistribution`.
Additional remark: When applied to continuous data, bandwidths must be equal so that counts from overlapping bins are not merged!