dictionary keys can be non-unique #959
Comments
Let's raise an error when duplicated keys are found on the same level. I could write a function to merge duplicated keys, but it can get fairly complex, because merging keys can create new duplicates among their sub-keys.
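To illustrate why merging has to be recursive, here is a minimal base R sketch. It assumes a dictionary is a plain nested named list (character vectors as values, named lists as sub-keys), which is not necessarily how the package stores dictionaries internally, and `merge_dup_keys` is a hypothetical helper, not an existing function.

```r
# Hypothetical helper: merge duplicated keys in a nested named list.
# A node is either a character vector (values) or a named list of nodes.
merge_dup_keys <- function(node) {
  if (!is.list(node)) return(unique(node))  # leaf: de-duplicate values
  merged <- list()
  for (key in unique(names(node))) {
    same <- unname(node[names(node) == key])  # all entries sharing this key
    is_sub <- vapply(same, is.list, logical(1))
    if (all(is_sub)) {
      # Concatenating sub-lists can itself create duplicated sub-keys,
      # so the concatenated result is merged again, one level down.
      merged[[key]] <- merge_dup_keys(do.call(c, same))
    } else if (!any(is_sub)) {
      merged[[key]] <- unique(unlist(same))
    } else {
      stop("key '", key, "' mixes values and sub-keys")
    }
  }
  merged
}

# Assumed example: "A" appears twice, and both copies carry a sub-key "X".
dict <- list(
  A = list(X = c("a", "b")),
  A = list(X = "c", Y = "d"),
  B = "e"
)
merged <- merge_dup_keys(dict)
str(merged)
```

Merging the two `A` entries produces two `X` sub-keys, which is the new duplication mentioned above; the recursive call is what resolves it.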
Actually, we might want to allow duplicated keys across different levels of the hierarchy. Currently this seems to work as expected:
dfm_lookup(dfm(txt), dic1)
# Document-feature matrix of: 2 documents, 3 features (16.7% sparse).
# 2 x 3 sparse Matrix of class "dfm"
# features
# docs A.X B.Y A.Y
# doc1 3 2 2
# doc2 0 2 2
dfm_lookup(dfm(txt), dic1, levels = 1)
# Document-feature matrix of: 2 documents, 2 features (0% sparse).
# 2 x 2 sparse Matrix of class "dfm"
# features
# docs A B
# doc1 5 2
# doc2 2 2
dfm_lookup(dfm(txt), dic1, levels = 2)
# Document-feature matrix of: 2 documents, 2 features (25% sparse).
# 2 x 2 sparse Matrix of class "dfm"
# features
# docs X Y
# doc1 3 4
# doc2 0 4
A use-case? Could be:
(or it could be reversed.) If you wanted an aggregate at level 2, to combine the values for sport across different countries, you would want the values pooled for the duplicate key at level 2. So maybe we should only issue a warning, and only if the flattened key is non-unique? For a level-1-only dictionary, we would simply combine the keys by
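A rough base R sketch of the "warn only if the flattened key is non-unique" idea follows. It again assumes a dictionary is a plain nested named list; `flat_keys` and `check_flat_keys` are hypothetical helpers named here for illustration only.

```r
# Hypothetical helper: dotted path (e.g. "A.X") of every leaf in a nested named list.
flat_keys <- function(node, prefix = NULL) {
  if (!is.list(node)) return(paste(prefix, collapse = "."))
  unlist(lapply(seq_along(node), function(i) {
    # Index by position so that duplicated names are all visited.
    flat_keys(node[[i]], c(prefix, names(node)[i]))
  }))
}

# Hypothetical helper: warn (rather than stop) only when flattened keys collide.
check_flat_keys <- function(dict) {
  keys <- flat_keys(dict)
  dup <- unique(keys[duplicated(keys)])
  if (length(dup) > 0) {
    warning("duplicated keys after flattening: ", paste(dup, collapse = ", "))
  }
  invisible(dup)
}

# "Y" repeats at level 2 but under different parents: flattened keys stay unique.
dict_ok <- list(A = list(X = "a", Y = "b"), B = list(Y = "c"))
# "A" repeats at level 1 with the same sub-key: the flattened key "A.X" collides.
dict_dup <- list(A = list(X = "a"), A = list(X = "b"))

check_flat_keys(dict_ok)  # silent
msg <- tryCatch(check_flat_keys(dict_dup),
                warning = function(w) conditionMessage(w))
print(msg)
```

With this check, a key repeated under different parents passes silently, while a repeat on the same level under the same parent triggers the warning.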
It is true that keys can be unique when the dictionary is flattened, but I cannot think of a reason to do
instead of
Can you give me an example? That said, we can allow duplicated keys, because identical keys are merged in recompilation in
I fully agree with that. But I was thinking more of the example where the top-level keys were different and the level 2 keys were identical. But that's not really a problem either, since we have the
Note also that although identical keys are already merged, and therefore identical keys may not cause any problem there, we still have a problem with the indexing (example above).
Example:
We have tested whether this creates a problem for the *l_lookup functions, and it does not:
but it does create a problem for indexing.