# Test data for the Apriori algorithm
# One transaction per line, items are separated with whitespaces
bread butter sugar
coffee milk sugar
bread coffee milk sugar
coffee milk
Run result:
frequentItemSets.size():4
result .............[coffee, milk]
result .............[sugar]
result .............[milk]
result .............[coffee]
frequentItemSetCount is 1, but frequentItemSets.size() is 4.
This is not a bug, but normal behavior. Specifying a frequentItemSetCount does not guarantee that exactly that many frequent item sets are found. It is just a way to avoid finding very few item sets if the minimum support threshold has been chosen too restrictively. Depending on the given data set, it might not be possible to find as many item sets as specified. On very small data sets such as the one you used, it is very likely that more frequent item sets are returned than specified.

This isn't a bug either. The algorithm successively decreases the minimum support (starting with 1.0 in your example) until enough item sets have been found. If more item sets than specified reach the minimum support used in that last iteration, all of them are returned. In your data, each of the four returned item sets occurs in three of the four transactions, so they all qualify at the same threshold. This is intentional, because the algorithm cannot decide which ones to include: they all reach the same minimum support, and there is no other criterion for measuring their quality. Furthermore, if association rules are to be generated in a second step, all of the frequent item sets must be used; otherwise the learned rules will be incomplete. If you only want to find a single item set in the given example, you must decide on your own which item set to keep (probably the first one, [coffee, milk], because it is the only one that includes two items).
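To make the "decrease the threshold until enough item sets are found" behavior concrete, here is a minimal, self-contained sketch. It is not the library's implementation: the class and method names (`IterativeApriori`, `mine`, `apriori`) and the parameters `maxSupport = 1.0`, `minSupport = 0.0` and `supportDelta = 0.1` are hypothetical values chosen for illustration. It runs a plain level-wise Apriori at each threshold and stops at the first threshold that yields at least `frequentItemSetCount` item sets, returning all of them (requires Java 9+ for `List.of`/`Set.of`).

```java
import java.util.*;

/*
 * Hypothetical sketch, not the library's code: lower the minimum support step by
 * step until at least frequentItemSetCount item sets are frequent, then return ALL
 * item sets that reach the support used in that last iteration.
 */
class IterativeApriori {

    // Number of transactions that contain every item of the candidate.
    static int supportCount(Set<String> candidate, List<Set<String>> transactions) {
        int count = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(candidate)) count++;
        }
        return count;
    }

    // Level-wise Apriori: all item sets that occur in at least minCount transactions.
    static List<Set<String>> apriori(List<Set<String>> transactions, int minCount) {
        SortedSet<String> items = new TreeSet<>();
        for (Set<String> t : transactions) items.addAll(t);
        List<SortedSet<String>> level = new ArrayList<>();
        for (String item : items) {
            SortedSet<String> single = new TreeSet<>(List.of(item));
            if (supportCount(single, transactions) >= minCount) level.add(single);
        }
        List<Set<String>> result = new ArrayList<>();
        while (!level.isEmpty()) {
            result.addAll(level);
            Set<Set<String>> previous = new HashSet<>(level);
            Set<SortedSet<String>> next = new LinkedHashSet<>();
            for (int i = 0; i < level.size(); i++) {
                for (int j = i + 1; j < level.size(); j++) {
                    // Join step: merge two k-item sets into a (k+1)-item candidate.
                    SortedSet<String> union = new TreeSet<>(level.get(i));
                    union.addAll(level.get(j));
                    if (union.size() != level.get(i).size() + 1) continue;
                    // Prune step: every subset with one item removed must be frequent too.
                    boolean prunable = false;
                    for (String item : union) {
                        Set<String> subset = new TreeSet<>(union);
                        subset.remove(item);
                        if (!previous.contains(subset)) { prunable = true; break; }
                    }
                    if (!prunable && supportCount(union, transactions) >= minCount) next.add(union);
                }
            }
            level = new ArrayList<>(next);
        }
        return result;
    }

    // Decrease the support from maxSupport by supportDelta until enough item sets are found.
    static List<Set<String>> mine(List<Set<String>> transactions, int frequentItemSetCount,
                                  double maxSupport, double minSupport, double supportDelta) {
        int n = transactions.size();
        List<Set<String>> found = new ArrayList<>();
        for (int step = 0; ; step++) {
            double support = maxSupport - step * supportDelta;
            if (support < minSupport) break; // give up: fewer item sets than requested
            // Smallest transaction count whose relative support is >= support
            // (the epsilon guards against floating-point rounding).
            int minCount = (int) Math.ceil(support * n - 1e-9);
            found = apriori(transactions, minCount);
            if (found.size() >= frequentItemSetCount) break; // keep ALL sets at this threshold
        }
        return found;
    }

    public static void main(String[] args) {
        // The four transactions from the test data above.
        List<Set<String>> transactions = List.of(
                Set.of("bread", "butter", "sugar"),
                Set.of("coffee", "milk", "sugar"),
                Set.of("bread", "coffee", "milk", "sugar"),
                Set.of("coffee", "milk"));
        List<Set<String>> itemSets = mine(transactions, 1, 1.0, 0.0, 0.1);
        // prints "4 item sets: [[coffee], [milk], [sugar], [coffee, milk]]"
        System.out.println(itemSets.size() + " item sets: " + itemSets);
    }
}
```

With these assumed settings, the thresholds 1.0, 0.9 and 0.8 all require an item set to appear in every transaction and yield nothing. At about 0.7, three of four transactions suffice, and {coffee}, {milk}, {sugar} and {coffee, milk} qualify at once, which is why four item sets come back even though only one was requested.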
As a future improvement, it would be possible to return a custom implementation of the type SortedSet that provides sort and filter methods, as the class RuleSet does. This would make it easier to manually filter the returned item sets if too many are returned. Progress on that enhancement is now tracked here: #4
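Until such filter methods exist, the returned item sets can be filtered by hand. The sketch below is a hypothetical helper (`ItemSetFilter.keepLargest` is not part of the library) that assumes each item set is available as a plain `Set<String>`; it keeps the `n` largest item sets, and because `Stream.sorted` is stable for ordered streams, item sets of equal size keep their original order.

```java
import java.util.*;
import java.util.stream.Collectors;

/* Hypothetical helper, not the library's API: manually keep only the largest item sets. */
class ItemSetFilter {

    // Returns at most n item sets, larger ones first; ties keep their input order.
    static List<Set<String>> keepLargest(Collection<Set<String>> itemSets, int n) {
        return itemSets.stream()
                .sorted(Comparator.comparingInt((Set<String> s) -> s.size()).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The four item sets from the run above, in the order they were printed.
        List<Set<String>> itemSets = List.of(
                new TreeSet<>(List.of("coffee", "milk")),
                Set.of("sugar"), Set.of("milk"), Set.of("coffee"));
        System.out.println(keepLargest(itemSets, 1)); // prints "[[coffee, milk]]"
    }
}
```

Sorting by size is only one possible criterion; as explained above, all four item sets have the same support, so any choice between them is up to the caller.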
I have also added information to the library's README to avoid future misunderstandings.