QUESTION #19
Comments
Thank you @NaifahNurya for your interest. The easiest approach would be to find the frequent patterns using Seq2Pat and then do a quick post-processing step to remove undesired patterns based on custom preferences like the ones described in the question.
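A minimal sketch of such post-processing in plain Python, assuming (as in the Seq2Pat README examples) that each mined pattern is a list of items followed by its frequency count. The helpers `keep_pattern` and `filter_patterns` and their parameters are hypothetical, not part of the Seq2Pat API, and the `mined` list is illustrative input only, not real mining output.

```python
from typing import Iterable, List, Optional, Sequence


def keep_pattern(pattern: Sequence,
                 starts_with: Optional[Iterable[str]] = None,
                 ends_with: Optional[Iterable[str]] = None,
                 no_repeats: bool = True) -> bool:
    """Return True if a mined pattern satisfies the custom preferences.

    Assumes the last element of `pattern` is its frequency (hypothetical
    layout mirroring the Seq2Pat README output); the rest are items.
    """
    items = list(pattern[:-1])
    if not items:
        return False
    # Reject patterns where the same item appears twice in a row: item(i) == item(i+1)
    if no_repeats and any(a == b for a, b in zip(items, items[1:])):
        return False
    # Keep only patterns whose first item is in the allowed set
    if starts_with is not None and items[0] not in set(starts_with):
        return False
    # Keep only patterns whose last item is in the allowed set
    if ends_with is not None and items[-1] not in set(ends_with):
        return False
    return True


def filter_patterns(patterns: List[list], **prefs) -> List[list]:
    """Apply keep_pattern to every mined pattern, preserving the original order."""
    return [p for p in patterns if keep_pattern(p, **prefs)]


# Illustrative input in the assumed [item, ..., frequency] layout
mined = [["A", "A", 2], ["A", "D", 2], ["B", "A", 2], ["C", "A", 2], ["C", "D", 2]]
print(filter_patterns(mined, starts_with={"A", "C"}, ends_with={"D"}))
# [['A', 'D', 2], ['C', 'D', 2]]
```

Because this filtering runs after mining, it only trims the output; it does not reduce the memory used during the search itself.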
@skadio, thank you for the quick reply; let me work on it. I also have another issue: when I use Seq2Pat on a small dataset (with short sequences), I get results. However, on a large dataset (with many sequences, some of them long), I get the following error:
Can you help identify the cause?
Thanks @NaifahNurya. This might be caused by memory exhaustion when the dataset contains many long sequences; it seems related to a previous discussion, #14. To alleviate such memory issues, I would suggest adding constraints to further reduce the number of search paths. We can also limit the number of columns, or apply data sampling before mining, to better fit within the available memory.
Sampling or limiting the columns sounds reasonable and might be required. You can start with small samples of 10-20 columns and see how it behaves in your application before including more. @takojunior, you had an interesting suggestion about adding a span constraint to limit the search over columns. Do we have an example of how to add that constraint somewhere?
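A rough sketch of limiting columns and sampling sequences before mining, in plain Python. The helper names `limit_columns` and `sample_sequences`, and the choice to keep the first `max_len` items of each sequence, are assumptions for illustration. The same cut is applied to the attribute values so they stay aligned with the items, and sampling returns indices so that sequences and attributes can be subset consistently.

```python
import random
from typing import List, Optional, Tuple


def limit_columns(sequences: List[list],
                  attributes: Optional[List[list]] = None,
                  max_len: int = 20) -> Tuple[List[list], Optional[List[list]]]:
    """Keep only the first `max_len` items (columns) of every sequence.

    Attribute values are truncated the same way so they stay aligned
    item-by-item with the sequences.
    """
    seqs = [s[:max_len] for s in sequences]
    attrs = None if attributes is None else [a[:max_len] for a in attributes]
    return seqs, attrs


def sample_sequences(sequences: List[list], k: int, seed: int = 0) -> List[int]:
    """Return sorted indices of k randomly chosen sequences (reproducible via seed)."""
    rng = random.Random(seed)
    k = min(k, len(sequences))
    return sorted(rng.sample(range(len(sequences)), k))


sequences = [["A", "A", "B", "A", "D"], ["C", "B", "A"], ["C", "A", "C", "D"]]
times = [[1, 1, 2, 3, 3], [3, 8, 9], [2, 5, 5, 7]]

seqs, attrs = limit_columns(sequences, times, max_len=3)
print(seqs)   # [['A', 'A', 'B'], ['C', 'B', 'A'], ['C', 'A', 'C']]
print(attrs)  # [[1, 1, 2], [3, 8, 9], [2, 5, 5]]

# Subset sequences and attributes with the same indices
idx = sample_sequences(seqs, k=2, seed=42)
sampled_seqs = [seqs[i] for i in idx]
sampled_attrs = [attrs[i] for i in idx]
```

Keeping the first items is only one choice; depending on the application, the most recent items or a sliding window may be more meaningful.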
Right, so one way is to enforce a span constraint on an attribute created from the order of items, e.g. … For how to enforce such a constraint, refer to this example notebook: dichotomic_pattern_mining.ipynb. @skadio @NaifahNurya
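To make the idea concrete: an "order" attribute assigns each item its position in its sequence, and a span constraint on it (taken here as the maximum minus the minimum of the attribute values matched by a pattern) bounds how far apart the pattern's items may lie. The plain-Python sketch below builds that attribute and computes the smallest span at which a pattern occurs in a sequence. `order_attribute` and `min_span` are hypothetical helpers for illustration, not Seq2Pat functions; in Seq2Pat itself the constraint would presumably be expressed through its `Attribute` class and `add_constraint`, and the notebook above is the place to check the exact usage.

```python
from itertools import combinations
from typing import List, Optional, Sequence


def order_attribute(sequences: List[list]) -> List[List[int]]:
    """Attribute whose value for each item is its position in its sequence."""
    return [list(range(len(seq))) for seq in sequences]


def min_span(sequence: Sequence, pattern: Sequence) -> Optional[int]:
    """Smallest (max order - min order) over all occurrences of `pattern` as a
    subsequence of `sequence`; None if the pattern does not occur.

    Brute force over index combinations: fine for short illustrative inputs only.
    """
    if not pattern:
        return None
    best = None
    for idx in combinations(range(len(sequence)), len(pattern)):
        if all(sequence[i] == p for i, p in zip(idx, pattern)):
            # Order values increase along the sequence, so max - min = last - first
            span = idx[-1] - idx[0]
            if best is None or span < best:
                best = span
    return best


sequences = [["A", "A", "B", "A", "D"], ["C", "B", "A"], ["C", "A", "C", "D"]]
print(order_attribute(sequences))  # [[0, 1, 2, 3, 4], [0, 1, 2], [0, 1, 2, 3]]

# Assuming a sequence supports a pattern when at least one occurrence satisfies
# the constraint, a limit such as span <= 1 would keep A-D in the first sequence
# (A at position 3, D at 4) but not in the third (A at 1, D at 3).
print([min_span(s, ["A", "D"]) for s in sequences])  # [1, None, 2]
```

Because tighter spans rule out occurrences whose items are far apart, such a constraint prunes search paths on long sequences, which is the memory relief discussed above.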
@takojunior and @skadio, thank you very much for this suggestion. Let me work on it, and then I will share feedback.
Closing the issue per discussion. Hope this helped!
Thank you very much for the nice work.
I have a question (or a suggestion/feature request, if this is not already supported).
Assume we have the following dataset as presented in this repo:
[["A", "A", "B", "A", "D"],
["C", "B", "A"],
["C", "A", "C", "D"]]
with the following attribute (time-attribute)
[[1, 1, 2, 3, 3],
[3, 8, 9],
[2, 5, 5, 7]].
Is it possible to exclude patterns in which the same item appears consecutively? For example, ignore patterns such as [A, A] or [C, C], i.e., require item(i) != item(i+1).
Is it possible to generate only patterns that start with a certain item? For example, generate a pattern only if it starts with A, B, or C.
Is it possible to restrict patterns to end with a certain item? For example, generate a pattern only if it ends with C, D, or A.
Can you share how to deal with the above scenarios?
Thank you.