QUESTION #19

Closed
NaifahNurya opened this issue Apr 9, 2022 · 7 comments

Comments

@NaifahNurya

Thank you very much for the nice work.

I have a question (or a suggestion/feature request, if this is not already supported).

Assume we have the following dataset as presented in this repo:

[["A", "A", "B", "A", "D"],
["C", "B", "A"],
["C", "A", "C", "D"]]
with the following attribute (time-attribute)

[[1, 1, 2, 3, 3],
[3, 8, 9],
[2, 5, 5, 7]].

  1. Is it possible to exclude patterns in which the same item appears twice in a row? For example, ignore any pattern that contains [A, A] or [C, C], i.e. require item(i) != item(i+1).

  2. Is it possible to generate only patterns that start with a certain item? For example,
    generate a pattern only if it starts with A, or B, or C, etc.

  3. Is it possible to generate only patterns that end with a certain item? For example,
    generate a pattern only if it ends with a given item such as C, D, or A.

Can you share how to deal with the above scenarios?

Thank you.

@skadio
Contributor

skadio commented Apr 9, 2022

Thank you @NaifahNurya for your interest.

The easiest would be to find the frequent patterns using Seq2Pat and then to do a quick post-processing to remove undesired patterns based on custom preferences like the above.
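The post-processing step could look roughly like the sketch below. It assumes that each mined row is a list of items followed by the pattern frequency as its last element; the `mined` list is made-up illustrative data, not real Seq2Pat output, and the helper names (`split_pattern`, `keep_pattern`, `filter_patterns`) are hypothetical.

```python
def split_pattern(row):
    """Split a mined row into (items, frequency).

    Assumption: the frequency is stored as the last element of the row.
    """
    return list(row[:-1]), row[-1]


def keep_pattern(items, starts_with=None, ends_with=None, no_adjacent_repeats=True):
    """Return True if the pattern satisfies all requested preferences."""
    if not items:
        return False
    # 1. Reject patterns where item(i) == item(i+1), e.g. [A, A].
    if no_adjacent_repeats and any(a == b for a, b in zip(items, items[1:])):
        return False
    # 2. Keep only patterns that start with one of the allowed items.
    if starts_with is not None and items[0] not in starts_with:
        return False
    # 3. Keep only patterns that end with one of the allowed items.
    if ends_with is not None and items[-1] not in ends_with:
        return False
    return True


def filter_patterns(rows, **prefs):
    """Filter mined rows, leaving each kept row (items + frequency) unchanged."""
    return [row for row in rows if keep_pattern(split_pattern(row)[0], **prefs)]


# Illustrative rows only: items followed by a frequency.
mined = [["A", "A", 3], ["A", "D", 2], ["C", "A", "D", 2], ["B", "A", 2], ["C", "C", 2]]

filtered = filter_patterns(mined, starts_with={"A", "C"}, ends_with={"D"})
print(filtered)
```

Passing `starts_with=None` or `ends_with=None` disables that particular check, so the three preferences can be combined or used separately.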

@NaifahNurya
Author

@skadio, thank you for the quick reply; let me work on it.

I also have another issue. When I use Seq2Pat on a small dataset (with short sequences), I get results. However, on a large dataset (with many sequences, some of them long), I get the following error:

patterns = seq2pat.get_patterns(min_frequency=65)
File "C:\Users\NaifahNurya\anaconda3\envs\Seq2Pat\lib\site-packages\sequential\seq2pat.py", line 411, in get_patterns
 patterns = self._cython_imp.mine()
File "sequential\backend\seq_to_pat.pyx", line 31, in sequential.backend.seq_to_pat.PySeq2pat.mine
RuntimeError: bad allocation

Can you help identify the cause of this?
If you give me your email, I can send you a sample txt file.

@takojunior
Contributor

Thanks @NaifahNurya.

This might be caused by memory exhaustion when the dataset contains many long sequences. It seems related to a previous discussion, #14.

To alleviate such memory issues, I would suggest adding constraints to further reduce the number of search paths. We can also limit the number of columns, or sample the data before mining, to stay within the available memory.
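As a rough sketch of the sampling idea, the helper below picks a random subset of sequences and truncates each one to a maximum length, keeping the attribute values aligned with the items. It reads "columns" as positions within each sequence, which is an assumption; `sample_and_truncate` and its parameters are hypothetical names, and the data is the small example from the top of this issue.

```python
import random


def sample_and_truncate(sequences, attributes, n_sequences, max_length, seed=0):
    """Return a smaller dataset: a random subset of sequences, each cut to max_length.

    The same indices and the same cut are applied to the attribute lists so that
    every item still has exactly one attribute value.
    """
    rng = random.Random(seed)
    n = min(n_sequences, len(sequences))
    # Sort the picked indices to preserve the original ordering of sequences.
    picked = sorted(rng.sample(range(len(sequences)), n))
    small_seqs = [sequences[i][:max_length] for i in picked]
    small_attrs = [attributes[i][:max_length] for i in picked]
    return small_seqs, small_attrs


sequences = [["A", "A", "B", "A", "D"],
             ["C", "B", "A"],
             ["C", "A", "C", "D"]]
timestamps = [[1, 1, 2, 3, 3],
              [3, 8, 9],
              [2, 5, 5, 7]]

small_seqs, small_times = sample_and_truncate(sequences, timestamps,
                                              n_sequences=2, max_length=3)
print(small_seqs, small_times)
```

Truncating from the start of each sequence is only one possible choice; whether it is appropriate depends on the application.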

@skadio
Contributor

skadio commented Apr 26, 2022

Sampling or limiting the columns sounds reasonable and might be required. You can start with a small sample of 10-20 columns and see how it behaves in your application before including more.

@takojunior you have an interesting suggestion on adding a span constraint to limit the search on columns. Do we have an example of how to add that constraint somewhere?

@takojunior
Contributor

Right, so one way is to enforce a span constraint on an attribute created from the order of items, e.g. [A, B, C, D] has the attribute [0, 1, 2, 3]. Enforcing a maximum span constraint limits the length of the mined patterns, and thus reduces the search space.

For how to enforce such a constraint, see this example notebook: dichotomic_pattern_mining.ipynb. @skadio @NaifahNurya
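A minimal sketch of this idea follows. The `order_attribute` helper is a hypothetical name and plain Python. The `mine_with_max_span` function assumes the Seq2Pat usage shown in the project README (`Seq2Pat`, `Attribute`, `add_constraint`, `span()`); it is untested here, so treat the notebook as the reference.

```python
def order_attribute(sequences):
    """Build a position attribute: [A, B, C, D] -> [0, 1, 2, 3]."""
    return [list(range(len(seq))) for seq in sequences]


def mine_with_max_span(sequences, max_span, min_frequency):
    """Mine patterns whose items lie within a window of max_span positions.

    Assumption: the Seq2Pat API matches the README usage; the import is kept
    inside the function so the helper above works without the library.
    """
    from sequential.seq2pat import Seq2Pat, Attribute

    seq2pat = Seq2Pat(sequences=sequences)
    order = Attribute(values=order_attribute(sequences))
    # span = max(position) - min(position) over the items of a pattern, so
    # bounding it also bounds the pattern length to at most max_span + 1.
    seq2pat.add_constraint(0 <= order.span() <= max_span)
    return seq2pat.get_patterns(min_frequency=min_frequency)


sequences = [["A", "A", "B", "A", "D"],
             ["C", "B", "A"],
             ["C", "A", "C", "D"]]

order_values = order_attribute(sequences)
print(order_values)
```

With the library installed, a call such as `mine_with_max_span(sequences, max_span=2, min_frequency=2)` would restrict the search to patterns spanning at most three consecutive positions.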

@NaifahNurya
Author

@takojunior and @skadio, thank you very much for this suggestion. Let me work on it, and then I will share feedback.

@skadio
Contributor

skadio commented May 7, 2022

Closing the issue per discussion. Hope this helped!

@skadio skadio closed this as completed May 7, 2022

3 participants