QUESTION #19

NaifahNurya · 2022-04-09T00:50:26Z

Thank you very much for the nice work.

I have a question (or a suggestion/feature request, if this is not already supported).

Assume we have the following dataset as presented in this repo:

[["A", "A", "B", "A", "D"],
["C", "B", "A"],
["C", "A", "C", "D"]]
with the following time attribute:

[[1, 1, 2, 3, 3],
[3, 8, 9],
[2, 5, 5, 7]].

Is it possible to exclude patterns in which the same item appears twice in a row? For example, ignore any pattern containing [A, A] or [C, C], i.e. require item(i) != item(i+1).
Is it possible to generate only patterns that start with a certain item? For example, generate a pattern only if it starts with A, B, or C.
Is it possible to generate only patterns that end with a certain item? For example, generate a pattern only if it ends with C, D, or A.

Can you share how to deal with the above scenarios?

Thank you.

skadio · 2022-04-09T01:05:01Z

Thank you @NaifahNurya for your interest.

The easiest approach would be to find the frequent patterns using Seq2Pat and then do a quick post-processing step to remove undesired patterns based on custom preferences like the ones above.
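For illustration, the post-processing step could look roughly like the sketch below. It assumes each mined pattern is a list of items with the frequency appended as the last element; the helper names `keep_pattern` and `filter_patterns` and the sample `mined` list are hypothetical and not part of Seq2Pat.

```python
def keep_pattern(items, starts_with=None, ends_with=None, allow_repeats=False):
    """Return True if the item list satisfies all requested rules."""
    if not items:
        return False
    # Rule 1: reject patterns where an item is immediately repeated.
    if not allow_repeats and any(a == b for a, b in zip(items, items[1:])):
        return False
    # Rule 2: the first item must be one of the allowed start items.
    if starts_with is not None and items[0] not in starts_with:
        return False
    # Rule 3: the last item must be one of the allowed end items.
    if ends_with is not None and items[-1] not in ends_with:
        return False
    return True


def filter_patterns(patterns, **rules):
    """Filter mined patterns; each is assumed to be [item, ..., frequency]."""
    return [p for p in patterns if keep_pattern(p[:-1], **rules)]


# Hypothetical mined output, made up for illustration only.
mined = [["A", "A", 3], ["A", "D", 2], ["C", "A", "D", 2], ["B", "A", 2]]
filtered = filter_patterns(mined, starts_with={"A", "C"}, ends_with={"D"})
print(filtered)
```

Each rule is optional, so the same helper covers any combination of the three questions above.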

NaifahNurya · 2022-04-09T03:44:38Z

@skadio, thank you for the quick reply; let me work on it.

I also have another issue. When I use Seq2Pat on a small dataset (with short sequences), I get results. However, on a large dataset (many sequences, some of them long), I get the following error:

patterns = seq2pat.get_patterns(min_frequency=65)
File "C:\Users\NaifahNurya\anaconda3\envs\Seq2Pat\lib\site-packages\sequential\seq2pat.py", line 411, in get_patterns
 patterns = self._cython_imp.mine()
File "sequential\backend\seq_to_pat.pyx", line 31, in sequential.backend.seq_to_pat.PySeq2pat.mine
RuntimeError: bad allocation

Can you help identify the cause of this?
If you give me your email, I can send you a sample txt file.

takojunior · 2022-04-26T14:29:11Z

Thanks @NaifahNurya .

This might be caused by memory exhaustion when the dataset contains many long sequences. It seems related to a previous discussion, #14.

To alleviate such memory issues, I would suggest adding constraints to further reduce the number of search paths. We can also limit the number of columns, or sample the data before mining, to fit within the available memory.

skadio · 2022-04-26T16:11:42Z

Sampling or limiting the columns sounds reasonable and might be required. You can start with small samples of 10-20 columns and see how it behaves in your application before including more.
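As a rough sketch of that idea, the snippet below samples a subset of sequences and keeps only the first few columns of each before mining. It reads "columns" as positions within a sequence, which is an assumption; the helper name `sample_and_truncate` and its parameters are hypothetical and not part of Seq2Pat.

```python
import random


def sample_and_truncate(sequences, attributes, n_sequences, max_columns, seed=0):
    """Pick a random subset of sequences and keep their first max_columns items.

    The attribute lists are cut the same way so they stay aligned with the items.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n = min(n_sequences, len(sequences))
    chosen = sorted(rng.sample(range(len(sequences)), n))
    sampled_seqs = [sequences[i][:max_columns] for i in chosen]
    sampled_attrs = [attributes[i][:max_columns] for i in chosen]
    return sampled_seqs, sampled_attrs


# The dataset and time attribute from the first comment in this thread.
sequences = [["A", "A", "B", "A", "D"], ["C", "B", "A"], ["C", "A", "C", "D"]]
times = [[1, 1, 2, 3, 3], [3, 8, 9], [2, 5, 5, 7]]

small_seqs, small_times = sample_and_truncate(
    sequences, times, n_sequences=2, max_columns=3
)
```

The reduced lists can then be passed to the miner in place of the full data, and `n_sequences` and `max_columns` increased gradually while memory use stays acceptable.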

@takojunior, you have an interesting suggestion on adding a span constraint to limit the search on columns. Do we have an example somewhere of how to add that constraint?

takojunior · 2022-04-26T16:30:20Z

Right, so one way is to enforce a span constraint on an attribute created from the order of items, e.g. [A, B, C, D] has the attribute [0, 1, 2, 3]. Enforcing a maximum span constraint bounds the length of mined patterns, and thus reduces the search space.

For how to enforce such a constraint, see this example notebook: dichotomic_pattern_mining.ipynb. @skadio @NaifahNurya
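A minimal sketch of the positional span constraint described above might look like this. It assumes the Seq2Pat API as shown in the library's README (`Seq2Pat`, `Attribute`, `span()`, `add_constraint`, `get_patterns`); the helper names `positional_attribute` and `mine_with_max_span` are hypothetical, and the notebook may set things up differently.

```python
def positional_attribute(sequences):
    """For each sequence, return the item positions 0..len-1 as its attribute."""
    return [list(range(len(seq))) for seq in sequences]


def mine_with_max_span(sequences, max_span, min_frequency):
    """Mine patterns whose items lie within max_span positions of each other."""
    # Imported lazily so the helper above works without Seq2Pat installed.
    from sequential.seq2pat import Seq2Pat, Attribute

    seq2pat = Seq2Pat(sequences=sequences)
    position = Attribute(values=positional_attribute(sequences))
    # Assumed usage: span is the max minus the min attribute value in a pattern.
    seq2pat.add_constraint(0 <= position.span() <= max_span)
    return seq2pat.get_patterns(min_frequency=min_frequency)


# The dataset from the first comment in this thread.
sequences = [["A", "A", "B", "A", "D"], ["C", "B", "A"], ["C", "A", "C", "D"]]
positions = positional_attribute(sequences)
print(positions)
```

With Seq2Pat installed, calling `mine_with_max_span(sequences, max_span=2, min_frequency=2)` would restrict the search to items at most two positions apart; the value 2 is only an example.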

NaifahNurya · 2022-05-01T00:55:23Z

@takojunior and @skadio, thank you very much for this suggestion. Let me work on it, and then I will share feedback.

skadio · 2022-05-07T17:41:43Z

Closing the issue per discussion. Hope this helped!

skadio closed this as completed May 7, 2022
