A paper about sequential data mining and experiments of some algorithms.

ibrahimerdem/AdvancedPatternMining

ADVANCED PATTERN MINING: THE EVALUATION OF SEQUENTIAL PATTERN MINING ALGORITHMS

Abstract

Many algorithms have been proposed for extracting sequential patterns from a sequence database. They differ in the techniques they use, such as how they scan the database and how they count support. Performance can vary with the sequential pattern mining algorithm and with the size of the data. This paper measures the efficiency of three algorithms: SPADE, PrefixSpan, and CM-SPADE. The experiments use three real data sets from the UCI Machine Learning Repository: 1) MSNBC, 2) Online Retail, 3) DNA Sequence, and are run with SPMF, an open-source data mining tool specialized in pattern mining. Each algorithm has its own advantages and drawbacks: features that are advantageous for one type of data set can be disadvantageous for another. The results show that the runtime efficiency of an algorithm depends not only on the characteristics of the data set; the minimum support threshold also has a significant impact on processing time.

Keywords. Frequent Pattern Mining, Sequential Pattern Mining, PrefixSpan, SPADE, CM-SPADE, Efficiency of Algorithms
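
To make the terms "support counting" and "minimum support threshold" from the abstract concrete, here is a minimal Python sketch. It is an illustration only and is not how SPADE, PrefixSpan, CM-SPADE, or SPMF are implemented. It assumes a simplified setting where every sequence is a list of single items rather than itemsets. The function names and the toy database are hypothetical.

```python
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so each match must come after the previous one.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Count the sequences in `database` that contain `pattern` (absolute support)."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def is_frequent(pattern: Sequence[str], database: List[Sequence[str]], minsup: float) -> bool:
    """Check whether the relative support of `pattern` reaches the threshold `minsup`."""
    return support(pattern, database) / len(database) >= minsup


# Hypothetical toy sequence database (not one of the data sets used in the paper).
toy_db = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["b", "c"],
]

print(support(["a", "c"], toy_db))           # 3 of the 4 sequences contain a followed by c
print(is_frequent(["a", "c"], toy_db, 0.5))  # True: 0.75 >= 0.5
print(is_frequent(["a", "c"], toy_db, 0.8))  # False: 0.75 < 0.8
```

Raising `minsup` shrinks the set of patterns that count as frequent, which is one reason the threshold affects how much work a mining algorithm has to do.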

Data Source

Preprocessed Data

Tool
