Spark-FIM

Spark-FIM is a library of scalable frequent itemset mining algorithms based on Spark. It includes:

PHybridFIN - A parallel frequent itemset mining algorithm that represents itemsets with a novel data structure named HybridNodeset. Compared with the FP-Growth algorithm implemented in Spark MLlib, it achieves significantly better performance across different datasets as the minimum support decreases.
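To make the role of the minimum support concrete, here is a minimal brute-force sketch in plain Scala. It is not PHybridFIN and does not use HybridNodeset; it only illustrates what "frequent itemset" means, assuming the support threshold is a fraction of the transactions (as the 0.85 value in the example below suggests). The object name, the toy transactions, and the helper functions are hypothetical.

```scala
object MinSupportSketch {
  // Toy transactions (hypothetical data); each one is a set of item identifiers.
  val transactions: Seq[Set[String]] = Seq(
    Set("a", "b", "c"),
    Set("a", "b"),
    Set("a", "c"),
    Set("b", "c")
  )

  // Relative support: fraction of transactions that contain every item of `itemset`.
  def support(itemset: Set[String]): Double =
    transactions.count(t => itemset.subsetOf(t)).toDouble / transactions.size

  // Brute force: enumerate every non-empty subset of the item universe
  // and keep those whose support reaches the threshold.
  def frequentItemsets(minSupport: Double): Map[Set[String], Double] = {
    val items = transactions.flatten.toSet
    items.subsets()
      .filter(_.nonEmpty)
      .map(s => s -> support(s))
      .filter { case (_, sup) => sup >= minSupport }
      .toMap
  }

  def main(args: Array[String]): Unit = {
    frequentItemsets(0.5).toSeq.sortBy(-_._2).foreach { case (s, sup) =>
      println(s"${s.toSeq.sorted.mkString("{", ",", "}")} support=$sup")
    }
  }
}
```

Lowering the threshold can only add itemsets to the result, never remove them, which is why run time tends to grow as the minimum support decreases. Enumerating all subsets is exponential in the number of items, so this sketch is only suitable for tiny inputs.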

Examples

Scala API

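import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}
// PHybridFIN must also be imported from the Spark-FIM package that defines it.

val minSupport = 0.85
val numPartitions = 4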

val spark = SparkSession
  .builder()
  .appName("PHybridFINExample")
  .master("local[*]")
  .getOrCreate()

val schema = new StructType(Array(
  StructField("features", StringType)))
val transactions = spark.read.schema(schema).text("data/chess.csv").cache()
val numTransactions = transactions.count()
val startTime = System.currentTimeMillis()
val freqItemsets = new PHybridFIN()
  .setMinSupport(minSupport)
  .setNumPartitions(numPartitions)
  .setDelimiter(" ")
  .transform(transactions)

// count() forces the result to be computed before the timer stops
val numFreqItemsets = freqItemsets.count()
val endTime = System.currentTimeMillis()
val totalTime: Double = (endTime - startTime).toDouble

println("====================== PHybridFIN - STATS ===========================")
println(s" minSupport = $minSupport    numPartitions = $numPartitions")
println(s" Number of transactions: $numTransactions")
println(s" Number of frequent itemsets: $numFreqItemsets")
println(s" Total time = ${totalTime / 1000}s")
println("=====================================================================")

spark.stop()
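
For comparison, the FP-Growth baseline from Spark MLlib can be timed on the same file in a similar way. The following is a sketch rather than the benchmark the authors used: it relies on the RDD-based `org.apache.spark.mllib.fpm.FPGrowth` API, and it assumes one transaction per line with items separated by single spaces. The object name `FPGrowthBaseline` and the helper `countFrequentItemsets` are hypothetical.

```scala
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object FPGrowthBaseline {
  // Runs MLlib FP-Growth and returns the number of frequent itemsets found.
  def countFrequentItemsets(
      transactions: RDD[Array[String]],
      minSupport: Double,
      numPartitions: Int): Long = {
    val model = new FPGrowth()
      .setMinSupport(minSupport)
      .setNumPartitions(numPartitions)
      .run(transactions)
    model.freqItemsets.count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FPGrowthBaseline")
      .master("local[*]")
      .getOrCreate()

    // Assumption: one transaction per line, items separated by single spaces.
    val transactions = spark.sparkContext
      .textFile("data/chess.csv")
      .map(_.trim.split(" ").distinct) // MLlib FP-Growth requires unique items per transaction
      .cache()

    val startTime = System.currentTimeMillis()
    val numFreqItemsets = countFrequentItemsets(transactions, 0.85, 4)
    val elapsedMs = System.currentTimeMillis() - startTime

    println(s"FP-Growth: $numFreqItemsets frequent itemsets in ${elapsedMs / 1000.0}s")
    spark.stop()
  }
}
```

Using the same `minSupport` and `numPartitions` values for both algorithms keeps the two timings comparable.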

Requirements

Spark-FIM is built against Spark 2.1.1.
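
An application that uses Spark-FIM would normally pin the same Spark version. The `build.sbt` fragment below is a sketch under assumptions: the project name is hypothetical, the exact set of Spark modules Spark-FIM needs is not stated here (`spark-sql` is implied by the `SparkSession` example, `spark-mllib` is a guess), and the Scala version is the 2.11 line that Spark 2.1.1 is built for by default.

```scala
// build.sbt for a hypothetical application that uses Spark-FIM
name := "spark-fim-example"   // hypothetical project name
scalaVersion := "2.11.8"      // Spark 2.1.1 targets Scala 2.11 by default

// "provided": the Spark runtime supplies these JARs when the job is launched with spark-submit
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.1" % "provided" // assumption: may not be needed
)
```

sbt adds any JAR placed in the project's `lib/` directory to the classpath, so one option is to copy the JAR produced by building Spark-FIM there.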

Build From Source

sbt package

License

Spark-FIM is available under the Apache License 2.0.

Contact & Feedback

If you encounter bugs, feel free to submit an issue or pull request. You can also send email to:

hibayesian (hibayesian@gmail.com).
