Relational clustering over neighbourhood trees

A versatile relational clustering toolbox. It contains several clustering algorithms; see the original paper for details (listed under Citing). The toolbox is still under development.

Installation

Get SBT.
Clone this repository.
From the root folder of the repository, run one of the following:
- sbt assembly (builds a jar with dependencies)
- sbt package (builds a jar without dependencies)

Usage

Arguments

Usage: RelationalClustering.jar [OPTIONS]

OPTIONS

--db filepath                                                       database(s) with data to cluster (May be specified multiple times.)

--declarations file path                                            predicate declarations

--query comma-separated list                                        domains to query

--similarity [RCNT|HS|HSAG|CCFonseca|RKOH]                          similarity measure

--domain filepath                                                   predicate definitions

--aggregates comma-separated list [mean/min/max]                    a list of aggregator functions to use for the numerical attributes

--algorithm [Spectral|Hierarchical|DBscan|Affinity]                 clustering algorithm

--bagCombination [union|intersection]                               multiset combination method

--bagSimilarity [chiSquared|maximum|minimum|union]                  multiset similarity measure

--clauseLength n                                                    (CC and RKOH) maximal length of clause/walk 

--damping d                                                         damping parameter for Affinity Propagation

--definitionsDeviance Double                                        maximum standard deviation for a numeric attribute to be preserved (in % of the mean value)

--definitionsK Int                                                  top K most occurring tuples to select

--depth n                                                           depth of the neighbourhood tree

--eps d                                                             eps value for DBscan

--labels filepath                                                   labels for the query objects (May be specified multiple times.)

--linkage [average|complete|ward]                                   (Hierarchical) linkage

--preference d                                                      (Affinity Propagation) preference parameter

--root filePath                                                     temporary folder to use

--selection [model|saturation]                                      method to choose a single clustering

--selectionValidation [intraCluster|silhouette]                     evaluation criteria for clustering selection

--validationMethod [ARI|RI|intraCluster|majorityClass]              cluster validation method

--vertexCombination [avg|min|max]                                   how to combine the similarities of individual vertices in a hyperedge

--weights Array[Double]                                             comma-separated list of weights [attributes,attribute distribution,connections,vertex neighbourhood,edge distribution]

--exportNTrees flag                                                 export neighbourhood trees as gspan

--findDefinitions flag                                              extract definitions of clusters

-k n                                                                number of clusters to create

--localRepo  flag                                                   use local NodeRepository for all neighbourhood trees

--selectSingle  flag                                                select single clustering

--validate  flag                                                    perform clustering validation
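
To illustrate how the options combine, the following Python sketch assembles a command line for one run. It is only an example under stated assumptions: the file names (imdb.db, imdb.def, imdb.dcl), the parameter values and the `java -jar` launcher are hypothetical, and `build_command` is a helper invented for this illustration. Only the option names are taken from the list above.

```python
def build_command(jar, db_files, domain, declarations, query,
                  algorithm="Spectral", similarity="RCNT", depth=2, k=2,
                  weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Assemble an argument list for the clustering jar (illustrative only)."""
    # --weights lists five components: attributes, attribute distribution,
    # connections, vertex neighbourhood, edge distribution.
    if len(weights) != 5:
        raise ValueError("--weights expects five values")
    cmd = ["java", "-jar", jar]  # assumption: the jar is started with java -jar
    for db in db_files:          # --db may be specified multiple times
        cmd += ["--db", db]
    cmd += [
        "--domain", domain,
        "--declarations", declarations,
        "--query", ",".join(query),
        "--similarity", similarity,
        "--algorithm", algorithm,
        "--depth", str(depth),
        "-k", str(k),
        "--weights", ",".join(str(w) for w in weights),
    ]
    return cmd


# Hypothetical file names and values, chosen only for the example.
command = build_command("RelationalClustering.jar", ["imdb.db"],
                        "imdb.def", "imdb.dcl", ["person"])
print(" ".join(command))
```

Building the arguments as a list rather than a single string keeps each value separate, so the list can be handed to `subprocess.run` without extra quoting.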

Input files

Knowledge base/graph containing the facts in a domain (*.db)

Movie(Aoceanstwelve,Anelsonpeltz)
Movie(Aplayerthe,Awhoopigoldberg)
Movie(Apelicanbriefthe,Ajuliaroberts)
Movie(Aoceanstwelve,Ajuliaroberts)
...
Gender_male(Adavidsontag)
Gender_male(Arobertculp)
Gender_female(Acynthiastevenson)
Gender_male(Afredward)
Gender_female(Adinamerrill)
...
Genre(Asoderberghsteven,Acrime)
Genre(Apakulaalanj,Adrama)
Genre(Apakulaalanj,Amystery)
Genre(Aaltmanroberti,Adrama)
Workedunder(Aminianden,Asoderberghsteven)
Workedunder(Acaseyaffleck,Asoderberghsteven)
Workedunder(Aelliottgould,Asoderberghsteven)
Workedunder(Adenzelwashington,Apakulaalanj)
...
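
The fact format above can be read with a few lines of Python. This is a minimal sketch, not the toolbox's own parser: it assumes one fact per line in the form Predicate(arg1,arg2,...), and the names `parse_fact` and `parse_facts` are invented for the example. The "..." lines in the listing are elisions, not part of the format.

```python
import re

# One fact per line: a predicate name followed by comma-separated arguments.
FACT = re.compile(r"^\s*(\w+)\(([^()]*)\)\s*$")


def parse_fact(line):
    """Split 'Movie(a,b)' into ('Movie', ('a', 'b'))."""
    match = FACT.match(line)
    if match is None:
        raise ValueError(f"not a fact: {line!r}")
    predicate, args = match.groups()
    return predicate, tuple(arg.strip() for arg in args.split(","))


def parse_facts(text):
    """Parse every non-empty line of a knowledge base."""
    return [parse_fact(line) for line in text.splitlines() if line.strip()]


sample = """Movie(Aoceanstwelve,Anelsonpeltz)
Gender_male(Adavidsontag)
Genre(Asoderberghsteven,Acrime)
Workedunder(Aminianden,Asoderberghsteven)"""

facts = parse_facts(sample)
for predicate, args in facts:
    print(predicate, args)
```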

Definitions file specifying the domains of objects (*.def)

Gender_male(person)
Gender_female(person)
Genre(person,genre)
Movie(movie,person)
Workedunder(person,person)
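
One way to read the definitions file is as a signature for each predicate: the i-th argument of a fact belongs to the i-th domain listed in its definition. The Python sketch below applies that reading to group objects by domain. It is an illustration under that assumption, not the toolbox's implementation, and `parse_atom` and `objects_by_domain` are hypothetical helper names.

```python
import re
from collections import defaultdict

ATOM = re.compile(r"^(\w+)\((.*)\)$")


def parse_atom(line):
    """Split 'Genre(person,genre)' into ('Genre', ['person', 'genre'])."""
    match = ATOM.match(line.strip())
    if match is None:
        raise ValueError(f"cannot parse: {line!r}")
    predicate, args = match.groups()
    return predicate, [arg.strip() for arg in args.split(",")]


def objects_by_domain(definition_lines, fact_lines):
    """Collect the objects of each domain, using definitions as signatures."""
    signatures = dict(parse_atom(line) for line in definition_lines)
    domains = defaultdict(set)
    for line in fact_lines:
        predicate, args = parse_atom(line)
        arg_domains = signatures[predicate]
        if len(arg_domains) != len(args):
            raise ValueError(f"arity mismatch in {line!r}")
        for domain, obj in zip(arg_domains, args):
            domains[domain].add(obj)
    return dict(domains)


definitions = ["Genre(person,genre)", "Movie(movie,person)",
               "Workedunder(person,person)"]
facts = ["Movie(Aoceanstwelve,Anelsonpeltz)",
         "Movie(Aoceanstwelve,Ajuliaroberts)",
         "Genre(Asoderberghsteven,Acrime)",
         "Workedunder(Aminianden,Asoderberghsteven)"]

domains = objects_by_domain(definitions, facts)
for name in sorted(domains):
    print(name, sorted(domains[name]))
```

The domain names that appear here (person, movie, genre) are, judging by the option description, the kind of value the --query option expects.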

Declarations file specifying the meaning of the arguments of predicates (*.dcl)

Gender_male(name)
Gender_female(name)
Genre(name,attr)
Movie(name,name)
Workedunder(name,name)

The arguments can have the following roles:

name - identifier of an object/instance/example; it is treated as the name of an instance.
attr - a discrete attribute value. The attribute name is given by the name of the predicate, and the predicate must have exactly one name argument.
number - a continuous attribute value. The attribute name is given by the name of the predicate, and the predicate must have exactly one name argument.

These roles influence the way a neighbourhood tree is constructed.
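
The constraint on attr and number arguments can be checked before running the toolbox. The Python sketch below is one possible check, written only from the role descriptions above; `check_declaration` is a hypothetical helper and the toolbox may validate declarations differently.

```python
import re

ROLES = {"name", "attr", "number"}
ATOM = re.compile(r"^(\w+)\((.*)\)$")


def check_declaration(line):
    """Return (predicate, roles) or raise ValueError if a rule is broken."""
    match = ATOM.match(line.strip())
    if match is None:
        raise ValueError(f"cannot parse: {line!r}")
    predicate, args = match.groups()
    roles = [role.strip() for role in args.split(",")]
    unknown = [role for role in roles if role not in ROLES]
    if unknown:
        raise ValueError(f"{predicate}: unknown role(s) {unknown}")
    # A predicate carrying an attribute value must name exactly one object.
    has_attribute = any(role in ("attr", "number") for role in roles)
    if has_attribute and roles.count("name") != 1:
        raise ValueError(f"{predicate}: needs exactly one name argument")
    return predicate, roles


declarations = ["Gender_male(name)", "Genre(name,attr)", "Workedunder(name,name)"]
checked = [check_declaration(line) for line in declarations]
for predicate, roles in checked:
    print(predicate, roles)
```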

Similarity measures

The following similarity measures are currently supported:

Relational clustering over neighbourhood trees (see the Citing section)
Hybrid similarity measure introduced in Neville, Adler and Jensen: Clustering Relational Data Using Attribute and Link Information. Text Mining and Link Analysis Workshop, IJCAI 2003
Hybrid similarity measure for annotated graphs introduced in Witsenburg and Blockeel: Improving the accuracy of similarity measures by using link information. Foundations of Intelligent Systems 2011
Conceptual clustering introduced in Fonseca, Santos Costa, Camacho: Conceptual clustering of multi-relational data. ILP 2011
[Not ready yet] Relational instance based learning
Graph kernels:
- Rooted Kernel for ordered hypergraphs from Wachman, Khardon: Learning from Interpretations: A Rooted Kernel for Ordered Hypergraphs. ICML 2007

Contributing

Fork it!
Create your feature branch: git checkout -b my-new-feature
Commit your changes: git commit -am 'Add some feature'
Push to the branch: git push origin my-new-feature
Submit a pull request

Contact

If you have any questions, feel free to send them to sebastijan.dumancic@cs.kuleuven.be

Citing

Please cite the following paper if you use the code:

@article{,
    author =       {Dumancic, Sebastijan and Blockeel, Hendrik},
    title =        {An expressive dissimilarity measure for relational clustering over neighbourhood trees},
    journal =      {Machine Learning journal},
    year =         {2017},
    url =          {https://lirias.kuleuven.be/handle/123456789/582293}
}

License

Released under the Apache License, version 2.0.
