GitHub - mdgoldberg/clustering: Graph Clustering in OCaml - Markov- and Kruskal-based clustering algorithms for general graphs

mdgoldberg / clustering Public

Notifications You must be signed in to change notification settings
Fork 0
Star 0

Graph Clustering in OCaml - Markov- and Kruskal-based clustering algorithms for general graphs

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
testing		testing
.gitignore		.gitignore
Kruskal.ml		Kruskal.ml
Makefile		Makefile
README		README
cartesian_graph.ml		cartesian_graph.ml
main.ml		main.ml
markov.ml		markov.ml
matrix.ml		matrix.ml
signatures.ml		signatures.ml
stats.ml		stats.ml

Repository files navigation

This was my final project for CS51. I worked in a group with two friends in the
course; I implemented all of the Markov clustering algorithm (found in
markov.ml), parts of the matrix module (found in matrix.ml), and all of the
main script (found in main.ml). We wrote the signatures.ml file together, which
provides signatures for the modules we used.

INSTALLATION:

First things first: enter the code directory and type "make". This
will compile our .ml files (this Makefile uses the corebuild tool to compile
OCaml files).

To run the program, just use ./main.native with all necessary options (see
below). We have provided three simple, example csv files, called example1.csv,
example2.csv, and example3.csv. The following commands will run our algorithms
on these files: 

./main.native ../data/example1.csv --has-labels int kruskal|markov
./main.native ../data/example2.csv --has-labels --matrix int kruskal|markov
./main.native ../data/example3.csv --has-labels int kruskal|markov

where kruskal|markov means either type kruskal or type markov (NOT BOTH),
depending on which you want to run.

Any of the above commands can also be given additional options, such as --arg1,
--arg2, and --verbose (see below).


USAGE: ./main.native csvfile float|int kruskal|markov [options]

VALID OPTIONS:

--matrix
This means that the csv file already represents a matrix of distances between
nodes.  If this flag isn't set, then we interpret the file as a list of points
in R^n, which we then translate into a matrix of distances.

--arg1=A
If this flag is set, then the first argument to the clustering algorithm will be
A. For Kruskal, this means it will cluster the data into A clusters.  For
Markov, this means the expansion parameter will be A. If the flag is absent, it
will default to 2 for Markov and N/2 for Kruskal (where N is the number of
nodes).

--arg2=B
If this flag is set, then the second argument to the clustering algorithm will
be B.  This is meaningless for Kruskal, which only takes one argument, so
Kruskal's algorithm will ignore this flag.

--has-labels
If this flag is set, then the csv file has labels in its first column. We take
this into account by skipping the first column during calculation, and then
using these labels as names for each node when we output the results.
NOTE: LABELS CANNOT HAVE THE DELIMITER CHARACTER IN THEM.

--has-header
If this flag is set, then the csv file has category headers in its first row. We
take this into account by skipping the first row in our calculations.

--verbose
If this flag is set, then the Markov algorithm will print out the current matrix
on each iteration. This flag has no implications for Kruskal's algorithm.

--delim='C'
If this flag is set, then the csv file is separated by C instead of by commas.
For example, if a csv used semicolons to separate its data, then you would use
the option --delim=';' so that we can take this into account.