plotlabs/link-prediction-pyspark

link-prediction-pyspark

A PySpark implementation of the CNGF algorithm for link prediction.

CNGF Algorithm
The algorithm predicts which pairs of nodes in a graph are most likely to become connected in the future. In a social network, for example, it can be used to anticipate connections between entities.

The algorithm can be more efficient than traditional algorithms because, to foresee a future connection between two nodes x and y, it uses only the subgraph formed by x, y, and their common neighbours rather than the whole graph. It first calculates the Guidance of each common neighbour by dividing that neighbour's degree in the subgraph by the log of its degree in the whole graph. It then sums the guidances of all common neighbours of x and y to compute their Similarity. The higher the similarity, the greater the chance of a future connection.
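The calculation described above can be sketched in plain Python, without Spark, on a small in-memory graph. This is a minimal illustration and not the code in cngf.py: the helper names `build_adjacency` and `cngf_similarity` are hypothetical, the graph is assumed to be undirected, the subgraph is assumed to be the one induced by x, y, and their common neighbours, and the natural log is assumed because the description does not state a base.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops (an assumption of this sketch)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cngf_similarity(adj, x, y):
    """Sum the guidance of every common neighbour of x and y.

    guidance(z) = degree of z in the subgraph / log(degree of z in the whole graph)
    """
    common = adj.get(x, set()) & adj.get(y, set())
    sub_nodes = common | {x, y}
    similarity = 0.0
    for z in common:
        sub_degree = len(adj[z] & sub_nodes)  # neighbours of z inside the subgraph
        full_degree = len(adj[z])             # neighbours of z in the whole graph
        # z is adjacent to both x and y, so full_degree >= 2 and the log is positive.
        similarity += sub_degree / math.log(full_degree)
    return similarity


# Toy graph: nodes a and b are not linked but share the neighbours c and d.
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("c", "d"), ("c", "e")]
adj = build_adjacency(edges)
score = cngf_similarity(adj, "a", "b")
print(score)
```

Here c has degree 3 in the subgraph and 4 in the whole graph, while d has degree 3 in both, so the score for the pair (a, b) is 3/log(4) + 3/log(3). A pair with no common neighbours scores 0.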

Requires

  1. Python 2.7+
  2. Apache Spark 2.0.0+
  3. Graphframes 0.2.0+

Usage

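To run the program, clone the repository and run the following command from the repository directory. The GraphFrames package coordinate shown targets Spark 2.1 and Scala 2.11, so adjust it to match your installation: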

$SPARK_HOME/bin/spark-submit --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 cngf.py file_path separator

The script takes two arguments:

  • file_path: the path of the file containing the data.
  • separator: the column separator used in the file.
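To make the role of the two arguments concrete, here is a hedged sketch of how a separator-delimited file could be turned into edge pairs. It is not taken from cngf.py: the function names `parse_edges` and `read_edge_list` are hypothetical, and the sketch assumes each non-empty line holds one edge with the source node in the first column and the destination node in the second.

```python
def parse_edges(lines, separator):
    """Turn separator-delimited lines into (source, destination) pairs."""
    edges = []
    for line in lines:
        line = line.strip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Drop empty fields so repeated separators do not produce empty node ids.
        parts = [field for field in line.split(separator) if field]
        if len(parts) < 2:
            continue  # skip lines that do not describe an edge
        edges.append((parts[0], parts[1]))
    return edges


def read_edge_list(file_path, separator):
    """Read the file at file_path and parse it with the given column separator."""
    with open(file_path) as handle:
        return parse_edges(handle, separator)


# Space-separated input, as in the example file described in this README.
sample = ["1 2\n", "2 3\n", "\n", "3 1\n"]
print(parse_edges(sample, " "))
```

The separator is passed through unchanged, so a tab-separated file would be parsed by passing "\t" instead of " ".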

Example

As an example, the program can be run on the included file example.txt. The columns in this file are separated by a space, so run the following command:

$SPARK_HOME/bin/spark-submit --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 cngf.py "example.txt" " "

Reference

Liyan Dong, Yongli Li, Han Yin, Huang Le, and Mao Rui, “The Algorithm of Link Prediction on Social Network,” Mathematical Problems in Engineering, vol. 2013, Article ID 125123, 7 pages, 2013. https://doi.org/10.1155/2013/125123.
