Complete example #12

Closed
akastrin opened this issue Apr 21, 2017 · 25 comments

@akastrin

akastrin commented Apr 21, 2017

I would like to use your link prediction library, but I cannot find a complete example that includes evaluation code. For now I have the following code:

import linkpred
import random

# Read network
G = linkpred.read_network('linkpred-master/examples/inf1990-2004.net')

# Create test network
test = G.subgraph(random.sample(G.nodes(), 300))

# Exclude test network from learning phase
simrank = linkpred.predictors.SimRank(G, excluded=test.edges())
simrank_results = simrank.predict(c=0.5)

Could you please provide a full example, i.e., how to calculate precision, recall, ROC curve, etc.?

@rafguns
Owner

rafguns commented Apr 25, 2017

Hi, unfortunately this is a bit ugly and I would like to rework the whole evaluation logic at some point to use scikit-learn's implementation of evaluation measures.

That being said, here is a full example:

import linkpred
import random
from matplotlib import pyplot as plt

random.seed(100)

# Read network
G = linkpred.read_network('examples/inf1990-2004.net')

# Create test network from a random sample of 300 nodes
# (list() because random.sample needs a sequence, not a node view)
test = G.subgraph(random.sample(list(G.nodes()), 300))

# Exclude test network from learning phase
training = G.copy()
training.remove_edges_from(test.edges())

# excluded: keep the training edges out of the predictions
simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
simrank_results = simrank.predict(c=0.5)

# Evaluate the predictions against the edges of the test network
test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())
evaluation = linkpred.evaluation.EvaluationSheet(simrank_results, test_set)

# Precision-recall curve
plt.plot(evaluation.recall(), evaluation.precision())
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.show()

Note that the excluded argument to a predictor (SimRank in this case) is intended to exclude certain edges from appearing in the results, not to exclude them during training.
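
To make the role of excluded concrete, here is a minimal pure-Python sketch of that post-processing step, modelled on the predict_and_postprocess wrapper that shows up in the tracebacks further down this thread. The plain dict used as a scoresheet and the helper name drop_excluded are illustrative assumptions, not linkpred's API.

```python
def drop_excluded(scores, excluded):
    """Return a copy of scores without the excluded node pairs."""
    result = dict(scores)
    for u, v in excluded:
        # Pairs are undirected, so normalise the order before the lookup.
        # Pairs that were never scored are silently skipped.
        result.pop(tuple(sorted((u, v))), None)
    return result


# Hypothetical scores, keyed by sorted node pair.
scores = {("a", "b"): 0.9, ("a", "c"): 0.4, ("b", "c"): 0.2}
training_edges = [("b", "a")]  # already known, so not worth predicting

predictions = drop_excluded(scores, training_edges)
print(predictions)
```

The scores are still computed from the whole network passed to the predictor; only the output is filtered. That is why the training network has to be built separately, as in the example above.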

@rafguns
Owner

rafguns commented Apr 25, 2017

Things to do here:

  • Provide full example like the one above in README
  • Remove need to have a line like test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges()). We should just be able to pass test.edges() to EvaluationSheet.
  • Somehow clarify what excluded is for. Maybe another name would work better?

@menghaoli001

menghaoli001 commented May 6, 2018

I followed @rafguns's example but got a KeyError:

KeyError Traceback (most recent call last)
in ()
7
8 simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
----> 9 simrank_results = simrank.predict(c=0.5)
10
11 test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())

~\Anaconda3\lib\site-packages\linkpred\predictors\base.py in predict_and_postprocess(*args, **kwargs)
62 def add_postprocessing(func):
63 def predict_and_postprocess(*args, **kwargs):
---> 64 scoresheet = func(*args, **kwargs)
65 for u, v in self.excluded:
66 try:

~\Anaconda3\lib\site-packages\linkpred\predictors\eigenvector.py in predict(self, c, num_iterations, weight)
95 # upper triangle in the matrix, excluding the diagonal:
96 # sim(a, a) = 1.
---> 97 u = nodelist[i]
98 for j in range(i + 1, n):
99 if sim[i, j] > 0:

~\Anaconda3\lib\site-packages\networkx\classes\reportviews.py in __getitem__(self, n)
176
177 def __getitem__(self, n):
--> 178 return self._nodes[n]
179
180 # Set methods

KeyError: 0

@rafguns
Owner

rafguns commented May 7, 2018

Thanks for pointing that out, @menghaoli001. At first sight, I think this is due to changes in networkx 2.x, where G.nodes() returns a node view instead of a list. If so, the fix is probably very simple. I'll take a closer look soon.
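
The traceback above ends in a lookup of the form self._nodes[n], so an integer index is treated as a node name rather than a position. A small sketch of that behaviour, with a plain dict standing in for the node view's internal mapping (an assumption made so the snippet runs without networkx; the node names are hypothetical):

```python
# Plain dict standing in for the mapping behind the node view (assumption).
nodes = {"alice": {}, "bob": {}, "carol": {}}

try:
    nodes[0]  # looked up as a node *named* 0, which does not exist
except KeyError as err:
    missing_key = err.args[0]

# Converting to a list first restores positional indexing.
nodelist = list(nodes)
first = nodelist[0]
print(missing_key, first)
```

With a real graph, the equivalent step would presumably be nodelist = list(G.nodes()).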

@rafguns
Owner

rafguns commented May 9, 2018

@menghaoli001 I spun the issue you mention off into a separate bug (#15), which has now been fixed on master.

@bk3karim

I have linkpred and a dataset (test and train). I want to convert dataset(just a part, inf_test_0) to list with Python, and I want to compute precision, AUC-ROC, graph, and table (fp, fn, tp, tn). Could you send me a full example with source code as soon as possible? With all my respect.

@rafguns
Owner

rafguns commented Jun 5, 2018

Not sure if I fully understand your question, especially the part about "convert dataset(just a part, inf_test_0) to list." Can you show me (part of) the dataset you have (e.g. post it as gist)?

Precision, ROC, and (tp, tn, fp, fn) tables are in linkpred. AUC currently is not. Not sure what you mean by "graph" in this context: some kind of visualization? Linkpred does not do that, but if you have an nx.Graph, you can always draw it with nx.draw(G). That being said, networkx's visualization capabilities are relatively limited, and sometimes it is more useful to save the file to disk and use software like Gephi for visualization.
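
Since AUC is not built in, one option is to compute it from ROC points with the trapezoidal rule. The following is a standalone sketch, not linkpred code: the function names, the ranked list, and the universe_size argument (the total number of candidate pairs) are all assumptions made for illustration.

```python
def roc_points(ranked_pairs, positives, universe_size):
    """Return (fpr, tpr) points, one per cutoff in a ranked prediction list.

    ranked_pairs: predicted pairs, best first.
    positives: set of pairs that really occur in the test network.
    universe_size: total number of candidate pairs (positives + negatives).
    """
    n_pos = len(positives)
    n_neg = universe_size - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for pair in ranked_pairs:
        if pair in positives:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    # Pairs that were never predicted are treated as tied at the bottom.
    points.append((1.0, 1.0))
    return points


def auc(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ranking of four candidate pairs, two of which are real links.
ranked = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
positives = {("a", "b"), ("b", "c")}

points = roc_points(ranked, positives, universe_size=4)
area = auc(points)
print(area)
```

At any cutoff, the remaining table entries follow from the running counts: fn is the number of positives minus tp, and tn is the number of negatives minus fp.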

@bk3karim

bk3karim commented Jun 6, 2018 via email

@StefanBloemheuvel

Hi! I am also trying to predict new connections in my network file, and I would like to ask how I can use other techniques such as Jaccard or Adamic-Adar, and not only SimRank. Thanks in advance!

@StefanBloemheuvel

@bk3karim could you perhaps share how you succeeded with the scikit-learn website link?

@rafguns
Owner

rafguns commented Jun 25, 2018

@intStdu It works the same way for other predictors. That is, these lines:

simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
simrank_results = simrank.predict(c=0.5)

can be replaced with (e.g.):

jaccard = linkpred.predictors.Jaccard(training, excluded=training.edges())
jaccard_results = jaccard.predict()

The only difference between predictors is that their predict methods sometimes take additional parameters.
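
For intuition about what the two measures mentioned in the question score, here is a standalone sketch of the Jaccard and Adamic-Adar indices on an unweighted edge list. It is not linkpred's implementation, which may differ in details such as edge weights; all function names and the toy edge list are assumptions.

```python
import math


def neighbour_sets(edges):
    """Map each node to the set of its neighbours (undirected)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def jaccard(nbrs, u, v):
    """Shared neighbours divided by all neighbours of u and v."""
    union = nbrs[u] | nbrs[v]
    return len(nbrs[u] & nbrs[v]) / len(union) if union else 0.0


def adamic_adar(nbrs, u, v):
    """Shared neighbours of two distinct nodes, each weighted by 1 / log(degree).

    A shared neighbour of two distinct nodes has degree >= 2, so log() > 0.
    """
    return sum(1 / math.log(len(nbrs[z])) for z in nbrs[u] & nbrs[v])


# Hypothetical toy network: a and b share exactly one neighbour, c.
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "e")]
nbrs = neighbour_sets(edges)

j = jaccard(nbrs, "a", "b")
aa = adamic_adar(nbrs, "a", "b")
print(j, aa)
```

Neither function needs extra parameters, in contrast to SimRank with its c value.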

@bk3karim

bk3karim commented Jun 25, 2018 via email

@bk3karim

bk3karim commented Jun 25, 2018 via email

@StefanBloemheuvel

Hi @rafguns, I used your linkpred package for my bachelor thesis. How would you prefer that I cite your work? I work with BibTeX!

@rafguns
Owner

rafguns commented Jul 3, 2018

@intStdu Thanks, good question. The best way is probably to cite this book chapter, which is a basic introduction to link prediction using linkpred (open access version).

@rafguns
Owner

rafguns commented Aug 21, 2018

I am going to close this issue, because its focus has become unclear. I will file separate issues for the outstanding items from #12 (comment).

@yanjies

yanjies commented May 2, 2019

AssertionError Traceback (most recent call last)
in
16
17 simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
---> 18 simrank_results = simrank.predict(c=0.5)
19
20 test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())

d:\for_python_3.7\lib\site-packages\linkpred\predictors\base.py in predict_and_postprocess(*args, **kwargs)
65 for u, v in self.excluded:
66 try:
---> 67 del scoresheet[(u, v)]
68 except KeyError:
69 pass

d:\for_python_3.7\lib\site-packages\linkpred\evaluation\scoresheet.py in __delitem__(self, key)
193
194 def __delitem__(self, key):
--> 195 return dict.__delitem__(self, Pair(key))
196
197 def process_data(self, data, weight='weight'):

d:\for_python_3.7\lib\site-packages\linkpred\evaluation\scoresheet.py in __init__(self, *args)
125 "__init__() takes 1 or 2 arguments in addition to self")
126 # For link prediction, a and b are two different nodes
--> 127 assert a != b, "Predicted link (%s, %s) is a self-loop!" % (a, b)
128 self.elements = self._sorted_tuple((a, b))
129

AssertionError: Predicted link (NARIN F, NARIN F) is a self-loop!

How can I deal with this self-loop problem?
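
The assertion in the traceback fires while a pair with two identical endpoints (NARIN F, NARIN F) is being built from the excluded list, which suggests that the network itself contains a self-loop. One possible workaround, sketched here on a plain edge list, is to drop self-loops before building the training network and the excluded list. The helper name and the author names other than NARIN F are hypothetical.

```python
def without_self_loops(edges):
    """Return only the edges whose two endpoints differ."""
    return [(u, v) for u, v in edges if u != v]


edges = [
    ("NARIN F", "NARIN F"),  # self-loop, as in the error message
    ("NARIN F", "AUTHOR B"),
    ("AUTHOR B", "AUTHOR C"),
]

cleaned = without_self_loops(edges)
print(cleaned)
```

On a networkx 2.x graph, G.remove_edges_from(list(nx.selfloop_edges(G))) should have the same effect; the list() call avoids modifying the graph while iterating over it.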

@bk3karim

bk3karim commented May 2, 2019 via email

@bk3karim

bk3karim commented May 2, 2019 via email

@ramthottempudi12

Hi, I am unable to run SimRank for a given graph. Can you please help me? I am using the latest Anaconda, but I get a Unicode error while reading a graph.

@rafguns
Owner

rafguns commented Aug 22, 2019

Can you show what you are doing (code+data) and which error you encounter?

@ramthottempudi12

[Screenshot: ram1]

@ramthottempudi12

But my task is this: given a list of edges from a text file, apply SimRank to all pairs until convergence.
How do I set the tolerance and the c value?
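
Judging by the signature in the traceback earlier in this thread, predict(self, c, num_iterations, weight), linkpred's SimRank takes c and a number of iterations rather than a tolerance. As a rough illustration of what a tolerance-based stop would look like, here is a standalone pure-Python SimRank sketch. It is not linkpred's implementation; the function names, the tol and max_iter parameters, and the inline edge lines (standing in for the lines of a text file) are assumptions.

```python
from itertools import product


def parse_edges(lines):
    """Parse 'u v' lines, as they might appear in an edge-list text file."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            edges.append((parts[0], parts[1]))
    return edges


def simrank(edges, c=0.5, tol=1e-4, max_iter=100):
    """Iterate SimRank on an undirected graph until the largest change < tol."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)

    # Start from the identity: every node is only similar to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(max_iter):
        new = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new[a, b] = 1.0
                continue
            total = sum(sim[i, j] for i in neighbors[a] for j in neighbors[b])
            new[a, b] = c * total / (len(neighbors[a]) * len(neighbors[b]))
        delta = max(abs(new[key] - sim[key]) for key in sim)
        sim = new
        if delta < tol:
            break
    return sim


# Stand-in for the lines of a text file: a small star with centre "a".
edge_lines = ["a b", "a c"]
sim = simrank(parse_edges(edge_lines), c=0.5, tol=1e-6)
print(sim["b", "c"])
```

This version stores a score for every node pair, so it is only practical for small networks.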

@rafguns
Owner

rafguns commented Aug 22, 2019

That error has nothing to do with linkpred. It occurs because the backslashes in your string are interpreted as escape sequences; see https://docs.python.org/3/reference/lexical_analysis.html#literals. Either write "C:\\Users\\TR (etc.)" or write r"C:\Users\TR (etc.)".
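
A short sketch of the two spellings, using a hypothetical path (the real one is truncated in this thread). The compile() call only demonstrates that the unescaped literal is rejected by the parser.

```python
# Hypothetical Windows path, for illustration only.
escaped = "C:\\Users\\example\\graph.txt"  # every backslash doubled
raw = r"C:\Users\example\graph.txt"        # raw string: backslashes kept as-is

print(escaped == raw)

# The unescaped literal does not even parse: \U starts a Unicode escape
# that must be followed by eight hex digits.
try:
    compile(r'path = "C:\Users\example\graph.txt"', "<demo>", "exec")
    failed = False
except SyntaxError:
    failed = True
print(failed)
```

Forward slashes ("C:/Users/...") are another common way to sidestep the problem on Windows.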

@ramthottempudi12

ramthottempudi12 commented Aug 22, 2019 via email
