PageRank query results are inconsistent (1. APOC 2. Extended algorithmic package) #716

Open
crazyyanchao opened this Issue Sep 25, 2018 · 5 comments

crazyyanchao commented Sep 25, 2018

  1. I have installed two extension packages:
    apoc-3.4.0.1-all.jar
    graph-algorithms-algo-3.4.7.0.jar

  2. Running PageRank on the same dataset gives hugely different results (database version: neo4j-community-3.4.7).
    2.1. apoc-3.4.0.1-all.jar

MATCH (n:专题) WITH collect(n) as nodes CALL apoc.algo.pageRank(nodes) YIELD node,score RETURN node.name,score ORDER BY score DESC
node.name score
"十一长假4" 11013.60778
"LDR测试2" 10587.83657
"自用_181" 7248.36549
"东沟岭农贸市场发现一女尸" 6147.92054
"981钻井平台_618" 4663.55536
"公安满意度4" 4086.03851
"LDR模糊4" 3917.40468
"APEC0" 3845.58618
"取消军训我3" 3799.40371
"政法系统满意程度1" 3787.1927

2.2. Procedure from graph-algorithms-algo-3.4.7.0.jar (all scores are 0.15000000000000002)

CALL algo.pageRank.stream('专题',NULL,{iterations:20, dampingFactor:0.85}) YIELD node, score RETURN node.name, score ORDER BY score DESC
node.name score
"武警广西总队停止有偿工作" 0.15000000000000002
"武警广西总队停止有偿工作" 0.15000000000000002
"981钻井平台" 0.15000000000000002
"土耳其反华" 0.15000000000000002
"抗战纪念日" 0.15000000000000002
"境外" 0.15000000000000002
"天津爆炸案" 0.15000000000000002
"泰国爆炸" 0.15000000000000002
"南海问题" 0.15000000000000002
"泰国遣返维吾尔族人" 0.15000000000000002
"缅甸特赦非法伐木工" 0.15000000000000002
"缅甸特赦非法伐木工(新)" 0.15000000000000002

The results are quite different! Please tell me why. Thanks!

Collaborator

tomasonjo commented Sep 25, 2018

A PageRank value of 0.15000000000000002 is the default value for nodes with no incoming relationships. It seems that no relationships get projected into the graph, which is odd given that you set NULL for the relationship type, which should load all of them.
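To illustrate where 0.15000000000000002 comes from, here is a minimal Python sketch of the non-normalized PageRank formula PR(n) = (1 - d) + d * sum(PR(m) / outdeg(m)). It is not the library's implementation. The `pagerank` function, the starting score of 1 - d, and the toy node names are assumptions made for illustration only. With no relationships loaded, every node ends up at 1 - 0.85, which in floating point is 0.15000000000000002.

```python
def pagerank(nodes, edges, damping=0.85, iterations=20):
    """Non-normalized PageRank: PR(n) = (1 - d) + d * sum(PR(m) / outdeg(m))."""
    # Count outgoing relationships per node.
    out_degree = {n: 0 for n in nodes}
    for source, _target in edges:
        out_degree[source] += 1

    # Assumption: every node starts at the base value 1 - d.
    scores = {n: 1.0 - damping for n in nodes}
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for source, target in edges:
            incoming[target] += scores[source] / out_degree[source]
        scores = {n: (1.0 - damping) + damping * incoming[n] for n in nodes}
    return scores


# Hypothetical toy graph; the names are placeholders, not data from this thread.
nodes = ["a", "b", "c"]

# No relationships loaded: every node keeps the base value 1 - d.
isolated = pagerank(nodes, edges=[])
print(isolated)

# With relationships loaded, a node that receives links scores higher.
linked = pagerank(nodes, edges=[("a", "b"), ("c", "b")])
print(linked)
```

In the second run only "b" has incoming relationships, so "a" and "c" stay at the base value while "b" rises above it.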

Collaborator

mneedham commented Sep 25, 2018

Hi @crazyyanchao,

Would you be able to share a small sample dataset that we can recreate this problem with? As @tomasonjo says, it's odd that all the nodes have the initial PageRank value.

crazyyanchao commented Sep 25, 2018

@mneedham @tomasonjo
If I run:
CALL algo.pageRank.stream(NULL,NULL,{iterations:20, dampingFactor:0.85}) YIELD node, score RETURN node.name, score ORDER BY score DESC
then the nodes with the label '专题' get values that look reasonable.
Unfortunately I can't share the dataset, sorry!
Thanks for your reply!
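One possible explanation, not confirmed in this thread, is that a projection restricted to a single label keeps only relationships whose two endpoints both carry that label. If the '专题' nodes are linked only through nodes with other labels, such a projection would contain no relationships at all. The Python sketch below models that assumption. `project_by_label` and the sample data are hypothetical and do not reflect the library's code or the reporter's dataset.

```python
def project_by_label(labels, relationships, label=None):
    """Keep nodes carrying `label` (all nodes if None) and the
    relationships whose two endpoints were both kept."""
    kept_nodes = {
        node for node, node_labels in labels.items()
        if label is None or label in node_labels
    }
    kept_rels = [
        (source, target) for source, target in relationships
        if source in kept_nodes and target in kept_nodes
    ]
    return kept_nodes, kept_rels


# Hypothetical data: two topic nodes that are linked only via a post node.
labels = {"topic1": {"Topic"}, "topic2": {"Topic"}, "post1": {"Post"}}
relationships = [("post1", "topic1"), ("post1", "topic2")]

# Label-restricted projection: both topics survive, but no relationships do.
topic_nodes, topic_rels = project_by_label(labels, relationships, "Topic")
print(len(topic_nodes), len(topic_rels))

# Unrestricted projection: all nodes and relationships are kept.
all_nodes, all_rels = project_by_label(labels, relationships, None)
print(len(all_nodes), len(all_rels))
```

Under this assumption, the label-restricted run would leave every node at the base score, while the unrestricted run has relationships to propagate rank along.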

crazyyanchao commented Sep 27, 2018

@mneedham @tomasonjo @jexp @akollegger
I executed two Cypher queries on the same LinkedIn dataset, but the results vary enormously!

1. The first way

CALL algo.pageRank('LinkedinID', NULL,  {iterations:20, dampingFactor:0.85, write: true,writeProperty:'pagerank'}) YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, write, writeProperty
MATCH (n:LinkedinID) RETURN n.name,n.pagerank ORDER BY n.pagerank DESC LIMIT 10
n.name n.pagerank
"Dr. Imani Ma'at_29489954" 238797044.98089278
"Kristina Tanasichuk_21342877" 205712106.4265581
"Andy Jabbour_408109800" 175523863.48403177
"Kim Proctor_2794998" 170649994.17900914
"Michael Jacobs_3967109" 142688564.25065896
"Adele Canetti_11160947_" 105116298.79254237
"Marcia Stepanek_14481523n" 90105381.10887711
"Christy Riccardi_11084249" 78076928.37071984
"Gregg H._3628386" 78046192.97161181
"Hollis Thomases_245341" 75175480.38489856
"Jeff Molter_1411602" 73882728.68542062
"Terezie Mosby_119305546" 73044631.96094015
"Troy Stiner_91210468" 71168889.09655812
"John Robitscher, MPH_8334935" 70542084.97194709

2. The second way

CALL apoc.algo.pageRankWithCypher({iterations:20, write:true})
MATCH (n:LinkedinID) RETURN n.name,n.pagerank ORDER BY n.pagerank DESC LIMIT 10
n.name n.pagerank
"Bill Gates_0" 118.07033
"Richard Branson_0" 101.64432
"Pete Brownell_18332101" 77.71179
"Chuck Brooks_4888851" 74.96686
"Dr. Nicholas R. Scheidt, PsyD, AADP_26394892" 72.11293
"Mark Cuban_0" 67.6066
"Frank T. Mitchell_14176906" 67.3042
"Arianna Huffington_0" 66.41209
"Jack Welch_0" 63.05521
"Tarek Sobh_1564329" 62.50482

Finally

I think the second way is more reasonable! But why did that happen with the first way? I don't understand. Can you explain it? Thanks :)

Collaborator

tomasonjo commented Sep 27, 2018

Can you share this LinkedIn dataset?
