GtCSVReader problems with jgrapht ConnectivityInspector #44

florisheijmans · 2021-01-14T16:10:55Z

This issue arose when I attempted to reproduce the workflow in org.scify.jedai.demoworkflows.CsvDblpAcm.java.

While reading the ground truth in DBLP-ACM_perfectMapping.csv (specifically in the GtCSVReader.getDuplicatePairs method), the detection of connected components by the jgrapht package does not seem to work.

For some reason I obtain a single cluster of size 2225 and then 5375 more clusters of size 1. This is clearly incorrect, since the CSV contains about 2225 unique pairs, which should in turn produce 2225 clusters of size 2.

Have you seen this problem before? Maybe the jgrapht package expects a different format than it did previously?
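For reference, the expected behaviour can be sketched in plain Java. This is not JedAI's or jgrapht's code: it is a minimal union-find stand-in for jgrapht's ConnectivityInspector, and the class name `PairComponents` and the ids are hypothetical. Every ground-truth pair links two entity ids, so disjoint pairs should each form a component of size 2, while entities that appear in no pair stay as singletons.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical union-find sketch standing in for a connected-components pass. */
public class PairComponents {
    // Parent pointer of each id; a root points to itself.
    private final Map<String, String> parent = new HashMap<>();

    // Register an id as its own singleton component if it is not known yet.
    public void addEntity(String id) {
        parent.putIfAbsent(id, id);
    }

    // Return the root of id's component, compressing the path on the way.
    private String find(String id) {
        String root = id;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        String cur = id;
        while (!cur.equals(root)) {
            String next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    // Merge the components of the two ids in a ground-truth pair.
    public void union(String a, String b) {
        addEntity(a);
        addEntity(b);
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    // Size of every component, keyed by its root id.
    public Map<String, Integer> componentSizes() {
        Map<String, Integer> sizes = new HashMap<>();
        for (String id : new ArrayList<>(parent.keySet())) {
            sizes.merge(find(id), 1, Integer::sum);
        }
        return sizes;
    }

    public static void main(String[] args) {
        PairComponents pc = new PairComponents();
        // Three disjoint duplicate pairs (hypothetical DBLP/ACM ids).
        pc.union("dblp1", "acm1");
        pc.union("dblp2", "acm2");
        pc.union("dblp3", "acm3");
        // An entity without a duplicate stays a component of size 1.
        pc.addEntity("dblp4");
        List<Integer> sizes = new ArrayList<>(pc.componentSizes().values());
        Collections.sort(sizes);
        System.out.println(sizes); // [1, 2, 2, 2]
    }
}
```

If the ids in the pairs do not match the ids the reader expects, the grouping no longer reflects the pairs, which is consistent with the symptom above.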

mthanos · 2021-01-14T18:01:34Z

Hi, the problem indeed occurs because of the jgrapht update. The double-quote (") characters are trimmed by default by the updated ConnectivityInspector, so the ids are not recognized as existing keys when they are processed by the GT reader. We will take a more detailed look at this. For now, you can resolve the issue by removing those characters from the DBLP input file, or, of course, by using a modified input format. Best regards, Manos
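The suggested workaround (removing the double quotes) can be sketched as follows. The class `GtIdNormalizer` and its methods are hypothetical helpers, not part of JedAI or jgrapht, and the sketch assumes no CSV field contains the separator inside quotes, since stripping every quote would then change the field boundaries. It shows that a quoted id does not match an unquoted key, and that stripping the quotes (per id, or once over the whole file) lets the lookup succeed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helpers illustrating the quote-stripping workaround. */
public class GtIdNormalizer {

    // Remove every '"' character from a raw CSV field and trim whitespace.
    public static String normalize(String rawId) {
        return rawId.replace("\"", "").trim();
    }

    // Resolve an id against an id-to-position index; -1 means "unknown id".
    public static int lookup(Map<String, Integer> index, String rawId) {
        Integer pos = index.get(normalize(rawId));
        return pos == null ? -1 : pos;
    }

    // File-level variant: copy a CSV with all '"' characters removed.
    public static void stripQuotes(Path in, Path out) throws IOException {
        List<String> cleaned = new ArrayList<>();
        for (String line : Files.readAllLines(in, StandardCharsets.UTF_8)) {
            cleaned.add(line.replace("\"", ""));
        }
        Files.write(out, cleaned, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical index from entity id to profile position.
        Map<String, Integer> index = new HashMap<>();
        index.put("id42", 0);
        String quoted = "\"id42\"";
        System.out.println(index.containsKey(quoted)); // false: quotes are part of the key
        System.out.println(lookup(index, quoted));     // 0: resolves after normalization
    }
}
```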


florisheijmans · 2021-01-15T08:38:02Z

Thank you! That fixes the problem.

florisheijmans closed this as completed Jan 15, 2021

mthanos mentioned this issue Jan 19, 2021 in: Cannot read ground truth #46 (Closed)
