GtCSVReader problems with jgrapht ConnectivityInspector #44
Comments
Hi,
The problem does indeed occur because of the jgrapht update. The '"' characters are trimmed by default by the updated ConnectivityInspector, so the ids are not recognized as existing keys when processed by the gt reader. We will take a more detailed look at this.
To resolve this issue for now, you can remove those characters from the DBLP input file or, of course, use a modified format.
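The quote-removal workaround could look roughly like the following. This is only a minimal sketch, assuming the id fields contain no embedded commas or escaped quotes, so that dropping every '"' character is safe. The class `QuoteStripper` and its methods are hypothetical names and not part of JedAI, and the sample rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JedAI: strips double quotes from CSV lines.
public class QuoteStripper {

    // Removes every '"' character from a single line.
    public static String stripQuotes(String line) {
        return line.replace("\"", "");
    }

    // Applies stripQuotes to each line, preserving the original order.
    public static List<String> stripAll(List<String> lines) {
        List<String> cleaned = new ArrayList<>();
        for (String line : lines) {
            cleaned.add(stripQuotes(line));
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Made-up sample rows; they are not taken from the real ground-truth file.
        List<String> sample = List.of("\"id-a1\",\"id-b1\"", "\"id-a2\",\"id-b2\"");
        for (String line : stripAll(sample)) {
            System.out.println(line);
        }
    }
}
```

For the real file, the lines could be read with `java.nio.file.Files.readAllLines` and the cleaned result written to a new file with `Files.write`, so that the original stays untouched.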
Best regards,
Manos
On Thursday, January 14, 2021 5:11 PM, florisheijmans wrote:
This issue arose when I attempted to reproduce the workflow in org.scify.jedai.demoworkflows.CsvDblpAcm.java.
While reading the ground truth in DBLP-ACM_perfectMapping.csv (specifically in the GtCSVReader.getDuplicatePairs method), the detection of connected components by the jgrapht package does not seem to work.
I obtain a single cluster of size 2225 and then 5375 more clusters of size 1, which is obviously incorrect, since the CSV contains about 2225 unique pairs (which should in turn produce 2225 clusters of size 2).
Have you seen this problem before? Maybe the jgrapht package expects a different format than it did previously?
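The expected cluster sizes can be cross-checked without jgrapht. The sketch below is an assumption-laden illustration, not the JedAI or jgrapht implementation: it uses a plain union-find over id pairs, and the class `PairClusters`, its methods, and the sample ids are all hypothetical. Disjoint pairs should each come out as one cluster of size 2.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cross-check, not part of JedAI or jgrapht.
public class PairClusters {

    // Maps each id to its parent; a root is its own parent.
    private final Map<String, String> parent = new HashMap<>();

    // Finds the representative of id, registering the id on first sight.
    private String find(String id) {
        parent.putIfAbsent(id, id);
        String root = id;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every node on the path at the root.
        String cur = id;
        while (!cur.equals(root)) {
            String next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    // Merges the components that contain a and b.
    private void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    // Returns the sorted sizes of the connected components induced by the pairs.
    public static List<Integer> clusterSizes(List<String[]> pairs) {
        PairClusters uf = new PairClusters();
        for (String[] p : pairs) {
            uf.union(p[0], p[1]);
        }
        Map<String, Integer> sizes = new HashMap<>();
        for (String id : new ArrayList<>(uf.parent.keySet())) {
            sizes.merge(uf.find(id), 1, Integer::sum);
        }
        List<Integer> result = new ArrayList<>(sizes.values());
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Made-up ids: three disjoint pairs.
        List<String[]> pairs = List.of(
            new String[] {"a1", "b1"},
            new String[] {"a2", "b2"},
            new String[] {"a3", "b3"});
        System.out.println(clusterSizes(pairs)); // prints [2, 2, 2]
    }
}
```

Feeding the pairs parsed from the ground-truth file into `clusterSizes` would give a reference result to compare against what the reader reports.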
Thank you! That fixes the problem.
Closed