Table dataset for disambiguation
AmericanWarsFewer1000death.csv
AsterixUderzoOnly.csv
CPU.csv
CartoonAcedemyAwards.csv
ChemicalElements.csv
EnglishFootballClubs.csv
FailedAssassinations.csv
GrammyAwardForSongOfTheYear.csv
HouseOfAlpin.csv
Hurricane.csv
IndianRailways.csv
NurnbergProcesses.csv
README.txt
README.txt~
RelatedDrugDeath.csv
TennisPlayersbyGrandSlames.csv
Universities.csv
Windmills.csv
academyAwardForBestActor.csv
americanTelevisionEpisodesWithLGBTThemes.csv
anthemsOfUnitedNationsMemberStates.csv
battlesofAmericanCivilWar.csv
cathedralsInItaly.csv
countiesOfIreland.csv
eurovisionSongContestWinners.csv
fatalFormulaOneAccidents.csv
formulaOneCircuits.csv
jamesBondTitleThemes.csv
languagesWithOfficialStatusInIndia.csv
losAngelesHistoricCulturalMonuments.csv
mostExpensiveCarsSoldInAuction.csv
mostViewedYouTubeVideos.csv
mountainPeaksOfTheUnitedStates.csv
nationalCapitalsOfCountriesInEurope.csv
newberyMedalWinners.csv
nuclearResearchReactors.csv
politicalPartiesInGermany.csv
premierLeagueClubs.csv
presidentialLibraries.csv
simpsonsEpisodes(19).csv
simpsonsEpisodes(24).csv
simpsonsEpisodes(4).csv
sixFlagsAmusementParks.csv
summerOlympicGames.csv
superBowlBroadcasters.csv
tallestBuildingsInLosAngeles.csv
unitedStateGovernors.csv
vicePresidentsOfTheUnitedStates.csv
westWingActors.csv
wiiGames.csv
winterOlympicGames.csv
worldWarIIPistolsOfGermany.csv

README.txt

This data set was compiled from the online version of Wikipedia in June 2013. The table columns were manually annotated with dcterms:subject and rdf:type annotations.

Format of the data:
- Each file contains one table
- Each line in the file represents a row in the table
- The first row corresponds to the ground-truth annotations of rdf:type types
- The second row corresponds to the ground-truth annotations of dcterms:subject types
- Table columns are separated by a semicolon (";")
- Multiple entries in one cell are separated by a comma (",")
- The last line contains the corresponding column headers
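
The format rules above can be sketched as a small Python reader. This is only an illustration under stated assumptions, not a loader shipped with the data set: the names AnnotatedTable, split_cell and parse_table are hypothetical, the sample text is made up and does not come from any file in this repository, and it is assumed that only the two annotation rows hold comma-separated multiple entries while data cells are kept as plain strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedTable:
    """Hypothetical container for one table file of the data set."""
    rdf_types: List[List[str]]         # per column, taken from the first line
    dcterms_subjects: List[List[str]]  # per column, taken from the second line
    headers: List[str]                 # taken from the last line
    rows: List[List[str]]              # every line in between


def split_cell(cell: str) -> List[str]:
    """Split a multi-entry cell on commas, dropping empty entries."""
    return [entry.strip() for entry in cell.split(",") if entry.strip()]


def parse_table(text: str) -> AnnotatedTable:
    """Parse the text of one file according to the format described above."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("expected two annotation lines and a header line")
    # Columns are separated by semicolons on every line.
    cells = [[c.strip() for c in line.split(";")] for line in lines]
    return AnnotatedTable(
        rdf_types=[split_cell(c) for c in cells[0]],
        dcterms_subjects=[split_cell(c) for c in cells[1]],
        headers=cells[-1],
        rows=cells[2:-1],
    )


# Made-up sample with placeholder annotations (not real data set content).
sample = (
    "TypeA,TypeB;TypeC\n"
    "SubjectA;SubjectB,SubjectC\n"
    "Berlin;Germany\n"
    "Vienna;Austria\n"
    "Capital;Country\n"
)
table = parse_table(sample)
```

To read an actual file, its contents could be passed in as a string, for example parse_table(open("CPU.csv", encoding="utf-8").read()); the encoding is an assumption, since the README does not state one.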

Related publications:
under submission

Contact person:
Media Computer Science
University of Passau
stefan.zwicklbauer-at-uni-passau.de



Number of tables: 50

Column statistics:
min: 1
max: 5
mean: 2.64
total columns: 132

Row statistics:
min: 10
max: 232
mean: 54.14
total rows: 2707


Annotation statistics:
total rdf:type annotations: 169
total dcterms:subject annotations: 160
mean rdf:type annotations: 1.28
mean dcterms:subject annotations: 1.21
total annotations: 329
mean annotations: 2.49