Paul Houle edited this page Feb 7, 2014 · 2 revisions

SubjectiveEye3D is a subjective importance score derived from Wikipedia usage information. SubjectiveEye3D is available for download at

https://s3.amazonaws.com/subjectiveEye/0.9/subjectiveEye3D/part-r-00000.gz

and is a 90MB file in gzipped N-Triples format.
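A minimal sketch of streaming the scores out of a gzipped N-Triples file without unpacking it to disk. The exact predicate and literal datatype used in the file are not stated on this page, so the line shape assumed below (URI subject, URI predicate, numeric literal object) and the `example.org` predicate in the sample are assumptions for illustration only.

```python
import gzip
import io
import re

# Assumed line shape: <subject> <predicate> "number"^^<datatype> .
# The real predicate and datatype are not documented here.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+"([^"]*)"(?:\^\^<[^>]*>)?\s*\.\s*$'
)

def read_scores(source):
    """Yield (subject_uri, score) pairs from a gzipped N-Triples file.

    `source` may be a file path or an open binary file object.
    """
    with gzip.open(source, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            match = TRIPLE.match(line)
            if match is None:
                continue  # blank lines, comments, or triples of another shape
            subject, _predicate, value = match.groups()
            try:
                yield subject, float(value)
            except ValueError:
                continue  # the object was not a numeric literal

# Hypothetical one-line sample standing in for part-r-00000.gz.
sample = (
    '<http://dbpedia.org/resource/Ohio> <http://example.org/score> '
    '"0.001"^^<http://www.w3.org/2001/XMLSchema#double> .\n'
).encode("utf-8")
sample_scores = list(read_scores(io.BytesIO(gzip.compress(sample))))
```

For the real download, pass the local path of `part-r-00000.gz` to `read_scores` and adjust the pattern if the lines turn out to be shaped differently.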

Creation Process

The following steps were involved in the creation of :SubjectiveEye3D.

First, usage data was summed up for each month. The number of hits was kept, but the number of downloaded bytes was removed. URIs that got fewer than ten hits in a month were discarded at this stage. (This resulted in a large compression of the monthly summaries.)
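The first step can be sketched roughly as follows. The record layout `(uri, hits, bytes_downloaded)` and the function name are assumptions made for illustration; only the ten-hit threshold comes from the description above.

```python
from collections import Counter

MIN_MONTHLY_HITS = 10  # URIs with fewer hits than this in a month are dropped

def summarize_month(records):
    """Sum hits per URI for one month and discard rarely visited URIs.

    `records` is an iterable of (uri, hits, bytes_downloaded) tuples
    (a hypothetical layout); the byte counts are ignored.
    """
    totals = Counter()
    for uri, hits, _bytes_downloaded in records:
        totals[uri] += hits
    return {uri: hits for uri, hits in totals.items()
            if hits >= MIN_MONTHLY_HITS}

# Made-up records for a single month.
month = summarize_month([
    ("en/Ohio", 6, 1000),
    ("en/Ohio", 5, 900),
    ("en/Rare_topic", 9, 10),
])
```

Here `en/Ohio` survives with 11 hits, while `en/Rare_topic` falls below the threshold and is discarded.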

Second, normalization factors were computed by summing up all hits for all topics in each language for each month. The goals were to: (i) treat each project as an independent universe, and (ii) put a higher weight on hits that happened earlier in time, on the assumption that the increasing number of hits to Wikipedia reflects increasing use of Wikipedia, not people being more interested in things.

Third, monthly aggregates were divided by these normalization factors and then summed to produce a 3D data file that covers all projects (languages).
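The second and third steps amount to dividing each monthly count by the total for its language and month, then summing over months. The sketch below assumes a nested dictionary layout and keeps scores keyed by `(language, topic)`; both are illustrative choices, not a description of the actual job that produced the file.

```python
from collections import defaultdict

def normalized_scores(monthly):
    """Turn monthly hit counts into summed, normalized scores.

    `monthly` maps (month, language) -> {topic: hits} (an assumed layout).
    Returns {(language, topic): score}.
    """
    scores = defaultdict(float)
    for (_month, language), hits_by_topic in monthly.items():
        # Normalization factor: all hits for this language in this month.
        total = sum(hits_by_topic.values())
        if total == 0:
            continue
        for topic, hits in hits_by_topic.items():
            scores[(language, topic)] += hits / total
    return dict(scores)

# Made-up counts: traffic grows from January to February.
example = normalized_scores({
    ("2013-01", "en"): {"A": 30, "B": 10},
    ("2013-02", "en"): {"A": 50, "B": 50},
})
```

Because each month is divided by its own total, a hit in a low-traffic (typically earlier) month contributes more than a hit in a high-traffic month, and each language is normalized independently of the others.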

Finally, this raw file was merged against the page id and transitive redirects files from DBpedia 3.9 to (i) eliminate concepts which do not exist in DBpedia, and (ii) sum up all of the hits to all variant forms of a concept and assign them to the canonical identifier.
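The final merge can be sketched as below, assuming the redirects have been loaded into a plain dictionary and the page ids into a set of known identifiers. The names are hypothetical. With a truly transitive redirects file a single lookup would be enough; the loop also copes with chains that have not been flattened.

```python
def canonical(uri, redirects):
    """Follow redirects until a non-redirect is reached (guards against cycles)."""
    seen = set()
    while uri in redirects and uri not in seen:
        seen.add(uri)
        uri = redirects[uri]
    return uri

def merge_with_dbpedia(raw_scores, redirects, known_pages):
    """Assign scores to canonical identifiers and drop unknown concepts.

    raw_scores:  {uri: score} from the normalization step
    redirects:   {variant_uri: target_uri}
    known_pages: set of identifiers that exist in the knowledge base
    """
    merged = {}
    for uri, score in raw_scores.items():
        target = canonical(uri, redirects)
        if target in known_pages:
            merged[target] = merged.get(target, 0.0) + score
    return merged

# Made-up data: "USA" redirects to "United_States"; "Nonexistent" is unknown.
merged = merge_with_dbpedia(
    {"USA": 1.0, "United_States": 2.0, "Nonexistent": 5.0},
    {"USA": "United_States"},
    {"United_States"},
)
```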

Interpretation of SubjectiveEye3D

SubjectiveEye scores are proportional to the probability that somebody is interested in a given topic at a moment in time.

SubjectiveEye scores are relative probabilities. That is, they have most of the attributes of probabilities except that they don't necessarily add up to one. If you need to treat them as real probabilities you will need to determine a normalization factor that is right for you. In general, however, one can make a number of arguments that these scores should not add up to one.
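If real probabilities are needed, one simple option is to rescale the scores so they sum to a total of your choosing. The helper below is an illustrative sketch; the `total_mass` parameter is an assumption that lets you pick a sum other than one.

```python
def to_probabilities(scores, total_mass=1.0):
    """Rescale relative scores so that they sum to `total_mass`."""
    z = sum(scores.values())
    if z <= 0:
        raise ValueError("scores must have a positive sum")
    return {topic: total_mass * score / z for topic, score in scores.items()}

# Made-up relative scores.
probabilities = to_probabilities({"A": 3.0, "B": 1.0})
```

Choosing `total_mass` below one leaves room for topics outside the data set, and a value above one allows for several topics being in mind at once.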

Why SubjectiveEye could add up to less than one

SubjectiveEye covers the universe of topics that are covered in Wikipedia. Although many (if not most) things in our shared consciousness are covered by the 4.5 million topics in the English Wikipedia, we also think about things that aren't in Wikipedia.

This has practical consequences for real use cases. For instance, in named entity resolution, it is possible that a surface form (say "Manning") represents a concept in the knowledge base (e.g. :Peyton_Manning, :Eli_Manning, or :Manning_Publications) and also possible that it represents some concept that isn't in the knowledge base. To get acceptable results, the system needs to consider all of these possibilities.
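One way to account for the out-of-knowledge-base case is to reserve some weight for "none of the above" before normalizing the candidates. This is only a sketch: the scores below are invented for illustration, and the weight given to the unknown case is a tuning assumption, not something SubjectiveEye3D provides.

```python
def rank_candidates(candidate_scores, out_of_kb_score):
    """Return probabilities over candidates plus an out-of-KB option.

    The out-of-KB option is keyed by None.
    """
    weights = dict(candidate_scores)
    weights[None] = out_of_kb_score
    total = sum(weights.values())
    return {candidate: weight / total for candidate, weight in weights.items()}

# Invented scores for the surface form "Manning".
ranking = rank_candidates(
    {":Peyton_Manning": 6.0, ":Eli_Manning": 3.0, ":Manning_Publications": 0.5},
    out_of_kb_score=0.5,
)
best = max(ranking, key=ranking.get)
```

A real system would combine these priors with contextual evidence; the point here is only that the unknown case competes with the known candidates instead of being ignored.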

Why SubjectiveEye could add up to more than one

Alternatively, people can be thinking of more than one thing at a time. There are two senses of this. One is that concepts are entangled with each other. If I'm thinking about dogs, for instance, I am also thinking about carnivores, mammals, and animals. If I'm thinking about Ohio, I'm also thinking about the United States. There are similar relationships that aren't strictly hierarchical: even though the actor Kiefer Sutherland isn't literally "part of" the television show 24, somebody who is thinking of him may very well be thinking of that show.

Another aspect is that thinking involves relationships between things. If I'm thinking about a fact that can be expressed as an RDF triple, I've got a subject, predicate, and object in my head. It's commonly stated that short-term memory can contain 7 plus or minus 2 facts, so this points to SubjectiveEye adding up to something around five or so.

Legal

SubjectiveEye3D is Copyright 2014 Ontology2 and can be freely used under the CC-BY-SA license for both commercial and non-commercial use. :SubjectiveEye3D is provided on an "as-is" basis, and by using :SubjectiveEye3D you agree to hold Ontology2 harmless for any liabilities you may incur.