Why is <dc:subject> treated as category? #422

Open
Lukas0907 opened this Issue Dec 31, 2015 · 1 comment

@Lukas0907

Hi,

I was wondering why <dc:subject> is treated as a category by the get_categories() methods.

A subject and a category seem to be different things to me.

Cheers,

Lukas

@JanPetterMG
Contributor

From the Dublin Core spec: http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=elements#subject

Definition: The topic of the resource.
Comment: Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary.

After some research, this table seems to be the best explanation: http://www.feedsweep.com/ShowArticle.aspx?ID=38

| Filter Item | Atom | RSS (2.0, 0.91, 0.92) | RDF/RSS (1.0, 0.90) |
| --- | --- | --- | --- |
| FeedTags | /atom/feed/category | /rss/channel/category, /rss/channel/dc:subject | /rdf:RDF/channel/dc:subject |
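
To make the mapping in the table concrete, here is a minimal Python sketch (not the library's actual implementation) that collects "tags" from the same paths. The function name `feed_tags` and the sample feed are hypothetical. It assumes Atom 1.0, un-namespaced RSS 0.91/0.92/2.0 and RSS 1.0 namespaces, and does not handle RSS 0.90, which uses a different namespace.

```python
import xml.etree.ElementTree as ET

# Namespace URIs (assumption: Atom 1.0, RSS 1.0 and Dublin Core elements 1.1).
ATOM = "http://www.w3.org/2005/Atom"
RSS1 = "http://purl.org/rss/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"


def _texts(elements):
    """Return the stripped, non-empty text of each element."""
    return [e.text.strip() for e in elements if e.text and e.text.strip()]


def feed_tags(xml_text):
    """Hypothetical helper: gather feed-level tags per the table's paths."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{ATOM}}}feed":
        # /atom/feed/category: the label lives in the "term" attribute.
        cats = root.findall(f"{{{ATOM}}}category")
        return [c.get("term") for c in cats if c.get("term")]
    if root.tag == "rss":
        channel = root.find("channel")
        if channel is None:
            return []
        # /rss/channel/category and /rss/channel/dc:subject are both collected.
        return (_texts(channel.findall("category"))
                + _texts(channel.findall(f"{{{DC}}}subject")))
    if root.tag == f"{{{RDF}}}RDF":
        channel = root.find(f"{{{RSS1}}}channel")
        if channel is None:
            return []
        # /rdf:RDF/channel/dc:subject is the only tag source here.
        return _texts(channel.findall(f"{{{DC}}}subject"))
    return []


# Made-up sample feed, for illustration only.
RSS_SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example</title>
    <category>News</category>
    <dc:subject>Politics</dc:subject>
  </channel>
</rss>"""

print(feed_tags(RSS_SAMPLE))
```

Read this way, RDF/RSS feeds have no native category element, so dc:subject is the only place a tag can come from, which may be why the two are merged.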

Some personal thoughts:
I agree, there are too many categories used in Atom/RSS feeds worldwide, and a lot of them seem to be too unique (e.g. a reverse lookup returning 1 item over a timeline of 3 years, when filtering by URL host).

I haven't done any research on dc:subject specifically, but I guess it has a small to medium impact on the results above.
The biggest problem, as I see it, is webpages doing the following:

  • tagging no categories at all
  • tagging a hardcoded list of categories
  • tagging categories that aren't really categories (e.g. author, agency, the site's own name, etc.).
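
As a rough illustration of the "too unique" problem, the sketch below counts how many items carry each category and flags those below a threshold. The function name `rare_categories`, the `min_count` parameter and the sample data are all made up for this example.

```python
from collections import Counter


def rare_categories(items, min_count=2):
    """Return categories attached to fewer than min_count items (hypothetical helper)."""
    # Count each category at most once per item.
    counts = Counter(cat for cats in items for cat in set(cats))
    return sorted(cat for cat, n in counts.items() if n < min_count)


# Made-up data: each inner list holds the categories of one feed item.
items = [
    ["news", "politics"],
    ["news", "John Doe"],  # an author name used as a category
    ["news"],
]

print(rare_categories(items))
```

Note that a hardcoded list such as "news" on every item would pass this check, so a real filter would probably also need an upper bound.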