straup/nytimes-related

nytimes-related

These are the source RDF files used to generate the "Who's on First (at the New York Times)" webpages between 2006 and 2010. As of this writing the link to the "Who's on First" pages is broken because I am a space cadet. It will be fixed shortly.

These are not the actual articles as published by the New York Times but instead the metadata about each article (authors, subjects, locations, etc.) along with pointers to the articles themselves.

Comprehensive documentation still needs to be written.

OMGWTFRDFXML ??????!?!?

Yes. It's all in scary RDF/XML. It's not how I would do it now, but it's what I did then. It looks scarier than it is. Please submit patches, fixes, whatever. That's part of the reason I am putting this all on GitHub.
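For anyone poking at the files, here is a minimal Python sketch that pulls (subject, predicate, object) triples out of RDF/XML using only the standard library. It assumes the simplest RDF/XML shape: flat `rdf:Description` elements with an `rdf:about` attribute, whose children are either literals or `rdf:resource` pointers. The `SAMPLE` record, its `example.com` URIs and the Dublin Core properties are hypothetical stand-ins, not taken from the actual files, and `simple_triples` is likewise a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URI for the core RDF vocabulary (rdf:Description, rdf:about, ...).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def simple_triples(xml_text):
    """Return (subject, predicate, object) tuples from flat RDF/XML.

    Only top-level rdf:Description elements are read; typed nodes are
    ignored and nested descriptions are not expanded. A property element
    with an rdf:resource attribute yields that URI as the object;
    anything else yields its (stripped) text as a literal.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree reports tags as "{namespace-uri}localname".
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, prop.tag, obj))
    return triples


# Hypothetical record; the real files may use other vocabularies.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.com/article/1">
    <dc:creator>Jane Doe</dc:creator>
    <dc:subject>Baseball</dc:subject>
    <dc:relation rdf:resource="http://example.com/article/2"/>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    for triple in simple_triples(SAMPLE):
        print(triple)
```

This is not a full RDF/XML parser: typed nodes, nested descriptions, `rdf:parseType` and friends are not handled. For the real files a dedicated RDF library such as `rdflib` (`Graph().parse(path, format="xml")`) is probably the safer choice.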

OMGWTFUTF8 ???!!??!?!

I have no excuses. Some of the data is probably garbled (mis-encoded characters) beyond recognition at this point. I suppose it would be possible to recrawl the New York Times website to fix those mistakes, but I haven't done that, ever.

I'm sorry. Luminoso's blog post "Fixing common Unicode mistakes with Python — after they've been made" might fix some of the problems (maybe?), but I have not tried it yet.
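One very common form of this kind of garbling ("mojibake") is UTF-8 bytes that were decoded as Windows-1252, which turns "café" into "cafÃ©". A minimal Python sketch for undoing one round of that, assuming that is what happened to a given string (not verified against these files), is to re-encode with the wrongly used codec and decode the result as UTF-8. `fix_mojibake` is a hypothetical name, and this is a much cruder heuristic than the approach described in the post.

```python
def fix_mojibake(text):
    """Undo one round of UTF-8 bytes mis-decoded as Windows-1252.

    Heuristic sketch: re-encode with the codec that was wrongly used,
    then decode the bytes as UTF-8. If either step fails, the string was
    probably not garbled this way, so it is returned unchanged. It can
    miss cases (e.g. text that went through cp1252's undefined bytes).
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # UnicodeEncodeError: a character has no cp1252 byte.
        # UnicodeDecodeError: the bytes are not valid UTF-8.
        return text


if __name__ == "__main__":
    print(fix_mojibake("cafÃ©"))   # café
    print(fix_mojibake("café"))    # already fine, returned unchanged
```

The `ftfy` library (`ftfy.fix_text`) implements a far more careful version of this idea and is likely a better starting point for repairing the files in bulk.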
