
imclab/nytimes-related

 
 


nytimes-related

These are the source RDF files used to generate the Who's on First (at the New York Times) webpages between 2006 and 2010.

These are not the actual articles as published by the New York Times. They are the metadata about each article (authors, subjects, locations, etc.) along with pointers to the articles themselves.

Comprehensive documentation still needs to be written.

OMGWTFRDFXML ??????!?!?

Yes. It's all in scary RDF/XML. It's not how I would do it now, but it's what I did then. It looks scarier than it is. Please submit patches, fixes, whatever. That's part of the reason I am putting this all on GitHub.
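If you just want to poke at the files, here's a minimal sketch that uses only Python's standard library (`xml.etree.ElementTree`). It assumes the simple, flat form of RDF/XML, where each resource is an `rdf:Description` with an `rdf:about` attribute and child property elements. It is not a full RDF/XML parser: typed nodes, nested descriptions, `rdf:parseType` and `xml:base` are all ignored. The sample record uses hypothetical Dublin Core properties and example.com URIs, not the actual vocabulary in these files, and the function names are made up for illustration. For real work an RDF library such as rdflib is a better bet.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample record; the real files use their own vocabularies.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.com/article/1">
    <dc:title>Example headline</dc:title>
    <dc:creator>Jane Doe</dc:creator>
    <dc:subject rdf:resource="http://example.com/subject/politics"/>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Yield (subject, predicate, object) tuples from flat rdf:Description elements.

    Predicates come back in ElementTree's "{namespace}localname" form.
    Objects are either the rdf:resource URI or the element's stripped text.
    """
    root = ET.fromstring(text)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            yield subject, prop.tag, obj


def group_by_subject(triples):
    """Collect (predicate, object) pairs per subject, e.g. per article URI."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[subject].append((predicate, obj))
    return dict(grouped)


if __name__ == "__main__":
    for subject, props in group_by_subject(triples_from_rdfxml(SAMPLE)).items():
        print(subject)
        for predicate, obj in props:
            print("  ", predicate, obj)
```

Grouping by subject gives you roughly one bucket per article, which is usually the first thing you want when looking at this kind of metadata.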

OMGWTFUTF8 ???!!??!?!

I have no excuses. Some of the data is probably garbled beyond recognition at this point. I suppose it would be possible to recrawl the New York Times website to fix those mistakes, but I haven't done that, ever.

I'm sorry. The approach in Luminoso's blog post "Fixing common Unicode mistakes with Python — after they've been made" might fix some of the problems (maybe?), but I have not tried it yet...
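A very common kind of garbling is UTF-8 bytes that were decoded as Windows-1252, so "café" shows up as "cafÃ©". Here's a minimal standard-library sketch that undoes exactly one round of that specific mistake, assuming that is what happened to a given string. It's a guess, not a diagnosis, and it doesn't handle the many other ways text gets mangled; the function name is hypothetical. Luminoso's ftfy library (`ftfy.fix_text`) is far more thorough and is probably what you actually want.

```python
def undo_utf8_as_cp1252(text):
    """Reverse one round of UTF-8 bytes that were mis-decoded as Windows-1252.

    If the round trip fails (the text can't be encoded as cp1252, or the
    resulting bytes aren't valid UTF-8), the input is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Covers both UnicodeEncodeError and UnicodeDecodeError.
        return text


if __name__ == "__main__":
    for s in ["cafÃ©", "Itâ€™s", "café", "plain ASCII"]:
        print(repr(s), "->", repr(undo_utf8_as_cp1252(s)))
```

A string that legitimately contains sequences like "Ã©" would also get "fixed", so it's worth spot-checking the results by hand before rewriting any of the files.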
