Skip to content
This repository has been archived by the owner on Dec 31, 2021. It is now read-only.

na-sa-do/nsndswap

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

nsndswap

A Python program that parses this website and this one, and outputs gexf files.

How to use

Run run.sh. For the most part, the copious logs can be ignored.

Output files

The following datasets are output to the output/ directory:

  • homestuck contains details of all songs on the first page (Homestuck soundtrack, unofficialmspafans, Homestuck Gaiden, and miscellaneous other things).
  • canwc contains details of all songs on the second page (the Cool and New Web Comic soundtrack).
  • viko contains a few additions I maintain (the raw form is in viko_nsnd.py.
  • everything contains all the previous files' data.

These are dumped in five formats:

  • .gexf - directed graphs of references
  • .txt - simple ad hoc plain-text format
  • .titles.txt - the titles, one per line
  • .reverse.txt - the format in .txt, but showing incoming references rather than outgoing
  • .unknown.txt - titles, one per line, of things which are referenced, but which don't have reference lists of their own (useful for checking for name misspellings and such)
  • .pkl - pickled version of the Web itself (subject to change, obviously)

About

Scrapes a few web pages and outputs graphs

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published