{-| Scans a page of Markdown looking for http links. When it finds them, it submits them
to webcitation.org (WebCite; see https://secure.wikimedia.org/wikipedia/en/wiki/WebCite).
(It will also submit them to Alexa (the source for the Internet Archive), but Alexa says that
its bots take weeks to visit and may never visit at all.)

This module employs the archiver daemon <http://hackage.haskell.org/package/archiver> as a library; `cabal install archiver` will install it.

Limitations:
* Only parses Markdown, not ReST or any other format; this is because 'readMarkdown'
  is hardwired into it.
* No rate limiting or throttling; will fire off all requests as fast as possible.
  If pages have more than 20 external links or so, this may result in your IP being temporarily
  banned by WebCite. To avoid this, you can use WebArchiverBot.hs instead, which will parse & dump
  URLs into a file processed by the archiver daemon (which *is* rate-limited).

By: Gwern Branwen; placed in the public domain -}

module WebArchiver (plugin) where

import Control.Concurrent (forkIO)
import Network.URL.Archiver as A (checkArchive)
import Network.Gitit.Interface (askUser, bottomUpM, liftIO, uEmail, Plugin(PreCommitTransform), Inline(Link))
import Text.Pandoc (defaultParserState, readMarkdown)

plugin :: Plugin
plugin = PreCommitTransform archivePage

-- archivePage :: String -> ReaderT PluginData (StateT Context IO) String
archivePage x = do
  mbUser <- askUser
  let email = case mbUser of
        Nothing -> "nobody@mailinator.com"
        Just u  -> uEmail u
  let p = readMarkdown defaultParserState x
  -- force evaluation and archiving side-effects
  _p' <- liftIO $ bottomUpM (archiveLinks email) p
  return x -- note: this is read-only - don't actually change page!

archiveLinks :: String -> Inline -> IO Inline
archiveLinks e x@(Link _ (uln, _)) = forkIO (A.checkArchive e uln) >> return x
archiveLinks _ x = return x
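
{- Illustrative sketch, not part of the original plugin. The header comment speaks of
   "http links", but 'archiveLinks' forks a thread for every 'Link' target, relative
   wiki links included. Assuming that only targets beginning with "http" (which also
   covers "https") are worth submitting, a stricter variant could test the prefix
   before forking. The name 'archiveHttpLinks' is hypothetical; it relies only on this
   module's existing imports, and it makes no assumption about whether 'checkArchive'
   performs its own URL validation. To take effect it would have to be passed to
   'bottomUpM' in 'archivePage' in place of 'archiveLinks'. -}
archiveHttpLinks :: String -> Inline -> IO Inline
-- Fork an archiving request only when the link target starts with "http".
archiveHttpLinks e x@(Link _ (uln@('h':'t':'t':'p':_), _)) = forkIO (A.checkArchive e uln) >> return x
-- Every other inline element, and every non-http link, passes through untouched.
archiveHttpLinks _ x = return x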