Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
Newer
Older
100644 41 lines (32 sloc) 2.029 kB
77fb435 @jgm Modified WebArchiver plugin to make Alexa requests (gwern).
authored
1 {-| Scans page of Markdown looking for http links. When it finds them, it submits them
2 to webcitation.org / https://secure.wikimedia.org/wikipedia/en/wiki/WebCite
3 (It will also submit them to Alexa (the source for the Internet Archive), but Alexa says that
0241667 @gwern Convert WebArchiver module to shell around archiver
gwern authored
4 its bots take weeks to visit and may not ever.)
5
6 This module employs the archiver daemon <http://hackage.haskell.org/package/archiver> as a library; `cabal install archiver` will install it.
77fb435 @jgm Modified WebArchiver plugin to make Alexa requests (gwern).
authored
7
8 Limitations:
9 * Only parses Markdown, not ReST or any other format; this is because 'readMarkdown'
10 is hardwired into it.
9ba38be @gwern +New module which parses and dumps URLs to a file for use by archiver…
gwern authored
11 * No rate limitation or choking; will fire off all requests as fast as possible.
12 If pages have more than 20 external links or so, this may result in your IP being temporarily
13 banned by WebCite. To avoid this, you can use WebArchiverBot.hs instead, which will parse & dump
14 URLs into a file processed by the archiver daemon (which *is* rate-limited).
77fb435 @jgm Modified WebArchiver plugin to make Alexa requests (gwern).
authored
15
16 By: Gwern Branwen; placed in the public domain -}
1a3c231 @jgm Changed plugin names, made them all work.
authored
17
18 module WebArchiver (plugin) where
19
0a93979 @jgm Made WebArchiver plugin more parallel (gwern).
authored
20 import Control.Concurrent (forkIO)
0241667 @gwern Convert WebArchiver module to shell around archiver
gwern authored
21 import Network.URL.Archiver as A (checkArchive)
22 import Network.Gitit.Interface (askUser, bottomUpM, liftIO, uEmail, Plugin(PreCommitTransform), Inline(Link))
77fb435 @jgm Modified WebArchiver plugin to make Alexa requests (gwern).
authored
23 import Text.Pandoc (defaultParserState, readMarkdown)
1a3c231 @jgm Changed plugin names, made them all work.
authored
24
25 plugin :: Plugin
26 plugin = PreCommitTransform archivePage
27
0241667 @gwern Convert WebArchiver module to shell around archiver
gwern authored
28 -- archivePage :: String -> ReaderT PluginData (StateT Context IO) String
1a3c231 @jgm Changed plugin names, made them all work.
authored
29 archivePage x = do mbUser <- askUser
30 let email = case mbUser of
31 Nothing -> "nobody@mailinator.com"
32 Just u -> uEmail u
33 let p = readMarkdown defaultParserState x
34 -- force evaluation and archiving side-effects
0241667 @gwern Convert WebArchiver module to shell around archiver
gwern authored
35 _p' <- liftIO $ bottomUpM (archiveLinks email) p
1a3c231 @jgm Changed plugin names, made them all work.
authored
36 return x -- note: this is read-only - don't actually change page!
37
38 archiveLinks :: String -> Inline -> IO Inline
0241667 @gwern Convert WebArchiver module to shell around archiver
gwern authored
39 archiveLinks e x@(Link _ (uln, _)) = forkIO (A.checkArchive e uln) >> return x
77fb435 @jgm Modified WebArchiver plugin to make Alexa requests (gwern).
authored
40 archiveLinks _ x = return x
Something went wrong with that request. Please try again.