Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
64 lines (38 sloc) 2.39 KB
#Spelling correction
http://www.sumsar.net/blog/2014/12/peter-norvigs-spell-checker-in-two-lines-of-r/
#British Gazette
I want to pull some info from the British Gazette
Here is the API
https://github.com/TheGazette/DevDocs/blob/master/notice/notice-feed.md
I think I can use the API to search, pull in results, deal with those results and get some insight.
I'm looking at allotment gardens
One I pull this in I can use R for NLP to get name and location of allotments?
I think I want this data as json
According to
https://stackoverflow.com/questions/2617600/importing-data-from-a-json-file-into-r
I should use rjson (although there are other libraries too.)
install.packages("rjson")
-- I got errors with the json
> json_file <- "https://www.thegazette.co.uk/all-notices/notice/data.json?end-publish-date=1918-11-11&text=potatoes+wart+schedule&start-publish-date=1914-08-03&location-distance-1=1&service=all-notices&categorycode-all=all&numberOfLocationSearches=1"
> json_data <- fromJSON(file=json_file)
Error in fromJSON(file = json_file) : unexpected character: "
So I switched to xml
Helpful tutorial
http://statistics.berkeley.edu/computing/r-reading-webpages
## can read in only 10 pages at a time
Note the page.number page.size page.start page.stop
id
1 https://www.thegazette.co.uk/all-notices/notice/data.feed?end-publish-date=1918-11-11&text=potatoes wart schedule&start-publish-date=1914-08-03&location-distance-1=1&service=all-notices&categorycode-all=all&numberOfLocationSearches=1&results-page=20
link link.1 link.2 link.3 link.4 link.5 title updated Query page.number page.size page.start page.stop total facets
1 NULL NULL NULL NULL NULL NULL Search Result 2017-10-22T15:17:45.608+01:00 NULL 20 10 191 194 194 NULL
Get
http://www.xpdfreader.com/download.html
Look under
Download the Xpdf tools:
I unzipped it here
exe <- "C:\\a_orgs\\carleton\\hist3814\\R\\graham_fellowship\\pdftools\\bin64\\pdftotext.exe"
#credit where credit is due, because I am sometimes forgetful
http://www.r-tutor.com/r-introduction/list
https://www.r-bloggers.com/how-to-write-the-first-for-loop-in-r/
https://stackoverflow.com/questions/1174799/how-to-make-execution-pause-sleep-wait-for-x-seconds-in-r
https://dzone.com/articles/reading-and-text-mining-pdf