Retrieve structured, textual data from various web sources.
Latest commit a2e7d08 by mannau, Feb 26, 2017: Fix #16 (Fix YahooFinanceSource)

tm.plugin.webmining

tm.plugin.webmining is an R package that facilitates text retrieval from feed formats such as XML (RSS, Atom) and JSON. Direct retrieval from HTML is also supported. Because most (news) feeds include only a small fraction of the original text, tm.plugin.webmining also extracts the text from the original source.

Install

To install the latest released version from CRAN, run:

install.packages("tm.plugin.webmining")

Using the devtools package, you can install the latest development version of tm.plugin.webmining from GitHub with:

library(devtools)
install_github("mannau/tm.plugin.webmining")

Windows users need to use the following command to install from GitHub:

library(devtools)
install_github("mannau/boilerpipeR", args = "--no-multiarch")

Usage

The next snippet shows how to download and extract the main text from all supported sources as WebCorpus objects, each of which includes a rich set of metadata such as Author, DateTimeStamp or Source:

library(tm.plugin.webmining)
googlefinance <- WebCorpus(GoogleFinanceSource("NASDAQ:MSFT"))
googlenews <- WebCorpus(GoogleNewsSource("Microsoft"))
nytimes <- WebCorpus(NYTimesSource("Microsoft", appid = "<nytimes_appid>"))
reutersnews <- WebCorpus(ReutersNewsSource("businessNews"))
# twitter <- WebCorpus(TwitterSource("Microsoft"))  # not supported yet
yahoofinance <- WebCorpus(YahooFinanceSource("MSFT"))
yahooinplay <- WebCorpus(YahooInplaySource())
yahoonews <- WebCorpus(YahooNewsSource("Microsoft"))
liberation <- WebCorpus(LiberationSource("latest"))
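
Once a corpus has been retrieved, its documents and metadata can be inspected with the generic accessors from the tm package. The following is a minimal sketch, not part of tm.plugin.webmining itself: `summarize_corpus` is a hypothetical helper name introduced here for illustration, and it assumes that a WebCorpus can be indexed with `[[` like any other tm corpus. The exact metadata field names depend on the installed tm version, so the helper returns whatever `meta()` provides.

```r
library(tm)

# Hypothetical helper (not provided by tm.plugin.webmining):
# return the metadata and a short text preview for the first n documents.
summarize_corpus <- function(corpus, n = 3) {
  idx <- head(seq_along(corpus), n)   # empty corpus -> empty result
  lapply(idx, function(i) {
    doc <- corpus[[i]]
    text <- paste(as.character(doc), collapse = " ")
    list(
      meta = meta(doc),               # all metadata tags of the document
      preview = substr(text, 1, 80)   # first 80 characters of the main text
    )
  })
}
```

For example, `summarize_corpus(yahoonews, n = 2)` would list the metadata and the first 80 characters of the first two retrieved articles, assuming the download above succeeded.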

License

tm.plugin.webmining is released under the GNU General Public License, Version 3.