Skip to content
Browse files

new files

  • Loading branch information...
1 parent cd68c98 commit 73cd67194eb2b263d5079fe782a0450633d24aa6 Jeffrey Breen committed Jul 12, 2011
Showing with 96 additions and 0 deletions.
  1. +13 −0 LICENSE
  2. +83 −0 README
View
13 LICENSE
@@ -0,0 +1,13 @@
+Copyright 2011 by Jeffrey Oliver Breen <jbreen@cambridge.aero>
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
View
83 README
@@ -0,0 +1,83 @@
+INTRODUCTION
+
+ This project contains all the code necessary to reproduce the analysis
+ presented in my R tutorial on mining Twitter for airline sentiment:
+
+ http://jeffreybreen.wordpress.com/2011/07/04/twitter-text-mining-r-slides/
+
+
+CONTENTS
+
+ data/
+ opinion-lexicon-English/ - Hu & Liu's opinion lexicon
+ negative-words.txt
+ NLP-handbook-sentiment-analysis.pdf
+ positive-words.txt
+ source.txt
+ source.webloc
+
+ acsi.df.RData - airline customer satisfaction scores
+ acsi.raw.df.RData - scraped from theacsi.org web site
+
+ output/ - output files, mainly graphics
+ qplot_delta_hist.pdf
+ twitter_acsi_comparison_with_fit.pdf
+ twitter_acsi_comparison.pdf
+ twitter_score_histograms.pdf
+
+ R/ - R source code
+ 0_start.R
+ 1_load.R
+ 2_run.R
+ scrape.R
+ sentiment.R
+
+ LICENSE - Copyright per Apache 2.0 license
+
+ README - this file
+
+
+INSTRUCTIONS
+
+In order to run the analysis, start R from this project's root directory
+or it with setwd()
+
+
+1. Load the prerequisite packages, our score.sentiment() function, and
+ some environment variables with the "0_start.R" script:
+
+ > source("R/0_start.R")
+
+
+2. This distribution does not ship with any data from Twitter, so you will need
+ to collect your own the first time you attempt to run this package.
+
+ To collect data from Twitter, simply execute the "scrape.R" script:
+
+ > source("R/scrape.R")
+
+ This script caches your collected tweets to the data/ directory, so you
+ only need to run this step once.
+
+
+3. Load the Twitter data, opinion lexicon, and ACSI results from disk:
+
+ > source("R/1_load.R")
+
+ If this is your first time running this code, and you have not followed Step 2
+ to collect your own Twitter data, you will see this error message:
+
+ Error: Tweets not found on disk -- source('R/scrape.R') to scrape Twitter first
+
+
+4. Run the analysis:
+
+ > source("R/2_run.R")
+
+ Progress messages will be displayed on the console and all generated graphics will
+ be displayed and saved as PDFs in the output/ directory.
+
+
+Jeffrey Oliver Breen
+jbreen@cambridge.aero
+July 2011

0 comments on commit 73cd671

Please sign in to comment.
Something went wrong with that request. Please try again.