Fixed open4 bundler issue, added README.textile, HISTORY

commit f0d4f088c9d05f8ccbb5925a183f8df321c2170a 1 parent e474b00
@mrcsparker authored
7 HISTORY
@@ -0,0 +1,7 @@
+0.2 - November 30, 2011
+* Fixed open4 bundler issue - file was getting required that needed open4 before add_dependency
+* Added README info, HISTORY
+* Added more tests
+
+0.1 - November 29, 2011
+* Initial release
20 LICENSE
@@ -0,0 +1,20 @@
+Copyright (c) 2011 Chris Parker
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
2  README
@@ -1,2 +0,0 @@
-This is a simple frontend to the Java Tika parser
-command line jar / app.
45 README.textile
@@ -0,0 +1,45 @@
+h1. Ruby Tika Parser
+
+h2. Introduction
+
+This is a simple frontend to the Java Tika parser command line jar / app.
+
+It is the same as running:
+<pre>
+ java -server -Djava.awt.headless=true -jar tika-app-0.10.jar FileToParse.pdf
+</pre>
+
+with options like --xml, --text, etc.
+
+h2. Installation
+
+To install, add ruby_tika_app to your @Gemfile@ and run `bundle install`:
+
+<pre>
+gem 'ruby_tika_app'
+</pre>
+
+h3. Note about installation
+RubyTikaApp is a pretty big gem since it includes the ruby-tika-app jarfile.
+It might take a while to install.
+
+h2. Usage
+
+First, you need Java installed. And it needs to be in your $PATH.
+
+Then:
+
+<pre>
+require 'ruby_tika_app'
+
+rta = RubyTikaApp.new("sample_file.pdf")
+
+puts rta.to_xml # <xml output>
+
+# You also get to_json, to_text, to_text_main, and to_metadata
+
+</pre>
+
+h2. Contributing
+
+Fork on GitHub and after you've committed tested patches, send a pull request.
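
Note on README.textile: the README says the gem is equivalent to running the `java ... -jar tika-app-0.10.jar` command line with options such as --xml or --text. A minimal sketch of that idea follows. It is not the gem's implementation: the gem depends on open4, while this sketch uses Open3 from the standard library so that it stays self-contained. The helper names `tika_command` and `run_tika` and the jar path argument are assumptions made for illustration.

```ruby
require 'open3'

# Hypothetical helper: builds the argument list for the java invocation
# quoted in the README. The option (for example '--xml') goes before the file.
def tika_command(jar_path, file_path, option = nil)
  cmd = ['java', '-server', '-Djava.awt.headless=true', '-jar', jar_path]
  cmd << option if option
  cmd << file_path
  cmd
end

# Hypothetical helper: runs the command and returns its standard output.
# Requires java on the $PATH, as the README notes.
def run_tika(jar_path, file_path, option = nil)
  stdout, stderr, status = Open3.capture3(*tika_command(jar_path, file_path, option))
  raise "tika-app failed: #{stderr}" unless status.success?
  stdout
end
```

Passing the arguments as a list rather than one shell string avoids quoting problems with file names that contain spaces.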
2  lib/ruby_tika_app.rb
@@ -6,8 +6,6 @@
class RubyTikaApp
- VERSION = "0.1"
-
class Error < RuntimeError; end
class CommandFailedError < Error
3  lib/ruby_tika_app/version.rb
@@ -1,3 +0,0 @@
-module RubyTikaApp
- VERSION = "0.0.1"
-end
13 ruby_tika_app.gemspec
@@ -1,26 +1,27 @@
# -*- encoding: utf-8 -*-
$:.push File.expand_path("../lib", __FILE__)
-require "ruby_tika_app"
Gem::Specification.new do |s|
s.name = "ruby_tika_app"
- s.version = RubyTikaApp::VERSION
+ s.version = "0.2"
+ s.platform = Gem::Platform::RUBY
s.authors = ["Chris Parker"]
s.email = ["mrcsparker@gmail.com"]
- s.homepage = "http://github.com"
+ s.homepage = "https://github.com/mrcsparker/ruby_tika_app"
s.summary = %q{Wrapper around the tika-app jar}
s.description = %q{Wrapper around the tika-app jar}
s.rubyforge_project = "ruby_tika_app"
- s.files = `git ls-files`.split("\n")
+ s.files = `git ls-files`.split("\n") +
+ %w(LICENSE README.textile HISTORY)
s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
s.require_paths = ["lib"]
+ s.test_files = Dir.glob('spec/**/*')
- s.add_dependency("open4")
+ s.add_runtime_dependency("open4")
s.add_development_dependency("rspec", "~> 2.7.0")
s.add_development_dependency("bundler", ">= 1.0.15")
-
end
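
Note on ruby_tika_app.gemspec: HISTORY attributes the bundler issue to a file being required that needed open4 before the dependency was declared. The diff removes `require "ruby_tika_app"` from the gemspec and replaces `RubyTikaApp::VERSION` with a literal. The sketch below illustrates that pattern under stated assumptions: the gem name, summary, and author are hypothetical placeholders, and `Dir.glob` stands in for `git ls-files` so the snippet runs outside a git checkout.

```ruby
require 'rubygems'

# Sketch: the specification is built from literals only, so evaluating it
# never loads the library and never needs open4 to be installed already.
spec = Gem::Specification.new do |s|
  s.name          = 'example_gem'          # hypothetical name
  s.version       = '0.2'                  # literal version, as in this commit
  s.summary       = 'Example wrapper gem'  # hypothetical summary
  s.authors       = ['Example Author']     # hypothetical author
  s.files         = Dir.glob('lib/**/*')
  s.require_paths = ['lib']

  # Dependencies are declared here; nothing above requires them.
  s.add_runtime_dependency('open4')
  s.add_development_dependency('rspec', '~> 2.7.0')
end

puts spec.runtime_dependencies.map(&:name).inspect
```

One detail in the diff worth knowing: `s.test_files` is assigned twice, and in Ruby the later assignment (`Dir.glob('spec/**/*')`) replaces the earlier one.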
33 spec/ruby_tika_app_spec.rb
@@ -35,18 +35,51 @@
end
describe "#to_json" do
+ it "header" do
+ rta = RubyTikaApp.new(@test_file)
+ rta.to_json[0..42].should == "{ \"Application\":\"\\u0027Certified by IEEE PD"
+ end
+ it "middle" do
+ rta = RubyTikaApp.new(@test_file)
+ rta.to_json[100 ... 150].should == "h\":171510, \n\"Content-Type\":\"application/pdf\", \n\"Cr"
+ end
end
describe "#to_text" do
+ it "header" do
+ rta = RubyTikaApp.new(@test_file)
+ rta.to_text[0..42].should == "Understanding Graph Sampling Algorithms\nfor"
+ end
+ it "middle" do
+ rta = RubyTikaApp.new(@test_file)
+ rta.to_text[100 ... 150].should == "n Zhang3, Tianyin Xu2\nLong Jin1, Pan Hui4, Beixing"
+ end
end
describe "#to_text_main" do
+ it "header" do
+ rta = RubyTikaApp.new(@test_file)
+ rta.to_text_main[0..42].should == "Understanding Graph Sampling Algorithms for"
+ end
+ it "middle" do
+ rta = RubyTikaApp.new(@test_file)
+ rta.to_text_main[100 ... 150].should == "n Zhang3, Tianyin Xu2 Long Jin1, Pan Hui4, Beixing"
+ end
end
describe "#to_metadata" do
+ it "header" do
+ rta = RubyTikaApp.new(@test_file)
+ rta.to_metadata[0..42].should == "Application: 'Certified by IEEE PDFeXpress "
+ end
+
+ it "middle" do
+ rta = RubyTikaApp.new(@test_file)
+ rta.to_metadata[100 ... 150].should == "Type: application/pdf\nCreation-Date: 2011-03-29T12"
+ end
end
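
Note on spec/ruby_tika_app_spec.rb: the "header" examples slice with `[0..42]` and the "middle" examples with `[100 ... 150]`. The two range forms differ in whether the end index is included, which is why the expected header strings are 43 characters long and the middle strings 50. A small sketch, using a made-up stand-in string rather than real parser output:

```ruby
# Stand-in for parser output; the real specs slice rta.to_text and friends.
sample = ('a'..'z').to_a.join * 10   # 260 characters

header = sample[0..42]      # two dots: end index 42 is included
middle = sample[100...150]  # three dots: end index 150 is excluded

puts header.length  # 43
puts middle.length  # 50
```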