Fixed open4 bundler issue, added README.textfile, HISTORY
mrcsparker committed Nov 30, 2011
1 parent e474b00 commit f0d4f08
Showing 8 changed files with 112 additions and 13 deletions.
7 changes: 7 additions & 0 deletions HISTORY
@@ -0,0 +1,7 @@
0.2 - November 30, 2011
* Fixed open4 bundler issue - a file that needs open4 was being required before add_dependency had run
* Added README info, HISTORY
* Added more tests

0.1 - November 29, 2011
* Initial release
20 changes: 20 additions & 0 deletions LICENSE
@@ -0,0 +1,20 @@
Copyright (c) 2011 Chris Parker

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
2 changes: 0 additions & 2 deletions README

This file was deleted.

45 changes: 45 additions & 0 deletions README.textile
@@ -0,0 +1,45 @@
h1. Ruby Tika Parser

h2. Introduction

This is a simple frontend to the command-line app (jar) of the Java Tika parser.

It is the same as running:
<pre>
java -server -Djava.awt.headless=true -jar tika-app-0.10.jar FileToParse.pdf
</pre>

with options like --xml, --text, etc.

h2. Installation

To install, add the following line to your @Gemfile@, then run @bundle install@:

<pre>
gem 'ruby_tika_app'
</pre>

h3. Note about installation

RubyTikaApp is a fairly large gem because it bundles the tika-app jar file, so it might take a while to install.

h2. Usage

First, you need Java installed and available on your $PATH.

Then:

<pre>
require 'ruby_tika_app'

rta = RubyTikaApp.new("sample_file.pdf")

puts rta.to_xml # <xml output>

# You also get to_json, to_text, to_text_main, and to_metadata

</pre>

h2. Contributing

Fork on GitHub and after you've committed tested patches, send a pull request.
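
The README describes the wrapper as equivalent to running the tika-app jar with options such as --xml or --text. As a rough illustration of that equivalence only, the sketch below builds the same argument list in Ruby. The helper name @tika_command@ and its parameters are hypothetical and are not taken from the gem's source; the jar name is the one quoted in the README.

```ruby
# Hypothetical helper, NOT the gem's actual implementation.
# It only assembles the command line quoted in the README.
def tika_command(file, option = nil, jar: "tika-app-0.10.jar")
  cmd = ["java", "-server", "-Djava.awt.headless=true", "-jar", jar]
  cmd << option if option # e.g. "--xml" or "--text", as listed in the README
  cmd << file
  cmd
end

# Print the assembled command instead of running it, so no Java is needed here.
puts tika_command("FileToParse.pdf", "--xml").join(" ")
```

Keeping the command as an array rather than a single string means it could be handed to a process-spawning call without shell quoting of the file name, which is one reasonable design for a wrapper like this.
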
2 changes: 0 additions & 2 deletions lib/ruby_tika_app.rb
@@ -6,8 +6,6 @@

class RubyTikaApp

VERSION = "0.1"

class Error < RuntimeError; end

class CommandFailedError < Error
3 changes: 0 additions & 3 deletions lib/ruby_tika_app/version.rb

This file was deleted.

13 changes: 7 additions & 6 deletions ruby_tika_app.gemspec
@@ -1,26 +1,27 @@
# -*- encoding: utf-8 -*-
$:.push File.expand_path("../lib", __FILE__)
require "ruby_tika_app"

Gem::Specification.new do |s|
s.name = "ruby_tika_app"
s.version = RubyTikaApp::VERSION
s.version = "0.2"
s.platform = Gem::Platform::RUBY
s.authors = ["Chris Parker"]
s.email = ["mrcsparker@gmail.com"]
s.homepage = "http://github.com"
s.homepage = "https://github.com/mrcsparker/ruby_tika_app"
s.summary = %q{Wrapper around the tika-app jar}
s.description = %q{Wrapper around the tika-app jar}

s.rubyforge_project = "ruby_tika_app"

s.files = `git ls-files`.split("\n")
s.files = `git ls-files`.split("\n") +
%w(LICENSE README.textile HISTORY)
s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
s.require_paths = ["lib"]
s.test_files = Dir.glob('spec/**/*')

s.add_dependency("open4")
s.add_runtime_dependency("open4")

s.add_development_dependency("rspec", "~> 2.7.0")
s.add_development_dependency("bundler", ">= 1.0.15")

end
33 changes: 33 additions & 0 deletions spec/ruby_tika_app_spec.rb
@@ -35,18 +35,51 @@
end

describe "#to_json" do
it "header" do
rta = RubyTikaApp.new(@test_file)
rta.to_json[0..42].should == "{ \"Application\":\"\\u0027Certified by IEEE PD"
end

it "middle" do
rta = RubyTikaApp.new(@test_file)
rta.to_json[100 ... 150].should == "h\":171510, \n\"Content-Type\":\"application/pdf\", \n\"Cr"
end
end

describe "#to_text" do
it "header" do
rta = RubyTikaApp.new(@test_file)
rta.to_text[0..42].should == "Understanding Graph Sampling Algorithms\nfor"
end

it "middle" do
rta = RubyTikaApp.new(@test_file)
rta.to_text[100 ... 150].should == "n Zhang3, Tianyin Xu2\nLong Jin1, Pan Hui4, Beixing"
end
end

describe "#to_text_main" do
it "header" do
rta = RubyTikaApp.new(@test_file)
rta.to_text_main[0..42].should == "Understanding Graph Sampling Algorithms for"
end

it "middle" do
rta = RubyTikaApp.new(@test_file)
rta.to_text_main[100 ... 150].should == "n Zhang3, Tianyin Xu2 Long Jin1, Pan Hui4, Beixing"
end
end

describe "#to_metadata" do
it "header" do
rta = RubyTikaApp.new(@test_file)
rta.to_metadata[0..42].should == "Application: 'Certified by IEEE PDFeXpress "
end

it "middle" do
rta = RubyTikaApp.new(@test_file)
rta.to_metadata[100 ... 150].should == "Type: application/pdf\nCreation-Date: 2011-03-29T12"
end

end
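
The specs above slice the parser output with two different range forms: @[0..42]@ for the "header" checks and @[100 ... 150]@ for the "middle" checks. A minimal sketch of the difference, using a made-up stand-in string rather than real Tika output:

```ruby
# Stand-in data (hypothetical): a 260-character string, not real parser output.
text = ("a".."z").to_a.join * 10

header = text[0..42]     # two dots: the end index is included
middle = text[100...150] # three dots: the end index is excluded

puts header.length # 43
puts middle.length # 50
```

This is why the expected "header" strings in the specs are 43 characters long while the "middle" strings are 50.
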
