Chris Gutenberg #49

Open
wants to merge 2 commits into
base: gutenberg
Binary file added .DS_Store
Binary file not shown.
4 changes: 4 additions & 0 deletions Gemfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
source 'https://rubygems.org'
ruby '2.0.0'

gem 'pry-byebug'
2 changes: 1 addition & 1 deletion data/stopwords.txt
@@ -1 +1 @@
a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,because,been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your,one,out,more,now,first,two,very,such,same,shall,upon,before,therefore,great,made,even,same,work,make,being,through,here,way,true,see,time,those,place,much,without,body,whole,another,thus,set,new,given,both,above,well,part,between,end,order,each,form,gutenberg
a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,because,been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your,one,out,more,now,first,two,very,such,same,shall,upon,before,therefore,great,made,even,same,work,make,being,through,here,way,true,see,time,those,place,much,without,body,whole,another,thus,set,new,given,both,above,well,part,between,end,order,each,form,gutenberg,
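The only change to data/stopwords.txt is a trailing comma. This diff does not show how the file is read, so the following is a minimal sketch that assumes it is split on commas. The helper name `parse_stopwords` and the shortened sample lines are hypothetical. Ruby's `String#split` drops trailing empty fields by default, so under that assumption the extra comma should not introduce an empty stop word, and a `Set` also absorbs the duplicated `same` entry.

```ruby
require 'set'

# Hypothetical helper: turn one comma-separated line into a Set of stop words.
def parse_stopwords(line)
  Set.new(line.strip.split(',').map(&:strip).reject(&:empty?))
end

# Shortened stand-ins for the old and new file contents.
old_line = "even,same,work,same,form,gutenberg"
new_line = "even,same,work,same,form,gutenberg,"

puts parse_stopwords(old_line) == parse_stopwords(new_line)
puts parse_stopwords(new_line).include?('')
```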
1 change: 1 addition & 0 deletions gutenberg.rb
@@ -1,6 +1,7 @@
require_relative 'lib/simple_predictor'
require_relative 'lib/complex_predictor'


def run!(predictor_klass, opts={})
  puts "+----------------------------------------------------+"
  puts "| #{predictor_klass}#{" " * (51 - predictor_klass.to_s.size)}|"
54 changes: 52 additions & 2 deletions lib/complex_predictor.rb
@@ -1,12 +1,37 @@
require_relative 'predictor'
require 'pry-byebug'

class ComplexPredictor < Predictor
  # Public: Trains the predictor on books in our dataset. This method is called
  # before the predict() method is called.
  #
  # Returns nothing.
  def train!
    # For each category, count how often each good token appears across all of
    # its books, then keep only the most frequent words.
    @data = {}

    @all_books.each do |category, books|
      counts = Hash.new(0)
      books.each do |filename, tokens|
        tokens.each do |token|
          counts[token] += 1 if good_token?(token)
        end
      end

      # Keep words that occur more often than the 101st most frequent word.
      # With 100 or fewer distinct words there is no cutoff, so keep them all.
      sorted = counts.sort_by { |_word, count| -count }
      limit = sorted[100] ? sorted[100].last : 0
      @data[category] = counts.select { |_word, count| count > limit }
    end
  end

  # Public: Predicts category.
@@ -15,8 +40,33 @@ def train!
  #
  # Returns a category.
  def predict(tokens)
    # Count how often each good token appears in the input.
    input_words = Hash.new(0)
    tokens.each do |token|
      input_words[token] += 1 if good_token?(token)
    end

    # For each category, total the input occurrences of words that are also in
    # that category's trained word list.
    match_words = Hash.new(0)
    @data.each do |category, word_hash|
      word_hash.each_key do |word|
        match_words[category] += input_words[word] if input_words.key?(word)
      end
    end

    # Without any match there is nothing to rank, so keep the earlier
    # placeholder answer of :astronomy.
    return :astronomy if match_words.empty?

    # Return the category with the highest number of matches.
    match_words.max_by { |_category, total| total }.first
  end

end

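The approach in lib/complex_predictor.rb (count good tokens per category, keep the most frequent ones, then score an input by its overlap with each category's list) can be tried in isolation. The following is only a sketch under stated assumptions, not the code in this PR. `STOP_WORDS`, this `good_token?`, `top_words`, and the tiny sample model are hypothetical stand-ins, because the `Predictor` base class and the book data are not part of this diff.

```ruby
# Hypothetical stand-in for the stop-word check; the real good_token? lives in
# the Predictor base class, which this diff does not show.
STOP_WORDS = %w[the a of and].freeze

def good_token?(token)
  !STOP_WORDS.include?(token)
end

# Count good tokens, then keep at most the top_n most frequent words.
def top_words(tokens, top_n)
  counts = Hash.new(0)
  tokens.each { |token| counts[token] += 1 if good_token?(token) }
  Hash[counts.sort_by { |_word, count| -count }.first(top_n)]
end

# Score each category by how many input tokens appear in its word list.
# Returns nil when no category matches at all.
def predict(model, tokens)
  scores = Hash.new(0)
  tokens.each do |token|
    model.each { |category, words| scores[category] += 1 if words.key?(token) }
  end
  best = scores.max_by { |_category, score| score }
  best && best.first
end

model = {
  astronomy: top_words(%w[the star orbits a star and the planet], 3),
  religion:  top_words(%w[the prayer and the faith of prayer], 3)
}

puts predict(model, %w[a star and a planet]).inspect
```

Starting the counters at zero (`Hash.new(0)`) avoids the off-by-one that `||= 1` followed by `+= 1` would introduce, and returning nil for an input with no matches leaves the choice of fallback to the caller.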
23 changes: 23 additions & 0 deletions vendor/bundle/bin/byebug
@@ -0,0 +1,23 @@
#!/usr/bin/env ruby
#
# This file was generated by RubyGems.
#
# The application 'byebug' is installed as part of a gem, and
# this file is here to facilitate running it.
#

require 'rubygems'

version = ">= 0"

if ARGV.first
  str = ARGV.first
  str = str.dup.force_encoding("BINARY") if str.respond_to? :force_encoding
  if str =~ /\A_(.*)_\z/
    version = $1
    ARGV.shift
  end
end

gem 'byebug', version
load Gem.bin_path('byebug', 'byebug', version)
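The generated stub above treats a first argument wrapped in underscores (for example `_2.7.0_`) as a version requirement and removes it from `ARGV` before loading the executable. The following sketch isolates that check in a hypothetical helper, `split_version`, which works on a copy of the argument list instead of mutating `ARGV`. It is an illustration of the logic, not part of RubyGems.

```ruby
# Hypothetical helper mirroring the stub's ARGV check.
# Returns [version_requirement, remaining_args].
def split_version(argv)
  args = argv.dup
  version = ">= 0"
  first = args.first
  if first && first =~ /\A_(.*)_\z/
    version = $1
    args.shift
  end
  [version, args]
end

p split_version(%w[_2.7.0_ script.rb])
p split_version(%w[script.rb])
```

The first call yields the pinned version `"2.7.0"` with `["script.rb"]` left over. The second keeps the default requirement `">= 0"` and leaves the arguments unchanged.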
23 changes: 23 additions & 0 deletions vendor/bundle/bin/coderay
@@ -0,0 +1,23 @@
#!/usr/bin/env ruby
#
# This file was generated by RubyGems.
#
# The application 'coderay' is installed as part of a gem, and
# this file is here to facilitate running it.
#

require 'rubygems'

version = ">= 0"

if ARGV.first
  str = ARGV.first
  str = str.dup.force_encoding("BINARY") if str.respond_to? :force_encoding
  if str =~ /\A_(.*)_\z/
    version = $1
    ARGV.shift
  end
end

gem 'coderay', version
load Gem.bin_path('coderay', 'coderay', version)
23 changes: 23 additions & 0 deletions vendor/bundle/bin/pry
@@ -0,0 +1,23 @@
#!/usr/bin/env ruby
#
# This file was generated by RubyGems.
#
# The application 'pry' is installed as part of a gem, and
# this file is here to facilitate running it.
#

require 'rubygems'

version = ">= 0"

if ARGV.first
  str = ARGV.first
  str = str.dup.force_encoding("BINARY") if str.respond_to? :force_encoding
  if str =~ /\A_(.*)_\z/
    version = $1
    ARGV.shift
  end
end

gem 'pry', version
load Gem.bin_path('pry', 'pry', version)
1 change: 1 addition & 0 deletions vendor/bundle/build_info/byebug-2.7.0.info
@@ -0,0 +1 @@

1 change: 1 addition & 0 deletions vendor/bundle/build_info/coderay-1.1.0.info
@@ -0,0 +1 @@

1 change: 1 addition & 0 deletions vendor/bundle/build_info/columnize-0.8.9.info
@@ -0,0 +1 @@

1 change: 1 addition & 0 deletions vendor/bundle/build_info/debugger-linecache-1.2.0.info
@@ -0,0 +1 @@

1 change: 1 addition & 0 deletions vendor/bundle/build_info/method_source-0.8.2.info
@@ -0,0 +1 @@

1 change: 1 addition & 0 deletions vendor/bundle/build_info/pry-0.10.0.info
@@ -0,0 +1 @@

1 change: 1 addition & 0 deletions vendor/bundle/build_info/pry-byebug-1.3.3.info
@@ -0,0 +1 @@

1 change: 1 addition & 0 deletions vendor/bundle/build_info/slop-3.6.0.info
@@ -0,0 +1 @@

18 changes: 18 additions & 0 deletions vendor/bundle/gems/byebug-2.7.0/.gitignore
@@ -0,0 +1,18 @@
tmp
pkg
doc

.rvmrc
.ruby-version
.ruby-gemset
.gdbinit
.gdb_history
.bundle
.tags
.ackrc
.jrubyrc

Gemfile.lock

lib/byebug/byebug.so
lib/byebug/byebug.bundle
8 changes: 8 additions & 0 deletions vendor/bundle/gems/byebug-2.7.0/.travis.yml
@@ -0,0 +1,8 @@
before_script: bundle exec rake compile
rvm:
  - 2.0.0
  - 2.1.0
  - ruby-head
matrix:
  allow_failures:
    - rvm: ruby-head