Commit

misc fixes, achieves 82% accuracy on corpus words
gkovacs committed May 16, 2011
1 parent 29bd2d8 commit 707d200
Showing 3 changed files with 20 additions and 16 deletions.
4 changes: 2 additions & 2 deletions japanese-rules.lex
@@ -1,5 +1,5 @@
Begin: Root
-Root: DECISION_ROOT PARTICLE_ROOT DEMONSTRATIVE_ROOT CONJUNCTION_ROOT PRENOUNADJECTIVAL_ROOT AUXILIARY_ROOT ADVERB_ROOT YOI_ADJ_ROOT I_ADJ_ROOT VerbRoot INTERJECTION_ROOT NA_ADJ_ROOT TARU_ADJ_ROOT NounRoot PREFIX_ROOT
+Root: DECISION_ROOT PARTICLE_ROOT AUXILIARY_ROOT DEMONSTRATIVE_ROOT CONJUNCTION_ROOT ADVERB_ROOT PRENOUNADJECTIVAL_ROOT YOI_ADJ_ROOT I_ADJ_ROOT VerbRoot INTERJECTION_ROOT NA_ADJ_ROOT TARU_ADJ_ROOT NounRoot PREFIX_ROOT

NounRoot: NUMBER NOUN_ROOT
VerbRoot: SURU_V_ROOT NonSuruVerbRoot HONORIFIC_REG_VERB
@@ -14,8 +14,8 @@ PREFIX_SUFFIX:
'' NOUN_ROOT

DECISION_ROOT:
-'' PROBABLY POS:Decision;;BASE:だ
+'' NA_ADJ_NOUN_DESU POS:Decision;;BASE:だ
な End POS:Decision;;BASE:だ

HONORIFIC_REG_VERB:
お NonSuruVerbRoot_HONORIFIC
2 changes: 1 addition & 1 deletion successrate.rb
@@ -25,7 +25,7 @@ def printSuccessRate(inputLines)
if total == 0
"0 (0%)"
else
-"#{v} (#{(v*100/total)}%)"
+"#{v} (#{(v*100.0/total).round}%)"
end
}
puts "total w: #{total}"
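
Note on the hunk above: the change replaces integer division with float division followed by rounding, so percentages are reported to the nearest whole number instead of being truncated. A minimal sketch of the difference, using a hypothetical helper `format_rate` that is not part of successrate.rb:

```ruby
# Hypothetical helper (not from successrate.rb): formats a count as "v (p%)".
def format_rate(v, total)
  return "0 (0%)" if total == 0
  # Float division, then round to the nearest whole percent.
  "#{v} (#{(v * 100.0 / total).round}%)"
end

truncated = 2 * 100 / 3            # Integer division truncates toward zero: 66
rounded   = (2 * 100.0 / 3).round  # Float division, then round: 67

puts "truncated=#{truncated} rounded=#{rounded}"
puts format_rate(2, 3)             # prints "2 (67%)"
```

With integer operands, Ruby's `/` discards the fractional part, which is why the old expression could under-report a rate by up to one percentage point.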
30 changes: 17 additions & 13 deletions word-pos-in-corpus.rb
@@ -7,23 +7,27 @@
word = ARGV[0]

posCounts = {}
-partsOfSpeech.each { |pos| posCounts[pos] = 0 }
+partsOfSpeech.each { |pos| posCounts[pos] = {} }
File.open("corpus/corpus-allwords-base-pos.txt").each { |line|
spl = line.split(" ")
-baseform = spl[1]
-if baseform != word
+conjform = spl[0]
+if conjform != word
next
end
+baseform = spl[1]
pos = spl[2]
-posCounts[pos] += 1
-}
-maxcount = 0
-bestpos = ""
-posCounts.each { |pos,count|
-if count > maxcount
-bestpos = pos
-maxcount = count
+if !posCounts[pos].include?(baseform)
+posCounts[pos][baseform] = 0
end
+posCounts[pos][baseform] += 1
}
+counts = []
+posCounts.each { |pos,bfcount|
+bfcount.each { |baseform,count|
+counts.push([count, baseform, pos])
+}
+}
+counts.sort! {|a,b| b[0] <=> a[0] }
+counts.each { |count, baseform, pos|
+puts baseform + " " + pos + " " + count.to_s
+}
-puts bestpos
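
Note on the hunk above: the script now matches on the conjugated (surface) form in column 0 and tallies every (base form, part of speech) pair, printing them by descending count instead of a single best part of speech. A self-contained sketch of that tallying, assuming the same space-separated `conjform baseform pos` line layout; the function name `tally_base_pos` and the sample lines are hypothetical, and `Hash.new(0)` stands in for the pre-seeded `posCounts` table:

```ruby
# Hypothetical function (not in word-pos-in-corpus.rb).
# lines: strings of the form "conjform baseform pos"
# word:  the conjugated (surface) form to look up
def tally_base_pos(lines, word)
  counts = Hash.new(0)  # default of 0 replaces the explicit include? check
  lines.each do |line|
    conjform, baseform, pos = line.split(" ")
    next unless conjform == word
    counts[[baseform, pos]] += 1
  end
  # Most frequent (baseform, pos) pair first.
  counts.sort_by { |_, count| -count }
        .map { |(baseform, pos), count| [baseform, pos, count] }
end

# Made-up sample lines, for illustration only.
sample = [
  "行っ 行く Verb",
  "行っ 行う Verb",
  "行っ 行く Verb",
  "見 見る Verb",
]

tally_base_pos(sample, "行っ").each do |baseform, pos, count|
  puts "#{baseform} #{pos} #{count}"
end
```

Unlike the committed script, this sketch does not fail on a part of speech missing from `partsOfSpeech`, since it never indexes into a pre-seeded table.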
