Rikaikun Copy-Paste File Cleaner
A simple command-line utility to clean up files created by copy-pasting data from Rikaikun.
Rikaikun (https://code.google.com/p/rikaikun/) is a great Chrome extension that looks up Japanese words on the fly when you hover over them with your cursor. Once you have looked up a word, you can copy its entry to your clipboard by pressing "c", and then paste it into a text file.
Rikaikun provides you with a great deal of information for each lookup. A sample file generated from two lookup-copy-pastes is shown below (lines snipped for display):
通報 つうほう (n,vs) report; tip; bulletin; ...
通 つう (adj-na,n,ctr) connoisseur; ...
中西部 ちゅうせいぶ (n) Mid-west
中 うち (n,adj-no,pn,arch) inside; within; ...
中 なか (n) inside; in; among; within; ...
中 じゅう (suf) through; throughout; ...
中 ちゅう (suf,abbr,n-suf) medium; average; ...
The above was generated by looking up "通報" and "中西部", and hitting "c" to copy the data from Rikaikun.
Rikaiklean is a simple script that gets rid of the extra entries Rikaikun adds for words that were not actually looked up (presumably, I am not interested in things that I did not look up), and outputs a condensed version of the same file to the console. The file above would be condensed to the following:
通報 つうほう (n,vs) report; tip; bulletin; ...
中西部 ちゅうせいぶ (n) Mid-west
This can be redirected to a file, and then imported into an SRS (Spaced Repetition Software, such as Anki).
Rikaiklean does a few other small operations, such as combining spelling variations and pronunciations. See test/test_input.txt for a test file and the output.
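The condensing step described above can be sketched roughly as follows. This is only an illustration, not the code in main.rb: the method name condense is hypothetical, and the rule it applies (drop an entry whose headword is a shorter leading part of the most recently kept headword) is an assumption inferred from the sample input and output. It does not attempt the merging of spelling variations and pronunciations.

```ruby
# Hypothetical helper, not taken from main.rb.
# Assumption: an extra entry is one whose headword is a shorter, leading
# part of the headword that was kept most recently (通 after 通報).
def condense(lines)
  kept = []
  last_headword = nil

  lines.each do |line|
    entry = line.strip
    next if entry.empty?

    # The headword is everything up to the first run of whitespace.
    headword = entry.split(/[[:space:]]+/, 2).first

    # Skip sub-entries of the word that was actually looked up.
    if last_headword &&
       headword.length < last_headword.length &&
       last_headword.start_with?(headword)
      next
    end

    kept << entry
    last_headword = headword
  end

  kept
end

sample = [
  "通報 つうほう (n,vs) report; tip; bulletin; ...",
  "通 つう (adj-na,n,ctr) connoisseur; ...",
  "中西部 ちゅうせいぶ (n) Mid-west",
  "中 うち (n,adj-no,pn,arch) inside; within; ...",
  "中 なか (n) inside; in; among; within; ...",
  "中 じゅう (suf) through; throughout; ...",
  "中 ちゅう (suf,abbr,n-suf) medium; average; ..."
]

puts condense(sample)
```

Run against the sample lines shown earlier, this sketch keeps only the 通報 and 中西部 entries. Entries that share the same headword as the one just kept are left in place, since the rule only drops strictly shorter headwords.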
Usage
Installation and Set Up
Other than installing Ruby, and perhaps making main.rb executable, there is nothing to set up.
Usage
A sample run:
$ ruby main.rb test/test_input.txt
Rikaiklean was written and tested on a Mac (ruby 2.0.0p481 (2014-05-08 revision 45883) [universal.x86_64-darwin13]). It does not use any additional Ruby gems.