2085e4e @gpeterson2 Update to use the observer pattern.

The ultimate goal of this project is to feed a list of Japanese words into a
program and get a list of translations back out.

The first step is to read a Japanese translation dictionary into a format that
can then be queried. The next step is to create something that breaks
Japanese text into words, feeds those words into the dictionary, and prints
out the results.
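
A minimal sketch of that pipeline, under a few stated assumptions: it uses
Python's standard-library xml.etree.ElementTree rather than lxml (lxml.etree
offers a largely compatible API), the inline sample is hand-written using
JMdict's element names (entry, keb, reb, gloss), and split_words is a
hypothetical placeholder. Japanese text has no spaces, so a real tokenizer
would be needed in its place.

```python
import xml.etree.ElementTree as ET

# Hand-written sample using JMdict element names. The real file is far larger
# and declares entities in a DTD, which this sketch ignores.
SAMPLE = """<JMdict>
  <entry>
    <k_ele><keb>猫</keb></k_ele>
    <r_ele><reb>ねこ</reb></r_ele>
    <sense><gloss>cat</gloss></sense>
  </entry>
  <entry>
    <k_ele><keb>犬</keb></k_ele>
    <r_ele><reb>いぬ</reb></r_ele>
    <sense><gloss>dog</gloss></sense>
  </entry>
</JMdict>"""


def build_index(xml_text):
    """Map every kanji (keb) and kana (reb) form to its list of glosses."""
    index = {}
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        glosses = [g.text for g in entry.iter("gloss")]
        for element in entry.iter():
            if element.tag in ("keb", "reb"):
                index.setdefault(element.text, []).extend(glosses)
    return index


def split_words(text):
    # Hypothetical placeholder: splits on whitespace only.
    return text.split()


def translate(text, index):
    """Return (word, glosses) pairs; unknown words get an empty list."""
    return [(word, index.get(word, [])) for word in split_words(text)]


if __name__ == "__main__":
    index = build_index(SAMPLE)
    for word, glosses in translate("猫 いぬ 鳥", index):
        print(word, glosses)
```

Indexing both the kanji and kana forms means a word can be looked up however
it happens to be written in the input text.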

My original approach was to insert the contents of the JMdict Japanese
translation file into a SQLite database, hoping that I could then use SQL
syntax to make searching easier.
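
A rough sketch of what a normalized layout and a SQL lookup could look like,
using Python's built-in sqlite3 module. The table and column names here are
assumptions for illustration, not the project's actual schema.

```python
import sqlite3

# Hypothetical normalized schema: one row per entry, with kanji forms,
# kana readings and English glosses in separate child tables.
SCHEMA = """
CREATE TABLE entry   (id INTEGER PRIMARY KEY);
CREATE TABLE kanji   (entry_id INTEGER REFERENCES entry(id), keb TEXT);
CREATE TABLE reading (entry_id INTEGER REFERENCES entry(id), reb TEXT);
CREATE TABLE gloss   (entry_id INTEGER REFERENCES entry(id), english TEXT);
"""


def create_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # One sample entry: 猫 / ねこ / "cat".
    conn.execute("INSERT INTO entry (id) VALUES (1)")
    conn.execute("INSERT INTO kanji VALUES (1, '猫')")
    conn.execute("INSERT INTO reading VALUES (1, 'ねこ')")
    conn.execute("INSERT INTO gloss VALUES (1, 'cat')")
    conn.commit()
    return conn


def lookup_normalized(conn, word):
    """Find glosses for a word written in either kanji or kana."""
    rows = conn.execute(
        """
        SELECT DISTINCT g.english
        FROM entry e
        LEFT JOIN kanji k   ON k.entry_id = e.id
        LEFT JOIN reading r ON r.entry_id = e.id
        JOIN gloss g        ON g.entry_id = e.id
        WHERE k.keb = ? OR r.reb = ?
        """,
        (word, word),
    )
    return [english for (english,) in rows]


conn = create_db()
print(lookup_normalized(conn, "ねこ"))  # ['cat']
```

Even this single lookup joins four tables, and DISTINCT is needed because an
entry with several kanji forms and several readings produces one joined row
per combination.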

Inserting the data into a SQLite database was relatively easy, despite some
initial issues with SQLAlchemy. SQLAlchemy may eventually be useful, but the
insert queries it ran took hours to complete. I've since gotten the load down
to a couple of minutes, but the joins required on fully normalized data made
lookups slower than reading the XML file directly. I was in the process of
creating a single warehouse table before getting distracted by other things.
That would still allow the SQLite file to be a cross-platform data file, but
it would lose
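
A sketch of a single denormalized "warehouse" table, assuming one row per
(form, gloss) pair; the table layout and function names are hypothetical. It
also shows a common way to speed up bulk loads (not necessarily what this
project does): insert all rows with executemany inside one transaction, and
create the index after loading.

```python
import sqlite3


def build_warehouse(rows):
    """rows: iterable of (form, gloss) pairs, e.g. flattened from JMdict."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE warehouse (form TEXT, gloss TEXT)")
    # 'with conn' commits once at the end of the block instead of once per row.
    with conn:
        conn.executemany("INSERT INTO warehouse VALUES (?, ?)", rows)
    # Indexing after the bulk load is often faster than updating it per insert.
    conn.execute("CREATE INDEX idx_warehouse_form ON warehouse (form)")
    return conn


def lookup_flat(conn, form):
    """Single-table lookup: no joins needed."""
    cursor = conn.execute(
        "SELECT gloss FROM warehouse WHERE form = ? ORDER BY rowid", (form,)
    )
    return [gloss for (gloss,) in cursor]


rows = [("猫", "cat"), ("ねこ", "cat"), ("犬", "dog"), ("いぬ", "dog")]
conn = build_warehouse(rows)
print(lookup_flat(conn, "いぬ"))  # ['dog']
```

The trade-off is duplication: each gloss is stored once per form rather than
once per entry.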

I don't necessarily want to scrap that idea entirely, but for any data
analysis I may try other database backends instead.

The current goal is still to read the dictionary file and convert it into
some kind of non-XML store that can be read in or queried quickly. I haven't
worked out any other specifics yet.
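
As one possible non-XML store (an assumption, since no format has been chosen
yet), the parsed dictionary could be saved as a JSON file mapping each word
form to its glosses and loaded back with a single call. The function and file
names below are hypothetical. Passing encoding="utf-8" explicitly avoids
depending on the platform's default encoding, which on Windows may not be
UTF-8.

```python
import json
import os
import tempfile


def save_index(index, path):
    # ensure_ascii=False writes Japanese text as-is instead of \u escapes.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False)


def load_index(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


index = {"猫": ["cat"], "ねこ": ["cat"], "犬": ["dog"]}
path = os.path.join(tempfile.mkdtemp(), "jmdict_index.json")
save_index(index, path)
print(load_index(path) == index)  # True
```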

The current project setup is a little cluttered. At some point it will have to
be cleaned up.

Required packages:
- lxml
- SQLAlchemy - for database setup (this requirement should eventually be
removed, or at least moved, as not all stores are going to need it).

TODO:
- Create a means of querying a data store.
- Develop companion readers/writers for each existing type. Ideally you will
be able to read in anything that has been written out, and write out anything
that has been read in.
- Figure out why SQLite on Windows isn't saving the data as Unicode, or
whether it is just a console issue.
- Perhaps move some of the SQLite normalized-table information into the
reader, as it is currently an extra step, although it may only be useful for
SQL stores.
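
The companion reader/writer goal can be checked mechanically: for each store,
reading back whatever the writer produced should return the original data.
The sketch below assumes a hypothetical Store interface and a tab-separated
implementation; neither is existing project code, and the TSV format assumes
glosses never contain tabs or newlines.

```python
import abc
import os
import tempfile


class Store(abc.ABC):
    """Hypothetical interface pairing a writer with its companion reader."""

    @abc.abstractmethod
    def write(self, entries, path): ...

    @abc.abstractmethod
    def read(self, path): ...


class TsvStore(Store):
    """One line per entry: the form, then its glosses, separated by tabs."""

    def write(self, entries, path):
        # Explicit UTF-8 so the platform's default encoding is never used.
        with open(path, "w", encoding="utf-8", newline="\n") as f:
            for form, glosses in entries.items():
                f.write("\t".join([form, *glosses]) + "\n")

    def read(self, path):
        entries = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                form, *glosses = line.rstrip("\n").split("\t")
                entries[form] = glosses
        return entries


def round_trips(store, entries):
    """True if reading back what was written yields the same entries."""
    path = os.path.join(tempfile.mkdtemp(), "store.out")
    store.write(entries, path)
    return store.read(path) == entries


print(round_trips(TsvStore(), {"猫": ["cat"], "犬": ["dog", "hound"]}))  # True
```

A similar round-trip check run against the SQLite store, comparing Python
strings rather than console output, could help tell a storage problem apart
from a display problem.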