TODO.txt
Algorithm::AM
Right now: change the Project tests to use the add_data and add_test methods instead of using files where possible.
check that data, test, and outcome lines are formatted properly; don't just assume that they are (the above change will make this easier).
perhaps require/allow user to input number of variables in their vectors in the constructor?
The whole long/short thing is broken. You should not be allowed to have more than one short/long combination.
Using a Project by itself is bad design. Create another module to hold a dataset. Project can manage 2 datasets (data and test). This would be much more flexible.
error checking for getting data at a specific index; make sure it's a legal index.
possibly create a DataSet class to pass to new Projects.
change $self->{data/outcome/spec} to $self->{data}->{data/outcome/spec}. $self->{outcome/outcomes} are too similarly named.
Use undef instead of increment for making unique string lists using hashes.
It seems Theron expected no more than 60 data columns. Maybe warn somewhere if more than that are used.
require user to create and pass in the Project object?
- this would allow us to change commas to boolean and still check for non-existence (more gracefully than otherwise).
change commas to be either 1 or 0
factor out data line parsing into a single routine
Create Algorithm::AM::Datum object to pass to hooks. Add accessors that make the use of outcometonum, outcomelist, etc. obsolete. They are gross.
Make a single method for reading a data/test file, and move the creation of the format variables to their own loop. Or decide not to.
Change the POD to use Pod::Weaver properly (=method, etc.).
Find the reasons for the Perl 5.14 minimum and see whether that is reasonable.
Create tests for each of the example programs in the documentation. This way they are forced to be updated as the API is updated.
change activeVars to be accepted at classification time
ask someone about expected outcome file format
- finnverb outcome file does not have unique long outcomes to match unique short ones
- short outcome column seems redundant with data file
work on types of input to AM dataset
- allow user to specify input format
  - specify bigsep, littlesep
  - specify comment character
- allow user to pass in actual data instead of using project file.
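The items about factoring out data line parsing and letting the user specify bigsep, littlesep, and a comment character could be combined into one routine. What follows is only a rough sketch, not the module's actual parser: the name `parse_data_line`, the option names, the default separators, and the treatment of the comment character as "ignore the rest of the line" are all assumptions made for illustration. The fallback of the specifier to the data string follows the documentation note further down in this file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::AM): split one data line
# into outcome, feature list, and specifier. Defaults are assumptions.
sub parse_data_line {
    my ($line, %opt) = @_;
    my $bigsep    = $opt{bigsep}    // qr/\s*,\s*/;   # separates the fields
    my $littlesep = $opt{littlesep} // qr/\s+/;       # separates the features
    my $comment   = $opt{comment}   // '#';           # assumed comment marker

    chomp $line;
    # Drop everything from the comment character on, then trim whitespace.
    $line =~ s/\Q$comment\E.*\z//s;
    $line =~ s/\A\s+|\s+\z//g;
    return if $line eq '';    # blank or comment-only line

    my ($outcome, $features, $spec) = split $bigsep, $line, 3;
    die "Malformed data line: '$line'\n"
        unless defined $features && length $features;

    my @features = split $littlesep, $features;
    # The specifier is the data itself unless one is given.
    $spec = $features unless defined $spec && length $spec;

    return { outcome => $outcome, features => \@features, spec => $spec };
}

my $item = parse_data_line("e , a b c , myspec\n");
print "$item->{outcome}: @{ $item->{features} } ($item->{spec})\n";
```

Because the routine dies on a malformed line instead of guessing, the same call could also serve the "check that lines are formatted properly" item, for data and test files alike.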
properly destroy project or AM object on error so that illegal state is not possible even if someone catches an error and tries to continue.
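For the earlier item about using undef instead of increment when building unique string lists, a minimal sketch of the idiom is shown here. The function name `unique_strings` is made up for the example. The point is that only the hash keys matter, so there is no need to keep a counter as the value.

```perl
use strict;
use warnings;

# Return the distinct strings in the argument list, in first-seen order.
sub unique_strings {
    my %seen;
    my @unique;
    for my $item (@_) {
        next if exists $seen{$item};   # exists() is true even for undef values
        $seen{$item} = undef;          # store the key only; no counter needed
        push @unique, $item;
    }
    return @unique;
}

# When order does not matter, a hash slice does the same in one step.
my %set;
@set{qw(a b a c b)} = ();
my @any_order = keys %set;

print join(',', unique_strings(qw(a b a c b))), "\n";
```

Note that membership tests must then use `exists $seen{$x}` rather than `$seen{$x}`, since the stored values are undef and therefore false.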
## Printing TODOs:
- Split print_summary into two methods: print_config and print_data_stats
- Make printing clearer (label gangs, etc.)
- output something about the number of features in the vectors contained in a data file (I think this is done. Just make it clearer?)
- Refactor gangs param into two different boolean params
- change logger to Log::Any
- return classification result objects that have print methods, instead of just printing everything
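As a possible shape for the last item above, the sketch below returns a small result object that can render itself as a string, so the caller decides whether anything is printed. Everything here is hypothetical: the package name `My::AM::Result`, the `scores` constructor argument, and the report layout are invented for illustration and do not reflect the existing API. Scores are treated as plain integers, which may not hold for very large pointer counts.

```perl
use strict;
use warnings;

# Hypothetical result class; the package block syntax needs Perl 5.14+.
package My::AM::Result {
    sub new {
        my ($class, %args) = @_;
        # Copy the outcome => score map so later changes by the caller
        # do not affect the result object.
        my $self = { scores => { %{ $args{scores} || {} } } };
        return bless $self, $class;
    }

    # Accessor: hash ref of outcome => score.
    sub scores { return $_[0]{scores} }

    # Build the report as a string instead of printing it directly.
    sub as_string {
        my ($self) = @_;
        my $scores = $self->{scores};
        my $total  = 0;
        $total += $_ for values %$scores;
        my @lines;
        for my $outcome (sort keys %$scores) {
            my $pct = $total ? 100 * $scores->{$outcome} / $total : 0;
            push @lines, sprintf '%-10s %6d %7.3f%%',
                $outcome, $scores->{$outcome}, $pct;
        }
        return join("\n", @lines) . "\n";
    }

    # Convenience wrapper for callers that do want output.
    sub print_summary {
        my ($self, $fh) = @_;
        $fh ||= \*STDOUT;
        print {$fh} $self->as_string;
        return;
    }
}

my $result = My::AM::Result->new(scores => { e => 3, r => 1 });
$result->print_summary;
```

Tests can then compare `as_string` (or the `scores` hash) directly instead of capturing printed output.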
## Other TODOs:
- update HISTORY pod section in AM.pm
-Figure out good project organization to allow dual builders (MB and EUMM)
-Create an AM old stuff branch
-make sure decisions on what is in $self and what is in $data are good
so far, $self has options and $data has info about the input data
Not so! Now $self has @data, @outcome, and @spec
It'd be better if $data had classification information (a result object), another variable had per-iteration information, and $self had everything else
Create accessors for $self
-Eliminate outcometonum
- using an accessor of some kind
-get rid of difficult (but extremely cool) __DATA__ hack
At first, put __DATA__ in a separate file for easier browsing
Maybe use Text::Template or Template Toolkit instead of hand markings
## Eventual refactoring plan:
Algorithm::AM object should be a single classifier, not a batch runner.
classify() method should classify a single vector
Should move iteration, and iter vars, into a separate class.
This will make testing much easier, since we can then simply return classification outcomes in a single variable and test it, instead of testing printed output or accumulating outcomes and testing them.
Don't print as we go; provide print methods or a verbose option
also provide method for grabbing format variables
pass the AM object to hooks so that these are usable
I think beginning vars should just be in the AM object;
format vars should also be in the AM object
usage of outcometonum should be replaced with a method
@sum should be replaced with something with index starting at 0
possibly rename to "subtotals"
I think endvars should be a return value besides being available in the hook.
should probably also return the entire analogical set...
## Documentation TODOs:
- Write guide on porting old AM code.
  - classify, not ->()
  - variables
  - running batches
  - bigcmp
  - parameters
  - projects
-Read through documentation, update/clean if necessary
Include Wikipedia pictures
-Define "specifier" early on in the documentation. Give the anatomy of a data file.
-Mention that the specifier is the data unless otherwise specified
-remove references to 'the red book'; put all documentation in the code!