Download classifications from TAMS database

This package will download the classifications assigned to signature data given a detector and a date range.

Necessary upgrades

This code was run to completion in July of 2017, processing all of the database archive tables through June of 2017.

To run for new archive tables, this code needs to be modified as follows.

First, all of the tables that were already processed are listed in the file ./data/cached_tables.json. Second, I have processed the out.txt file and have pulled the detectors that were processed through June of 2017 archive data into the beginning of the file get_classifications.js.

Right now, without any edits, re-running get_classifications.js will not do anything, as all of "cached" tables and all of the detectors have already been processed.

If you delete the "cached" list of tables (not really recommended as it takes forever to generate it), then the code will revisit all of the archive tables and extract the first and last timestamp for each detector stored in each table. This will have the effect of picking up new tables.

If you delete the "finished detectors" list (or empty it out), then the code will re-process each detector. Again, not recommended.

Fix one: process new tables

The first fix is to hack the code to first load the cached_tables.json file, figure out all of the DB tables that have already been processed, and then pull in and process new ones. At the end of this step, a new cached_tables.json file should be written out.

Fix two: process new tables only

Along with the above fix, rather than relying on a hand-coded list of finished detectors (which was a hack I did for speed of implementation), a better approach is to look at the data files in the ./data/ directory, and generate a list of (database table, detector id) pairs that have already been processed. Using this as the "finished" list is better, because then only unfinished entries in the cached_tables.json file need to be processed.

Fix three: add processing state to `cached_tables.json`

Come to think of it, a better approach all around will be to add the "completed" state to the dumped cached_tables.json file. Because programs can crash, every time a detector and table are processed and written out to a data file, the completion state should be added to the cached tables object and that object should be written out to the filesystem. Then then next run can just pick up work by doing only those tables and detectors that have not yet been marked as done.

Stopping work on this

As of July 20, 2017, I (James Marca) am stopping work on this code.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
lib		lib
sql		sql
test		test
.gitignore		.gitignore
.travis.yml		.travis.yml
README.md		README.md
finished.txt		finished.txt
get_classifications.js		get_classifications.js
index.js		index.js
out.txt		out.txt
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Download classifications from TAMS database

Necessary upgrades

Fix one: process new tables

Fix two: process new tables only

Fix three: add processing state to `cached_tables.json`

Stopping work on this

About

Releases

Packages

Languages

jmarca/tams_classifications

Folders and files

Latest commit

History

Repository files navigation

Download classifications from TAMS database

Necessary upgrades

Fix one: process new tables

Fix two: process new tables only

Fix three: add processing state to cached_tables.json

Stopping work on this

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Fix three: add processing state to `cached_tables.json`

Packages