Refactor config parsing throughout. Introduce topicexplorer.config module. #150
There is some config parsing code from the work @adithyant did last spring, and it is already in the main branch via the code merge.

This should be my weekend project for May 6-7, 2017.

The goal here is to remove the …
I've made some good progress here on …

Below is a mockup of the interface we're aiming for:

```python
import topicexplorer
from vsm.viewer.ldacgsviewer import LdaCgsViewer

te = topicexplorer.from_config('ap.ini')

# access the corpus with .corpus
te.corpus

# access the individual models with dictionary-style indexing
# (k is a number of topics, e.g. 20)
assert isinstance(te[k], LdaCgsViewer)
te[k].theta
te[k].phi

# comparing two models using the interface
import topicexplorer.analysis
topicexplorer.analysis.model_dist(te[20], te[40])

# integrated past_to_text analysis
ordered_ids = ['some', 'labels', 'by', 'date']
p2t = topicexplorer.analysis.past_to_text(te[20], ordered_ids)
# returns raw numbers

# possible plot library?
import topicexplorer.analysis.plot
topicexplorer.analysis.plot.past_to_text(p2t)
```

Some other thoughts:

```python
# accessing doc-topic distributions
te[20].doc_topics('some-document') == te[20]['some-document']
# getting a specific topic proportion:
te[20]['some-document'][2]
# accessing word-topic distributions
te[20].topics(2) == te[20][2]
te[20].topics(2)[te[20].topics(2)[word=='something']] == te[20][2]['something']
```

This is too much for a single ticket, and definitely more what I'm thinking of for a 2.0, but I want to get at least to the point where the models are loaded with …
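A minimal sketch of how the double indexing in the mockup could be dispatched, assuming document labels are always strings and topic indices are always ints. `TopicExplorer`, `ModelView`, and the `loader` callable are hypothetical names for illustration, not topicexplorer or vsm API, and the distributions are plain dicts and lists so the sketch runs on its own.

```python
from typing import Callable, Dict, Sequence, Union


class ModelView:
    """Hypothetical wrapper around one fitted topic model.

    doc_topic maps a document label to its topic proportions;
    topic_word[t] maps each word to its probability in topic t.
    """

    def __init__(self, doc_topic: Dict[str, Sequence[float]],
                 topic_word: Sequence[Dict[str, float]]):
        self._doc_topic = doc_topic
        self._topic_word = topic_word

    def doc_topics(self, doc: str) -> Sequence[float]:
        return self._doc_topic[doc]

    def topics(self, t: int) -> Dict[str, float]:
        return self._topic_word[t]

    def __getitem__(self, key: Union[str, int]):
        # Dispatch on key type: strings name documents, ints name topics.
        if isinstance(key, str):
            return self.doc_topics(key)
        if isinstance(key, int):
            return self.topics(key)
        raise TypeError(f"unsupported key type: {type(key).__name__}")


class TopicExplorer:
    """Hypothetical container: .corpus plus one model per topic count k."""

    def __init__(self, corpus, loader: Callable[[int], ModelView]):
        self.corpus = corpus
        self._loader = loader
        self._cache: Dict[int, ModelView] = {}

    def __getitem__(self, k: int) -> ModelView:
        # Load each model lazily the first time it is requested, then reuse it.
        if k not in self._cache:
            self._cache[k] = self._loader(k)
        return self._cache[k]
```

With word-topic rows stored as dicts, `te[20][2]['something']` is a plain key lookup, so the boolean-mask form `topics(2)[... word=='something']` is not needed. The type-based dispatch stays unambiguous only while document labels never collide with integer topic indices.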
Import from config is good.
If the two TMs (topic models) are commensurable (same word set, whether or not they come from different corpora), it should work. But where the two TMs are not really commensurable because the words don't match, do we want the mess that ensues to be on the programmer, or do we want to do something "smart"?
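The two options can be made concrete with a sketch over plain word-to-probability dicts. `topic_distance`, `js_divergence`, and the `strict` flag are hypothetical; Jensen-Shannon divergence is used here only as a stand-in for whatever `model_dist` actually computes. `strict=True` leaves the mess on the programmer, while `strict=False` is one possible "smart" behaviour: compare on the shared vocabulary after renormalizing.

```python
import math
from typing import Dict, Iterable


def _renormalize(dist: Dict[str, float], vocab: Iterable[str]) -> Dict[str, float]:
    # Keep only words in vocab and rescale so the probabilities sum to 1.
    kept = {w: dist[w] for w in vocab}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no probability mass on the shared vocabulary")
    return {w: p / total for w, p in kept.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    # Jensen-Shannon divergence in base 2, so the result lies in [0, 1].
    # Assumes p and q are defined over the same set of words.
    def kl(a: Dict[str, float], b: Dict[str, float]) -> float:
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a if a[w] > 0)

    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_distance(p: Dict[str, float], q: Dict[str, float],
                   strict: bool = True) -> float:
    """Compare two word-topic distributions (hypothetical helper)."""
    if p.keys() == q.keys():
        return js_divergence(p, q)
    if strict:
        # Leave the mess to the programmer: refuse incommensurable models.
        raise ValueError("models are not commensurable: vocabularies differ")
    shared = p.keys() & q.keys()
    if not shared:
        raise ValueError("models share no vocabulary")
    # "Smart" option: compare only on words both models know about.
    return js_divergence(_renormalize(p, shared), _renormalize(q, shared))
```

The trade-off is visible in the non-strict branch: renormalizing over shared words discards all mass on unshared words, so two topics with a small overlap can look misleadingly close.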
I guess the above point should be made with the new syntax:
Also, most of the syntax is perspicuous, but I find this more than a bit opaque:
Migration of config reading to `topicexplorer.config`
Rather than duplicating config-parsing code in every topicexplorer submodule, create a single module that handles all config file access. Create a `Config` object that can be used throughout the application, especially in model comparison situations. Replace the `notebooks/corpus.py` template script with something that uses `Config` instead. This should substantially improve code readability.