Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor config parsing throughout. Introduce topicexplorer.config module. #150

Closed
JaimieMurdock opened this issue Jul 28, 2016 · 7 comments
Assignees
Milestone

Comments

@JaimieMurdock
Copy link
Member

Rather than having duplicate code in every topicexplorer submodule, create a single module that handles all config file access. Create a Config object that can be used throughout the application, especially in model comparison situations. Replace the notebooks/corpus.py template script with something that uses Config instead. This will massively help improve code readability.

@JaimieMurdock JaimieMurdock added this to the 1.0 Release milestone Jul 28, 2016
@JaimieMurdock JaimieMurdock self-assigned this Jul 28, 2016
@JaimieMurdock JaimieMurdock modified the milestone: 1.0 Release Jul 28, 2016
@JaimieMurdock
Copy link
Member Author

There is some config parsing code from the work @adithyant did last spring, and is already in the main branch via the code merge.

@JaimieMurdock
Copy link
Member Author

This should be my weekend project for May 6-7, 2017.

@JaimieMurdock JaimieMurdock moved this from In Progress to Weekend Projects in Bug Fixes Apr 26, 2017
@JaimieMurdock
Copy link
Member Author

The goal here is to remove the from corpus import * and corpus.tmpl.py dependencies in all notebooks.

@JaimieMurdock JaimieMurdock removed this from Weekend Projects in Bug Fixes Apr 7, 2018
@JaimieMurdock
Copy link
Member Author

JaimieMurdock commented May 10, 2018

I've made some good progress here on issues/150 branch.

Below is a mockup of the interface we're aiming for:

import topicexplorer
te = topicexplorer.from_config('ap.ini')

# access the corpus with .corpus
te.corpus

# access the individual models with dictionary attributes
assert isinstance(te[k], LdaCgsViewer)
te[k].theta
te[k].phi

# comparing two models using the interface
import topicexplorer.analysis
topicexplorer.analysis.model_dist(te[20], te[40])

# integrated past_to_text analysis
ordered_ids = ['some', 'labels', 'by', 'date']
p2t = topicexplorer.analysis.past_to_text(te[20], ordered_ids)
### returns raw numbers

# possible plot library?
import topicexplorer.analysis.plot
topicexplorer.analysis.plot.past_to_text(p2t)

Some other thoughts:

# accessing doc-topic distributions
te[20].doc_topics('some-document') == te[20]['some-document']
# getting specific topic proportion:
te[20]['some-document'][2]

# accessing word-topic distributions
te[20].topics(2) == te[20][2]
te[20].topics(2)[te[20].topics(2)[word=='something']] == te[20][2]['something']

This is too much for a single ticket, and definitely more of what I'm thinking for a 2.0, but I want to get at least to the point where the models are loaded with topicexplorer.from_config() in notebooks.

@colinallen
Copy link
Member

import from config is good
what happens if someone does something like this? (If I've got the syntax right:)

te1 = topicexplorer.from_config('tm1.ini')
te2 = topicexplorer.from_config('tm2.ini')
dist_topics_topics(te1[20][2],te2[20][2])

If the two tms are commensurable (same word set whether or not different corpora) it should work, but where the two tms are not really commensurable because the words don't match, do we want the mess that ensues to be on the programmer, or do we want to do something "smart"?

@colinallen
Copy link
Member

colinallen commented May 20, 2018

I guess the above point should be made with the new syntax:

topicexplorer.analysis.model_dist(te1[20], te2[20])

@colinallen
Copy link
Member

colinallen commented May 20, 2018

Also, most of the syntax is perspicuous, but I find this more than a bit opaque:

te[20].topics(2) == te[20][2]
te[20].topics(2)[te[20].topics(2)[word=='something']] == te[20][2]['something']

JaimieMurdock added a commit that referenced this issue May 20, 2018
Migration of config read to topicexplorer.config
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants