This repository has been archived by the owner on May 31, 2023. It is now read-only.
Home
Michael Greenberg edited this page Feb 2, 2016
-
CENTRAL PROJECT: GUI AND INFRASTRUCTURE
- Accesses ACM DL data
- Operates over any subset of conferences or journals selected by the user
- Allow users to name topics
  - 4 or 5 datasets
  - users propose names and vote on them (up, down, offensive); Hive?
- Comment pages (DISQUS?) on topic listings and temporal graphs
- Generate a Noam-style visualization for each data set
- Explore additional methods for visualization
- Allow us to generate logins for trusted users. Allow trusted users with logins to name and save their analyses for other users to view.
- Limits the vocabulary (using TF-IDF, à la Blei)
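A minimal sketch of TF-IDF vocabulary pruning, assuming documents arrive as lists of tokens. The scoring rule (a term's TF-IDF averaged over the documents that contain it) and the name `tfidf_vocabulary` are illustrative assumptions, not necessarily Blei's exact recipe.

```python
import math
from collections import Counter


def tfidf_vocabulary(docs, keep):
    """Return the `keep` terms with the highest mean TF-IDF.

    Hypothetical helper: each term's score is its TF-IDF averaged over
    the documents that contain it (an assumed rule). Ties break alphabetically.
    """
    n_docs = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    totals = Counter()                  # summed TF-IDF of each term
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            totals[term] += tf * idf
    scores = {term: totals[term] / df[term] for term in df}
    return sorted(scores, key=lambda t: (-scores[t], t))[:keep]


docs = [["the", "lda", "model"], ["the", "network", "protocol"], ["the", "type", "system"]]
print(tfidf_vocabulary(docs, 3))  # ['lda', 'model', 'network']
```

A term that appears in every document (here "the") gets an IDF of zero, so it sinks to the bottom of the ranking and is the first to be cut.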
-
PROJECT: TOPIC NAME GENERATION
- Generate names for topics automatically (in the GUI, allow users to override them)
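A common baseline for automatic names is to label each topic with its highest-probability words and let a user-supplied name win when one exists. A minimal sketch, assuming each topic is a row of word weights aligned with `vocab`; `top_words`, `topic_labels`, and the `overrides` dict are hypothetical names.

```python
def top_words(weights, vocab, n):
    """Words with the n largest weights (stable order on ties)."""
    order = sorted(range(len(vocab)), key=lambda i: weights[i], reverse=True)
    return [vocab[i] for i in order[:n]]


def topic_labels(topic_word, vocab, overrides=None, n=3):
    """Auto-name every topic; overrides maps topic index -> user-chosen name."""
    overrides = overrides or {}
    return [overrides.get(k, " / ".join(top_words(row, vocab, n)))
            for k, row in enumerate(topic_word)]


vocab = ["packet", "router", "type", "lambda"]
topic_word = [[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.4, 0.5]]
print(topic_labels(topic_word, vocab, overrides={1: "Programming Languages"}, n=2))
```

Keeping the override as a separate mapping means re-running the automatic namer never clobbers names users have already chosen.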
-
PROJECT: SEMI-AUTOMATED TOPIC CONSTRUCTION
- Allow users to generate topics using a GUI (split, merge; see ITM)
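A naive illustration of what a "merge" does to model state, assuming topics are stored as rows of a topic-word count matrix. This is not ITM, which incorporates user feedback into the model's inference; here the counts are simply summed, and the model would still need to be re-fit afterwards. `merge_topics` is a hypothetical helper.

```python
def merge_topics(topic_word_counts, i, j):
    """Fold topic j's word counts into topic i and drop row j.

    Returns a new matrix; the input is left unchanged. If j < i, the merged
    topic ends up at index i - 1 after row j is removed.
    """
    if i == j:
        raise ValueError("cannot merge a topic with itself")
    merged = [row[:] for row in topic_word_counts]
    merged[i] = [a + b for a, b in zip(merged[i], merged[j])]
    del merged[j]
    return merged


print(merge_topics([[1, 0], [0, 2], [3, 3]], 0, 2))  # [[4, 3], [0, 2]]
```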
-
PROJECT: CHOOSING THE RIGHT NUMBER OF TOPICS
- See Noam-style AIC fitting
- For a casual user, can we generate a heuristic for the right number of topics? Generate rules of thumb? Use the number of conferences? The number of papers? Different priors for different areas (e.g., networking vs. PL)? Will this work?
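AIC trades fit against model size: AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood; lower is better. The sketch assumes k = K(V - 1), counting only the K topic-word distributions over a V-word vocabulary. Whether "Noam-style" fitting counts parameters this way is an assumption, and LDA toolkits usually report an approximate likelihood bound rather than the exact value. The log-likelihoods below are made up for illustration.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def pick_num_topics(loglik_by_k, vocab_size):
    """Return (best K, {K: AIC}) from per-K log-likelihoods.

    Assumption: free parameters = K * (vocab_size - 1), i.e. only the
    topic-word distributions are counted.
    """
    scores = {k: aic(ll, k * (vocab_size - 1)) for k, ll in loglik_by_k.items()}
    return min(scores, key=scores.get), scores


# Illustrative numbers only, not results from any real run.
best_k, scores = pick_num_topics({10: -50000, 20: -48000, 40: -47500}, vocab_size=101)
print(best_k)  # 20
```

Here going from 20 to 40 topics improves the likelihood by 500 but adds 2,000 parameters, so AIC prefers 20.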
-
PROJECT: ALLOW USERS TO SELECT MORE ADVANCED MODELS
- Dynamic topic models
- How do we visualize the results?
- Can we compare the results to non-dynamic models? How do we analyze the impact?
- track the influence of papers forward to other papers
-
PROJECT: COMBINE CORRELATED TOPIC MODELS AND DYNAMIC TOPIC MODELS
-
PROJECT: REPLACE THE ACM CLASSIFICATION SYSTEM
- Generate a topic model for all of the ACM with names. Justify your decisions.
-
PROJECT: REPLACE THE ACM SEARCH AND/OR RELATED WORK SEARCH
- Combine topic models and citation and co-author data to improve related work search
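One simple way to combine the three signals is a weighted sum: cosine similarity of topic vectors, Jaccard overlap of reference lists (a bibliographic-coupling-style signal), and Jaccard overlap of author sets. The dict schema (`topics`, `cites`, `authors`) and the default weights are hypothetical and would need tuning against real relevance judgments.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_score(query, cand, w_topic=0.6, w_cite=0.3, w_author=0.1):
    """Weighted mix of topic, shared-reference, and shared-author similarity."""
    return (w_topic * cosine(query["topics"], cand["topics"])
            + w_cite * jaccard(query["cites"], cand["cites"])
            + w_author * jaccard(query["authors"], cand["authors"]))


def rank_related(query, candidates, **weights):
    """Candidates sorted from most to least related."""
    return sorted(candidates, key=lambda c: related_score(query, c, **weights), reverse=True)
```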
-
PROJECT: SUBMIT A PAPER OR COLLECTION OF PAPERS
- Return a person
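One reading of this project is expert finding: profile each person by the mean topic vector of their papers, then return the person whose profile is closest (by cosine similarity) to the submission. A minimal sketch; the input layout and the name `best_person` are assumptions.

```python
import math


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_person(submission_vectors, papers_by_person):
    """submission_vectors: topic vectors of the submitted paper(s).
    papers_by_person: person -> list of topic vectors of their papers."""
    query = mean_vector(submission_vectors)
    profiles = {p: mean_vector(vs) for p, vs in papers_by_person.items()}
    return max(profiles, key=lambda p: cosine(query, profiles[p]))


people = {"alice": [[0.9, 0.1], [0.8, 0.2]], "bob": [[0.1, 0.9]]}
print(best_person([[0.7, 0.3]], people))  # alice
```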
-
PROJECT: SUBMIT A SET OF PAPERS, ONE PER PERSON; DETERMINE THE OVERLAP WITH A CONFERENCE
- Useful for seeing if your PC covers the topics of your conference
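A minimal sketch of a PC coverage check, assuming one topic vector per PC member (from the paper they submitted) and one per conference paper. Overall overlap is measured as histogram intersection, the sum over topics of min(conference share, PC share). Separately, it lists topics that matter to the conference but that no member's paper weights strongly. The 0.1 threshold and the name `pc_coverage` are arbitrary illustrative choices.

```python
def mean_distribution(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def pc_coverage(pc_papers, conference_papers, threshold=0.1):
    """Return (overlap, uncovered).

    overlap: sum_k min(conf_k, pc_k), in [0, 1] for normalized vectors.
    uncovered: topics with conference share >= threshold that no single
    PC member's paper weights at >= threshold.
    """
    conf = mean_distribution(conference_papers)
    pc = mean_distribution(pc_papers)
    overlap = sum(min(c, p) for c, p in zip(conf, pc))
    uncovered = [k for k, share in enumerate(conf)
                 if share >= threshold and max(v[k] for v in pc_papers) < threshold]
    return overlap, uncovered


conference = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5]]
committee = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
print(pc_coverage(committee, conference))  # (0.75, [2])
```

The per-topic list is arguably the more actionable output: it names the areas for which a PC chair might recruit another member.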
-
NOTES:
- Have students sign form declaring they will not release ACM DL data and will take appropriate measures to protect its privacy
- How will we get and pay for the cycles?
-
Code
- Error handling in run_lda.sh. More parameters in run_lda.sh (e.g., select different data sets).
- Parameterize data sources
  - current directory layout
  - ACM data
  - http://adsabs.harvard.edu/
  - SIGCOMM
-
LDAvis integration
-
Check out ITM (Interactive Topic Modeling).
- User interface for introducing new topics/killing old topics
-
Check out DTM (Dynamic Topic Models).
- Explore topics being born, topics dying, evolving topic levels.
- DIM: Document Influence Model. Tracks which papers are influential
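A descriptive, post-hoc sketch of topic prevalence over time, assuming a static model's per-document topic proportions plus a publication year for each paper. Unlike DTM, which lets each topic's word distribution drift from one time slice to the next, this only tracks what share of each year's output a fixed topic accounts for. The birth/death rule (crossing a fixed threshold) and the function names are illustrative assumptions.

```python
from collections import defaultdict


def prevalence_by_year(doc_years, doc_topics):
    """Mean topic proportions per year, returned in year order."""
    by_year = defaultdict(list)
    for year, vec in zip(doc_years, doc_topics):
        by_year[year].append(vec)
    return {year: [sum(col) / len(vecs) for col in zip(*vecs)]
            for year, vecs in sorted(by_year.items())}


def births_and_deaths(prevalence, topic, threshold=0.05):
    """Years where `topic` rises above ("born") or falls below ("died") threshold."""
    events, above = [], False
    for year, shares in prevalence.items():
        now = shares[topic] >= threshold
        if now and not above:
            events.append((year, "born"))
        elif above and not now:
            events.append((year, "died"))
        above = now
    return events


prev = prevalence_by_year([2000, 2000, 2001, 2002],
                          [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print(births_and_deaths(prev, 0))  # [(2000, 'born'), (2002, 'died')]
```

The per-year dictionary is also the data behind a simple temporal graph (one line per topic).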
-
Check out CTM (Correlated Topic Models).
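For a quick, model-free look at which topics co-occur, one can compute Pearson correlations between topic proportions across documents. This is only a proxy: CTM replaces LDA's Dirichlet prior with a logistic normal and learns the topic covariance inside the model, whereas raw proportions that sum to one are pushed toward negative correlation by construction. `topic_correlations` is a hypothetical helper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def topic_correlations(doc_topics):
    """K x K matrix of correlations between topic proportions across documents."""
    cols = list(zip(*doc_topics))
    k = len(cols)
    return [[pearson(cols[i], cols[j]) for j in range(k)] for i in range(k)]
```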
-
Projects/extensions
- ACM Classifier inference
- Researcher models
- including citation graph information in the topic models
- doing it for networking research or another domain (use other data sources!)
- a general scraper API for acquiring data and getting it into a database
Do something with the session data that Michael collected.
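For the general scraper API bullet, a minimal sketch of one possible shape: every source (ACM DL, ADS, a SIGCOMM archive) implements a single `fetch()` method that yields records, and one `ingest()` function writes them into SQLite. The `Paper` fields, the table schema, and all class and function names here are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass
class Paper:
    paper_id: str
    title: str
    venue: str
    year: int
    abstract: str


class Source(Protocol):
    """Hypothetical interface each scraper would implement."""
    def fetch(self) -> Iterable[Paper]: ...


SCHEMA = """CREATE TABLE IF NOT EXISTS papers (
    paper_id TEXT PRIMARY KEY, title TEXT, venue TEXT, year INTEGER, abstract TEXT)"""


def ingest(conn: sqlite3.Connection, source: Source) -> int:
    """Store every record from `source`; re-ingesting a paper_id overwrites it."""
    conn.execute(SCHEMA)
    rows = [(p.paper_id, p.title, p.venue, p.year, p.abstract) for p in source.fetch()]
    conn.executemany("INSERT OR REPLACE INTO papers VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


class StaticSource:
    """Stand-in source for testing; a real scraper would fetch over the network."""
    def __init__(self, papers):
        self.papers = papers

    def fetch(self):
        return iter(self.papers)


conn = sqlite3.connect(":memory:")
print(ingest(conn, StaticSource([Paper("p1", "Example", "SIGCOMM", 2015, "Example abstract")])))  # 1
```

Keying on `paper_id` with `INSERT OR REPLACE` makes re-running a scraper idempotent, which helps when a crawl fails partway through.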