This repository has been archived by the owner on May 31, 2023. It is now read-only.
Michael Greenberg edited this page Feb 2, 2016 · 25 revisions

PROJECT(S) DESCRIPTION

  • CENTRAL PROJECT: GUI AND INFRASTRUCTURE

    • Accesses ACM DL data
    • Operates over any subset of conferences or journals selected by the user
    • Allow users to name topics
      • 4 or 5 datasets
      • Propose names and vote on them (up, down, offensive). Hive?
      • Comment pages (DISQUS?) on topic listings and temporal graphs
    • Generate a Noam-style visualization for each data set
    • Explore additional methods for visualization
    • Allow us to create logins for trusted users, who can then name and save their analyses for other users to view.
    • Limits the vocabulary (using TF-IDF, à la Blei)
  • PROJECT: TOPIC NAME GENERATION

    • Generate names for topics automatically (in GUI, allow user override)
  • PROJECT: SEMI-AUTOMATED TOPIC CONSTRUCTION

    • Allow users to generate topics using a GUI (split, merge; see ITM)
  • PROJECT: CHOOSING THE RIGHT NUMBER OF TOPICS

    • See Noam-style AIC fitting
    • For a casual user, can we derive a heuristic or rules of thumb for the right number of topics? Should it depend on the number of conferences? The number of papers? Should different areas (e.g., networks vs. PL) get different priors? Will this work?
  • PROJECT: ALLOW USERS TO SELECT MORE ADVANCED MODELS

    • Dynamic topic models
    • How do we visualize the results?
    • Can we compare the results to non-dynamic models? How do we analyze the impact?
    • Track the influence of papers forward to other papers
  • PROJECT: COMBINE CORRELATED TOPIC MODELS AND DYNAMIC TOPIC MODELS

  • PROJECT: REPLACE THE ACM CLASSIFICATION SYSTEM

    • Generate a topic model for all of the ACM with names. Justify your decisions.
  • PROJECT: REPLACE THE ACM SEARCH AND/OR RELATED WORK SEARCH

    • Combine topic models and citation and co-author data to improve related work search
  • PROJECT: SUBMIT A PAPER OR COLLECTION OF PAPERS

    • Return a person
  • PROJECT: SUBMIT A SET OF PAPERS, ONE PER PERSON; DETERMINE THE OVERLAP WITH A CONFERENCE

    • Useful for checking whether your PC (program committee) covers the topics of your conference
  • NOTES:

    • Have students sign a form declaring that they will not release ACM DL data and will take appropriate measures to protect its privacy
    • How will we obtain, and pay for, the compute cycles?
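Two of the items above are concrete enough to sketch: limiting the vocabulary by TF-IDF (central project) and choosing the number of topics with an AIC-style fit. The Python below is a minimal sketch under stated assumptions, not the project's code. The function names and input formats are hypothetical. Each term is scored by its mean TF-IDF across documents, and Blei's exact scoring may differ. The AIC parameter count includes only the K·(V − 1) free topic-word probabilities, and "Noam-style" fitting may count parameters differently. The log-likelihoods are assumed to come from separate LDA runs, one per candidate K.

```python
import math
from collections import Counter

def tfidf_vocabulary(docs, keep):
    """Return the `keep` terms with the highest mean TF-IDF score.

    docs: list of token lists, one per paper (hypothetical input format).
    Scoring by mean TF-IDF over all documents is an assumption.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    scores = Counter()
    for doc in docs:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])  # 0 for terms in every document
            scores[term] += tf * idf / n_docs
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:keep]

def lda_num_params(num_topics, vocab_size):
    # Assumption: count only the topic-word distributions (V - 1 free values each).
    return num_topics * (vocab_size - 1)

def aic(log_likelihood, num_params):
    # Akaike information criterion: lower is better.
    return 2 * num_params - 2 * log_likelihood

def pick_num_topics(log_likelihoods, vocab_size):
    """log_likelihoods: dict mapping a candidate K to that model's log-likelihood."""
    return min(log_likelihoods,
               key=lambda k: aic(log_likelihoods[k], lda_num_params(k, vocab_size)))
```

A term that appears in every document (such as "topic" in an all-topic-modeling corpus) gets an IDF of zero and drops to the bottom of the ranking. This is the behaviour that makes TF-IDF useful for pruning generic words before fitting.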

TODO

  • Code

    • Add error handling to run_lda.sh, plus more parameters (e.g., to select different data sets).
    • Parameterize data sources
  • LDAvis integration

  • Check out ITM (Interactive Topic Modeling).

    • User interface for introducing new topics/killing old topics
  • Check out DTM (Dynamic Topic Models).

    • Explore topics being born and dying, and evolving topic levels.
    • DIM (Document Influence Model): tracks which papers are influential.
  • Check out CTM (Correlated Topic Models).

  • Projects/extensions

    • ACM Classifier inference
    • Researcher models
    • Including citation graph information in the topic model
    • Doing it for networking research or another domain (using other data sources!)
    • A general scraper API for acquiring data and loading it into a database
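For the LDAvis integration item, the part of LDAvis most useful for naming topics is its term-ranking "relevance" score. This score blends a term's probability within a topic with its lift over the whole corpus: relevance(w, t) = λ·log φ(t, w) + (1 − λ)·log(φ(t, w) / p(w)). The Python below is a hypothetical stand-alone sketch of that ranking, not LDAvis's own code. The dict-based inputs and the function names are assumptions.

```python
import math

def relevance(phi_tw, p_w, lam):
    """LDAvis-style relevance of term w to topic t.

    phi_tw: P(w | t); p_w: marginal P(w) in the corpus; lam: weight in [0, 1].
    """
    return lam * math.log(phi_tw) + (1 - lam) * math.log(phi_tw / p_w)

def top_terms(topic_word, marginal, lam=1.0, n=10):
    """Rank a topic's terms by relevance (hypothetical helper).

    topic_word: dict term -> P(term | topic); marginal: dict term -> P(term).
    """
    # Skip zero-probability terms, since log(0) is undefined.
    scored = {w: relevance(p, marginal[w], lam)
              for w, p in topic_word.items() if p > 0}
    # Highest relevance first; ties broken alphabetically.
    return sorted(scored, key=lambda w: (-scored[w], w))[:n]
```

With λ = 1, terms are ranked purely by in-topic probability. Lowering λ pushes corpus-wide terms like "the" down in favour of terms specific to the topic, which may give better candidates when proposing topic names.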

Sessions analysis

Do something with the session data that Michael collected.
