This repository has been archived by the owner on May 31, 2023. It is now read-only.
David Walker edited this page Sep 15, 2015 · 25 revisions

PROJECT(S) DESCRIPTION

  • GUI

    • Access ACM DL data
    • Operate over any subset of conferences or journals selected by the user
    • Allow users to name topics
    • Generate a Noam-style visualization for each data set
    • Explore additional methods for visualization
    • Allow users with logins to save their analyses for other users to view
  • TOPIC: NAME GENERATION

    • Generate names for topics automatically (in GUI, allow user override)
  • TOPIC: SEMI-AUTOMATED TOPIC CONSTRUCTION

    • Allow users to generate topics using a GUI (split, merge, see ITM)
  • TOPIC: CHOOSING THE RIGHT NUMBER OF TOPICS

    • See Noam-style AIC fitting
    • For a casual user, can we generate a heuristic or rules of thumb for the right number of topics? Possible inputs: number of conferences, number of papers, different priors for different areas (e.g., Networks vs. PL). Will this work?
  • TOPIC: ALLOW USERS TO SELECT MORE ADVANCED MODELS

    • Dynamic topic models
    • How do we visualize the results?
    • Can we compare the results to non-dynamic models? How do we analyze the impact?
    • Track the influence of papers forward to other papers
  • TOPIC: COMBINE CORRELATED TOPIC MODELS AND DYNAMIC TOPIC MODELS

  • TOPIC: REPLACE THE ACM CLASSIFICATION SYSTEM

    • Generate a topic model for all of the ACM with names. Justify your decisions.
  • TOPIC: REPLACE THE ACM SEARCH AND/OR RELATED WORK SEARCH

    • Combine topic models and citation and co-author data to improve related work search
  • TOPIC: SUBMIT A PAPER OR COLLECTION OF PAPERS

    • Return a person
  • TOPIC: SUBMIT A SET OF PAPERS, ONE PER PERSON; DETERMINE THE OVERLAP WITH A CONFERENCE

    • Useful for seeing if your PC covers the topics of your conference
  • NOTES:

    • Have students sign form declaring they will not release ACM DL data and will take appropriate measures to protect its privacy
    • How will we get and pay for the cycles?
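
For the "choosing the right number of topics" item above, one generic way to compare candidate topic counts is the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of free parameters and L is the fitted likelihood. The sketch below is only an illustration and is not necessarily the "Noam-style" fitting referred to above. The function names, the parameter-count formula (topic-word parameters only), and the log-likelihood values are all assumptions; in practice the log-likelihoods would come from whatever LDA runs the project produces.

```python
def aic(log_likelihood, num_params):
    """Akaike information criterion: lower is better."""
    return 2 * num_params - 2 * log_likelihood


def lda_param_count(num_topics, vocab_size):
    # Assumption: count only the free topic-word parameters, i.e. each
    # topic is a distribution over the vocabulary with (V - 1) free values.
    return num_topics * (vocab_size - 1)


def choose_num_topics(log_likelihoods, vocab_size):
    """Pick the topic count with the lowest AIC.

    log_likelihoods: dict mapping number of topics -> model log-likelihood
    (hypothetical input; obtain these from the actual model fits).
    Returns (best_num_topics, dict of AIC scores per topic count).
    """
    scores = {
        k: aic(ll, lda_param_count(k, vocab_size))
        for k, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Made-up numbers, purely to show the call shape.
    fits = {5: -1000.0, 10: -900.0, 20: -895.0}
    best, scores = choose_num_topics(fits, vocab_size=11)
    print(best, scores)
```

With these made-up inputs, going from 10 to 20 topics improves the likelihood only slightly while doubling the parameter count, so the AIC penalty favors 10. A heuristic for casual users could be checked against this kind of sweep on a few data sets.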

TODO

  • Code

    • Error handling in run_lda.sh. More parameters in run_lda.sh (e.g., select different data sets).
    • Parameterize data sources
  • LDAvis integration

  • Check out ITM.

    • User interface for introducing new topics/killing old topics
  • Check out DTM (Dynamic Topic Models).

    • Explore topics being born, topics dying, evolving topic levels.
    • DIM (Document Influence Model): tracks which papers are influential.
  • Check out CTM (Correlated Topic Models).

  • Projects/extensions

    • ACM Classifier inference
    • Researcher models
    • Including citation graph information in the topic model
    • Doing it for networking research or another domain (use other data sources!)
    • A general scraper API for acquiring data and getting it into a database

Sessions analysis

Do something with the session data that Michael collected.

Server migration

We need to migrate the server away from Michael's hosting and to something more permanent at Princeton.

Dave

Request project and webspace at Princeton.

Request Princeton accounts
