Framework for statistical modeling of computer performance
adaptor


Contains the 'Adaptor' computer performance modeling framework.

Author: Michael K. Pankov, graduate of Bauman Moscow State Technical University.

Installation

$ denotes a superuser console (use sudo on Ubuntu). # denotes a regular user console. Note that this is the reverse of the usual shell prompt convention.

  • Python 2.7.*

    $ apt-get install python2.7

    • easy_install

      # wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | sudo python

    • pip

      $ easy_install pip

    • ipython

      $ apt-get install ipython

    • recordtype

      $ pip install recordtype

    • parse

      $ pip install parse

    • matplotlib

      $ pip install matplotlib

    • numpy

      $ pip install numpy

  • CouchDB

    $ apt-get install couchdb

    • CouchApp

      $ pip install couchapp

    • CouchDB Kit

      $ pip install couchdbkit

  • Tools

    $ pip install ipdb

  • Orange

    Refer to Orange Download page. Section "Building from source", subsection "setup.py".
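
Once the packages above are installed, it can be handy to confirm that they are importable before starting. The following is a minimal sketch, not part of the framework: the function name `missing_modules` and the constant `REQUIRED_MODULES` are hypothetical, and the import names are assumed to match the pip package names listed above.

```python
import importlib

# Assumption: each pip package above is importable under the same name.
REQUIRED_MODULES = [
    "recordtype",
    "parse",
    "matplotlib",
    "numpy",
    "couchapp",
    "couchdbkit",
    "ipdb",
]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is absent (or one of its own imports is).
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED_MODULES)` from ipython returns the list of packages that still need to be installed; an empty list means all of them were found. Orange and CouchDB itself are not covered by this check.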

Running

  • ipython

    • import system
    • import scenarios
    • cpdh_main(...)

Useful links

  1. Orange tutorial.
  2. Scikit-learn. Seems to have what we need. Average documentation.
    • Documentation turned out to be quite good (it explains the models). Has many regression models, notably isotonic regression, which is possibly what we need. Has an Ubuntu package.
    • Tutorial showed it's a decent package, although lacking easy visualization, which is present in Orange in many forms.
  3. mlpy. Seems to have what we need. Best documentation.
    • Has a lot of regression models and decent Python-style documentation with examples (!). Has an Ubuntu package.
  4. PyML. Seems to have what we need. Somewhat documented.
  5. Orange. Has graphical interface. Maybe has what we need. Average documentation.
    • Current option.
    • Orange turned out to be laggy and buggy (especially on Linux) and very poorly documented. On top of that, its name makes it nearly impossible to search for on Google. Its graphical interactive version is barely usable, though it may be better suited to scripting. We will now go with another option.
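
To make concrete what the isotonic regression mentioned above does, here is a minimal pure-Python sketch of the pool-adjacent-violators algorithm. It is an illustration only, not scikit-learn's implementation, and the function name `isotonic_fit` is hypothetical. The motivating assumption (not stated in the notes above) is that a quantity such as run time is expected not to decrease as dataset size grows, so noisy measurements can be smoothed into a non-decreasing sequence.

```python
def isotonic_fit(values):
    """Return the non-decreasing sequence closest to `values` in least squares.

    Uses pool-adjacent-violators: neighbouring points that break the
    ordering are merged into a block and replaced by the block's mean.
    """
    blocks = []  # each block is [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Measurements ordered by increasing dataset size (made-up numbers).
print(isotonic_fit([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

The out-of-order pair (3, 2) is pooled into its mean, 2.5, while points that already respect the ordering are left untouched.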

TODO

  1. [ ] Fix the system setup changing the current directory.
  2. [ ] Add Windows support.
  3. [ ] Add Polybench/GPU support.
  4. [ ] Perform experiments on GPU.
  5. [-] Think over the workflow. It is as follows.
    • Overall, it's postponed until we have at least a locally working system.
    1. [ ] Data is collected until a certain number of experiments has been performed.
    2. [ ] Model is learned on these experiments. It's as simple as possible. Since source code features and optimization flags yield a very large number of features, this could lead to overfitting. To avoid that, we should consider using aggregated features (such as the optimization level instead of individual optimizations). The model is one of two kinds.
      • This model should take into account the hardware-software platform, dataset size and guess good compiler parameters to reach optimal performance.
      • This model should take into account the hardware-software platform, dataset size and make a prediction of performance given some fixed compiler settings.
    3. [ ] Search is directed using feature ranking: top-ranked features are explored first. However, whether a search should exist at all should be reconsidered; rather, only normal program launches should happen. Either way, we then assume that experiments were conducted the specified number of times. If we're lucky, we get new points in an interesting area. The system could tune settings automatically without notifying the user. This could annoy the user, but it could be disabled at will. It would improve the search by focusing it on the interesting area.
    4. [ ] New model is learned. Basically, it's a loop of experimenting and learning.
    • The scenario itself is an attempt to build a regression model based on feature choice. Feature choice will be implemented to account for the need for different models on different platforms, though it is not obvious that this is required.
    • Maybe building a regression model offline is not so useful. We should aim for online learning.
      • In general, the system should behave as a cloud service.
      • One thought is that we should periodically detect outliers for the current model and re-learn it. When re-learning fails (as it will, due to observations unexpected by the current model), we add a new model, which is used with new examples. Outliers are removed from the current model and the new model is learned on them. The flaw in this approach is telling apart outliers that are genuinely unpredictable data from those that are just noise.
  6. [-] Add automatic building of dummy program.
  7. [-] Add dependency checking: numpy, recordtype, couchdb, couchdbkit, couchapp.
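
The experiment-and-learn loop from item 5, including the outlier idea, might be sketched roughly as follows. This is a toy under stated assumptions, not the framework's code: the names `learn`, `split_outliers` and `update` are hypothetical, the "model" is just the mean run time per aggregated optimization level, and an observation counts as an outlier when its relative error exceeds a fixed tolerance. It does not solve the problem noted above of telling noise from genuinely new behaviour.

```python
def learn(experiments):
    """Fit the simplest possible model: mean run time per optimization level.

    `experiments` is a list of (level, seconds) pairs, e.g. ("O2", 1.1).
    """
    times_by_level = {}
    for level, seconds in experiments:
        times_by_level.setdefault(level, []).append(seconds)
    return dict((level, sum(times) / float(len(times)))
                for level, times in times_by_level.items())


def split_outliers(model, experiments, tolerance=0.5):
    """Split experiments into those the model explains and the rest.

    An observation is an outlier if the model has never seen its level or
    if its relative error is larger than `tolerance` (an assumed threshold).
    """
    explained, outliers = [], []
    for level, seconds in experiments:
        predicted = model.get(level)
        if (predicted is not None and
                abs(seconds - predicted) <= tolerance * predicted):
            explained.append((level, seconds))
        else:
            outliers.append((level, seconds))
    return explained, outliers


def update(history, batch, tolerance=0.5):
    """One loop iteration: re-learn the current model, maybe add a new one.

    Returns (current_model, new_history, extra_model); `extra_model` is
    learned on the outliers only and is None when there are none.
    """
    explained, outliers = split_outliers(learn(history), batch, tolerance)
    new_history = history + explained
    extra_model = learn(outliers) if outliers else None
    return learn(new_history), new_history, extra_model
```

For example, starting from `history = [("O2", 1.0), ("O2", 1.2)]` and calling `update(history, [("O2", 1.1), ("O2", 5.0)])` keeps the 1.1 s run in the current model and moves the 5.0 s run into a separate model.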

Ideas

Currently none.