Add MLCube integration #1

davidjurado · 2022-03-04T15:40:00Z

DataPerf Speech Example - MLCube integration

Project setup

# Create Python environment and install MLCube Docker runner 
virtualenv -p python3 ./env && source ./env/bin/activate && pip install mlcube-docker

# Fetch the implementation from GitHub
git clone https://github.com/harvard-edge/dataperf-speech-example && cd ./dataperf-speech-example
git fetch origin pull/1/head:feature/MLCube-integration && git checkout feature/MLCube-integration

Project structure

Tasks execution

# Run download task
mlcube run --task=download -Pdocker.build_strategy=always

# Run select task
mlcube run --task=select -Pdocker.build_strategy=always

# Run evaluate task
mlcube run --task=evaluate -Pdocker.build_strategy=always

Execute complete pipeline

# Run all steps
mlcube run --task=download,select,evaluate -Pdocker.build_strategy=always
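To see exactly which step breaks, the same three tasks can also be run one at a time from a small wrapper that stops at the first failure. This is only a sketch: `run_pipeline` and the `MLCUBE_BIN` variable are names made up for illustration (they are not part of MLCube or this repository), and the `mlcube run` flags are the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the MLCube tasks one by one, stopping at the
# first task that fails.
# MLCUBE_BIN is an illustrative variable, not an MLCube setting. Set it to
# "echo" to print the commands instead of executing them.

run_pipeline() {
    local mlcube_bin="${MLCUBE_BIN:-mlcube}"
    local task
    for task in download select evaluate; do
        # Progress goes to stderr so stdout only carries the task output.
        echo ">>> task: ${task}" >&2
        "${mlcube_bin}" run --task="${task}" -Pdocker.build_strategy=always || {
            echo "task '${task}' failed, stopping" >&2
            return 1
        }
    done
}
```

Calling `run_pipeline` runs the real tasks; `MLCUBE_BIN=echo run_pipeline` is a dry run that only prints the three commands.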

* test commit
* Delete unnecessary files. Add utils and constants for supporting functions in eval
* Add core supporting functions for model training and scoring
* Add main functionality to eval, and supporting utils functions. Update requirements
* Add folder structure. Add random training file and its results for testing setup. Minor fix to constant and setup file
* Add gitignore to ignore everything except test file. Delete selection folder since it is not necessary
* Add gitignore file to ignore all files in train sets except random_500.csv
* Simplify output readout to avoid bug
* Updated file and methods to match with previous design pattern
* Add data file as input in main function and yaml file so all paths in yaml are relative
* Add docker-compose file, and modify dockerfile, requirements and main accordingly
* Fixed type hinting as suggested in PR review

colbybanbury · 2022-11-02T17:51:37Z

@davidjurado How would someone specify the workspace/ directory to MLCube?

Also, is there a way to point to a file outside of workspace/, for example in config_files/?

davidjurado · 2022-11-12T18:08:27Z

Hello @colbybanbury,

I'm sorry for the late reply; I didn't get a notification of your comment.

To specify a different workspace folder you can use --workspace and then provide the path, for example:

mlcube run --task=select --workspace=path/to/new_folder
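If you switch between several workspace folders, a small helper can check that the folder exists before handing it to MLCube. This is a sketch under assumptions: `run_in_workspace` and `MLCUBE_BIN` are hypothetical names used only here, and the only MLCube options involved are `--task` and `--workspace` from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one task against a given workspace directory.
# MLCUBE_BIN is an illustrative variable, not an MLCube setting. Set it to
# "echo" to print the command instead of executing it.

run_in_workspace() {
    local task="$1" workspace="$2"
    local mlcube_bin="${MLCUBE_BIN:-mlcube}"
    # Fail early with a clear message instead of passing a bad path along.
    if [ ! -d "${workspace}" ]; then
        echo "workspace directory not found: ${workspace}" >&2
        return 1
    fi
    "${mlcube_bin}" run --task="${task}" --workspace="${workspace}"
}
```

For example, `run_in_workspace select path/to/new_folder` runs the select task against that folder.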

To point to a file outside the workspace folder, the task you want to run needs a parameter for that file. Parameters are defined in the mlcube.yaml file. For example, the select task has the following parameters:

select:
    # Run selection algorithm
    parameters:
      inputs:
        {
          allowed_training_set: { type: file, default: data/preliminary_evaluation_dataset/allowed_training_set.yaml },
          train_embeddings_dir: data/preliminary_evaluation_dataset/train_embeddings/,
        }
      outputs: { outdir: select_output/ }

To use a different allowed_training_set, specify the name of the parameter to override and provide the absolute path of the new file:

mlcube run --task=select allowed_training_set=/Users/me/allowed_training_set.yaml
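Since the override expects an absolute path, it can be handy to convert a relative path before passing it. The sketch below does that with `cd` and `pwd`, so it does not rely on `realpath` being installed. `abs_path`, `run_with_override` and `MLCUBE_BIN` are hypothetical names used only for this illustration; the `name=path` form is the one from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: override one task parameter with a file, converting
# the path to an absolute one first.
# MLCUBE_BIN is an illustrative variable, not an MLCube setting. Set it to
# "echo" to print the command instead of executing it.

abs_path() {
    # Resolve the parent directory with cd/pwd, then re-append the file name.
    local dir base
    dir="$(cd "$(dirname "$1")" && pwd)" || return 1
    base="$(basename "$1")"
    # ${dir%/} avoids a double slash when the parent directory is "/".
    printf '%s/%s\n' "${dir%/}" "${base}"
}

run_with_override() {
    local task="$1" param="$2" file="$3"
    local mlcube_bin="${MLCUBE_BIN:-mlcube}"
    local abs
    if [ ! -e "${file}" ]; then
        echo "file not found: ${file}" >&2
        return 1
    fi
    abs="$(abs_path "${file}")" || return 1
    "${mlcube_bin}" run --task="${task}" "${param}=${abs}"
}
```

For example, `run_with_override select allowed_training_set ./allowed_training_set.yaml` expands the relative path and then runs the select task with that file.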

davidjurado added 5 commits March 4, 2022 10:37

Add MLCube integration

d7d1281

Fix PR number

bdab6ed

Rename select script output path inside mlcube.yaml

c8d9502

Add project structure diagram

9da1f2e

Add explanation on how to execute the complete pipeline

d713aa6

davidjurado added 2 commits June 22, 2022 09:56

Update MLCube integration

513c39f

Update MLCube pipeline

dfc53de

dfeddema mentioned this pull request Jul 1, 2022

Speech Containerization of Algorithm mlcommons/mlcube#228 (Open)

davidjurado and others added 2 commits July 13, 2022 09:33

Add dataperf config to MLCube workspace folder

011f881

Merge branch 'main' into feature/MLCube-integration

f8aa32f

moving dataperf_speech_config.yaml to workspace/

740d5c4

colbybanbury requested a review from mmaz November 2, 2022 17:58

colbybanbury merged commit 8f9bb5c into harvard-edge:main Nov 10, 2022
