Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harsha/reorg #118

Closed
wants to merge 63 commits into from
Closed

Harsha/reorg #118

wants to merge 63 commits into from

Conversation

harsha-simhadri
Copy link
Collaborator

This PR organizes the code into (a) edge_tf + examples/tf, and (b) edgeml_pytorch + examples/pytorch. All notebooks and scripts need to be tested for disruptions in path.

Also has a cleaned up version of the multi-layer RNN and train_classifer.

Yash Gaurkar and others added 30 commits July 22, 2019 23:13
Copy link
Contributor

@metastableB metastableB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Harsha,

Few comments I've added inline with the code.

README.md Outdated

### Organization
- The `edgeml_tf` directory contains the graphs and models in TensorFlow,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

graphs and models is not a standard terminology. Mode might even be wrong because by model people usually mean the matrices and those are not included. Could you just use something like implementation of inference and training routines of these algorithms in tensorflow

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about "edgeml_tf contains a definition of these architectures", and "examples/tf contains example training" runs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that works. Just remove the word model

README.md Outdated
### Details and project pages
For details, please see our
[project page](https://microsoft.github.io/EdgeML/),
[wiki](https://github.com/Microsoft/EdgeML/wiki/), and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

De-linking from the wiki might be a good idea. We aren't maintaining it. Its reddundant if we intent to move to microsofot.github.io/EdgeML anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

available in Tensorflow:

1. [Bonsai](../docs/publications/Bonsai.pdf)
2. [S-RNN](../docs/publications/srnn-??.pdf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not available yet. Please remove hyper-link.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes sir.

[batchSize, timeSteps, inputDims] else
[timeSteps, batchSize, inputDims]
'''
class RNNCell(nn.Module):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious: do we have to do anything special to the existing cells, the ones already native to torch, to make it ONIX compatible ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they should be fine to export to ONNX without this ugly contraption.

setup.py Outdated
@@ -0,0 +1,11 @@
import setuptools #enables develop

setuptools.setup(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work. Did you test this?

For pip packages the structure is

package_name/
    package_code/
    README.md
    LICENSE.md
    setup.py

So, for our case with two packages, we will have (within $EDGEML_HOME)

edgeml_tf/
    edgeml_tf/ # The package code
    LICENSE
    README.md
    setup.py

and

edgeml_pytorch/
    edgeml_pytorch/ # The package code
    LICENSE
    README.md
    setup.py

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I overlooked this. I can change the name to setup_python since that needs to be released.

Do you want to release a TF package at this point?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this to edgeml_pytorch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we dont' have to release the tf package to pypi. Lets just stick to pytorch.

But that is not what I meant by structural change. Renaming setup.py doesnt have anything to do with that.

The confusion here is because the parent directory has the same name as the package. Let me try to explain what I am trying to say with the old organization. Earlier we had the following structure:

${EDGEML_ROOT}/
|
|-- tf/                          # <--- this directory is where you run pip from
  |-- edgeml_tf/                 # <--- This is the actual package. 
  |-- setup.py                   # <--- This is the setup.py for that package.
  |-- README.md        

Note how the setup.py and the top level directory of the package, that is the first part of edgeml_tf.graph.*, is on the same level.

In the new organization, the tf folder corresponds to the edgeml_tf folder and the tf/edgeml_tf folder corresponds to edgeml_tf/edgeml_tf folder. The code will be in edgeml_tf/edgeml_tf/graph and edgeml_tf/edgeml_tf/trainer etc.

Hope that makes sense. Here is an example for reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SachinG007 Could you have a look at this? Testing should fail in a new environment with this setup.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you suggesting that we create another folder $EDGEML_ROOT/pytorch into which edgeml_pytorch goes

Yes. For a few reasons.

  1. Even though we are only releasing pytorch package and want to move forward with pytorch, I still want to keep the tensorflow package structure in the repository. TF will definitely become useful sometime in the future for RNN related work. One simple scenario that we will immediately need is testing/benchmarking tf and pytorch code within the same virtualenvironment. This will require local installations of both the packages. For this to work, we need two setup.py files; both of them need to be in $EDGEML_ROOT/pytorch/setup.py and $EDGEML_ROOT/tf/setup.py.

  2. As of now, our edgeml repository is a collection of various sub-projects; C++, tensorflow package, pytorch package, applications. In the future, other packages might pop up that would require its own setup.py files. Lets just be future proof and isolate edgeml_pytorch to its own directory. Lets not put any sub-project specific files at the top level directory ($EDGEML_ROOT). Else we will have to deal with this re-org again.

@adityakusupati Can you have a look here? Do you have other suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@metastableB, @harsha-simhadri I don't understand python packaging and the norms that well.

  1. I completely agree with @metastableB on TF retaining its packaging structure until pytorch is viable for custom RNNs.
  2. I again agree with you, given the possibility of more tools/directories/frameworks which might come up, it is better to isolate installation of each of the packages.

However, having said the above two, I tried the old packaging format that is here and this works seamlessly for pytorch (I am not sure about TF as I haven't tested it) @SachinG007 and @pushkalkatara did the structure here ensure TF examples ran without issues?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adityakusupati That is odd. The pytoch link should not work, unless you run pip install ./ from ${EDGEML_ROOT}. If you try to install the package from within the edgeml_pytorch folder, pip will complain that setup.py is not found.

Anyway, @harsha-simhadri the only thing you have to do is move $EDGEML_ROOT/edgeml_pytorch to $EDGEML_ROOT/pytorch/edgeml_pytorch/ and similarly move the $EDGEML_ROOT/edgeml_tf/ to $EDGEML_ROOT/tf/edgeml_tf/. Having done that, you then move $EDGEML_ROOT/setup.py to $EDGEML_ROOT/pytorch/. And create a setup file for the tf directory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will create subfolders for TF and PyTorch. I wont create a TF set up file yet since I have not read through that code much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll raise a PR for that this weekend. Thanks.

@SachinG007
Copy link
Contributor

@harsha-simhadri I ran the notebooks and scripts after the path changes. They run fine

@harsha-simhadri
Copy link
Collaborator Author

@mr-yamraj @adityakusupati @metastableB Please take a look at the new organization and updates.
Much work remains in terms of making variable naming homogenous, and gixing ONNX warnings.

@harsha-simhadri , the org structure looks good except the stray SRNN example https://github.com/microsoft/EdgeML/tree/harsha/reorg/pytorch/examples/SRNN

fixed

@adityakusupati
Copy link
Contributor

@harsha-simhadri I ran the notebooks and scripts after the path changes. They run fine

@SachinG007 can you please follow-up on Issue 119. Your testing should have caught this error.

@mr-yamraj
Copy link
Contributor

mr-yamraj commented Aug 20, 2019

In examples folder, I think we should have another folder for data preparation.
Reason to do this:

  1. At present, the same script is stored in different folders. (For ex: process_usps.py)
  2. And the same processed data (ex: google-speech-data or usps) can be used to train models with scripts present in SRNN, FastCells, Bonsai, ProtoNN.

@metastableB
Copy link
Contributor

In examples folder, I think we should have another folder for data preparation.
Reason to do this:

  1. At present, the same script is stored in different folders. (For ex: process_usps.py)
  2. And the same processed data (ex: google-speech-data or usps) can be used to train models with scripts present in SRNN, FastCells, Bonsai, ProtoNN.

I'm against code sharing in examples. While I agree that data preparation in examples is pretty much the same and we can re-use the code, I would prefer keeping each example independent, isolated.

Let say we decide to share pre-processing . It is likely that some future algorithm will require a slightly different pre-processing method for the same dataset, and to support this we will have to add more ifs and buts on top of the common data processing code like:

  • First go to common data processing scripts and run the scripts to download and process data.
  • Then come here and run this additional processing script on the extracted data to make it compatible to the current algorithm (and so forth).

Its better we isolate examples and all code relating to it.

@SachinG007
Copy link
Contributor

SachinG007 commented Aug 20, 2019

ERROR: File "setup.py" not found.
On executing pip install -e .

@adityakusupati
Copy link
Contributor

@SachinG007 the setup file is renamed, please do the needful.
@mr-yamraj , I agree with @metastableB about data preprocessing. Every point he makes is valid.

@mr-yamraj
Copy link
Contributor

mr-yamraj commented Aug 20, 2019

@metastableB @adityakusupati
I feel that the users of Edgeml will try to use examples in the following way (Let's say to train KWS models):
If I have to train a FastGRNN Model, then I will go inside FastCell dir run data preprocessing then run training script.
Then if I want to try out SRNN Model, then I will go inside SRNN dir run data preprocessing again then run training script.

@harsha-simhadri
Copy link
Collaborator Author

In examples folder, I think we should have another folder for data preparation.
Reason to do this:

  1. At present, the same script is stored in different folders. (For ex: process_usps.py)
  2. And the same processed data (ex: google-speech-data or usps) can be used to train models with scripts present in SRNN, FastCells, Bonsai, ProtoNN.

@mr-yamraj Agree regarding data prep being another folder under example. That is a future PR. Lets not wait on that for this PR

@harsha-simhadri
Copy link
Collaborator Author

will start another PR once directory structure is updated to @metastableB 's request

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants