Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is this available on windows? #150

Closed
snaik2016 opened this issue Oct 25, 2019 · 12 comments
Closed

Is this available on windows? #150

snaik2016 opened this issue Oct 25, 2019 · 12 comments

Comments

@snaik2016
Copy link

Can this library be built on windows?

@dskhudia
Copy link
Contributor

Hi @snaik2016,

We haven't tried building it on windows. We expect some issues as we generate code at runtime and the calling convention is different with windows compiler + platform. There may be some compiler incompatibility issues as well. However, we don't expect any major changes. Currently the support on windows platform is not planned to be added in near future but we welcome external contributions.

Thanks
Daya

@peterjc123
Copy link
Contributor

I tried with #238. Due to the fact that I don't have too much knowledge about assembly code, I cannot proceed any further. Could you please take a look there?

@peterjc123 peterjc123 mentioned this issue Jan 8, 2020
4 tasks
@dskhudia
Copy link
Contributor

dskhudia commented Jan 8, 2020

@peterjc123 Thanks for your contributions. Let me look at your PR. I have been making some progress on this with d06ea70 c960d51 fc3dfe3 and others recently.

@ykim362
Copy link
Contributor

ykim362 commented Jan 8, 2020

@dskhudia @peterjc123 @snaik2016 Actually, I've made FBGEMM fully functional on windows. https://github.com/marian-nmt/FBGEMM
It's based on 518d8a1 .
This fork of FBGEMM is used in marian-nmt.

I could do rebase on the latest branch, if needed.
I am aware of there are some new code including AVX512 based fp16. And, I have a little more code clean up.

dskhudia added a commit to dskhudia/FBGEMM that referenced this issue Jan 13, 2020
Summary:
Pull Request resolved: pytorch#240

For pytorch#150

Reviewed By: jspark1105

Differential Revision: D19324136

fbshipit-source-id: aa6148c2993d5ab9cb9d72d7f8a0942c4f0e454a
dskhudia added a commit to dskhudia/FBGEMM that referenced this issue Jan 13, 2020
Summary:
Pull Request resolved: pytorch#241

For pytorch#150

Some changes picked from with minor modifications  pytorch#238 . Thanks a lot for your contributions.

Reviewed By: jspark1105

Differential Revision: D19331181

fbshipit-source-id: 08dd3a8ffd65f781e3bda8d241b9eb0221f03669
facebook-github-bot pushed a commit that referenced this issue Jan 13, 2020
Summary:
Pull Request resolved: #240

For #150

Reviewed By: jspark1105

Differential Revision: D19324136

fbshipit-source-id: 24c2eeeafc905588c6db5b4aa5c20c2d5ef56351
facebook-github-bot pushed a commit that referenced this issue Jan 13, 2020
Summary:
Pull Request resolved: #241

For #150

Some changes picked from with minor modifications  #238 . Thanks a lot for your contributions.

Reviewed By: jspark1105

Differential Revision: D19331181

fbshipit-source-id: 130961190c19d369e323ca45dcd953c45d3bf964
jspark1105 pushed a commit to jspark1105/FBGEMM that referenced this issue Jan 18, 2020
Summary:
This PR includes instrinsic API based implementation of FP16 kernels for windows build.
AVX2 is slightly slower or similar speed to the inline assembly version.
AVX512 is slower than inline assembly version for large m size.

Anyway, they are much faster than the reference implementation.

I've done accuracy tests for AVX2 and AVX512.

This is for pytorch#150 .
Pull Request resolved: pytorch#254

Differential Revision: D19460473

Pulled By: jspark1105

fbshipit-source-id: b3dcb2e4a19cf315ae9e339959a659bd19562fa6
jspark1105 pushed a commit to jspark1105/FBGEMM that referenced this issue Jan 18, 2020
Summary:
This PR includes instrinsic API based implementation of FP16 kernels for windows build.
AVX2 is slightly slower or similar speed to the inline assembly version.
AVX512 is slower than inline assembly version for large m size.

Anyway, they are much faster than the reference implementation.

I've done accuracy tests for AVX2 and AVX512.

This is for pytorch#150 .
Pull Request resolved: pytorch#254

Differential Revision: D19460996

Pulled By: jspark1105

fbshipit-source-id: 4f5248140cd2f3ee3a40c1e330c2be3085ed718a
jspark1105 pushed a commit to jspark1105/FBGEMM that referenced this issue Jan 18, 2020
Summary:
This PR includes instrinsic API based implementation of FP16 kernels for windows build.
AVX2 is slightly slower or similar speed to the inline assembly version.
AVX512 is slower than inline assembly version for large m size.

Anyway, they are much faster than the reference implementation.

I've done accuracy tests for AVX2 and AVX512.

This is for pytorch#150 .
Pull Request resolved: pytorch#254

Differential Revision: D19460996

Pulled By: jspark1105

fbshipit-source-id: 4c29863aec069ce009c85d67cc42269f6acf569e
jspark1105 pushed a commit to jspark1105/FBGEMM that referenced this issue Jan 21, 2020
Summary:
This PR includes instrinsic API based implementation of FP16 kernels for windows build.
AVX2 is slightly slower or similar speed to the inline assembly version.
AVX512 is slower than inline assembly version for large m size.

Anyway, they are much faster than the reference implementation.

I've done accuracy tests for AVX2 and AVX512.

This is for pytorch#150 .
Pull Request resolved: pytorch#254

Differential Revision: D19460996

Pulled By: jspark1105

fbshipit-source-id: 537766a226a3e8436bee259a1240a775e819c34b
facebook-github-bot pushed a commit that referenced this issue Jan 22, 2020
Summary:
This PR includes instrinsic API based implementation of FP16 kernels for windows build.
AVX2 is slightly slower or similar speed to the inline assembly version.
AVX512 is slower than inline assembly version for large m size.

Anyway, they are much faster than the reference implementation.

I've done accuracy tests for AVX2 and AVX512.

This is for #150 .
Pull Request resolved: #254

Reviewed By: shz0116

Differential Revision: D19460996

Pulled By: jspark1105

fbshipit-source-id: 3542918b4224c057a7a239cfbe4b44f6ad618b5e
@shz0116
Copy link

shz0116 commented Jan 22, 2020

Can this library be built on windows?

@snaik2016 we are trying to port the code to Windows platform. One question is whether you need both Static and Shared FBGEMM libraries, or just static ? Do you know how important shared library is on Windows platforms compared with static ?
Icc @peterjc123 and @ykim362 @dskhudia @jspark1105

@ykim362
Copy link
Contributor

ykim362 commented Jan 22, 2020

@shz0116 In my use case in marian-nmt, I directly include those FBGEMM file into my marian VS project. And, the mother VS project generates a shared library (dll). So, I am using a shared library (I use all the files in FBGEMM). I use ' FBGEMM_EXPORTS' definition in the project.

@shz0116
Copy link

shz0116 commented Jan 22, 2020

@ykim362 so you have not seen the problem in https://github.com/pytorch/FBGEMM/issues/266 ?

@ykim362
Copy link
Contributor

ykim362 commented Jan 22, 2020

@shz0116 I don't have a problem in my forked branch. ( https://github.com/marian-nmt/FBGEMM ). I might need to merge the latest FBGEMM branch into marian-nmt, soon. So, I will let you know how it goes.

@ykim362
Copy link
Contributor

ykim362 commented Jan 23, 2020

@shz0116 I can see one difference between the old version and the latest version. 'FBGEMM_EXPORTS' is forcely defined in some cc files in the latest version.
e.g.

#define FBGEMM_EXPORTS

Could this cause potential mismatches between dllimport and dllexport?
In the PackAWithQuantRowOffset.cc file, it's defined with dllexport because of the definition. On the other hand, in other files (e.g. PackedFloatInOutBenchmark.cc), Fbgemm.h is included without FBGEMM_EXPORTS resulting dllimport.

@peterjc123
Copy link
Contributor

Can this library be built on windows?

@snaik2016 we are trying to port the code to Windows platform. One question is whether you need both Static and Shared FBGEMM libraries, or just static ? Do you know how important shared library is on Windows platforms compared with static ?
Icc @peterjc123 and @ykim362 @dskhudia @jspark1105

I guess the static build is okay if the size is not too big. Otherwise, it will fail in the linking stage because PyTorch is already large enough.

@peterjc123
Copy link
Contributor

The shared build should be working in #268.

@dskhudia
Copy link
Contributor

dskhudia commented Mar 3, 2020

FBGEMM works on windows now. Closing this issue.

@dskhudia dskhudia closed this as completed Mar 3, 2020
jspark1105 pushed a commit that referenced this issue Mar 21, 2020
Summary:
Pull Request resolved: #240

For #150

Reviewed By: jspark1105

Differential Revision: D19324136

fbshipit-source-id: 24c2eeeafc905588c6db5b4aa5c20c2d5ef56351
jspark1105 pushed a commit that referenced this issue Mar 21, 2020
Summary:
Pull Request resolved: #241

For #150

Some changes picked from with minor modifications  #238 . Thanks a lot for your contributions.

Reviewed By: jspark1105

Differential Revision: D19331181

fbshipit-source-id: 130961190c19d369e323ca45dcd953c45d3bf964
jspark1105 pushed a commit that referenced this issue Mar 21, 2020
Summary:
This PR includes instrinsic API based implementation of FP16 kernels for windows build.
AVX2 is slightly slower or similar speed to the inline assembly version.
AVX512 is slower than inline assembly version for large m size.

Anyway, they are much faster than the reference implementation.

I've done accuracy tests for AVX2 and AVX512.

This is for #150 .
Pull Request resolved: #254

Reviewed By: shz0116

Differential Revision: D19460996

Pulled By: jspark1105

fbshipit-source-id: 3542918b4224c057a7a239cfbe4b44f6ad618b5e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants