ENH: add support for BLIS to numpy.distutils #7294

Merged
merged 3 commits into from
Mar 1, 2016

Conversation

rgommers
Member

@rgommers rgommers commented Feb 20, 2016

Note: the status of BLIS and some experiments/benchmarks in using it are discussed in gh-5479.

Besides adding a blis_info class plus the corresponding changes to site.cfg.example, a few things that generate spurious logging (checks for empty paths) are cleaned up.

Edit: mention that #5479 is now closed and #7372 has a more extensive discussion of BLIS.

@charris
Member

charris commented Feb 20, 2016

Needs a note in 1.12.0-notes.

@charris charris added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Feb 20, 2016
@charris
Member

charris commented Feb 22, 2016

@rgommers Could you add the release note?

@rgommers
Member Author

@charris done. Please don't merge yet though - @matthew-brett wanted to test this.

@charris charris removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Feb 22, 2016
# [blis]
# libraries = blis
# library_dirs = /home/username/blis/lib
# include_dirs = /home/username/blis/include/blis
Contributor

this is unusual, normally the include directory is just "include/" and subfolders are included via the #include line

Contributor

hm, but it seems that is where blis puts its stuff. This should probably have a comment saying that it needs to be the folder containing cblas.h, which is prefix/blis by default

Member Author

sure, done. also added a few more notes on compiling BLIS itself.
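
To make the include_dirs point concrete, here is a minimal sketch, assuming a [blis] section shaped like the one quoted above. The helper name find_cblas_header is hypothetical and not part of numpy.distutils; it only checks that one of the listed directories really contains cblas.h.

```python
import configparser
import os


def find_cblas_header(site_cfg_text, section="blis"):
    """Return the first include dir in `section` that contains cblas.h, else None.

    Hypothetical helper for illustration; not part of numpy.distutils.
    """
    # Disable interpolation so a literal '%' in a path cannot raise.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(site_cfg_text)
    if not parser.has_section(section):
        return None
    raw = parser.get(section, "include_dirs", fallback="")
    # Multiple directories in site.cfg are separated with os.pathsep.
    for directory in (d.strip() for d in raw.split(os.pathsep)):
        if directory and os.path.isfile(os.path.join(directory, "cblas.h")):
            return directory
    return None
```

With the default BLIS layout described above, pointing include_dirs at the parent include/ folder would make this return None, while pointing it at the blis subfolder would return that folder.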

@juliantaylor
Contributor

seems to work but import ends with numpy/core/multiarray.so: undefined symbol: cblas_cdotc_sub but that seems to be a problem in blis itself (dunnington and reference configuration)

@juliantaylor
Contributor

you actually have to put #define BLIS_ENABLE_CBLAS into the appropriate bli_config.h to get cblas, then it seems to work
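
A quick way to tell whether a given BLIS build exports the CBLAS layer, before building numpy against it, is to look the symbols up with ctypes. This is only a sketch: the function names are hypothetical, the symbol list is just the two names from the errors reported in this thread plus cblas_dgemm, and the library path in the usage note is a placeholder.

```python
import ctypes

# Symbols taken from the link errors reported in this thread, plus dgemm.
CBLAS_SYMBOLS = ("cblas_cdotc_sub", "cblas_zdotu_sub", "cblas_dgemm")


def missing_cblas_symbols(lib, names=CBLAS_SYMBOLS):
    """Return the names from `names` that `lib` does not export.

    `lib` is any object supporting attribute lookup, e.g. a ctypes.CDLL,
    where a failed symbol lookup surfaces as AttributeError.
    """
    return [name for name in names if not hasattr(lib, name)]


def check_library(path, names=CBLAS_SYMBOLS):
    """Load the shared library at `path` and report missing CBLAS symbols."""
    return missing_cblas_symbols(ctypes.CDLL(path), names)
```

For example, check_library("/path/to/libblis.so") should return an empty list for a build configured with BLIS_ENABLE_CBLAS, and a non-empty list otherwise.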

@matthew-brett
Contributor

I also get undefined symbols starting with: arraytypes.obj : error LNK2001: unresolved external symbol cblas_zdotu_sub.

I don't get these errors if I specify BLIS in the blas section of site.cfg instead:

[blas]
blas_libs = numpy-blis-reference
library_dirs = c:\code\blis\test\lib
include_dirs = c:\code\blis\test\include

@matthew-brett
Contributor

python setup.py config using the [blas] config section gives these flags for BLIS:

    libraries = ['numpy-blis-reference']
    define_macros = [('NO_ATLAS_INFO', 1)]
    library_dirs = ['c:\\code\\blis\\test\\lib']
    language = f77

Using the [blis] section, I get this config:

  FOUND:
    libraries = ['numpy-blis-reference', 'numpy-blis-reference']
    include_dirs = ['c:\\code\\blis\\test\\include']
    library_dirs = ['c:\\code\\blis\\test\\lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

@njsmith
Member

njsmith commented Feb 23, 2016

Yeah, BLIS seems to export a fortran-compatible BLAS interface by default, and only export the CBLAS interface if specifically configured to do so.

@juliantaylor
Contributor

@matthew-brett in your configuration in [blas] numpy will not use it, we need to have HAVE_CBLAS defined
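
The difference between the two outputs above can be checked mechanically. The sketch below copies the two info dicts from the config output quoted in this thread and tests for the HAVE_CBLAS macro; the helper name defines_have_cblas is hypothetical and not a numpy.distutils function.

```python
def defines_have_cblas(info):
    """True if a numpy.distutils-style info dict defines the HAVE_CBLAS macro."""
    return any(name == "HAVE_CBLAS" for name, _value in info.get("define_macros", []))


# Result of using the [blas] section, as reported above.
blas_section_info = {
    "libraries": ["numpy-blis-reference"],
    "define_macros": [("NO_ATLAS_INFO", 1)],
    "library_dirs": ["c:\\code\\blis\\test\\lib"],
    "language": "f77",
}

# Result of using the [blis] section, as reported above.
blis_section_info = {
    "libraries": ["numpy-blis-reference", "numpy-blis-reference"],
    "include_dirs": ["c:\\code\\blis\\test\\include"],
    "library_dirs": ["c:\\code\\blis\\test\\lib"],
    "language": "c",
    "define_macros": [("HAVE_CBLAS", None)],
}

for label, info in (("[blas]", blas_section_info), ("[blis]", blis_section_info)):
    print(label, "defines HAVE_CBLAS:", defines_have_cblas(info))
```

Only the [blis] result carries HAVE_CBLAS, which is consistent with the [blas] build linking cleanly while not using the library's CBLAS entry points.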

@rgommers
Member Author

It's of course a bit odd that for multiarray we only use cblas, while in linalg we do link and compile Fortran BLAS and LAPACK if available. But that is how it is now - I think defining HAVE_CBLAS is correct here. Right?

@rgommers
Member Author

On the other hand, for Scipy it makes little sense. The Fortran interface would be better.

@juliantaylor
Contributor

we will need both and all suitable blas libraries provide both. It's just not the default in BLIS, which can and should probably be changed.

Maybe one can think about adding two blas sections to the config, one for fortran blas and one for cblas, but the need didn't really come up yet.

@njsmith
Member

njsmith commented Feb 23, 2016

It's not hard to enable CBLAS in BLIS if that's what we need...

@fgvanzee: any reason in particular why BLIS's CBLAS layer is disabled by default?

@rgommers
Member Author

For reference, here's the BLIS config changes I had to make to make this work on 32-bit:

diff --git a/config/reference/bli_config.h b/config/reference/bli_config.h
index 5195e61..a1bd205 100644
--- a/config/reference/bli_config.h
+++ b/config/reference/bli_config.h
@@ -35,7 +35,11 @@
 #ifndef BLIS_CONFIG_H
 #define BLIS_CONFIG_H

+#endif
+

+#ifndef BLIS_ENABLE_CBLAS
+#define BLIS_ENABLE_CBLAS

 #endif

diff --git a/config/reference/make_defs.mk b/config/reference/make_defs.mk
index cf61534..1254c29 100644
--- a/config/reference/make_defs.mk
+++ b/config/reference/make_defs.mk
@@ -44,8 +44,8 @@ MAKE_DEFS_MK_INCLUDED := yes

 # Variables corresponding to other configure-time options.
 BLIS_ENABLE_VERBOSE_MAKE_OUTPUT := no
-BLIS_ENABLE_STATIC_BUILD        := yes
-BLIS_ENABLE_DYNAMIC_BUILD       := no
+BLIS_ENABLE_STATIC_BUILD        := no
+BLIS_ENABLE_DYNAMIC_BUILD       := yes



@@ -86,7 +86,7 @@ CDBGFLAGS      := #-g
 CWARNFLAGS     := -Wall
 COPTFLAGS      := -O2
 CKOPTFLAGS     := $(COPTFLAGS)
-CVECFLAGS      := #-msse3 -march=native # -mfpmath=sse
+CVECFLAGS      := -msse3 -march=native # -mfpmath=sse

 # Aggregate all of the flags into multiple groups: one for standard
 # compilation, and one for each of the supported "special" compilation
diff --git a/config/sandybridge/make_defs.mk b/config/sandybridge/make_defs.mk
index a2f7ee4..44b13ff 100644
--- a/config/sandybridge/make_defs.mk
+++ b/config/sandybridge/make_defs.mk
@@ -80,7 +80,7 @@ CC             := gcc
 # Enable IEEE Standard 1003.1-2004 (POSIX.1d). 
 # NOTE: This is needed to enable posix_memalign().
 CPPROCFLAGS    := -D_POSIX_C_SOURCE=200112L
-CMISCFLAGS     := -std=c99 -m64 -fopenmp  # -fopenmp -pg
+CMISCFLAGS     := -std=c99 -m32 -fopenmp  # -fopenmp -pg
 CPICFLAGS      := -fPIC
 CDBGFLAGS      := #-g
 CWARNFLAGS     := -Wall
diff --git a/frame/include/bli_config_macro_defs.h b/frame/include/bli_config_macro_defs.h
index 3f33c7e..343b570 100644
--- a/frame/include/bli_config_macro_defs.h
+++ b/frame/include/bli_config_macro_defs.h
@@ -45,7 +45,7 @@
 // internally within BLIS as well as those exposed in the native BLAS-like BLIS
 // interface.
 #ifndef BLIS_INT_TYPE_SIZE
-#define BLIS_INT_TYPE_SIZE               64
+#define BLIS_INT_TYPE_SIZE               32
 #endif


@@ -155,7 +155,7 @@
 // C99 type "long int". Note that this ONLY affects integers used within the
 // BLAS compatibility layer.
 #ifndef BLIS_BLAS2BLIS_INT_TYPE_SIZE
-#define BLIS_BLAS2BLIS_INT_TYPE_SIZE     64
+#define BLIS_BLAS2BLIS_INT_TYPE_SIZE     32
 #endif

It works, with

./configure reference   # auto doesn't work on 32-bit
make
make install

but I'm of course not sure if those are all the changes needed (Numpy doesn't exercise that much of BLAS). It's a bit fiddly....

@juliantaylor
Contributor

as blis defaults to no cblas, maybe it is a good idea to add an explicit cblas test like the blas_info class has

@juliantaylor
Contributor

but to encourage testing with blis I'm also fine with merging now, we can sort out details later.

@njsmith
Member

njsmith commented Feb 23, 2016

diff --git a/config/reference/bli_config.h b/config/reference/bli_config.h
index 5195e61..a1bd205 100644
--- a/config/reference/bli_config.h
+++ b/config/reference/bli_config.h
@@ -35,7 +35,11 @@
 #ifndef BLIS_CONFIG_H
 #define BLIS_CONFIG_H

+#endif
+

+#ifndef BLIS_ENABLE_CBLAS
+#define BLIS_ENABLE_CBLAS

 #endif

Here I think you just want a bare #define BLIS_ENABLE_CBLAS inside the first #ifndef.

diff --git a/config/reference/make_defs.mk b/config/reference/make_defs.mk
index cf61534..1254c29 100644
--- a/config/reference/make_defs.mk
+++ b/config/reference/make_defs.mk
@@ -86,7 +86,7 @@ CDBGFLAGS      := #-g
 CWARNFLAGS     := -Wall
 COPTFLAGS      := -O2
 CKOPTFLAGS     := $(COPTFLAGS)
-CVECFLAGS      := #-msse3 -march=native # -mfpmath=sse
+CVECFLAGS      := -msse3 -march=native # -mfpmath=sse

I think this is wrong -- you're saying "make a binary that uses SSE3 instructions, and also all the instruction sets available on the machine where I'm compiling this". For our purposes I think we want something more like -msse2 -mfpmath=sse -mtune=generic.

diff --git a/config/sandybridge/make_defs.mk b/config/sandybridge/make_defs.mk
index a2f7ee4..44b13ff 100644
--- a/config/sandybridge/make_defs.mk
+++ b/config/sandybridge/make_defs.mk

You're not using the sandybridge config, so I think you can discard these changes :-)

diff --git a/frame/include/bli_config_macro_defs.h b/frame/include/bli_config_macro_defs.h
index 3f33c7e..343b570 100644
--- a/frame/include/bli_config_macro_defs.h
+++ b/frame/include/bli_config_macro_defs.h
@@ -45,7 +45,7 @@
 // internally within BLIS as well as those exposed in the native BLAS-like BLIS
 // interface.
 #ifndef BLIS_INT_TYPE_SIZE
-#define BLIS_INT_TYPE_SIZE               64
+#define BLIS_INT_TYPE_SIZE               32
 #endif


@@ -155,7 +155,7 @@
 // C99 type "long int". Note that this ONLY affects integers used within the
 // BLAS compatibility layer.
 #ifndef BLIS_BLAS2BLIS_INT_TYPE_SIZE
-#define BLIS_BLAS2BLIS_INT_TYPE_SIZE     64
+#define BLIS_BLAS2BLIS_INT_TYPE_SIZE     32
 #endif

These should probably be set in config/reference/bli_config.h. (Note that these are protected by #ifndef, i.e. these are default settings that only apply if the specified macros haven't already been set.)

Also, the default internal integer size should probably be sizeof(void*), not 32 or 64, because the reference configuration ought to work on both 32- and 64-bit systems out-of-the-box... and the BLAS/CBLAS-level public API should default to 32-bits unconditionally, because that's the de facto standard. (See also all the problems Julia had when trying to use OpenBLAS built with 64-bit API, before they renamed the symbols to avoid clashes with programs expecting dgemm etc. to be 32-bit.) But these are things to submit back upstream to blis...
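
As a rough illustration of the defaults suggested here (pointer-sized internal integers, a fixed 32-bit BLAS/CBLAS API), the values can be computed on the build machine with ctypes. This is a sketch of the proposal only, not of what BLIS actually does; the function names are hypothetical.

```python
import ctypes


def pointer_bits():
    """Width of void* on this interpreter's platform, in bits."""
    return ctypes.sizeof(ctypes.c_void_p) * 8


def suggested_int_sizes():
    """Defaults along the lines proposed above (illustrative only)."""
    return {
        # Internal integers follow the pointer width: 32 or 64.
        "BLIS_INT_TYPE_SIZE": pointer_bits(),
        # The public BLAS/CBLAS layer stays at 32 bits unconditionally.
        "BLIS_BLAS2BLIS_INT_TYPE_SIZE": 32,
    }


print(suggested_int_sizes())
```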

@tkelman
Contributor

tkelman commented Feb 23, 2016

But these are things to submit back upstream to blis

In a way that can be selected at build time without patching headers, ideally.

@fgvanzee

@njsmith, CBLAS is disabled in most BLIS configurations by default because very few people I interact with need/use it. :)

@matthew-brett
Contributor

@fgvanzee - would it be easy to enable CBLAS with a flag like make CBLAS=1 or similar?

Where is the best place to ask about default SSE2 and SSE3 templates? I think we're really close to being able to use BLIS by default, but at the moment the reference implementation is very often selected by the template selection algorithm, and that is too slow compared to the alternatives.

Is there any help we can offer in exchange?

@njsmith
Member

njsmith commented Feb 24, 2016

@matthew-brett: from looking at the autodetection code, I think it's just poorly written. Two obvious problems jump out at me: (a) if the CPU supports AVX but the OS does not, then it falls back on reference, instead of falling back on SSE3. (b) since it has a hard-coded table of every CPU model, it needs to be updated every time a new CPU type is released. So e.g. it doesn't know anything about the latest skylake processors, and therefore they get the reference kernel instead of AVX2 like they should. Instead it should be checking directly for which instruction sets are supported using feature flags ("dunnington" = sse3, "sandybridge" = avx, "haswell" = avx2).

Also note that there are no SSE2-specific kernels, but the reference configuration can/should be built with -msse2 -mfpmath=sse to at least let the compiler attempt to autovectorize the loops -- this is not the default.

(@fgvanzee: for context note that @matthew-brett has been experimenting with building multiple configurations of BLIS and then using cpu-detection code to decide which version to load at runtime.)
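
The feature-flag approach described in (a) and (b) can be sketched as follows. The mapping of instruction sets to configurations is the one given above; the function name, the flag spellings ("sse3", "avx", "avx2") and the os_supports_avx argument are assumptions for illustration, and obtaining those values from CPUID is left out.

```python
def pick_blis_config(cpu_flags, os_supports_avx):
    """Choose a BLIS configuration name from CPU feature flags.

    cpu_flags: iterable of lower-case feature names (assumed spelling).
    os_supports_avx: whether the OS saves/restores the AVX register state.
    """
    flags = set(cpu_flags)
    # AVX and AVX2 kernels are only usable when the OS supports AVX too.
    if os_supports_avx and "avx2" in flags:
        return "haswell"
    if os_supports_avx and "avx" in flags:
        return "sandybridge"
    # Fall back to SSE3 rather than straight to reference.
    if "sse3" in flags:
        return "dunnington"
    return "reference"
```

Because this keys on features rather than a table of CPU models, a newer processor that reports avx2 would get the haswell configuration without any table update.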

@fgvanzee

@matthew-brett:

  • On CBLAS: For now, no. The only way to enable CBLAS is by editing the configuration's bli_config.h file prior to configuration. I know that is not ideal for non-developers, but it's how the build system works for now. (BLIS was designed initially to be a developer's tool.)
  • On SSE: Not sure what you mean by templates.

@njsmith: The auto-detection code may very well be poorly written, but I did not write it--Xianyi did, during his visit. Now, we were very grateful that he put at least something in place, as prior to his stint here at UT there was zero auto-detection. As for SSE2/3, you are correct: we only have SSE3 kernels, and they are incomplete (real domain only, I think). But I don't know to what extent those systems are properly detected, and based on your comments it sounds like there are gaps. Given that we no longer have the hardware those were developed on, it will be up to others to test that functionality and propose patches.

@matthew-brett: I am rearchitecting BLIS to facilitate various features which are not practical (feasible) presently, including runtime auto-detection of kernels/blocksizes. That will probably result in a redesign of the configure-time build system as well. So, these issues are on our radar, but they take time. But the goal is not to "fill the tree" of support for all hardware. We will need contributions from the community to fill the gaps once the new software architecture is in place.

Again, not sure what you mean by "templates". Maybe you mean configurations? Or kernels?

@matthew-brett
Contributor

Sorry - by templates I mean 'configurations' as in the sub-directories in your config directory.

Do you think it is possible, with something like the current state of BLIS, to make a collection of built configurations from which we could select at run-time, to give us a reasonably good performance on average across CPUs?

If that was possible, then I think you would find a lot of developer interest coming your way from us, from Julia, R and Octave, by which I mean issues and patches.

@fgvanzee

@matthew-brett: The current BLIS software architecture does not allow one to change configuration information (kernels, blocksizes, etc.) at runtime. As I alluded to, I am working to facilitate this, but it requires a lot of groundwork. (Auto-detection is just one "application" of the general feature. Experts--those who know what they are doing--will be able to switch between custom kernels on-demand, if they wish. I tend to think of all of this under the umbrella of "runtime management" of what we currently think of as "the configuration." The CPUID/hardware-detection side of this is actually the least interesting part, to me, even if it is the most useful to end-users.)

@matthew-brett
Contributor

@fgvanzee - sorry - I am not being clear. What we are experimenting with, is building all the currently defined x86 configurations into separate libraries. So we have a directory structure something like this:

reference/numpy-blis.dll
dunnington/numpy-blis.dll
haswell/numpy-blis.dll
...

Then, at runtime, we check cpuid (e.g. with https://pypi.python.org/pypi/x86cpu) and select the numpy-blis.dll library with the configuration likely to give the best performance for the CPU that we are running on.
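
A minimal sketch of the lookup side of this scheme, assuming the directory layout shown above. The function name blis_library_path is hypothetical, and the CPU detection itself is not shown because the x86cpu API is not described in this thread; the function just takes the chosen configuration name and falls back to reference when that build is missing.

```python
import os


def blis_library_path(base_dir, config, lib_name="numpy-blis.dll"):
    """Return the path of the library built for `config`, else the reference build.

    Raises FileNotFoundError if neither build exists under `base_dir`.
    """
    for candidate in (config, "reference"):
        path = os.path.join(base_dir, candidate, lib_name)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "no %s found for %r or 'reference' under %s" % (lib_name, config, base_dir)
    )
```

The returned path could then be handed to ctypes.CDLL or added to the DLL search path before numpy is imported.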

@njsmith
Member

njsmith commented Feb 25, 2016

@fgvanzee:

The auto-detection code may very well be poorly written, but I did not write it

Sincere apologies if that came across as criticism -- I'm aware that you didn't write it, and even if you had I wouldn't have considered it a judgement on you. All my code can certainly be improved further too :-)

@matthew-brett
Contributor

@fgvanzee - following up here.

Let's say we have built each defined x86 configuration as a separate library, as I described above, and we have found some optimal way of choosing between these libraries using the CPU identification, do you think we would be able to get reasonable performance across a range of CPUs?

If not - what would it take for that to be possible?

@fgvanzee

@njsmith: No offense taken. I just wanted to make clear that it was one of the few parts of BLIS that I did not author, and that I have not even really looked at myself. I will probably try to clean it up eventually, but I don't know exactly when that will be. Furthermore, it will depend on how successfully I can learn the absurd intricacies of CPUID, and I will tell you right now, I may get to a point where it makes me want to jump off of a building. If you or anyone in your circle has expertise in that, and can describe how the CPUID register values should be detected for the modern family of Intel and AMD systems, that would be welcome.

@matthew-brett: Yes, that clarifies things a bit. First, I hope we can agree that your extra-BLIS solution will become unnecessary in the short- to medium-term, since that functionality is planned for implementation within BLIS. Now, to your question: I feel like I'm missing something. If your multi-shared object approach uses reference and haswell, you will only get good performance on haswell/broadwell systems. If it uses reference, sandybridge, and haswell, you will only get good performance on sandy/ivybridge and haswell/broadwell, and so forth. (The reference configuration will always be slow, but it will always work.) I don't know what your definition of "reasonable range" of CPUs is.

@njsmith
Member

njsmith commented Feb 28, 2016

@matthew-brett @tkelman @juliantaylor @fgvanzee:

You know, it occurs to me that it would be very straightforward to create our own runtime-autoconfiguring build of BLIS right now, as a temporary measure while Field is rearchitecting the core to support more powerful forms of runtime configuration. This could be done without any hackery to BLIS itself. Just create a new BLIS configuration called config/runtime-x86-64 or something, with:

bli_kernel.h

#define BLIS_DGEMM_UKERNEL bli_selected_dgemm
#define BLIS_DEFAULT_MC_D bli_selected_mc_d
/* ... and so forth ... */

kernels/runtime-selected.c

static void (*selected_dgemm_pointer)(...);
int bli_selected_mc_d;
/* ...and so forth... */

static void select_kernels() {
    if (cpuid_says_we_have_avx2) {
        selected_dgemm_pointer = avx2_kernel;
        /* magic values from haswell/bli_kernel.h */
        bli_selected_mc_d = 72;
        /* ... */
    } else if (cpuid_says_we_have_avx) {
        /* ... */
    } else {
        /* use reference settings */
    }
}

void bli_selected_dgemm(...) {
    if (!selected_dgemm_pointer) {
        select_kernels();
    }
    (*selected_dgemm_pointer)(...);
}

The only thing we'd need to change about the core of BLIS is that we'd probably want to modify the existing x86-64 kernels to have more unique names (e.g. renaming the current bli_dgemm_asm_8x6 to instead be bli_dgemm_avx2_asm_8x6), to avoid collisions and confusion.

For bonus points:

  • To eliminate the extra branch+indirect jump from the microkernel call (does it even matter?), we could mark select_kernels() as a constructor so it gets called automatically at startup, and then #define BLIS_DGEMM_UKERNEL to point directly at selected_dgemm_pointer. This would require one tiny tweak to the BLIS core code -- right now there's some clever code in frame/include/bli_kernel_prototypes.h that tries to automatically declare a prototype for BLIS_DGEMM_UKERNEL, and this won't work if we've defined that macro to refer to a pointer-to-a-function instead of a function itself.
  • select_kernels() in this configuration could also check whether the user has specified any particular settings for the threading environment variables, and if not then make some better-than-nothing guess and set them.
  • If we want to get really fancy, we could use cpuid to check the cache configuration, and use that information when setting KC and friends (as per the formulas given in "Analytical modeling is enough for high-performance BLIS").

Of course this would be slightly klugey, but rather minimally: it'd be simple, would take advantage of the existing BLIS API configuration abstractions (and in particular, since all changes would be restricted to a new directory in config/ it would not conflict with any of Field's rearchitecting work), and it'd allow us to start experimenting with BLIS and autoconfiguration heuristics for real. (Presumably all the configuration heuristics we developed here could then be carried over directly when the "real" autoconfiguration API arrives.)
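
The select-on-first-call pattern in the C outline above can be modelled in a few lines of Python, just to show the control flow: the selector runs once, and every later call goes straight to the cached kernel. The class name LazyKernel and the toy kernels are hypothetical; this is a model of the dispatch logic, not BLIS code.

```python
class LazyKernel:
    """Callable that picks its implementation on first use and then caches it."""

    def __init__(self, selector):
        self._selector = selector  # zero-argument callable returning a kernel
        self._kernel = None        # plays the role of the NULL function pointer

    def __call__(self, *args):
        if self._kernel is None:
            # First call only: run the (possibly expensive) selection step.
            self._kernel = self._selector()
        return self._kernel(*args)


def select_multiply_kernel():
    """Stand-in for select_kernels(); a real selector would consult CPUID."""
    print("selecting kernel")
    return lambda a, b: a * b


multiply = LazyKernel(select_multiply_kernel)
print(multiply(2, 3))  # triggers the selection, then computes
print(multiply(4, 5))  # reuses the cached kernel
```

The constructor variant from the first bullet would correspond to calling the selector eagerly in __init__, which removes the None check from every call.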

@tkelman
Contributor

tkelman commented Feb 28, 2016

Anything that can be implemented around BLIS in C rather than Python would be much easier for Julia (and R, Octave, SciLua, anyone else) to test and use, so I like the sound of that.

@matthew-brett
Contributor

For CPUID - the heavy lifting for https://github.com/matthew-brett/x86cpu is in the C code at https://github.com/matthew-brett/x86cpu/tree/master/src so would be easy enough to recycle into kernel selection.

@njsmith
Member

njsmith commented Feb 28, 2016

@matthew-brett: also, a lot of the complexity there goes away if we don't care about supporting compilers without inline asm (and blis very much requires inline asm).

@tkelman
Contributor

tkelman commented Feb 29, 2016

Anywhere you care about MSVC runtime compatibility but don't need Fortran, you should probably already be using clang.

@fgvanzee

@njsmith: your interim solution sounds fine. Just be advised that you will probably need to rework your solution at least slightly (for example, frame/include/bli_kernel_prototypes.h will be going away) after my latest code refactoring is committed. (This refactoring will lay the groundwork for, but will not yet include, runtime management of kernels.)

EDIT: you've also hit on yet another item on my to-do list, which is to clean up the kernels directory. For example, the avx2 directory does not actually contain any instructions specific to AVX2. I know, it's awful. Adding architecture labels to the kernel function names, as you suggest, is one possibility to help out.

@njsmith
Member

njsmith commented Mar 1, 2016

@fgvanzee: is your refactoring-in-progress available anywhere to peek at (e.g. on a personal branch)? And are there any kernels that you know that we should specifically steer clear of until you have time to clean things up more? (Or is there some way that someone external could help with the clean up?)

@njsmith
Member

njsmith commented Mar 1, 2016

@rgommers: sorry this turned into a general chat channel about numpy+blis interaction -- I kinda lost track of what's going on with the underlying PR. Should we just merge it?

@rgommers
Member Author

rgommers commented Mar 1, 2016

Can be merged as is, or I can add the CBLAS check to this PR as Julian suggested (but that'll take a few days). Either way is fine with me.

@njsmith
Member

njsmith commented Mar 1, 2016

@rgommers: I don't think the CBLAS check is too urgent just because whether we have it or not, numpy+BLIS is going to remain an experts-only endeavour in several ways for the next while :-). Might well be a good idea to fix it up later, but we can do that as a separate PR?

@rgommers
Member Author

rgommers commented Mar 1, 2016

sure

@njsmith
Member

njsmith commented Mar 1, 2016

Then let's do this

njsmith added a commit that referenced this pull request Mar 1, 2016
ENH: add support for BLIS to numpy.distutils
@njsmith njsmith merged commit 25ac6b1 into numpy:master Mar 1, 2016
@matthew-brett
Contributor

I refactored the CPUID etc code into:

@fgvanzee

@njsmith: Sorry for the delay. I became overwhelmed with communication (and still am). And so I just kind of ignored most everything on github. Sometimes I feel like I'm autistic and I do strange things to cope.

Unfortunately, I don't typically use branches for interim work. There certainly hasn't been any reason to so far; I recently did a git diff of my work-in-progress, and it is north of 7MB. (I wasn't kidding when I told a few people that this is as close to a total rewrite of BLIS as the project will probably ever see.) Good news: I'm done with my first pass of changes. Bad news: now I get to try to compile it for the first time in three months. More good news: once everything works again, I will have enough confidence to just push it to master. (I tend to be a very careful commit pusher.)

@matthew-brett: Thanks for your efforts vis-a-vis the CPUID. That should be quite helpful. We may need to talk about licensing, though. There was a time when it was very important to our project that all of our code be "owned" by people employed by the university, and thus owned by UT. (I know, very much not in the spirit of OSS, but I don't call the shots.) Worst case, we would need to discuss what degree of rewrite I would need to perform to your code for it to qualify as "original" for the purposes of authorship (and thus having the new code qualify as being UT-authored).

@matthew-brett
Contributor

Field - is there any license I can put on the code that would allow you to use the code (and still allow me to distribute and modify)?

@fgvanzee

@matthew-brett: The answer is probably something very close to whatever is the least restrictive "license" possible. Public domain? Something narrower would probably work, something that granted copyright to UT. I'm not an IP expert, so I don't know exactly how this works. But we still have time to figure it out. I'm not quite ready to pivot to the CPU autodetection stuff yet. But we can definitely revisit this later. That will give me time to huddle with Robert and for us to determine whether I'm being overly conservative, and if not, determine who to contact within UT for further clarification.

@matthew-brett
Contributor

I'm happy to make it public domain if that would help - just let me know.

@fgvanzee

@matthew-brett: Many thanks for your flexibility and willingness to contribute. I'll get back to you.

@njsmith
Member

njsmith commented Mar 12, 2016

@fgvanzee: No worries! I think we all understand very well about being overwhelmed by communication :-) I'll look forward to seeing what you've got when it's ready...

@rgommers rgommers deleted the blis-support branch July 2, 2018 17:29