Could you please offer bazel BUILD files to build the repository? #379

Closed
Yablon opened this issue Apr 27, 2020 · 8 comments
Yablon commented Apr 27, 2020

I can build with TensorFlow's BUILD file, but at runtime the following code gives a "Segmentation Fault":

#include <libxsmm.h>
#include <vector>
int main(int argc, char* argv[])
{
  typedef double value_type;
  int batchsize = 1000, m = 13, n = 5, k = 7;
  std::vector<value_type> a(batchsize*m*k), b(batchsize*k*n), c(m*n, 0);
  /* C/C++ and Fortran interfaces are available */
  typedef libxsmm_mmfunction<value_type> kernel_type;
  /* generates and dispatches a matrix multiplication kernel (C++ functor) */
  kernel_type kernel(LIBXSMM_GEMM_FLAG_NONE, m,n,k, 1.0/*alpha*/, 1.0/*beta*/);
  assert(kernel);
  for (int i = 0; i < batchsize; ++i) { /* initialize input */
    a[i*m*k] = static_cast<value_type>(1) / (i % 25);
    b[i*k*n] = static_cast<value_type>(7) / (i % 75);
  }
  /* kernel multiplies and accumulates matrix products: C += Ai * Bi */
  for (int i = 0; i < batchsize; ++i) kernel(&a[i*m*k], &b[i*k*n], &c[0]);
}

My BUILD file is:

# Description:
#    LIBXSMM: Library for small matrix-matrix multiplications targeting Intel Architecture (x86).
package(
    default_visibility = ["//visibility:public"],
    licenses = ["notice"],  # Apache 2.0
)

exports_files(["LICENSE.md"])

# Arguments to ./scripts/libxsmm_interface.py, see that file for detailed description.
#  precision: SP & DP
#  prefetch: 1 (auto)
libxsmm_interface_arguments = "0 1"

# Arguments to ./scripts/libxsmm_config.py, see that file for detailed description.
# rely on default arguments
libxsmm_config_arguments = ""

# Arguments to ./scripts/libxsmm_dispatch.py, see that file for detailed description.
#  (dummy argument)
libxsmm_dispatch_arguments = "0"

genrule(
    name = "libxsmm_headers",
    srcs = [
        "src/template/libxsmm.h",
        "src/template/libxsmm_config.h",
    ],
    outs = [
        "include/libxsmm.h",
        "include/libxsmm_config.h",
        "include/libxsmm_dispatch.h",
    ],
    cmd = "$(location :libxsmm_interface) $(location src/template/libxsmm.h) " + libxsmm_interface_arguments + " > $(location include/libxsmm.h);" +
          "$(location :libxsmm_config) $(location src/template/libxsmm_config.h) " + libxsmm_config_arguments + " > $(location include/libxsmm_config.h);" +
          "$(location :libxsmm_dispatch) " + libxsmm_dispatch_arguments + " > $(location include/libxsmm_dispatch.h)",
    tools = [
        ":libxsmm_config",
        ":libxsmm_dispatch",
        ":libxsmm_interface",
    ],
    visibility = [
        "//visibility:public"
    ],
)

cc_library(
    name = "xsmm_avx",
    srcs = glob(
        [
            # general source files (translation units)
            "src/generator_*.c",
            "src/libxsmm_*.c",
        ],
        exclude = [
            # exclude generators (with main functions)
            "src/libxsmm_generator_*.c",
        ],
    ),
    hdrs = glob(
        [
            # general header files
            "include/libxsmm_*.h",
            # trigger rebuild if template changed
            "src/template/*.c",
            "src/*.h"
        ],
        exclude = [
            # exclude existing/generated headers
            "include/libxsmm.h",
            "include/libxsmm_config.h",
            "include/libxsmm_dispatch.h",
        ],
    ) + [
        # source files included internally
        "src/libxsmm_hash.c",
        # generated header files
        "include/libxsmm.h",
        "include/libxsmm_config.h",
        "include/libxsmm_dispatch.h",
    ],
    #copts = [
    #    "-mavx",  # JIT does not work without avx anyway, and this silences some CRC32 warnings.
    #    "-Wno-vla",  # Libxsmm convolutions heavily use VLA.
    #],
    defines = [
        "LIBXSMM_BUILD",
        "LIBXSMM_CTOR",
        "__BLAS=0",
    ],
    includes = [
        "include",
        "src",
        "src/template",
    ],
    linkopts = ["-lpthread"],
    visibility = ["//visibility:public"],
)

py_library(
    name = "libxsmm_scripts",
    srcs = glob(["scripts/*.py"]),
    data = ["version.txt"],
)

py_binary(
    name = "libxsmm_interface",
    srcs = ["scripts/libxsmm_interface.py"],
    deps = [":libxsmm_scripts"],
)

py_binary(
    name = "libxsmm_config",
    srcs = ["scripts/libxsmm_config.py"],
    deps = [":libxsmm_scripts"],
)

py_binary(
    name = "libxsmm_dispatch",
    srcs = ["scripts/libxsmm_dispatch.py"],
    deps = [":libxsmm_scripts"],
)
hfp self-assigned this Apr 27, 2020

Yablon commented Apr 27, 2020

I built the dynamic library as follows:

wget https://github.com/hfp/libxsmm/archive/1.14.tar.gz
tar xvf 1.14.tar.gz
cd libxsmm-1.14/
make STATIC=0

and the following code gives a segmentation fault:

#include <libxsmm.h>
#include <vector>

int main(int argc, char* argv[]) {
  float alpha = 1.0;
  float beta = 1.0;
  typedef libxsmm_bfloat16 value_type;
  int batchsize = 1, m = 1, n = 32, k = 32;
  std::vector<value_type> a(batchsize*m*k), b(batchsize*k*n), c(m*n, 0);
  libxsmm_bmmfunction kernel;
  kernel = libxsmm_bmmdispatch(m, n, k, &m, &n, &k, &alpha, &beta, NULL, NULL);
  assert(kernel);
  for (int i = 0; i < batchsize; ++i) { /* initialize input */
    a[i*m*k] = static_cast<value_type>(1);
    b[i*k*n] = static_cast<value_type>(7);
  }
  /* kernel multiplies and accumulates matrix products: C += Ai * Bi */
  for (int i = 0; i < batchsize; ++i) kernel(&a[i*m*k], &b[i*k*n], &c[0]);
}

My BUILD file is as follows:

cc_library(
  name = "xsmm_avx",
  srcs = glob(["lib/*.so"]),
  hdrs = glob(
      [
          # trigger rebuild if template changed
          "src/template/*.c",
          "src/*.h",
          "include/*.h",
      ]
  ) + [
      # source files included internally
      "src/libxsmm_hash.c",
  ],
  includes = [
      "include",
      "src",
      "src/template",
  ],
  linkopts = ["-lpthread"],
  visibility = ["//visibility:public"],
)

Would you please tell me where I am wrong?


Yablon commented Apr 27, 2020

@hfp I am worried that you may not see the message above, so excuse me for @-mentioning you. Thank you!

hfp added a commit that referenced this issue Apr 27, 2020
…The purpose is to help those who wish to use Bazel get started. There is no intent to change our build system (make) or to actually support Bazel beyond this minimal starting point. Please note that the current "support" relies on header-only LIBXSMM with "zero-config".

hfp commented Apr 27, 2020

You may want to read the commit message above to understand the limitations of what was added. For your convenience, our "Hello World" example is now equipped with some support for Bazel.
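In the spirit of that commit, a header-only setup can be sketched in a BUILD file like the one below. This is only an illustrative sketch, not the repository's actual BUILD file: it assumes the headers for "zero-config", header-only use have already been generated (e.g., by the library's make-based build), and the target name is made up.

```
# Hypothetical minimal BUILD sketch for header-only LIBXSMM ("zero-config").
# Assumes the generated headers are already present under include/.
cc_library(
    name = "libxsmm_headeronly",
    hdrs = glob(["include/*.h"]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)
```

A consumer would then depend on this target and include the LIBXSMM headers directly, with no separate compiled library to link.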


Yablon commented Apr 27, 2020

Thank you for your kind and useful replies! It's so nice of you.

I am looking for a GEMM library for our deep learning applications. I have two matrices: one is the kernel, which is fixed during inference; the other is the activation, which varies from time to time. The size is 1 * 512 * 512, and the MM is repeated many times.
I used to use a GEMM library that packs the kernel before computing, and the packed kernel is then reused many times.

Recently I also need faster MM and sparse MM computed in bfloat16. Could I ask whether libxsmm can achieve this, in your opinion?
By the way, can you show me a libxsmm_bmmdispatch example?

Thank you!


hfp commented Apr 27, 2020

Thank you, Yablon, for your kind words!

By the way, can you show me a libxsmm_bmmdispatch example?

For deep learning, the bf16sgemm sample can be interesting. It uses a flavour of our kernel that not only runs a single GEMM but rather a whole batch of multiplications in one call. The kernel is available in several flavours accepting different input data; the sample code shows the strided flavour ("strd"). This sort of kernel is especially suitable for code that wants to control the loop-nest running the kernel.

Recently I have demand for faster MM and Sparse MM that computed in bfloat16. Could I ask if the libxsmm will make it in your opinion ?

For further questions, @alheinecke may be also willing to help.

There is a whole collection of other code samples that perform more out-of-the-box tasks related to deep learning (https://github.com/hfp/libxsmm/tree/master/samples/deeplearning). The GxM framework among this set of samples would also be able to run entire CNN topologies (prototxt). Further, you may want to have a look at PlaidML (please note it's the "v1" branch), which uses LIBXSMM when targeting CPUs.

@alheinecke

I would look into https://github.com/hfp/libxsmm/blob/master/samples/xgemm/kernel.c for a good bf16 example. The bf16sgemm sample mentioned above is a very specific case for trailing updates/sizes.


Yablon commented Apr 27, 2020

Hi @alheinecke, thank you for your reply.

Can I dispatch the kernel once in that example? I have hundreds of thousands of matrix multiplications between a varying A and a fixed B, and I want to minimize the dispatch cost across these repeated multiplications.


Yablon commented Apr 27, 2020

Hi @hfp, thank you! I have read the example; I will try it and compare it with my current GEMM library. It's late at night in East Asia now, so I will test it tomorrow morning. Hope you stay safe and sound.

Yablon closed this as completed Apr 27, 2020