Very slow slicing of virtual datasets with chunked source data #1597

Open · takluyver opened this issue Jul 28, 2020 · 7 comments

@takluyver (Member)

I came across a case where reading a virtual dataset is very slow: more than an order of magnitude slower than reading the same data by iterating over it in a Python loop. Scripts to reproduce this are below. I suspect this is coming from HDF5 itself (cc @epourmal), but I haven't yet tried to reproduce it without Python.

Version info:

h5py    2.10.0
HDF5    1.12.0
Python  3.8.5 | packaged by conda-forge | (default, Jul 24 2020, 01:25:15) 
[GCC 7.5.0]
sys.platform    linux
sys.maxsize     9223372036854775807
numpy   1.19.1
create.py
import h5py
import numpy as np

layout = h5py.VirtualLayout((1000, 16, 512, 128), dtype=np.uint32)

for i in range(16):
    arr = np.full((1000, 512, 128), i, dtype=np.uint32)
    with h5py.File(f'{i}.h5', 'w') as f:
        ds = f.create_dataset('a', data=arr, chunks=(1, 512, 128))
        layout[:, i] = h5py.VirtualSource(ds)

with h5py.File('vds.h5', 'w') as f:
    f.create_virtual_dataset('a', layout)
read.py
import h5py
import numpy as np
import time

print(h5py.version.info)

f = h5py.File('vds.h5', 'r')
ds = f['a']

t0 = time.perf_counter()
arr1 = ds[:50]
t1 = time.perf_counter()
print(f"Slicing: {t1 - t0:.03f} s")

arr2 = np.zeros((50,) + ds.shape[1:], dtype=ds.dtype)
for i in range(50):
    arr2[i] = ds[i]
t2 = time.perf_counter()
print(f"Loop: {t2 - t1:.03f} s")

np.testing.assert_array_equal(arr1, arr2)

I would expect slicing (ds[:50]) to be at least as fast as reading the same data in a loop, but I consistently see slicing take about 13 seconds and the loop take about 0.25 s. I don't see this when reading a chunked dataset directly, nor with a virtual dataset whose source data is contiguous.
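
For comparison, a minimal sketch of the contiguous case (the script and file names here are just illustrative): omitting chunks= in create_dataset gives h5py's default contiguous layout, and slicing a virtual dataset built on such sources does not show the slowdown.
create_contiguous.py
import h5py
import numpy as np

# Same layout as create.py, but the source datasets are written contiguously
# (no chunks= argument), which is the case where slicing stays fast.
layout = h5py.VirtualLayout((1000, 16, 512, 128), dtype=np.uint32)

for i in range(16):
    arr = np.full((1000, 512, 128), i, dtype=np.uint32)
    with h5py.File(f'contig_{i}.h5', 'w') as f:
        ds = f.create_dataset('a', data=arr)  # contiguous: no chunks specified
        layout[:, i] = h5py.VirtualSource(ds)

with h5py.File('vds_contig.h5', 'w') as f:
    f.create_virtual_dataset('a', layout)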

@dallanto originally noticed this with real data. I've reproduced it using sample data, HDF5 1.12 and h5py master.

@takluyver (Member, Author)

I can reproduce this in C by modifying the h5_read.c example, so it's definitely coming from HDF5.

My modification of the C code is below. It works with the files made by create.py above. To test it:

h5cc -o h5_read h5_read.c
time ./h5_read
h5_read.c
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
 * Copyright by The HDF Group.                                               *
 * Copyright by the Board of Trustees of the University of Illinois.         *
 * All rights reserved.                                                      *
 *                                                                           *
 * This file is part of HDF5.  The full HDF5 copyright notice, including     *
 * terms governing use, modification, and redistribution, is contained in    *
 * the COPYING file, which can be found at the root of the source code       *
 * distribution tree, or in https://support.hdfgroup.org/ftp/HDF5/releases.  *
 * If you do not have access to either file, you may request a copy from     *
 * help@hdfgroup.org.                                                        *
 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

#include <stdio.h>
#include <stdlib.h>
#include "hdf5.h"

#define H5FILE_NAME        "vds.h5"
#define DATASETNAME "a"
#define NX 50           /* output buffer dimensions */
#define NY 16
#define NZ  512
#define NZZ 128
#define RANK         4

int
main (void)
{
    hid_t       file, dataset;         /* handles */
    hid_t       datatype, dataspace;
    hid_t       memspace;
    H5T_class_t t_class;                 /* data type class */
    H5T_order_t order;                 /* data order */
    size_t      size;                  /*
				        * size of the data element
				        * stored in file
				        */
    hsize_t     dims_out[4];           /* dataset dimensions */
    herr_t      status;

    int *data_out; /* output buffer */

    hsize_t      count[4];              /* size of the hyperslab in the file */
    hsize_t      offset[4];             /* hyperslab offset in the file */
    int          status_n, rank;

    data_out = malloc(NX * NY * NZ * NZZ * sizeof(int));

    /*
     * Open the file and the dataset.
     */
    file = H5Fopen(H5FILE_NAME, H5F_ACC_RDONLY, H5P_DEFAULT);
    dataset = H5Dopen2(file, DATASETNAME, H5P_DEFAULT);

    /*
     * Get datatype and dataspace handles and then query
     * dataset class, order, size, rank and dimensions.
     */
    datatype  = H5Dget_type(dataset);     /* datatype handle */
    t_class     = H5Tget_class(datatype);
    if (t_class == H5T_INTEGER) printf("Data set has INTEGER type \n");
    order     = H5Tget_order(datatype);
    if (order == H5T_ORDER_LE) printf("Little endian order \n");

    size  = H5Tget_size(datatype);
    printf(" Data size is %d \n", (int)size);

    dataspace = H5Dget_space(dataset);    /* dataspace handle */
    rank      = H5Sget_simple_extent_ndims(dataspace);
    status_n  = H5Sget_simple_extent_dims(dataspace, dims_out, NULL);
    printf("rank %d, dimensions %lu x %lu x %lu x %lu \n", rank,
	   (unsigned long)(dims_out[0]), (unsigned long)(dims_out[1]),
       (unsigned long)(dims_out[2]), (unsigned long)(dims_out[3]));

    /*
     * Define hyperslab in the dataset.
     */
    offset[0] = 0;
    offset[1] = 0;
    offset[2] = 0;
    offset[3] = 0;
    count[0]  = NX;
    count[1]  = NY;
    count[2]  = NZ;
    count[3]  = NZZ;
    status = H5Sselect_hyperslab(dataspace, H5S_SELECT_SET, offset, NULL,
				 count, NULL);
    printf("Selected in dataset %d\n", H5Sget_select_npoints(dataspace));

    /*
     * Define the memory dataspace.
     */
    memspace = H5Screate_simple(RANK, count, NULL);
    status = H5Sselect_all(memspace);
    printf("Selected in memspace %d\n", H5Sget_select_npoints(memspace));

    /*
     * Read data from hyperslab in the file into the hyperslab in
     * memory.
     */
    status = H5Dread(dataset, H5T_NATIVE_INT, memspace, dataspace,
		     H5P_DEFAULT, data_out);

    /*
     * Close/release resources.
     */
    free(data_out);
    H5Tclose(datatype);
    H5Dclose(dataset);
    H5Sclose(dataspace);
    H5Sclose(memspace);
    H5Fclose(file);

    return 0;
}

@epourmal

We do have reports about VDS performance. I entered issue HDFFV-11124 for this one. Unfortunately, I don't think we will have the bandwidth to investigate before October. Any help with profiling and more in-depth analysis would be highly appreciated.
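
For example, one inexpensive data point would be how the slicing time scales with the number of rows selected; a minimal sketch of that (the script name is just illustrative), assuming the vds.h5 file from create.py above:
scaling_check.py
import time

import h5py

# Time slices of increasing size on the virtual dataset, to see how the read
# time grows with the number of source chunks touched by the selection.
with h5py.File('vds.h5', 'r') as f:
    ds = f['a']
    for n in (1, 5, 10, 25, 50):
        t0 = time.perf_counter()
        _ = ds[:n]
        t1 = time.perf_counter()
        print(f"ds[:{n}]: {t1 - t0:.3f} s")
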
Thank you for reporting!
Elena

@takluyver (Member, Author)

Thanks Elena! I'm not sure if I'll get time to delve into it further, but it's good to know that it's tracked, in any case.

@epourmal

One thing I noticed is that there is a datatype conversion, since the memory buffer is native int while the file data is unsigned int, but I think the main issue comes from the hyperslab selection. We do have similar reports and were already planning to look into optimizations. It is on our radar and has high priority.

@dallanto

Thank you both. Although I noticed this with h5py and real data, I couldn't have stated the problem as clearly as @takluyver did.

@takluyver (Member, Author)

Thanks, well spotted. I just tried using unsigned int and H5T_NATIVE_UINT, and the timings are still much the same.
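
The equivalent check on the h5py side would be reading into a preallocated buffer of the file's exact dtype, so no conversion can be involved there either; a minimal sketch of that (the script name is just illustrative), assuming vds.h5 from create.py above:
read_direct_check.py
import h5py
import numpy as np

# Read the first 50 frames into a preallocated uint32 buffer via read_direct,
# so the memory type matches the file type exactly and no datatype conversion
# takes place.
with h5py.File('vds.h5', 'r') as f:
    ds = f['a']
    out = np.empty((50,) + ds.shape[1:], dtype=ds.dtype)
    ds.read_direct(out, source_sel=np.s_[:50])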

@epourmal

Thank you for checking! Yes, I wouldn't expect datatype conversion to be a big contributor to the performance drop.
