## Important installation information

The libraries you need to install are h5py and pyJHTDB for the download itself, and pyMP if you are using the parallel version. h5py is also needed if you want to export the data to HDF5.

To install h5py and pyJHTDB via pip:

pip install pyJHTDB

The pyMP library is also installed via pip:

pip install pymp-pypi
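
Before running the notebook, it can help to confirm that the three packages are importable. The following is a minimal sketch using only the standard library; the helper name `check_modules` is made up for illustration, and the import names `h5py`, `pyJHTDB`, and `pymp` are taken from the import cell of this notebook.

```python
import importlib.util

def check_modules(names):
    """Return {module name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Report which of the packages used in this notebook are available.
for name, found in check_modules(["h5py", "pyJHTDB", "pymp"]).items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

A package reported as missing can then be installed with the pip commands above.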

In [1]:
import os
import sys
import time
import pymp
import h5py
import numpy as np
import pyJHTDB
from pyJHTDB.dbinfo import isotropic1024coarse
from pyJHTDB import libJHTDB

Here are the parameters of the database to be downloaded. We chose to download the $t=0.0$ snapshot of the isotropic database. To do this, I need to download the data in chunks that are 32 grid points wide.

In [2]:
Nx = isotropic1024coarse['nx']; Ny = isotropic1024coarse['ny']; Nz = isotropic1024coarse['nz']
Lx = isotropic1024coarse['lx']; Ly = isotropic1024coarse['ly']; Lz = isotropic1024coarse['lz']

dataset = 'isotropic1024coarse'
getFunction='Velocity'
t = 0.0; nx=Nx; ny=Ny; nz=Nz
chkSz = 32; slabs = nx//chkSz
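
The chunking arithmetic can be checked on its own, without touching the database. The sketch below assumes a grid of 1024 points per direction (the notebook reads the actual value from `isotropic1024coarse['nx']`), three velocity components, and single-precision values; the helper name `chunk_plan` is made up for illustration.

```python
def chunk_plan(n, chunk, components=3, bytes_per_value=4):
    """Split n grid points along x into slabs of width `chunk`.

    Returns the list of starting x indices and the size in bytes of one
    downloaded chunk of shape (chunk, n, n, components).
    """
    if n % chunk != 0:
        raise ValueError("chunk must divide n exactly")
    starts = [k * chunk for k in range(n // chunk)]
    chunk_bytes = chunk * n * n * components * bytes_per_value
    return starts, chunk_bytes

# Assumed grid size: 1024 points per direction.
starts, chunk_bytes = chunk_plan(1024, 32)
print(len(starts), "chunks of", chunk_bytes / 2**20, "MiB each")
```

Under these assumptions the download consists of 32 requests of 384 MiB each.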

Here is the serial version of the download. It took around 42.5 minutes to download and properly reshape the data. Please put your own authentication token in the `auth_token` variable below; a token is required for all downloads.

In [3]:
t1 = time.time()

auth_token = "com.gmail.jhelsas-b854269a"

################################################

lJHTDB=libJHTDB(auth_token)
lJHTDB.initialize() # NOTE: the database returns Velocity as [lz,ly,lx,3]
        
for k in range(slabs):
    print("slab number : "+str(k))
    start=np.array([k*chkSz,0,0],dtype=int)
    width=np.array([chkSz,Ny,Nz],dtype=int)
    uAll=lJHTDB.getRawData(t,start,width,data_set=dataset,getFunction=getFunction)
    if(k==0):
        vx=uAll[:,:,:,0]
        vy=uAll[:,:,:,1]
        vz=uAll[:,:,:,2]
    else:
        vx=np.concatenate((vx,uAll[:,:,:,0]),axis=2) 
        vy=np.concatenate((vy,uAll[:,:,:,1]),axis=2)
        vz=np.concatenate((vz,uAll[:,:,:,2]),axis=2)

lJHTDB.finalize()

t2 = time.time()
sys.stdout.write('Download from the database: {0:.2f} seconds\n'.format(t2-t1))

u=np.zeros((Nx,Ny,Nz),dtype='float32')
v=np.zeros((Nx,Ny,Nz),dtype='float32')
w=np.zeros((Nx,Ny,Nz),dtype='float32')

u[:,:,:]=np.transpose(vx)
v[:,:,:]=np.transpose(vy)
w[:,:,:]=np.transpose(vz)

################################################

t3 = time.time()
sys.stdout.write('Reshaping: {0:.2f} seconds\n'.format(t3-t2))

slab number : 0


KeyboardInterrupt: 
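
The loop above grows `vx`, `vy`, and `vz` with `np.concatenate`, which copies the accumulated data at every iteration. One possible alternative is to preallocate the output and write each chunk directly into its final position. The sketch below illustrates the idea on a tiny grid; `fake_get_raw_data` is a hypothetical stand-in for `getRawData` that only mimics the `[z, y, x, 3]` layout noted in the code comment, so no database access or token is involved.

```python
import numpy as np

def fake_get_raw_data(start, width):
    """Hypothetical stand-in for getRawData, returning shape [z, y, x, 3].

    Each point stores its own (x, y, z) index so the layout can be checked.
    """
    x0, y0, z0 = start
    wx, wy, wz = width
    z, y, x = np.meshgrid(
        np.arange(z0, z0 + wz),
        np.arange(y0, y0 + wy),
        np.arange(x0, x0 + wx),
        indexing="ij",
    )
    return np.stack([x, y, z], axis=-1).astype("float32")

def download_preallocated(n, chunk):
    """Fill [x, y, z]-ordered arrays slab by slab without concatenation."""
    u = np.empty((n, n, n), dtype="float32")
    v = np.empty_like(u)
    w = np.empty_like(u)
    for k in range(n // chunk):
        x0 = k * chunk
        raw = fake_get_raw_data((x0, 0, 0), (chunk, n, n))
        # Reverse the axes of each component: [z, y, x] -> [x, y, z].
        u[x0:x0 + chunk] = raw[..., 0].transpose(2, 1, 0)
        v[x0:x0 + chunk] = raw[..., 1].transpose(2, 1, 0)
        w[x0:x0 + chunk] = raw[..., 2].transpose(2, 1, 0)
    return u, v, w

u, v, w = download_preallocated(8, 2)
print(u[5, 3, 1], v[5, 3, 1], w[5, 3, 1])
```

Because every fake value encodes its own coordinates, the printed triple should match the index used to look it up, which confirms the final arrays are ordered `[x, y, z]`.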

Here the same data is written to disk, split into 8 files along x, which is convenient if the data is later used in a parallel execution.

In [None]:
t1 = time.time()
nproc = 8
for k in range(nproc):
    folder = "/home/idies/workspace/scratch"
    filename = "dwn-isotropic1024coarse-"+str(k)+".npz"
    filet = folder + "/" + filename
    sl = slice(k*(Nx//nproc), (k+1)*(Nx//nproc))  # x-range of the k-th file
    np.savez(filet, u=u[sl,:,:], v=v[sl,:,:], w=w[sl,:,:], nproc=nproc)
t2 = time.time()
sys.stdout.write('Write in disk: {0:.2f} seconds\n'.format(t2-t1))
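
To read the pieces back, each file can be loaded with `np.load` and the slabs joined along the first axis. The sketch below shows the round trip on small random arrays in a temporary directory; the helper names `write_slabs` and `read_slabs` and the file prefix `demo` are made up for illustration, and only one field is stored to keep the example short.

```python
import os
import tempfile
import numpy as np

def write_slabs(folder, prefix, u, nproc):
    """Write `u` as `nproc` .npz files, split evenly along the first axis."""
    step = u.shape[0] // nproc
    for k in range(nproc):
        path = os.path.join(folder, f"{prefix}-{k}.npz")
        np.savez(path, u=u[k * step:(k + 1) * step], nproc=nproc)

def read_slabs(folder, prefix):
    """Reassemble the array from the files written by write_slabs."""
    with np.load(os.path.join(folder, f"{prefix}-0.npz")) as first:
        nproc = int(first["nproc"])  # stored by savez as a 0-d array
    parts = []
    for k in range(nproc):
        with np.load(os.path.join(folder, f"{prefix}-{k}.npz")) as data:
            parts.append(data["u"])
    return np.concatenate(parts, axis=0)

original = np.random.rand(8, 4, 4).astype("float32")
with tempfile.TemporaryDirectory() as tmp:
    write_slabs(tmp, "demo", original, nproc=4)
    restored = read_slabs(tmp, "demo")
print(np.array_equal(original, restored))
```

Storing `nproc` in every file, as the cell above does, lets a reader discover how many pieces to expect from the first file alone.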

On my machine I had to move the temporary directory for pyMP to work properly. You might be able to skip this part.

In [3]:
os.environ['TMPDIR']='/home/idies/workspace/scratch'
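
Python's `tempfile` module consults the `TMPDIR` environment variable when it picks its default directory, but it caches the result after the first lookup, so the variable should be set before anything creates temporary files. Whether pyMP's shared arrays go through this mechanism on a given system is an assumption here, not something this notebook verifies. The sketch below only demonstrates the `tempfile` behaviour, using a throwaway directory instead of the scratch path; the helper name `tempdir_after_setting` is made up for illustration.

```python
import os
import tempfile

def tempdir_after_setting(path):
    """Point TMPDIR at `path` and return the directory tempfile then reports."""
    os.environ["TMPDIR"] = path
    tempfile.tempdir = None  # drop the cached value so TMPDIR is read again
    return tempfile.gettempdir()

old_tmpdir = os.environ.get("TMPDIR")
scratch = tempfile.mkdtemp()  # stand-in for a scratch directory
print(os.path.samefile(tempdir_after_setting(scratch), scratch))

# Restore the previous state and remove the throwaway directory.
if old_tmpdir is None:
    os.environ.pop("TMPDIR", None)
else:
    os.environ["TMPDIR"] = old_tmpdir
tempfile.tempdir = None
os.rmdir(scratch)
```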

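Allocate the arrays to be used in parallel. It is important to allocate them this way instead of with the default numpy array allocation, because pyMP runs the parallel region in forked processes and only shared arrays are visible to all of them.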

In [4]:
t1 = time.time()

shu = pymp.shared.array((Nx,Ny,Nz), dtype='float32')
shv = pymp.shared.array((Nx,Ny,Nz), dtype='float32')
shw = pymp.shared.array((Nx,Ny,Nz), dtype='float32')

t2 = time.time()
sys.stdout.write('Allocating shared arrays: {0:.2f} seconds\n'.format(t2-t1))

Allocating shared arrays: 90.99 seconds
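
The three shared arrays hold the full velocity field, so it is worth checking that the machine has enough memory before allocating them. A rough estimate, assuming a grid of 1024 points per direction and the `float32` type used above (the helper name `shared_arrays_bytes` is made up for illustration):

```python
import numpy as np

def shared_arrays_bytes(shape, dtype="float32", count=3):
    """Total bytes needed for `count` arrays of the given shape and dtype."""
    n_values = int(np.prod(shape, dtype=np.int64))
    return count * n_values * np.dtype(dtype).itemsize

# Assumed grid size: 1024 points per direction.
total = shared_arrays_bytes((1024, 1024, 1024))
print(total / 2**30, "GiB")
```

Under these assumptions the three arrays need 12 GiB in total, on top of the temporary per-process buffers used during the download.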


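Here is the pyMP version of the download. In the run recorded below, the worker handling the first slab took about 225 seconds (just under 4 minutes) to download its share and about 76 seconds to reshape it, and the whole cell took about 411 seconds (just under 7 minutes). Please put your own authentication token in the `auth_token` variable below; a token is required for all downloads.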

In [5]:
t1 = time.time()

auth_token = "com.gmail.jhelsas-b854269a"

lJHTDB=libJHTDB(auth_token)
lJHTDB.initialize() 

chkSz = 32
threads = 8
slabSize = Nx//threads
chks = slabSize//chkSz

with pymp.Parallel(threads) as p:
    for idx in p.range(0,threads):
        t01 = time.time()
        for k in range(chks):
            if(idx==0):
                print("slab number : "+str(k))
            
            start=np.array([idx*slabSize+k*chkSz,0,0],dtype=int)
            width=np.array([chkSz,ny,nz],dtype=int)
            uAll=lJHTDB.getRawData(t,start,width,data_set=dataset,getFunction=getFunction)
            if(k==0):
                vx=uAll[:,:,:,0]
                vy=uAll[:,:,:,1]
                vz=uAll[:,:,:,2]
            else:
                vx=np.concatenate((vx,uAll[:,:,:,0]),axis=2) 
                vy=np.concatenate((vy,uAll[:,:,:,1]),axis=2)
                vz=np.concatenate((vz,uAll[:,:,:,2]),axis=2)
                
        t02 = time.time()
        if(idx==0):
            sys.stdout.write('download from database: {0:.2f} seconds\n'.format(t02-t01)) 
                
        u=np.zeros((Nx//threads,ny,nz),dtype='float32')
        v=np.zeros((Nx//threads,ny,nz),dtype='float32')
        w=np.zeros((Nx//threads,ny,nz),dtype='float32')

        u[:,:,:]=np.transpose(vx)
        v[:,:,:]=np.transpose(vy)
        w[:,:,:]=np.transpose(vz)
            
        shu[idx*slabSize:(idx+1)*slabSize,:,:] = u[:,:,:]
        shv[idx*slabSize:(idx+1)*slabSize,:,:] = v[:,:,:]
        shw[idx*slabSize:(idx+1)*slabSize,:,:] = w[:,:,:]
                
        t03 = time.time()
        if(idx==0):
            sys.stdout.write('Reshape: {0:.2f} seconds\n'.format(t03-t02))
            
lJHTDB.finalize()
t2 = time.time()
sys.stdout.write('Getting the data: {0:.2f} seconds\n'.format(t2-t1))

slab number : 0
slab number : 1
slab number : 2
slab number : 3
download from database: 224.58 seconds
Reshape: 76.34 seconds
Getting the data: 410.92 seconds
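
The parallel cell uses a two-level split: the x direction is first divided among the workers, and each worker then downloads its slab in chunks of `chkSz` points. The sketch below reproduces only the index arithmetic, assuming 1024 points along x, to check that the chunks cover the domain exactly once; the helper name `thread_chunk_starts` is made up for illustration.

```python
def thread_chunk_starts(n, threads, chunk):
    """Return {idx: [x start of each chunk]} for the two-level split."""
    slab_size = n // threads
    chunks_per_slab = slab_size // chunk
    return {
        idx: [idx * slab_size + k * chunk for k in range(chunks_per_slab)]
        for idx in range(threads)
    }

# Assumed grid size: 1024 points along x, with threads=8 and chunk=32 as above.
plan = thread_chunk_starts(1024, threads=8, chunk=32)
all_starts = sorted(s for starts in plan.values() for s in starts)
print(plan[0], plan[7][-1])
print(all_starts == list(range(0, 1024, 32)))
```

With these values each worker handles four chunks, which matches the four `slab number` lines in the recorded output.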


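The first cell below prints the entries of the shared arrays that are exactly zero, presumably as a sanity check on the downloaded data. The second cell writes the data from the pyMP version of the download to disk.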

In [6]:
print(shu[shu==0])
print(shv[shv==0])
print(shw[shw==0])

[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
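
Printing `shu[shu==0]` shows how many exact zeros there are but not where they are. If the goal is to detect a chunk that was never filled, one option is to compute the fraction of zeros per chunk along x; a chunk left untouched in a zero-initialized array would show a fraction of 1. This is a sketch on synthetic data, and the helper name `zero_fraction_per_chunk` is made up for illustration.

```python
import numpy as np

def zero_fraction_per_chunk(arr, chunk):
    """Fraction of exactly-zero entries in each x-chunk of width `chunk`."""
    n_chunks = arr.shape[0] // chunk
    fractions = np.empty(n_chunks)
    for k in range(n_chunks):
        block = arr[k * chunk:(k + 1) * chunk]
        fractions[k] = np.count_nonzero(block == 0) / block.size
    return fractions

# Synthetic example: 4 chunks of width 2, with the third one left unfilled.
data = np.ones((8, 4, 4), dtype="float32")
data[4:6] = 0.0
fractions = zero_fraction_per_chunk(data, 2)
print(np.flatnonzero(fractions == 1.0))
```

Working chunk by chunk also avoids building a boolean mask for the whole array at once.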


In [7]:
t1 = time.time()
nproc = 8
for k in range(nproc):
    folder = "/home/idies/workspace/scratch"
    filename = "xi-isotropic1024coarse-"+str(k)+".npz"
    filet = folder + "/" + filename
    sl = slice(k*(Nx//nproc), (k+1)*(Nx//nproc))  # x-range of the k-th file
    np.savez(filet, u=shu[sl,:,:], v=shv[sl,:,:], w=shw[sl,:,:], nproc=nproc)
t2 = time.time()
sys.stdout.write('Write in disk: {0:.2f} seconds\n'.format(t2-t1))

Write in disk: 316.44 seconds


Here the same data is written into an HDF5 file, to be read elsewhere.

In [8]:
t1 = time.time()

h5f = h5py.File(folder+'/'+'velocity.h5','w')
h5f.create_dataset('u',data=shu)
h5f.create_dataset('v',data=shv)
h5f.create_dataset('w',data=shw)
h5f.close()

t2 = time.time()
sys.stdout.write('Write hdf5 file: {0:.2f} seconds\n'.format(t2-t1))

Write hdf5 file: 111.75 seconds
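
One advantage of HDF5 for reading the data elsewhere is that h5py can read a slice of a dataset without loading the whole array. The sketch below writes a small file in a temporary directory and reads one slab back; the file name `demo.h5`, the helper name `read_x_slab`, and the array contents are made up for illustration.

```python
import os
import tempfile
import h5py
import numpy as np

def read_x_slab(path, name, start, stop):
    """Read rows start:stop along the first axis of dataset `name`."""
    with h5py.File(path, "r") as h5f:
        return h5f[name][start:stop, :, :]

field = np.arange(8 * 4 * 4, dtype="float32").reshape(8, 4, 4)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")
    with h5py.File(path, "w") as h5f:
        h5f.create_dataset("u", data=field)
    slab = read_x_slab(path, "u", 2, 4)
print(slab.shape, np.array_equal(slab, field[2:4]))
```

The same pattern should apply to the `u`, `v`, and `w` datasets of `velocity.h5`, with the slab bounds chosen by whatever process reads the file.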
