
An error occurred while performing LS-RTM on the 2D field data using stochastic gradient descent. #195

Closed
zhangxiaoshuotttt opened this issue Jul 31, 2023 · 7 comments


zhangxiaoshuotttt commented Jul 31, 2023

Hi @mloubout @kerim371,
Please accept my warm greetings. I am writing to ask for your guidance on an issue I have run into.
JUDI version: 3.3.5
I need some help with a strange error I encountered while attempting to perform LS-RTM with stochastic gradient descent on 2D field data.
The code works fine with synthetic model data and produces good results. However, when I apply it to the field data I keep getting an error. I have experimented with different values of the batchsize and niter parameters and found that batchsize=10 with niter=5 runs successfully, but larger values of batchsize and niter trigger the error again.
Here is my code:

using Statistics, Random, LinearAlgebra, JOLI, Distributed
using JUDI, SegyIO, HDF5, PyPlot, IterativeSolvers,SlimPlotting

#######Load model ###########
function read_binary_file(file_path)
    data = Array{Float32}(undef, (2535, 200))  # Create an empty array
    
    fid = open(file_path, "r")
    
    for i in 1:2535
        line = reinterpret(Float32,read(fid,200*sizeof(Float32)))
        data[i, :] = line
    end
    
    close(fid)
    
    return data
end


vpdata = read_binary_file("Real_model.vp")
Vpdata = vpdata/1000        # m/s -> km/s (JUDI expects km/s)
m0 = 1f0./Vpdata.^2f0       # squared slowness in (s/km)^2

n = (2535,200) # Number of gridpoints nx, nz
d = (25, 25) # Grid spacing in meters
o = (0, 0) # In meters as well

model0 = Model(n,d,o,m0)

######Load shot gather######
my_dir="./trim_segy/Slected"
my_segy="Real_data_shot_Z"
block=segy_scan(my_dir,my_segy,["SourceX", "SourceY", "GroupX", "GroupY", "RecGroupElevation", "SourceSurfaceElevation", "dt"])
d_lin = judiVector(block)

# Set up wavelet
src_geometry = Geometry(block, key = "source", segy_depth_key = "SourceDepth")
wavelet = ricker_wavelet(src_geometry.t[1], src_geometry.dt[1], 0.020)    # 20 Hz wavelet (peak frequency in kHz)
q = judiVector(src_geometry, wavelet)

# Setup operators
opt = Options(optimal_checkpointing=true)  
M = judiModeling(model0, q.geometry, d_lin.geometry; options=opt)
J = judiJacobian(M, q)

# Right-hand preconditioners (model topmute)
Mr = judiTopmute(model0; taperwidth=0)
# Left-hand preconditioner (data top mute)
Ml = judiDataMute(q.geometry, d_lin.geometry)

# Set up info structure
ntComp=get_computational_nt(q.geometry,d_lin.geometry,model0)
info=Info(prod(model0.n),d_lin.nsrc,ntComp)

# Stochastic gradient
x = zeros(Float32, info.n)
batchsize = 10
niter = 10
fval = zeros(Float32, niter)

# Main loop
for j = 1: niter
    println("Iteration: ", j)

    # Select batch and set up left-hand preconditioner
    i = randperm(d_lin.nsrc)[1: batchsize]

    # Compute residual and gradient
    r = Ml[i]*J[i]*Mr*x - Ml[i]*d_lin[i]
    g = adjoint(Mr)*adjoint(J[i])*adjoint(Ml[i])*r

    # Step size and update variable
    fval[j] = .5f0*norm(r)^2
    t = norm(r)^2/norm(g)^2
    global x -= t*g
end


------------------------------- error -------------------------------
DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 61 and 120")

Stacktrace:
 [1] _bcs1
   @ ./broadcast.jl:501 [inlined]
 [2] _bcs (repeats 2 times)
   @ ./broadcast.jl:495 [inlined]
 [3] broadcast_shape
   @ ./broadcast.jl:489 [inlined]
 [4] combine_axes
   @ ./broadcast.jl:484 [inlined]
 [5] instantiate
   @ ./broadcast.jl:266 [inlined]
 [6] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(-), Tuple{Matrix{Float32}, Matrix{Float32}}})
   @ Base.Broadcast ./broadcast.jl:883
 [7] materialize(bc::JUDI.MultiSource)
   @ JUDI ~/.julia/packages/JUDI/DUfUj/src/TimeModeling/Types/broadcasting.jl:45
 [8] -(ms1::judiVector{Float32, Matrix{Float32}}, ms2::judiVector{Float32, Matrix{Float32}})
   @ JUDI ~/.julia/packages/JUDI/DUfUj/src/TimeModeling/Types/broadcasting.jl:63
 [9] top-level scope
   @ ./In[53]:9

@kerim371
Contributor

Hi Zhang,

The problem is that some of your shots have a non-constant number of receivers ("arrays could not be broadcast to a common size; got a dimension with lengths 61 and 120"). The larger the batchsize and the number of iterations, the more likely you are to hit this error.

The question is: does the problem originate from your script (or data), or is there something wrong with JUDI?

To investigate this, I propose printing the size of every operator for each shot in a loop like:

for i in 1:d_lin.nsrc
  @info "size(Ml[i]): $(size(Ml[i]))"
  @info "size(J[i]): $(size(J[i]))"
  # etc. for the remaining operators and the data
end

The idea is to check that each operator has the same number of receivers for each shot.
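
A fuller version of that check could pull every shot in-core and compare the trace count in the data against the receiver count in the geometry. The sketch below is untested and assumes the in-core judiVector returned by get_data exposes its shot record as .data[1] (nt × nrec) and its receiver coordinates as .geometry.xloc[1], as in the JUDI examples:

# Sketch: scan every shot for an inconsistent receiver count.
nrec_per_shot = zeros(Int, d_lin.nsrc)
for s in 1:d_lin.nsrc
    d_s = get_data(d_lin[s])                  # pull shot s in-core
    nrec_data = size(d_s.data[1], 2)          # traces stored in the shot record
    nrec_geom = length(d_s.geometry.xloc[1])  # receiver positions in the geometry
    nrec_per_shot[s] = nrec_data
    nrec_data == nrec_geom || @warn "Shot $s: $nrec_data traces but $nrec_geom receiver positions"
end
@info "Distinct receiver counts across shots: $(sort(unique(nrec_per_shot)))"

If every shot reports the same count, the mismatch is introduced on the modeling side rather than in the SEG-Y data, for example by receivers that fall outside the model and are therefore not modeled.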

@zhangxiaoshuotttt
Author


Hi Kerim, thank you for your advice. I suspect the issue might be due to a problem in my data, but when I prepared the field data I tried to check it and could not find anything wrong: all shots appear to have the same number of traces per source. Here is the output I get when I follow your advice:

for j = 1: 10
    i = randperm(d_lin.nsrc)[1: batchsize]
    @info "size(Ml[i]): $(size(Ml[i]))"
    @info "size(J[i]): $(size(J[i]))"
    @info "size(d_lin[i]): $(size(d_lin[i]))"
end

[ Info: size(Ml[i]): (1500000, 1500000)
[ Info: size(J[i]): (AbstractSize(Dict{Symbol, Any}(:src => 10, :rec => Integer[120, 120, 120, 120, 120, 120, 120, 120, 120, 120], :time => Integer[1250, 1250, 1250, 1250, 1250, 1250, 1250, 1250, 1250, 1250])), AbstractSize(Dict(:z => 200, :x => 2535)))
[ Info: size(d_lin[i]): (10,)
(the same three lines are repeated for the remaining nine iterations)

@zhangxiaoshuotttt
Author

Actually, I converted the data from SGY to SU format yesterday, and during that process I made sure to check the number of traces per source, but I did not come across any issues. Interestingly, I am able to use this field data for FWI, yet I run into difficulties when attempting to use it for RTM, so I cannot confidently claim that the data is completely problem-free. Here is the specific error message I encountered while trying to perform RTM.

# Modeling options (isic imaging condition, model limiting, on-the-fly DFT)
opt = Options(limit_m = true,
              IC = "isic",
              dft_subsampling_factor = 8)


# Setup operators
q_dist = generate_distribution(q)
F = judiModeling(model0, q.geometry, d_lin.geometry; options=opt)
J = judiJacobian(F, q)

# Set up random frequencies
nfreq = 8
J.options.frequencies = Array{Any}(undef, d_lin.nsrc)
J.options.dft_subsampling_factor = 8
for k=1:d_lin.nsrc
    J.options.frequencies[k] = select_frequencies(q_dist; fmin=0.003, fmax=0.04, nf=nfreq)
end

# Data topmute
Ml = judiDataMute(q.geometry, d_lin.geometry)
d_lin = Ml*d_lin

rtm = adjoint(J)*d_lin


PyError ($(Expr(:escape, :(ccall(#= /home/master/.julia/packages/PyCall/twYvK/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'BlockingIOError'>
BlockingIOError(11, 'write could not complete without blocking', 0)


Stacktrace:
  [1] pyerr_check
    @ ~/.julia/packages/PyCall/twYvK/src/exception.jl:75 [inlined]
  [2] pyerr_check
    @ ~/.julia/packages/PyCall/twYvK/src/exception.jl:79 [inlined]
  [3] _handle_error(msg::String)
    @ PyCall ~/.julia/packages/PyCall/twYvK/src/exception.jl:96
  [4] macro expansion
    @ ~/.julia/packages/PyCall/twYvK/src/exception.jl:110 [inlined]
  [5] #107
    @ ~/.julia/packages/PyCall/twYvK/src/pyfncall.jl:43 [inlined]
  [6] disable_sigint
    @ ./c.jl:458 [inlined]
  [7] __pycall!
    @ ~/.julia/packages/PyCall/twYvK/src/pyfncall.jl:42 [inlined]
  [8] _pycall!(ret::PyCall.PyObject, o::PyCall.PyObject, args::Tuple{}, nargs::Int64, kw::Ptr{Nothing})
    @ PyCall ~/.julia/packages/PyCall/twYvK/src/pyfncall.jl:29
  [9] _pycall!
    @ ~/.julia/packages/PyCall/twYvK/src/pyfncall.jl:11 [inlined]
 [10] #_#114
    @ ~/.julia/packages/PyCall/twYvK/src/pyfncall.jl:86 [inlined]
 [11] PyObject
    @ ~/.julia/packages/PyCall/twYvK/src/pyfncall.jl:86 [inlined]
 [12] (::PyCall.var"#129#130")()
    @ PyCall ~/.julia/packages/PyCall/twYvK/src/pyinit.jl:264
 [13] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [14] invokelatest
    @ ./essentials.jl:706 [inlined]
 [15] execute_request(socket::ZMQ.Socket, msg::IJulia.Msg)
    @ IJulia ~/.julia/packages/IJulia/6TIq1/src/execute_request.jl:101
 [16] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [17] invokelatest
    @ ./essentials.jl:706 [inlined]
 [18] eventloop(socket::ZMQ.Socket)
    @ IJulia ~/.julia/packages/IJulia/6TIq1/src/eventloop.jl:8
 [19] (::IJulia.var"#15#18")()
    @ IJulia ./task.jl:417

@mloubout
Member

You might have receivers outside the model. Some datasets come with a model that spans a smaller area than the receiver spread, so you end up with smaller synthetic data than field data because the receivers outside the model are not modeled.

There is a utility function remove_out_of_bounds_receivers(recGeometry, recData, model) that, for a single shot record, removes the receivers that fall outside the computational domain.
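
A minimal sketch of how that could be applied shot by shot before building the operators; the return values of remove_out_of_bounds_receivers and the per-shot judiVector rebuild are assumptions here, so check the JUDI docstrings before relying on it:

# Sketch (untested): trim out-of-bounds receivers shot by shot and rebuild the data vector.
# Assumes remove_out_of_bounds_receivers returns the trimmed (geometry, data) pair.
d_trim = map(1:d_lin.nsrc) do s
    d_s = get_data(d_lin[s])                     # in-core single-shot record
    g_s, rec_s = remove_out_of_bounds_receivers(d_s.geometry, d_s.data[1], model0)
    judiVector(g_s, rec_s)                       # rebuild the single-shot judiVector
end
d_clean = vcat(d_trim...)                        # concatenate back into a multi-source vector

After trimming, the receiver geometry passed to judiModeling/judiJacobian should be rebuilt from d_clean.geometry so that the operators and the data stay consistent.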

@kerim371
Contributor

@mloubout should the option limit_m=true help here? Or does it not work for RTM? Zhang said FWI runs normally.

@mloubout
Member

It does work for RTM and is used extensively, including with TTI, but it might not work properly in that corner case where the receivers are outside the domain.

@zhangxiaoshuotttt
Author


Thank you so much for your assistance! I must admit that I overlooked an important detail: I realized earlier that a few of the source and receiver points are located outside of my velocity model. I will fix this immediately and run the computation again. Your help is greatly appreciated. Thank you once more!
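
For anyone hitting the same issue, a quick way to flag the offending shots is to compare the receiver coordinates against the lateral model extent defined by n, d and o in the script above; the geometry field names below follow the same in-core Geometry assumption as the earlier sketch:

# Sketch (untested): flag shots whose receivers fall outside the model in x.
xmin = o[1]
xmax = o[1] + (n[1] - 1) * d[1]                # lateral extent: 0 to (2535-1)*25 m
for s in 1:d_lin.nsrc
    rx = get_data(d_lin[s]).geometry.xloc[1]   # receiver x-coordinates of shot s
    n_out = count(x -> x < xmin || x > xmax, rx)
    n_out > 0 && @warn "Shot $s has $n_out receivers outside [$xmin, $xmax] m"
end
# The source geometry (q.geometry) can be checked the same way.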
