xrootd/client: add bind request, implement support for multiple sockets #288

Merged
merged 1 commit into from
Jul 13, 2018

Conversation

EgorMatirov
Contributor

No description provided.

Member

@sbinet left a comment

looks good.
although perhaps quite complicated, would it be possible to add a test that specifically triggers this new feature?

pathID xrdproto.PathID
}

// pendingRequest is the request that having been sent to the server.
Member

s/pendingRequest is the request that having been sent to the server./pendingRequest is a request that has been sent to the remote server./

requests map[xrdproto.StreamID][]byte
requests map[xrdproto.StreamID]pendingRequest

childsMu sync.RWMutex
Member

childs isn't English (I know... English is annoying :P), but children is.

what about s/child/sub/, short for sub-session?
then everything reads better:

subsMu      sync.RWMutex
subCreateMu sync.Mutex
subs        map[xrdproto.PathID]*session

maxSubs  int
freeSubs chan xrdproto.PathID

what do you think?

Contributor Author

Thanks!
My bad.
You're right, sub looks way better.

requests map[xrdproto.StreamID][]byte
requests map[xrdproto.StreamID]pendingRequest

childsMu sync.RWMutex
Member

also, you should probably say what the 2 mutexes are for.
IIUC, childCreationMu makes sure we serialize the creation of new "sub" sessions, while childsMu protects the access to the map.

you might want to properly describe this in the documentation of these fields.
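
A minimal sketch of the locking scheme described above, using the sub naming proposed earlier. The pathID, subSession, and subPool types and the lookup/getOrCreate methods are hypothetical stand-ins, not the actual xrootd/client code; where the sketch merely allocates a struct, the real client would presumably dial, handshake, and send the bind request.

```go
package main

import (
	"fmt"
	"sync"
)

// pathID is a stand-in for xrdproto.PathID (assumed here to be a byte).
type pathID byte

// subSession is a hypothetical stand-in for a bound sub-session.
type subSession struct{ id pathID }

type subPool struct {
	// subsMu protects access to the subs map.
	subsMu sync.RWMutex
	subs   map[pathID]*subSession

	// subCreateMu serializes the creation of new sub-sessions, so that
	// two goroutines never create the same sub-session twice.
	subCreateMu sync.Mutex
}

// lookup reads the map under the read lock only.
func (p *subPool) lookup(id pathID) (*subSession, bool) {
	p.subsMu.RLock()
	defer p.subsMu.RUnlock()
	s, ok := p.subs[id]
	return s, ok
}

// getOrCreate returns the existing sub-session for id or creates it.
func (p *subPool) getOrCreate(id pathID) *subSession {
	if s, ok := p.lookup(id); ok {
		return s
	}
	p.subCreateMu.Lock()
	defer p.subCreateMu.Unlock()
	// Re-check: another goroutine may have created it while we waited.
	if s, ok := p.lookup(id); ok {
		return s
	}
	s := &subSession{id: id} // placeholder for the expensive setup
	p.subsMu.Lock()
	p.subs[id] = s
	p.subsMu.Unlock()
	return s
}

// createConcurrently asks for the same path ID from n goroutines and
// returns how many distinct sub-sessions were handed out.
func createConcurrently(n int) int {
	p := &subPool{subs: make(map[pathID]*subSession)}
	got := make([]*subSession, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			got[i] = p.getOrCreate(1)
		}(i)
	}
	wg.Wait()
	distinct := make(map[*subSession]struct{})
	for _, s := range got {
		distinct[s] = struct{}{}
	}
	return len(distinct)
}

func main() {
	fmt.Println("distinct sub-sessions:", createConcurrently(8))
}
```

Keeping the creation mutex separate means readers of the map are only blocked for the short map update, not for the whole (potentially slow) setup of a new sub-session.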

childCreationMu sync.Mutex
childs map[xrdproto.PathID]*session

maxChilds int
Member

looking at the usage of maxChilds and freeChilds, it seems to me one could just create the chan xrdproto.PathID with a buffer size of maxChilds. (although I haven't worked out the specifics.)

Contributor Author

The idea of maxChilds is to restrict the number of connections to a specific value. We can afford up to 255 sub-sessions (because pathID is a byte); however, that is not always a good idea, and the server may restrict the maximum number of connections. IIRC, that limit can even be obtained from the server as part of a query request.

I'm not sure about the buffer. IIUC, it doesn't restrict the maximum number of values, and there is no way of obtaining the buffer size (am I wrong?).

Member

IIRC, one can use cap to get the buffer size of a channel:
https://play.golang.org/p/hl99j1KG__3

still thinking about it, though :)
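
A small sketch of the idea under discussion, not the code from this PR: a buffered channel pre-filled with path IDs can act as the free list, with cap giving the configured maximum and len the number of IDs currently free. The names pathID, newFreeSubs, and tryAcquire are hypothetical, and numbering sub-session IDs from 1 is an assumption made for illustration.

```go
package main

import "fmt"

// pathID is a stand-in for xrdproto.PathID (assumed here to be a byte).
type pathID byte

// newFreeSubs returns a channel holding maxSubs free path IDs.
// maxSubs is assumed to be at most 255 so that every ID fits in a byte.
func newFreeSubs(maxSubs int) chan pathID {
	free := make(chan pathID, maxSubs)
	for i := 1; i <= maxSubs; i++ {
		free <- pathID(i)
	}
	return free
}

// tryAcquire takes a free path ID without blocking.
// It reports false when every ID is currently in use.
func tryAcquire(free chan pathID) (pathID, bool) {
	select {
	case id := <-free:
		return id, true
	default:
		return 0, false
	}
}

func main() {
	free := newFreeSubs(2)
	fmt.Println("max subs:", cap(free), "free:", len(free))

	a, _ := tryAcquire(free)
	b, _ := tryAcquire(free)
	_, ok := tryAcquire(free)
	fmt.Println("acquired:", a, b, "third acquire ok:", ok)

	free <- a // release a path ID back to the pool
	fmt.Println("free after release:", len(free))
}
```

Because a send on a full buffered channel blocks, the capacity does bound how many IDs the channel can hold at once, which is the property a separate maxChilds field would otherwise have to enforce.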

@EgorMatirov
Contributor Author

EgorMatirov commented Jul 10, 2018

would it be possible to add a test that specifically triggers this new feature?

Actually, the current read/write tests already use it.

Regarding mock tests: I'll look into it, but it looks complicated at first sight because the session tries to connect to the "remote" server. :-/

Any ideas regarding the test failure? It looks like the rootio tests were running; however, they pass on my machine without problems.

@sbinet
Member

sbinet commented Jul 10, 2018

travis is in a weird state these days.
ccxrootdgotest.in2p3.fr is also acting up a bit...

on my laptop, the rootio tests pass but the xrootd ones fail:

?   	go-hep.org/x/hep/xrootd/xrdfs	0.005s [no test files]
2018/07/10 16:23:46 Dispatch 1: INIT, NodeId: 0. data: {7.26 Ra 0x20000 NO_OPEN_SUPPORT,CAP_PARALLEL_DIROPS,CAP_POSIX_ACL,ASYNC_READ,POSIX_LOCKS,FLOCK_LOCKS,READDIRPLUS,READDIRPLUS_AUTO,ASYNC_DIO,CAP_PARALLEL_DIROPS,WRITEBACK_CACHE,BIG_WRITES,SPLICE_WRITE,SPLICE_READ,IOCTL_DIR,SPLICE_MOVE,AUTO_INVAL_DATA,ATOMIC_O_TRUNC,EXPORT_SUPPORT,DONT_MASK} 
2018/07/10 16:23:46 Serialize 1: INIT code: OK value: {7.23 Ra 0x20000 AUTO_INVAL_DATA,ASYNC_READ,READDIRPLUS,NO_OPEN_SUPPORT,BIG_WRITES 0/0 Wr 0x10000 Tg 0x0}
2018/07/10 16:23:46 Dispatch 2: LOOKUP, NodeId: 1. names: [xrdfuse-759858020] 18 bytes
2018/07/10 16:23:46 Serialize 2: LOOKUP code: 5=input/output error value: {NodeId: 0 Generation=0 EntryValid=0.000 AttrValid=0.000 Attr={M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000000 M 0.000000000 C 0.000000000}}
2018/07/10 16:23:46 Dispatch 3: FORGET, NodeId: 1. data: {Nlookup=1} 
--- FAIL: TestFS_Mkdir (0.30s)
    --- FAIL: TestFS_Mkdir/ccxrootdgotest.in2p3.fr:9001 (0.30s)
        xrdfuse_test.go:38: got error: xrdfuse: error calling Stat: xrootd: error 3011: Unable to locate /tmp/xrdfuse-759858020; no such file or directory
        xrdfuse_test.go:65: could not create dir: mkdir /tmp/xrdfuse-945268185/xrdfuse-759858020: input/output error
2018/07/10 16:23:46 Dispatch 1: INIT, NodeId: 0. data: {7.26 Ra 0x20000 ASYNC_READ,POSIX_LOCKS,FLOCK_LOCKS,READDIRPLUS,NO_OPEN_SUPPORT,CAP_PARALLEL_DIROPS,CAP_POSIX_ACL,READDIRPLUS_AUTO,ASYNC_DIO,CAP_PARALLEL_DIROPS,BIG_WRITES,SPLICE_WRITE,SPLICE_READ,IOCTL_DIR,WRITEBACK_CACHE,ATOMIC_O_TRUNC,EXPORT_SUPPORT,DONT_MASK,SPLICE_MOVE,AUTO_INVAL_DATA} 
2018/07/10 16:23:46 Serialize 1: INIT code: OK value: {7.23 Ra 0x20000 BIG_WRITES,AUTO_INVAL_DATA,ASYNC_READ,READDIRPLUS,NO_OPEN_SUPPORT 0/0 Wr 0x10000 Tg 0x0}
2018/07/10 16:23:46 Dispatch 2: LOOKUP, NodeId: 1. names: [xrdfuse-036145718] 18 bytes
2018/07/10 16:23:46 Serialize 2: LOOKUP code: 5=input/output error value: {NodeId: 0 Generation=0 EntryValid=0.000 AttrValid=0.000 Attr={M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000000 M 0.000000000 C 0.000000000}}
2018/07/10 16:23:46 Dispatch 3: FORGET, NodeId: 1. data: {Nlookup=1} 
--- FAIL: TestFS_OpenDir (0.33s)
    --- FAIL: TestFS_OpenDir/ccxrootdgotest.in2p3.fr:9001 (0.33s)
        xrdfuse_test.go:38: got error: xrdfuse: error calling Stat: xrootd: error 3011: Unable to locate /tmp/xrdfuse-036145718; no such file or directory
        xrdfuse_test.go:99: could not create dir: mkdir /tmp/xrdfuse-909483379/xrdfuse-036145718: input/output error
2018/07/10 16:23:46 Dispatch 1: INIT, NodeId: 0. data: {7.26 Ra 0x20000 BIG_WRITES,SPLICE_WRITE,SPLICE_READ,IOCTL_DIR,WRITEBACK_CACHE,ATOMIC_O_TRUNC,EXPORT_SUPPORT,DONT_MASK,SPLICE_MOVE,AUTO_INVAL_DATA,ASYNC_READ,POSIX_LOCKS,FLOCK_LOCKS,READDIRPLUS,NO_OPEN_SUPPORT,CAP_PARALLEL_DIROPS,CAP_POSIX_ACL,READDIRPLUS_AUTO,ASYNC_DIO,CAP_PARALLEL_DIROPS} 
2018/07/10 16:23:46 Serialize 1: INIT code: OK value: {7.23 Ra 0x20000 NO_OPEN_SUPPORT,ASYNC_READ,READDIRPLUS,BIG_WRITES,AUTO_INVAL_DATA 0/0 Wr 0x10000 Tg 0x0}
2018/07/10 16:23:46 Dispatch 2: LOOKUP, NodeId: 1. names: [xrdfuse-276473560] 18 bytes
2018/07/10 16:23:46 Serialize 2: LOOKUP code: 5=input/output error value: {NodeId: 0 Generation=0 EntryValid=0.000 AttrValid=0.000 Attr={M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000000 M 0.000000000 C 0.000000000}}
2018/07/10 16:23:46 Dispatch 3: FORGET, NodeId: 1. data: {Nlookup=1} 
--- FAIL: TestFS_Rename (0.31s)
    --- FAIL: TestFS_Rename/ccxrootdgotest.in2p3.fr:9001 (0.31s)
        xrdfuse_test.go:38: got error: xrdfuse: error calling Stat: xrootd: error 3011: Unable to locate /tmp/xrdfuse-276473560; no such file or directory
        xrdfuse_test.go:152: could not create dir: mkdir /tmp/xrdfuse-563275805/xrdfuse-276473560: input/output error
2018/07/10 16:23:47 Dispatch 1: INIT, NodeId: 0. data: {7.26 Ra 0x20000 READDIRPLUS_AUTO,ASYNC_DIO,CAP_PARALLEL_DIROPS,BIG_WRITES,SPLICE_WRITE,SPLICE_READ,IOCTL_DIR,WRITEBACK_CACHE,ATOMIC_O_TRUNC,EXPORT_SUPPORT,DONT_MASK,SPLICE_MOVE,AUTO_INVAL_DATA,ASYNC_READ,POSIX_LOCKS,FLOCK_LOCKS,READDIRPLUS,NO_OPEN_SUPPORT,CAP_PARALLEL_DIROPS,CAP_POSIX_ACL} 
2018/07/10 16:23:47 Serialize 1: INIT code: OK value: {7.23 Ra 0x20000 BIG_WRITES,AUTO_INVAL_DATA,ASYNC_READ,READDIRPLUS,NO_OPEN_SUPPORT 0/0 Wr 0x10000 Tg 0x0}
2018/07/10 16:23:47 Dispatch 2: LOOKUP, NodeId: 1. names: [xrdfuse-399177674] 18 bytes
2018/07/10 16:23:47 Serialize 2: LOOKUP code: 5=input/output error value: {NodeId: 0 Generation=0 EntryValid=0.000 AttrValid=0.000 Attr={M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000000 M 0.000000000 C 0.000000000}}
2018/07/10 16:23:47 Dispatch 3: FORGET, NodeId: 1. data: {Nlookup=1} 
--- FAIL: TestFS_Mknod (0.31s)
    --- FAIL: TestFS_Mknod/ccxrootdgotest.in2p3.fr:9001 (0.31s)
        xrdfuse_test.go:38: got error: xrdfuse: error calling Stat: xrootd: error 3011: Unable to locate /tmp/xrdfuse-399177674; no such file or directory
        xrdfuse_test.go:226: could not create file: open /tmp/xrdfuse-053566551/xrdfuse-399177674: input/output error
2018/07/10 16:23:47 Dispatch 1: INIT, NodeId: 0. data: {7.26 Ra 0x20000 BIG_WRITES,SPLICE_WRITE,SPLICE_READ,IOCTL_DIR,WRITEBACK_CACHE,ATOMIC_O_TRUNC,EXPORT_SUPPORT,DONT_MASK,SPLICE_MOVE,AUTO_INVAL_DATA,CAP_POSIX_ACL,ASYNC_READ,POSIX_LOCKS,FLOCK_LOCKS,READDIRPLUS,NO_OPEN_SUPPORT,CAP_PARALLEL_DIROPS,READDIRPLUS_AUTO,ASYNC_DIO,CAP_PARALLEL_DIROPS} 
2018/07/10 16:23:47 Serialize 1: INIT code: OK value: {7.23 Ra 0x20000 READDIRPLUS,NO_OPEN_SUPPORT,ASYNC_READ,BIG_WRITES,AUTO_INVAL_DATA 0/0 Wr 0x10000 Tg 0x0}
2018/07/10 16:23:47 Dispatch 2: LOOKUP, NodeId: 1. names: [xrdfuse-303039884] 18 bytes
2018/07/10 16:23:47 Serialize 2: LOOKUP code: 5=input/output error value: {NodeId: 0 Generation=0 EntryValid=0.000 AttrValid=0.000 Attr={M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000000 M 0.000000000 C 0.000000000}}
--- FAIL: TestFS_Chmod (0.32s)
    --- FAIL: TestFS_Chmod/ccxrootdgotest.in2p3.fr:9001 (0.32s)
        xrdfuse_test.go:38: got error: xrdfuse: error calling Stat: xrootd: error 3011: Unable to locate /tmp/xrdfuse-303039884; no such file or directory
        xrdfuse_test.go:265: could not create file: open /tmp/xrdfuse-409627809/xrdfuse-303039884: input/output error
FAIL
FAIL	go-hep.org/x/hep/xrootd/xrdfuse	1.583s

@sbinet
Member

sbinet commented Jul 10, 2018

hum... the travis build for "master" (so w/o this PR) is consistently successful:

while this PR is consistently failing (also tried a couple of times.)

locally, the tests are ok on my laptop.
so definitely something fishy. perhaps not totally related to travis, though.

@sbinet
Member

sbinet commented Jul 10, 2018

did your tests pass locally when running with -race?

@EgorMatirov
Contributor Author

Yes, both the xrootd and rootio tests pass locally with -race specified.

@EgorMatirov
Contributor Author

It hangs at

=== RUN   TestFileDirectory/root://ccxrootdgotest.in2p3.fr:9001/tmp/rootio/testdata/small-flat-tree.root
2018/07/10 15:46:03 xrootd: adding new subsession in progress... Current subsessions are: map[]

which means that binding the session (either the dialing, the handshake, or the bind request) somehow hangs.
However, I'm still unable to reproduce it locally.

@codecov-io

codecov-io commented Jul 10, 2018

Codecov Report

Merging #288 into master will increase coverage by 0.14%.
The diff coverage is 79.83%.

@@            Coverage Diff             @@
##           master     #288      +/-   ##
==========================================
+ Coverage   51.31%   51.45%   +0.14%     
==========================================
  Files         213      214       +1     
  Lines       22015    22118     +103     
==========================================
+ Hits        11297    11381      +84     
- Misses       9440     9451      +11     
- Partials     1278     1286       +8
Impacted Files                 Coverage Δ
xrootd/xrdproto/protocol.go    0% <ø> (ø) ⬆️
xrootd/client/bind.go          100% <100%> (ø)
xrootd/client/handshake.go     55.55% <100%> (ø) ⬆️
xrootd/client/session.go       57.7% <78.94%> (+15.49%) ⬆️
rio/rio.go                     47.34% <0%> (-0.41%) ⬇️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update add6bc8...4d8a726.


maxSubs int
freeSubs chan xrdproto.PathID
isSub bool // indicates whether this session is sub-session.
Member

s/is sub-session/is a sub-session/

return nil
}

func (sess *session) send(ctx context.Context, streamID xrdproto.StreamID, responseChannel mux.DataRecvChan, header []byte, body []byte, pathID xrdproto.PathID) ([]byte, *mux.Redirection, error) {
Member

s/header []byte, body []byte/header, body []byte/

@@ -299,3 +438,43 @@ func (sess *session) sign(streamID xrdproto.StreamID, requestID uint16, data []b

return wBuffer.Bytes(), nil
}

func newChildSession(ctx context.Context, parent *session) (*session, error) {
Member

s/newChildSession/newSubSession/

}
}

if !sess.isSub {
Member

ah! so this was the main issue?

Contributor Author

Yep.

For some reason, bind fails on CI with "cross-host bind not allowed". My guess was that the server complains about logins from 4 different hosts (each running a test session) with the same login. However, when I made the username include the current pid, nothing changed, so it looks like that's not the case.

Because the bind failed, the mux was closed, which made it look like the request hangs.

Member

ok.
I am not sure how much of this is "by design" and how much is "that's just the way it is".
perhaps something to clarify with upstream?

@sbinet
Member

sbinet commented Jul 12, 2018

So, with the assumption that this CORS-like error comes from the way tests are run under Travis, I guess we can leave the tests to be run when people do 'go test' on their machine, and disable it when on Travis.
(Let me know if you need guidance for that.)

@EgorMatirov
Contributor Author

(Let me know if you need guidance for that.)

First of all, the current tests fall back to sending data over a single socket.

I'm going to add bind_test.go, which tests that the bind request works.

As far as I understand, conditional running is achieved via build tags. So, for example, we can set the env variable TAGS="-tags travis" inside .travis.yml and exclude bind_test.go under that tag via +build !travis.

Would it be better to do the opposite and run that test only if a specific tag is passed? The test could fail for the same reason for someone else and bother people, especially since vgo runs the tests of all dependencies, IIRC.
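
The build-tag approach described above can be sketched as follows. This is an illustration only: in the PR the constrained file would be bind_test.go containing a test function, whereas the sketch uses a small main package so that it runs on its own, and bindChecksCompiled is a hypothetical name.

```go
//go:build !travis
// +build !travis

// The two constraint lines above exclude this file from the build whenever
// the "travis" tag is set, e.g. with `go test -tags travis ./...`.
// Toolchains older than Go 1.17 only read the `+build` form.
package main

import "fmt"

// bindChecksCompiled exists only in builds without the "travis" tag.
func bindChecksCompiled() bool { return true }

func main() {
	fmt.Println("bind checks compiled in:", bindChecksCompiled())
}
```

The opt-in variant raised in the question would invert the constraint: the file would carry a positive tag (any name the project picks) and be built only when that tag is passed explicitly.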

@sbinet
Member

sbinet commented Jul 12, 2018

I think using a 'travis' tag is fine at the moment.

If vgo fails us, we'll have ample time to adjust before it becomes really mainstream (Feb 2019.)

Thanks!

@sbinet merged commit 0c63256 into go-hep:master Jul 13, 2018
@sbinet
Member

sbinet commented Jul 13, 2018

thanks!

@EgorMatirov deleted the xrootd-bind branch July 13, 2018 05:49