[#110][#111] List operation handles large collections #114

Merged: 7 commits, May 26, 2021


korydraughn
Collaborator

This needs to be tested.

@trel
Member

trel commented Jan 22, 2021

seems good - ready to squash if we're tested/confirmed. what changed in the force push?

@korydraughn
Collaborator Author

Working on and running tests right now. The force push involved a change to the pom.xml file. I needed to revert part of the file so that the build succeeds.

@trel
Member

trel commented Jan 22, 2021

got it - thanks.

@korydraughn
Collaborator Author

korydraughn commented Jan 23, 2021

I've added a BATS test that verifies that the list operation produces the correct number of entries for a large collection (in this case 3000 files). The test passed.

We'll have to look into whether this PR resolves #110. We may need to use apache httpd to simulate that issue.
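
The count check described above can be approximated outside the test suite. The following is a minimal sketch, not the BATS test from this PR: it assumes a temporary local directory standing in for a collection under an NFSRODS mount point, and the helper name `count_entries` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify that listing a directory returns the expected number of entries.
# Assumption: "$dir" stands in for a collection under an NFSRODS mount point;
# a local temporary directory is used here so the script runs anywhere.

# Count directory entries (excluding . and ..), one per line.
count_entries() {
    ls -A "$1" | wc -l | tr -d ' '
}

expected=3000
dir=$(mktemp -d)

# Populate the directory with the expected number of empty files.
i=1
while [ "$i" -le "$expected" ]; do
    : > "$dir/file_$i"
    i=$((i + 1))
done

actual=$(count_entries "$dir")
if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual entries"
else
    echo "MISMATCH: expected $expected, got $actual"
fi

# Clean up the temporary directory.
rm -rf "$dir"
```

To run it against a real mount, point `dir` at the mounted collection and skip the populate and clean-up steps.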

@trel
Member

trel commented Jan 23, 2021

great - yes, let's see if sanger may want to test this branch themselves, too. if you're not seeing hitches in the listings anymore, i'm fine to get it squashed and merged. can always leave the issue open until they confirm as fixed.

@trel
Member

trel commented Jan 25, 2021

@kript @bh9 please eyeball this when you get a chance

@michael-conway
Member

michael-conway commented Jan 25, 2021 via email

@kript

kript commented Jan 25, 2021

This sounds exciting! However I shall defer to my esteemed colleague @ac55-sanger who is leading the charge here....

@ac55-sanger

Sure, sounds good, happy to test.
Will update once done. Thanks.

@ac55-sanger

Hi,

docker build is failing on korydraughn:110 with sleepycat dependency issue:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] nfs4j-irodsvfs ..................................... SUCCESS [  1.144 s]
[INFO] nfsrods ............................................ FAILURE [03:58 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:03 min
[INFO] Finished at: 2021-01-26T14:08:52+00:00
[INFO] Final Memory: 20M/714M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project nfsrods: Could not resolve dependencies for project org.irods.jargon:nfsrods:jar:1.0.1: Could not find artifact com.sleepycat:je:jar:7.3.7 in dcache-snapshots (https://download.dcache.org/nexus/content/repositories/releases) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :nfsrods
The command '/bin/sh -c cd irods_client_nfsrods &&     git checkout ${_sha} &&     mvn clean install -Dmaven.test.skip=true' returned a non-zero code: 1

I tried adding it to pom.xml and rebuilding, but that didn't work.
Could you please have a look and let me know?
Thanks!

@korydraughn
Collaborator Author

Will look into this.

@korydraughn
Collaborator Author

@ac55-sanger Mike and I have resolved the build issue. Please give this another shot when you get a chance.

@ac55-sanger

Sure, thanks.
Will update.

@ac55-sanger

ac55-sanger commented Jan 29, 2021

@korydraughn
This is still failing to build with the same dependency issue:

Could not resolve dependencies for project org.irods.jargon:nfsrods:jar:1.0.1: Could not find artifact com.sleepycat:je:jar:7.3.7 in dcache-snapshots

Steps followed:
$ git clone https://github.com/korydraughn/irods_client_nfsrods.git
$ cd irods_client_nfsrods/
$ git checkout 110
$ docker build -t nfsrods .

Let me know if I'm doing something wrong here.

@korydraughn
Collaborator Author

Your docker build command is actually trying to build the master branch. You need to point that command at this branch. To do that, run the following:

$ docker build -t nfsrods --build-arg='_github_account=korydraughn' --build-arg='_sha=110' .

@ac55-sanger

Ah, my bad. Thanks, it built successfully.
I will test it next week and let you guys know.

@ac55-sanger

@korydraughn
I tested this build and here is my report:

  1. "ls" on the directory containing 65K+ files took more than 24 hours and yet resulted in wrong data and count.
$ ls -l | wc -l
.
ls: cannot access 'DDD_MAIN5249029.cram': No such file or directory
ls: cannot access 'DDD_MAIN5249029.cram.crai': No such file or directory
ls: cannot access 'DDD_MAIN5249030.cram': No such file or directory
ls: cannot access 'DDD_MAIN5249030.cram.crai': No such file or directory
1631

All of these files exist in iRODS, and I can list them with the "ils" command.

  2. This build also broke data mapping and random data is being mapped to a collection.
$ ls -l <mounted_path>/20140918/
ls: cannot access '<mounted_path>/20140918/cram': No such file or directory
total 0
d????????? ? ? ? ?            ? cram

$ ls -l <mounted_path>/20140918/ | wc -l
ls: cannot access '/mnt/humgen/projects/ddd/20140918/cram': No such file or directory
2

=====
$ ils <irods_path>/20140918/
<irods_path>/20140918:
  10:1-135534747.vcf.gz
  10:1-135534747.vcf.gz.tbi
  11:1-135006516.vcf.gz
  11:1-135006516.vcf.gz.tbi
.
.
.
$ ils <irods_path>/20140918/ | wc -l
49

The NFSRODS-mounted directory 1) doesn't list the data properly and 2) lists some random data (e.g. the "cram" directory in this case) that doesn't exist in iRODS.

This mapping issue is not seen in the previous release build (tested it again).

Thanks
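
A comparison like the one in this report can be made repeatable by diffing the two listings rather than only comparing counts. The sketch below is an illustration under assumptions, not something used in this PR: it assumes the `ils` and `ls -1` outputs were saved to files beforehand, that `ils` prints a "<collection>:" header followed by indented entry names, and that sub-collections appear as lines starting with "C- ". The function names and the sample data are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report names present in an iRODS listing but missing from the mount.
# Assumptions: the input files were produced beforehand, e.g.
#   ils <irods_path> > ils.txt      and      ls -1 <mounted_path> > ls.txt

# Normalize ils output: drop the header line, strip leading whitespace,
# drop sub-collection lines (assumed to start with "C- "), then sort.
normalize_ils() {
    sed -e '1d' -e 's/^[[:space:]]*//' -e '/^C- /d' "$1" | sort
}

# Print names that appear in the ils listing but not in the ls listing.
missing_from_mount() {
    ils_sorted=$(mktemp)
    ls_sorted=$(mktemp)
    normalize_ils "$1" > "$ils_sorted"
    sort "$2" > "$ls_sorted"
    # comm -23 keeps lines that occur only in the first (sorted) file.
    comm -23 "$ils_sorted" "$ls_sorted"
    rm -f "$ils_sorted" "$ls_sorted"
}

# Hypothetical sample data, for demonstration only.
demo=$(mktemp -d)
printf '%s\n' '/zone/coll:' '  a.vcf.gz' '  a.vcf.gz.tbi' '  b.vcf.gz' > "$demo/ils.txt"
printf '%s\n' 'a.vcf.gz' 'cram' > "$demo/ls.txt"
missing_from_mount "$demo/ils.txt" "$demo/ls.txt"
rm -rf "$demo"
```

Swapping the two arguments and normalizing accordingly would give the opposite view: entries the mount shows that iRODS does not have, such as the stray "cram" directory.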

@michael-conway
Member

michael-conway commented Feb 3, 2021 via email

@korydraughn
Collaborator Author

We'll continue to look into this.

@korydraughn
Collaborator Author

@ac55-sanger When you did your testing, did you make sure to unmount nfsrods before remounting it?

I know that weird things happen when the nfsrods server is bounced without remounting it.

@michael-conway
Member

michael-conway commented Feb 5, 2021 via email

@ac55-sanger

> @ac55-sanger When you did your testing, did you make sure to unmount nfsrods before remounting it?
>
> I know that weird things happen when the nfsrods server is bounced without remounting it.

Yes @korydraughn I remember unmounting the old one before mounting a server with the new build.
But I can re-check and let you know by the end of today.

@korydraughn
Collaborator Author

That is surprising. Can you verify that the SHA for your build matches the one for this PR?

@ac55-sanger

Sure,

~# docker run --rm ac55/nfsrods-patch:3.0 sha
Build Time    => 2021-04-14T11:29:39+0000
Build Version => 1.0.1
Build SHA     => 35e278cd3b89f4809f8859fe7e85197baccf24f9

@ac55-sanger

It just gave up after ~38 hours. :(

ls: reading directory '<path_to_mounted_directory>': Remote I/O error
total 0

real    2233m20.581s
user    0m0.000s
sys     0m0.003s

@trel
Member

trel commented May 10, 2021

We narrowed the scope of @ac55-sanger's slow listings to an iRODS specific query that is not valid syntax for Oracle. We are moving ahead with the rest of these edits and will tackle the Oracle syntax issue separately.

@korydraughn
Collaborator Author

@michael-conway Can you make a new snapshot of jargon (tip of master) available?

@korydraughn
Collaborator Author

korydraughn commented May 13, 2021

@ac55-sanger Please try using NFSRODS again. You'll need to rebuild the docker image.

You'll need to make some changes before running NFSRODS.

  • Add "using_oracle_database": true to the "nfs_server" section of the NFSRODS config file.
  • Replace the following specific queries in iRODS with ones that work with Oracle.
    • ilsLACollections
    • ilsLADataObjects
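
As a rough illustration of the first bullet, the sketch below writes a config fragment containing only the new flag and checks that it is set. It is a sketch under assumptions: a real NFSRODS config file carries other options that are omitted here, the temporary file path and the helper name `oracle_flag_enabled` are hypothetical, and the check is a plain text match rather than a JSON parse.

```shell
#!/usr/bin/env bash
# Sketch: check that the Oracle flag is set in an NFSRODS config file.
# Assumption: the file is JSON and the flag lives in the "nfs_server" object.

# Succeeds if the flag appears with the value true (textual match only).
oracle_flag_enabled() {
    grep -Eq '"using_oracle_database"[[:space:]]*:[[:space:]]*true' "$1"
}

# Minimal fragment showing only the new option; all other settings omitted.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
    "nfs_server": {
        "using_oracle_database": true
    }
}
EOF

if oracle_flag_enabled "$cfg"; then
    echo "using_oracle_database is enabled"
else
    echo "using_oracle_database is missing or false"
fi
```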

Below are the new specific queries. Please have Simon take a look at these. You're free to adjust these.

ilsLACollections

SELECT * FROM (
  SELECT c.parent_coll_name, c.coll_name, c.create_ts, c.modify_ts,
         c.coll_id, c.coll_owner_name, c.coll_owner_zone, c.coll_type, u.user_name, u.zone_name,
         a.access_type_id, u.user_id, rownum as limit_rn
  FROM R_COLL_MAIN c
  JOIN R_OBJT_ACCESS a ON c.coll_id = a.object_id
  JOIN R_USER_MAIN u ON a.user_id = u.user_id
  WHERE c.parent_coll_name = ?
  ORDER BY c.coll_name, u.user_name, a.access_type_id, c.parent_coll_name, c.create_ts, c.modify_ts,
           c.coll_id, c.coll_owner_name, c.coll_owner_zone, c.coll_type, u.zone_name, u.user_id DESC
) WHERE limit_rn > ? AND limit_rn <= ?

ilsLADataObjects

SELECT * FROM (
  SELECT s.coll_name, s.data_name, s.create_ts, s.modify_ts, s.data_id,
         s.data_size, s.data_repl_num, s.data_owner_name, s.data_owner_zone, u.user_name,
         u.user_id, a.access_type_id, u.user_type_name, u.zone_name, rownum as limit_rn
  FROM (
      SELECT c.coll_name, d.data_name, d.create_ts, d.modify_ts, d.data_id, d.data_repl_num,
             d.data_size, d.data_owner_name, d.data_owner_zone
      FROM R_COLL_MAIN c
      JOIN R_DATA_MAIN d ON c.coll_id = d.coll_id
      WHERE c.coll_name = ?
  ) s
  JOIN R_OBJT_ACCESS a ON s.data_id = a.object_id
  JOIN R_USER_MAIN u ON a.user_id = u.user_id
  ORDER BY s.coll_name, s.data_name, u.user_name, a.access_type_id, s.create_ts, s.modify_ts,
           s.data_id, s.data_size, s.data_repl_num, s.data_owner_name, s.data_owner_zone,
           u.user_id, u.user_type_name, u.zone_name DESC
) WHERE limit_rn > ? AND limit_rn <= ?
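
The comment does not say how to swap the queries in. One plausible route, stated here as an assumption, is `iadmin rsq <alias>` to remove the existing specific query followed by `iadmin asq '<sql>' <alias>` to add the replacement, run as an iRODS administrator. The sketch below only prints those commands so they can be reviewed first. The function name and the placeholder SQL are hypothetical; the real query text would come from a file holding one of the queries above.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the iadmin commands that would replace a specific query.
# Assumption: `iadmin rsq` removes a specific query by alias and `iadmin asq`
# adds one; review the printed commands before running them.

print_replace_commands() {
    alias_name=$1
    sql_file=$2
    # Collapse the multi-line SQL into a single line with single spaces.
    sql=$(tr '\n' ' ' < "$sql_file" | sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/ $//')
    printf 'iadmin rsq %s\n' "$alias_name"
    printf "iadmin asq '%s' %s\n" "$sql" "$alias_name"
}

# Placeholder SQL and alias, for demonstration only.
sql_file=$(mktemp)
printf '%s\n' 'SELECT 1' 'FROM dual' > "$sql_file"
print_replace_commands exampleAlias "$sql_file"
rm -f "$sql_file"
```

Keeping a copy of the original query text before removing it makes it possible to restore the previous behavior if the replacement misbehaves.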

@korydraughn
Collaborator Author

@ac55-sanger Please hold on replacing the existing specific queries.

@ac55-sanger

Sure. Thankfully I haven't looked into these yet.

@korydraughn
Collaborator Author

@ac55-sanger How long does it take ils -A to run against a fairly large collection on your system?

I'm trying to determine if anything breaks by changing those queries. I feel anything attempting to invoke the existing ones will result in the poor performance we see in NFSRODS.

@ac55-sanger

Here is the time taken to list a directory containing 65K files with the ils command:

real	3m31.221s
user	0m15.197s
sys	0m4.812s

@korydraughn
Collaborator Author

korydraughn commented May 17, 2021

@ac55-sanger You can proceed with replacing those specific queries. They were added to improve Jargon's performance around large data sets.

See https://github.com/irods/irods-legacy/blob/ff4eaa47a34f1bb5990d5560f825975c26bab118/iRODS/server/icat/patches/patch3.2to3.3.sh

@ac55-sanger

Sure.
Should I go ahead with the queries shared last week, or the ones in https://github.com/irods/irods-legacy/blob/ff4eaa47a34f1bb5990d5560f825975c26bab118/iRODS/server/icat/patches/patch3.2to3.3.sh?

@korydraughn
Collaborator Author

Don't use the ones from irods-legacy. Use the replacements mentioned here: #114 (comment)

@ac55-sanger

Sure, thanks.
Will update on how it goes.

korydraughn and others added 7 commits May 26, 2021 13:04
- iRODS permissions are now cached
- iRODS user type information is now cached
- Fixed list operation result truncation
- Replaced parallelStream() w/ stream()
- Use counter instead of inode number as cookie for directory entries
- Server caches query results for list operation
- List operation jumps over previously handled entries instead of looping/skipping them
- Experimenting with connection cache
- Exposed cache eviction time options
- Added new configuration option: using_oracle_database
- Bumped Jargon version for Oracle support
- Updated the README.