ICAT schema extension with columns for investigation and dataset sizes #233
Conversation
We discussed this in the collaboration meeting last week, in particular the question of preventing the client from writing these new attributes. The decision was to do nothing for the moment. The main use case for the size attributes is to give the user a hint on the size of an investigation or a dataset in the web user interface, so that they might think twice before clicking on download for an investigation comprising several tens of gigabytes. For this purpose, accuracy is not critical. Furthermore, it is not clear whether errors due to overwriting the values will be an issue in practice. So we decided to wait and see how it works in production and to fix this only if it turns out to be a problem. It would be rather simple to set up a maintenance task that runs in the background from time to time and checks these sizes. This would even be easier if the attributes are writable. |
Hi, take into account that in some cases the value is, and has to be, exact. It can be used to build a quota system that blocks archiving, to calculate the cost of archiving an industrial experiment for ten years, or to flag in real time when an experiment exceeds the maximum number of files allowed, for instance. |
@antolinos, nothing prevents you from verifying this. You may for instance run a maintenance script to check the sizes. There is even already a prototype for such a script. |
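For illustration, the core of such a check could look roughly like this with python-icat (a sketch only, not the prototype script mentioned above; it assumes a client session with permission to read Datafiles and to update Datasets):

from icat.query import Query

# Recompute each dataset size from its datafiles and fix it if it diverged.
for ds in client.searchChunked(Query(client, "Dataset")):
    q = Query(client, "Datafile", conditions={"dataset.id": "= %d" % ds.id},
              attribute="fileSize", aggregate="SUM")
    res = client.search(q)
    size = res[0] if res and res[0] is not None else 0  # SUM is NULL if there are no files
    if ds.datasetSize != size:
        ds.datasetSize = size
        ds.update()

An analogous loop over Investigations would check investigationSize accordingly.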
The triggers on a fresh install are missing: the database triggers to update the size attributes are only created by the upgrade script, not when the schema is created from scratch. |
Size attributes are not updated if NULL. If the size attributes are set to an integer, the triggers do their job:
>>> # Get an investigation having investigationSize set to zero
>>> inv = client.assertedSearch("Investigation [name='test-Zero']")[0]
>>> inv.investigationSize
0
>>> inv.investigationSize is None
False
>>> # Create a dataset
>>> dataset = client.new("dataset", name="test", investigation=inv, type=ds_type, complete=False, datasetSize=0)
>>> dataset.create()
>>> # Fetch the new dataset from the server to verify datasetSize is set
>>> dataset = dataset.get("Dataset")
>>> dataset.datasetSize
0
>>> dataset.datasetSize is None
False
>>> # Create a datafile
>>> datafile = client.new("datafile", name="test.dat", dataset=dataset, datafileFormat=df_format, fileSize=38)
>>> datafile.create()
>>> # Fetch the dataset and the investigation again to verify the updated size attributes
>>> dataset = dataset.get("Dataset")
>>> dataset.datasetSize
38
>>> inv = inv.get("Investigation")
>>> inv.investigationSize
38
But if the size attributes are not set, the triggers do not work:
>>> inv = client.assertedSearch("Investigation [name='test-None']")[0]
>>> inv.investigationSize
>>> inv.investigationSize is None
True
>>> dataset = client.new("dataset", name="test", investigation=inv, type=ds_type, complete=False)
>>> dataset.create()
>>> dataset = dataset.get("Dataset")
>>> dataset.datasetSize
>>> dataset.datasetSize is None
True
>>> datafile = client.new("datafile", name="test.dat", dataset=dataset, datafileFormat=df_format, fileSize=38)
>>> datafile.create()
>>> dataset = dataset.get("Dataset")
>>> dataset.datasetSize
>>> dataset.datasetSize is None
True
>>> inv = inv.get("Investigation")
>>> inv.investigationSize
>>> inv.investigationSize is None
True
Note that the update script does not initialize the size attributes if a dataset or investigation has no files. I tried this with a MariaDB backend. |
The reason for this is obviously the arithmetic in MariaDB / MySQL:
MariaDB [(none)]> select NULL + 38;
+-----------+
| NULL + 38 |
+-----------+
| NULL |
+-----------+
1 row in set (0.00 sec) |
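For comparison, making the arithmetic NULL-safe with IFNULL() gives the expected result, which is the approach the fix below takes:

MariaDB [(none)]> select IFNULL(NULL, 0) + 38;
+----------------------+
| IFNULL(NULL, 0) + 38 |
+----------------------+
|                   38 |
+----------------------+
1 row in set (0.00 sec)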
Hi @RKrahl, if this is done by triggers, will there be a performance cost? |
My assumption would be that the performance with triggers will be way better than the performance of updating the values from the client. (Obviously, not keeping the values up to date at all will always be the fastest option.) But this is still to be tested. |
I agree, but that does not mean it will be acceptable. If ICAT runs slower because of these calculations, some sites might prefer not to calculate the values at all; with triggers there is no such choice (unless you remove the triggers and thereby diverge from the standard ICAT deployment). Also, a site might want to spend time calculating the sizes of investigations carried out by external users but not care about in-house ones, for instance. |
Tested. This fix seems to work, at least for MariaDB / MySQL. The update script still leaves the size attributes unset if a dataset or investigation has no files. |
I still found a case where the triggers (for MariaDB / MySQL) are not working correctly. If you have a dataset with some datafiles having no fileSize set, and you later update those datafiles to set the fileSize, the size attributes are not updated:
>>> # create a brand new investigation
>>> investigation = client.new("investigation", facility=facility, type=inv_type, name="sizetest", visitId="N/A", title="size attribute test")
>>> investigation.create()
>>> investigation.get("Investigation")
(investigation){
createId = "simple/root"
createTime = 2020-06-22 15:50:31+02:00
id = 10
modId = "simple/root"
modTime = 2020-06-22 15:50:31+02:00
name = "sizetest"
title = "size attribute test"
visitId = "N/A"
}
>>> investigation.investigationSize is None
True
>>> # add a dataset with some files, but do not set the fileSize
>>> dataset = client.new("dataset", investigation=investigation, type=ds_type, name="testds", complete=False)
>>> for df_count in range(10):
... datafile = client.new("datafile", name="df_%04d" % df_count)
... dataset.datafiles.append(datafile)
...
>>> dataset.create()
>>> # check that dataset.datasetSize and investigation.investigationSize are 0
>>> dataset.get("Dataset")
(dataset){
createId = "simple/root"
createTime = 2020-06-22 15:51:31+02:00
id = 771
modId = "simple/root"
modTime = 2020-06-22 15:51:31+02:00
complete = False
datasetSize = 0
name = "testds"
}
>>> investigation.get("Investigation")
(investigation){
createId = "simple/root"
createTime = 2020-06-22 15:50:31+02:00
id = 10
modId = "simple/root"
modTime = 2020-06-22 15:50:31+02:00
investigationSize = 0
name = "sizetest"
title = "size attribute test"
visitId = "N/A"
}
>>> # now update the datafiles, setting the fileSize
>>> query = Query(client, "Datafile", conditions={ "dataset.id": "= %d" % dataset.id }, includes="1")
>>> datafiles = client.search(query)
>>> assert len(datafiles) == 10
>>> for datafile in datafiles:
... datafile.fileSize = 997
... datafile.update()
...
>>> # dataset.datasetSize and investigation.investigationSize should be updated now, but they aren't
>>> dataset.get("Dataset")
(dataset){
createId = "simple/root"
createTime = 2020-06-22 15:51:31+02:00
id = 771
modId = "simple/root"
modTime = 2020-06-22 15:51:31+02:00
complete = False
datasetSize = 0
name = "testds"
}
>>> investigation.get("Investigation")
(investigation){
createId = "simple/root"
createTime = 2020-06-22 15:50:31+02:00
id = 10
modId = "simple/root"
modTime = 2020-06-22 15:50:31+02:00
investigationSize = 0
name = "sizetest"
title = "size attribute test"
visitId = "N/A"
}
>>> # verify the overall size of the datafiles
>>> ds_size_query = Query(client, "Datafile", conditions={ "dataset.id": "= %d" % dataset.id }, attribute="fileSize", aggregate="SUM")
>>> client.assertedSearch(ds_size_query)[0]
9970
I have no idea why this is not working. |
The problem is that the trigger only updates the size attributes if the condition
ELSEIF NEW.FILESIZE != OLD.FILESIZE THEN
holds. Evidently, if either the new or the old fileSize is NULL, this comparison evaluates to NULL rather than true, so the update is skipped. The fix is to change the condition to
ELSEIF IFNULL(NEW.FILESIZE, 0) != IFNULL(OLD.FILESIZE, 0) THEN
Now the trigger should work correctly even if the fileSize is NULL before or after the update. I am not sure if Oracle has this problem too, but just to be safe I will change the Oracle trigger as well. |
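To put the fixed condition in context, an UPDATE trigger along the following lines would propagate a fileSize change to the related Dataset and Investigation (a simplified MariaDB sketch with assumed table and column names, not the exact trigger shipped with icat.server):

DELIMITER //
CREATE TRIGGER DATAFILE_SIZE_UPDATE AFTER UPDATE ON DATAFILE
FOR EACH ROW
BEGIN
    -- NULL-safe comparison: treat an unset fileSize as zero
    IF IFNULL(NEW.FILESIZE, 0) != IFNULL(OLD.FILESIZE, 0) THEN
        UPDATE DATASET
           SET DATASETSIZE = IFNULL(DATASETSIZE, 0)
                             + IFNULL(NEW.FILESIZE, 0) - IFNULL(OLD.FILESIZE, 0)
         WHERE ID = NEW.DATASET_ID;
        UPDATE INVESTIGATION
           SET INVESTIGATIONSIZE = IFNULL(INVESTIGATIONSIZE, 0)
                                   + IFNULL(NEW.FILESIZE, 0) - IFNULL(OLD.FILESIZE, 0)
         WHERE ID = (SELECT INVESTIGATION_ID FROM DATASET WHERE ID = NEW.DATASET_ID);
    END IF;
END //
DELIMITER ;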
> Tested. This fix seems to work, at least for MariaDB / MySQL.
Yes. In any case, I'd say we should try to keep both versions as similar as possible. Thus, I'd opt for adding the analogous fix to the Oracle version as well. |
Now I also did some performance testing. The test script is available in icat-contrib. I ran the same script against the current icat.server release 4.10.0 and against an icat.server built from this branch with the DB triggers in place. I only tested it with a MariaDB backend. Here is the output with the timings for the 4.10.0 release:
And here for this PR code:
As you can see, as expected, there is a performance penalty from the triggers. But it is barely measurable and lies within the range of fluctuation. I'd appreciate someone trying it with an Oracle backend. |
Thanks @RKrahl. Question: what is the difference between test case 1 and test case 2? I see in the code:
I then interpret that in test case 1 the datasets already exist and files are attached to them, while in test case 2 the datasets are created together with their datafiles?
Why is test 2 twice as fast? |
@antolinos, the difference between case 1 and case 2 is that case 2 uses cascading. In case 1, a dataset is created first and then 1000 datafiles are created in that dataset, with a separate call for each. In case 2, the dataset and its 1000 datafiles are created in one single call. |
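For illustration, the two patterns look like this in python-icat (a sketch reusing the inv, ds_type and df_format objects from the sessions above):

# Case 1: create the dataset first, then each datafile with a separate call
ds1 = client.new("dataset", investigation=inv, type=ds_type, name="case1", complete=False)
ds1.create()
for i in range(1000):
    client.new("datafile", name="df_%04d" % i, dataset=ds1,
               datafileFormat=df_format, fileSize=42).create()

# Case 2: cascading, the dataset and all its datafiles in one single call
ds2 = client.new("dataset", investigation=inv, type=ds_type, name="case2", complete=False)
for i in range(1000):
    ds2.datafiles.append(client.new("datafile", name="df_%04d" % i,
                                    datafileFormat=df_format, fileSize=42))
ds2.create()

Case 2 presumably gains its speed from needing only one server round trip and one transaction instead of 1001.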
I'd say it would be rather easy to also add a |
There was a suggestion in the monthly meeting that the triggers should be optional. |
Here are the use cases for DLS:
|
I think it is a good idea. |
At DLS, we already have a trigger on the ICAT Datafile table, but it is used for a different purpose: each time a datafile is added, updated, or deleted, it adds/removes/updates datafiles in another database schema called FUSE.
create or replace TRIGGER "TESTICAT_DLS45"."UPDATE_FILESYSTEM_LOG" after insert or delete or update of location or update of filesize on datafile |
As suggested in #238, this PR also adds fileCount attributes to Dataset and Investigation. The triggers have been modified to automatically update these attributes as well. The upgrade script automatically initializes them. As discussed in the last meeting, the triggers are now optional, so the upgrade script no longer creates them by default. Instead, there are separate scripts to create and to drop the triggers. These scripts are also the easiest way to add the triggers after a fresh ICAT installation, or to remove them at any point. |
It has been decided in today's meeting:
The rationale for the second item is that, in the case that a site decides not to install the triggers, it is better to have the attributes not set at all than to have some value that will never be updated and will soon be completely wrong. That is, any site that chooses not to install the triggers needs to decide for itself whether to initialize the attributes or not. |
In Oracle, there are some errors:
|
I did some performance testing on Oracle in my local environment. We can do more tests on our test server later. I ran the script against the icat.server release 4.11.1 and against an icat.server built from this branch. Here is the output with the timings for the 4.11.1 release:
And here for this PR code (5.0.0-SNAPSHOT, with triggers):
So it's broadly the same trend as for MySQL. |
I'm adding the results from running these tests against our Oracle development database. The database used was a copy of the Diamond ICAT schema. It contained a reduced number of Datafiles, but still over 500 million! The results from the first run, on ICAT 4.10.0 (no triggers):
And from the second run on an ICAT 5.0.0 snapshot (with triggers):
So the strange anomaly here is that Test 1 was actually faster on the second run! This is not unusual: I have done similar tests before and had results like this. I put it down to the fact that the ICAT is running on a VM sharing resources with other VMs on the same hypervisor, and this is connected across the site network to our departmental development Oracle database, which also hosts numerous other databases. So the results vary depending on how busy the VM cluster, the network, and the Oracle database are at any particular point in time. The table below summarises the results from HZB, ESRF and STFC, with the numbers being the percentage increase in time taken to run the test with the triggers in place (a negative number indicating a decrease, as in the first STFC test).
The good news from a DLS point of view is that Test 2, using the createMany method (used to create most Datafiles in the DLS ICAT), shows both the smallest increase in time taken and the smallest variation across the three sites. |
This is a proposal for the schema extension discussed in #211. It includes the following:
- New attributes Dataset.datasetSize and Investigation.investigationSize added to the schema.
- Database triggers that keep these attributes up to date whenever Datafiles are created, updated, or deleted.
- An update script that initializes the attributes for existing content; icat.server should not be running while this is done to avoid inconsistencies.
Note that the update is done incrementally: when a new Datafile is added, its size simply gets added to the size of the related Dataset and Investigation (this obviously assumes that the previous values were already correct). This approach has advantages (good performance) and disadvantages (it might lead to inconsistencies in certain edge cases).
I also considered a different implementation where the sizes of Datasets and Investigations are always re-calculated from scratch (which arguably is more reliable), but I found that this not only comes with a computational overhead, it also leads to issues with Oracle database backends, which apparently don't like it when the table that fired a trigger is accessed inside that trigger (Oracle's "mutating table" error).
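For illustration, the from-scratch variant would do something like the following inside the Datafile trigger (a sketch with assumed column names); on Oracle this fails because the trigger reads from the very table that fired it, raising the ORA-04091 "mutating table" error:

-- inside an AFTER INSERT/UPDATE/DELETE trigger on DATAFILE:
UPDATE DATASET
   SET DATASETSIZE = (SELECT SUM(DF.FILESIZE) FROM DATAFILE DF
                       WHERE DF.DATASET_ID = DATASET.ID)
 WHERE ID = NEW.DATASET_ID;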
I have done a few simple tests so far to see how the Java Persistence API behaves in conjunction with the triggers, and it appears to be working fine.
What still needs to be done:
Closes #211