-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MH-13277 improve scheduler performance #614
MH-13277 improve scheduler performance #614
Conversation
2c0a9a1
to
f02f34b
Compare
Is there any way to store custom properties, in scheduled events? We had planned to tighter integrate our Participation Manager (opt-out) with the using Asset managers properties to store addition choices like editing, camera and or screen recording, without having to parse the workflow and capture agent configs. |
It depends.If you know the keys and types of your properties in advance, you can just add one column per property to oc_scheduled_extended_event and store them there. If the properties you want to store are arbitrary and don't have to be queried individually, you can add another string column, serialize the properties using gson and store them in this column. For reading those properties, you just read the string from the database and de-serialize them again. Just like we did it for the capture agent and workflow properties. The problem with the workflow and capture agent properties was, that they can consist of arbitrary key-value pairs, so we couldn't create one column per property. But since these properties weren't queried individually, we could just serialize them and store them as a whole. And of course, you can still save and load asset manager properties. However, using the asset manager properties is expensive. |
@JamesUoM & @krinnewitz Opencast 6.x introduces a new processing settings persistent layer that uses the asset manager to store workflow properties. At some point in the future, it likely makes sense to remove the workflow properties from the scheduler table and make the scheduler use the processing settings persistent layer of the asset manager. At the end of the day, a scheduled event is a event and it would be perfectly fine to store the workflow properties just for the event (vs. scheduled event). |
@krinnewitz Please also adopt the page https://docs.opencast.org/develop/developer/scheduler/ |
e2c172a
to
b09a926
Compare
Thanks @staubesv. I updated the documentation. Regarding the workflow properties: Maybe we could change that in a follow-up PR if there are not too many review findings with this one. However, the problem with storing anything in the asset manager properties is, that even just insertion is quite slow. So when scheduling multiple events at once, that could already be a problem. |
Appologies for the diversion: We schedule events for two weeks ahead which can be 2000+ new events being added on a Monday. If we moved away from our intermediate table of via recordings and user options is then 25,000 events at the start of the semester and a busy 72 hours when the majority opt-out decisions occur. I think we're likely to keep our user options in our own Participation Management (PM) tables if the Asset Manager was not able to cope with this, and I would be very concerned if the creation and modification performance of CA and WF configs was negatively impacted. Very simply we ingest the whole time table and create potential recordings against which users opt-out, this is synchronized daily as the timetable changes often. We then created (delete and modify) scheduled events for a rolling two week window synchronized on a daily basis.
|
b09a926
to
f3c8e49
Compare
I've tagged this as a feature, even though it's not really a feature. It does however warrant the extra testing that comes with being a feature IMO. |
Sorry for the late start of the review.
Does Opencast's internal processing of RRULES differ from what's possible in iCalendar? Because I think this would be a completely valid schedule with overlapping events using an RRULE (12 events á 2h scheduled hourly): BEGIN:VEVENT
DTSTAMP:20190113T185826Z
UID:20190113T185826Z-1469192696
DTSTART;TZID=Europe/Berlin:20190201T120000
DTEND;TZID=Europe/Berlin:20190201T140000
RRULE:FREQ=HOURLY;INTERVAL=1;COUNT=12
SUMMARY:Test
DESCRIPTION:Some overlapping test events
END:VEVENT |
f3c8e49
to
ab439b1
Compare
@lkiesow You're right. I hope the admin UI actually prevents this from happening, but surely there are other ways to "inject" such an RRULE into the system. Either way, I've added some code to check those overlaps. |
ab439b1
to
79a9478
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a number of questions, comments and suggestions.
In general, this looks good though.
I will do some actual testing later today/tomorrow.
modules/migration/src/main/java/org/opencastproject/migration/EventIdPK.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/opencastproject/scheduler/impl/persistence/SchedulerServiceDatabaseImpl.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/opencastproject/scheduler/impl/persistence/SchedulerServiceDatabaseImpl.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/opencastproject/scheduler/impl/persistence/SchedulerServiceDatabaseImpl.java
Outdated
Show resolved
Hide resolved
...er-remote/src/main/java/org/opencastproject/scheduler/remote/SchedulerServiceRemoteImpl.java
Outdated
Show resolved
Hide resolved
...er-remote/src/main/java/org/opencastproject/scheduler/remote/SchedulerServiceRemoteImpl.java
Outdated
Show resolved
Hide resolved
I've tested this in a production-like environment and automated scheduling of ~400 events worked very well. The only problem I really noticed was the weird naming I already found in the code earlier when manually scheduling multiple events. Apart from that, everything seems to work. |
Btb, some review findings are just comments and may not necessarily require any change. |
...er-remote/src/main/java/org/opencastproject/scheduler/remote/SchedulerServiceRemoteImpl.java
Outdated
Show resolved
Hide resolved
Hi, since this pull has a lot of them, can someone explain why it is good for Opencast to make new iterations of NotFoundException, etc, that do not extend plain old Java NotFoundExceptions, etc? I do not understand the gained benefits over the complexity added. |
Hi @karendolan I think in most cases it makes sense to not throw exceptions which are part of the java API since you want to handle exceptions thrown by the java class library differently from the ones thrown by your own code. And in most cases you want to express a different situation than the "standard" java exception stands for. For example here is a snippet from the javadoc of
That may not make sense in all places of your own code. |
Thanks @lkiesow and @karendolan. I have fixed most of your comments. |
Is there anything still outstanding on this review? |
Essentially just my final review. Unfortunately, I have been very busy last week. One thing I like to take another look at is the separation of workflow parameters for which we now have two separate stores. I'd like to make sure that we do not get any inconsistencies there. |
I'm getting test failures with this branch on a local build. The expected and actual times are out by one hour, which makes me think it's timezone related. Building in SAST, which is GMT+2 without DST adjustment.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a final in-depth review and this patch looks good except for one line of SQL update code which needs to be removed. Scheduling stuff works, capture agents work with the scheduler, migration works (I did upgrade from 6 to 7). @krinnewitz, can once the SQL line is removed, I'm going to merge this.
Ups, missed @smarquard's comment. Is this issue new? Did the tests work in the same timezone without this patch? |
Btb, to properly test this on
|
I can reproduce test failures when building scheduler-impl with |
@lkiesow @smarquard I added yet another commit which should fix the test failures. |
That solution to fixing the timezone issues is concerning, because at the beginning of our "timezone journey", we found that the default timezone could be set incorrectly when multiple threads were changing the timezone default at the same time. Thread 1 changes the default, thread 2 gets the default (which is now not the actual user/system default, but what thread 1 has changed it to), thread 1 changes it back to the user/system default but thread 2 has now hung on to an incorrect default and sets the default incorrectly. So we made this change: which your commit essentially reverses. IMO it's always unsafe to call setDefault(...) because it's changing a value which is JVM-wide, and you don't know what other threads are doing at that time. |
299856b
to
171cea0
Compare
Thanks for the hint. I removed the last commit. Since the problem with failing tests already exists in develop and is not introduced by this PR, I suggest to merge it. |
I agree with merging this, despite the test failures which I'm hoping will be taken care of in #604 that @karendolan is working on. That PR (as it stands) removes all the timezone setDefault calls. |
- Remove unused blacklist code - Remove superfluous migration and transaction code
This speeds up the conflict check as well as the insertion of new events.
Since these events never have conflicts, we can add them in parallel.
- Add more SchedulerServiceImpl Tests - Add more tests for SchedulerServiceDatabaseImpl
171cea0
to
85e6430
Compare
Just rebased onto develop to resolve merge conflict |
The failing tests pass on my local build if I use: mvn -Duser.timezone=Europe/Zurich |
I'll then merge it like this. Memo to myself: Mid-term, add |
…pencast into develop Pull request #614 MH-13277 improve scheduler performance
It's merged. Thanks for the patch. |
Seems like this has some issues after all. I'm now getting test failures on a regular basis (not every time!) on
It also happened on the Travis build here: |
Curious that the test failure shows up unpredictably. I haven't seen any failures like this, either on develop or our 6.x custom branch with this code merged in. These WARNs show up for successful builds, so are not an issue:
|
Thanks for taking a look at this. |
This patchset improves the scheduler performance as part of the opencast crowdfunding 2018. As a benchmark, we scheduled 50k events and then added 6 events on top. Previously, adding the 6 events took between 19 and 20 seconds. Of these 20 seconds, the conflict check took about 80% of the time. After our changes, it takes about 2 seconds to add the 6 events on top. Of these 2 seconds, the conflict check now only takes a very small percentage of time. The benchmark was done using Opencast with MariaDB on an AMD FX(tm)-8350 Eight-Core Processor with 32GB RAM and an SSD.
The following changes were done to achieve the goal of faster conflict checks and higher throughput when inserting events:
Remove Unused YAGNI Code
Store Properties of Scheduled Events in a Dedicated Table
Properties such as start date, end date, capture device id, of scheduled events were stored as asset manager properties which are basically key-value pairs which cannot be stored efficiently using a relational DBMS. For this reason, the conflict check was done in-memory, which was a workaround for the slow database queries. However, this still was slow since all events had to be loaded into memory before performing the conflict check. The scheduler now uses a proper "SQL-friendly" table to store these properties. This drastically improves the conflict check performance (which can be done on the database again) as well as the insertion speed when adding new events.
Reduce Number of ActiveMQ Messages When Adding Events
When scheduling multiple events, one message per event was sent over ActiveMQ. This seemed to cause some buffers filling up and blocking the message sender which in turn slowed down event insertion. We changed the behavior to send just one message for a whole bunch of events, which solves this problem.
Parallelize Adding Multiple Events
When scheduling multiple events (using an RRULE), these events cannot have conflicts with each other. This allows us to parallelize event insertion in this case, which also improves the throughput.
Other Changes
Additionally, we improved the test coverage of the scheduler and added a migration which is necessary because of the database schema changes,