Skip to content
This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Task.initialize() executed in background #4324

Merged
merged 9 commits into from
Jun 17, 2019
Merged

Conversation

mfranciszkiewicz
Copy link
Contributor

@mfranciszkiewicz mfranciszkiewicz commented Jun 12, 2019

  • task.rpc: run Task.initialize and enqueue_new_task in another thread
  • task.taskstate: new creating and errorCreating TaskStatus vals
  • initial TaskStatus is creating, the task is remembered immediately
  • when task creation fails, TaskStatus is set to errorCreating

- task.taskstate: new 'creating' and 'errorCreating' TaskStatus vals
- initial TaskStatus is 'creating', the task is remembered immediately
- when task creation fails, TaskStatus is set to 'errorCreating'
@mfranciszkiewicz mfranciszkiewicz force-pushed the bg-task-init branch 2 times, most recently from 05df499 to 1273f98 Compare June 12, 2019 15:29
Prevent importing the reactor in tests
from twisted.internet.task import deferLater
from twisted.python.failure import Failure


class DeferredSeq:
Copy link

@Wiezzel Wiezzel Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that this class is supposed to compute fn(...f3(f2(f1(None)))...) and do it asynchronously?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and by executing those functions in a separate thread.

Since deferred.callback does not allow a Deferred object to be passed as an argument (in a callback chain), this class simply yields on those before passing them further.

@Krigpl
Copy link
Contributor

Krigpl commented Jun 13, 2019

Why do we need that right now? Is it still gonna be useful with the simple-task-api?

@mfranciszkiewicz
Copy link
Contributor Author

@Krigpl

This PR solves some pending issues encountered by the CGI team. Until their app is refactored for the simple-task-api, this piece of code will prove useful to them because:

  1. resource initialization may take a really long time, so we must not block the main thread
  2. they need to be able to query the state that the task-in-creation is in, by returning the task id immediately and by introducing new TaskStatuses

Point 2 should still be relevant for simple-task-api.

@Krigpl
Copy link
Contributor

Krigpl commented Jun 13, 2019

Point 2 should still be relevant for simple-task-api.

simple-task-api is not affected by this issue, so that won't be relevant.

@mfranciszkiewicz
Copy link
Contributor Author

I agree only if we create tasks in a fire and forget fashion.

golem/core/deferred.py Outdated Show resolved Hide resolved
Use a wrapper function instead of lambda + bool cast
@codecov
Copy link

codecov bot commented Jun 14, 2019

Codecov Report

Merging #4324 into develop will decrease coverage by 1.5%.
The diff coverage is 89.02%.

@@             Coverage Diff             @@
##           develop    #4324      +/-   ##
===========================================
- Coverage    88.76%   87.26%   -1.51%     
===========================================
  Files          225      224       -1     
  Lines        19753    19794      +41     
===========================================
- Hits         17534    17273     -261     
- Misses        2219     2521     +302

Copy link
Contributor

@maaktweluit maaktweluit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Some small nitty comments and one q

Will approve on reply / fix

golem/task/taskmanager.py Outdated Show resolved Hide resolved
golem/task/taskmanager.py Outdated Show resolved Hide resolved
golem/task/taskmanager.py Show resolved Hide resolved
golem/task/taskmanager.py Outdated Show resolved Hide resolved
golem/task/taskmanager.py Outdated Show resolved Hide resolved
Co-Authored-By: maaktweluit <10008353+maaktweluit@users.noreply.github.com>
@codecov
Copy link

codecov bot commented Jun 17, 2019

Codecov Report

Merging #4324 into develop will increase coverage by 0.02%.
The diff coverage is 91.66%.

@@             Coverage Diff             @@
##           develop    #4324      +/-   ##
===========================================
+ Coverage    88.76%   88.78%   +0.02%     
===========================================
  Files          225      225              
  Lines        19753    19793      +40     
===========================================
+ Hits         17534    17574      +40     
  Misses        2219     2219

@mfranciszkiewicz mfranciszkiewicz merged commit 8397dcc into develop Jun 17, 2019
@mfranciszkiewicz mfranciszkiewicz deleted the bg-task-init branch June 17, 2019 20:47
@mfranciszkiewicz mfranciszkiewicz restored the bg-task-init branch June 17, 2019 20:47
@mfranciszkiewicz mfranciszkiewicz deleted the bg-task-init branch June 18, 2019 13:49
nieznanysprawiciel added a commit that referenced this pull request Jul 1, 2019
* Don't use DockerCommandHandler

We don't need this. All commands are executed via docker-py.

* Renamed cleanup() methods to clean_up()

Because 'cleanup' is a noun and method names should be verbs.

* Added failure after 100 tries on the same step

* Make step_verify_income retry like all others.

* Add 30 seconds of sleep after requesting tETH or tGNT

* Upgrade SCI to 1.10.0 (#4231)

* Fix installation script (#4220)

* Non-hypervised Docker CPU Environment

A temporary subclass of DockerCPUEnvironment that always uses
DummyHypervisor and therefore performs no operations on Docker virtual
machine. It is meant to enable usage of the new Environment API without
removing DockerManager yet. Using standard DockerCPUEnvironment would
cause potential conflicts during VM reconfiguration.

* Moved __init__() to start of class for mypy

* Added 'Failure' to step_wait_subtask_completed()
Co-Authored-By: Adam Wierzbicki <awierzbicki@golem.network>

* Increase subtask_timeout for timeout_test because the buildbot node is slower then local machines

* moved rpc and sentry spam INFO logs to DEBUG

* Reduced severity from vague warning message

* Fix tests broken by introducing DockerCPUEnvironment

Setting up DockerCPUEnvironment in TaskComputer.__init__() requires two
conditions to be met:
  * Twisted reactor running (because prepare() returns a Deferred)
  * ClientConfigDescriptor with proper memory and CPU settings

* fixed separate_hyperg test after update to develop

* Wamp serialization error fix: huge unsigned int (#4236)

* Wamp serialization error fix: huge unsigned int

* deep bigint to string conversion definition for collections

* lint

* all integers converted to string

* Made dummy task runner more asynchronous

Client.start() requires to be run with reactor.

* Fix race condition in taskkeeper test by adding sleeps after each header

* Update urllib3 to 1.24.3 (#4134)

* Use freezegun instead of sleeps

* Add log message for the selected docker-hypervisor

* Migrate Payments to TaskPayment

* Remove entrypoint.sh from Docker images (#4242)

* Remove entrypoint.sh from Docker images

* use proper tags

* mypy vs peewee_migrate

* Add funds validation to task/subtask restart endpoints (#4247)

Adds funds validation and error reporting to RPC endpoints responsible for restarting tasks and subtasks.
Replaces the RPC endpoint `comp.task.restart_subtasks` with `comp.task.subtasks.restart`, updating it to handle both finished and active tasks.

* [review] Test client.get_task

* [review] Move dt_task to TYPE_CHECKING

* address issues caused by `EthereumConfig` getting changed behind the scenes

* lenghten retry limits in concent tests

* add "fixme" note
* extend the concent tests' timeouts

* lint

* lint...

* fix tests

* simpleserializer: support Enum

* disabled/lenient verification

* Fix get_next_subtask() call

* Remove unused variables

* Automatic change

* Use datetimes instead of integers

* Revert automatic rust change

* tasksession: eliminate race condition when preserving `ReportComputedTask` for use in `ForceReportComputedTask`

* bump messages to 3.6

* Finish rename of rpc command comp.task.subtasks.restart

* [review] Grant @Krigpl 's wish

* Fix foreign key migration (TaskPayment)

* [review] Move eth rpcs dependend on GM to separate module

* Adjust migration numbering after merge

* Less verbose faucet error messages (#4270)

* * eliminate message send / sign race condition (#4269)

+ add a failure trigger for `SubtaskResultsSettled` in the concent force accept integration test

* fix force payment playbook (#4272)

* hopefully fix the windows build... (#4273)

* Crossbar json serializer (#4258)

* remove unneeded CrossbarRouterOptions

* remove dead crossbar_log_level

* Enable json serializer in crossbar router

* Decrease severity taskeeper maintenance logs #4201 (#4263)

* decrease severity of TaskKeeper maintenance logs to DEBUG (#4201)

* TaskKeeper maintanence logs redacted (#4201)

TaskKeeper maintenance logs redacted to more explicitly state what happened

* lints removed (#4201)

* Fix call of methods with new signatures

* Remove unused imports

* [review] 00 to 0x

* [review] Remove default WalletOperation.status

* Rename rpc.api.ethereum

It fails in lintdiff.sh (confusion with ethereum module)

* [review] Rename processed_ts_deadline

* StatusPublisher's events with message codes (#4209)

* requirements update (#4277)

* bump requirements to address security concerns

* one more bump

* fix concent acceptance tests (#4282)

* add promissory note signatures to concent acceptance tests
* fixes to concent acceptance tests

* Fix golemcli account info

* [WTCT] explicit ETH addr instead of derivation from ETH pub key (#4170)

* [WTCT] explicit ETH key instead of derivation from ETH pub key
* tests for taskserver fixed
* requirements + egg info
* inline get_eth_addr() method
* bump Golem-Messages to 3.7.0

* Fix golemcli account info

* Don't allow use to chose Docker Toolbox if Hyper-V is installed (#4268)

* Don't allow use to chose Docker Toolbox if Hyper-V is installed

The installer doesn't support downgrading from Hyper-V to Docker Toolbox
Therefore user should not be presented the dialog window prompting to
select the preferred hypervisor if Hyper-V is already installed.

* Changed installation condition for DockerForWin

The old condition didn't use `HYPER_V_INSTALLED` variable so it would
not install Docker binaries if Hyper-V had been already installed on
user's machine.

* Install hyperg from the 'simple-transfer' repo

* Add missing field in walletoperation table

* Fix volume mounting on windows (#4275)

* fix incorrect volumes
* fix: create work dir instead of mounting root
* creating resource directory if not exists
* copy all result files fix
* support for multi resources

* fix gvisor installation in the Linux installer

* Install environment prerequisites

When a task header with environment prerequistes is received provider
will attempt to install the specified prerequisites and report positive
support status only if the installation was succesful.

* Unit tests for TaskHeaderKeeper.check_support()

* Bump min HyperG version to 0.3.1 (#4299)

* Docker: run hello-world after the service is restarted (#4298)

Docker: provide more meaningful error message

* Implemented subtasks restart cost estimation RPC

This adds the RPC endpoint `comp.task.subtasks.estimated.cost`. It
accepts an array of subtask IDs and returns a cost estimation for
restarting the selected subtasks.

* Added disable_concent flag to task restart RPC

* Fixed funds validation for partial restarts

* Created integration test for failed subtasks restart

* Updated Golem-Messages to 3.8.0

Golem-Messages 3.8.0 version introduces environment_prerequisites field
to TaskHeader.

* Synchronize TaskPayment model with migrations

* treat retrieval of an incompatible message from history as a "message… (#4305)

* treat retrieval of an incompatible message from history as a "message not found"

* Added config to node_integration_tests, ignoring non test directories (#4289)

* Fix GLambda benchmark (#4306)

* Fix GLambda benchmark
* Add a comment explaining why set HOME for GLambda

* sqlite, you're doing it wrong ;)

* Update issue templates

* bring `node_name` back to `SubtaskState` ... (#4312)

* bring `node_name` back to `SubtaskState` ...
* grant @maaktweluit's wish ;)

* fix issue when nodekeeper doesn't have information about a payer node in the Income list (#4317)

* Test DROP NOT NULL

* Enhanced transcoding integration tests with ffprobe report (#4249)

* StreamOperator: Implement get_metadata()

* ffprobe_report: Implement FuzzyDuration and FuzzyInt

* ffprobe_report: Implement FfprobeFormatReport

* ffprobe_report: Implement stream report classes with a common base class

* ffprobe_report: Add FfprobeFormatReport.stream_reports property

* ffprobe_report: Cache stream reports in the object instead of rebuilding them every time

* ffprobe_report: Rewrite diff() methods

* StreamOperator.get_metadata(): Detect missing and invalid data returned by ffprobe

* ffprobe_report: Add __repr__ for format and stream report classes

* ffprobe_report: Don't duplicate return value if both resolution and widthxheight present but they're identical

* Add a dependency on parameterized

* ffprobe_report: Implement number_if_possible(), fuzzy_duration_if_possible(), fuzzy_int_if_possible()

* ffprobe_report: Convert numeric properties to numbers when possible

* ffprobe_report: Just fail if codec_type is missing

- Lots of other things will fail if we just return None here and mypy can see that. Let's just fail earlier.

* ffprobe_report: Add select_streams()

* test_ffmpegintegration: Extract task definition into _create_task_def_for_transcoding()

* test_ffmpegintegration: Extract part of the code into an intermediate base class

* test_ffmpegintegration: Add optional parameters to _create_task_def_for_transcoding

* ffprobe_report tests: Add file with sample raw reports to test FfprobeFormatReport

* ffprobe_report tests: Unit tests for FfprobeFormatReport

* ffprobe_report tests: Add raw report with mpeg4 video codec and 'nb_frames' field

* StreamOperator.get_metadata(): accept work_dir and output_dir as parameters

* Implement FfprobeReportSet

* ffprobe_report: Accept path to the temporary directory in FfprobeFormatReport.build() and put results for each file in a separate subdirectory

* ffprobe_report: Make video_paths a list in build() rather than variable argument list

* ffprobe_report: Add 'excludes' parameter to diff()

* ffprobe_report_set: Make information about mismatched streams more compact

* Add more tests for FfprobeReport

* Add tests for FfprobeReportSet

* test_ffmpegintegration: Codec change tests

* Implement SimulatedTranscodingOperation

* ffprobe_report: Extract a function for parsing frame_rate

* SimulatedTranscodingOperation: Use parse_ffprobe_frame_rate() for the frame rate

* Add tests for SimulatedTranscodingOperation

* test_ffmpegintegration: Tests for splitting into various numbers of segments

* test_ffmpegintegration: Disable maxDiff in FfmpegIntegrationTestCase

- To make sure that diffs from FfprobeFormatReport are shown in full

* test_ffmpegintegration: Explicitly set codec and resolution in codec change and segment split tests

- It used to be possible to transcode without setting resolution or codec explicitly but now this does not pass validations

* test_ffmpegintegration: Don't compare bitrate and frame_count in codec change and segment split tests

- Bitrate preservation is not reliable and thus not integrated into ffmpeg_tools yet
- ffprobe reports wrong frame_count (100) for test_video.mp4 transcoded using the current method even though the actual numbe of frames in the file is correct (50).

* test_ffmpegintegration: Resolution change tests

* test_ffmpegintegration: Frame rate change tests

* testutils: Define a custom exception type for a failing Docker job in TestTaskIntegration

* test_ffmpegintegration: Explicitly set codec in resolution change tests

* test_ffmpegintegration: Explicitly set codec and resolution in frame rate change tests

* test_ffmpegintegration: Don't compare bitrate in resolution change tests

- Bitrate preservation is not reliable and thus not integrated into ffmpeg_tools yet

* test_ffmpegintegration: Don't compare bitrate in frame rate change tests

- Bitrate preservation is not reliable and thus not integrated into ffmpeg_tools yet

* Merge TestffmpegIntegration and FfmpegIntegrationTestCase back into a single test case

- There's only one class derived from the base class so we don't really need this separation. It was meant as a starting point for further refactoring but eventually we just created helper classes for support code.
- ci_skip does not seem to support the case where one class inherits from another and also uses class variables from that class. This breaks Windows and Mac OS tests in CI. Merging those classes lets us side-step the problem.

* File structure fix - move files related to ffprobe reports to utils directory

* Save FfprobeReportSet to a file in the test directory

* Hard-coding important fixture values in tests instead of verifying them with assert

* Constants for reason field in the diff

* Rename streams: modified -> expected, original -> actual

* Make == in FuzzyDuration and FuzzyInt still take tolerance into account even if only one of the values is fuzzy

* SimulatedTranscodingOperation: Make _set_override() public

* simulated_transcoding_operation: Fix output file name generation

- The extension was not stripped correctly from the input file
- In some cases there were two dots before the output file name

* TestFfprobeFormatReport: Enable showing longer diffs in tests

* simulated_transcoding_operation: Put container immediately after codec in input column names

- This way sorting works better for the codec change test. Results with the same codec/container but different resolutions get grouped together.

* simulated_transcoding_operation: Make it possible not to include some of the explicitly specified task options in the table column headers

- This makes it possible to for example specify a different resolution for each file in codec change test and still have the results in a single column. Otherwise each resolution gets its own column.

* test_ffmpegintegration: Use dont_include_in_option_description to get one column for each tested parameter combination

* Move DiffReason to ffprobe_report

* Replace all remaining hard-coded reason values with DiffReason

* protect `comp.tasks.stats` from changes to the structure of subtasks' `extra_data` (#4321)

* Migrate payment to taskpayment (#4297)

* Migrate payment to taskpayment

* Update migration numbering

* add `eventlet` to requirements (#4322)

* Remove the 'duration' field from RPC Task dict representation (#4327)

* Add hook for numpy in pyinstaller

* Use original hook-numpy-core.py from pyinstaller, but changed to the `DLLs\` folder

* hmm...

* Remove key difficulty pow (#4325)

* Add paging to scripts/get-slow-argument.py

* Split out key_reuse.py from base.py and conftest.py.

* Added granary as a provider, enabled by hostname

* review comments - fixed redunant if and print

*  - private logging function
 - better name for the key reuse singleton
 - added comments for all key reuse classes
 - better default password
 - better use of json.load / dump

*  - Split set_dir() from get() in NodeKeyReuseConfig
 - Improved logging, less private more public calls

* cleaned up for loops, only take the variables we need

* linter

* fixes the `object has no attribute '_handshake_error'` error (#4331)

* fixes the `object has no attribute '_handshake_error'` error
* fix the logging message

closes #4261

* add `apps/rendering/resources/taskcollector/Release/` to `.gitignore` (#4333)

* bump hyperg requirement to `0.3.2` (#4335)

* bump hyperg requirement to `0.3.2`

* + display current version

* add an artificial limit to `pay.payments` RPC to work around crossbar payload size limits before we implement: (#4338)

todo: https://github.com/golemfactory/golem/issues/3970
todo: https://github.com/golemfactory/golem/issues/3971

* Finished rename of processed_ts to creation_date

* Proper docker tag (1.5) for blender_verifier (#4336)

* Updated golem-messages to v3.9.0

* Migrate deposit payments to TaskPayment

* Migrate Incomes to TaskPayment

* Review comment, use model.TaskPayment

* Take test fix from #4342

* [review] migration

* [review] Refactor ETS.IK tests

* Move Ethereum RPC logic to its module (#4344)

* WIP

* expect_income() patch (#4346)

expect_income() patch

* Handling arbitrary args in _restart_task_error

* WIP

* fix of incorrect resources hierarchy (#4350)

* fix of incorrect resources hierarchy

* Prevent negative ETH estimations

* fix the integration tests that rely on a different package (Blender s… (#4349)

* fix the integration tests that rely on a different package (Blender scene) supplied to them

(those tests stopped using those scenes and reverted to the default `test_task_1` after the last refactoring

+ add a specific integration test to test a blender scene composed of multiple `.blend` files

* move the config to playbook (as, actually, the playbook is very closely bound with the scene anyway)

* Fix HyperV warning events (#4353)

* Task.initialize() executed in background (#4324)

- task.rpc: run initialize_task and enqueue_new_task in another thread
- task.taskstate: new 'creating' and 'errorCreating' TaskStatus vals
- initial TaskStatus is 'creating', the task is remembered immediately
- when task creation fails, TaskStatus is set to 'errorCreating'

* fix tests

* Disable Task.initialize execution timeout (#4354)

* Add tests for subtask_accepted & get_incomes_list

* + `test_large_result` node integration test optimization

* Added unit test for eth_for_batch_payment

* [review] Remove depositpayment

* CODEOWNERS: @Wiezzel @maaktweluit

* Rename deposit_payment to deposit_transfer

* * reduce severity of mask mismatch
* refactor `taskkeeper.check_support` tests :p

* grant @jiivan's wish ;p

* Moved requesting resources out of TaskComputer

* add missing integration tests to pytest

* + ownership of the concent-related code
+ ownership of the integration tests

* additional logs for `comp.task.create`

* fix tests

* Made 'cubes' the default scene, this saves 200mb per test

* Updated node_integration_tests to disable concent by default

* Fixed low funds error not being returned by task restart RPC

* Also initialize task resources on restarts

* concent: verify concent_enabled flag in overdue payments

Resolves #4304

* Fix dummy job test (#4373)

* Add run_benchmark to environments (#4372)

* Add run_benchmark to environments

* Apply suggestions from code review

Co-Authored-By: Adam Wierzbicki <awierzbicki@golem.network>

* Adjust operation type in forced payment

* Refactored resource_collected and resource_failure

The methods were moved to TaskServer while TaskComputer was added two
new public methods: task_interrupted() and start_computation(). This
wasy TaskComputer is agnostic of any resource management.

* Fix docker image for cpu benchmark (#4383)

* send srr if fgtrf is in history

resolves #4216

* Cleaner logs in golem/task/rpc.py

* Moved requesting tasks out of TaskComputer to TaskServer

* Assert that assigned subtask is not None in _task_finished()

* Moved interval checking to the top of _request_random_task()

* + Docker image integrity verifier

* + README.md for the dockerhub image verification script

* lint...

* fix readme

* resolution of the dispute with @Krigpl

* fix readme once again

* + add `requests` to the specifically-required files in `requirements-test`

* @Krigpl ... that better? ;p

* Merge first step

* Added video set to gitignore

* Fix Docker benchmark (#4401)

* Fix TestTaskIntegration to enable database usage and remove need of TestTaskManager

* Fix tests by adding initialization step
mplebanski pushed a commit that referenced this pull request Jul 4, 2019
- task.rpc: run initialize_task and enqueue_new_task in another thread
- task.taskstate: new 'creating' and 'errorCreating' TaskStatus vals
- initial TaskStatus is 'creating', the task is remembered immediately
- when task creation fails, TaskStatus is set to 'errorCreating'
@@ -402,7 +403,7 @@ def enqueue_new_task(client, task, force=False) \

def _create_task_error(e, _self, task_dict, **_kwargs) \
-> typing.Tuple[None, typing.Union[str, typing.Dict]]:
logger.error("Cannot create task %r: %s", task_dict, e)
_self.client.task_manager.task_creation_failed(task_dict.get('id'), str(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

task_dict never contains id. See #4850

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants