RP does not close the session #1964

Closed
AymenFJA opened this issue Sep 24, 2019 · 11 comments

@AymenFJA
Contributor

Hello all,

I am facing a problem with RADICAL-Pilot (RP): it does not close the session unless I hit Ctrl+C once and wait for the pilot to generate the log and JSON files (and sometimes it does not generate them at all). Below are some of the terminal output and my radical-stack.

submit 1 unit(s)
        .                                                                     ok
submit 1 unit(s)
        .                                                                     ok
submit 1 unit(s)
        .                                                                     ok
submit 1 unit(s)
        .                                                                     ok
submit 1 unit(s)
        .                                                                     ok
submit 1 unit(s)
        .                                                                     ok
submit 1 unit(s)
        .                                                                     ok
^C
wait for 1 pilot(s)......

  python               : 2.7.16
  pythonpath           : 
  virtualenv           : sift_env

  radical.analytics    : 0.72.0
  radical.entk         : 0.72.0
  radical.pilot        : 0.72.0
  radical.saga         : 0.72.1
  radical.utils        : 0.72.0

The zip file for the session is attached below:
session_sift.zip

@andre-merzky
Member

Thanks Aymen. It seems that the pilot gets canceled all right, but the session does not pick it up. What is the code you have been running?

@AymenFJA
Contributor Author

AymenFJA commented Sep 24, 2019

Hello @andre-merzky :
I am running an EnTK code for one of my use cases. The first pipeline has one stage with one task, which generates a set of images. Then another set of pipelines (3 stages, 1 task per stage) is generated based on the number of images from the first pipeline: N = number of images = P = number of pipelines.

Note: I have turned off automatic termination (autoterminate=False)
and added the following to the end of the code:

    appman.workflow = set(pipelines)

    # Run the Application Manager
    appman.run()
    appman.resource_terminate()
    print('done')

I have created a clean, new environment for my experiments.

A sample of the code I am running:

    appman = AppManager(hostname=hostname, port=port,
                        name='entk.session-%s-%s'
                             % (args.name, random.randint(9999, 100000)),
                        autoterminate=False, write_workflow=True)
    # Assign resource request description to the Application Manager
    appman.resource_desc = res_dict
    parser_pipeline = generate_discover_pipeline(args.dataset, args.src_img)
    appman.workflow = set([parser_pipeline])

    # Run the Application Manager for the parser_pipeline
    appman.run()
    # Load the image list written by the parser pipeline
    with open("images.json", "r") as jsonfile:
        jsonObj = json.load(jsonfile)
    dataset = jsonObj['Dataset']
    pipelines = list()
    # Generate one pipeline per image found in the target dataset
    for item in range(len(dataset)):
        img1 = dataset[0]['img1']
        img2 = dataset[item]['img2']
        x1 = dataset[0]['x1']
        x2 = dataset[0]['x2']
        y1 = dataset[item]['y1']
        y2 = dataset[item]['y2']
        p1 = generate_pipeline(img1, img2, x1, y1, x2, y2, name='Pipeline%s' % item)
        pipelines.append(p1)

    # Assign the workflow as a set or list of Pipelines to the Application Manager
    # Note: The list order is not guaranteed to be preserved
    appman.workflow = set(pipelines)
    # Run the Application Manager for the main pipeline
    appman.run()
    print('Done')
    # Now that matching and filtering have been performed on all images,
    # release the resources.
    appman.resource_terminate()
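
Since autoterminate=False leaves the release of resources to the script, one way to make sure resource_terminate() is reached even when a run() call raises is a try/finally wrapper. The following is only a sketch under stated assumptions: `run_workflows` and `FakeAppManager` are hypothetical names introduced here, and the fake class stands in for the EnTK AppManager so the example runs without EnTK or RabbitMQ. It relies only on the `workflow`, `run()` and `resource_terminate()` members used in the script above, and it would not by itself fix a resource_terminate() call that never returns.

```python
def run_workflows(appman, workflows):
    """Run each workflow in turn, then always release the resources.

    `appman` is any object exposing `workflow`, `run()` and
    `resource_terminate()`, as the AppManager in this ticket does.
    """
    try:
        for pipelines in workflows:
            appman.workflow = set(pipelines)
            appman.run()
    finally:
        # With autoterminate=False nothing else releases the pilot,
        # so terminate even if one of the run() calls raised.
        appman.resource_terminate()


class FakeAppManager(object):
    """Hypothetical stand-in for the real AppManager (illustration only)."""

    def __init__(self, fail_on_run=False):
        self.workflow = None
        self.calls = []
        self.fail_on_run = fail_on_run

    def run(self):
        if self.fail_on_run:
            raise RuntimeError('simulated failure in run()')
        self.calls.append(('run', len(self.workflow)))

    def resource_terminate(self):
        self.calls.append(('terminate',))


# Two consecutive workflows: one pipeline first, then three.
fake = FakeAppManager()
run_workflows(fake, [['p0'], ['p1', 'p2', 'p3']])
print(fake.calls)
```

With the real AppManager, the call would take the same shape as the script above: first the parser pipeline, then the list of generated pipelines.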

@andre-merzky
Member

Ah, details - so this is not actually an RP issue, but the RE AppManager's resource_terminate call does not return?

If you don't mind, can you please write a small reproducer which only runs, say, a single /bin/date on a local pilot? The easiest way to do so is to start from your script, reduce it to one pipeline, then reduce it to one task, and so on, until there is nothing left to remove. If the problem persists, please attach that script - that makes it easy for me to reproduce and debug :-) Thanks!
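
While reducing the script, it can also help to see where a call is stuck. A minimal sketch, assuming Python 3 (where `faulthandler` is part of the standard library; the stack listed in this ticket is Python 2.7, where it is not): the hypothetical helper `dump_stacks_if_stuck` asks `faulthandler` to dump the tracebacks of all threads if the wrapped block has not finished after a timeout, without interrupting it.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def dump_stacks_if_stuck(timeout_s, file=None):
    """Dump all thread tracebacks to `file` if the body of the `with`
    block is still running after `timeout_s` seconds.

    The body is not interrupted; the dump only shows where it is waiting.
    `file` must be a real file object (it needs a file descriptor).
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)
    try:
        yield
    finally:
        # Disarm the watchdog once the body has returned or raised.
        faulthandler.cancel_dump_traceback_later()


# Intended use around the call suspected of hanging (illustrative only):
#
#     with dump_stacks_if_stuck(60):
#         appman.resource_terminate()
```

If the call returns in time, nothing is printed. Otherwise the dump lists every thread's current stack, which is usually enough to tell whether the wait is in the script, in EnTK, or in RP.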

@AymenFJA
Contributor Author

AymenFJA commented Sep 24, 2019

Hello @andre-merzky @mturilli .

OK, I will do that and update the ticket. In the meantime, my session ended up with no JSON file because of the issue in this ticket, so I was wondering whether there is a way to extract the JSON file from my *.prof files?

@andre-merzky
Member

I have just the thing for you :-)
https://github.com/radical-cybertools/radical.pilot/blob/devel/bin/radical-pilot-fetch-json

@AymenFJA
Contributor Author

Thanks @andre-merzky .

@AymenFJA
Contributor Author

Hello @andre-merzky

Update:
I have made the following change: I reduced the number of pipelines from 196 to only 5, each with 2 stages of 1 task running only /bin/date.

Everything is working fine. I used the same code with the same data (I only replaced my kernel with /bin/date).

I have attached the session: session_test.zip

@andre-merzky
Member

Thanks @AymenFJA, and sorry for the delay. If you don't mind, please also attach the script you used to create the above session.

@AymenFJA
Contributor Author

AymenFJA commented Oct 1, 2019

Thanks @andre-merzky, my script is attached below:

script.zip

@andre-merzky
Member

andre-merzky commented Oct 1, 2019

Hmm, with 'minimal' I meant something like this:

#!/usr/bin/env python

import radical.entk as re

hostname = 'localhost'
port     = 5672

def generate_discover_pipeline():
    p = re.Pipeline()
    s = re.Stage()
    t = re.Task()
    t.executable = '/bin/date'
    s.add_tasks(t)
    p.add_stages(s)
    return p

def generate_pipeline():
    p1 = re.Pipeline()
    s1 = re.Stage()
    t1 = re.Task()
    t1.executable = '/bin/date'
    s1.add_tasks(t1)
    p1.add_stages(s1)

    s2 = re.Stage()
    t1 = re.Task()
    t1.executable = '/bin/date'
    s2.add_tasks(t1)
    p1.add_stages(s2)
    return p1


if __name__ == '__main__':
    appman = re.AppManager(hostname=hostname,
                           port=port,
                           autoterminate=False,
                           write_workflow=True)
    appman.resource_desc = {
                'resource' : 'local.localhost',
                'walltime' : 10,
                'cpus'     : 10}

    parser_pipeline = generate_discover_pipeline()
    appman.workflow = set([parser_pipeline])
    appman.run()

    appman.workflow = set([generate_pipeline() for _ in range(5)])
    appman.run()
    appman.resource_terminate()

It still has your pipeline structure, but no dependencies on data or bridges. Well, now that script does work as expected and terminates OK, so obviously I removed too much - but can you try to throw out all lines in your script which don't make a difference in the context of this ticket? Thanks! :-)

@AymenFJA
Contributor Author

AymenFJA commented Oct 9, 2019

The issue has disappeared for now. I am using the latest radical-stack:

  python               : 2.7.16
  pythonpath           : 
  virtualenv           : sift_env

  radical.analytics    : 0.72.0
  radical.entk         : 0.72.0
  radical.pilot        : 0.72.0
  radical.saga         : 0.72.1
  radical.utils        : 0.72.0

Therefore I will close this ticket for now.

Thanks.

@AymenFJA AymenFJA closed this as completed Oct 9, 2019