New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checking if the input file or directory exists when running a workflow #1554

Merged
merged 9 commits into from Sep 7, 2018

Conversation

Projects
None yet
5 participants
@parichit
Copy link
Contributor

parichit commented Jun 7, 2018

  1. Added a utility function 'file_existence_check' to
    check if the input file or directory exists or not.

  2. The workflows now report a friendly message to the user
    if the input file or directory does not exist.

  3. The error stack is still reported in addition to the message.

@pep8speaks

This comment has been minimized.

Copy link

pep8speaks commented Jun 7, 2018

Hello @parichit, Thank you for updating !

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 23, 2018 at 20:04 Hours UTC
@Garyfallidis

This comment has been minimized.

Copy link
Member

Garyfallidis commented Jun 11, 2018

@parichit you need to provide a test.

@skoudoro

This comment has been minimized.

Copy link
Member

skoudoro commented Jun 12, 2018

@parichit, can you rebase too ?

@skoudoro skoudoro added the gsoc2018 label Jun 13, 2018

@parichit

This comment has been minimized.

Copy link
Contributor

parichit commented Jun 18, 2018

@Garyfallidis: Thank you for the suggestion, The function is covered by all the workflows (by default) because whenever an IOIterator is created, this function is being invoked by it that's why I have not created a test for this addition.

@serge: Thank for the good advice (you are correct, i should rebase). I tried rebasing but it says that the branch is up-to-date. I am closing this pull request and will open a new request with a clean history because this branch has also picked up the history from older commits.

@parichit parichit closed this Jun 18, 2018

@parichit parichit reopened this Jun 18, 2018

@parichit

This comment has been minimized.

Copy link
Contributor

parichit commented Jun 18, 2018

@serge: Sorry about this but can you please check if this branch is okay now? I did rebased it after fetching the upstream. If this is not yet rebased then please tell me and I will create a new branch for this feature.

@skoudoro

This comment has been minimized.

Copy link
Member

skoudoro commented Jul 3, 2018

@parichit , Do not forget to add test for this PR. Thank you

@parichit

This comment has been minimized.

Copy link
Contributor

parichit commented Jul 3, 2018

@serge and @Garyfallidis: Thanks for pointing the test case out. I will implement the test case for this PR and recommit.

@parichit

This comment has been minimized.

Copy link
Contributor

parichit commented Jul 3, 2018

Added the test case for this PR.
Thanks :)

@skoudoro
Copy link
Member

skoudoro left a comment

I just requested some few changes but otherwise, it looks good to me. Can you make this quick change @parichit and then I can merge it.

@@ -253,3 +253,10 @@ def __iter__(self):
IO = np.concatenate([I, O], axis=1)
for i_o in IO:
yield i_o

@staticmethod

This comment has been minimized.

@skoudoro

skoudoro Aug 22, 2018

Member

Why you need it as a static method? I think you can remove it

This comment has been minimized.

@parichit

parichit Aug 23, 2018

Contributor

Thanks for the suggestion. I have removed the @staticmethod.

def file_existence_check(args):
input_args = list(args)
for input_path in sorted(input_args):
if len(glob(input_path)) == 0:

This comment has been minimized.

@skoudoro

skoudoro Aug 22, 2018

Member

This condition works but it is not clear. Can you change it and use something more explicit like:
if not os.path.isfile(input_path) or not os.path.isdir(input_path)

This comment has been minimized.

@parichit

parichit Aug 23, 2018

Contributor

@serge thank you for checking out the code. So you are right the way I am doing it is not intuitive but below are the reasons why I am doing it this way, So I am writing the code to do what you suggested,

    for input_path in sorted(input_args):
        if os.path.isdir(input_args):
            if not os.path.isdir(input_path):
                err = True
        elif os.path.isfile(input_path):
            if not os.path.isfile(input_path):
                err = True
        if err:
            raise IOError('File not found: ' + input_path)

You can see that O have to separately check for the file and a directory and accordingly decide what is missing, this increases the number of IF-ELSE conditions for this small check and since this check is performed for all the workflows so this will cause the slowdown when the workflows are executed multiple times.

So, what i did is to keep the default glob object and check if it's length is 0 so that there is just one IF condition to check.

Please tell if there is a better way to implement this check that I can use? otherwise, I will implement the above solution which is more intuitive but a bit slower.

Thanks.

This comment has been minimized.

@parichit

parichit Aug 23, 2018

Contributor

I will wait for your suggestion and only commit the changes after that.

This comment has been minimized.

@skoudoro

skoudoro Aug 23, 2018

Member

With just one IF, this should works:

for input_path in sorted(input_args):
    if not os.path.isfile(input_path) or not os.path.isdir(input_path):
        raise IOError('File not found: ' + input_path)

This comment has been minimized.

@parichit

parichit Aug 23, 2018

Contributor

Hey serge, thanks for your suggestion, here is where I think this check can fail,

We know that currently, the workflows work for both absolute file/directory paths and also for wildcards, in case the user gives absolute file or directory names then this is fine but when the user supplies a wildcard for example, in the apply_affine workflow to provide all the input files at once,
For example:
dipy_apply_affine *input --affine=affine.txt

In this case, you can see the above check will always fail since this check will simply look for file *input but actually, it can be a file called input1.nii.gz or input2.nii.gz etc. and so it will always fail with the file not found.

I tested this behavior when I wrote this check and realized that a simple condition won't be enough. So, to keep the code to the minimum, I used the len(glob(input_path)) so that for both absolute file names and wildcard based file names the check can work because the glob will automatically expand the wildcard to the actual file names.

I realized this when I was starting to make the changes again based on your suggestion.

Let me know what you think about it?

This comment has been minimized.

@skoudoro

skoudoro Aug 24, 2018

Member

Ok, I see, I totally forgot the wildcards. Thank you for all this explanation. I understand why you use glob.

No need to change, You can let it as it is.

input_args = list(args)
for input_path in sorted(input_args):
if len(glob(input_path)) == 0:
raise OSError('File not found: '+input_path)

This comment has been minimized.

@skoudoro

skoudoro Aug 22, 2018

Member
  • Can you replace OSError by IOError?
  • pep8: Can you add space around the operator +

This comment has been minimized.

@parichit

parichit Aug 23, 2018

Contributor

Yes, this is a logical suggestion. Thank you for pointing this out. I have replaced the OSError by IOError.

I used OSError because the file/directory related errors/exceptions are placed inside the OSError on the python's exception page here https://docs.python.org/3/library/exceptions.html#os-exceptions.

I have changed the OSError to IOError. :)

@parichit parichit force-pushed the parichit:workflow_dir_file_notfound_error branch from bf16a79 to 6903ab0 Aug 23, 2018

@skoudoro

This comment has been minimized.

Copy link
Member

skoudoro commented Aug 27, 2018

Hi @parichit, this PR is failing, can you fix this error please? I think, you are just missing a * on args

Parichit Sharma and others added some commits Jun 7, 2018

Added a function to check if a file or directory exists
1) Added a utility function 'file_existence_check' to
check if the input file or direcory exists or not.

2) The workflows now reports a friendly message to the user
if the input file or directory does not exist.
Added the test case for PR 1554.
1) A dummy workflow is now used to invoke the OSError when the input is absent.
Removed the @staticmethod and replaced OSError with IOError in multi_…
…io.py file as per the suggestions made by @serge on PR.

@parichit parichit force-pushed the parichit:workflow_dir_file_notfound_error branch from 6903ab0 to 91a4865 Sep 4, 2018

@parichit

This comment has been minimized.

Copy link
Contributor

parichit commented Sep 4, 2018

hey, @skoudoro, I have added the 'self' keyword in the file_existense_check() function, the error was occurring because the number of arguments was mismatching. I have tested the code manually on the terminal, the relevant test case for this PR appears in the test_workflow.py file. Please let me know if you think we need more changes.

@@ -253,3 +253,9 @@ def __iter__(self):
IO = np.concatenate([I, O], axis=1)
for i_o in IO:
yield i_o

def file_existence_check(self, args):
input_args = list(args)

This comment has been minimized.

@skoudoro

skoudoro Sep 6, 2018

Member

Thank you for this update @parichit.

As you can see here, the tests are still failing. I think you can replace this line with something like:
input_args = [fname for fname in list(args) if isinstance(fname, str)]

it allow to keep only string value and not integers. Indeed, we can have integers as positional argument sometimes

This comment has been minimized.

@parichit

parichit Sep 6, 2018

Contributor

Thanks for this suggestion @skoudoro , I wasn't sure why the PR was failing even after I made the change. You are right, there were other arguments type than string. I changed the line to accept only strings. Waiting for the checks to complete.

Changed the file_existence_check() function to accpet only string typ…
…e arguments for checking the file and directory as suggested by @skoudoro.
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Sep 7, 2018

Codecov Report

Merging #1554 into master will increase coverage by 0.01%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1554      +/-   ##
==========================================
+ Coverage   87.33%   87.34%   +0.01%     
==========================================
  Files         246      246              
  Lines       32177    32266      +89     
  Branches     3495     3504       +9     
==========================================
+ Hits        28102    28183      +81     
- Misses       3242     3247       +5     
- Partials      833      836       +3
Impacted Files Coverage Δ
dipy/workflows/multi_io.py 68.7% <100%> (+1.5%) ⬆️
dipy/workflows/tests/test_workflow.py 88.63% <90%> (+0.4%) ⬆️
dipy/core/graph.py 73.8% <0%> (-1.2%) ⬇️
dipy/viz/tests/test_ui.py 82.14% <0%> (-0.4%) ⬇️
dipy/viz/actor.py 82.79% <0%> (+0.01%) ⬆️
dipy/viz/tests/test_actors.py 74.18% <0%> (+0.28%) ⬆️
dipy/viz/window.py 64.8% <0%> (+0.57%) ⬆️
dipy/viz/utils.py 61.67% <0%> (+5.98%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4890903...7c01276. Read the comment docs.

@skoudoro skoudoro merged commit 606739b into nipy:master Sep 7, 2018

3 checks passed

codecov/patch 93.75% of diff hit (target 87.33%)
Details
codecov/project 87.34% (+0.01%) compared to 4890903
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@skoudoro

This comment has been minimized.

Copy link
Member

skoudoro commented Sep 7, 2018

Thank you @parichit! merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment