
[BUG] IsADirectoryError while selecting S3 artifact in the UI #3154

Closed
1 task done
amiryi365 opened this issue Jul 22, 2020 · 51 comments
Labels
area/artifacts (Artifact stores and artifact logging) · bug (Something isn't working) · priority/awaiting-more-evidence (Lowest priority. Possibly useful, but not yet enough support to actually get it done.)

Comments

amiryi365 commented Jul 22, 2020

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): centos 7.4
  • MLflow installed from (source or binary): binary
  • MLflow version (run mlflow --version): 1.9.0
  • Python version: 3.7.3
  • npm version, if running the dev UI: NA
  • Exact command to reproduce:
  • S3 packages: botocore 1.14.14, boto3 1.11.14

Describe the problem

My mlflow server runs on CentOS with a PostgreSQL backend store and S3 (minio) artifact storage:
mlflow server --backend-store-uri postgresql://<pg-location-and-credentials> --default-artifact-root s3://mlflow -h 0.0.0.0 -p 8000
I set all the relevant S3 env vars:
MLFLOW_S3_ENDPOINT_URL, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
I've successfully executed several runs from another machine against this server:
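As a quick sanity check (a hypothetical snippet, not part of the original report), you can verify that these variables are actually visible to the process in question:

```python
import os

# The four variables listed above must be visible to the mlflow server
# process, and to any client that logs artifacts.
required = ["MLFLOW_S3_ENDPOINT_URL", "AWS_ACCESS_KEY_ID",
            "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"]
missing = [name for name in required if name not in os.environ]
print("missing:", missing or "none")
```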

  • Runs all finished OK, with params, metrics and artifacts.

  • The PostgreSQL mlflow tables were updated accordingly.

  • All artifacts were stored in the minio bucket as expected, and I can display and download them in the minio browser.

However, when I select any artifact in the UI, I get an Internal Server Error in the browser.

Other info / logs

In the mlflow server I see the following error:

ERROR mlflow.server: Exception on /get-artifact [GET]
# I skip most of the traceback
File "<python-path>/site-packages/mlflow/server/handlers.py", line 180, in get_artifact_handler
    return send_file(filename, mimetype='text/plain', as_attachment=True)
File "<python-path>/site-packages/flask/helpers.py", line 629, in send_file
    file = open(filename, "rb")
IsADirectoryError: [Errno 21] Is a directory: '/tmp/<generated-name>/<my-file>'

Indeed, '/tmp/<generated-name>/' really is a directory and not a file!
This folder contains another directory with a generated name, and that one is empty!

I didn't find any similar error regarding mlflow and s3.
What's wrong?
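For context, the final frame of the traceback above is easy to reproduce in isolation: calling open() on a path that is actually a directory raises exactly this error. A minimal sketch, unrelated to mlflow itself:

```python
import errno
import os
import tempfile

# Mimic what flask's send_file does: it simply open()s the resolved path.
tmp = tempfile.mkdtemp()
bogus = os.path.join(tmp, "myfile.log")
os.mkdir(bogus)  # a *directory* named like a file, as in the traceback

caught = None
try:
    with open(bogus, "rb") as f:
        f.read()
except IsADirectoryError as e:
    caught = e

print(caught)  # on Linux: [Errno 21] Is a directory: ...
```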

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • area/artifacts: Artifact stores and artifact logging
@amiryi365 amiryi365 added the bug Something isn't working label Jul 22, 2020
harupy (Member) commented Jul 22, 2020

@amiryi365 Thanks for filing this issue. I was able to reproduce the same error using a folder named test.txt. Does your folder name contain one of the text file extensions listed below?

_TEXT_EXTENSIONS = ['txt', 'log', 'yaml', 'yml', 'json', 'js', 'py',
                    'csv', 'tsv', 'md', 'rst', MLMODEL_FILE_NAME, MLPROJECT_FILE_NAME]
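The name-based check can be sketched roughly like this (a simplified illustration, not the exact handler code; looks_like_text_file is a hypothetical name, and the two file-name constants are replaced by their literal values):

```python
import posixpath

_TEXT_EXTENSIONS = ['txt', 'log', 'yaml', 'yml', 'json', 'js', 'py',
                    'csv', 'tsv', 'md', 'rst', 'MLmodel', 'MLproject']

def looks_like_text_file(artifact_path):
    # Purely name-based: a *directory* named "test.txt" passes this check
    # just like a real text file would, which is how the repro above works.
    name = posixpath.basename(artifact_path)
    ext = name.rsplit('.', 1)[-1] if '.' in name else name
    return ext in _TEXT_EXTENSIONS

print(looks_like_text_file("test.txt"))       # True
print(looks_like_text_file("config"))         # False
print(looks_like_text_file("logs/foo.json"))  # True
```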

@harupy harupy added the area/artifacts Artifact stores and artifact logging label Jul 22, 2020
amiryi365 (Author):

Thanks for replying, @harupy.
Yes, all my artifacts are text files from the list (log, json, py). I also found this list in the source...
Is there a workaround for this bug?

harupy (Member) commented Jul 23, 2020

@amiryi365 The error indicates your folder name contains a text file extension. Can you share the code you used to log artifacts?

amiryi365 (Author) commented Jul 23, 2020

@harupy the bug is probably not in my code, i.e. I wrote the run code, but it worked well (no exceptions; I can see all run details in the PG tables and in the UI; I can see all the files in the minio browser, download them, and they look fine).
The bug occurs in the mlflow server when the web client requests an artifact's content.
In my code I do the following (only the mlflow-related lines):
At start:

mlflow.set_experiment(experiment_name)
active_run = mlflow.start_run(run_name=run_name, nested=nested)
experiment = mlflow.get_experiment_by_name(experiment_name)
mlflow.set_tags(experiment.tags)
mlflow.log_params(params)
mlflow.log_artifact(config)   # config is a .py file

I don't use "with mlflow.start_run(...) as active_run" because I have my own with on my class; the above code is in its __enter__().
During the run:
I do things like:

mlflow.log_metric(key, value, step)
mlflow.log_artifact(local_path)

Finally I do in my __exit__():

mlflow.log_artifact(log_file_path)
# status is RunStatus.to_string(RunStatus.FAILED) or RunStatus.to_string(RunStatus.FINISHED),
# depending on the __exit__ exc_type arg (i.e. FAILED for an exception)
mlflow.end_run(status)

harupy (Member) commented Jul 23, 2020

@amiryi365 I see. Are you experiencing an error like the one below?

[screenshot: Screen Shot 2020-07-23 at 14 41 20]

amiryi365 (Author):

Exactly!

harupy (Member) commented Jul 23, 2020

@amiryi365 Can you take a screenshot and share it if possible?

amiryi365 (Author):

@harupy I can't. But you say you can reproduce the bug...
It seems to be a server-side bug in reading from S3.
On both the server and client sides I use env vars for S3; maybe it's worth running the server with an AWS configuration file instead?

More info: my client is Windows 10 and I run it directly as Python (from PyCharm), not with mlflow run

harupy (Member) commented Jul 23, 2020

@amiryi365 Got it. Do your artifacts contain a folder like mine in the image above?

amiryi365 (Author):

@harupy I logged all artifacts under 'log' but it didn't help...

amiryi365 (Author):

@harupy In your image I can see you're using a file URI and not an S3 URI

harupy (Member) commented Jul 23, 2020

@amiryi365 Yep, I just wanted to show that a folder named like xxx.txt or log causes the error.

amiryi365 (Author):

I've just used 2 folders: 'logs' and 'config' - the bug is still there for all files!

harupy (Member) commented Jul 23, 2020

So your folder structure looks like:

- config (folder)
  - foo.xxx (file)
  ...

- logs (folder)
  - bar.yyy (file)
  ...

and when you try to open foo.xxx or bar.yyy, the error occurs, correct?

amiryi365 (Author):

@harupy Exactly!

harupy (Member) commented Jul 23, 2020

@amiryi365

Actually, there is a '/tmp/<generated-name>/' which is really a directory and not a file!
This folder contains another directory with a generated name, and inside there's nothing!

Does this mean that you have a folder named like /tmp/path/to/foo.xxx and there is nothing in it?

amiryi365 (Author) commented Jul 23, 2020

@harupy There's a folder named like /tmp/path/to/foo.xxx.
Inside this folder there's another folder with an auto-generated name (a long name of letters and digits).
Inside that folder there's nothing.

harupy (Member) commented Jul 23, 2020

@amiryi365 I have set up a minio server following this doc and tested artifact logging, but wasn't able to reproduce the issue.

[screenshot: minio]

code:

import mlflow

EXPERIMENT_NAME = 'minio'
BUCKET_NAME = 'test'

if not mlflow.get_experiment_by_name(EXPERIMENT_NAME):
    mlflow.create_experiment(EXPERIMENT_NAME, f's3://{BUCKET_NAME}')

mlflow.set_experiment(EXPERIMENT_NAME)

with mlflow.start_run():
    mlflow.log_param('p', 1)
    mlflow.log_metric('m', 1)
    mlflow.log_artifact('minio.py')
    mlflow.log_artifact('minio.py', artifact_path='data')

@harupy harupy added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Jul 23, 2020
amiryi365 (Author):

@harupy I see...
I'm trying to find meaningful differences between our mlflow servers:

  1. What are your botocore and boto3 package versions? (I have botocore 1.14.14, boto3 1.11.14)

  2. Could it be related to the server OS, env, other services, etc.? (my server runs on Linux CentOS 7.4)

  3. How do you define the minio credentials on the server? (I tried env vars and also the ~/.aws/credentials config file)

amiryi365 (Author):

@harupy Hi there!
I think I found a bug in the mlflow code,
in mlflow/store/artifact/s3_artifact_repo.py, function _download_file.
The original code is:

def _download_file(self, remote_file_path, local_path):
    (bucket, s3_root_path) = data.parse_s3_uri(self.artifact_uri)
    s3_full_path = posixpath.join(s3_root_path, remote_file_path)
    s3_client = self._get_s3_client()
    s3_client.download_file(bucket, s3_full_path, local_path)

My patch fix is:

def _download_file(self, remote_file_path, local_path):
    (bucket, s3_root_path) = data.parse_s3_uri(self.artifact_uri)
    s3_full_path = s3_root_path   # CHANGE IS HERE
    s3_client = self._get_s3_client()
    s3_client.download_file(bucket, s3_full_path, local_path)

It seems s3_root_path already includes the filename, e.g.:
s3_root_path = '3/xxxxxxxxxxxxxxxxxxx/artifacts/myfile.log'
and remote_file_path is the file path under the artifacts, e.g.:
remote_file_path = 'myfile.log'
So s3_full_path = posixpath.join(s3_root_path, remote_file_path) doubles the filename, e.g.:
3/xxxxxxxxxxxxxxxxxxx/artifacts/myfile.log/myfile.log
And that causes the bug!
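The doubling described above is easy to verify in isolation with posixpath (path values follow the example above, with the run id shortened):

```python
import posixpath

# If s3_root_path mistakenly already ends with the file name,
# the join duplicates it:
s3_root_path = "3/xxx/artifacts/myfile.log"
remote_file_path = "myfile.log"
full = posixpath.join(s3_root_path, remote_file_path)
print(full)  # 3/xxx/artifacts/myfile.log/myfile.log

# With a correct root, the join yields the expected S3 key:
print(posixpath.join("3/xxx/artifacts", remote_file_path))  # 3/xxx/artifacts/myfile.log
```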
I don't know why it behaves like that on my machine and differently on other machines...
(if I'm right, you should see the same bug in your test)

Any idea?
Thanks :-)

harupy (Member) commented Jul 26, 2020

@amiryi365 Thanks!

It seems s3_root_path includes already the filename, e.g.: s3_root_path = '3/xxx/artifacts/myfile.log'

This indicates that self.artifact_uri is set to something like s3://your_bucket_name/3/xxx/artifacts/myfile.log. Is artifact_uri set correctly?

amiryi365 (Author) commented Jul 27, 2020

@harupy I'm not sure...
Actually I don't know how to debug the server, as it has nested runs...
I also failed to find a way to get a useful debug-level log (using --gunicorn-opts "--log-level debug" didn't help much...)
If you could help me with those it would be great!
I just took code samples from the mlflow package, changed them a bit to make them independent, then ran them with my own params; maybe I was wrong...

harupy (Member) commented Jul 27, 2020

@amiryi365 You can use mlflow.get_artifact_uri, which returns the current artifact_uri:

with mlflow.start_run() as parent_run:
    print(mlflow.get_artifact_uri())

    with mlflow.start_run(nested=True) as child_run:
        print(mlflow.get_artifact_uri())
        ...

amiryi365 (Author) commented Jul 27, 2020

@harupy I get s3://mlflow/3/xxx/artifacts, so I was wrong about the params I put in my test...
Anyway, I'm talking about the server, not the client!
My client seems to work well, and I can see the resulting artifacts in the minio browser, as I told you before.
I meant that the mlflow server consists of nested Python runs (not mlflow runs), so running it with "-m pdb" doesn't help with debugging...
I want to debug the server in order to understand what's wrong and why. Could you help me with that?
I need the right technique to use the debugger inside mlflow/store/artifact/s3_artifact_repo.py, for instance.

harupy (Member) commented Jul 27, 2020

@amiryi365 What do you mean by nested python runs?

amiryi365 (Author) commented Jul 27, 2020

@harupy I ran something like mlflow server -m pdb ... and put breakpoints in s3_artifact_repo.py, but it never stopped there.
Looking at the code, I understand that the server runs an internal python command, and that's probably the reason...
In general, what I'm looking for is the right technique to debug the server!
Also, because it's an installed package, I couldn't change the code (e.g. add prints), because it compiled the original code even after I deleted the pyc file... (I don't have a lot of experience playing with python like this)

harupy (Member) commented Jul 27, 2020

@amiryi365 Did you try inserting prints to check the values passed to _download_file?

amiryi365 (Author):

@harupy see above (I edited it); I already told you what my problem was with adding prints

harupy (Member) commented Jul 27, 2020

@amiryi365 Sorry, I missed the edit. Actually, you can change the installed package's code. The command below shows where s3_artifact_repo.py is located. You can just open it and tweak the code to debug (assuming you have access to the centos machine that mlflow server runs on).

python -c "import mlflow; print(mlflow.store.artifact.s3_artifact_repo.__file__)"

amiryi365 (Author) commented Jul 27, 2020

@harupy that's what I did! But although I changed the py file, and also deleted its pyc file, it didn't run my new file!
It compiled the original py again and ran it.
I just read that the right method is to clone the source code, make the change there, and install it as a package. That's complicated for me because it runs in a closed network...
Do you know an alternative way to do that?

amiryi365 (Author):

@harupy I didn't.
Yesterday I did exactly what you suggested just now, and I failed because it somehow used the original version.

harupy (Member) commented Jul 27, 2020

@amiryi365 Did you run pip install -e . after the change yesterday?

amiryi365 (Author):

@harupy I just tried; it doesn't work on the installed package.
It should work on the source code cloned from github (with the setup.py etc.)

harupy (Member) commented Jul 27, 2020

Just to confirm: what you did yesterday was directly fixing the source code of the installed mlflow?

amiryi365 (Author):

@harupy I added prints to the source code of the installed mlflow, but it didn't "catch" them because it ran the original code.
I also made a fix, but now I think it's wrong...
I can't find a way to check what's going on inside the server...

harupy (Member) commented Jul 27, 2020

@amiryi365

I added prints to the source code of the installed mlflow, but it didn't "catch" them because it ran the original code.

How did you confirm that?

amiryi365 (Author):

@harupy I didn't see my prints. Also, when I deleted the pyc file, it recreated the original one.
See here how to do it right.
But in my case I can't do that...

Other tries:
I also found that pdb cannot attach to an already-running python process.
I also tried running httpry to see the REST messages between the mlflow server and minio, but it displayed nothing (maybe because they all run on the same machine).

harupy (Member) commented Jul 27, 2020

@amiryi365 What did you do after changing the code?

amiryi365 (Author):

@harupy rerun it in the same way: mlflow server ...

harupy (Member) commented Jul 27, 2020

@amiryi365 You added some prints to _download_file in s3_artifact_repo.py, correct?

amiryi365 (Author):

@harupy sure

harupy (Member) commented Jul 27, 2020

@amiryi365 I think just running mlflow server ... doesn't call _download_file. Can you launch the UI and open artifacts? or have you already tried this?

amiryi365 (Author):

@harupy of course it should call _download_file when I click on an artifact in the UI

harupy (Member) commented Jul 27, 2020

@amiryi365 So you clicked an artifact in the UI and nothing printed out. Did you still get the same error?

amiryi365 (Author):

@harupy of course I get the error. If I didn't, I could say I'd solved it...

harupy (Member) commented Jul 27, 2020

@amiryi365 Can you open a file in the error stack trace and edit it to debug?

amiryi365 (Author):

@harupy I found the bug!!!
In my opinion, it's inconsistent behavior of minio (probably of this specific version).
I'll share here the details of how to debug it and exactly what the error is.

How to debug the mlflow server:

  1. Run mlflow server with flags that reduce the number of workers to 1 and increase the timeout, so it won't kill the worker during the debug session: mlflow server --gunicorn-opts "--timeout 1000" --workers 1 --backend-store-uri <backend-uri> ...

  2. Open PyCharm on the same machine, create a project with the same python interpreter, then attach to the worker process: Run -> Attach to Process

  3. Now you can set breakpoints and debug!

Bug: minio listing of contained items

  • In artifact_repo.py:127, download_artifacts() checks whether the artifact path is a file or a dir by calling _is_directory()

  • _is_directory() calls list_artifacts() and returns True if the number of contained artifacts is > 0

  • The S3 implementation of list_artifacts() in s3_artifact_repo.py:94 uses the "list_objects_v2" operation on minio via a paginator. This call usually returns a buggy result list, e.g. for an "index.log" file it returns a list of one item (some auto-generated long name), which makes mlflow believe it's a dir.

  • This makes the caller download_artifacts() go to the internal function download_artifact_dir() instead of download_file(). That function calls list_artifacts() again, which yields the fictional auto-generated item, which is also a dir that contains no items. This is exactly the path it builds under /tmp while downloading this "file" from minio, but because it's a dir (not a file) it has no contents, and send_file() in handlers.py:170 raises IsADirectoryError...
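The file-vs-directory decision described in the bullets above can be sketched as follows (a paraphrase of the mlflow logic, not its exact source; the listing entries are made-up values):

```python
def is_directory(listing):
    # Paraphrase of ArtifactRepository._is_directory: any non-empty
    # listing under the path makes mlflow treat the path as a directory.
    return len(listing) > 0

# Correct minio behavior: a plain file has no children -> treated as a file
print(is_directory([]))  # False

# Buggy behavior described above: a spurious auto-generated entry comes back,
# so "index.log" is treated as a directory (entry name here is hypothetical):
print(is_directory(["index.log/a1b2c3autogen"]))  # True
```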

Inconsistency:
When I found that, I copied the list_artifacts() function into my own test and ran it separately with the same params.
To my surprise it worked well!! (i.e. it returned an empty list for "index.log").
Then I checked the mlflow UI again and noticed that for some runs (where I had the bug before) the artifacts suddenly work and their contents are displayed! For other runs it doesn't...

Possible solution:
I'm using the latest minio version, 2020-07-18T18:48:16Z.
I suspect this version is still unstable, so I'm going to check with an older minio version.

amiryi365 (Author):

I found that this bug disappears with an older release of minio: 2020-04-28T23:56:56Z.
See minio/minio#10157
Actually, although the browser now displays all artifact files correctly, the server still raises IsADirectoryError when I click on a dir in the artifacts tree (in the mlflow run web page). But it doesn't have any external effect, so it doesn't bother me.

harupy (Member) commented Aug 3, 2020

@amiryi365 Thanks for the investigation :)

Subhraj07:

I am getting this issue with mlflow deployed in kubernetes and minio as the artifact store. I get the following error:

[screenshot: Screenshot from 2021-05-29 23-00-44]

sfc-gh-adlee:

@Subhraj07 did you solve this? I'm having the same issue too :(

4 participants