Strange folder structure #21

Closed
lukemarsden opened this issue Jun 17, 2021 · 6 comments

Comments

@lukemarsden
Contributor

lukemarsden commented Jun 17, 2021

When using dataset.download(), the files get downloaded into a bonkers directory structure like http%blah

@lukemarsden lukemarsden added this to the S1 milestone Jun 17, 2021
@lukemarsden lukemarsden added the S O(days) label Jun 17, 2021
@albscui albscui self-assigned this Jun 29, 2021
@lukemarsden
Contributor Author

Deprioritizing this because we're demoing via tabular datastores, which go to a pandas DataFrame.

@lukemarsden lukemarsden changed the title from "Bugs discovered in end-to-end" to "Strange folder structure" Jul 6, 2021
@lukemarsden
Contributor Author

Reprioritizing because we want to support files and tables now.

@albscui
Contributor

albscui commented Jul 6, 2021

According to Andrei, the download path is not handled by rslex currently. This might be resolved automatically by a future update.

@albscui albscui removed their assignment Jul 6, 2021
@lukemarsden
Contributor Author

lukemarsden commented Jul 6, 2021 via email

@anliakho2

This is caused by the inability to detect the folder structure from Python (today it does a list and tries to find a common prefix).
The SDK needs to be improved to make this logic smarter.
As of right now it takes the safe option of creating a fully qualified path, so even if your dataset has multiple paths pointing to different Pachyderm clusters, the downloaded files will not collide.
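To make the trade-off concrete, here is a minimal Python sketch of the two strategies described above. It is not the SDK's actual code: the function names, the `cluster-a`/`cluster-b` hosts, and the file paths are all made up for illustration, and the exact encoding the SDK applies may differ.

```python
import posixpath
from urllib.parse import quote, urlsplit


def fully_qualified_path(url: str) -> str:
    """Hypothetical 'safe' mapping: keep scheme and host in the local path."""
    parts = urlsplit(url)
    segments = [parts.scheme, parts.netloc]
    segments += [s for s in parts.path.split("/") if s]
    # Percent-encode each segment so characters such as ':' are safe in names.
    return "/".join(quote(s, safe="") for s in segments)


def strip_common_prefix(urls: list[str]) -> list[str]:
    """Hypothetical 'smart' mapping: drop the directory prefix shared by all files."""
    paths = [urlsplit(u).path for u in urls]
    common = posixpath.commonpath([posixpath.dirname(p) for p in paths])
    return [posixpath.relpath(p, common) for p in paths]


# Two hypothetical clusters that expose a file under the same path.
urls = [
    "http://cluster-a:30600/repo/data/x.csv",
    "http://cluster-b:30600/repo/data/x.csv",
]
safe_paths = [fully_qualified_path(u) for u in urls]
short_paths = strip_common_prefix(urls)
```

In this sketch `safe_paths` stay distinct because the encoded host is part of each path, while `short_paths` both collapse to `x.csv`. That collision is the reason a prefix-stripping approach needs extra care before it can replace the fully qualified layout.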

@albscui albscui self-assigned this Aug 9, 2021
@albscui
Contributor

albscui commented Aug 9, 2021

I wonder if we control the path that gets passed back to whatever code handles the downloads though

On Tue, 6 Jul 2021, 19:22 Albert, @.***> wrote: According to Andrei, the download path is not handled by rslex currently. This might be resolved automatically by a future update.

As a matter of fact, we do! We were adding the http://localhost:30600... prefix to the StreamInfos returned by the Searcher. We didn't need to do this, because the rslex-http-stream library calls the ReadRequest API, which is implemented by the RequestBuilder, and the RequestBuilder already contains the HTTP scheme, host, and port info.
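A rough Python sketch of the idea behind the fix follows. The real code lives in rslex, and the classes below are hypothetical stand-ins that only borrow the names from this thread; the fields, the `url_for` and `search` helpers, and the example path are assumptions, not the rslex API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestBuilder:
    """Hypothetical stand-in: the only place that knows scheme, host, and port."""
    scheme: str
    host: str
    port: int

    def url_for(self, resource_id: str) -> str:
        # The full URL is assembled only when a read request is made.
        return f"{self.scheme}://{self.host}:{self.port}/{resource_id.lstrip('/')}"


@dataclass(frozen=True)
class StreamInfo:
    """Hypothetical stand-in: identifies a file by its path, with no URL prefix."""
    resource_id: str


def search(paths: list[str]) -> list[StreamInfo]:
    """Hypothetical Searcher: returns StreamInfos without prepending the host."""
    return [StreamInfo(p) for p in paths]


builder = RequestBuilder("http", "localhost", 30600)
infos = search(["repo/data/a.csv"])
request_url = builder.url_for(infos[0].resource_id)
```

Under these assumptions the download path is derived from `resource_id`, so it no longer carries the encoded host prefix, while `request_url` still resolves to the right endpoint.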

@albscui albscui closed this as completed Aug 9, 2021
3 participants