Restore ability to read Parquet files in S3 directories

**Bug report**

During the resolution of https://github.com/lincc-frameworks/nested-pandas/issues/365, commit https://github.com/lincc-frameworks/nested-pandas/pull/385/commits/84d3a0948c42240e10eb79cbcced835d8c4b932d introduced a regression: `nested_pandas.read_parquet` could no longer read Parquet files from S3 directories. This happened because:

  1. During development and testing, it became clear that `.read_parquet` had never been able to read HTTP directories, and this was assumed to be true of all remote (network) directories;
  2. `UPath.is_dir()` was observed to be far too slow for checking remote filesystem paths.
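Point 1 treats a limitation of HTTP as if it applied to every remote filesystem. As a minimal sketch, assuming the decision can be keyed on the URL scheme alone, the hypothetical helper `may_be_directory` below rules out directory reads only for `http`/`https` URLs, so `s3://` paths (and local paths) remain eligible. It uses only the standard library and makes no network calls; it is an illustration, not nested-pandas' actual logic.

```python
from urllib.parse import urlparse

# Hypothetical: schemes whose "directories" read_parquet could never read,
# per point 1 above. Every other scheme (s3, local paths, ...) is not ruled out.
NON_LISTABLE_SCHEMES = frozenset({"http", "https"})


def may_be_directory(path: str) -> bool:
    """Return False only when the path's scheme is known not to support
    directory reads; otherwise the path may still be a directory."""
    # Local paths such as "/tmp/data" have an empty scheme and are allowed.
    scheme = urlparse(path).scheme.lower()
    return scheme not in NON_LISTABLE_SCHEMES
```

For example, `may_be_directory("s3://bucket/catalog/")` returns `True`, while `may_be_directory("https://example.com/data/")` returns `False`. A check like this only narrows which paths need a directory test; it does not by itself decide whether an S3 path is a file or a directory.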

This change caused a regression in LSDB, which was worked around in PR https://github.com/astronomy-commons/hats/pull/576.

`nested_pandas.read_parquet` should be changed to restore support for S3 directories, and for any other network-based filesystem it could read before, without incurring undue cost from `UPath.is_dir()`. One possible solution is to treat a trailing slash on the path as a signal that the user intends to read a directory; however, a trailing slash was not required before, so relying on it alone could still break existing callers. Another is to accept the cost of `UPath.is_dir()` (as LSDB does in its workaround), provided that cost is much smaller than the cost of reading the Parquet file itself.
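The two options above can be combined so that the slow check runs only when the path gives no hint. This is a minimal sketch under stated assumptions, not nested-pandas' implementation: the helper name `resolve_is_dir` is hypothetical, and the `is_dir` argument stands in for whatever directory check is used (in practice possibly something like `lambda s: UPath(s).is_dir()`). Injecting it as a callable lets the logic be exercised without network access.

```python
from typing import Callable


def resolve_is_dir(path: str, is_dir: Callable[[str], bool]) -> bool:
    """Decide whether `path` should be read as a directory of Parquet files.

    Hypothetical helper: a trailing slash is trusted as the caller's intent
    and short-circuits the check; otherwise the (possibly slow, possibly
    remote) `is_dir` callable is consulted, so a trailing slash is still
    not required, matching the behavior before the regression.
    """
    if path.endswith("/"):
        # Fast path: trust the trailing slash and skip the filesystem call.
        return True
    # Slow path: pay for one directory check, which should remain small
    # compared with the cost of reading the Parquet data itself.
    return is_dir(path)
```

Because the fast path only changes behavior for paths ending in `/`, paths without a trailing slash behave exactly as they would with an unconditional `is_dir()` check; the saving applies only to callers who already signal their intent.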

**Before submitting**
Please check the following:

- [X] I have described the situation in which the bug arose, including what code was executed, information about my environment, and any applicable data others will need to reproduce the problem.
- [X] I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
- [X] If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.

Restore ability to read Parquet files in S3 directories #392