Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New upload and download functions #30

Merged
merged 9 commits into from
Oct 7, 2021

Conversation

MrAlecJohnson
Copy link
Contributor

Adds 3 new functions and their tests to the s3 module:

  • write_local_folder_to_s3
  • write_s3_file_to_local
  • write_s3_folder_to_local

Also, in test_s3.py, buckets for most tests are now created in a fixture, to reduce repetition.

New features, so bumped version to 1.3.0 in pyproject.toml

if not overwrite:
location = Path(local_file_path)
if location.is_file():
raise FileExistsError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should state the filepath in the error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 now added

Path(folder).mkdir(parents=True, exist_ok=True)

# Download the file
s3 = boto3.client("s3")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s3_client instead of s3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - thanks!

root.mkdir(parents=True, exist_ok=True)

# Get an object representing the bucket
s3 = boto3.resource("s3")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s3_resource instead of s3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now changed - thank you!


# Raise an error if file already exists and not overwriting
if not overwrite and destination.is_file():
raise FileExistsError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again state filepath that exists in error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 added

if include_hidden_files or not obj.name.startswith("."):
# Construct relative path to retain local folder structure
relative_to_root = str(obj.relative_to(root_folder))
file_s3_path = f"{s3_path}{relative_to_root}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an os.path.join is better here (even though you add a slash) you let the function do that work for you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, good point - thank you

root_folder: Union[Path, str],
s3_path: str,
overwrite: bool = False,
include_hidden_files: bool = True,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be better defaulting to False? I don't think needing to add hidden files is a very common use case. Could easily see a bunch of git stuff getting uploaded by accident! Nice to have the functionality as an option though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, good point! Have flipped it. No one wants .DS_store...

relative_to_root = str(obj.relative_to(root_folder))
file_s3_path = f"{s3_path}{relative_to_root}"
write_local_file_to_s3(str(obj), file_s3_path, overwrite)
else:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps I am being picky here but maybe an explicit elif obj.is_dir() would be better. I think i agree that it could be only two things, but probably just safer to be explicit. I will leave this up to you though, not very important.

Copy link
Contributor Author

@MrAlecJohnson MrAlecJohnson Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks - have changed. So if it is something else, it should just skip it

:param overwrite: if True, overwrite existing files in the target location
if False, raise ValueError if existing files are found in the target location
:param include_hidden_files: if False, ignore files whose names start with a .
:param current_folder: leave as None - only used during recursion

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this should be left as None why is it an option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know what? I'm a total wally. The satisfaction of doing it recursively blinded me to a much simpler approach that removes the need for this extra parameter and should run more efficiently too. Now just uses Path.rglob() rather than iterating through each directory.

@gkelly900
Copy link

Thanks for the PR Alec! Just a few minor comments from Karik and me but this looks like it adds really useful functionality.

@MrAlecJohnson
Copy link
Contributor Author

MrAlecJohnson commented Oct 4, 2021

Hurrah, thanks @gkelly900 and @isichei. I've made all those changes now - how does this look? In particular I've removed the unnecessary extra parameter when copying a folder.

Copy link

@gkelly900 gkelly900 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Alec! Looks good feel free to go ahead and merge!

@MrAlecJohnson MrAlecJohnson merged commit b9c0f72 into master Oct 7, 2021
@MrAlecJohnson MrAlecJohnson deleted the new-upload-and-download-functions branch October 7, 2021 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants