-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New upload and download functions #30
Conversation
dataengineeringutils3/s3.py
Outdated
if not overwrite: | ||
location = Path(local_file_path) | ||
if location.is_file(): | ||
raise FileExistsError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should state the filepath in the error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 now added
dataengineeringutils3/s3.py
Outdated
Path(folder).mkdir(parents=True, exist_ok=True) | ||
|
||
# Download the file | ||
s3 = boto3.client("s3") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s3_client
instead of s3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - thanks!
dataengineeringutils3/s3.py
Outdated
root.mkdir(parents=True, exist_ok=True) | ||
|
||
# Get an object representing the bucket | ||
s3 = boto3.resource("s3") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s3_resource
instead of s3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now changed - thank you!
dataengineeringutils3/s3.py
Outdated
|
||
# Raise an error if file already exists and not overwriting | ||
if not overwrite and destination.is_file(): | ||
raise FileExistsError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again state filepath that exists in error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 added
dataengineeringutils3/s3.py
Outdated
if include_hidden_files or not obj.name.startswith("."): | ||
# Construct relative path to retain local folder structure | ||
relative_to_root = str(obj.relative_to(root_folder)) | ||
file_s3_path = f"{s3_path}{relative_to_root}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an os.path.join
is better here (even though you add a slash) you let the function do that work for you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, good point - thank you
dataengineeringutils3/s3.py
Outdated
root_folder: Union[Path, str], | ||
s3_path: str, | ||
overwrite: bool = False, | ||
include_hidden_files: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be better defaulting to False? I don't think needing to add hidden files is a very common use case. Could easily see a bunch of git stuff getting uploaded by accident! Nice to have the functionality as an option though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, good point! Have flipped it. No one wants .DS_store...
dataengineeringutils3/s3.py
Outdated
relative_to_root = str(obj.relative_to(root_folder)) | ||
file_s3_path = f"{s3_path}{relative_to_root}" | ||
write_local_file_to_s3(str(obj), file_s3_path, overwrite) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps I am being picky here but maybe an explicit elif obj.is_dir()
would be better. I think i agree that it could be only two things, but probably just safer to be explicit. I will leave this up to you though, not very important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks - have changed. So if it is something else, it should just skip it
dataengineeringutils3/s3.py
Outdated
:param overwrite: if True, overwrite existing files in the target location | ||
if False, raise ValueError if existing files are found in the target location | ||
:param include_hidden_files: if False, ignore files whose names start with a . | ||
:param current_folder: leave as None - only used during recursion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this should be left as None
why is it an option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know what? I'm a total wally. The satisfaction of doing it recursively blinded me to a much simpler approach that removes the need for this extra parameter and should run more efficiently too. Now just uses Path.rglob()
rather than iterating through each directory.
Thanks for the PR Alec! Just a few minor comments from Karik and me but this looks like it adds really useful functionality. |
Hurrah, thanks @gkelly900 and @isichei. I've made all those changes now - how does this look? In particular I've removed the unnecessary extra parameter when copying a folder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Alec! Looks good feel free to go ahead and merge!
Adds 3 new functions and their tests to the s3 module:
Also, in
test_s3.py
, buckets for most tests are now created in a fixture, to reduce repetition.New features, so bumped version to 1.3.0 in pyproject.toml