
clean up S3 Access logs #1658

Open
carolyncole opened this issue Jan 23, 2024 · 1 comment

Comments

@carolyncole (Member)

https://s3.console.aws.amazon.com/s3/buckets/pdc-describe-logs?bucketType=general&region=us-east-1&tab=objects#
Logs need to be purged regularly.

@hectorcorrea (Member) commented Aug 14, 2024

As an experiment, I ran the following command to get a list of all the files that we have in the pdc-describe-logs bucket:

aws s3 ls --summarize --human-readable --recursive s3://pdc-describe-logs/ > pdc_describe_logs.txt

Lo and behold, we have 9 million files (9,273,338 to be exact). They add up to 290 GB.

The command took about 75 minutes to run and it gave me a list of all the files and their dates. The list is available here: https://drive.google.com/file/d/1fhyNJLWYCEfHVPCcq5bS8DmBxZ4FcQGP/view?usp=drive_link
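To get a sense of how much an age-based purge would actually remove, that listing can be summarized by month. A rough sketch, assuming the standard aws s3 ls output format of date, time, size, unit, key:

# Count objects per month; the grep drops the trailing "Total Objects/Size" summary lines.
grep '^[0-9]' pdc_describe_logs.txt | awk '{ print substr($1, 1, 7) }' | sort | uniq -c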

We could go through that list and start deleting files older than X.

One problem might be that the aws s3 CLI tool does not seem to accept wildcards for the rm command.
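A workaround would be to drive the deletes from the listing we already have, since each line carries the object's date. A rough sketch with a hypothetical cutoff date (it assumes the log keys contain no spaces, which is normally true for S3 access logs, and one rm call per object will be very slow for millions of keys, so this is only illustrative):

# Delete every object whose date column is older than the cutoff.
CUTOFF="2023-01-01"   # hypothetical cutoff; adjust to whatever "older than X" ends up being
grep '^[0-9]' pdc_describe_logs.txt \
  | awk -v cutoff="$CUTOFF" '$1 < cutoff { print $5 }' \
  | while IFS= read -r key; do
      aws s3 rm "s3://pdc-describe-logs/$key"
    done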

Another approach to address this problem could be to start writing new log files to a different S3 bucket and then delete this bucket, with its 9 million files, in two years or so.
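If we go that route, once the retention window has passed the old bucket can be emptied and removed in one step, with something like:

# Deletes all remaining objects and then removes the bucket itself.
aws s3 rb s3://pdc-describe-logs --force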
