# Data Engineer certificate


## File upload to S3

The AWS CLI (`aws s3 cp`) automatically splits large files, meaning files above the `multipart_threshold` (8 MB by default), into a multipart upload:
```bash
$ aws s3 cp large_test_file s3://DOC-EXAMPLE-BUCKET/
```
For example, a multipart upload of a 100 GB file in 100 MB parts makes the following API calls:
- One CreateMultipartUpload call to start the process
- 1,000 UploadPart calls, each uploading a 100 MB part, for a total of 100 GB
- One CompleteMultipartUpload call to finish the process

That is 1,002 API calls in total.
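The arithmetic generalizes: the number of parts is the file size divided by the part size, rounded up, plus one call each to start and finish the upload. Below is a minimal bash sketch. The function name `multipart_api_calls` is hypothetical, sizes are plain decimal MB as in the example above, and the sketch ignores retries and any `ListParts` or `AbortMultipartUpload` calls. It also checks S3's limit of 10,000 parts per upload.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the S3 API calls for one multipart upload.
# Assumptions: sizes in decimal MB, no retries, no ListParts/Abort calls.
multipart_api_calls() {
  local file_mb=$1 part_mb=$2
  # parts = ceil(file size / part size); the last part may be smaller
  local parts=$(( (file_mb + part_mb - 1) / part_mb ))
  # S3 allows at most 10,000 parts per multipart upload
  if (( parts > 10000 )); then
    echo "error: $parts parts exceeds the 10,000-part limit" >&2
    return 1
  fi
  # 1 CreateMultipartUpload + N UploadPart + 1 CompleteMultipartUpload
  echo $(( parts + 2 ))
}

multipart_api_calls 100000 100   # 100 GB file in 100 MB parts
```

Running it prints `1002`. The CLI chooses its own part size from the `multipart_chunksize` setting (8 MB by default), so `aws s3 cp` would use smaller parts, and therefore more UploadPart calls, unless that setting is raised.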

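To upload a file directly into an S3 Glacier storage class, pass `--storage-class`. This example uses S3 Glacier Deep Archive; use `GLACIER` for Glacier Flexible Retrieval or `GLACIER_IR` for Glacier Instant Retrieval: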
```bash
aws s3 cp your-file.txt s3://your-bucket-name/your-file.txt --storage-class DEEP_ARCHIVE
```

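To add a lifecycle policy, apply a JSON configuration file to the bucket (this replaces any lifecycle configuration the bucket already has):
```bash
aws s3api put-bucket-lifecycle-configuration --bucket your-bucket-name --lifecycle-configuration file://lifecycle_policy.json
```

Example `lifecycle_policy.json`: for objects tagged `Lifecycle=Archive`, the first rule transitions them to S3 Glacier Flexible Retrieval (`GLACIER`) after 30 days and the second deletes them after 365 days: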
```json
{
    "Rules": [
        {
            "ID": "Move to Glacier after 30 days",
            "Filter": {
                "Tag": {
                    "Key": "Lifecycle",
                    "Value": "Archive"
                }
            },
            "Status": "Enabled",
            "Transitions": [
                {
                    "Days": 30,
                    "StorageClass": "GLACIER"
                }
            ]
        },
        {
            "ID": "Delete after 365 days",
            "Filter": {
                "Tag": {
                    "Key": "Lifecycle",
                    "Value": "Archive"
                }
            },
            "Status": "Enabled",
            "Expiration": {
                "Days": 365
            }
        }
    ]
}
```
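The `Days` values count from object creation. Per the S3 documentation, S3 adds the days to the creation time and rounds the result to the next midnight UTC. For example, an object created on 2014-01-15 at 10:30 UTC under a 3-day rule becomes eligible on 2014-01-19 at 00:00 UTC. Below is a minimal bash sketch of that calculation for the rules above. The function name `lifecycle_eligible_date` is hypothetical, it assumes GNU `date` (as on Linux), and treating an exact-midnight result as rolling to the following day is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: date an object becomes eligible for a lifecycle action.
# Assumes GNU date; rule: creation time + N days, rounded to the next 00:00 UTC.
lifecycle_eligible_date() {
  local created=$1 days=$2
  local created_epoch
  # -u makes date parse and print in UTC
  created_epoch=$(date -u -d "$created" +%s) || return 1
  local target=$(( created_epoch + days * 86400 ))
  # Round to the following midnight UTC (assumption: exact midnight also rolls over)
  local midnight=$(( (target / 86400 + 1) * 86400 ))
  date -u -d "@$midnight" +%F
}

created="2014-01-15 10:30:00"
echo "Glacier transition: $(lifecycle_eligible_date "$created" 30)"
echo "Expiration:         $(lifecycle_eligible_date "$created" 365)"
```

For this creation time it prints 2014-02-15 for the transition and 2015-01-16 for the expiration. This is only the eligibility date; S3 may carry out the action some time later.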

