## Download files from an AWS Open Data set

For this example, we use the following publicly available Open Data set:

[NapierOne Mixed File Dataset](https://registry.opendata.aws/napierone/)

Although this data set contains many files, we focus on downloading and reading a single text file:
```
http://napierone.com.s3.amazonaws.com/NapierOne/Data/7ZIP/7ZIP-BZIP2-small_zip_hashes.txt
```

To read the file, we use Python and the boto3 library.

### Step 1: Install the boto3 library

In [39]:
pip install boto3

Note: you may need to restart the kernel to use updated packages.



In [40]:
import boto3
from botocore.handlers import disable_signing
from botocore import UNSIGNED
from botocore.config import Config
import os

### Step 2: Set the bucket name and file locations as variables

In [41]:
S3_BUCKET = "napierone.com"
SAMPLE_FILE = "NapierOne/Data/7ZIP/7ZIP-BZIP2-small_zip_hashes.txt"
TEMP_FILE = "temp.txt"
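
The bucket name and object key above correspond to the host and path of the file URL shown earlier. As a minimal sketch, they could be derived from the URL instead of being typed by hand. The helper name `split_s3_url` is hypothetical, and the code assumes a virtual-hosted-style URL of the form `http://<bucket>.s3.amazonaws.com/<key>`; other URL styles are rejected.

```python
from urllib.parse import unquote, urlparse

# Assumption: virtual-hosted-style URLs ending in this host suffix.
S3_HOST_SUFFIX = ".s3.amazonaws.com"


def split_s3_url(url):
    """Split a virtual-hosted-style S3 URL into (bucket, key)."""
    parsed = urlparse(url)
    host = parsed.netloc
    if not host.endswith(S3_HOST_SUFFIX):
        raise ValueError(f"Not a virtual-hosted-style S3 URL: {url}")
    # Everything before the suffix is the bucket name (it may contain dots).
    bucket = host[: -len(S3_HOST_SUFFIX)]
    # The path, without its leading slash, is the object key.
    key = unquote(parsed.path.lstrip("/"))
    return bucket, key


url = "http://napierone.com.s3.amazonaws.com/NapierOne/Data/7ZIP/7ZIP-BZIP2-small_zip_hashes.txt"
bucket_name, object_key = split_s3_url(url)
print(bucket_name)
print(object_key)
```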

### Step 3: Create the S3 client and resource
Note: Because the data set is public, requests are sent unsigned and no AWS credentials are needed.

In [42]:
s3_client = boto3.client('s3', config=Config(signature_version=UNSIGNED)) # equivalent to --no-sign-request
s3_resource = boto3.resource('s3')
s3_resource.meta.client.meta.events.register('choose-signer.s3.*', disable_signing) # equivalent to --no-sign-request

### Step 4: List all objects in the bucket

In [43]:
file_names = []
bucket = s3_resource.Bucket(S3_BUCKET)

for obj in bucket.objects.all():  # `obj` avoids shadowing the built-in `object`
    file_names.append(obj.key)

print(f"Available {len(file_names)} files")

Available 768 files
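
The loop above walks the entire bucket. To restrict the listing to one location, the resource collection also offers `filter(Prefix=...)`. The sketch below wraps it in a small helper; the name `list_keys` is hypothetical, and the prefix in the usage comment is only an assumption based on the sample file's key.

```python
def list_keys(bucket, prefix=""):
    """Return the keys of all objects in `bucket` whose key starts with `prefix`.

    `bucket` is expected to be a boto3 S3 Bucket resource, e.g.
    s3_resource.Bucket(S3_BUCKET). An empty prefix lists everything.
    """
    return [obj.key for obj in bucket.objects.filter(Prefix=prefix)]


# Example usage with the notebook's variables (requires network access):
# seven_zip_keys = list_keys(s3_resource.Bucket(S3_BUCKET), "NapierOne/Data/7ZIP/")
# print(f"Available {len(seven_zip_keys)} files under the 7ZIP prefix")
```

Filtering on the server side keeps the number of list requests down when only one folder is of interest.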


### Step 5: Download the file to a temporary location and read its contents

In [44]:
s3_client.download_file(S3_BUCKET, SAMPLE_FILE, TEMP_FILE)

with open(TEMP_FILE, "r") as text_file:
    for line in text_file.read().splitlines():
        print(line)




****************************************************************

    Hash Console v1.5 by SecurityXploded

    http://securityxploded.com/hash-console.php

*****************************************************************


 ________________________________________________________

 :: Generating Hash for File '7ZIP-small.zip'
 ________________________________________________________



 Hash Type         Hash Length      Hash Value

 CRC32             4                bd3b3377

 ADLER32           4                fcb99950

 RIPEMD160         20               d326302b64e417aedc79456378b75bbebb41e823

 MD2               16               969f083164d3a54088d382736e65e106

 MD4               16               c027a8b6c4a4ecff4c7f2642c2955391

 MD5               16               d89acaa2425eec5a8547df0e976a41c8

 SHA1              20               483c456bba35e571d206100924d9220ac7a30cfa

 SHA256            32               3169eb9530a0b34a3c5a17d1d1aebd8ba80ed7d3e2b3305bce0edd05f6b98c08
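
The report above is plain text, so the hash values can be pulled into a dictionary for later use. The following is a minimal sketch: the function name `parse_hash_report` is hypothetical, and the pattern assumes every data row has the shape "NAME, length in bytes, lowercase hex value" as seen in this one file.

```python
import re

# One data row: upper-case hash name, length in bytes, lowercase hex digest.
HASH_ROW = re.compile(r"^\s*([A-Z0-9]+)\s+(\d+)\s+([0-9a-f]+)\s*$")


def parse_hash_report(text):
    """Map hash names to hex digests, skipping banner and header lines."""
    hashes = {}
    for line in text.splitlines():
        match = HASH_ROW.match(line)
        if not match:
            continue
        name, length, value = match.groups()
        # Sanity check: each byte of the digest is two hex characters.
        if len(value) == int(length) * 2:
            hashes[name] = value
    return hashes


sample = """
 Hash Type         Hash Length      Hash Value
 CRC32             4                bd3b3377
 MD5               16               d89acaa2425eec5a8547df0e976a41c8
"""
print(parse_hash_report(sample))
```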


Finally, clean up the temporary file.

In [45]:
os.remove(TEMP_FILE)
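
For a small text file, the temporary file and its cleanup can be avoided by reading the object straight into memory with the client's `get_object` call. This is an alternative technique, not what the notebook does; the helper name `read_object_lines` is hypothetical, and UTF-8 is an assumed encoding for the file.

```python
def read_object_lines(s3_client, bucket, key, encoding="utf-8"):
    """Fetch an S3 object and return its text content as a list of lines.

    `s3_client` is expected to be a boto3 S3 client. The whole object is
    loaded into memory, so this only suits small files.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    raw_bytes = response["Body"].read()
    # Assumption: the file is UTF-8 text; undecodable bytes are replaced.
    return raw_bytes.decode(encoding, errors="replace").splitlines()


# Example usage with the notebook's variables (requires network access):
# for line in read_object_lines(s3_client, S3_BUCKET, SAMPLE_FILE):
#     print(line)
```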