## Download and read a file from an AWS Open Data set

For this example, we use the following publicly available Open Data set:

[NapierOne Mixed File Dataset](https://registry.opendata.aws/napierone/)

Although this data set contains many files, we will download and read just one text file:
```
http://napierone.com.s3.amazonaws.com/NapierOne/Data/7ZIP/7ZIP-BZIP2-small_zip_hashes.txt
```
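
The URL above follows the virtual-hosted-style S3 layout: the hostname begins with the bucket name, and the path is the object key. As a small sketch of that mapping, the two parts can be recovered with the standard library. The helper name `split_s3_url` is ours (it is not part of boto3), and it assumes the global `s3.amazonaws.com` endpoint.

```python
from urllib.parse import unquote, urlparse

# Hypothetical helper for illustration; assumes a virtual-hosted-style URL
# of the form http(s)://<bucket>.s3.amazonaws.com/<key>.
S3_HOST_SUFFIX = ".s3.amazonaws.com"


def split_s3_url(url):
    """Return (bucket, key) for a virtual-hosted-style S3 URL."""
    parsed = urlparse(url)
    host = parsed.netloc
    if not host.endswith(S3_HOST_SUFFIX):
        raise ValueError(f"Not a virtual-hosted-style S3 URL: {url}")
    bucket = host[: -len(S3_HOST_SUFFIX)]  # everything before the S3 suffix
    key = unquote(parsed.path.lstrip("/"))  # path without the leading slash
    return bucket, key


bucket, key = split_s3_url(
    "http://napierone.com.s3.amazonaws.com/NapierOne/Data/7ZIP/7ZIP-BZIP2-small_zip_hashes.txt"
)
print(bucket)
print(key)
```

These are the same two values that Step 2 assigns to `S3_BUCKET` and `SAMPLE_FILE`.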

To read this file, we use Python and the boto3 library.

### Step 1: Install the boto3 library

In [None]:
%pip install boto3

In [None]:
import boto3
from botocore.handlers import disable_signing
from botocore import UNSIGNED
from botocore.config import Config
import os

### Step 2: Set the bucket name and file locations as variables

In [None]:
S3_BUCKET = "napierone.com"
SAMPLE_FILE = "NapierOne/Data/7ZIP/7ZIP-BZIP2-small_zip_hashes.txt"
TEMP_FILE = "temp.txt"

### Step 3: Create an unsigned S3 client and resource
Note: Because we are reading files from an open data set, no credentials are needed, so requests are sent unsigned.

In [None]:
s3_client = boto3.client('s3', config=Config(signature_version=UNSIGNED)) # equivalent to --no-sign-request
s3_resource = boto3.resource('s3')
s3_resource.meta.client.meta.events.register('choose-signer.s3.*', disable_signing) # equivalent to --no-sign-request

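### Step 4: List all objects in the specified location in this bucket

In [None]:
file_names = []
bucket = s3_resource.Bucket(S3_BUCKET)

# Restrict the listing to the folder that holds SAMPLE_FILE; calling
# objects.all() would walk every object in the bucket.
prefix = SAMPLE_FILE.rsplit("/", 1)[0] + "/"
for obj in bucket.objects.filter(Prefix=prefix):
    file_names.append(obj.key)

print(f"Found {len(file_names)} files under {prefix}")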
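
The resource API pages through results for us, but the same listing can also be done with the client alone. Below is a minimal sketch using the `list_objects_v2` paginator; the function name `list_keys` and its `limit` parameter are our own additions for illustration, and `limit` lets the loop stop early instead of walking a large location in full.

```python
def list_keys(client, bucket, prefix="", limit=None):
    """Collect object keys under `prefix`, stopping early at `limit` keys."""
    keys = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for item in page.get("Contents", []):
            keys.append(item["Key"])
            if limit is not None and len(keys) >= limit:
                return keys
    return keys
```

With the unsigned client from Step 3, a call such as `list_keys(s3_client, S3_BUCKET, "NapierOne/Data/7ZIP/", limit=20)` would return at most 20 keys from the folder that contains the sample file.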

### Step 5: Download the file to a temp location and read its contents

In [None]:
s3_client.download_file(S3_BUCKET, SAMPLE_FILE, TEMP_FILE)

with open(TEMP_FILE, "r") as text_file:
    for line in text_file:
        # Print only lines that contain text
        if line.strip():
            print(line.rstrip("\n"))
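
For a small text file, the download to disk can be skipped by reading the object body straight into memory. The following is only a sketch: the function name `read_text_lines` is ours, and it assumes the object is UTF-8 text that fits comfortably in memory.

```python
def read_text_lines(client, bucket, key, encoding="utf-8"):
    """Fetch an object with get_object and return its non-blank lines."""
    response = client.get_object(Bucket=bucket, Key=key)
    # "Body" is a stream; read() returns the whole object as bytes.
    text = response["Body"].read().decode(encoding)
    return [line for line in text.splitlines() if line.strip()]
```

Calling `read_text_lines(s3_client, S3_BUCKET, SAMPLE_FILE)` with the client from Step 3 would return the same lines that the loop above prints, without creating `TEMP_FILE`.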


### Step 6: Finally, clean up the temp file

In [None]:
os.remove(TEMP_FILE)
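
If the explicit clean-up step is easy to forget, the download and read can be wrapped in a temporary directory that removes itself. This is a sketch under our own naming (`download_and_read` is hypothetical); it relies only on the `download_file` call already used in Step 5 and on Python's `tempfile` module.

```python
import os
import tempfile


def download_and_read(client, bucket, key):
    """Download an object into a temporary directory and return its text."""
    # The directory and the downloaded file are deleted when the
    # `with` block exits, even if reading raises an exception.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, os.path.basename(key))
        client.download_file(bucket, key, local_path)
        with open(local_path, "r") as text_file:
            return text_file.read()
```

With this approach, `download_and_read(s3_client, S3_BUCKET, SAMPLE_FILE)` leaves nothing behind in the working directory, so no separate `os.remove` call is needed.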