## Example: Migrate files from AWS S3 bucket to GCP Storage and Azure Blob Storage

### Prerequisites

1. Install Cloud Data Connector.

2. For this example, create a bucket in AWS S3, a bucket in GCP Storage, and a container in Azure Blob Storage.

Before running the notebook, configure environment variables and set your credentials as follows:

#### AWS
```
$ export AWS_ACCESS_KEY_ID=<your_key_id>
$ export AWS_SECRET_ACCESS_KEY=<your_secret_key>
$ export AWS_BUCKET_NAME=<your_bucket_name>
```

#### GCP
```
$ export GOOGLE_APPLICATION_CREDENTIALS=path/to/your_key_id.json
$ export GCP_PROJECT_NAME=<your_gcp_project_name>
$ export GCP_BUCKET_NAME=<your_gcp_bucket_name>
```

#### Azure
```
$ export AZURE_BLOB_NAME=<your_azure_container_name>
$ export AZURE_CONNECTION_STRING=<your_azure_connection_string>
```
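
Each section below reads some of these variables, so it can help to check all of them before you start. The cell below is a minimal sketch, not part of Cloud Data Connector: the helper `missing_env_vars` and the `REQUIRED_ENV_VARS` grouping are hypothetical names that simply mirror the variables listed above.

```python
import os

# Hypothetical grouping of the variables listed above, keyed by provider.
REQUIRED_ENV_VARS = {
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_BUCKET_NAME"],
    "gcp": ["GOOGLE_APPLICATION_CREDENTIALS", "GCP_PROJECT_NAME", "GCP_BUCKET_NAME"],
    "azure": ["AZURE_BLOB_NAME", "AZURE_CONNECTION_STRING"],
}


def missing_env_vars(required=REQUIRED_ENV_VARS, environ=os.environ):
    """Return {provider: [names]} for required variables that are unset or empty."""
    missing = {}
    for provider, names in required.items():
        absent = [name for name in names if not environ.get(name)]
        if absent:
            missing[provider] = absent
    return missing


# Report every missing variable at once instead of failing one cell at a time.
for provider, names in missing_env_vars().items():
    print(f"{provider}: please set {', '.join(names)}")
```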

### Specify the key for the file you will upload to the AWS S3 bucket

In AWS S3, files are stored as objects in buckets. S3 has a flat namespace, but it supports the folder concept as a means of grouping objects: a folder is a key prefix that ends in `/`. To put a file in a folder, include the folder name in its key, for example `dir_name/file_name`.

The cell below defines the key for this example: the file will be saved in the `1937` folder with the name `hello_world.txt`.

In [None]:
dir_name = "1937"
file_name = "hello_world.txt"
key = f"{dir_name}/{file_name}"
print(key)
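
Because the namespace is flat, `1937/hello_world.txt` is a single key, and the "folder" is only the part before the `/` delimiter. The sketch below is plain Python with a hypothetical helper name, `group_by_prefix`; it mimics how a delimited listing splits keys into common prefixes (folders) and objects. With `boto3`, the real call is `list_objects_v2(Bucket=..., Prefix=..., Delimiter="/")`, which returns the folders under `CommonPrefixes`.

```python
def group_by_prefix(keys, prefix="", delimiter="/"):
    """Split keys under `prefix` into (common_prefixes, objects), like a delimited S3 listing."""
    common_prefixes = set()
    objects = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the requested "folder"
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to and including the first delimiter is a "subfolder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(common_prefixes), objects


keys = ["1937/hello_world.txt", "1937/notes/a.txt", "2000/b.txt", "readme.txt"]
print(group_by_prefix(keys))           # (['1937/', '2000/'], ['readme.txt'])
print(group_by_prefix(keys, "1937/"))  # (['1937/notes/'], ['1937/hello_world.txt'])
```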

### Prepare data
Create a `downloads` directory to store downloaded files.

In [None]:
import os
download_dir = 'downloads'
os.makedirs(download_dir, exist_ok=True)

Create an `uploads` directory to hold the files you will upload.

In [None]:
uploads_dir = 'uploads'
os.makedirs(uploads_dir, exist_ok=True)

Create a text file in the `uploads` directory and write a plain-text string to it.

In [None]:
file_text = "Hello World!"
file_path = f"{uploads_dir}/{file_name}"
with open(file_path, "w", encoding="UTF-8") as f:
    f.write(file_text)

### Migrate data with Cloud Data Connector

Read the S3 bucket name from the `AWS_BUCKET_NAME` environment variable.

In [None]:
try:
    aws_bucket_name = os.environ["AWS_BUCKET_NAME"]
except KeyError as error:
    print(f"Environment variable does not exist, please set a value for {error}")

#### Upload file to AWS S3

Import `Connector` and `Uploader` from the `cloud_data_connector.aws` package, then create a `Connector` and call `connect` to get an S3 client. By default, `connect` reads the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` values from environment variables.

In [None]:
from cloud_data_connector.aws import Connector, Uploader
s3_client = Connector().connect()

Next, create an `Uploader` with the S3 client returned by `connect`, and call `upload` with the bucket name, the local file path, and the object key.

In [None]:
s3_uploader = Uploader(s3_client)
s3_uploader.upload(aws_bucket_name, file_path, key)
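
The cell above uploads a single file. To migrate a whole local directory, you can map each file to a key under the same folder prefix and call `upload` once per pair. The helper `plan_uploads` below is a hypothetical sketch, not part of Cloud Data Connector; it only builds the `(local_path, key)` pairs and always uses `/` in keys, whatever the local OS path separator is.

```python
from pathlib import Path


def plan_uploads(local_dir, prefix):
    """Return sorted (local_path, key) pairs for every file under local_dir."""
    root = Path(local_dir)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # as_posix() keeps "/" as the key separator on every OS.
            relative = path.relative_to(root).as_posix()
            key = f"{prefix.rstrip('/')}/{relative}" if prefix else relative
            pairs.append((str(path), key))
    return pairs


# Example, reusing the objects from the cells above:
# for local_path, object_key in plan_uploads(uploads_dir, dir_name):
#     s3_uploader.upload(aws_bucket_name, local_path, object_key)
```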

#### Download file from AWS S3 bucket with Cloud Data Connector

Download `hello_world.txt` from the bucket and save it in `downloads/`.

In [None]:
from cloud_data_connector.aws import Downloader
s3_downloader = Downloader(s3_client)
s3_downloader.download(aws_bucket_name, key, f"{download_dir}/{file_name}")
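
To check that the downloaded copy matches the file you uploaded, you can compare checksums of the two local files. This is a minimal sketch using only the standard library; `file_md5` is a hypothetical helper name. For simple single-part uploads without SSE-KMS or SSE-C encryption, the S3 ETag is typically the MD5 of the object, but multipart uploads produce a different ETag format, so comparing the local files is the more predictable check.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example, reusing the paths from the cells above:
# assert file_md5(file_path) == file_md5(f"{download_dir}/{file_name}")
```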

#### Upload file to GCP bucket

Read credentials from environment variables.

In [None]:
try:
    gcp_app_credentials = os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
    gcp_project_name = os.environ["GCP_PROJECT_NAME"]
    gcp_bucket_name = os.environ["GCP_BUCKET_NAME"]
except KeyError as error:
    print(f"Environment variable does not exist, please set a value for {error}")

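The GCP `Connector` and `Uploader` follow the same pattern as the AWS ones: import them from `cloud_data_connector.gcp` instead of `cloud_data_connector.aws` and pass the required parameters. Here, the `Connector` is created with `"storage"` to get a storage client, and `connect` receives the GCP project name as `connection_string`.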

In [None]:
from cloud_data_connector.gcp import Connector, Uploader
gcp_storage_client = Connector("storage").connect(connection_string=gcp_project_name)

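Creating an `Uploader` works the same way as for AWS; the only difference is that you pass the GCP storage client. The GCP uploader's method is `upload_to_bucket`, and here it uploads the copy downloaded from S3 (`downloads/hello_world.txt`) under the same `1937/hello_world.txt` name.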

In [None]:
gcp_uploader = Uploader(gcp_storage_client)
gcp_uploader.upload_to_bucket(gcp_bucket_name, f"{download_dir}/{file_name}", f"{dir_name}/{file_name}")

#### Upload file to Azure Blob Storage

Read credentials from environment variables.

In [None]:
try:
    azure_blob_name = os.environ["AZURE_BLOB_NAME"]
    azure_connection_string = os.environ["AZURE_CONNECTION_STRING"]
except KeyError as error:
    print(f"Environment variable does not exist, please set a value for {error}")

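Prepare the file to upload. This cell writes `hello_world.txt` to a local directory named after `dir_name` (`1937`) and reassigns `uploads_dir` to that directory, so the local path `1937/hello_world.txt` matches the key used for S3 and GCP.

In [None]:
file_text = "Hello World!"
uploads_dir = dir_name  # "1937", the same folder name used in the S3 and GCP keys
os.makedirs(uploads_dir, exist_ok=True)
file_path = f"{uploads_dir}/{file_name}"
with open(file_path, "w", encoding="UTF-8") as f:
    f.write(file_text)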

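As with GCP, import `Connector` and `Uploader` from the `cloud_data_connector.azure` package and pass the required parameters; for Azure, `connect` receives the storage account connection string.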

In [None]:
from cloud_data_connector.azure import Connector, Uploader
azure_storage_client = Connector().connect(connection_string=azure_connection_string)

Create an `Uploader` with the Azure storage client and call `upload` with the local file path and `azure_blob_name`.

In [None]:
azure_uploader = Uploader(azure_storage_client)
azure_uploader.upload(f"{uploads_dir}/{file_name}", azure_blob_name)
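
The cells above migrate one object by hand: download it from S3 once, then upload the local copy to GCP and Azure. If you repeat this for many keys, it can help to wrap the steps in a function. The `migrate` helper below is a hypothetical sketch, not part of Cloud Data Connector: it takes plain callables, so it does not depend on any provider API, and it keeps going when one destination fails so you can retry only the failed ones.

```python
def migrate(key, local_path, download, destinations):
    """Download `key` to `local_path` once, then upload it to each destination.

    `download(key, local_path)` fetches the source object; `destinations` maps a
    name to an `upload(local_path, key)` callable. Returns {name: exception} for
    the destinations that failed (an empty dict means every upload succeeded).
    """
    download(key, local_path)
    failures = {}
    for name, upload in destinations.items():
        try:
            upload(local_path, key)
        except Exception as error:  # record the failure and continue with the rest
            failures[name] = error
    return failures


# Example wiring with the objects created above (assumes those cells ran):
# failures = migrate(
#     key,
#     f"{download_dir}/{file_name}",
#     download=lambda k, p: s3_downloader.download(aws_bucket_name, k, p),
#     destinations={
#         "gcp": lambda p, k: gcp_uploader.upload_to_bucket(gcp_bucket_name, p, k),
#         # The Azure uploader above takes the local path and azure_blob_name.
#         "azure": lambda p, k: azure_uploader.upload(p, azure_blob_name),
#     },
# )
```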

### Migrate data without Cloud Data Connector

#### Upload file to AWS S3 bucket

Reset `uploads_dir` to the `uploads` directory created earlier, since the Azure example above reassigned it.

In [None]:
uploads_dir = "uploads"

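Create an S3 client with `boto3`. Like Cloud Data Connector, `boto3` reads `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from environment variables.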

In [None]:
import boto3
s3 = boto3.client('s3')

Call `upload_file` with the local file path, the bucket name, and the object key.

In [None]:
s3.upload_file(f"{uploads_dir}/{file_name}", aws_bucket_name, f"{dir_name}/{file_name}")
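
If you want the S3 object to carry a `Content-Type` that matches the file extension, you can pass one through `upload_file`'s `ExtraArgs` parameter. The helper `guess_content_type` below is a hypothetical sketch built on the standard library's `mimetypes` module; the `application/octet-stream` fallback is an assumption you can change.

```python
import mimetypes


def guess_content_type(path, default="application/octet-stream"):
    """Guess a MIME type from the file extension, falling back to `default`."""
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or default


# Example with the boto3 client above (ExtraArgs is a standard upload_file parameter):
# local_path = f"{uploads_dir}/{file_name}"
# s3.upload_file(local_path, aws_bucket_name, f"{dir_name}/{file_name}",
#                ExtraArgs={"ContentType": guess_content_type(local_path)})
```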

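Download the object back to `download_dir` with `download_file`, which takes the bucket name, the object key, and the local destination path.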

In [None]:
s3.download_file(aws_bucket_name, f"{dir_name}/{file_name}", f"{download_dir}/{file_name}")

#### Upload file to GCP

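Import the `storage` module from `google.cloud` and create a client. Without arguments, `storage.Client()` uses Application Default Credentials, which include the key file set in `GOOGLE_APPLICATION_CREDENTIALS`.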

In [None]:
from google.cloud import storage
storage_client = storage.Client()

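To upload a file, get a `Bucket` object for your bucket and a `Blob` object for the destination name, then call `upload_from_filename`. `bucket()` and `blob()` only build local references and do not create anything in GCP, so the bucket must already exist.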

In [None]:
bucket = storage_client.bucket(gcp_bucket_name)
blob = bucket.blob(f"{dir_name}/{file_name}")
blob.upload_from_filename(f"{download_dir}/{file_name}")

#### Upload file to Azure

Create a `BlobServiceClient` from the storage account connection string.

In [None]:
from azure.storage.blob import BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string(azure_connection_string)

Get a client for the container named by `azure_blob_name` and upload the file. `overwrite=True` replaces any existing blob with the same name.

In [None]:
container_client = blob_service_client.get_container_client(container=azure_blob_name)
with open(file=os.path.join(download_dir, file_name), mode="rb") as data:
    blob_client = container_client.upload_blob(name=f"{dir_name}/{file_name}", data=data, overwrite=True)

### Notes
With Cloud Data Connector, AWS, GCP, and Azure share one import pattern: you choose the cloud provider by subpackage (`cloud_data_connector.aws`, `cloud_data_connector.gcp`, or `cloud_data_connector.azure`) and pass the required parameters to create a `Connector`. Without it, you need to import three different SDKs: `boto3`, `google.cloud.storage`, and `azure.storage.blob`.

With Cloud Data Connector, a common `connect` method returns a client for AWS S3, GCP Storage, or Azure Blob Storage: pass a connection string or project name, or let `connect` read your credentials from environment variables.

With Cloud Data Connector, uploading also follows one pattern: create an `Uploader` with the cloud client and call its upload method (`upload` for AWS and Azure, `upload_to_bucket` for GCP).

The following cells compare the code needed to upload a file with Cloud Data Connector and with the AWS, GCP, and Azure SDKs for Python.

Code to upload a file with Cloud Data Connector:

In [None]:
from cloud_data_connector import aws, gcp, azure
s3_client = aws.Connector().connect(connection_string="")
gcp_client = gcp.Connector("storage").connect(connection_string=gcp_project_name)
azure_client = azure.Connector().connect(connection_string=azure_connection_string)
aws.Uploader(s3_client).upload(aws_bucket_name, file_path, key)
gcp.Uploader(gcp_client).upload_to_bucket(gcp_bucket_name, f"{download_dir}/{file_name}", f"{dir_name}/{file_name}")
azure.Uploader(azure_client).upload(f"{uploads_dir}/{file_name}", azure_blob_name)

Code to upload a file with the `boto3`, `google-cloud-storage`, and `azure-storage-blob` clients:

In [None]:
import boto3
from google.cloud import storage
from azure.storage.blob import BlobServiceClient
s3_client = boto3.client('s3')
gcp_storage_client = storage.Client()
azure_blob_client = BlobServiceClient.from_connection_string(azure_connection_string)
s3_client.upload_file(f"{uploads_dir}/{file_name}", aws_bucket_name, f"{dir_name}/{file_name}")
bucket = gcp_storage_client.bucket(gcp_bucket_name)
blob = bucket.blob(f"{dir_name}/{file_name}")
blob.upload_from_filename(f"{download_dir}/{file_name}")
container_client = azure_blob_client.get_container_client(container=azure_blob_name)
with open(file=os.path.join(download_dir, file_name), mode="rb") as data:
    blob_client = container_client.upload_blob(name=f"{dir_name}/{file_name}", data=data, overwrite=True)