In [25]:
import pandas as pd
import pickle

# Google Cloud Platform

## Enable GCP Services (From LeWagon Bootcamp)

### Configure the Cloud SDK

Authenticate the gcloud CLI with the Google account you used for GCP.

In [None]:
!gcloud auth login

Log in to your Google account in the new tab that opens in your web browser.

List your active account and check that the email address you used for GCP is present.

In [None]:
!gcloud auth list

Set your current project (replace PROJECT_ID with the ID of your project, e.g. wagon-bootcamp-123456)

In [None]:
!gcloud config set project PROJECT_ID

List your active account and current project, and check that your project is present.

In [None]:
!gcloud config list

### Create a service account key

**See the process below for how to create the key in the GCP console.**

Once the key is created, the browser saves the service account JSON file 🔑 in your downloads directory (it is named after your service account, something like le-wagon-data-123456789abc.json).

Store the service account JSON file somewhere you'll remember, for example:

In [None]:
/Users/MACOS_USERNAME/code/GITHUB_NICKNAME/gcp/SERVICE_ACCOUNT_JSON_FILE_CONTAINING_YOUR_SECRET_KEY.json

You can find the absolute path of a file by dragging and dropping it into a terminal window.

Store the absolute path to the JSON file as an environment variable:

In [None]:
!echo 'export GOOGLE_APPLICATION_CREDENTIALS=/path/to/the/SERVICE_ACCOUNT_JSON_FILE_CONTAINING_YOUR_SECRET_KEY.json' >> ~/.zshrc

Note: every time you run this command, it appends this line to your ~/.zshrc file, even if the line is already there. If you made a mistake and need to fix it, open the file and edit the line instead of running the command again.

You can open the file in VS Code with:

In [None]:
!code ~/.zshrc
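To avoid the duplicate-line problem described above, the append can be made idempotent. The following is a minimal Python sketch, assuming your startup file is plain text with one export per line; `append_line_once` is a hypothetical helper, not part of any library. It only detects an exact copy of the line, so an older export pointing at a different path would still have to be removed by hand.

```python
from pathlib import Path


def append_line_once(rc_file: str, line: str) -> bool:
    """Append `line` to `rc_file` unless an identical line is already present.

    Returns True if the line was added, False if it was already there.
    """
    path = Path(rc_file).expanduser()
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False  # already present: leave the file untouched
    with path.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # make sure the new line starts on its own line
        f.write(line + "\n")
    return True


# Hypothetical usage (edit the path before running):
# append_line_once(
#     "~/.zshrc",
#     "export GOOGLE_APPLICATION_CREDENTIALS=/path/to/the/SERVICE_ACCOUNT_JSON_FILE_CONTAINING_YOUR_SECRET_KEY.json",
# )
```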

Restart your terminal and run:

In [None]:
!echo $GOOGLE_APPLICATION_CREDENTIALS

The output should be the absolute path to your key file, for example:

In [None]:
/some/absolute/path/to/your/gcp/SERVICE_ACCOUNT_JSON_FILE_CONTAINING_YOUR_SECRET_KEY.json

Now let's verify that the path to your service account json file is correct:

👉 This command should display the content of your service account json file.

In [None]:
!cat $GOOGLE_APPLICATION_CREDENTIALS
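If you prefer to check from Python, the sketch below reads the file named by `GOOGLE_APPLICATION_CREDENTIALS` and returns its `client_email` field without printing the private key. `check_service_account_key` is a hypothetical helper; it assumes the key file is the JSON downloaded from the console, which contains fields such as `type`, `project_id`, `private_key` and `client_email`. The returned email is the service account address used in the steps below.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Fields expected in a service account key file downloaded from the GCP console
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_key(path: Optional[str] = None) -> str:
    """Return the service account email stored in a key file.

    Defaults to the file named by GOOGLE_APPLICATION_CREDENTIALS and never
    prints the private key.
    """
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    key_file = Path(path).expanduser()
    if not key_file.is_file():
        raise FileNotFoundError(f"No key file at {key_file}")
    data = json.loads(key_file.read_text())
    missing = REQUIRED_KEYS - set(data)
    if missing or data.get("type") != "service_account":
        raise ValueError(f"Not a service account key file (missing fields: {sorted(missing)})")
    return data["client_email"]


# Usage in the notebook:
# print(check_service_account_key())
```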

Your code and utilities are now able to access the resources of your GCP account.

Let's proceed with the final steps of configuration...

    List the service accounts associated with your active account and current project


In [None]:
!gcloud iam service-accounts list


    Retrieve the service account email address, e.g. SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com
    List the roles of the service account from the CLI (replace PROJECT_ID and SERVICE_ACCOUNT_EMAIL)
    
    You should see that your service account has the roles/owner role


In [None]:
!gcloud projects get-iam-policy PROJECT_ID \
--flatten="bindings[].members" \
--format='table(bindings.role)' \
--filter="bindings.members:SERVICE_ACCOUNT_EMAIL"
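The `--flatten`/`--filter` flags above can also be mirrored in Python if you ask gcloud for JSON instead (`--format=json`). In that output, the policy holds a `bindings` list in which each entry has a `role` and a list of `members`, and service accounts appear as `serviceAccount:EMAIL`. The sketch below assumes you have already loaded that JSON into a dict; `roles_for_service_account` is a hypothetical helper, and it uses an exact member match rather than gcloud's `:` pattern match.

```python
from typing import List


def roles_for_service_account(policy: dict, sa_email: str) -> List[str]:
    """Return the roles bound to `sa_email` in an IAM policy dict."""
    member = f"serviceAccount:{sa_email}"
    # Each binding pairs one role with the members that hold it
    return sorted(
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    )
```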

## Set Up Cloud Storage Authorization (From Google Website)

### Installing the client library

In [None]:
!pip install --upgrade google-cloud-storage

### Setting up authentication

Create a service account:

    1) In the Google Cloud console, go to the Create service account page. 
    
    https://console.cloud.google.com/projectselector/iam-admin/serviceaccounts/create?supportedpurview=project&_ga=2.1176241.1291299176.1665473547-121098221.1665473547
    
    2) Select your project.

    3) In the Service account name field, enter a name. The Google Cloud console fills in the Service account ID field based on this name.

    In the Service account description field, enter a description. For example, Service account for quickstart.

    4) Click Create and continue.

    5) To provide access to your project, grant the following role(s) to your service account: Cloud Storage > Storage Admin.

    In the Select a role list, select a role.

    For additional roles, click Add another role and add each additional role. 
    
    6) Click Continue.

    7) Click Done to finish creating the service account.

    Do not close your browser window. You will use it in the next step.
    
Create a service account key: 
  
    1) In the Google Cloud console, click the email address for the service account that you created.
    
    2) Click Keys.
    
    3) Click Add key, and then click Create new key.
    
    4) Click Create. A JSON key file is downloaded to your computer.
    
    5) Click Close.

Provide authentication credentials to your application code by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. This variable applies only to your current shell session. If you want the variable to apply to future shell sessions, set the variable in your shell startup file, for example in the ~/.bashrc or ~/.profile file. 

In [None]:
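# `!export` runs in a short-lived subshell, so the variable would not reach this kernel; %env sets it for the kernel instead
%env GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH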

Replace KEY_PATH with the path of the JSON file that contains your service account key.
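Inside a notebook, the variable can also be set from Python with `os.environ`, which affects the running kernel (and any processes it starts) but not your shell. The sketch below is a hypothetical helper that also checks that the key file exists. Set the variable before creating `storage.Client()`, since the client looks up its default credentials when it is constructed.

```python
import os
from pathlib import Path


def use_service_account_key(key_path: str) -> str:
    """Point GOOGLE_APPLICATION_CREDENTIALS at `key_path` for this process only."""
    resolved = Path(key_path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"Key file not found: {resolved}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(resolved)
    return str(resolved)


# Hypothetical usage, before any storage.Client() is created:
# use_service_account_key("KEY_PATH")
```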

### Other Authorization Resources

Cloud Storage authentication

https://cloud.google.com/storage/docs/authentication

Credential Types Supporting Various Use Cases

https://cloud.google.com/storage/docs/gsutil/addlhelp/CredentialTypesSupportingVariousUseCases

Securely connecting to VM instances

https://cloud.google.com/solutions/connecting-securely

## Google Cloud Storage

### Create a new bucket

In [11]:
from google.cloud import storage


def create_bucket_class_location(bucket_name: str,
                                 bucket_loc: str = 'eu',
                                 store_class: str = 'STANDARD'):
    """
    Create a new bucket in the specified region (default EU) with the specified storage
    class (default STANDARD)
    """
    # bucket_name = "your-new-bucket-name"

    storage_client = storage.Client()

    bucket = storage_client.bucket(bucket_name)
    bucket.storage_class = store_class
    new_bucket = storage_client.create_bucket(bucket, location=bucket_loc)

    print(
        "Created bucket {} in {} with storage class {}".format(
            new_bucket.name, new_bucket.location, new_bucket.storage_class
        )
    )
    return new_bucket



In [13]:
# Bucket name must be GLOBALLY unique

BUCKET_NAME = 'test-bucket-24875138'

create_bucket_class_location(BUCKET_NAME)

Created bucket test-bucket-24875138 in EU with storage class STANDARD


<Bucket: test-bucket-24875138>
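Because bucket names share one global namespace, a random suffix makes collisions less likely, and a quick local check catches names the API would reject anyway. The sketch below covers only part of the Cloud Storage naming rules (3 to 63 characters of lowercase letters, digits, `-`, `_` and `.`, starting and ending with a letter or digit, no `goog` prefix and no `google` in the name); it is deliberately conservative and skips the longer limits allowed for dotted names. Both helpers are hypothetical. Whether a name is actually free is only known when `create_bucket` succeeds.

```python
import re
import uuid

# Partial, conservative subset of the bucket naming rules (not exhaustive)
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    return (
        _BUCKET_RE.fullmatch(name) is not None
        and not name.startswith("goog")
        and "google" not in name
    )


def make_bucket_name(prefix: str = "test-bucket") -> str:
    """Append 8 random hex characters to `prefix` to reduce name collisions."""
    name = f"{prefix}-{uuid.uuid4().hex[:8]}"
    if not looks_like_valid_bucket_name(name):
        raise ValueError(f"Generated an invalid bucket name: {name}")
    return name


# Hypothetical usage:
# create_bucket_class_location(make_bucket_name())
```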

### List the buckets in a project

In [14]:
from google.cloud import storage


def list_buckets():
    """Lists all buckets."""

    storage_client = storage.Client()
    buckets = storage_client.list_buckets()

    for bucket in buckets:
        print(bucket.name)



In [15]:
list_buckets()

indo-trading-algo-data
test-bucket-24875138


### List the objects in a bucket

#### The following sample lists all objects in a bucket:

In [7]:
from google.cloud import storage


def list_blobs(bucket_name):
    """Lists all the blobs in the bucket."""
    # bucket_name = "your-bucket-name"

    storage_client = storage.Client()

    # Note: Client.list_blobs requires at least package version 1.17.0.
    blobs = storage_client.list_blobs(bucket_name)

    # Note: The call returns a response only when the iterator is consumed.
    for blob in blobs:
        print(blob.name)



In [32]:
BUCKET_NAME = 'test-bucket-24875138'

list_blobs(BUCKET_NAME)

df_read_write_data
read_write_data.csv


#### The following sample lists objects with a given prefix:

In [6]:
from google.cloud import storage


def list_blobs_with_prefix(bucket_name, prefix, delimiter=None):
    """Lists all the blobs in the bucket that begin with the prefix.

    This can be used to list all blobs in a "folder", e.g. "public/".

    The delimiter argument can be used to restrict the results to only the
    "files" in the given "folder". Without the delimiter, the entire tree under
    the prefix is returned. For example, given these blobs:

        a/1.txt
        a/b/2.txt

    If you specify prefix='a/', without a delimiter, you'll get back:

        a/1.txt
        a/b/2.txt

    However, if you specify prefix='a/' and delimiter='/', you'll get back
    only the file directly under 'a/':

        a/1.txt

    As part of the response, you'll also get back a blobs.prefixes entity
    that lists the "subfolders" under `a/`:

        a/b/
    """

    storage_client = storage.Client()

    # Note: Client.list_blobs requires at least package version 1.17.0.
    blobs = storage_client.list_blobs(bucket_name, prefix=prefix, delimiter=delimiter)

    # Note: The call returns a response only when the iterator is consumed.
    print("Blobs:")
    for blob in blobs:
        print(blob.name)

    if delimiter:
        print("Prefixes:")
        for prefix in blobs.prefixes:
            print(prefix)
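To see what `prefix` and `delimiter` do without calling the API, the sketch below models the filtering on a plain list of names. It is an offline approximation written for illustration (`simulate_list_blobs` is hypothetical) and reproduces the `a/1.txt` / `a/b/2.txt` example from the docstring above; the real `list_blobs` also handles paging and returns blob objects rather than strings.

```python
from typing import Iterable, List, Optional, Set, Tuple


def simulate_list_blobs(
    names: Iterable[str], prefix: str = "", delimiter: Optional[str] = None
) -> Tuple[List[str], Set[str]]:
    """Return (blob_names, prefixes) for the given prefix/delimiter."""
    blobs: List[str] = []
    prefixes: Set[str] = set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Collapse deeper names into a "subfolder" ending at the first delimiter
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            blobs.append(name)
    return blobs, prefixes
```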


### Upload an object to a bucket

#### The following sample uploads an object from a file:

In [16]:
from google.cloud import storage


def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The path to your file to upload
    # source_file_name = "local/path/to/file"
    # The ID of your GCS object
    # destination_blob_name = "storage-object-name"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_filename(source_file_name)

    print(
        f"File {source_file_name} uploaded to {destination_blob_name}."
    )



In [17]:
SOURCE_FILE = 'save_files/read_write_data.csv'
DESTINATION = 'read_write_data.csv'

upload_blob(BUCKET_NAME, SOURCE_FILE, DESTINATION)

File save_files/read_write_data.csv uploaded to read_write_data.csv.
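When uploading several files, the destination name is often derived from the local path. Object names always use `/` as the separator, so the hypothetical helper below converts a path relative to a base directory with `Path.as_posix()` and optionally prepends a "folder" prefix. It is a sketch, assuming the file lives under `base_dir`.

```python
from pathlib import Path


def local_to_blob_name(local_path: str, base_dir: str = ".", prefix: str = "") -> str:
    """Map a local file path to an object name, optionally under `prefix`."""
    relative = Path(local_path).relative_to(base_dir).as_posix()  # '/' on every OS
    return f"{prefix.rstrip('/')}/{relative}" if prefix else relative
```

For example, `local_to_blob_name('save_files/read_write_data.csv', 'save_files')` gives the `read_write_data.csv` destination used above.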


#### The following sample uploads an object from memory:

In [19]:
from google.cloud import storage


def upload_blob_from_memory(bucket_name, contents, destination_blob_name):
    """Uploads a string or bytes object from memory to the bucket."""

    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The contents to upload to the file
    # contents = "these are my contents"

    # The ID of your GCS object
    # destination_blob_name = "storage-object-name"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_string(contents)

    print(
        f"{destination_blob_name} with contents {contents} uploaded to {bucket_name}."
    )


In [22]:
df_read_write_data = pd.read_csv('save_files/read_write_data.csv')
df_read_write_data

Unnamed: 0,name,subjects,marks
0,sravan,java,98
1,jyothika,java,79
2,harsha,java,89
3,ramya,python,97
4,sravan,python,82
5,jyothika,python,98
6,harsha,html/php,90
7,ramya,html/php,87
8,sravan,html/php,78
9,jyothika,php/js,89


In [31]:
# Use pickle.dumps() to serialize the DataFrame into a bytes object (OBJECT)

OBJECT = pickle.dumps(df_read_write_data)
print(OBJECT)

DESTINATION = 'df_read_write_data'

# Upload the bytes to the GCS blob (DESTINATION)

upload_blob_from_memory(BUCKET_NAME, OBJECT, DESTINATION)

b'\x80\x04\x95\xd8\x03\x00\x00\x00\x00\x00\x00\x8c\x11pandas.core.frame\x94\x8c\tDataFrame\x94\x93\x94)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1epandas.core.internals.managers\x94\x8c\x0cBlockManager\x94\x93\x94\x8c\tfunctools\x94\x8c\x07partial\x94\x93\x94\x8c\x1cpandas.core.internals.blocks\x94\x8c\tnew_block\x94\x93\x94\x85\x94R\x94(h\x0e)}\x94\x8c\x04ndim\x94K\x02sNt\x94b\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x01K\x0c\x86\x94h\x17\x8c\x05dtype\x94\x93\x94\x8c\x02i8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C`b\x00\x00\x00\x00\x00\x00\x00O\x00\x00\x00\x00\x00\x00\x00Y\x00\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x00\x00\x00\x00R\x00\x00\x00\x00\x00\x00\x00b\x00\x00\x00\x00\x00\x00\x00Z\x00\x00\x00\x00\x00\x00\x00W\x00\x00\x00\x00\x00\x00\x00N\x00\x00\x00\x00\x00\x00\x00Y\x00\x00\x00\x00\x00\x00\x00]\x00\x00\x00\x00\x00\x00\x00
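`pickle.dumps()` returns a `bytes` object, which is what `upload_from_string()` receives here, and printing it in full produces the long dump above. The sketch below shows the same round trip on a few rows of the table as plain Python data, plus a hypothetical `preview` helper for logging a payload's size and first bytes instead of its whole contents. Keep in mind that `pickle.loads()` can run arbitrary code, so only unpickle objects from a bucket you trust.

```python
import pickle

# A few rows of the table above as plain Python data (illustrative only)
records = {
    "name": ["sravan", "jyothika", "harsha"],
    "subjects": ["java", "java", "java"],
    "marks": [98, 79, 89],
}


def preview(payload: bytes, limit: int = 32) -> str:
    """Describe a bytes payload by its size and first `limit` bytes."""
    suffix = "..." if len(payload) > limit else ""
    return f"{len(payload)} bytes: {payload[:limit]!r}{suffix}"


payload = pickle.dumps(records)   # bytes, the type passed to upload_from_string()
restored = pickle.loads(payload)  # only unpickle data you trust

print(preview(payload))
print(restored == records)  # the round trip gives back an equal object
```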

### Download an object from a bucket

#### The following sample downloads an object to a file:

In [1]:
from google.cloud import storage


def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The ID of your GCS object
    # source_blob_name = "storage-object-name"

    # The path to which the file should be downloaded
    # destination_file_name = "local/path/to/file"

    storage_client = storage.Client()

    bucket = storage_client.bucket(bucket_name)

    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Downloaded storage object {} from bucket {} to local file {}.".format(
            source_blob_name, bucket_name, destination_file_name
        )
    )



#### The following sample downloads an object into memory:

In [28]:
from google.cloud import storage


def download_blob_into_memory(bucket_name, blob_name):
    """Downloads a blob into memory."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The ID of your GCS object
    # blob_name = "storage-object-name"

    storage_client = storage.Client()

    bucket = storage_client.bucket(bucket_name)

    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(blob_name)
    # download_as_bytes() replaces the deprecated download_as_string()
    contents = blob.download_as_bytes()

    print(
        "Downloaded storage object {} from bucket {} as the following string: {}.".format(
            blob_name, bucket_name, contents
        )
    )
    return contents



In [33]:
BLOB_NAME = 'df_read_write_data'   # Name of the object saved on GCS

# Retrieve the pickled bytes from GCS
binary_string = download_blob_into_memory(BUCKET_NAME, BLOB_NAME)

# Use pickle.loads() to convert the binary string back into the object
df_read_write_data = pickle.loads(binary_string)
  
df_read_write_data

Downloaded storage object df_read_write_data from bucket test-bucket-24875138 as the following string: b'\x80\x04\x95\xd8\x03\x00\x00\x00\x00\x00\x00\x8c\x11pandas.core.frame\x94\x8c\tDataFrame\x94\x93\x94)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1epandas.core.internals.managers\x94\x8c\x0cBlockManager\x94\x93\x94\x8c\tfunctools\x94\x8c\x07partial\x94\x93\x94\x8c\x1cpandas.core.internals.blocks\x94\x8c\tnew_block\x94\x93\x94\x85\x94R\x94(h\x0e)}\x94\x8c\x04ndim\x94K\x02sNt\x94b\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x01K\x0c\x86\x94h\x17\x8c\x05dtype\x94\x93\x94\x8c\x02i8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C`b\x00\x00\x00\x00\x00\x00\x00O\x00\x00\x00\x00\x00\x00\x00Y\x00\x00\x00\x00\x00\x00\x00a\x00\x00\x00\x00\x00\x00\x00R\x00\x00\x00\x00\x00\x00\x00b\x00\x00\x00\x00\x00\x00\x00Z\x00\x00\x00\x00\x00\x00\x00W\x00\x00\x00

Unnamed: 0,name,subjects,marks
0,sravan,java,98
1,jyothika,java,79
2,harsha,java,89
3,ramya,python,97
4,sravan,python,82
5,jyothika,python,98
6,harsha,html/php,90
7,ramya,html/php,87
8,sravan,html/php,78
9,jyothika,php/js,89


### Access a file via its gs:// path

In [36]:
def gs_path(blob_name: str,
            bucket_name: str) -> str:
    """
    Returns the Google Cloud Storage path to a blob, in the form gs://BUCKET_NAME/BLOB_NAME.

    # The ID of your GCS object
    # blob_name = "storage-object-name"

    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    """

    return f"gs://{bucket_name}/{blob_name}"
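The inverse operation, splitting a `gs://` path back into bucket and blob names, can be handy when paths are stored as strings. A minimal sketch (`parse_gs_path` is hypothetical, and it does not validate the bucket name itself):

```python
from typing import Tuple


def parse_gs_path(path: str) -> Tuple[str, str]:
    """Split 'gs://bucket/blob/name' into ('bucket', 'blob/name')."""
    scheme = "gs://"
    if not path.startswith(scheme):
        raise ValueError(f"Not a gs:// path: {path!r}")
    # Everything before the first '/' is the bucket; the rest is the object name
    bucket, _, blob = path[len(scheme):].partition("/")
    if not bucket or not blob:
        raise ValueError(f"Expected gs://BUCKET/BLOB, got {path!r}")
    return bucket, blob
```

For example, `parse_gs_path(gs_path('read_write_data.csv', BUCKET_NAME))` gives back `(BUCKET_NAME, 'read_write_data.csv')`.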

In [30]:
BLOB_NAME = 'read_write_data.csv'

# pandas reads gs:// paths through fsspec, which needs the gcsfs package (pip install gcsfs)
path = gs_path(BLOB_NAME, BUCKET_NAME)

df = pd.read_csv(path, nrows=5)

df

Unnamed: 0,name,subjects,marks
0,sravan,java,98
1,jyothika,java,79
2,harsha,java,89
3,ramya,python,97
4,sravan,python,82


### Delete an object

In [34]:
from google.cloud import storage


def delete_blob(bucket_name, blob_name):
    """Deletes a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # blob_name = "your-object-name"

    storage_client = storage.Client()

    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.delete()

    print(f"Blob {blob_name} deleted.")

