# **Using Pandas with Google Cloud Storage**
This notebook illustrates how to upload data to and read data from Google Cloud Storage with Pandas, plus a few other helpful operations (copying, moving, renaming, deleting and listing files).

#### **Dataset**
The following dataset is used:
https://archive.ics.uci.edu/dataset/352/online+retail

#### **Troubleshooting**

##### **Does not have storage.buckets.get access to the Google Cloud Storage bucket.**
Ensure the service account has a role that grants this permission, for example Storage Admin (`roles/storage.admin`). See https://cloud.google.com/storage/docs/creating-bucket
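To find out which permission is actually missing, a bucket's `test_iam_permissions` method returns the subset of the requested permissions that the current credentials hold. A minimal sketch, assuming `bucket` is a `google.cloud.storage.Bucket`; the helper name `missing_permissions` and the permission list in the usage comment are illustrative, not part of the original notebook:

```python
def missing_permissions(bucket, required):
    """Return the permissions in `required` that the caller does NOT hold on `bucket`.

    Hypothetical helper: `bucket` is assumed to be a google.cloud.storage.Bucket
    (or any object with a compatible test_iam_permissions method).
    """
    # test_iam_permissions returns only the requested permissions that are granted
    granted = set(bucket.test_iam_permissions(list(required)))
    return [perm for perm in required if perm not in granted]


# Hypothetical usage, once `client` and `bucket_name` are defined further down:
# required = ["storage.buckets.get", "storage.objects.create", "storage.objects.get"]
# print(missing_permissions(client.bucket(bucket_name), required))
```

An empty list means every listed permission is granted to the service account on that bucket.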



In [None]:
from io import StringIO
import pandas as pd

from google.cloud import storage
from google.cloud.storage import Bucket

from google.oauth2 import service_account

from google.colab import drive

In [None]:
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [None]:
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'Colab Notebooks/'

datafile = base_dir + '...' #TODO: Update with correct path for datafile

In [None]:
data = pd.read_excel(datafile)
data.head()

|   | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |


### **Set up the Cloud Storage client**

In [None]:
credentials_jsonfile = "..." #TODO: Update this to the location of your service account json file

In [None]:
# Get Google Cloud client
credentials = service_account.Credentials.from_service_account_file(
    base_dir + credentials_jsonfile,
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
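If loading the credentials fails, it can help to sanity-check the key file first. The sketch below only inspects a few fields that service account JSON keys contain (`type`, `project_id`, `client_email`, `private_key`); it does not validate the key or call any Google API. The helper name `check_service_account_file` is hypothetical:

```python
import json


def check_service_account_file(path):
    """Sanity-check a service account key file; returns (client_email, project_id).

    Hypothetical helper: checks only a few JSON fields, not the key itself.
    """
    with open(path, encoding="utf-8") as fh:
        info = json.load(fh)
    if info.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key (type={info.get('type')!r})")
    missing = [key for key in ("project_id", "client_email", "private_key") if key not in info]
    if missing:
        raise ValueError(f"{path} is missing fields: {missing}")
    # client_email is the identity that needs a Storage role in IAM
    return info["client_email"], info["project_id"]


# Usage in this notebook: check_service_account_file(base_dir + credentials_jsonfile)
```

The returned `client_email` is the principal to look for in the bucket's or project's IAM settings when troubleshooting permission errors.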

In [None]:
client = storage.Client(credentials=credentials) # The project is taken from the service account credentials

### **Initialize a few bucket variables**

In [None]:
# Bucket names must be globally unique across all of Cloud Storage, so replace 'my-bucket' with your own name.
# Use only lowercase letters, numbers, hyphens (-), underscores (_) and dots (.); dots may be used to form a valid domain name.
# See https://cloud.google.com/storage/docs/buckets?hl=en#naming
bucket_name = 'my-bucket'
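The naming rules can be checked locally before any request is sent. This is a partial sketch covering only some of the documented rules (allowed characters, first/last character, 3-63 characters or up to 222 when the name contains dots with each dot-separated part at most 63, no IP-address-like names, no `goog` prefix or `google` substring); it does not catch misspellings of "google", and global uniqueness can only be checked by Cloud Storage itself. The function name `bucket_name_problems` is hypothetical:

```python
import re

# Allowed characters, and the rule that names start and end with a letter or digit
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9._-]*[a-z0-9]$")
# Dotted-decimal notation such as 192.168.5.4 is not allowed
_IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def bucket_name_problems(name):
    """Return a list of naming-rule violations; an empty list means none were found.

    Partial check only: it covers a subset of the documented rules.
    """
    problems = []
    max_len = 222 if "." in name else 63
    if not 3 <= len(name) <= max_len:
        problems.append(f"length must be 3-{max_len} characters")
    if not _NAME_RE.match(name):
        problems.append("use lowercase letters, digits, '-', '_' or '.', "
                        "starting and ending with a letter or digit")
    if any(len(part) > 63 for part in name.split(".")):
        problems.append("each dot-separated component must be at most 63 characters")
    if _IPV4_RE.match(name):
        problems.append("must not look like an IP address")
    if name.startswith("goog") or "google" in name:
        problems.append("must not start with 'goog' or contain 'google'")
    return problems


print(bucket_name_problems(bucket_name) if "bucket_name" in globals() else bucket_name_problems("my-bucket"))
```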

data_path = 'test-data/file.csv'

### **Create bucket**

In [None]:
bucket_exists = Bucket(client, bucket_name).exists()

if not bucket_exists:
    # US-EAST1 is one of the regions covered by the Cloud Storage Always Free tier
    # (5 GB-months of Standard storage): https://cloud.google.com/storage/pricing#cloud-storage-always-free
    client.create_bucket(bucket_name, location='US-EAST1')
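The existence check and the creation can also be folded into one helper. This sketch swaps in `Client.lookup_bucket`, which returns `None` when the bucket is not found instead of raising; the helper name `get_or_create_bucket` is hypothetical. Note that if the name is already taken in another project, the lookup or the creation fails with an error rather than returning `None`, and another process could still create the bucket between the two calls:

```python
def get_or_create_bucket(client, name, location="US-EAST1"):
    """Return the bucket called `name`, creating it in `location` if it is not found.

    Hypothetical helper; `client` is assumed to be a google.cloud.storage.Client.
    """
    bucket = client.lookup_bucket(name)  # None if the bucket does not exist
    if bucket is None:
        bucket = client.create_bucket(name, location=location)
    return bucket


# Usage in this notebook: bucket = get_or_create_bucket(client, bucket_name)
```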

### **Get bucket**

In [None]:
bucket = client.get_bucket(bucket_name)

### **Upload file to bucket**

In [None]:
#blob_exists = bucket.blob(data_path).exists() # NOTE: use this to check whether a file with the same name already exists before uploading

data_blob = bucket.blob(data_path) # Get the blob (object) where the file will be stored

# Serialize the DataFrame as tab-separated text and upload it; upload_from_string encodes str data as UTF-8.
# The content type is set to match the tab-separated format.
data_blob.upload_from_string(data.to_csv(sep='\t', index=False), content_type='text/tab-separated-values')
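Writing through a text format does not preserve pandas dtypes: after the round trip `InvoiceDate` comes back as plain strings unless it is parsed again, and columns such as `StockCode`, which mix numeric and alphanumeric codes, are re-inferred from the text. The sketch below reproduces the round trip locally on two rows copied from the `data.head()` output above (a subset of the columns), without any Cloud Storage calls. Passing `dtype` and `parse_dates` to `read_csv` is one way to restore the intended types:

```python
from io import StringIO

import pandas as pd

# Two rows copied from the data.head() output above (subset of columns)
df = pd.DataFrame({
    "InvoiceNo": [536365, 536365],
    "StockCode": ["85123A", "71053"],
    "InvoiceDate": pd.to_datetime(["2010-12-01 08:26:00", "2010-12-01 08:26:00"]),
    "UnitPrice": [2.55, 3.39],
})

# Same serialization as the upload cell: tab-separated, no index column
tsv = df.to_csv(sep="\t", index=False)

# Naive read: InvoiceDate stays as text
naive = pd.read_csv(StringIO(tsv), sep="\t")

# Typed read: keep StockCode as text and parse the timestamp column again
typed = pd.read_csv(
    StringIO(tsv),
    sep="\t",
    dtype={"StockCode": str},
    parse_dates=["InvoiceDate"],
)

print(pd.api.types.is_datetime64_any_dtype(naive["InvoiceDate"]))  # False
print(pd.api.types.is_datetime64_any_dtype(typed["InvoiceDate"]))  # True
```

The same `dtype` and `parse_dates` arguments can be added to the `pd.read_csv` call in the read cell below.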

### **Read file from bucket**

In [None]:
data_blob_read = bucket.blob(data_path)

In [None]:
data_str = data_blob_read.download_as_text() # Read file from our blob

data_downloaded = pd.read_csv(StringIO(data_str), sep='\t') # Parse the tab-separated text back into a DataFrame

data_downloaded.head()
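`download_as_text()` holds the whole file in memory as one string before pandas parses it. As an alternative, `Blob.open("r")` returns a file-like reader that pandas can consume directly, so the object is read in chunks. A minimal sketch, assuming google-cloud-storage 1.38 or newer (where `Blob.open` is available); the helper name `read_tsv_blob` is hypothetical:

```python
import pandas as pd


def read_tsv_blob(bucket, path, **read_csv_kwargs):
    """Read a tab-separated object from Cloud Storage into a DataFrame.

    Hypothetical helper; assumes google-cloud-storage >= 1.38 so that
    Blob.open() returns a file-like reader.
    """
    blob = bucket.blob(path)
    with blob.open("r") as fh:
        # pandas reads from the file-like object instead of one large string
        return pd.read_csv(fh, sep="\t", **read_csv_kwargs)


# Usage in this notebook: read_tsv_blob(bucket, data_path).head()
```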

|   | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |


### **Copy file**

In [None]:
data_blob_copy = bucket.blob(data_path)

blob_copied = bucket.copy_blob(data_blob_copy, bucket, new_name='test-data/file-copy.csv')
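By default `copy_blob` silently overwrites an existing object at the destination. Passing `if_generation_match=0` adds a Cloud Storage precondition meaning "only succeed if no live object exists at the destination name"; otherwise the request fails with HTTP 412. A minimal sketch; the helper name `copy_if_absent` is hypothetical:

```python
def copy_if_absent(bucket, src_name, dst_name, destination_bucket=None):
    """Copy src_name to dst_name, refusing to overwrite an existing destination object.

    Hypothetical helper; `bucket` is assumed to be a google.cloud.storage.Bucket.
    if_generation_match applies to the destination object, and 0 means "must not exist".
    """
    dest = destination_bucket or bucket  # default: copy within the same bucket
    return bucket.copy_blob(
        bucket.blob(src_name),
        dest,
        new_name=dst_name,
        if_generation_match=0,
    )


# Usage in this notebook: copy_if_absent(bucket, data_path, 'test-data/file-copy.csv')
```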

### **Move file**

In [None]:
data_blob_move = bucket.blob('test-data/file-copy.csv') # The file to move

blob_moved = bucket.copy_blob(data_blob_move, bucket, new_name='test-data/file-moved.csv') # Copy it to the new location

data_blob_move.delete() # Delete the original file. This is irreversible, so be cautious!
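The move above is two separate requests, a copy followed by a delete, so if the delete fails both objects exist. The same pattern can be wrapped in a small helper that also supports moving to another bucket. A minimal sketch; the helper name `move_blob` is hypothetical:

```python
def move_blob(bucket, src_name, dst_name, destination_bucket=None):
    """Copy src_name to dst_name, then delete the source (the pattern used above).

    Hypothetical helper; `bucket` is assumed to be a google.cloud.storage.Bucket.
    The two steps are separate requests, so a failed delete leaves both objects.
    """
    source = bucket.blob(src_name)
    dest = destination_bucket or bucket  # default: move within the same bucket
    new_blob = bucket.copy_blob(source, dest, new_name=dst_name)
    # Reached only if copy_blob returned without raising
    source.delete()
    return new_blob


# Usage in this notebook: move_blob(bucket, 'test-data/file-copy.csv', 'test-data/file-moved.csv')
```

For a move within the same bucket, `bucket.rename_blob` (used in the next cell) is documented as doing the same thing: it first duplicates the data and then deletes the old blob.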

### **Renaming file**

In [None]:
data_blob_rename = bucket.blob('test-data/file-moved.csv')

blob_renamed = bucket.rename_blob(data_blob_rename, new_name='test-data/file-renamed.csv')

### **Delete file**

In [None]:
data_blob_delete = bucket.blob('test-data/file-renamed.csv')
data_blob_delete.delete() # Delete the file. This is irreversible, so be cautious!
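To clean up several files at once, for example everything written under a test folder, the objects under a prefix can be listed and passed to `Bucket.delete_blobs`. A minimal sketch; the helper name `delete_prefix` is hypothetical, and it refuses an empty prefix because that would match every object in the bucket:

```python
def delete_prefix(bucket, prefix):
    """Delete every object whose name starts with `prefix`; returns the deleted names.

    Hypothetical helper; `bucket` is assumed to be a google.cloud.storage.Bucket.
    Like the single delete above, this cannot be undone from this code.
    """
    if not prefix:
        raise ValueError("refusing to delete with an empty prefix (would match the whole bucket)")
    blobs = list(bucket.list_blobs(prefix=prefix))
    if blobs:
        # delete_blobs deletes each listed blob in turn
        bucket.delete_blobs(blobs)
    return [blob.name for blob in blobs]


# Hypothetical usage: delete_prefix(bucket, 'test-data/tmp/')
```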

### **List files in bucket**

In [None]:
blobs = bucket.list_blobs(prefix='test-data')
for blob in blobs:
    print(blob.name)

test-data/online-retail.csv
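Since the rest of the notebook works in pandas, the listing can also be collected into a DataFrame together with each object's size and last-update time (`Blob.size` and `Blob.updated`), which makes it easy to sort or filter. A minimal sketch; the helper name `list_blobs_df` is hypothetical:

```python
import pandas as pd


def list_blobs_df(bucket, prefix=None):
    """Return a DataFrame with the name, size in bytes and last-update time of each object.

    Hypothetical helper; `bucket` is assumed to be a google.cloud.storage.Bucket.
    """
    rows = [
        {"name": blob.name, "size_bytes": blob.size, "updated": blob.updated}
        for blob in bucket.list_blobs(prefix=prefix)
    ]
    # Passing columns keeps the same columns even when nothing matches the prefix
    return pd.DataFrame(rows, columns=["name", "size_bytes", "updated"])


# Usage in this notebook: list_blobs_df(bucket, prefix='test-data/').sort_values('size_bytes')
```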


### **Get size of file**

In [None]:
blob = bucket.get_blob('test-data/file.csv')
print('{bytes} bytes or {kilobytes} KB'.format(bytes=blob.size, kilobytes=blob.size/1000))

48030134 bytes or 48030.134 KB
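The cell above divides by 1000, i.e. decimal kilobytes. A small formatter can pick a readable unit automatically and optionally use binary (1024-based) units, which is how tools that report KiB/MiB count. A minimal sketch; the function name `format_size` is hypothetical:

```python
def format_size(num_bytes, binary=False):
    """Format a byte count with decimal (kB, MB: 1000-based) or binary (KiB, MiB: 1024-based) units."""
    base = 1024 if binary else 1000
    units = ["KiB", "MiB", "GiB", "TiB"] if binary else ["kB", "MB", "GB", "TB"]
    if num_bytes < base:
        return f"{num_bytes} B"
    value = float(num_bytes)
    for unit in units:
        value /= base
        # Stop at the first unit that keeps the number below the base, or at the largest unit
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"


print(format_size(48030134))               # 48.03 MB
print(format_size(48030134, binary=True))  # 45.81 MiB
```

In the notebook this would be `format_size(blob.size)`, using the blob fetched with `get_blob` above.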
