## Project to Upload Files to GCS using Python

As part of this series of lectures, we will see how to upload files to Google Cloud Storage (GCS) using Python. We will use `glob` and `os` from the standard library, along with `storage` from `google.cloud`, to build the application logic.

Here are the design details.
* First, we need to get the list of file names to upload from the local file system.
* We need to build a `blob` object for each file.
* We can call `upload_from_filename` on the blob object to upload the file as a blob in GCS.
* We will use a metadata-driven (data-driven) approach to upload all the files related to retail to GCS.
* Blobs will be named using the local file names as reference.
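
The design above can be sketched as a small planning step that pairs every local file with the blob name it should get. This is only a sketch under stated assumptions: the function name `build_upload_plan` is hypothetical, and it uses `os.walk` with `os.path.relpath` instead of the `glob` call used in the cells that follow.

```python
import os


def build_upload_plan(src_base_dir, tgt_base_dir):
    """Return sorted (local_path, blob_name) pairs for every file under src_base_dir.

    Hypothetical helper for illustration; it is not part of google.cloud.storage.
    """
    # Blob names keep the last folder of src_base_dir (for example 'retail_db').
    parent_dir = os.path.dirname(os.path.normpath(src_base_dir))
    plan = []
    for dir_path, _dir_names, file_names in os.walk(src_base_dir):
        for file_name in file_names:
            local_path = os.path.join(dir_path, file_name)
            rel_path = os.path.relpath(local_path, parent_dir)
            # GCS object names use '/', whatever the local separator is.
            blob_name = f"{tgt_base_dir}/{rel_path.replace(os.sep, '/')}"
            plan.append((local_path, blob_name))
    return sorted(plan)
```

Calling `build_upload_plan('../../data/retail_db', 'pythondemo')` would give one pair per file, which keeps the naming rule in one place and lets the upload loop stay driven by data.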

In [1]:
!gsutil rm -r gs://khazretail/pythondemo

Removing gs://khazretail/pythondemo/retail_db/orders/part-00000#1710438903634384...
/ [1 objects]                                                                   

Operation completed over 1 objects.                                              


In [33]:
!gsutil ls gs://khazretail/

gs://khazretail/pythondemo/
gs://khazretail/retail_db/
gs://khazretail/retail_db_parquet/


In [34]:
import glob

In [35]:
src_base_dir = '../../data/retail_db'

In [36]:
items = glob.glob(f'{src_base_dir}/**', recursive=True)

In [37]:
items

['../../data/retail_db\\',
 '../../data/retail_db\\categories',
 '../../data/retail_db\\categories\\part-00000',
 '../../data/retail_db\\create_db_tables_pg.sql',
 '../../data/retail_db\\customers',
 '../../data/retail_db\\customers\\part-00000',
 '../../data/retail_db\\departments',
 '../../data/retail_db\\departments\\part-00000',
 '../../data/retail_db\\load_db_tables_pg.sql',
 '../../data/retail_db\\orders',
 '../../data/retail_db\\orders\\part-00000',
 '../../data/retail_db\\order_items',
 '../../data/retail_db\\order_items\\part-00000',
 '../../data/retail_db\\products',
 '../../data/retail_db\\products\\part-00000',
 '../../data/retail_db\\schemas.json']

In [41]:
item = items[2]

In [42]:
item

'../../data/retail_db\\categories\\part-00000'

In [43]:
import os
os.path.isfile(item)

True

In [44]:
files = filter(lambda item: os.path.isfile(item), items)

In [45]:
list(files)

['../../data/retail_db\\categories\\part-00000',
 '../../data/retail_db\\create_db_tables_pg.sql',
 '../../data/retail_db\\customers\\part-00000',
 '../../data/retail_db\\departments\\part-00000',
 '../../data/retail_db\\load_db_tables_pg.sql',
 '../../data/retail_db\\orders\\part-00000',
 '../../data/retail_db\\order_items\\part-00000',
 '../../data/retail_db\\products\\part-00000',
 '../../data/retail_db\\schemas.json']
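
Note that `filter` returns a lazy, one-shot iterator, so `list(files)` consumes it. That is why the list is rebuilt before it is indexed. A minimal sketch of the behavior, where the sample names and the `looks_like_file` predicate are assumptions standing in for the real paths and `os.path.isfile`:

```python
items = ["categories", "categories/part-00000", "schemas.json"]


# Stand-in predicate (an assumption for this sketch): treat names containing
# a dot or 'part-' as files instead of calling os.path.isfile on real paths.
def looks_like_file(item):
    return "." in item or "part-" in item


files = filter(looks_like_file, items)
first_pass = list(files)   # consumes the iterator
second_pass = list(files)  # nothing is left to yield

# Materializing once gives a list that can be indexed and reused.
files = list(filter(looks_like_file, items))
first_file = files[0]
```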

In [12]:
files = list(filter(lambda item: os.path.isfile(item), items))
file = files[0]

In [13]:
file

'../../data/retail_db\\categories\\part-00000'

In [21]:
(file.split('/')[3]).split('\\')

['retail_db', 'categories', 'part-00000']

In [14]:
file.split('/')[3:]

['retail_db\\categories\\part-00000']

In [22]:
'/'.join((file.split('/')[3]).split('\\'))

'retail_db/categories/part-00000'
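
The index `3` above depends on `src_base_dir` having exactly three `/`-separated components before `retail_db`, and the split on backslashes assumes Windows. A possible alternative, sketched here with `pathlib`, computes the suffix relative to the parent of the base folder. The helper name `blob_suffix` and its `path_cls` parameter are hypothetical.

```python
from pathlib import PurePath, PureWindowsPath


def blob_suffix(file_path, src_base_dir, path_cls=PurePath):
    """Return file_path relative to the parent of src_base_dir, joined with '/'."""
    base_parent = path_cls(src_base_dir).parent
    return path_cls(file_path).relative_to(base_parent).as_posix()


# PureWindowsPath is passed explicitly so the Windows-style path from this
# notebook is parsed the same way on any operating system.
suffix = blob_suffix(
    "../../data/retail_db\\categories\\part-00000",
    "../../data/retail_db",
    path_cls=PureWindowsPath,
)
print(suffix)  # retail_db/categories/part-00000
```

With the default `path_cls=PurePath`, the helper follows the conventions of the operating system it runs on.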

In [23]:
tgt_base_dir = 'pythondemo'

In [24]:
from google.cloud import storage

In [25]:
gsclient = storage.Client()

In [26]:
# Keep only regular files; glob also returns the directories themselves.
files = filter(lambda item: os.path.isfile(item), items)
bucket = gsclient.get_bucket('khazretail')
for file in files:
    print(f'Uploading file {file}')
    # Take the part after '../../data/' and switch backslash separators to '/'.
    blob_suffix = '/'.join((file.split('/')[3]).split('\\'))
    blob_name = f'{tgt_base_dir}/{blob_suffix}'
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(file)

Uploading file ../../data/retail_db\categories\part-00000
Uploading file ../../data/retail_db\create_db_tables_pg.sql
Uploading file ../../data/retail_db\customers\part-00000
Uploading file ../../data/retail_db\departments\part-00000
Uploading file ../../data/retail_db\load_db_tables_pg.sql
Uploading file ../../data/retail_db\orders\part-00000
Uploading file ../../data/retail_db\order_items\part-00000
Uploading file ../../data/retail_db\products\part-00000
Uploading file ../../data/retail_db\schemas.json
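
The upload loop can also be wrapped in a function that receives the bucket and a list of `(local_path, blob_name)` pairs, which makes it easier to reuse and to try out without touching GCS. This is a sketch, not a definitive implementation: `upload_files` is a hypothetical name, and it relies only on the `bucket.blob(...)` and `blob.upload_from_filename(...)` calls already used above.

```python
def upload_files(bucket, upload_plan):
    """Upload each (local_path, blob_name) pair and return the uploaded blob names.

    `bucket` is expected to behave like google.cloud.storage.Bucket: only
    bucket.blob(name) and blob.upload_from_filename(path) are used.
    """
    uploaded = []
    for local_path, blob_name in upload_plan:
        print(f"Uploading file {local_path} as {blob_name}")
        bucket.blob(blob_name).upload_from_filename(local_path)
        uploaded.append(blob_name)
    return uploaded


# Usage against the bucket from this notebook (needs valid credentials):
# from google.cloud import storage
# bucket = storage.Client().get_bucket('khazretail')
# upload_files(bucket, [('../../data/retail_db/schemas.json',
#                        'pythondemo/retail_db/schemas.json')])
```

Passing the bucket in as an argument means a simple stand-in object can replace it in a dry run.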


In [27]:
!gsutil ls -r gs://khazretail/pythondemo

gs://khazretail/pythondemo/:

gs://khazretail/pythondemo/retail_db/:
gs://khazretail/pythondemo/retail_db/create_db_tables_pg.sql
gs://khazretail/pythondemo/retail_db/load_db_tables_pg.sql
gs://khazretail/pythondemo/retail_db/schemas.json

gs://khazretail/pythondemo/retail_db/categories/:
gs://khazretail/pythondemo/retail_db/categories/part-00000

gs://khazretail/pythondemo/retail_db/customers/:
gs://khazretail/pythondemo/retail_db/customers/part-00000

gs://khazretail/pythondemo/retail_db/departments/:
gs://khazretail/pythondemo/retail_db/departments/part-00000

gs://khazretail/pythondemo/retail_db/order_items/:
gs://khazretail/pythondemo/retail_db/order_items/part-00000

gs://khazretail/pythondemo/retail_db/orders/:
gs://khazretail/pythondemo/retail_db/orders/part-00000

gs://khazretail/pythondemo/retail_db/products/:
gs://khazretail/pythondemo/retail_db/products/part-00000


In [28]:
gsclient.list_blobs?

Signature:
gsclient.list_blobs(
    bucket_or_name,
    max_results=None,
    page_token=None,
    prefix=None,
    delimiter=None,
    start_offset=None,
    end_offset=None,
    include_trailing_delimiter=None,
    versions=None,
    projection='noAcl',
    fields=None,
    page_size=None,
    timeout=60

In [29]:
gsclient.list_blobs(
    'khazretail',
    prefix='pythondemo'
)

<google.api_core.page_iterator.HTTPIterator at 0x19224630640>

In [30]:
blobs = list(gsclient.list_blobs(
    'khazretail',
    prefix='pythondemo'
))

In [31]:
blobs

[<Blob: khazretail, pythondemo/retail_db/categories/part-00000, 1710581150103091>,
 <Blob: khazretail, pythondemo/retail_db/create_db_tables_pg.sql, 1710581150702930>,
 <Blob: khazretail, pythondemo/retail_db/customers/part-00000, 1710581152931273>,
 <Blob: khazretail, pythondemo/retail_db/departments/part-00000, 1710581153476505>,
 <Blob: khazretail, pythondemo/retail_db/load_db_tables_pg.sql, 1710581157031672>,
 <Blob: khazretail, pythondemo/retail_db/order_items/part-00000, 1710581161144620>,
 <Blob: khazretail, pythondemo/retail_db/orders/part-00000, 1710581158783764>,
 <Blob: khazretail, pythondemo/retail_db/products/part-00000, 1710581162084409>,
 <Blob: khazretail, pythondemo/retail_db/schemas.json, 1710581162728935>]

In [32]:
len(blobs)

9
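
Counting the blobs confirms that nine objects exist, but not that they are the expected ones. One way to check by name is sketched below. The helper `missing_blob_names` is hypothetical, and the `SimpleNamespace` objects are stand-ins that mimic only the `name` attribute of the blobs returned by `list_blobs`.

```python
from types import SimpleNamespace


def missing_blob_names(expected_names, blobs):
    """Return the expected names that are absent from the listed blobs, sorted."""
    listed = {blob.name for blob in blobs}
    return sorted(set(expected_names) - listed)


# Stand-in objects with a .name attribute (sample data for this sketch).
listed_blobs = [
    SimpleNamespace(name="pythondemo/retail_db/orders/part-00000"),
    SimpleNamespace(name="pythondemo/retail_db/schemas.json"),
]
expected = [
    "pythondemo/retail_db/orders/part-00000",
    "pythondemo/retail_db/products/part-00000",
    "pythondemo/retail_db/schemas.json",
]
print(missing_blob_names(expected, listed_blobs))
# ['pythondemo/retail_db/products/part-00000']
```

In the notebook, `blobs` from the cell above could be passed in place of `listed_blobs`; an empty result would mean every expected file was uploaded.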