<a href="https://colab.research.google.com/github/junkyungauh/osa/blob/master/GCS_setup_guide.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Use [Cloud Storage FUSE](https://cloud.google.com/storage/docs/cloud-storage-fuse/overview) to mount a Cloud Storage bucket, or a location within one, so that it can be accessed directly as a local filesystem path.

In [1]:
# Authenticate.
from google.colab import auth
auth.authenticate_user()

# Install Cloud Storage FUSE.
!echo "deb https://packages.cloud.google.com/apt gcsfuse-`lsb_release -c -s` main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
!apt -qq update && apt -qq install gcsfuse

deb https://packages.cloud.google.com/apt gcsfuse-jammy main
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1022  100  1022    0     0   8148      0 --:--:-- --:--:-- --:--:--  8176
OK
52 packages can be upgraded. Run 'apt list --upgradable' to see them.
W: https://packages.cloud.google.com/apt/dists/gcsfuse-jammy/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
The following NEW packages will be installed:
  gcsfuse
0 upgraded, 1 newly installed, 0 to remove and 52 not upgraded.
Need to get 14.6 MB of archives.
After this operation, 0 B of additional disk space will be used.
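
Before mounting, it can help to confirm that the install actually put `gcsfuse` on the `PATH`. The following is a minimal sketch using only the Python standard library; the helper name `tool_available` is hypothetical and not part of the original notebook.

```python
import shutil

def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None

# In the notebook, this should print True once the install cell has finished.
print(tool_available("gcsfuse"))
```

If this prints `False`, re-running the install cell and reading its output for errors is a reasonable first step.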

Mount a Cloud Storage bucket or location

In [2]:
# Sample code
# The project that owns the bucket does not need to be specified; the bucket name is sufficient.
mount_path = "my-bucket"  # or a location like "my-bucket/path/to/mount"
local_path = f"/mnt/gs/{mount_path}"

!mkdir -p {local_path}
!gcsfuse --implicit-dirs {mount_path} {local_path}

{"timestamp":{"seconds":1737942139,"nanos":556544873},"severity":"INFO","message":"Start gcsfuse/2.8.0 (Go version go1.23.4) for app \"\" using mount point: /mnt/gs/my-bucket\n"}
{"timestamp":{"seconds":1737942139,"nanos":556600637},"severity":"INFO","message":"GCSFuse config","config":{"AppName":"","CacheDir":"","Debug":{"ExitOnInvariantViolation":false,"Fuse":false,"Gcs":false,"LogMutex":false},"EnableAtomicRenameObject":false,"EnableHns":true,"FileCache":{"CacheFileForRangeRead":false,"DownloadChunkSizeMb":50,"EnableCrc":false,"EnableODirect":false,"EnableParallelDownloads":false,"MaxParallelDownloads":16,"MaxSizeMb":-1,"ParallelDownloadsPerFile":16,"WriteBufferSize":4194304},"FileSystem":{"DirMode":"755","DisableParallelDirops":false,"FileMode":"644","FuseOptions":[],"Gid":-1,"HandleSigterm":true,"IgnoreInterrupts":true,"KernelListCacheTtlSecs":0,"PreconditionErrors":true,"RenameDirLimit":0,"TempDir":"","Uid":-1},"Foreground":false,"GcsAuth":{"AnonymousAccess":false,"KeyFile":"","R
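
If a mount fails, the directory created by `mkdir -p` is still there as an ordinary empty directory, so an `ls` that returns nothing is ambiguous. A minimal sketch of a check, assuming a Linux runtime such as Colab; the helper name `is_mounted` is hypothetical:

```python
import os

def is_mounted(path: str) -> bool:
    """Return True if `path` is a mount point rather than a plain directory."""
    return os.path.ismount(path)

local_path = "/mnt/gs/my-bucket"  # same placeholder path as in the cell above
print(is_mounted(local_path))
```

To release a mount when it is no longer needed, `fusermount -u /mnt/gs/my-bucket` can be run from a shell cell.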

In [3]:
# Mount the buckets in the 'OSA Prediction' project.
buckets_to_mount = ["osa_raw-data", "osa_processed-data"]
mount_path = "/mnt/gs"

for bucket in buckets_to_mount:
  local_path = f"{mount_path}/{bucket}"
  !mkdir -p {local_path}
  !gcsfuse --implicit-dirs {bucket} {local_path}

{"timestamp":{"seconds":1737942153,"nanos":7223024},"severity":"INFO","message":"Start gcsfuse/2.8.0 (Go version go1.23.4) for app \"\" using mount point: /mnt/gs/osa_raw-data\n"}
{"timestamp":{"seconds":1737942153,"nanos":7309809},"severity":"INFO","message":"GCSFuse config","config":{"AppName":"","CacheDir":"","Debug":{"ExitOnInvariantViolation":false,"Fuse":false,"Gcs":false,"LogMutex":false},"EnableAtomicRenameObject":false,"EnableHns":true,"FileCache":{"CacheFileForRangeRead":false,"DownloadChunkSizeMb":50,"EnableCrc":false,"EnableODirect":false,"EnableParallelDownloads":false,"MaxParallelDownloads":16,"MaxSizeMb":-1,"ParallelDownloadsPerFile":16,"WriteBufferSize":4194304},"FileSystem":{"DirMode":"755","DisableParallelDirops":false,"FileMode":"644","FuseOptions":[],"Gid":-1,"HandleSigterm":true,"IgnoreInterrupts":true,"KernelListCacheTtlSecs":0,"PreconditionErrors":true,"RenameDirLimit":0,"TempDir":"","Uid":-1},"Foreground":false,"GcsAuth":{"AnonymousAccess":false,"KeyFile":"","Re
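
The loop above relies on IPython `!` shell magics, which only work inside a notebook. As a sketch of the same steps in plain Python, the commands can be built as argument lists and run with `subprocess`. The helper names `gcsfuse_command` and `mount_buckets` are hypothetical; the only `gcsfuse` flag used is the `--implicit-dirs` flag already shown in the cell.

```python
import os
import subprocess

def gcsfuse_command(bucket: str, mount_root: str = "/mnt/gs") -> list[str]:
    """Build the argv list for `gcsfuse --implicit-dirs BUCKET MOUNT_ROOT/BUCKET`."""
    return ["gcsfuse", "--implicit-dirs", bucket, f"{mount_root}/{bucket}"]

def mount_buckets(buckets: list[str], mount_root: str = "/mnt/gs") -> None:
    """Create each mount directory and mount the bucket, raising if gcsfuse fails."""
    for bucket in buckets:
        os.makedirs(f"{mount_root}/{bucket}", exist_ok=True)
        subprocess.run(gcsfuse_command(bucket, mount_root), check=True)

# Show the commands without running them.
for bucket in ["osa_raw-data", "osa_processed-data"]:
    print(" ".join(gcsfuse_command(bucket)))
```

Calling `mount_buckets(["osa_raw-data", "osa_processed-data"])` would then perform the mounts; `check=True` turns a non-zero exit status into a Python exception instead of a log line that is easy to miss.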

In [4]:
# The mounted bucket can now be accessed like a local path.
!ls /mnt/gs/osa_raw-data

pid100100.csv  pid190043.csv  pid333895.csv  pid432730.csv  pid539974.csv    pid624071.csv
pid100816.csv  pid199445.csv  pid334988.csv  pid439005.csv  pid542486-1.csv  pid627078.csv
pid102234.csv  pid208588.csv  pid349751.csv  pid442085.csv  pid542486.csv    pid630354.csv
pid103968.csv  pid215758.csv  pid350887.csv  pid445442.csv  pid543520.csv    pid631889.csv
pid104303.csv  pid219965.csv  pid367205.csv  pid450461.csv  pid545303.csv    pid635643.csv
pid107696.csv  pid224699.csv  pid369605.csv  pid450941.csv  pid555964.csv    pid637827.csv
pid109326.csv  pid234620.csv  pid370916.csv  pid456686.csv  pid557778.csv    pid638244.csv
pid109461.csv  pid248410.csv  pid383757.csv  pid457081.csv  pid561591.csv    pid639355.csv
pid112894.csv  pid253204.csv  pid391110.csv  pid457271.csv  pid564533-1.csv  pid641102.csv
pid119615.csv  pid261476.csv  pid391482.csv  pid464160.csv  pid564533.csv    pid641397.csv
pid124294.csv  pid272598.csv  pid392875.csv  pid469641.csv  pid566153.csv    pid644133.csv
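
Most names in the listing have the form `pid<digits>.csv`, and a few carry a `-1` suffix (for example `pid542486-1.csv` next to `pid542486.csv`). The notebook does not say what the suffix means, so the sketch below only groups files that share the same numeric ID so they can be inspected together. The regular expression and the helper name `group_by_pid` are assumptions inferred from the listing.

```python
import re
from collections import defaultdict

# Assumed naming pattern, inferred from the listing:
# "pid" + digits, an optional "-N" suffix, then ".csv".
PID_PATTERN = re.compile(r"^pid(\d+)(?:-(\d+))?\.csv$")

def group_by_pid(filenames):
    """Map each numeric ID to the sorted list of file names that carry it."""
    groups = defaultdict(list)
    for name in filenames:
        match = PID_PATTERN.match(name)
        if match:  # names that do not fit the assumed pattern are skipped
            groups[match.group(1)].append(name)
    return {pid: sorted(names) for pid, names in groups.items()}

# A few names copied from the listing above.
names = ["pid542486-1.csv", "pid542486.csv", "pid100100.csv", "pid564533.csv"]
groups = group_by_pid(names)

# Print only the IDs that have more than one file.
print({pid: files for pid, files in groups.items() if len(files) > 1})
```

In the notebook, `group_by_pid(os.listdir("/mnt/gs/osa_raw-data"))` (after `import os`) would apply the same grouping to the mounted bucket.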

Process data and upload the result to Cloud Storage

In [5]:
import pandas as pd

df = pd.read_csv("/mnt/gs/osa_raw-data/pid100100.csv")
df.head()


Unnamed: 0,Time Stamp,Sleep,Chin,Position,SpO2,Nasal Pressure,Heart Rate,Snore,Event1,Event2,Event3,Event4
0,[],[],[V],[?],[%],[cmH2O],[bpm],[cmH2O],,,,
1,10:16:13:7,,0.003285,-0.4269,,4.327,0,-0.003827,,,,
2,10:16:13:8,,0.003284,-0.4521,,4.327,0,0.0005935,,,,
3,10:16:13:9,,0.00328,-0.4598,,4.321,0,0.0006324,,,,
4,10:16:14:0,,0.003233,-0.498,,4.268,0,-0.001417,,,,
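
In the output above, the first data row holds units (`[V]`, `[%]`, `[cmH2O]`, `[bpm]`) rather than measurements, so a column such as `Chin` contains a string and is not read as numeric. One way to handle this, sketched below on an in-memory sample, is to read the units row separately and skip it when loading the signals. The sample text is built from the rows shown above; the empty first header field is an assumption inferred from the `Unnamed: 0` column name.

```python
import io
import pandas as pd

# In-memory stand-in for a raw file, built from the rows shown above.
# Assumption: the first header field is empty, which pandas names "Unnamed: 0".
sample_csv = io.StringIO(
    ",Time Stamp,Sleep,Chin,Position,SpO2,Nasal Pressure,Heart Rate,Snore,"
    "Event1,Event2,Event3,Event4\n"
    "0,[],[],[V],[?],[%],[cmH2O],[bpm],[cmH2O],,,,\n"
    "1,10:16:13:7,,0.003285,-0.4269,,4.327,0,-0.003827,,,,\n"
    "2,10:16:13:8,,0.003284,-0.4521,,4.327,0,0.0005935,,,,\n"
)

# Read only the first data row (the units) and keep it as metadata.
units = pd.read_csv(sample_csv, index_col=0, nrows=1).iloc[0]

# Re-read from the start, skipping file line 1 (the units row; the header is line 0),
# so that the signal columns are parsed as numbers.
sample_csv.seek(0)
signals = pd.read_csv(sample_csv, index_col=0, skiprows=[1])

print(units["Chin"])
print(signals["Chin"].dtype)
```

For a real file, the `io.StringIO` object would be replaced by a path such as `/mnt/gs/osa_raw-data/pid100100.csv`, provided that file has the same layout.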


In [6]:
processed_df = df.head()
processed_df.to_csv("/mnt/gs/osa_processed-data/sample.csv", index=False)

In [7]:
# Check that the processed file was uploaded to Cloud Storage.
!ls /mnt/gs/osa_processed-data

patient-data-1_00.dta  sample.csv
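
The listing confirms that `sample.csv` exists, but not that its contents are complete. A slightly stronger check, sketched here with the standard library only, is to count the data rows in the written file and compare that number with the DataFrame length. The helper name `csv_row_count` is hypothetical, and the demonstration uses a temporary file rather than the mounted bucket.

```python
import csv
import tempfile
from pathlib import Path

def csv_row_count(path) -> int:
    """Count data rows (excluding the header line) in the CSV file at `path`."""
    with Path(path).open(newline="") as handle:
        return max(sum(1 for _ in csv.reader(handle)) - 1, 0)

# Demonstration on a temporary file with a header and two data rows.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "sample.csv"
    demo.write_text("Time Stamp,Chin\n10:16:13:7,0.003285\n10:16:13:8,0.003284\n")
    print(csv_row_count(demo))  # 2
```

In the notebook, the equivalent check would be `csv_row_count("/mnt/gs/osa_processed-data/sample.csv") == len(processed_df)`.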


References


* [Google Cloud Storage FUSE & Colab integration](https://colab.research.google.com/notebooks/snippets/gcs.ipynb#scrollTo=ZWpIqYjsBJFn)
* [Local file, Drive, Cloud Storage Colab integration](https://colab.research.google.com/notebooks/io.ipynb#scrollTo=S7c8WYyQdh5i)