# Example Project Notebook

## Imports

### *Library imports*

In [1]:
from google.cloud import storage
from io import StringIO

import os
import pandas as pd
import sys

### *Custom function imports from src*

In [2]:
sys.path.append(os.path.abspath('../src'))

from main import df_column_combos
from utils import all_column_combos

## Authentication

Run the following in cmd/bash to authenticate before executing any of the reads/writes below:

```gcloud auth application-default login```
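
To confirm the login took effect before running the cells below, one option is to look for the credentials file it writes. The sketch below is not part of the original notebook and rests on several assumptions: the function names are hypothetical, it only checks the `GOOGLE_APPLICATION_CREDENTIALS` variable and the default gcloud config location, and it ignores a relocated config directory (`CLOUDSDK_CONFIG`) and metadata-server credentials.

```python
import os
from pathlib import Path

ADC_FILENAME = 'application_default_credentials.json'


def adc_file_candidates(env=None, home=None):
    """List the file paths this sketch checks for credentials, in order.

    `env` and `home` are injectable so the check can be exercised
    without touching the real environment (hypothetical parameters).
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    candidates = []

    # An explicitly configured key file takes priority.
    explicit = env.get('GOOGLE_APPLICATION_CREDENTIALS')
    if explicit:
        candidates.append(Path(explicit))

    # Default location written by `gcloud auth application-default login`.
    appdata = env.get('APPDATA')
    if os.name == 'nt' and appdata:
        candidates.append(Path(appdata) / 'gcloud' / ADC_FILENAME)
    else:
        candidates.append(home / '.config' / 'gcloud' / ADC_FILENAME)

    return candidates


def adc_file_present(env=None, home=None):
    """Return True if any candidate credentials file exists."""
    return any(path.is_file() for path in adc_file_candidates(env, home))
```

Calling `adc_file_present()` with no arguments checks the current user's environment. A `False` result suggests the login command has not been run yet, although credentials supplied some other way would not be detected by this check.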

## Read Test

### *Data Import + Tidying*

In [3]:
client = storage.Client()
bucket = client.bucket('python_read_write_testing')
blob = bucket.blob('stineman_siblings.csv')
data = blob.download_as_text()
stineman_siblings = pd.read_csv(StringIO(data))

In [4]:
stineman_siblings

Unnamed: 0,name,birthdate,height_inches
0,Alexandra,2011-05-12 00:00:00+00:00,68.0
1,Nathan,2009-03-20 00:00:00+00:00,71.5
2,Sarah,2006-04-08 00:00:00+00:00,71.0
3,Nicholas,2000-08-27 00:00:00+00:00,74.0
4,Andrew,1997-10-21 00:00:00+00:00,76.75


In [5]:
stineman_siblings['birthdate'] = pd.to_datetime(stineman_siblings['birthdate'], utc = True)
stineman_siblings

Unnamed: 0,name,birthdate,height_inches
0,Alexandra,2011-05-12 00:00:00+00:00,68.0
1,Nathan,2009-03-20 00:00:00+00:00,71.5
2,Sarah,2006-04-08 00:00:00+00:00,71.0
3,Nicholas,2000-08-27 00:00:00+00:00,74.0
4,Andrew,1997-10-21 00:00:00+00:00,76.75


## Analysis

### *Unique Column Combos*

In [6]:
df_column_combos(stineman_siblings)

[(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]

In [7]:
all_column_combos(stineman_siblings)

[('name',),
 ('birthdate',),
 ('height_inches',),
 ('name', 'birthdate'),
 ('name', 'height_inches'),
 ('birthdate', 'height_inches'),
 ('name', 'birthdate', 'height_inches')]

*Note how we can import and use functions from both `utils.py` (independent) and `main.py` (which imports from `utils`).*
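
The source of the two imported functions is not shown in this notebook. As a rough illustration only, the sketch below reproduces the two outputs above with `itertools.combinations`. The function names are hypothetical stand-ins, and the real implementations in `src` may differ.

```python
from itertools import combinations


def name_combos_sketch(columns):
    """Every non-empty combination of the given columns, shortest first."""
    columns = list(columns)
    return [
        combo
        for size in range(1, len(columns) + 1)
        for combo in combinations(columns, size)
    ]


def index_combos_sketch(columns):
    """The same combinations, expressed as column positions."""
    return name_combos_sketch(range(len(list(columns))))


# Column names taken from the DataFrame shown earlier in the notebook.
cols = ['name', 'birthdate', 'height_inches']
name_combos = name_combos_sketch(cols)
index_combos = index_combos_sketch(cols)
```

With a DataFrame in hand, `df.columns` could be passed in place of `cols`. A table with n columns yields 2^n - 1 combinations, so the result grows quickly on wide tables.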

## Write Test

In [9]:
blob = bucket.blob('stineman_siblings_3.csv')

# Default to 'y' so that a brand-new object is written without prompting;
# otherwise `test` would be undefined when the blob does not exist yet.
test = 'y'

if blob.exists():
    test = ''
    while test not in ['y', 'n']:
        test = input("You are attempting an overwrite. Proceed? (y/n) ").lower()

if test == 'n':
    print('Write canceled.')
else:
    csv_buffer = StringIO()
    stineman_siblings.to_csv(csv_buffer, index = False)
    blob.upload_from_string(csv_buffer.getvalue(), content_type = 'text/csv')
    print('Write complete.')

Write canceled.
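
The exists-then-upload pattern above leaves a small window in which another writer could create the object between the check and the upload. One possible alternative, sketched here under assumptions, is to pass `if_generation_match=0` to `upload_from_string`, which asks Cloud Storage to reject the write if the object already exists. The helper name `upload_csv_if_absent` is hypothetical and not part of the original notebook.

```python
def upload_csv_if_absent(blob, csv_text):
    """Upload csv_text only when no object exists under the blob's name.

    `blob` is expected to be a google.cloud.storage Blob (or anything with
    a compatible upload_from_string method). The precondition makes the
    existence check and the write a single request.
    """
    blob.upload_from_string(
        csv_text,
        content_type='text/csv',
        if_generation_match=0,  # 0 means "only if the object does not exist"
    )
```

When the object already exists, the request fails with HTTP 412, which the client library surfaces as `google.api_core.exceptions.PreconditionFailed`. Catching that exception would be the place to prompt for an overwrite or cancel the write.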


Experimented with reads/writes at https://console.cloud.google.com/storage/browser/python_read_write_testing?inv=1&invt=Ab0Ymw&project=user-insights-scores

Key findings:

1. As owner of the project, I can read, write, and overwrite objects without any additional roles.
2. For my personal Gmail account, I need several specific roles:
    1. I need a project-level role (Service Usage Consumer) to use project resources
    2. I need a bucket-level role (Storage Object Viewer) to read objects
    3. I need a bucket-level role (Storage Object Creator) to write new objects
    4. I need a bucket-level role (Storage Object Admin) to overwrite existing objects. This role includes everything the Viewer and Creator roles grant, so those two can be dropped as duplicates.

These findings should inform the IAM scheme I use to share my projects within Gamebeast. Perhaps a certain group gets read/write access, while I maintain read/write/overwrite access?
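
The findings above can be turned into a small lookup for reasoning about such a scheme. This is a simplified sketch, not an authoritative model of Cloud Storage IAM: the permission sets list only the object permissions relevant to these tests, the action names and the `allowed_actions` helper are hypothetical, and the project-level Service Usage Consumer role is left out.

```python
# Simplified subset of the object permissions each predefined role grants.
ROLE_PERMISSIONS = {
    'roles/storage.objectViewer': {
        'storage.objects.get',
        'storage.objects.list',
    },
    'roles/storage.objectCreator': {
        'storage.objects.create',
    },
    'roles/storage.objectAdmin': {
        'storage.objects.get',
        'storage.objects.list',
        'storage.objects.create',
        'storage.objects.delete',
    },
}

# Permissions each notebook action needs. Overwriting replaces an existing
# object, so it needs delete in addition to create.
REQUIRED_PERMISSIONS = {
    'read': {'storage.objects.get'},
    'write_new': {'storage.objects.create'},
    'overwrite': {'storage.objects.create', 'storage.objects.delete'},
}


def allowed_actions(roles):
    """Return the set of actions permitted by the given bucket-level roles."""
    granted = set().union(*(ROLE_PERMISSIONS.get(role, set()) for role in roles))
    return {
        action
        for action, needed in REQUIRED_PERMISSIONS.items()
        if needed <= granted
    }


# Hypothetical split: a shared group reads and writes, one admin also overwrites.
group_roles = ['roles/storage.objectViewer', 'roles/storage.objectCreator']
admin_roles = ['roles/storage.objectAdmin']

print(sorted(allowed_actions(group_roles)))  # ['read', 'write_new']
print(sorted(allowed_actions(admin_roles)))  # ['overwrite', 'read', 'write_new']
```

Under this model, a group holding Viewer plus Creator can read and add objects but cannot replace existing ones, which matches the read/write versus read/write/overwrite split proposed above.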