# Upload Data Set to Google Cloud Storage Bucket Using Access Key

**Link To Data**: https://data.cityofnewyork.us/City-Government/NYC-Citywide-Annualized-Calendar-Sales-Update/w2pb-icbu

**API endpoint**: https://data.cityofnewyork.us/resource/w2pb-icbu.json

**Data Dictionary**: https://data.cityofnewyork.us/api/views/w2pb-icbu/files/8ed811b4-8238-4b5e-9acc-1e33d8705498?download=true&filename=Annualized_Calendar_Sales_Update%20Data_Dictionary.xlsx

**Cleaned Data Dictionary**: https://docs.google.com/spreadsheets/d/17XyGmnw2fZuTMCWVKB1XiWGHQuwqWOidm0w80lbIyjE/edit?usp=sharing

**IMPORTANT: This data set is 121.3 MB. Once downloaded, keep the .csv file in the same directory as this Jupyter notebook so that it can be uploaded to Google Cloud Storage correctly. BigQuery does not accept local file uploads of this size, so the file must be uploaded to Cloud Storage first before moving on to the data warehouse.**
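Before going further, it may help to confirm that the download actually landed next to the notebook. The snippet below is a minimal sketch: `local_file_size_mb` is a hypothetical helper (not part of any Google library), and the file name is assumed to be the default name used later in this notebook.

```python
from pathlib import Path


def local_file_size_mb(path):
    """Return the file size in megabytes, or None if the file is missing."""
    file_path = Path(path)
    if not file_path.is_file():
        return None
    return file_path.stat().st_size / 1_000_000


# Assumed default download name; adjust if your file is named differently.
csv_name = 'NYC_Citywide_Annualized_Calendar_Sales_Update_20231203.csv'
size_mb = local_file_size_mb(csv_name)
if size_mb is None:
    print(f"{csv_name} was not found in the notebook directory.")
else:
    print(f"{csv_name} found ({size_mb:.1f} MB).")
```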

# Grant Required Permissions to Your Google Cloud Service Account

1. Go to the Google Cloud Console
2. Navigate to the "IAM & Admin" section
3. Locate the service account that you want to use
4. Click "Edit principal"
5. Under the "Role" field, select "Storage Admin" from the drop-down menu

**Note: The "Storage Admin" role grants full control of buckets and objects. This step is important for the code to be able to create a bucket**

# Creating an Access Key on Google Cloud Storage

1. Go to the Google Cloud Console
2. Navigate to IAM & Admin > Service Accounts
3. Select Service Account, click the three dots on the right-hand side, and click "Manage Keys"
4. Click "Add Key", then "Create new key"
5. Select "JSON" for Key Type
6. Click "Create" button

**NOTE: A .json file will be downloaded to your Downloads folder. Rename it to "GOOGLE_CLOUD_ACCESSKEY.json" and move it into the same directory as this Jupyter notebook, because the code below refers to the key by that file name.**
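A quick sanity check on the key file can catch a wrong or misplaced download early. The sketch below assumes the key has been renamed and placed next to the notebook; `check_service_account_key` and `REQUIRED_FIELDS` are hypothetical names introduced for illustration. It only looks for a few fields that service account key files contain and never prints the private key.

```python
import json
from pathlib import Path

# A few fields found in service account key files (not an exhaustive list).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems with the key file (empty if it looks usable)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"{path} was not found"]
    try:
        key = json.loads(key_path.read_text())
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]
    if not isinstance(key, dict):
        return [f"{path} does not contain a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


problems = check_service_account_key("GOOGLE_CLOUD_ACCESSKEY.json")
print("Key file looks usable." if not problems else problems)
```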

# Install the google-cloud-storage library

In [None]:
%pip install google-cloud-storage

# Import the Python 'os' module and Set the Credentials Path

In [None]:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'GOOGLE_CLOUD_ACCESSKEY.json'

# Create a Google Cloud Storage Bucket

**Note: Google Cloud Storage bucket names must follow these rules:**
1. Be between 3 and 63 characters in length
2. Start and end with a lowercase letter or number
3. Contain only lowercase letters, numbers, and hyphens (-); underscores and dots are also permitted, but are not needed here
4. Be globally unique across all of Google Cloud Storage
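The naming rules above can be checked locally before calling the API. This is a minimal sketch that covers only length, allowed characters (restricted here to lowercase letters, digits, and hyphens), and the first and last character; `is_valid_bucket_name` is a hypothetical helper, and it does not check every rule that Cloud Storage enforces.

```python
import re

# First and last character: lowercase letter or digit.
# Middle: 1 to 61 lowercase letters, digits, or hyphens (3 to 63 characters total).
_BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if the name satisfies the rules listed above."""
    return _BUCKET_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_bucket_name("cis-4400-project-xx"))
print(is_valid_bucket_name("-CIS-4400"))
```

The first call passes and the second fails, since the second name starts with a hyphen and contains uppercase letters.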

In [None]:
from google.cloud import storage

# Create a function that creates a new bucket in specified project id:
def create_bucket(bucket_name, project_id):
    storage_client = storage.Client(project=project_id)
    
    # Create a new bucket
    bucket = storage_client.create_bucket(bucket_name)
    
    print(f"Bucket {bucket.name} created")

# Replace 'your-project-id' with your actual Google Cloud project ID:
project_id = 'your-project-id'

# Create a new bucket and replace xx with your initials:
# (Please note the bucket naming guidelines above)
bucket_name = 'cis-4400-project-xx'
create_bucket(bucket_name, project_id)
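Re-running the cell above after the bucket already exists raises an error, because the name is taken. One way around that, sketched below, is to look the bucket up first and create it only when the lookup returns nothing. `get_or_create_bucket` is a hypothetical helper; it relies on the client's `lookup_bucket` method, which returns `None` for a bucket that does not exist. If the name belongs to someone else's project, the lookup may raise a permission error instead, which this sketch does not handle.

```python
def get_or_create_bucket(storage_client, bucket_name):
    """Return the named bucket, creating it only if the lookup finds nothing."""
    bucket = storage_client.lookup_bucket(bucket_name)
    if bucket is None:
        bucket = storage_client.create_bucket(bucket_name)
        print(f"Bucket {bucket.name} created")
    else:
        print(f"Bucket {bucket.name} already exists")
    return bucket


# Usage (requires valid credentials and the google-cloud-storage library):
# from google.cloud import storage
# bucket = get_or_create_bucket(storage.Client(project=project_id), bucket_name)
```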


# Upload the Data Set to your new Storage Bucket

In [None]:
from google.cloud import storage

# Create a function that uploads source file to the bucket:
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    # Creates a storage client
    storage_client = storage.Client()
    # Specifies the bucket by name
    bucket = storage_client.bucket(bucket_name)
    # Specifies the destination blob within the bucket
    blob = bucket.blob(destination_blob_name)
    # Uploads the file to the specified blob
    blob.upload_from_filename(source_file_name)

    print(f"File {source_file_name} uploaded as {destination_blob_name} in bucket: {bucket_name}.")
    
# Make sure that the source file is in the same directory as this notebook
# Replace xx with your actual initials
bucket_name = 'cis-4400-project-xx'
# Default name of the file downloaded from the "Link To Data" URL at the top of this notebook
source_file_name = 'NYC_Citywide_Annualized_Calendar_Sales_Update_20231203.csv'
# Specify the desired name for the file in the bucket
destination_blob_name = 'NYC_sales.csv'

# Upload the CSV file to the bucket
upload_blob(bucket_name, source_file_name, destination_blob_name)
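After the upload finishes, a simple check is to compare the size of the object in the bucket with the size of the local file. The sketch below assumes a `bucket` object obtained from `storage_client.bucket(bucket_name)`; `upload_matches_local` is a hypothetical helper. It uses `get_blob`, which returns `None` when the object does not exist, and compares sizes only rather than verifying a checksum.

```python
import os


def upload_matches_local(bucket, blob_name, local_path):
    """Return True if the blob exists and has the same byte size as the local file."""
    blob = bucket.get_blob(blob_name)  # None when the object is not in the bucket
    if blob is None:
        return False
    return blob.size == os.path.getsize(local_path)


# Usage (requires valid credentials and the google-cloud-storage library):
# from google.cloud import storage
# bucket = storage.Client().bucket(bucket_name)
# print(upload_matches_local(bucket, destination_blob_name, source_file_name))
```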