# Upload Data Set to Google Cloud Storage Bucket Using Access Key

**Link To Data**: https://data.cityofnewyork.us/City-Government/NYC-Citywide-Annualized-Calendar-Sales-Update/w2pb-icbu

**API Endpoint**: https://data.cityofnewyork.us/resource/w2pb-icbu.json

**Data Dictionary**: https://data.cityofnewyork.us/api/views/w2pb-icbu/files/8ed811b4-8238-4b5e-9acc-1e33d8705498?download=true&filename=Annualized_Calendar_Sales_Update%20Data_Dictionary.xlsx

**Cleaned Data Dictionary**: https://docs.google.com/spreadsheets/d/17XyGmnw2fZuTMCWVKB1XiWGHQuwqWOidm0w80lbIyjE/edit?usp=sharing

**IMPORTANT: This data set is 121.3 MB. After downloading it, keep the .csv file in the same directory as this Jupyter notebook so that the upload code can find it. The BigQuery console does not accept local file uploads larger than 100 MB, so the file must be uploaded to Google Cloud Storage first and then loaded into the data warehouse from there.**
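Before going further, it can help to confirm that the downloaded file really sits next to the notebook and to see how large it is. The sketch below is only an illustration: `local_file_size_mb` is a hypothetical helper name, and it reports size in decimal megabytes (1 MB = 1,000,000 bytes), which is an assumption about how the size above was measured.

```python
from pathlib import Path


def local_file_size_mb(file_name, directory="."):
    """Return the size of file_name in decimal megabytes.

    Raises FileNotFoundError if the file is not in the given directory
    (by default, the notebook's working directory).
    """
    path = Path(directory) / file_name
    if not path.is_file():
        raise FileNotFoundError(
            f"{file_name} not found in {Path(directory).resolve()}"
        )
    # st_size is in bytes; divide by 1,000,000 for decimal megabytes.
    return path.stat().st_size / 1_000_000


# Example call (assumes the CSV was saved next to the notebook):
# local_file_size_mb('NYC_Citywide_Annualized_Calendar_Sales_Update_20231203.csv')
```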

# Creating an Access Key on Google Cloud Storage

1. Open the Google Cloud Console
2. Navigate to IAM & Admin > Service Accounts
3. Find your service account, click the three dots on the right-hand side, and click "Manage Keys"
4. Click "Add Key", then "Create new key"
5. Select "JSON" for Key Type
6. Click the "Create" button

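**NOTE: A .json key file will be downloaded, usually to your Downloads folder. Rename it to "GOOGLE_CLOUD_ACCESSKEY.json" and move it into the same directory as this notebook, because the code below refers to it by a relative path. The file contains a private key, so do not share it or commit it to version control.**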
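As a quick sanity check on the downloaded key, the sketch below confirms that the file exists and looks like a service account key before any Google Cloud call is made. `check_service_account_key` is a hypothetical helper, not part of any Google library, and it assumes the renamed key has been placed next to the notebook. It only inspects a few well-known fields and does not prove that the key is still valid.

```python
import json
from pathlib import Path

# Fields that a service account JSON key is expected to contain.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_key(path="GOOGLE_CLOUD_ACCESSKEY.json"):
    """Validate the basic shape of a key file and return its project_id."""
    key_path = Path(path)
    if not key_path.is_file():
        raise FileNotFoundError(f"Key file not found: {key_path.resolve()}")

    with key_path.open() as fh:
        info = json.load(fh)

    missing = REQUIRED_FIELDS - info.keys()
    if missing:
        raise ValueError(f"Key file is missing fields: {sorted(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"Unexpected key type: {info['type']!r}")

    return info["project_id"]


# Example call (assumes the key sits next to the notebook):
# project_id = check_service_account_key()
```

The returned project ID can be reused wherever a later cell needs the project that owns the service account.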

# Import the Python 'os' Module and Set the Credentials Path

In [None]:
import os
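# Point the Google Cloud client libraries at the service account key
# (the path is relative to the directory that contains this notebook)
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'GOOGLE_CLOUD_ACCESSKEY.json'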

# Create a Google Cloud Storage Bucket 

In [None]:
from google.cloud import storage

# Create a new bucket in the specified project
def create_bucket(bucket_name, project_id):
    storage_client = storage.Client(project=project_id)
    bucket = storage_client.bucket(bucket_name)
    new_bucket = storage_client.create_bucket(bucket, project=project_id)
    print(f"Bucket {new_bucket.name} created")

# Replace this placeholder with your own Google Cloud project ID
project_id = 'your-project-id'

# Replace XX with your initials in lowercase
# (bucket names cannot contain uppercase letters)
bucket_name = 'cis-4400-project-XX'
create_bucket(bucket_name, project_id)
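Bucket creation fails if the name breaks Cloud Storage naming rules, and the `XX` placeholder left unchanged is one such case because uppercase letters are not allowed. The sketch below checks a name locally before calling the API. `is_plausible_bucket_name` is a hypothetical helper that covers only a simplified subset of the rules (length of 3 to 63 characters, allowed characters, alphanumeric first and last character, no `goog` prefix or `google` substring); it does not check global uniqueness or the special rules for names that contain dots.

```python
import re

# 3-63 characters: lowercase letters, digits, dashes, underscores, dots;
# must start and end with a lowercase letter or digit.
_BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name):
    """Return True if name passes a simplified subset of the naming rules."""
    if _BUCKET_NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names may not start with "goog" or contain "google".
    if name.startswith("goog") or "google" in name:
        return False
    return True


print(is_plausible_bucket_name("cis-4400-project-XX"))  # False: uppercase letters
print(is_plausible_bucket_name("cis-4400-project-ab"))  # True
```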

# Upload the Data Set to your new Storage Bucket

In [None]:
from google.cloud import storage

# Upload a local source file to the bucket under the given object name
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_filename(source_file_name)

    print(f"File {source_file_name} uploaded to {destination_blob_name} in bucket {bucket_name}.")

# Use the same bucket name that you created in the previous cell
bucket_name = 'cis-4400-project-XX'
source_file_name = 'NYC_Citywide_Annualized_Calendar_Sales_Update_20231203.csv'
destination_blob_name = 'NYC_sales.csv'

# Upload the CSV file to the bucket
upload_blob(bucket_name, source_file_name, destination_blob_name)
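After the upload finishes, one simple way to confirm that the whole file arrived is to compare the object's size in the bucket with the local file's size. The sketch below is an illustration under assumptions: `verify_upload` is a hypothetical helper, and it expects a bucket object such as the one returned by `storage.Client().bucket(bucket_name)`. It relies on `Bucket.get_blob`, which returns `None` when the object does not exist, and on the blob's `size` attribute in bytes. A matching size is a basic check, not a checksum comparison.

```python
import os


def verify_upload(bucket, blob_name, local_path):
    """Return True if blob_name exists in bucket and matches the local size."""
    blob = bucket.get_blob(blob_name)  # None if the object is not in the bucket
    if blob is None:
        return False
    # blob.size is the object's size in bytes as reported by Cloud Storage.
    return blob.size == os.path.getsize(local_path)


# Example call (requires google-cloud-storage and valid credentials):
# from google.cloud import storage
# bucket = storage.Client().bucket(bucket_name)
# verify_upload(bucket, destination_blob_name, source_file_name)
```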