## Introduction to the Preliminary Global Extract, Transform, Load (ETL) Process

This notebook starts the Extract, Transform, Load (ETL) process by extracting data from a Google Drive provided by the client. The extracted data is the foundation for the subsequent analysis and transformation that lead to the final product specified by the client. The extraction relies on the Google API client libraries installed below and on a project utilities module.

We aim to keep the code concise and simple by minimizing the number of action cells. Where a cell requires interaction or explanation, comments in the code describe what it does.

The code is organized into modular sections, loosely inspired by the Model-View-Controller (MVC) pattern:

1. **Define Data Extraction Function:** This section contains the function that extracts data from the client's drive and transfers it to the lakehouse.

2. **Load Utils File:** A dedicated utilities module (imported as `ut`) provides the helper functions used by the ETL steps, such as converting files to DataFrames and saving them as Parquet.

3. **View Cell:** View cells call these functions and display their output so that the processed data can be inspected.

We call this structure the Library-Action-View (LAV) pattern: libraries are loaded, actions are defined as functions, and view cells run them and display the results.

### Install necessary packages that are not found by default

In [None]:
pip install gdown google-api-python-client google-auth google-auth-oauthlib google-auth-httplib2

Collecting gdown
  Downloading gdown-5.1.0-py3-none-any.whl (17 kB)
Collecting google-api-python-client
  Downloading google_api_python_client-2.116.0-py2.py3-none-any.whl (12.0 MB)
Collecting google-auth-httplib2
  Downloading google_auth_httplib2-0.2.0-py2.py3-none-any.whl (9.3 kB)
Collecting httplib2<1.dev0,>=0.15.0 (from google-api-python-client)
  Downloading httplib2-0.22.0-py3-none-any.whl (96 kB)
Collecting google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0.dev0,>=1.31.5 (from google-api-python-client)
  Downloading google_api_core-2.16.2-py3-none-any.whl (135 kB)
Collecting uritemplate<5,>=3.0.1 (from google-api-python-client)
  Downloading uritemplate-4.1.1-py2.py3-none-any.whl (10 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0.dev0,>=1.31.5->google-api-python-client)
  Downloading googleapis_common_protos-1.62.0-py2.py3-none-any.whl (228 kB)
Installing collected packages: uritemplate, httplib2, googleapis-common-protos, google-auth-httplib2, google-api-core, gdown, google-api-python-client
Successfully installed gdown-5.1.0 google-api-core-2.16.2 google-api-python-client-2.116.0 google-auth-httplib2-0.2.0 googleapis-common-protos-1.62.0 httplib2-0.22.0 uritemplate-4.1.1
Note: you may need to restart the kernel to use updated packages.


### Import the libraries needed to extract the data

In [1]:
import os
import io
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from googleapiclient.errors import HttpError
from google.oauth2 import service_account
from google.auth.transport.requests import Request
import builtin.utils as ut

## 1. Extract data

We define a function that recursively downloads the client-provided data from a Google Drive folder, including its subfolders, to a local path.

In [None]:
def extract_datasets(folder_id, destination_path, credentials_path):
    """
    Downloads all files and folders from a specified Google Drive folder to a local destination.

    Args:
        folder_id (str): The ID of the Google Drive folder to download.
        destination_path (str): The local directory path where files and folders will be saved.
        credentials_path (str): The path to the JSON file containing Google Drive API credentials.
    """

    # Google Drive API scopes
    SCOPES = ['https://www.googleapis.com/auth/drive.readonly']

    # Create the destination directory if it does not already exist
    os.makedirs(destination_path, exist_ok=True)

    # Load the service-account credentials; fail early with a clear message if the key file is missing
    if not os.path.exists(credentials_path):
        raise FileNotFoundError(f"Credentials file not found: {credentials_path}")
    # No manual refresh is needed: the API client obtains and renews the access token on demand
    creds = service_account.Credentials.from_service_account_file(
        credentials_path, scopes=SCOPES)

    # Build Google Drive API service
    drive_service = build('drive', 'v3', credentials=creds)

    def download_file(file_id, file_name, parent_path):
        destination_file_path = os.path.join(parent_path, file_name)

        # Create request to get file media
        request = drive_service.files().get_media(fileId=file_id)

        # Open the target file in a context manager so the handle is closed even if a chunk fails
        with io.FileIO(destination_file_path, 'wb') as fh:
            downloader = MediaIoBaseDownload(fh, request)

            # Download file in chunks
            done = False
            while not done:
                status, done = downloader.next_chunk()
                print(f"Download {file_name}: {int(status.progress() * 100)}%")

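    def download_folder(folder_id, parent_path):
        # List the folder's children page by page; one call returns only a single page of results
        items = []
        page_token = None
        while True:
            results = drive_service.files().list(
                q=f"'{folder_id}' in parents",
                fields="nextPageToken, files(id, name, mimeType)",
                pageToken=page_token).execute()
            items.extend(results.get('files', []))
            page_token = results.get('nextPageToken')
            if page_token is None:
                break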

        for item in items:
            item_id = item['id']
            item_name = item['name']
            item_mime_type = item['mimeType']

            item_path = os.path.join(parent_path, item_name)

            if item_mime_type == 'application/vnd.google-apps.folder':
                # It's a folder, create the local directory and download its content
                os.makedirs(item_path, exist_ok=True)
                download_folder(item_id, item_path)
            else:
                # It's a file, download it
                download_file(item_id, item_name, parent_path)

    # Call the initial function to download from the specified folder
    download_folder(folder_id, destination_path)
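
The folder check in `download_folder` treats every non-folder item as a regular binary file. Google-native documents (Docs, Sheets, Slides) share the `application/vnd.google-apps.` MIME-type prefix and cannot be fetched with `get_media`; the Drive API expects `files().export` for them. The folders used in this notebook appear to contain only regular files, so this case did not arise here. The following is a minimal sketch of how items could be classified before downloading. The helper `classify_items` and the sample metadata are hypothetical and not part of the original notebook.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"
GOOGLE_NATIVE_PREFIX = "application/vnd.google-apps."


def classify_items(items):
    """Split Drive file metadata into folders, downloadable files, and Google-native files.

    `items` is a list of dicts with 'id', 'name', and 'mimeType' keys, the same
    shape requested from files().list in the function above.
    """
    folders, downloadable, native = [], [], []
    for item in items:
        mime = item.get("mimeType", "")
        if mime == FOLDER_MIME:
            folders.append(item)
        elif mime.startswith(GOOGLE_NATIVE_PREFIX):
            # Docs, Sheets, etc. have no binary content; they need files().export
            native.append(item)
        else:
            downloadable.append(item)
    return folders, downloadable, native


# Illustrative metadata only; these IDs and names are made up
sample = [
    {"id": "a1", "name": "review-Texas", "mimeType": FOLDER_MIME},
    {"id": "b2", "name": "1.json", "mimeType": "application/json"},
    {"id": "c3", "name": "notes", "mimeType": "application/vnd.google-apps.document"},
]
folders, downloadable, native = classify_items(sample)
print([f["name"] for f in folders])
print([d["name"] for d in downloadable])
print([n["name"] for n in native])
```

With a split like this, the recursive function could skip or log the Google-native items instead of failing on them.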


## 1.1 Extract data from Yelp folder

In [None]:
folder_id = '1TI-SsMnZsNP6t930olEEWbBQdo_yuIZF'
destination_path = '/lakehouse/default/Files/original/Yelp/'
credentials_path = '/lakehouse/default/Files/gkey/credentials.json'

# Call the function with the folder ID, destination folder, and credentials JSON file path
extract_datasets(folder_id, destination_path, credentials_path)

Download business.pkl: 90%
Download business.pkl: 100%


Download user.parquet: 3%
Download user.parquet: 7%


Download user.parquet: 10%


Download user.parquet: 14%
Download user.parquet: 17%


Download user.parquet: 21%
Download user.parquet: 24%


Download user.parquet: 28%
Download user.parquet: 31%


Download user.parquet: 35%
Download user.parquet: 39%


Download user.parquet: 42%
Download user.parquet: 46%


Download user.parquet: 49%
Download user.parquet: 53%


Download user.parquet: 56%
Download user.parquet: 60%


Download user.parquet: 63%
Download user.parquet: 67%


Download user.parquet: 71%
Download user.parquet: 74%


Download user.parquet: 78%
Download user.parquet: 81%
Download user.parquet: 85%


Download user.parquet: 88%
Download user.parquet: 92%


Download user.parquet: 95%
Download user.parquet: 99%
Download user.parquet: 100%


Download review.json: 1%
Download review.json: 3%


Download review.json: 5%
Download review.json: 7%


Download review.json: 9%


Download review.json: 11%
Download review.json: 13%


Download review.json: 15%
Download review.json: 17%


Download review.json: 19%
Download review.json: 21%


Download review.json: 23%
Download review.json: 25%


Download review.json: 27%
Download review.json: 29%
Download review.json: 31%


Download review.json: 33%
Download review.json: 35%
Download review.json: 37%


Download review.json: 39%
Download review.json: 41%


Download review.json: 43%
Download review.json: 45%


Download review.json: 47%
Download review.json: 49%
Download review.json: 51%


Download review.json: 52%
Download review.json: 54%


Download review.json: 56%
Download review.json: 58%


Download review.json: 60%
Download review.json: 62%
Download review.json: 64%


Download review.json: 66%
Download review.json: 68%


Download review.json: 70%
Download review.json: 72%
Download review.json: 74%


Download review.json: 76%
Download review.json: 78%


Download review.json: 80%
Download review.json: 82%


Download review.json: 84%
Download review.json: 86%
Download review.json: 88%


Download review.json: 90%
Download review.json: 92%


Download review.json: 94%
Download review.json: 96%


Download review.json: 98%
Download review.json: 100%


Download tip.json: 58%


Download tip.json: 100%
Download checkin.json: 36%


Download checkin.json: 73%
Download checkin.json: 100%


## 1.2 Extract data from 'metadata-sitios' folder

In [None]:
folder_id = '1olnuKLjT8W2QnCUUwh8uDuTTKVZyxQ0Z'
destination_path = '/lakehouse/default/Files/original/metadata-sitios/'
credentials_path = '/lakehouse/default/Files/gkey/credentials.json'

# Call the function with the folder ID, destination folder, and credentials JSON file path
extract_datasets(folder_id, destination_path, credentials_path)

Download 11.json: 36%


Download 11.json: 72%
Download 11.json: 100%


Download 10.json: 37%


Download 10.json: 74%


Download 10.json: 100%


Download 9.json: 37%


Download 9.json: 75%
Download 9.json: 100%


Download 8.json: 36%


Download 8.json: 73%


Download 8.json: 100%


Download 7.json: 38%


Download 7.json: 76%
Download 7.json: 100%


Download 6.json: 39%


Download 6.json: 78%
Download 6.json: 100%


Download 5.json: 39%


Download 5.json: 79%
Download 5.json: 100%


Download 4.json: 40%


Download 4.json: 80%


Download 4.json: 100%


Download 3.json: 40%


Download 3.json: 80%
Download 3.json: 100%


Download 2.json: 40%


Download 2.json: 81%


Download 2.json: 100%


Download 1.json: 40%


Download 1.json: 81%


Download 1.json: 100%


## 1.3 Extract data from 'review-estados' folder

In [None]:
folder_id = '19QNXr_BcqekFNFNYlKd0kcTXJ0Zg7lI6'
destination_path = '/lakehouse/default/Files/original/reviews-estados/'
credentials_path = '/lakehouse/default/Files/gkey/credentials.json'

# Call the function with the folder ID, destination folder, and credentials JSON file path
extract_datasets(folder_id, destination_path, credentials_path)

Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 16.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 16.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 18.json: 100%


Download 17.json: 100%


Download 16.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 16.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%
Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 6.json: 100%


Download 7.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 13.json: 100%


Download 14.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%
Download 2.json: 100%


Download 1.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 16.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 18.json: 100%


Download 17.json: 100%


Download 16.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 10.json: 100%


Download 11.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 6.json: 100%
Download 7.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 16.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 19.json: 100%


Download 18.json: 100%


Download 17.json: 100%


Download 16.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 16.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%
Download 12.json: 100%


Download 11.json: 100%
Download 10.json: 100%


Download 9.json: 100%
Download 8.json: 100%


Download 7.json: 100%
Download 6.json: 100%
Download 5.json: 100%


Download 4.json: 100%
Download 3.json: 100%
Download 2.json: 100%


Download 1.json: 100%
Download 18.json: 100%


Download 17.json: 100%


Download 16.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 18.json: 100%


Download 17.json: 100%


Download 16.json: 100%


Download 15.json: 100%


Download 14.json: 100%


Download 13.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


Download 12.json: 100%


Download 11.json: 100%


Download 10.json: 100%


Download 9.json: 100%


Download 8.json: 100%


Download 7.json: 100%


Download 6.json: 100%


Download 5.json: 100%


Download 4.json: 100%


Download 3.json: 100%


Download 2.json: 100%


Download 1.json: 100%


## 2. Views

## 2.1. Data Acquisition

In this section, we convert the data from its original formats (.json, .pkl, or .parquet) into pandas DataFrames, which makes it easier to inspect, clean, and transform the data, and then save the results as Parquet.

The steps applied are:

* Conversion from .json, .pkl, or .parquet to DataFrame

* Resetting of indices

* Casting of columns to their appropriate data types
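
The steps listed above can be sketched as follows. The implementation of `ut.json_to_dataframe` is not shown in this notebook, so this is not that function. The sketch assumes the `.json` files are in JSON Lines format (one object per line), and the helper names `read_json_lines` and `folder_to_dataframe` are hypothetical.

```python
import json
from pathlib import Path


def read_json_lines(path):
    """Read a JSON Lines file (one JSON object per line) into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def folder_to_dataframe(folder, dtypes=None):
    """Combine every .json file in `folder` into one DataFrame."""
    import pandas as pd  # imported lazily so read_json_lines stays usable without pandas

    frames = [pd.DataFrame(read_json_lines(p)) for p in sorted(Path(folder).glob("*.json"))]
    if not frames:
        return pd.DataFrame()
    # Each per-file frame starts its index at 0, so reset it after concatenating
    df = pd.concat(frames).reset_index(drop=True)
    if dtypes:
        # Mapping of column name to dtype; the caller supplies names that exist in the data
        df = df.astype(dtypes)
    return df
```

Resetting the index after `pd.concat` avoids duplicate index labels across files. Passing `dtypes` is optional so that the casting step can be deferred until the columns have been inspected.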

## 2.1.1 Data Acquisition: Google Maps -- Review Estados

In [2]:
dicc = ut.json_to_dataframe('/lakehouse/default/Files/original/reviews-estados')
ut.dataframe_to_parquet(dicc,'Review_estados_parquet')

Data from review-Washington successfully loaded


Data from review-Michigan successfully loaded


Data from review-Alabama successfully loaded


Data from review-Kansas successfully loaded


Data from review-Nebraska successfully loaded


Data from review-Mississippi successfully loaded


Data from review-North_Dakota successfully loaded


Data from review-Delaware successfully loaded


Data from review-Georgia successfully loaded


Data from review-Colorado successfully loaded


Data from review-Montana successfully loaded


Data from review-South_Carolina successfully loaded


Data from review-West_Virginia successfully loaded


Data from review-South_Dakota successfully loaded


Data from review-Massachusetts successfully loaded


Data from review-Louisiana successfully loaded


Data from review-Wisconsin successfully loaded


Data from review-Tennessee successfully loaded


Data from review-New_York successfully loaded


Data from review-Maryland successfully loaded


Data from review-Arkansas successfully loaded


Data from review-Illinois successfully loaded


Data from review-Alaska successfully loaded


Data from review-Nevada successfully loaded


Data from review-Hawaii successfully loaded


Data from review-Minnesota successfully loaded


Data from review-Pennsylvania successfully loaded


Data from review-Arizona successfully loaded


Data from review-Missouri successfully loaded


Data from review-Wyoming successfully loaded


Data from review-District_of_Columbia successfully loaded


Data from review-Utah successfully loaded


Data from review-New_Mexico successfully loaded


Data from review-Texas successfully loaded


Data from review-Florida successfully loaded


Data from review-Ohio successfully loaded


Data from review-Oregon successfully loaded


Data from review-Vermont successfully loaded


Data from review-Kentucky successfully loaded


Data from review-Connecticut successfully loaded


Data from review-Maine successfully loaded


Data from review-North_Carolina successfully loaded


Data from review-Oklahoma successfully loaded


Data from review-Idaho successfully loaded


Data from review-New_Jersey successfully loaded


Data from review-Iowa successfully loaded


Data from review-New_Hampshire successfully loaded


Data from review-Rhode_Island successfully loaded


Data from review-Virginia successfully loaded


Data from review-California successfully loaded


Data from review-Indiana successfully loaded


Dataframes saved successfully in df_database/Review_estados_parquet



## 2.1.2 Data Acquisition: Google Maps -- Sitios

In [3]:
dicc = ut.json_to_dataframe('/lakehouse/default/Files/original/metadata-sitios')
ut.dataframe_to_parquet(dicc,'Metadata_sitios_parquet')

Loading File JSON = 1.json


Loading File JSON = 10.json


Loading File JSON = 11.json


Loading File JSON = 2.json


Loading File JSON = 3.json


Loading File JSON = 4.json


Loading File JSON = 5.json


Loading File JSON = 6.json


Loading File JSON = 7.json


Loading File JSON = 8.json


Loading File JSON = 9.json


Dataframes saved successfully in df_database/Metadata_sitios_parquet


## 2.1.3 Data Acquisition: Yelp

In [4]:
dicc = ut.others_to_dataframe('/lakehouse/default/Files/original/Yelp')
ut.dataframe_to_parquet(dicc,'Yelp_parquet')

Loading File Pickle = business.pkl
Loading File JSON = checkin.json


Loading File JSON = review.json


Loading File JSON = tip.json


loading File Parquet = user


Dataframes saved successfully in df_database/Yelp_parquet
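
The Yelp folder mixes three file formats (pickle, JSON, and Parquet), so a loader has to choose a reader per file. How `ut.others_to_dataframe` does this is not shown in the notebook. The sketch below is one possible approach, dispatching on the file extension to the corresponding pandas reader. The names `READERS`, `reader_name_for`, and `load_any` are hypothetical, and `notes.txt` is a made-up example of an unsupported file.

```python
from pathlib import Path

# Map file extensions to the name of the pandas reader that handles them
READERS = {
    ".pkl": "read_pickle",
    ".json": "read_json",
    ".parquet": "read_parquet",
}


def reader_name_for(path):
    """Return the pandas reader name for `path`, or None if the extension is unsupported."""
    return READERS.get(Path(path).suffix.lower())


def load_any(path, **kwargs):
    """Load one file into a DataFrame using the reader chosen by its extension."""
    import pandas as pd  # lazy import: reader_name_for itself needs only the standard library

    name = reader_name_for(path)
    if name is None:
        raise ValueError(f"Unsupported file type: {path}")
    return getattr(pd, name)(path, **kwargs)


for fname in ["business.pkl", "review.json", "user.parquet", "notes.txt"]:
    print(fname, "->", reader_name_for(fname))
```

Extra keyword arguments are forwarded to the reader, so a JSON Lines file could be loaded with `load_any(path, lines=True)`. Note that `read_pickle` should only be used on files from a trusted source, and `read_parquet` requires a Parquet engine such as pyarrow to be installed.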
