# Explore Raw Data

* In this notebook we explore the data loaded into the BRONZE layer.
* We clean the data, keep only the columns we require, and move the result to the SILVER layer.
* We create BigQuery remote embedding and generative models for data enrichment using the GenAI functions in BigQuery.


<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/smvinodkumar910/market-mirror/blob/main/backend/02_explore_data.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2Fsmvinodkumar910%2Fmarket-mirror%2Frefs%2Fheads%2Fmain%2Fbackend%2F02_explore_data.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/smvinodkumar910/market-mirror/refs/heads/main/backend/02_explore_data.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>    
  <td style="text-align: center">
    <a href="https://github.com/smvinodkumar910/market-mirror/blob/main/backend/02_explore_data.ipynb">
      <img width="32px" src="https://www.svgrepo.com/download/475654/github-color.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment.


In [None]:
import sys

if "google.colab" in sys.modules:
    # Support for third party widgets
    from google.colab import auth, output

    auth.authenticate_user()
    output.enable_custom_widget_manager()

### Setting-up Environment

* Change the variables `PROJECT_ID`, `BUCKET_NAME`, and `LOCATION` to match your own project.

In [1]:
import os

PROJECT_ID = "market-mirror-dev"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
BUCKET_NAME = "marke-mirror-dev-data"  # @param {type: "string", placeholder: "[your-bucket-name]", isTemplate: true}
LOCATION = "US"  # @param {type: "string", placeholder: "[your-region]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

if not LOCATION or LOCATION == "[your-region]":
    LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "US")


In [2]:
os.environ['GOOGLE_CLOUD_PROJECT'] = PROJECT_ID
os.environ['GOOGLE_CLOUD_REGION'] = LOCATION

In [3]:
BQ_BRONZE_DATASET = "APP_MARKET_BRONZE" # @param {type: "string", placeholder: "[bronze-dataset]", isTemplate: true}
BQ_SILVER_DATASET = "APP_MARKET_SILVER" # @param {type: "string", placeholder: "[silver-dataset]", isTemplate: true}
BQ_GOLD_DATASET = "APP_MARKET_GOLD" # @param {type: "string", placeholder: "[gold-dataset]", isTemplate: true}
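
The cells that follow repeat the same f-string pattern to build fully qualified table names. A small helper can keep that pattern in one place. This is only a sketch: `table_id` is a hypothetical name that does not appear in the notebook, and the example values simply mirror the configuration set above.

```python
# Hypothetical helper (not part of the original notebook): builds the
# `project.dataset.table` identifiers used throughout the later cells.
def table_id(project: str, dataset: str, table: str) -> str:
    """Return a fully qualified BigQuery table ID."""
    parts = (project, dataset, table)
    if not all(parts):
        raise ValueError(f"project, dataset and table must all be non-empty: {parts}")
    return ".".join(parts)


# Example values copied from the configuration cells above.
bronze_reviews = table_id("market-mirror-dev", "APP_MARKET_BRONZE", "google_play_reviews")
print(bronze_reviews)
```

Failing early on an empty part avoids sending a malformed identifier such as `.APP_MARKET_BRONZE.google_play_reviews` to BigQuery when a variable was left unset.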

### Objective

#### Data Definitions



**We have loaded 6 tables into BigQuery, as follows:**

**Review Tables:**

1. `{PROJECT_ID}.{BQ_BRONZE_DATASET}.google_play_reviews`

    * This table contains user reviews of various apps from the Google Play Store.

    * No. of columns : 9
    * No. of Records : 4888

2. `{PROJECT_ID}.{BQ_BRONZE_DATASET}.googleplaystore_user_reviews`

    * This table contains user reviews of various apps in the Google Play Store, with sentiment information.

    * No. of columns : 5
    * No. of Records : 64295


**Product Information Tables:**

3. `{PROJECT_ID}.{BQ_BRONZE_DATASET}.cleanapp`

    * This table holds app information from the Google Play Store, including category, genre, app ratings, number of reviews, and number of downloads.

    * No. of columns : 29
    * No. of Records : 11593

4. `{PROJECT_ID}.{BQ_BRONZE_DATASET}.AppleStore`

    * This table holds app information from the Apple Store, including category, genre, app ratings, size, and price.

    * No. of columns : 17
    * No. of Records : 7197

5. `{PROJECT_ID}.{BQ_BRONZE_DATASET}.appleStore_description`

    * This table holds detailed descriptions, in various languages, of the apps available in the Apple Store.

    * No. of columns : 5
    * No. of Records : 7197

6. `{PROJECT_ID}.{BQ_BRONZE_DATASET}.windows_store`

    * This table holds detailed descriptions of the apps available in the Windows Store.

    * No. of columns : 9
    * No. of Records : 3960


#### Data Engineering

**Review Tables**

* We have two review tables, `google_play_reviews` and `googleplaystore_user_reviews`, in the BRONZE layer.

* We explore and clean the data, keeping only specific columns.

* We union both tables into a single table and write it to the SILVER layer.

* We then use BigQuery GenAI capabilities to generate sentiment for the reviews.

* We also use GenAI capabilities to generate a response to each user.
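
The sentiment step could be expressed with BigQuery's `ML.GENERATE_TEXT` function against the `gemini` remote model created later in this notebook. The sketch below only builds the SQL string. The prompt wording, the helper name `build_sentiment_sql`, and the choice of labels are assumptions made for illustration, not the notebook's actual implementation.

```python
# Sketch only: this builds a query string, nothing is sent to BigQuery here.
# `build_sentiment_sql` and the prompt text are illustrative assumptions.
SENTIMENT_PROMPT = (
    "Classify the sentiment of this app review as Positive, Negative or Neutral. "
    "Answer with one word only. Review: "
)


def build_sentiment_sql(project: str, dataset: str, table: str = "T_APP_REVIEWS") -> str:
    """Return a query that asks the remote `gemini` model for a sentiment label."""
    return f"""
SELECT app_name, review_text, ml_generate_text_llm_result AS generated_sentiment
FROM ML.GENERATE_TEXT(
  MODEL `{project}.{dataset}.gemini`,
  (
    SELECT app_name, review_text, CONCAT('{SENTIMENT_PROMPT}', review_text) AS prompt
    FROM `{project}.{dataset}.{table}`
    WHERE review_text IS NOT NULL AND sentiment IS NULL
  ),
  STRUCT(0.0 AS temperature, TRUE AS flatten_json_output)
)
""".strip()


sentiment_sql = build_sentiment_sql("market-mirror-dev", "APP_MARKET_SILVER")
print(sentiment_sql)
```

Restricting the inner query to rows where `sentiment IS NULL` would limit model calls to the reviews that did not already arrive with a sentiment label.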


**Product Information Tables**

* Combine the `AppleStore` and `appleStore_description` tables into a single table.

* The `appleStore_description` table contains descriptions in various languages. Convert them into a single language.

* Generate embeddings for the app description columns to enable vector search.

* Create the `cleanapp` and `windows_store` tables as separate tables with only the necessary columns.
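
Embedding generation for a description column could look roughly like the following, using `ML.GENERATE_EMBEDDING` with the `embeddings` remote model defined in the next section. This is a sketch under assumptions: the helper name `build_embedding_sql` is hypothetical, and the default table and column names (`T_APPLE_APP_DETAILS`, `app_desc`) are taken from later cells of this notebook. `ML.GENERATE_EMBEDDING` expects the input text in a column named `content`.

```python
# Sketch only: returns a SQL string and does not contact BigQuery.
def build_embedding_sql(
    project: str,
    dataset: str,
    table: str = "T_APPLE_APP_DETAILS",
    text_column: str = "app_desc",
) -> str:
    """Return a query that embeds `text_column` with the remote `embeddings` model."""
    source = f"`{project}.{dataset}.{table}`"
    model = f"`{project}.{dataset}.embeddings`"
    return f"""
SELECT id, content AS {text_column}, ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
  MODEL {model},
  (SELECT id, {text_column} AS content FROM {source} WHERE {text_column} IS NOT NULL),
  STRUCT(TRUE AS flatten_json_output)
)
""".strip()


embedding_sql = build_embedding_sql("market-mirror-dev", "APP_MARKET_SILVER")
print(embedding_sql)
```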

#### Creating Vertex AI Remote Models

* In this section we create a CONNECTION object in BigQuery and then create REMOTE MODELS.

* The command below creates a BigQuery connection to Vertex AI models, named `vertex-remote-models`.

In [4]:
!bq mk --connection --location=$GOOGLE_CLOUD_REGION --project_id=$GOOGLE_CLOUD_PROJECT \
    --connection_type=CLOUD_RESOURCE vertex-remote-models

BigQuery error in mk operation: Already Exists: Connection
projects/468982775008/locations/us/connections/vertex-remote-models
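
The "Already Exists" message above only means the connection was created by an earlier run. Each Cloud resource connection has its own service account, and that is the account that needs the Vertex AI User role mentioned in the error handling tip further down. One way to look it up, assuming the `bq` CLI is available, is `bq show --format=json --connection <project>.<location>.vertex-remote-models`. The sketch below parses that kind of output. The field path `cloudResource.serviceAccountId` is an assumption about the JSON layout that should be checked against your own output, and the sample JSON is made up for illustration.

```python
import json


def connection_service_account(bq_show_json: str) -> str:
    """Extract the connection's service account from `bq show` JSON output."""
    connection = json.loads(bq_show_json)
    account = connection.get("cloudResource", {}).get("serviceAccountId")
    if not account:
        raise ValueError("no cloudResource.serviceAccountId in the connection description")
    return account


def iam_member(service_account: str) -> str:
    """Format the account the way `--member` expects it."""
    return f"serviceAccount:{service_account}"


# Made-up sample resembling the CLI's JSON output (not real project data).
sample = (
    '{"name": "projects/123/locations/us/connections/vertex-remote-models", '
    '"cloudResource": {"serviceAccountId": '
    '"bqcx-123-abcd@gcp-sa-bigquery-condel.iam.gserviceaccount.com"}}'
)
member = iam_member(connection_service_account(sample))
print(member)
```

The resulting string is what would replace `serviceAccount:SERVICE_ACCOUNT_EMAIL` in the `gcloud projects add-iam-policy-binding` command.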


* The steps below create two remote models in BigQuery:

1. A remote model named `embeddings`, using the `text-embedding-005` model available in Vertex AI.
2. A remote model named `gemini`, using the `gemini-2.0-flash` model available in Vertex AI.


In [5]:
create_embed_model = f"""
CREATE OR REPLACE MODEL `{PROJECT_ID}.{BQ_SILVER_DATASET}.embeddings`
REMOTE WITH CONNECTION `us.vertex-remote-models`
OPTIONS (ENDPOINT = 'text-embedding-005');
"""

create_gen_model = f"""
CREATE OR REPLACE MODEL `{PROJECT_ID}.{BQ_SILVER_DATASET}.gemini`
REMOTE WITH CONNECTION `us.vertex-remote-models`
OPTIONS (ENDPOINT = 'gemini-2.0-flash');
"""
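
Outside a notebook, where the `%%bigquery` magic is not available, the two statements could be submitted with the `google-cloud-bigquery` client instead. The helper below is a hypothetical sketch (`run_statements` is not part of the notebook). It relies only on `Client.query(sql)` returning a job whose `result()` blocks until the statement finishes.

```python
from typing import Iterable


def run_statements(client, statements: Iterable[str]) -> int:
    """Run each SQL statement in order and wait for it to finish.

    `client` is expected to behave like `google.cloud.bigquery.Client`.
    Returns the number of statements executed.
    """
    executed = 0
    for sql in statements:
        sql = sql.strip()
        if not sql:
            continue  # skip empty strings
        client.query(sql).result()  # result() blocks until the job completes
        executed += 1
    return executed


# Usage in this notebook (needs credentials, so it is left as a comment):
# from google.cloud import bigquery
# run_statements(bigquery.Client(project=PROJECT_ID), [create_embed_model, create_gen_model])
```

Running the statements sequentially means a failure in the first `CREATE MODEL` surfaces before the second one is attempted.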


In [6]:
# @title Error Handling Tip

'''
If you get a service account privilege error while running the cells below
to create the remote models, run the command below after replacing
`SERVICE_ACCOUNT_EMAIL` with the service account shown in the error.
'''
!gcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/aiplatform.user"


ERROR: Policy modification failed. For a binding with condition, run "gcloud alpha iam policies lint-condition" to identify issues in condition.
ERROR: (gcloud.projects.add-iam-policy-binding) INVALID_ARGUMENT: Invalid service account (SERVICE_ACCOUNT_EMAIL).


In [7]:
%%bigquery
$create_embed_model


In [8]:
%%bigquery
$create_gen_model


#### Review Table Data Processing

* In this section, we:

    1. Read the two review tables in the BRONZE layer.
    2. Clean them and keep only the required columns.
    3. Union both tables.
    4. Write the result to the SILVER layer so that all reviews live in a single table.


In [9]:
import bigframes.pandas as bpd
import bigframes.bigquery as bbq
from bigframes.ml import llm

# Set BigQuery DataFrames options
# Note: The project option is not required in all environments.
# On BigQuery Studio, the project ID is automatically detected.
bpd.options.bigquery.project = PROJECT_ID

# Note: The location option is not required.
# It defaults to the location of the first table or query
# passed to read_gbq(). For APIs where a location can't be
# auto-detected, the location defaults to the "US" location.
bpd.options.bigquery.location = LOCATION

##### Exploring app review table `google_play_reviews`

In [None]:
#read data to dataframe
review_df1 = bpd.read_gbq(f'{PROJECT_ID}.{BQ_BRONZE_DATASET}.google_play_reviews')

In [11]:
review_df1.head()

Unnamed: 0.1,Unnamed: 0,app_name,app_genre,avg_rating,num_downloads,num_reviews,user,review_text,rating
0,2839,Cast to TV/Chromecast/Roku/TV+,,4.3star,5M+,44.6K reviews,AoD Wexler,Used this app for close to 2 years now. For th...,3
1,136,Geometry Dash SubZero,Action,4.3star,50M+,766K reviews,Troy Stonehocker,I love the difficulty because life needs a cha...,4
2,255,Adobe Captivate Prime,Business,3.7star,100K+,763 reviews,Tek Yan Chua,"Unresponsive, slow, buggy and keeps crashing. ...",1
3,2207,Urban Company,,4.8star,10M+,866K reviews,sociallyruby,"It was good when it started, recently however ...",2
4,1567,My Orange Moldova,Tools,3.9star,500K+,8.18K reviews,A Google user,The app doesn't show accurate internet consump...,2


In [12]:
review_df1.columns

Index(['Unnamed: 0', 'app_name', 'app_genre', 'avg_rating', 'num_downloads',
       'num_reviews', 'user', 'review_text', 'rating'],
      dtype='object')

In [None]:
#renaming column 'Unnamed: 0' to 'id'
review_df1 = review_df1.rename(columns={'Unnamed: 0':'id'})

In [None]:
# Keep only the necessary columns
review_df1_subset = review_df1[['id', 'app_name','app_genre','review_text','rating']]

In [17]:
review_df1_subset.head(5)

Unnamed: 0,id,app_name,app_genre,review_text,rating
0,2839,Cast to TV/Chromecast/Roku/TV+,,Used this app for close to 2 years now. For th...,3
1,136,Geometry Dash SubZero,Action,I love the difficulty because life needs a cha...,4
2,255,Adobe Captivate Prime,Business,"Unresponsive, slow, buggy and keeps crashing. ...",1
3,2207,Urban Company,,"It was good when it started, recently however ...",2
4,1567,My Orange Moldova,Tools,The app doesn't show accurate internet consump...,2


In [18]:
review_df1_subset.info()

<class 'bigframes.dataframe.DataFrame'>
Index: 4888 entries, 0 to 4887
Data columns (total 5 columns):
  #  Column       Non-Null Count    Dtype
---  -----------  ----------------  -------
  0  id           4888 non-null     Int64
  1  app_name     4888 non-null     string
  2  app_genre    3018 non-null     string
  3  review_text  4888 non-null     string
  4  rating       4888 non-null     Int64
dtypes: Int64(2), string(3)
memory usage: 234624 bytes


##### Exploring app review table `googleplaystore_user_reviews`

In [19]:
review_df2 = bpd.read_gbq(f'{PROJECT_ID}.{BQ_BRONZE_DATASET}.googleplaystore_user_reviews')

In [20]:
review_df2.head(5)

Unnamed: 0,App,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
0,Hot Wheels: Race Off,Last update screwed man !!! On daily challenge...,Positive,0.266667,0.155556
1,DELISH KITCHEN - 無料レシピ動画で料理を楽しく・簡単に！,Convenient and easy to use!,Positive,0.541667,0.833333
2,EyeCloud,Works pretty good honestly! I Pixel 2XL though...,Positive,0.44,0.62
3,BELONG Beating Cancer Together,It's nice talk others MBC. I live small town s...,Positive,0.046591,0.575
4,"Cymera Camera- Photo Editor, Filter,Collage,La...",Recently reinstalled device I've used problems...,Negative,-0.4875,0.6125


In [21]:
review_df2.columns

Index(['App', 'Translated_Review', 'Sentiment', 'Sentiment_Polarity',
       'Sentiment_Subjectivity'],
      dtype='object')

In [22]:
# Keep only the necessary columns

review_df2_subset = review_df2[['App','Translated_Review','Sentiment']]

In [23]:
review_df2_subset.info()

<class 'bigframes.dataframe.DataFrame'>
Index: 64295 entries, 0 to 64294
Data columns (total 3 columns):
  #  Column             Non-Null Count    Dtype
---  -----------------  ----------------  -------
  0  App                64295 non-null    string
  1  Translated_Review  37427 non-null    string
  2  Sentiment          37432 non-null    string
dtypes: string(3)
memory usage: 2057440 bytes


In [24]:
review_df2_subset = review_df2_subset.rename(columns={'App':'app_name','Translated_Review':'review_text','Sentiment':'sentiment'})

In [25]:
review_df2_subset.head(5)

Unnamed: 0,app_name,review_text,sentiment
0,Hot Wheels: Race Off,Last update screwed man !!! On daily challenge...,Positive
1,DELISH KITCHEN - 無料レシピ動画で料理を楽しく・簡単に！,Convenient and easy to use!,Positive
2,EyeCloud,Works pretty good honestly! I Pixel 2XL though...,Positive
3,BELONG Beating Cancer Together,It's nice talk others MBC. I live small town s...,Positive
4,"Cymera Camera- Photo Editor, Filter,Collage,La...",Recently reinstalled device I've used problems...,Negative


In [None]:
# Concat both the review tables.
review_df = bpd.concat([review_df1_subset,review_df2_subset],axis=0)

In [27]:
review_df.head()

Unnamed: 0,id,app_name,app_genre,review_text,rating,sentiment
0,2839,Cast to TV/Chromecast/Roku/TV+,,Used this app for close to 2 years now. For th...,3,
1,136,Geometry Dash SubZero,Action,I love the difficulty because life needs a cha...,4,
2,255,Adobe Captivate Prime,Business,"Unresponsive, slow, buggy and keeps crashing. ...",1,
3,2207,Urban Company,,"It was good when it started, recently however ...",2,
4,1567,My Orange Moldova,Tools,The app doesn't show accurate internet consump...,2,


In [30]:
review_df.count()

id              4888
app_name       69183
app_genre       3018
review_text    42315
rating          4888
sentiment      37432
dtype: Int64

In [31]:
review_df.isna().sum()

id             64295
app_name           0
app_genre      66165
review_text    26868
rating         64295
sentiment      31751
dtype: Int64
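
The null counts above follow directly from the shapes of the two inputs: a column that exists in only one table is null for every row that came from the other. A quick arithmetic check, using only the numbers printed by the `info()` cells above (the variable names are illustrative):

```python
# Row counts and non-null counts copied from the info() outputs above.
rows_df1, rows_df2 = 4888, 64295
non_null_df1 = {"id": 4888, "app_name": 4888, "app_genre": 3018,
                "review_text": 4888, "rating": 4888}
non_null_df2 = {"app_name": 64295, "review_text": 37427, "sentiment": 37432}

total_rows = rows_df1 + rows_df2
columns = ["id", "app_name", "app_genre", "review_text", "rating", "sentiment"]

# A column missing from one table contributes zero non-null values for that table.
expected_non_null = {c: non_null_df1.get(c, 0) + non_null_df2.get(c, 0) for c in columns}
expected_null = {c: total_rows - n for c, n in expected_non_null.items()}

print(total_rows)
print(expected_non_null)
print(expected_null)
```

The derived figures agree with the `count()` and `isna().sum()` outputs above, which suggests the concatenation neither dropped nor duplicated rows.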

In [None]:
# Writing the table to silver layer
review_df.to_gbq(destination_table=f'{PROJECT_ID}.{BQ_SILVER_DATASET}.T_APP_REVIEWS',if_exists='replace')

'market-mirror-dev.APP_MARKET_SILVER.T_APP_REVIEWS'

#### Product Tables Data Processing

* In this section we read the product description tables from the 3 platforms:
    
    1. Google   - table name: `cleanapp`
    2. Apple    - table names: `AppleStore`, `appleStore_description`
    3. Windows  - table name: `windows_store`

* We clean the data, keep only the required columns, and write the tables to the SILVER layer.

##### Exploring Google Apps table `cleanapp`

In [33]:
google_apps_df = bpd.read_gbq(f'{PROJECT_ID}.{BQ_BRONZE_DATASET}.cleanapp')

In [34]:
google_apps_df.head(5)

Unnamed: 0,_c0,_id,title,description,summary,installs,minInstalls,realInstalls,score,ratings,...,contentRating,contentRatingDescription,adSupported,containsAds,released,updated,version,recentChanges,appId,url
0,4133,63e523d7020a6c8cabe37d92,CBN Family,CBN Family is a free Christian TV streaming ap...,CBN Family is a free Christian TV app broadcas...,"100,000+",100000.0,209982.0,4.74,1093.0,...,1,,0.0,0.0,"Aug 7, 2015",1657747349,20091,This update includes various bug fixes and app...,com.cbn.cbntv.app.android.christian.tv,https://play.google.com/store/apps/details?id=...
1,7579,63e523d7020a6c8cabe38b08,Draw a Stickman: EPIC,Pencil your way into one of the most creative ...,"Draw a stickman, then guide him through a fant...","100,000+",100000.0,232948.0,4.270968,11497.0,...,2,Fantasy Violence,0.0,0.0,"Nov 13, 2012",1429028558,1.4.3.113,- Support large screen devices running at 2560...,com.hitcents.stickmanepic,https://play.google.com/store/apps/details?id=...
2,9464,63e523d7020a6c8cabe39265,Hero Wars,Mutants and Aliens invaded the Earth. All cit...,Realtime Defense game. Control all your heroe...,"10,000+",10000.0,32162.0,4.82,918.0,...,4,Violence,True,True,"Mar 10, 2018",1520766578,1.0.0,,com.naomicsoft.hero_wars01,https://play.google.com/store/apps/details?id=...
3,7769,63e523d7020a6c8cabe38bc6,(re)format Z:,Embark on a quest to become the programmer of ...,Reformat Z: Do you have the skills to make you...,"10,000+",10000.0,21033.0,2.375,343.0,...,1,,0.0,0.0,"Nov 1, 2017",1539854317,1.1.6,- Fixed some strings not being localized,com.blindflugstudios.renewalzh,https://play.google.com/store/apps/details?id=...
4,9526,63e523d7020a6c8cabe392a3,Swim.com: Workouts & Tracking,SWIM.COM IS THE ESSENTIAL APP FOR SWIMMERS! R...,Swim with the most advanced swim tracking app ...,"50,000+",50000.0,94936.0,3.9,1014.0,...,1,,0.0,0.0,"Feb 5, 2015",1668499138,5.2.3,Bug fixes,com.spiraledge.swimapp,https://play.google.com/store/apps/details?id=...


In [35]:
google_apps_df.columns

Index(['_c0', '_id', 'title', 'description', 'summary', 'installs',
       'minInstalls', 'realInstalls', 'score', 'ratings', 'reviews', 'price',
       'free', 'currency', 'offersIAP', 'inAppProductPrice', 'developer',
       'developerId', 'genre', 'contentRating', 'contentRatingDescription',
       'adSupported', 'containsAds', 'released', 'updated', 'version',
       'recentChanges', 'appId', 'url'],
      dtype='object')

In [36]:
google_apps_df.count()

_c0                         11593
_id                         11593
title                       11593
description                 11593
summary                     11592
installs                    11593
minInstalls                 11593
realInstalls                11593
score                       11593
ratings                     11593
reviews                     11593
price                       11593
free                        11593
currency                    11593
offersIAP                    7953
inAppProductPrice           11593
developer                   11593
developerId                 11593
genre                       11593
contentRating               11593
contentRatingDescription     2965
adSupported                 11593
containsAds                 11593
released                    11593
updated                     11593
dtype: Int64

In [None]:
# keep only specific subset of columns
google_apps_df_subset = google_apps_df[['title','description','summary','ratings','reviews','price','free','genre']]

In [38]:
google_apps_df_subset.head(5)

Unnamed: 0,title,description,summary,ratings,reviews,price,free,genre
0,CBN Family,CBN Family is a free Christian TV streaming ap...,CBN Family is a free Christian TV app broadcas...,1093.0,257.0,0.0,True,Lifestyle
1,Draw a Stickman: EPIC,Pencil your way into one of the most creative ...,"Draw a stickman, then guide him through a fant...",11497.0,1232.0,0.99,False,Adventure
2,Hero Wars,Mutants and Aliens invaded the Earth. All cit...,Realtime Defense game. Control all your heroe...,918.0,68.0,0.0,True,Strategy
3,(re)format Z:,Embark on a quest to become the programmer of ...,Reformat Z: Do you have the skills to make you...,343.0,14.0,0.0,True,Action
4,Swim.com: Workouts & Tracking,SWIM.COM IS THE ESSENTIAL APP FOR SWIMMERS! R...,Swim with the most advanced swim tracking app ...,1014.0,132.0,0.0,True,Health & Fitness


In [None]:
# write to SILVER Layer
google_apps_df_subset.to_gbq(destination_table=f'{PROJECT_ID}.{BQ_SILVER_DATASET}.T_GOOGLE_APP_DETAILS',if_exists='replace')

'market-mirror-dev.APP_MARKET_SILVER.T_GOOGLE_APP_DETAILS'

##### Exploring Windows Apps table `windows_store`

In [None]:
windows_app_df = bpd.read_gbq(f'{PROJECT_ID}.{BQ_BRONZE_DATASET}.windows_store')

In [None]:
windows_app_df.head(5)

Unnamed: 0,Name,Price,Description,Publisher,Date_of_Release,Category,Size,Age_Rating,Languages
0,3D Chess Game,Free,"Play against the right A.I. level for you, in ...",A Trillion Games Ltd,06-02-2014,Card & board,45.6 MB,For ages 3 and up,English(UnitedStates)
1,Edge of Reality: Mark of Fate,₹ 549.00,Big Fish Editor's Choice! This title was selec...,Big Fish Games,18-01-2020,Action & adventure,892.71 MB,For ages 16 and up,English(UnitedStates)
2,Demolition,Free,"Explore, admire, then destroy works of archite...",Khor Chin Heong,11-05-2015,Simulation,2.49 GB,For ages 3 and up,English(UnitedStates)
3,Lonely Mountains: Downhill,Included  +  with Game Pass,Key Features: • Travel to the Lonely Mountains...,Thunderful Publishing,23-10-2019,Action & adventure,,For ages 3 and up,English(UnitedStates)
4,Screen Recorder by Animotica,"₹ 1,099.00",Screen Recorder by Animotica brings you an eas...,Mixilab LLC,03-11-2020,Productivity,24.02 MB,For ages 3 and up,English(UnitedStates)
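
The `Price` column above mixes several formats (`Free`, rupee amounts with thousands separators, and Game Pass text), so it can only be stored as a string. If a numeric price is needed downstream, a normalisation step along these lines could be applied. This is a hypothetical sketch and not part of the notebook: `parse_price` is an illustrative name, and treating any non-numeric entry other than `Free` as missing is an assumption.

```python
import re
from typing import Optional

# Matches amounts such as "549.00" or "1,099.00".
_AMOUNT = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: Optional[str]) -> Optional[float]:
    """Convert a Windows Store price string to a float, or None if not numeric."""
    if text is None:
        return None
    cleaned = text.strip()
    if cleaned.lower() == "free":
        return 0.0
    match = _AMOUNT.search(cleaned)
    if match is None:
        return None  # e.g. Game Pass entries carry no amount
    return float(match.group().replace(",", ""))


# Sample values taken from the output above.
for raw in ["Free", "₹ 549.00", "₹ 1,099.00", "Included  +  with Game Pass"]:
    print(repr(raw), "->", parse_price(raw))
```

The currency symbol is discarded here, so the sketch only makes sense if all priced rows use the same currency.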


In [None]:
#keep only a subset of columns
windows_app_df_subset = windows_app_df[['Name','Price','Description','Category','Size']]

In [None]:
windows_app_df_subset.count()

Name           3960
Price          3960
Description    3960
Category       3960
dtype: Int64

In [None]:
#write to BQ SILVER Layer
windows_app_df_subset.to_gbq(f'{PROJECT_ID}.{BQ_SILVER_DATASET}.T_WINDOWS_APP_DETAILS',if_exists='replace')

'market-mirror-dev.APP_MARKET_SILVER.T_WINDOWS_APP_DETAILS'

##### Exploring Apple Apps tables `AppleStore` and `appleStore_description`

In [None]:
#Read data
apple_app_df = bpd.read_gbq(f'{PROJECT_ID}.{BQ_BRONZE_DATASET}.AppleStore')

In [None]:
apple_app_df.head(5)

Unnamed: 0,_c0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices_num,ipadSc_urls_num,lang_num,vpp_lic
0,3701,931337752,幻獣契約クリプトラクト,97223680,USD,0.0,33,0,4.0,0.0,v3.6.9,9+,Games,40,5,1,1
1,6668,1070864992,Solo Selfie,140281856,USD,0.0,1799,267,4.0,4.5,2.2.1,4+,Photo & Video,37,4,11,1
2,5575,1042273323,Splash Cars,182239232,USD,0.0,2094,24,4.5,5.0,1.5.2,4+,Games,38,5,1,1
3,10217,1153473604,GoldenMoji - Golden Retriever Emojis,43315200,USD,0.0,27,3,2.5,2.0,1.5,4+,Utilities,37,3,1,1
4,1095,463142843,twinkle,17612800,USD,2.99,5,0,4.0,0.0,6.6,17+,Entertainment,37,1,1,1


In [None]:
#Keep only a subset of columns
apple_app_df_subset = apple_app_df[['id','track_name','size_bytes','currency','price','user_rating','prime_genre']]

In [None]:
apple_app_df_subset.head(5)

Unnamed: 0,id,track_name,size_bytes,currency,price,user_rating,prime_genre
0,931337752,幻獣契約クリプトラクト,97223680,USD,0.0,4.0,Games
1,1070864992,Solo Selfie,140281856,USD,0.0,4.0,Photo & Video
2,1042273323,Splash Cars,182239232,USD,0.0,4.5,Games
3,1153473604,GoldenMoji - Golden Retriever Emojis,43315200,USD,0.0,2.5,Utilities
4,463142843,twinkle,17612800,USD,2.99,4.0,Entertainment


In [None]:
apple_app_df_subset.count()

id             7197
track_name     7197
size_bytes     7197
currency       7197
price          7197
user_rating    7197
prime_genre    7197
dtype: Int64

In [None]:
# Read the 2nd Apple product description table 
apple_app_df2 = bpd.read_gbq(f'{PROJECT_ID}.{BQ_BRONZE_DATASET}.appleStore_description')

incompatibilies with previous reads of this table. To read the latest
version, set `use_cache=False` or close the current session with
Session.close() or bigframes.pandas.close_session().
  return method(*args, **kwargs)


In [None]:
apple_app_df2.head()

Unnamed: 0,id,track_name,size_bytes,app_desc
0,1120072154,Athlete Shave Salon Games,62823424,Can you help our your friends get ready for th...
1,1106737361,传说一刀 - 一刀一级PK爆屠龙,363078656,老司机带你飞！千人跨服战开打！ 复古76千人同屏攻城战手游《传说一刀》，以蓝光高清画质、流...
2,739836039,Literacy Leveler,2455552,Literacy Leveler makes it easy to level childr...
3,1085596938,Escape from the ICU room.,96123904,#Story :I tomorrow will leave the hospital. Bu...
4,453691481,飞猪,148888576,飞猪是阿里巴巴集团旗下旅行品牌，旨在为用户提供便捷、更高性价比的出行服务。 飞猪旅行app...


In [None]:
apple_app_df2_subset = apple_app_df2[['id','app_desc']]

In [None]:
apple_app_df2_subset.count()

id          7197
app_desc    7197
dtype: Int64

In [None]:
# Left-join the descriptions onto the app details using the shared `id` column
apple_app_df_subset = bpd.merge(apple_app_df_subset, apple_app_df2_subset, on='id', how='left')
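
With `how='left'`, every row of the left table is kept, and `app_desc` is null wherever no description shares the same `id`. The plain-Python sketch below mimics that behaviour on a few records. The first two ids come from the outputs above, while the third id and the shortened descriptions are made up for illustration.

```python
def left_join(left, right, key):
    """Left-join two lists of dicts on `key`, assuming `key` is unique in `right`."""
    right_columns = {col for row in right for col in row if col != key}
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for col in sorted(right_columns):
            merged[col] = match.get(col)  # None when there is no match
        joined.append(merged)
    return joined


apps = [
    {"id": 1070864992, "track_name": "Solo Selfie"},
    {"id": 1042273323, "track_name": "Splash Cars"},
    {"id": 1, "track_name": "Hypothetical app without a description"},
]
descriptions = [
    {"id": 1070864992, "app_desc": "Solo camera is the easiest way..."},
    {"id": 1042273323, "app_desc": "Tired of the everyday grey?..."},
]

result = left_join(apps, descriptions, "id")
for row in result:
    print(row)
```

Comparing the row count before and after the real merge is a cheap way to confirm that `id` is unique on the right-hand side, since duplicate ids there would multiply rows.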

In [None]:
apple_app_df_subset.head(5)

Unnamed: 0,id,track_name,size_bytes,currency,price,user_rating,prime_genre,app_desc
0,931337752,幻獣契約クリプトラクト,97223680,USD,0.0,4.0,Games,◆【 600 万ダウンロード突破！】◆ 大ヒット王道ファンタジーRPG「幻獣契約クリプトラク...
1,1070864992,Solo Selfie,140281856,USD,0.0,4.0,Photo & Video,Solo camera is the easiest way to create and s...
2,1042273323,Splash Cars,182239232,USD,0.0,4.5,Games,Tired of the everyday grey? Color the world in...
3,1153473604,GoldenMoji - Golden Retriever Emojis,43315200,USD,0.0,2.5,Utilities,The cutest Golden Retrievers and Ultimate Emoj...
4,463142843,twinkle,17612800,USD,2.99,4.0,Entertainment,2ちゃんねる、まちBBS、したらば、その他の2ちゃんねる互換の掲示板を閲覧するためのブラウザ...


In [None]:
#finally write the table to BQ Silver layer
apple_app_df_subset.to_gbq(f'{PROJECT_ID}.{BQ_SILVER_DATASET}.T_APPLE_APP_DETAILS',if_exists='replace')

'market-mirror-dev.APP_MARKET_SILVER.T_APPLE_APP_DETAILS'