**Activate `dwh` environment**

Install the `google-cloud-bigquery` and `google-cloud-bigquery-storage` packages if they are not already installed.
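
Before running the cells below, it can help to confirm that both packages are importable from the active environment. The following is a minimal sketch: `is_installed` and `REQUIRED` are hypothetical names introduced here for illustration, and the module paths `google.cloud.bigquery` and `google.cloud.bigquery_storage` are assumed to be the import names that these two packages provide.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the module can be located (parent packages are imported during lookup)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. `google`) raises instead of returning None.
        return False

# Map of pip package name -> import path (assumed, see the note above).
REQUIRED = {
    "google-cloud-bigquery": "google.cloud.bigquery",
    "google-cloud-bigquery-storage": "google.cloud.bigquery_storage",
}

missing = [pkg for pkg, module in REQUIRED.items() if not is_installed(module)]
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported missing, install it inside the `dwh` environment so that the notebook kernel picks it up.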

The code below performs two analyses:

1. Seller performance (top sellers by year)
2. Product performance (top-selling product category by year and by city)

In [3]:
# Authenticate the user to access the BigQuery API

from google.auth import default
from google.cloud import bigquery

# Before running this cell, run `gcloud auth application-default login`
# in your terminal so that application default credentials are available

# Set the project ID
project_id = 'premium-node-451703-i2'

# Use the default credentials
credentials, project = default()

# Initialize the BigQuery client
client = bigquery.Client(credentials=credentials, project=project_id)

print(f"Successfully authenticated with project: {project_id}")

Successfully authenticated with project: premium-node-451703-i2


**Distinct values of `order_status` in the orders table, for reference**

0	approved

1	canceled

2	created

3	delivered

4	invoiced

5	processing

6	shipped

7	unavailable

Note: The facts table contains orders in every status except `canceled`, `created`, and `unavailable`, because those orders are not confirmed and have no product information attached.
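
The exclusion rule in the note can be illustrated with a small pandas filter. This is only a sketch on a toy table with made-up order IDs; `EXCLUDED_STATUSES`, `orders`, and `confirmed` are hypothetical names, and the actual facts table may well have been built differently (for example in SQL).

```python
import pandas as pd

# Statuses excluded from the facts table, per the note above.
EXCLUDED_STATUSES = {"canceled", "created", "unavailable"}

# Toy orders table (made-up IDs) covering all eight statuses.
orders = pd.DataFrame({
    "order_id": [f"o{i}" for i in range(8)],
    "order_status": ["approved", "canceled", "created", "delivered",
                     "invoiced", "processing", "shipped", "unavailable"],
})

# Keep only rows whose status is not in the excluded set.
confirmed = orders[~orders["order_status"].isin(EXCLUDED_STATUSES)]
print(sorted(confirmed["order_status"]))
# → ['approved', 'delivered', 'invoiced', 'processing', 'shipped']
```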



In [4]:
# Fetch rows from the facts table into a DataFrame (capped at 300 rows)
query = "SELECT * FROM `premium-node-451703-i2.brazilecom_facts.facts_orders` LIMIT 300"
df_res = client.query(query).to_dataframe()
df_res.head()

Unnamed: 0,order_id,customer_id,order_status,order_year,order_purchase_timestamp,order_approved_at,order_delivered_carrier_date,order_delivered_customer_date,order_estimated_delivery_date,order_item_id,...,customer_state,customer_unique_id,product_category_name_english,product_name_lenght,product_description_lenght,product_photos_qty,product_weight_g,product_length_cm,product_height_cm,product_width_cm
0,e04f1da1f48bf2bbffcf57b9824f76e1,0d00d77134cae4c58695086ad8d85100,invoiced,2016,2016-10-05 13:22:20,2016-10-06 15:51:38,NaT,NaT,2016-11-29,,...,SC,8886115442775dd8a20c2dcc921c7cc8,,,,,,,,
1,a68ce1686d536ca72bd2dadc4b8671e5,d7bed5fac093a4136216072abaf599d5,shipped,2016,2016-10-05 01:47:40,2016-10-07 03:11:22,2016-11-07 16:37:37,NaT,2016-12-01,,...,RS,f15a952dfc52308d0361288fbf42c7b3,,,,,,,,
2,2ce9683175cdab7d1c95bcbb3e36f478,b2d7ae0415dbbca535b5f7b38056dd1f,invoiced,2016,2016-10-05 21:03:33,2016-10-06 07:46:39,NaT,NaT,2016-11-25,,...,SP,6a2da481aa7827b951175772a0fe8bb8,,,,,,,,
3,7f39ba4c9052be115350065d07583cac,d7fc82cbeafea77bd0a8fbbf6296e387,delivered,2017,2017-10-18 08:16:34,2017-10-18 23:56:20,2017-10-20 14:29:01,2017-10-27 16:46:05,2017-11-09,1.0,...,MG,9de5797cddb92598755a0f76383ddbbb,small_appliances,40.0,849.0,2.0,11800.0,40.0,43.0,36.0
4,d455a8cb295653b55abda06d434ab492,944b72539d7e1f7f7fc6e46639ef1fe3,delivered,2017,2017-09-26 22:17:05,2017-09-27 22:24:16,2017-09-29 15:53:03,2017-10-07 16:12:47,2017-10-30,1.0,...,PR,3c7e305796add66698959fc7ad176f6b,small_appliances,40.0,849.0,2.0,11800.0,40.0,43.0,36.0


**Use Case 1: To find the best sellers by year**

In [5]:
# Group by seller_id, order_year and seller_city, then sum price and freight_value
df_summary_bysellerbyyearbycity = df_res.groupby(['seller_id', 'order_year','seller_city']).agg({'price': 'sum', 'freight_value': 'sum'}).reset_index()
# Sort by year (ascending), then city (descending), then total price (ascending)
df_summary_bysellerbyyearbycity.sort_values(by=['order_year', 'seller_city','price'], ascending=[True, False, True], inplace=True)
df_summary_bysellerbyyearbycity.head()

Unnamed: 0,seller_id,order_year,seller_city,price,freight_value
0,0015a82c2db000af6aaaf3ae2ecb0532,2017,santo andre,2685.0,63.06
3,002100f778ceb8431b7a1020ff7ab48f,2017,franca,799.7,608.63
1,001cca7ae9ae17fb1caed9dfb1094831,2017,cariacica,21638.23,7428.32
4,002100f778ceb8431b7a1020ff7ab48f,2018,franca,434.8,185.03
2,001cca7ae9ae17fb1caed9dfb1094831,2018,cariacica,3441.8,1425.82
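
The cell above orders rows by city and then by ascending price, so the highest-revenue sellers do not necessarily appear first. A minimal sketch of one way to rank sellers within each year follows, assuming "best" means the highest total `price`. The toy data, seller IDs, and the helper `top_sellers_by_year` are hypothetical and only mirror the column names of the facts table.

```python
import pandas as pd

# Toy rows shaped like the facts table (made-up seller IDs and prices).
toy = pd.DataFrame({
    "seller_id":  ["s1", "s1", "s2", "s3", "s2", "s3"],
    "order_year": [2017, 2017, 2017, 2017, 2018, 2018],
    "price":      [100.0, 50.0, 300.0, 20.0, 80.0, 90.0],
})

def top_sellers_by_year(df, n=2):
    """Sum price per seller and year, then keep the n highest totals in each year."""
    totals = df.groupby(["order_year", "seller_id"], as_index=False)["price"].sum()
    # Year ascending, total price descending, so the best sellers come first per year.
    ranked = totals.sort_values(["order_year", "price"], ascending=[True, False])
    return ranked.groupby("order_year").head(n).reset_index(drop=True)

top = top_sellers_by_year(toy, n=2)
print(top)
```

On the real data, the same function could be called as `top_sellers_by_year(df_res, n=10)`, since `df_res` has the `seller_id`, `order_year`, and `price` columns used here.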


**Use Case 2: To find the best product category by year by city**

In [13]:
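# Group by product category, order_year and customer_city, then sum price and freight_value
df_bestproduct_byyearbycity = df_res.groupby(['product_category_name_english', 'order_year','customer_city']).agg({'price': 'sum', 'freight_value': 'sum'}).reset_index()
# Sort by year (ascending), then city (descending), then total price (ascending)
df_bestproduct_byyearbycity.sort_values(by=['order_year', 'customer_city','price'], ascending=[True, False, True], inplace=True)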
df_bestproduct_byyearbycity.head()

Unnamed: 0,product_category_name_english,order_year,customer_city,price,freight_value
52,furniture_decor,2017,votorantim,49.9,11.85
159,garden_tools,2017,vitoria,110.0,20.27
158,garden_tools,2017,vila flores,89.0,69.33
157,garden_tools,2017,videira,99.0,39.79
179,small_appliances,2017,umuarama,895.0,21.02
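
The output above lists every category total per city rather than a single winner. One possible way to keep only the top category for each year and city is sketched below, assuming "best" means the highest total `price`. The toy rows and the helper `best_category_by_year_city` are hypothetical; only the column names come from the facts table.

```python
import pandas as pd

# Toy rows shaped like the facts table (made-up prices).
toy = pd.DataFrame({
    "product_category_name_english": ["garden_tools", "furniture_decor", "garden_tools",
                                      "small_appliances", "garden_tools"],
    "order_year":    [2017, 2017, 2017, 2018, 2018],
    "customer_city": ["vitoria", "vitoria", "vitoria", "umuarama", "umuarama"],
    "price":         [110.0, 49.9, 30.0, 895.0, 99.0],
})

def best_category_by_year_city(df):
    """Return the category with the highest summed price for each (year, city) pair."""
    keys = ["order_year", "customer_city"]
    totals = df.groupby(keys + ["product_category_name_english"], as_index=False)["price"].sum()
    # idxmax gives, per (year, city) group, the row label of the largest total.
    best_rows = totals.loc[totals.groupby(keys)["price"].idxmax()]
    return best_rows.sort_values(keys).reset_index(drop=True)

best = best_category_by_year_city(toy)
print(best)
```

Rows with a missing `product_category_name_english` are dropped by `groupby` by default, so orders without product information do not take part in the ranking.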
