# Task 2: A Sample of Owners

A Python script that handles the following tasks:
1.	Connects to your Google BigQuery (GBQ) instance.
2.	Builds a list of owners. 
3.	Takes a sample of the owners. 
4.	Extracts all records associated with those owners and writes them to a local text file. 

**Note: The final sample file has been uploaded to GBQ and can be accessed [here](https://console.cloud.google.com/bigquery?project=hong-wedge&p=hong-wedge&d=transactions&t=sample_owners_records&page=table).**

```python
import os
import io
import csv
from zipfile import ZipFile

from google.cloud import bigquery
from google.oauth2 import service_account
```

## Connects to the GBQ instance.


```python
# GBQ settings: the service-account key file is expected in the working directory
service_path = "./"
service_file = "Hong-Wedge-8a5b036bb32c.json"
gbq_proj_id = "hong-wedge"

# Authenticate with the service-account key and create a BigQuery client
key_path = os.path.join(service_path, service_file)
credentials = service_account.Credentials.from_service_account_file(key_path)
client = bigquery.Client(credentials=credentials, project=gbq_proj_id)
```
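Hard-coding the key file name and directory ties the notebook to one machine. A minimal sketch, assuming you would rather fall back to the standard `GOOGLE_APPLICATION_CREDENTIALS` environment variable read by Google's client libraries; the helper name `resolve_key_path` is hypothetical, and it only resolves a path (it does not authenticate):

```python
import os


def resolve_key_path(default_dir: str = "./",
                     default_file: str = "Hong-Wedge-8a5b036bb32c.json") -> str:
    """Return the service-account key path to use (hypothetical helper).

    Prefers the GOOGLE_APPLICATION_CREDENTIALS environment variable when it
    is set and non-empty; otherwise falls back to the notebook's default path.
    """
    env_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if env_path:
        return env_path
    return os.path.join(default_dir, default_file)
```

The result can be passed straight to `service_account.Credentials.from_service_account_file(resolve_key_path())`.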

## Builds a list of owners. 
This step finds every distinct owner (`card_no`) in the transaction records, excluding `card_no = 3`. It uses BigQuery's support for [querying multiple tables using a wildcard table](https://cloud.google.com/bigquery/docs/querying-wildcard-tables), so a single query scans every `transArchive_*` table.
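The wildcard `transArchive_*` matches every table whose name starts with `transArchive_`, and BigQuery exposes the matched remainder of each table name through the `_TABLE_SUFFIX` pseudo-column (a STRING). A minimal sketch, assuming you ever need to restrict the owner search to a subset of tables; the helper `build_owner_query` and the suffix values in the example are hypothetical, since the actual table suffixes are not listed here:

```python
def build_owner_query(table_prefix="hong-wedge.transactions.transArchive_",
                      excluded_card=3, suffix_min=None, suffix_max=None):
    """Build the distinct-owner query, optionally filtered by _TABLE_SUFFIX.

    Suffix bounds are interpolated as string literals, so pass only
    trusted values (this helper is illustrative, not parameterized).
    """
    conditions = [f"card_no != {int(excluded_card)}"]
    if suffix_min is not None:
        conditions.append(f"_TABLE_SUFFIX >= '{suffix_min}'")
    if suffix_max is not None:
        conditions.append(f"_TABLE_SUFFIX <= '{suffix_max}'")
    return (
        "SELECT DISTINCT card_no "
        f"FROM `{table_prefix}*` "
        "WHERE " + " AND ".join(conditions)
    )
```

With no suffix bounds the helper reproduces the full-scan query used in this step; adding bounds lets BigQuery skip non-matching tables.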


```python
# Collect every distinct owner across all transArchive_* tables, excluding card_no 3
query = (
    "SELECT DISTINCT card_no "
    "FROM `hong-wedge.transactions.transArchive_*` "
    "WHERE card_no != 3"
)
query_job = client.query(query, location="US")

# Each result row holds a single column, card_no
owners = [row["card_no"] for row in query_job.result()]

print(f"We have total {len(owners)} owners")
```

```
We have total 27207 owners
```


## Takes a sample of the owners.


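```python
import random

# Seed the global generator with the owner count, then draw 500 owners without replacement
random.seed(len(owners))
sample_owners = random.sample(owners, 500)

# Convert the IDs to strings and join them for the SQL IN (...) clause
sample_owners = [str(owner) for owner in sample_owners]
sample_owners_str = ",".join(sample_owners)
```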
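Seeding is only half of reproducibility: `SELECT DISTINCT` does not guarantee row order, so the same seed applied to a differently ordered `owners` list can yield a different sample, and `random.seed` also affects any other code using the global generator. A minimal sketch of a self-contained alternative, assuming the same sample size of 500; the helper name `sample_owner_ids` is hypothetical, and its default seed simply mirrors the owner count reported above:

```python
import random


def sample_owner_ids(owners, k=500, seed=27207):
    """Draw a reproducible sample of k distinct owner IDs, returned as strings.

    Sorting first makes the result independent of the order in which
    BigQuery returned the rows; a private Random instance keeps the draw
    unaffected by other code that uses the global `random` module.
    """
    rng = random.Random(seed)
    population = sorted(set(owners))
    if k > len(population):
        raise ValueError(f"cannot sample {k} owners from {len(population)}")
    return [str(owner) for owner in rng.sample(population, k)]
```

Usage: `sample_owners = sample_owner_ids(owners)` followed by `sample_owners_str = ",".join(sample_owners)`.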

## Extracts all records associated with the sampled owners.

```python
import csv

# Pull every transaction row for the 500 sampled owners from all archive tables
query = (
    "SELECT * "
    "FROM `hong-wedge.transactions.transArchive_*` "
    "WHERE card_no IN (" + sample_owners_str + ")"
)
query_job = client.query(query, location="US")
rows = query_job.result()

# Take the header from the result schema so it always matches the returned columns
fields = [field.name for field in rows.schema]

# Write the header and the rows with one csv.writer so both use the same delimiter
# and values containing commas (e.g. descriptions) are quoted correctly
with open("sample_owners_records.csv", "w", newline="", encoding="utf-8") as text_file:
    writer = csv.writer(text_file)
    writer.writerow(fields)
    for row in rows:
        writer.writerow(row.values())
```
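A quick way to verify the exported file is to read it back with the `csv` module and confirm that every row has as many fields as the header; a mismatch points to mixed delimiters or unquoted commas. A minimal sketch, assuming the file name used above and a `card_no` column in the header; the helper `check_extract` is hypothetical. The distinct-owner count should equal the sample size, since every sampled owner was drawn from these same tables:

```python
import csv


def check_extract(path="sample_owners_records.csv", owner_column="card_no"):
    """Read the exported CSV back and return (row_count, distinct_owner_count).

    Raises ValueError if any data row has a different number of fields
    than the header.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        owner_idx = header.index(owner_column)
        owners = set()
        row_count = 0
        for row in reader:
            if len(row) != len(header):
                raise ValueError(
                    f"line {reader.line_num}: {len(row)} fields, expected {len(header)}"
                )
            owners.add(row[owner_idx])
            row_count += 1
    return row_count, len(owners)
```

For example, `rows_written, n_owners = check_extract()` should report 500 distinct owners for this sample.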