# Task 2: A Sample of Owners

A Python script that handles the following tasks:
1. Connects to your GBQ instance.
2. Builds a list of owners.
3. Takes a sample of the owners.
4. Extracts all records associated with those owners and writes them to a local text file.

**Note: The final sample records have been uploaded to GBQ and can be accessed [here](https://console.cloud.google.com/bigquery?project=hong-wedge&p=hong-wedge&d=transactions&t=sample_owners_records&page=table).**

In [4]:
import os
import io
import csv
from zipfile import ZipFile

from google.cloud import bigquery
from google.oauth2 import service_account

## Connects to GBQ instance.


In [5]:
# GBQ settings
service_path = "./"
service_file = 'Hong-Wedge-8a5b036bb32c.json'
gbq_proj_id = 'hong-wedge'

# Full path to the service-account key file
private_key = service_path + service_file
credentials = service_account.Credentials.from_service_account_file(private_key)
client = bigquery.Client(credentials=credentials, project=gbq_proj_id)

## Builds a list of owners. 
This step finds every distinct owner in the records, excluding `card_no = 3`. It relies on GBQ's ability to [query multiple tables using a wildcard table](https://cloud.google.com/bigquery/docs/querying-wildcard-tables), so a single query covers all of the `transArchive_*` tables.


In [19]:
query = (
    "SELECT DISTINCT card_no "
    "FROM `hong-wedge.transactions.transArchive_*` "
    "WHERE card_no != 3"
)
query_job = client.query(
    query,
    location="US",
)

# Iterating over the job waits for the query and yields one row per owner
owners = [row[0] for row in query_job]
    
print(f"We have total {len(owners)} owners")

We have total 27207 owners
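
The wildcard query above scans every table whose name matches `transArchive_*`. When only some of those tables are of interest, BigQuery exposes the part of the name matched by `*` as the `_TABLE_SUFFIX` pseudo-column, which can be filtered in the `WHERE` clause. The following is a minimal sketch that only builds the query string. The helper name `build_owner_query` and the example prefix `"2015"` are assumptions for illustration, since the actual table suffixes are not shown in this notebook.

```python
def build_owner_query(suffix_prefix=None):
    """Return the distinct-owner query, optionally limited by table suffix.

    suffix_prefix is a hypothetical filter on the part of the table name
    matched by the wildcard (the text after `transArchive_`).
    """
    query = (
        "SELECT DISTINCT card_no "
        "FROM `hong-wedge.transactions.transArchive_*` "
        "WHERE card_no != 3"
    )
    if suffix_prefix is not None:
        # Accept only letters, digits and underscores so the value is safe
        # to place inside the SQL string literal.
        if not suffix_prefix.replace("_", "").isalnum():
            raise ValueError("suffix_prefix must be alphanumeric")
        # _TABLE_SUFFIX holds whatever the * matched for each table
        query += f" AND STARTS_WITH(_TABLE_SUFFIX, '{suffix_prefix}')"
    return query


print(build_owner_query())
print(build_owner_query("2015"))
```

The resulting string can be passed to `client.query` in the same way as the query above.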


## Takes a sample of the owners.


In [50]:
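import random

# Seed the generator so the same 500 owners are drawn on every run
random.seed(len(owners))
sample_owners = random.sample(owners, 500)
# Convert to strings so the IDs can be joined into a SQL IN list
sample_owners = [str(owner) for owner in sample_owners]
sample_owners_str = ",".join(sample_owners)
print(sample_owners_str)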

20770.0,64450.0,13430.0,48388.0,51668.0,11933.0,51912.0,23821.0,48460.0,48307.0,25262.0,21247.0,47340.0,22325.0,15811.0,15258.0,48609.0,50137.0,15884.0,64959.0,17321.0,12222.0,15930.0,18157.0,34281.0,24706.0,21722.0,49545.0,13405.0,22632.0,15907.0,50744.0,14114.0,19998.0,24432.0,17939.0,49587.0,65306.0,50347.0,65080.0,37365.0,25214.0,19038.0,51011.0,23621.0,64324.0,49003.0,52104.0,25246.0,12567.0,40429.0,64762.0,53826.0,23584.0,45102.0,20984.0,51218.0,21314.0,39531.0,21362.0,44940.0,12765.0,17734.0,23634.0,44366.0,38074.0,22649.0,10298.0,42233.0,52889.0,11055.0,10909.0,44513.0,66118.0,17612.0,12992.0,41257.0,18899.0,19418.0,52036.0,64250.0,24588.0,13715.0,11426.0,14932.0,24674.0,25457.0,13726.0,51217.0,51045.0,16085.0,64943.0,16948.0,44225.0,10887.0,48641.0,10984.0,44646.0,20729.0,17564.0,57235.0,25932.0,50706.0,18774.0,18347.0,50107.0,10104.0,20442.0,20595.0,12462.0,47856.0,24040.0,35765.0,11199.0,51683.0,51556.0,22161.0,13468.0,42401.0,26575.0,50771.0,19526.0,64395.0,13060.0,19701.0,
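
Seeding the generator before sampling is what makes the draw repeatable. Here is a minimal sketch with a made-up owner list (the IDs and the helper name `sample_ids` are illustrative, not the real data). It uses a private `random.Random` instance so the global generator state is left alone.

```python
import random


def sample_ids(ids, k, seed):
    """Draw k distinct ids; the same seed always gives the same sample."""
    rng = random.Random(seed)  # private generator, independent of random.seed()
    return rng.sample(ids, k)


# Hypothetical owner ids standing in for the query result
fake_owners = [float(n) for n in range(10000, 10050)]

first = sample_ids(fake_owners, 5, seed=len(fake_owners))
second = sample_ids(fake_owners, 5, seed=len(fake_owners))

print(first == second)       # True: same seed, same sample
print(len(set(first)) == 5)  # True: sampling is without replacement
```

The sample also depends on the order of the input list. BigQuery does not guarantee row order without an `ORDER BY`, so sorting the owners first would be needed for the draw to repeat across separate query runs.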

## Extracts all records 

In [51]:
# sample_owners_str (built above) is the comma-separated list of sampled card numbers
query = (
    "SELECT * "
    "FROM `hong-wedge.transactions.transArchive_*` "
    "WHERE card_no IN (" + sample_owners_str + ")"
)
query_job = client.query(
    query,
    location="US",
)

fields = ["datetime","register_no","emp_no","trans_no","upc","description","trans_type","trans_subtype","trans_status","department",
          "quantity","scale","cost","unitPrice","total","regPrice","altPrice","tax","taxexempt","foodstamp","wicable","discount",
          "memDiscount","discountable","discounttype","voided","percentDiscount","ItemQtty","volDiscType","volume","VolSpecial",
          "mixMatch","matched","memType","staff","numflag","itemstatus","tenderstatus","charflag","varflag","batchHeaderID","local",
          "organic","display","receipt","card_no","store","branch","match_id","trans_id"]

# csv.writer quotes any value that contains a comma (e.g. a product description)
with open("sample_owners_records.csv", "w", encoding="utf-8", newline="") as text_file:
    writer = csv.writer(text_file, lineterminator="\n")
    writer.writerow(fields)
    for row in query_job:
        # Each row has 50 columns; write every value in its string form
        writer.writerow([str(row[i]) for i in range(50)])
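
Product descriptions may themselves contain commas, in which case a plain comma join would produce rows with too many columns. Below is a minimal sketch of how the standard `csv` module handles this, using two made-up rows and an in-memory buffer in place of the real query result.

```python
import csv
import io

# Made-up rows; the second description contains a comma
header = ["upc", "description", "total"]
rows = [
    ["0001", "BANANAS", 1.29],
    ["0002", "SOUP, TOMATO", 2.49],
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
for row in rows:
    writer.writerow([str(value) for value in row])

print(buffer.getvalue())

# Reading the text back recovers three columns per row
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
print(parsed[2])  # ['0002', 'SOUP, TOMATO', '2.49']
```

The writer wraps the comma-containing description in double quotes, so a CSV reader sees it as a single field again.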