<a href="https://colab.research.google.com/github/kty0307/Blog/blob/main/mockup_db_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Write the initial code for installing the BigQuery library, setting up authentication, and loading a table.

## Library installation

### Subtask:
Install the Google Cloud BigQuery client library.


**Reasoning**:
The subtask is to install the Google Cloud BigQuery client library. This can be done using pip in a Jupyter Notebook environment.



In [2]:
%pip install google-cloud-bigquery



## Authentication setup

### Subtask:
Set up Google Cloud authentication.


**Reasoning**:
Import the necessary module and authenticate the user for Google Cloud access.



In [3]:
from google.colab import auth
auth.authenticate_user()

## Table loading

### Subtask:
Write the code that loads a table from BigQuery.


**Reasoning**:
Import the necessary Client class from google.cloud.bigquery, create a client object, and then retrieve the specified BigQuery table.



In [8]:
from google.cloud.bigquery import Client

project_id = 'bigquery-mockup-test'  # Replace with your GCP project ID
dataset_id = 'mockup'      # Replace with your BigQuery dataset ID
table_id = 'tb_a'          # Replace with your BigQuery table ID

# Create the client; queries are billed to this project
client = Client(project=project_id)

# Fetch table metadata (schema, etc.) by its fully qualified ID; no rows are read
table = client.get_table(f"{project_id}.{dataset_id}.{table_id}")

print(f"Loaded table: {table.project}.{table.dataset_id}.{table.table_id}")
print(f"Table schema: {table.schema}")

Loaded table: bigquery-mockup-test.mockup.tb_a
Table schema: [SchemaField('loan_req_id', 'STRING', 'NULLABLE', None, '대출 신청 고유 번호', (), None), SchemaField('inquiry_id', 'STRING', 'NULLABLE', None, '한도조회 고유 번호', (), None), SchemaField('customer_id', 'STRING', 'NULLABLE', None, '고객 고유 번호', (), None), SchemaField('gender', 'STRING', 'NULLABLE', None, '성별', (), None), SchemaField('age', 'INTEGER', 'NULLABLE', None, '연령', (), None), SchemaField('occupation_cd', 'STRING', 'NULLABLE', None, '직업 구분 코드(자영업자, 급여소득자, 주부)', (), None), SchemaField('final_loan_amount', 'FLOAT', 'NULLABLE', None, '최종 실행 대출 금액', (), None), SchemaField('application_date', 'DATE', 'NULLABLE', None, '신청일', (), None), SchemaField('process_date', 'DATE', 'NULLABLE', None, '처리일(신청 이후 접수나 추가 한도조회가 별도로 이뤄진 경우나 결국 대출이 기각된 경우 여기에 날짜가 기입됨. 즉 해당 신청건에 대한 가장 최종 처리일을 뜻함)', (), None), SchemaField('execution_date', 'DATE', 'NULLABLE', None, '대출 실행일', (), None), SchemaField('status', 'STRING', 'NULLABLE', None, '신청 상태', (), None)]
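
The raw `SchemaField` list above is hard to scan. The following is a minimal sketch of a formatter that prints one aligned line per column. It assumes only that each field exposes `name`, `field_type`, `mode`, and `description` attributes, as `SchemaField` does. `FieldStub` and `format_schema` are hypothetical names introduced here so the example runs without BigQuery credentials.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FieldStub:
    """Hypothetical stand-in for a schema field so the sketch runs offline."""
    name: str
    field_type: str
    mode: str = "NULLABLE"
    description: Optional[str] = None


def format_schema(fields: Iterable) -> str:
    """Render one aligned line per field: name, type, mode, description."""
    fields = list(fields)
    if not fields:
        return ""
    # Pad names and types to the widest entry so the columns line up.
    name_width = max(len(f.name) for f in fields)
    type_width = max(len(f.field_type) for f in fields)
    lines = []
    for f in fields:
        line = f"{f.name:<{name_width}}  {f.field_type:<{type_width}}  {f.mode}"
        if f.description:
            line += f"  {f.description}"
        lines.append(line)
    return "\n".join(lines)


# A few columns of tb_a, copied from the schema output above.
sample = [
    FieldStub("loan_req_id", "STRING"),
    FieldStub("age", "INTEGER"),
    FieldStub("final_loan_amount", "FLOAT"),
    FieldStub("execution_date", "DATE"),
]
print(format_schema(sample))
```

In the notebook itself, `print(format_schema(table.schema))` should give the same layout for the real table.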


## Summary:

### Data Analysis Key Findings

*   The `google-cloud-bigquery` library was already installed in the environment.
*   Google Cloud authentication was successfully configured using `google.colab.auth.authenticate_user()`.
*   The metadata for BigQuery table `tb_a` was successfully loaded with `google.cloud.bigquery.Client`, and its schema was retrieved and displayed.

### Insights or Next Steps

*   The initial setup for interacting with BigQuery from a Colab environment is complete.
*   The next steps would involve querying the loaded BigQuery table to perform data analysis.


In [11]:
from google.cloud.bigquery import Client

# Assuming the 'client' object is already initialized from previous steps
# client = Client()

project_id = 'bigquery-mockup-test'  # Replace with your GCP project ID
dataset_id = 'mockup'      # Replace with your BigQuery dataset ID
table_id = 'tb_a'          # Replace with your BigQuery table ID

query = f"""
SELECT
    SUM(final_loan_amount) AS total_loan_amount
FROM
    `{project_id}.{dataset_id}.{table_id}`
WHERE
    execution_date BETWEEN '2025-07-01' AND '2025-07-31'  # Assuming the year is 2025, adjust if needed
"""

query_job = client.query(query)
results = query_job.result()

for row in results:
    total_loan_amount = row.total_loan_amount

print(f"The total loan amount for July in 2025 was: {total_loan_amount}")

The total loan amount for July in 2025 was: 830522000.0
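
The cell above hardcodes the July date range in the SQL string. One possible alternative, sketched below, computes the month boundaries in Python and passes them as BigQuery query parameters rather than interpolating them into the text. `month_bounds` and `total_loan_amount` are hypothetical helper names. The table ID still has to be formatted into the query, because identifiers cannot be parameterized.

```python
import calendar
from datetime import date
from typing import Tuple


def month_bounds(year: int, month: int) -> Tuple[date, date]:
    """Return the first and last calendar day of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def total_loan_amount(client, table: str, year: int, month: int):
    """Sum final_loan_amount for loans executed in the given month.

    `client` is assumed to be an authenticated google.cloud.bigquery.Client;
    `table` is a fully qualified ID such as 'bigquery-mockup-test.mockup.tb_a'.
    """
    # Imported lazily so month_bounds works without the library installed.
    from google.cloud import bigquery

    start, end = month_bounds(year, month)
    sql = f"""
    SELECT SUM(final_loan_amount) AS total_loan_amount
    FROM `{table}`
    WHERE execution_date BETWEEN @start_date AND @end_date
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_date", "DATE", start),
            bigquery.ScalarQueryParameter("end_date", "DATE", end),
        ]
    )
    rows = client.query(sql, job_config=job_config).result()
    # An aggregate without GROUP BY returns exactly one row.
    return next(iter(rows)).total_loan_amount


print(month_bounds(2025, 7))
```

With the `client` from the earlier cells, `total_loan_amount(client, "bigquery-mockup-test.mockup.tb_a", 2025, 7)` would run the same aggregation as the cell above.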


In [12]:
from google.cloud.bigquery import Client

# Assuming the 'client' object is already initialized from previous steps
# client = Client()

project_id = 'bigquery-mockup-test'  # Replace with your GCP project ID
dataset_id = 'mockup'      # Replace with your BigQuery dataset ID
table_id = 'tb_b'          # Replace with your BigQuery table ID

query = f"""
SELECT
    *
FROM
    `{project_id}.{dataset_id}.{table_id}`
LIMIT 100
"""

query_job = client.query(query)
results = query_job.result()

# Display the results (first 100 rows)
for i, row in enumerate(results):
    if i == 0:
        print(row.keys()) # Print column names
    print(row)

dict_keys(['loan_req_id', 'inquiry_id', 'table_reg_no', 'product_id', 'inquiry_date', 'offer_limit', 'offer_rate', 'channel', 'is_accepted'])
Row(('REQ000349', 'INQ000349', 1, '론A', datetime.date(2025, 7, 1), 8930000.0, 19.092683802299995, 'Direct', False), {'loan_req_id': 0, 'inquiry_id': 1, 'table_reg_no': 2, 'product_id': 3, 'inquiry_date': 4, 'offer_limit': 5, 'offer_rate': 6, 'channel': 7, 'is_accepted': 8})
Row(('REQ000349', 'INQ000349', 2, '론B', datetime.date(2025, 7, 1), 5635000.0, 12.337965671908423, 'Direct', False), {'loan_req_id': 0, 'inquiry_id': 1, 'table_reg_no': 2, 'product_id': 3, 'inquiry_date': 4, 'offer_limit': 5, 'offer_rate': 6, 'channel': 7, 'is_accepted': 8})
Row(('REQ000349', 'INQ000349', 3, '론C', datetime.date(2025, 7, 1), 3144000.0, 13.96071547671699, 'Direct', False), {'loan_req_id': 0, 'inquiry_id': 1, 'table_reg_no': 2, 'product_id': 3, 'inquiry_date': 4, 'offer_limit': 5, 'offer_rate': 6, 'channel': 7, 'is_accepted': 8})
Row(('REQ000561', 'INQ000561', 1, 

In [14]:
from google.cloud.bigquery import Client

# Assuming the 'client' object is already initialized from previous steps
# client = Client()

project_id = 'bigquery-mockup-test'  # Replace with your GCP project ID
dataset_id = 'mockup'      # Replace with your BigQuery dataset ID
table_id_a = 'tb_a'          # Replace with your BigQuery table A ID
table_id_b = 'tb_b'          # Replace with your BigQuery table B ID

query = f"""
SELECT
    a.*,
    b.offer_limit
FROM
    `{project_id}.{dataset_id}.{table_id_a}` AS a
JOIN
    (
        SELECT
            inquiry_id,
            offer_limit,
            ROW_NUMBER() OVER(PARTITION BY inquiry_id ORDER BY table_reg_no DESC) as rn -- For each inquiry_id, rn = 1 marks the row with the largest (most recent) table_reg_no
        FROM
            `{project_id}.{dataset_id}.{table_id_b}`
    ) AS b
ON a.inquiry_id = b.inquiry_id
WHERE b.rn = 1 AND b.offer_limit = 0 -- Added condition to filter for offer_limit = 0
LIMIT 100 -- Displaying the first 100 rows as an example
"""

query_job = client.query(query)
results = query_job.result()

# Display the results (first 100 rows)
for i, row in enumerate(results):
    if i == 0:
        print(row.keys()) # Print column names
    print(row)
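
To make the `ROW_NUMBER()` step in the query above more concrete, here is a rough plain-Python sketch of the same selection: keep the row with the largest `table_reg_no` for each `inquiry_id`, then filter on `offer_limit`. The helper name `latest_per_inquiry` is hypothetical, and the sample rows are the three `tb_b` rows for `INQ000349` printed earlier. When two rows tie on `table_reg_no`, this sketch keeps the first one seen, whereas BigQuery does not guarantee which tied row receives `rn = 1`.

```python
from typing import Dict, Iterable, List


def latest_per_inquiry(rows: Iterable[dict]) -> List[dict]:
    """Keep, for each inquiry_id, the row with the largest table_reg_no.

    Mirrors ROW_NUMBER() OVER (PARTITION BY inquiry_id
    ORDER BY table_reg_no DESC) followed by a filter on rn = 1.
    """
    latest: Dict[str, dict] = {}
    for row in rows:
        current = latest.get(row["inquiry_id"])
        if current is None or row["table_reg_no"] > current["table_reg_no"]:
            latest[row["inquiry_id"]] = row
    return list(latest.values())


# The three tb_b rows shown earlier for inquiry INQ000349.
offers = [
    {"inquiry_id": "INQ000349", "table_reg_no": 1, "offer_limit": 8930000.0},
    {"inquiry_id": "INQ000349", "table_reg_no": 2, "offer_limit": 5635000.0},
    {"inquiry_id": "INQ000349", "table_reg_no": 3, "offer_limit": 3144000.0},
]

latest = latest_per_inquiry(offers)
print(latest)

# The query then keeps only inquiries whose latest offer_limit is 0.
zero_limit = [row for row in latest if row["offer_limit"] == 0]
print(zero_limit)
```

For this sample the latest offer has a non-zero limit, so the inquiry would not appear in the joined result.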