Activate a Python virtual environment and install all packages before running the notebook.

1. create a new virtual environment
```
python -m venv venv
```

2. activate the virtual environment
```
# Windows (Git Bash); on macOS/Linux use: source venv/bin/activate
source venv/Scripts/activate
```

3. install packages
```
pip install -r requirements.txt
```

4. make sure your kernel is switched to the venv python kernel
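
Step 4 can be checked from inside the notebook. A minimal sketch, assuming the environment was created with `python -m venv` as in step 1; the helper name `in_virtualenv` is made up for illustration:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Running inside a virtual environment:", in_virtualenv())
```

If the second line prints False, the notebook kernel is probably still the system Python rather than the venv kernel.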






Goal: Examine correlations between global, socially significant events and Bitcoin blockchain metrics (block congestion, price movement, metadata notes) over time.
Data source: public Bitcoin blockchain datasets stored in BigQuery.
To begin, I installed the following libraries:
pip install google-cloud-bigquery pandas pyarrow

Once I started querying the BigQuery database, I also had to install:
pip install db-dtypes
to give pandas the extension dtypes it needs for SQL types such as DATE and TIME (for example, the dbdate dtype that shows up in the DataFrame summaries below).
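
Before running the cells below, it can help to confirm that these four packages are installed in the active environment. A minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical:

```python
from importlib import metadata

# Distribution names as passed to pip in the commands above
REQUIRED = ["google-cloud-bigquery", "pandas", "pyarrow", "db-dtypes"]


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print("Missing packages:", missing_packages(REQUIRED))
```

An empty list means everything is present; anything listed can be installed with pip as shown above.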

Tables used:
 
A. bigquery-public-data.crypto_bitcoin.blocks table
B. bigquery-public-data.crypto_bitcoin.transactions table



Note to self: I need to send Eileen a Slack message with the contents of my credentials file and let her know she needs it to run my project.

In [2]:
from google.cloud import bigquery
import os
# Set the environment variable for the key file (tells Python where my login key file is)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "Credentials/config.json"
# Start the BigQuery client using my key
client = bigquery.Client()
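
A hedged sketch of checking that the key file exists before creating the client, so a mistyped path is caught with a plain message. The helper name `set_credentials_path` is an assumption; the path in the usage comment is the one from the cell above:

```python
import os
from pathlib import Path


def set_credentials_path(path):
    """Point GOOGLE_APPLICATION_CREDENTIALS at `path` after checking the file exists."""
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"Credentials file not found: {key_file}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(key_file)
    return key_file


# Usage (same path as in the cell above):
# set_credentials_path("Credentials/config.json")
# client = bigquery.Client()
```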



Due to data constraints, I decided to narrow my dataset to the 6-month period surrounding the start of the COVID-19 lockdown in the US (March 1st, 2020 to March 30th, 2020).
Due to issues with the original datasets I was working with (they did not have data for the date ranges I wanted), I need to change and update the list of tables I'm using.



In [3]:
# Previewing a few rows of the new table "crypto_bitcoin.blocks" to see which columns it has:
query_sample = """
SELECT *
FROM `bigquery-public-data.crypto_bitcoin.blocks`
LIMIT 5
"""

df_sample = client.query(query_sample).to_dataframe()
df_sample.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype              
---  ------             --------------  -----              
 0   hash               5 non-null      object             
 1   size               5 non-null      Int64              
 2   stripped_size      5 non-null      Int64              
 3   weight             5 non-null      Int64              
 4   number             5 non-null      Int64              
 5   version            5 non-null      Int64              
 6   merkle_root        5 non-null      object             
 7   timestamp          5 non-null      datetime64[us, UTC]
 8   timestamp_month    5 non-null      dbdate             
 9   nonce              5 non-null      object             
 10  bits               5 non-null      object             
 11  coinbase_param     5 non-null      object             
 12  transaction_count  5 non-null      Int64              

In [4]:
# Checking if the new table "crypto_bitcoin.blocks" has information for the time period I'm looking for:
query_check_range = """
SELECT
  MIN(timestamp) AS start_date,
  MAX(timestamp) AS end_date
FROM `bigquery-public-data.crypto_bitcoin.blocks`
"""

df_range = client.query(query_check_range).to_dataframe()
df_range

                 start_date                  end_date
0 2009-01-03 18:15:05+00:00 2025-07-30 01:22:59+00:00


So the crypto_bitcoin.blocks table has block information from January 3rd, 2009 through July 30th, 2025, which covers the period I need.
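
The same coverage check can be written in code. A minimal sketch using the two timestamps returned above and the January to June 2020 window queried in the later cells; the helper name `covers` is hypothetical:

```python
from datetime import datetime, timezone


def covers(available_start, available_end, wanted_start, wanted_end):
    """Return True when the wanted window lies entirely inside the available range."""
    return available_start <= wanted_start and wanted_end <= available_end


# Range reported by the query above
available_start = datetime(2009, 1, 3, 18, 15, 5, tzinfo=timezone.utc)
available_end = datetime(2025, 7, 30, 1, 22, 59, tzinfo=timezone.utc)

# Window of interest (first half of 2020)
wanted_start = datetime(2020, 1, 1, tzinfo=timezone.utc)
wanted_end = datetime(2020, 6, 30, tzinfo=timezone.utc)

print(covers(available_start, available_end, wanted_start, wanted_end))  # True
```

This only compares the endpoints; it does not show whether there are gaps inside the range, which is what the row counts in the later cells help with.
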
Now checking the type of information contained in the crypto_bitcoin.transactions table:

In [5]:
# Checking the type of information contained in the crypto_bitcoin.transactions table:
query_preview_tx = """
SELECT *
FROM `bigquery-public-data.crypto_bitcoin.transactions`
LIMIT 5
"""
df_tx = client.query(query_preview_tx).to_dataframe()
df_tx.info()
df_tx.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 17 columns):
 #   Column                 Non-Null Count  Dtype              
---  ------                 --------------  -----              
 0   hash                   5 non-null      object             
 1   size                   5 non-null      Int64              
 2   virtual_size           5 non-null      Int64              
 3   version                5 non-null      Int64              
 4   lock_time              5 non-null      Int64              
 5   block_hash             5 non-null      object             
 6   block_number           5 non-null      Int64              
 7   block_timestamp        5 non-null      datetime64[us, UTC]
 8   block_timestamp_month  5 non-null      dbdate             
 9   input_count            5 non-null      Int64              
 10  output_count           5 non-null      Int64              
 11  input_value            5 non-null      object             
 12

Unnamed: 0,hash,size,virtual_size,version,lock_time,block_hash,block_number,block_timestamp,block_timestamp_month,input_count,output_count,input_value,output_value,is_coinbase,fee,inputs,outputs
0,f470e9dd02036be070c68384b2ba47d1bfe873c30a37ac...,223,223,1,0,000000000000000000021e323a7cada08a157493cef87e...,907745,2025-07-30 00:47:05+00:00,2025-07-01,1,2,15814034.0,15807254.0,False,6780.0,"[{'index': 0, 'spent_transaction_hash': '5d4f6...","[{'index': 0, 'script_asm': '0 1139d490a1c69df..."
1,03f9d3b4535bba3deefa1faaa4acc8a1005ca5d37186cc...,222,141,1,0,000000000000000000021e323a7cada08a157493cef87e...,907745,2025-07-30 00:47:05+00:00,2025-07-01,1,2,403665.0,403101.0,False,564.0,"[{'index': 0, 'spent_transaction_hash': 'afa6b...","[{'index': 0, 'script_asm': '0 026f87eb5ca55a1..."
2,7d205e01ea36c211804d26142558e4aa6d75c37ea6b8bf...,313,212,2,0,000000000000000000021e323a7cada08a157493cef87e...,907745,2025-07-30 00:47:05+00:00,2025-07-01,2,2,465605.0,465351.0,False,254.0,"[{'index': 0, 'spent_transaction_hash': '79037...","[{'index': 0, 'script_asm': '1 fe5271d4aa8cef9..."
3,c35a862fb1f4781f90fc63e64e0697371cab65572b91b9...,355,196,2,0,000000000000000000021e323a7cada08a157493cef87e...,907745,2025-07-30 00:47:05+00:00,2025-07-01,1,3,24458.0,24335.0,False,123.0,"[{'index': 0, 'spent_transaction_hash': 'a9e5a...","[{'index': 0, 'script_asm': '0 39215284c18e35c..."
4,d80b0dff00193ff28d0dce119be962ebeb0412c2519da5...,355,196,2,0,000000000000000000021e323a7cada08a157493cef87e...,907745,2025-07-30 00:47:05+00:00,2025-07-01,1,3,30681.0,30558.0,False,123.0,"[{'index': 0, 'spent_transaction_hash': 'b99de...","[{'index': 0, 'script_asm': '0 39215284c18e35c..."


In [6]:
# Now checking to make sure there's enough data during Jan-June 2020 in the transactions table:
query_row_count = """
SELECT COUNT(*) as transaction_count
FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp BETWEEN '2020-01-01' AND '2020-06-30'
"""
df_row_count = client.query(query_row_count).to_dataframe()
print(df_row_count)

   transaction_count
0           54643498
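
One caveat on the date filter: when a date-only string such as '2020-06-30' is compared with a TIMESTAMP column, it is, as far as I understand BigQuery's coercion rules, read as midnight at the start of that day, so transactions later on June 30 fall outside the BETWEEN range. The sketch below reproduces the comparison with plain Python datetimes and shows a half-open alternative. The query text is an assumed variation on the cell above and was not run for this notebook, so the count above is unchanged.

```python
from datetime import datetime, timezone

start = datetime(2020, 1, 1, tzinfo=timezone.utc)
between_end = datetime(2020, 6, 30, tzinfo=timezone.utc)    # '2020-06-30' read as midnight
half_open_end = datetime(2020, 7, 1, tzinfo=timezone.utc)   # first instant after the window

# A hypothetical transaction at noon on the last day of June
noon_june_30 = datetime(2020, 6, 30, 12, 0, tzinfo=timezone.utc)

in_between = start <= noon_june_30 <= between_end
in_half_open = start <= noon_june_30 < half_open_end
print(in_between, in_half_open)  # False True

# Half-open version of the row-count query (assumed variation, not run here)
query_row_count_half_open = """
SELECT COUNT(*) AS transaction_count
FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp >= '2020-01-01'
  AND block_timestamp < '2020-07-01'
"""
```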


In [7]:
# Doing a small check to see if I can join the blocks and transactions tables:
query_join_test = """
SELECT
  t.hash AS transaction_id,
  t.block_number,
  b.hash AS block_hash,
  b.timestamp AS block_time,
  t.fee,
  t.input_value,
  t.output_value
FROM
  `bigquery-public-data.crypto_bitcoin.transactions` t
JOIN
  `bigquery-public-data.crypto_bitcoin.blocks` b
ON
  t.block_number = b.number
WHERE
  t.block_timestamp BETWEEN '2020-01-01' AND '2020-06-30'
LIMIT 10
"""
df_join_test = client.query(query_join_test).to_dataframe()
df_join_test.head()

Unnamed: 0,transaction_id,block_number,block_hash,block_time,fee,input_value,output_value
0,e26b5b8a792a9d1fcdf44d7501153b3bea3f4e1479388c...,616863,00000000000000000000c6aabdf67a4ceb247bf103f37e...,2020-02-11 02:25:44+00:00,121000.0,16824785.0,16703785.0
1,f3e20db695d6bdfc2f61f2caa436d63acdbf18bc3d12bf...,619287,00000000000000000006b0bf44a5f27600c448437200c8...,2020-02-28 00:34:51+00:00,31200.0,130000.0,98800.0
2,16834157e81885ff86478c070e5ef14aca88d40a201b74...,617057,0000000000000000000e6ca0cf6564cbb407733db13fa7...,2020-02-12 10:25:55+00:00,31200.0,130000.0,98800.0
3,a7f2d7fb2b0b747dc4081df3b4ddfdddeadb85cfb8c963...,618956,0000000000000000000f71d13d2947fcdf80f5c6089922...,2020-02-25 17:06:35+00:00,31200.0,130000.0,98800.0
4,5e272a95c2cc52ba6d9c45b84db1befe10d3a79ac18a89...,616183,000000000000000000099232a8e5315086be3a9a4cd5c2...,2020-02-06 03:03:04+00:00,31200.0,130000.0,98800.0


In [8]:
query_blocks_covid = """
SELECT *
FROM `bigquery-public-data.crypto_bitcoin.blocks`
WHERE timestamp BETWEEN '2020-01-01' AND '2020-06-30'
"""
df_blocks = client.query(query_blocks_covid).to_dataframe()
df_blocks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26253 entries, 0 to 26252
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype              
---  ------             --------------  -----              
 0   hash               26253 non-null  object             
 1   size               26253 non-null  Int64              
 2   stripped_size      26253 non-null  Int64              
 3   weight             26253 non-null  Int64              
 4   number             26253 non-null  Int64              
 5   version            26253 non-null  Int64              
 6   merkle_root        26253 non-null  object             
 7   timestamp          26253 non-null  datetime64[us, UTC]
 8   timestamp_month    26253 non-null  dbdate             
 9   nonce              26253 non-null  object             
 10  bits               26253 non-null  object             
 11  coinbase_param     26253 non-null  object             
 12  transaction_count  26253 non-null  Int64      

Since the bitcoin transactions table for March 2020 adds up to 10 million rows, I decided to extract only 100K rows of it.

In [9]:
# Getting a sample (100k rows) of data from the bitcoin transactions table for March 2020:
query_transactions_march_2020 = """
SELECT *
FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp BETWEEN '2020-03-01' AND '2020-03-31 23:59:59'
LIMIT 100000
"""
df_transactions = client.query(query_transactions_march_2020).to_dataframe()
df_transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 17 columns):
 #   Column                 Non-Null Count   Dtype              
---  ------                 --------------   -----              
 0   hash                   100000 non-null  object             
 1   size                   100000 non-null  Int64              
 2   virtual_size           100000 non-null  Int64              
 3   version                100000 non-null  Int64              
 4   lock_time              100000 non-null  Int64              
 5   block_hash             100000 non-null  object             
 6   block_number           100000 non-null  Int64              
 7   block_timestamp        100000 non-null  datetime64[us, UTC]
 8   block_timestamp_month  100000 non-null  dbdate             
 9   input_count            100000 non-null  Int64              
 10  output_count           100000 non-null  Int64              
 11  input_value            99961 non-null   

In [10]:
(df_transactions['block_timestamp'].min(), df_transactions['block_timestamp'].max())

(Timestamp('2020-03-01 00:04:03+0000', tz='UTC'),
 Timestamp('2020-03-31 23:51:06+0000', tz='UTC'))
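
Note that LIMIT without ORDER BY returns whichever rows BigQuery produces first, so the 100K rows above are an arbitrary subset rather than a random sample. One possible alternative is to filter on RAND(), which keeps each row with a fixed probability. A sketch that only builds the query text; the helper name `build_sample_query` and the 1% fraction are assumptions, and this query was not run for this notebook:

```python
def build_sample_query(fraction, limit):
    """Build a query that keeps each March 2020 transaction with probability `fraction`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return f"""
SELECT *
FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp BETWEEN '2020-03-01' AND '2020-03-31 23:59:59'
  AND RAND() < {fraction}
LIMIT {int(limit)}
"""


# About 1% of roughly 10 million rows is close to 100K rows
query_transactions_sampled = build_sample_query(fraction=0.01, limit=100000)
print(query_transactions_sampled)
```

The RAND() filter is applied after the rows are read, so it changes which rows come back but not how much data the query scans.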

In [11]:
# SQLite start
import sqlite3


In [12]:
# Connecting to a new SQLite database to create a file for my dataset
connection = sqlite3.connect("bitcoin_data.db")

In [13]:
# Saving the blocks DataFrame into the database as a new table called "blocks"
table_name = "blocks"
data = df_blocks
# Writing the data in df_blocks to the database
data.to_sql(
    name=table_name,
    con=connection,
    if_exists="replace",
    index=False,
)


26253
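
To confirm the write, the row count can be read back from the database and compared with the 26253 rows reported by to_sql. A minimal sketch; the helper name `table_row_count` is hypothetical, and the demo uses a throwaway in-memory database with made-up rows so it runs without the real file:

```python
import sqlite3


def table_row_count(connection, table_name):
    """Return the number of rows in `table_name` for an open SQLite connection."""
    # Table names cannot be bound as parameters, so quote the identifier instead.
    quoted = '"' + table_name.replace('"', '""') + '"'
    cursor = connection.execute(f"SELECT COUNT(*) FROM {quoted}")
    return cursor.fetchone()[0]


# Demo on an in-memory database with made-up rows
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE blocks (number INTEGER, transaction_count INTEGER)")
demo.executemany("INSERT INTO blocks VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
print(table_row_count(demo, "blocks"))  # 3
```

In the notebook, `table_row_count(connection, "blocks")` would be expected to return 26253 if the write succeeded.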

In [14]:
#Testing my code to see if it is working:
query1 = """
    SELECT block_id
    FROM `bigquery-public-data.bitcoin_blockchain.blocks`
    LIMIT 5
"""


df = client.query(query1).to_dataframe()
print(df)

                                            block_id
0  00000000000000000045fc015fe17335fdedb9ebca4b3d...
1  000000000000000000051cb4a6272786418c5357bff9d0...
2  000000000000000000d091f9c5d7815e1a685f702b4164...
3  000000000000000000235b8a4ec7a7cc12c1c9133f2d34...
4  000000000000000000719106d55fea7af59d31cacef736...


In [15]:
import sys
print(sys.executable)


c:\C drive Code You Projects\Code You Data Analyst Course Module 3\Module 3 Capstone Project\venv\Scripts\python.exe


In [16]:
query2 = """
    SELECT *
    FROM `bigquery-public-data.bitcoin_blockchain.blocks`
    LIMIT 5
"""
# Note: query2 is only defined here, not run, so df still holds the query1 result
print(df.columns)

Index(['block_id'], dtype='object')


In [17]:
query3 = """
    SELECT *
    FROM `bigquery-public-data.bitcoin_blockchain.transactions`
    LIMIT 5
"""
# Note: query3 is only defined here, not run, so df still holds the query1 result
print(df.columns)

Index(['block_id'], dtype='object')


In [18]:
query4 = """
    SELECT 
      o.script_asm,
      o.block_number,
      o.value,
      o.transaction_hash
    FROM `bigquery-public-data.crypto_bitcoin.outputs` AS o
    WHERE o.script_asm LIKE '%OP_RETURN%'
    LIMIT 10
"""

df = client.query(query4).to_dataframe()
print(df.head())


                                          script_asm  block_number value  \
0                               OP_RETURN 13 1310996        907680  0E-9   
1                          OP_RETURN 13 140114001602        907680  0E-9   
2                               OP_RETURN 13 1310996        907680  0E-9   
3  OP_RETURN 3d3a653a3078666266356333363662316231...        907680  0E-9   
4                          OP_RETURN 13 140114001602        907680  0E-9   

                                    transaction_hash  
0  b8befb041f89d368e0ab1cbebaa68cb4780c4a3b33188a...  
1  61b6bf4899257f6568b0e36bd8bf292c6a616ff8cf2b7e...  
2  779c7d4dbc96b85af72198bcc63e6f927ead3033492fd9...  
3  5e67f95bdac37babd635cd29f9c17ce1efca9176bd9903...  
4  dd70319da42fdfb9d9b9327e0e299253dee59c92b46416...  


In [19]:
query5 = """
    SELECT *
    FROM `bigquery-public-data.crypto_bitcoin.outputs` AS o
    WHERE o.script_asm LIKE '%OP_RETURN%'
    LIMIT 10
"""

df = client.query(query5).to_dataframe()
print(df.columns)

Index(['transaction_hash', 'block_hash', 'block_number', 'block_timestamp',
       'index', 'script_asm', 'script_hex', 'required_signatures', 'type',
       'addresses', 'value'],
      dtype='object')
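
The script_asm values above appear to hold the OP_RETURN payload as a hex string after the opcode. A hedged sketch of turning such a payload into text: the helper name `decode_op_return` is made up, the sample hex is invented for illustration (it is the hex encoding of "hello"), and the code assumes the payload is the last whitespace-separated token, which may not hold for every script.

```python
def decode_op_return(script_asm):
    """Return the text after OP_RETURN decoded from hex, or None if it is not valid hex."""
    tokens = script_asm.split()
    if len(tokens) < 2 or tokens[0] != "OP_RETURN":
        return None
    payload = tokens[-1]
    try:
        raw = bytes.fromhex(payload)
    except ValueError:
        # Odd length or non-hex characters: not a plain hex payload
        return None
    # Replace undecodable bytes instead of failing on binary data
    return raw.decode("utf-8", errors="replace")


print(decode_op_return("OP_RETURN 68656c6c6f"))    # hello
print(decode_op_return("OP_RETURN 13 1310996"))    # None (odd-length token)
```

Many payloads are binary protocol data rather than readable messages, so the decoded text will often need further filtering.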


Stopped here as of 5/31. When I left off, I had confirmed that these tables:
A. bigquery-public-data.bitcoin_blockchain.blocks table
B. bigquery-public-data.bitcoin_blockchain.transactions table
C. bigquery-public-data.crypto_bitcoin.outputs table
contained the data I needed to connect block timestamps to the messages hidden in OP_RETURN scripts (the script_asm column of the outputs table):
1. blocks → has block_hash and block_timestamp
2. transactions → has both transaction_hash and block_hash
3. outputs → has transaction_hash
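
The join path listed above (outputs to transactions on transaction_hash, then transactions to blocks on block_hash) can be tried out on toy data first. A minimal sketch with an in-memory SQLite database; all rows are made up, and the column names follow the list above rather than a verified schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blocks (block_hash TEXT, block_timestamp TEXT);
CREATE TABLE transactions (transaction_hash TEXT, block_hash TEXT);
CREATE TABLE outputs (transaction_hash TEXT, script_asm TEXT);

INSERT INTO blocks VALUES ('b1', '2020-03-13 10:00:00');
INSERT INTO transactions VALUES ('t1', 'b1'), ('t2', 'b1');
INSERT INTO outputs VALUES ('t1', 'OP_RETURN 68656c6c6f'), ('t2', '0 abcd');
""")

# Walk outputs -> transactions -> blocks and keep only OP_RETURN outputs
rows = conn.execute("""
SELECT b.block_timestamp, o.script_asm
FROM outputs AS o
JOIN transactions AS t ON o.transaction_hash = t.transaction_hash
JOIN blocks AS b ON t.block_hash = b.block_hash
WHERE o.script_asm LIKE '%OP_RETURN%'
""").fetchall()

print(rows)  # [('2020-03-13 10:00:00', 'OP_RETURN 68656c6c6f')]
```

Since the crypto_bitcoin.outputs columns printed earlier already include block_timestamp, the join may turn out to be unnecessary for that table.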



