Goal: Examine correlations between global, socially significant events and Bitcoin blockchain metrics (block congestion, price movement, metadata notes) over time.
Data source: public Bitcoin blockchain datasets stored in BigQuery.
To begin, I installed the following libraries:
pip install google-cloud-bigquery pandas pyarrow

Once I started querying BigQuery, I also had to install:
pip install db-dtypes
which to_dataframe() relies on to represent BigQuery SQL data types (for example DATE and TIME) in pandas.

Tables used:
 
A. bigquery-public-data.bitcoin_blockchain.blocks table
B. bigquery-public-data.bitcoin_blockchain.transactions table
C. bigquery-public-data.crypto_bitcoin.outputs table


In [1]:
%pip install -r requirements.txt


Defaulting to user installation because normal site-packages is not writeable
Collecting google-cloud-bigquery-storage==2.32.0 (from -r requirements.txt (line 3))
  Downloading google_cloud_bigquery_storage-2.32.0-py3-none-any.whl.metadata (9.8 kB)
Downloading google_cloud_bigquery_storage-2.32.0-py3-none-any.whl (296 kB)
Installing collected packages: google-cloud-bigquery-storage
Successfully installed google-cloud-bigquery-storage-2.32.0
Note: you may need to restart the kernel to use updated packages.


In [2]:
from google.cloud import bigquery
import os
# Set the environment variable for the key file (tells Python where my service account key is)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "Credentials/capstone-project-461516-4b2a88c5f4cd.json"
#Start the BigQuery client using my key
client = bigquery.Client()



In [None]:
# Query the bitcoin_blockchain.blocks table to see which columns it has:
query0 = """
    SELECT *
    FROM `bigquery-public-data.bitcoin_blockchain.blocks`
    LIMIT 5

"""
df_blocks = client.query(query0).to_dataframe()
df_blocks.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   block_id          5 non-null      object
 1   previous_block    5 non-null      object
 2   merkle_root       5 non-null      object
 3   timestamp         5 non-null      Int64 
 4   difficultyTarget  5 non-null      Int64 
 5   nonce             5 non-null      Int64 
 6   version           5 non-null      Int64 
 7   work_terahash     5 non-null      Int64 
 8   work_error        0 non-null      object
 9   transactions      5 non-null      object
 10  row_number        5 non-null      Int64 
dtypes: Int64(6), object(5)
memory usage: 602.0+ bytes
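
A possible alternative for inspecting columns is to read the table metadata instead of running SELECT * with a LIMIT. The sketch below assumes the `client` created earlier; `list_columns` is a hypothetical helper name, while `get_table` and the `name`, `field_type`, and `mode` attributes of each schema field come from google-cloud-bigquery. It also reports the SQL type of each column, which df.info() does not show for nested fields such as transactions.

```python
def list_columns(client, table_id):
    """Return (name, type, mode) for each top-level column of a table.

    `client` is expected to be a google.cloud.bigquery.Client. get_table
    fetches table metadata only, so no query job is started.
    """
    table = client.get_table(table_id)
    return [(field.name, field.field_type, field.mode) for field in table.schema]


# Example (needs the authenticated `client` from the cell above):
# for name, field_type, mode in list_columns(
#     client, "bigquery-public-data.bitcoin_blockchain.blocks"
# ):
#     print(name, field_type, mode)
```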


In [7]:
# Query the bitcoin_blockchain.transactions table to see which columns it has:
query01 = """
    SELECT *
    FROM `bigquery-public-data.bitcoin_blockchain.transactions`
    LIMIT 5

"""
df_transactions = client.query(query01).to_dataframe()
df_transactions.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   timestamp       5 non-null      Int64 
 1   transaction_id  5 non-null      object
 2   inputs          5 non-null      object
 3   outputs         5 non-null      object
 4   block_id        5 non-null      object
 5   previous_block  5 non-null      object
 6   merkle_root     5 non-null      object
 7   nonce           5 non-null      Int64 
 8   version         5 non-null      Int64 
 9   work_terahash   5 non-null      Int64 
 10  work_error      0 non-null      object
dtypes: Int64(4), object(7)
memory usage: 592.0+ bytes


In [8]:
df_transactions.head()


Unnamed: 0,timestamp,transaction_id,inputs,outputs,block_id,previous_block,merkle_root,nonce,version,work_terahash,work_error
0,1236391160000,8326f021b328ee0d14d4d366a9f6e6f0be68ea60f7e5a6...,[{'input_script_bytes': b'\x04\xff\xff\x00\x1d...,"[{'output_satoshis': 5000000000, 'output_scrip...",000000007fc54ccef03a1386155e5fd10229a80d592bd6...,00000000b46bb9d830499302bcf3e544e679897ab59553...,8326f021b328ee0d14d4d366a9f6e6f0be68ea60f7e5a6...,71398302,1,0,
1,1277390708000,85d7c3a8beafe9bcacffe974a84ced6ad08ff8f2bf5ba2...,[{'input_script_bytes': b'\x04B1\r\x1c\x02O\x1...,"[{'output_satoshis': 5000000000, 'output_scrip...",000000000393cb2dc72fc296c66d9a19d2c01493e70c6d...,000000000aee77db52b067bf6df9367b0b7cf27af8eae9...,f397614fb38ff8a145124839e15b7967567d22857b0fc4...,353990997,1,0,
2,1262550231000,13ab738b73ee3b58cdf78ce85e1c77e1c37626a2806b28...,[{'input_script_bytes': b'\x04j\xd8\x00\x1d\x0...,"[{'output_satoshis': 5000000000, 'output_scrip...",00000000c57747a26508ddee9c24fd14d7e2cd597af20d...,000000009f3ab6a1c4f6887b1039deb6b28a5e00e6bd80...,13ab738b73ee3b58cdf78ce85e1c77e1c37626a2806b28...,167000329,1,0,
3,1272343543000,d62314bb2d951f836ca9456cf2b951ddd74120b59ec299...,[{'input_script_bytes': b'\x04oT\x16\x1c\x02a\...,"[{'output_satoshis': 5000000000, 'output_scrip...",00000000080a0475223cbbc2da042d2e172b1de0f36fe4...,000000000769d691530735d4ec558c9857da83cd9767df...,d62314bb2d951f836ca9456cf2b951ddd74120b59ec299...,172080494,1,0,
4,1257606481000,28ccad7d77ecbb1f5ecc456168831818ef2f1bf27d3e88...,[{'input_script_bytes': b'\x04\xff\xff\x00\x1d...,"[{'output_satoshis': 5000000000, 'output_scrip...",0000000095d27443f3592d1486f444f1a3773a2dbde637...,0000000089c4eb2d48b233587f89afa255b61f497b0310...,28ccad7d77ecbb1f5ecc456168831818ef2f1bf27d3e88...,2849869340,1,0,
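
The timestamp and output_satoshis values above are raw integers. The sketch below converts them to readable units, assuming that timestamp is a Unix epoch in milliseconds (inferred from the magnitude of the values, not confirmed by the table documentation) and using the standard 100,000,000 satoshis per BTC. The helper names are hypothetical.

```python
from datetime import datetime, timezone

SATOSHIS_PER_BTC = 100_000_000  # 1 BTC = 100,000,000 satoshis


def epoch_ms_to_utc(ms):
    """Convert epoch milliseconds (assumed unit) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def satoshis_to_btc(satoshis):
    """Convert an integer satoshi amount to BTC."""
    return satoshis / SATOSHIS_PER_BTC


# Values taken from row 0 of the output above
print(epoch_ms_to_utc(1236391160000).isoformat())  # 2009-03-07T01:59:20+00:00
print(satoshis_to_btc(5000000000))                 # 50.0
```

For the whole column, pd.to_datetime(df_transactions["timestamp"], unit="ms", utc=True) should give the same result under the same assumption.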


In [7]:
# Query the crypto_bitcoin.outputs table to see which columns it has:
query02 = '''
SELECT *
FROM `bigquery-public-data.crypto_bitcoin.outputs`
LIMIT 5

'''
df_outputs = client.query(query02).to_dataframe()
df_outputs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype              
---  ------               --------------  -----              
 0   transaction_hash     5 non-null      object             
 1   block_hash           5 non-null      object             
 2   block_number         5 non-null      Int64              
 3   block_timestamp      5 non-null      datetime64[us, UTC]
 4   index                5 non-null      Int64              
 5   script_asm           5 non-null      object             
 6   script_hex           5 non-null      object             
 7   required_signatures  0 non-null      Int64              
 8   type                 5 non-null      object             
 9   addresses            5 non-null      object             
 10  value                5 non-null      object             
dtypes: Int64(3), datetime64[us, UTC](1), object(7)
memory usage: 587.0+ bytes


In [3]:
#Testing my code to see if it is working:
query1 = """
    SELECT block_id
    FROM `bigquery-public-data.bitcoin_blockchain.blocks`
    LIMIT 5
"""


df = client.query(query1).to_dataframe()
print(df)

                                            block_id
0  00000000000000000069564d0540f7d160fc7400e03c49...
1  000000000000000000022ab0c5740584d31006ddcd1e20...
2  000000000000000001628d23a7ba194fe9498676960947...
3  00000000000000000025a758640f78eff1fe5bc472ff03...
4  0000000000000000004769939c52cfa8e8f1585b580080...




In [4]:
import sys
print(sys.executable)


c:\Python313\python.exe


In [5]:
query2 = """
    SELECT *
    FROM `bigquery-public-data.bitcoin_blockchain.blocks`
    LIMIT 5
"""
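# NOTE: query2 is defined but never executed here, so df still holds the
# query1 result and only 'block_id' is printed. To inspect the full schema, run:
# df = client.query(query2).to_dataframe()
print(df.columns)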

Index(['block_id'], dtype='object')


In [6]:
query3 = """
    SELECT *
    FROM `bigquery-public-data.bitcoin_blockchain.transactions`
    LIMIT 5
"""
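# NOTE: query3 is also never executed; df is still the query1 result.
# To inspect the transactions schema, run:
# df = client.query(query3).to_dataframe()
print(df.columns)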

Index(['block_id'], dtype='object')


In [7]:
query4 = """
    SELECT 
      o.script_asm,
      o.block_number,
      o.value,
      o.transaction_hash
    FROM `bigquery-public-data.crypto_bitcoin.outputs` AS o
    WHERE o.script_asm LIKE '%OP_RETURN%'
    LIMIT 10
"""

df = client.query(query4).to_dataframe()
print(df.head())


                                          script_asm  block_number value  \
0  OP_RETURN 58325bbc0b0f83bf38d794ddddcd41ac0944...        906690  0E-9   
1  OP_RETURN 58325bbc0b0f83bf38d794ddddcd41ac0944...        906690  0E-9   
2  OP_RETURN 00034db3c093ea8ba03987aed40813c04acb...        906690  0E-9   
3  OP_RETURN 58325bbc0b0f83bf38d794ddddcd41ac0944...        906690  0E-9   
4  OP_RETURN 58325bbc0b0f83bf38d794ddddcd41ac0944...        906690  0E-9   

                                    transaction_hash  
0  eb62fd85aedcd9fec45bad73d048a1e8d0f48ae17ababd...  
1  124a375deda998fac0b70c561b83cc629117b96ff46b3f...  
2  5161061aa560fb6b9541d97b34c05bf61bb0ff60ab5abb...  
3  71587df89da762662f7e44011dbf2c362f600a7c0c896f...  
4  9abf76d1f14e37a54a2a462fa332c46f3167ff9aa7c39c...  
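
The script_asm values above hold the OP_RETURN payload as hex. The sketch below tries to turn a payload into readable text, assuming script_asm has the form "OP_RETURN" followed by one or more space-separated hex pushes, as in the rows shown. `decode_op_return` is a hypothetical helper; many payloads are binary protocol data rather than text, so it returns None whenever the bytes are not printable UTF-8.

```python
def decode_op_return(script_asm):
    """Return the OP_RETURN payload as text, or None if it is not readable."""
    parts = script_asm.split()
    if not parts or parts[0] != "OP_RETURN":
        return None  # not an OP_RETURN script
    try:
        # Concatenate every hex push that follows the opcode
        payload = b"".join(bytes.fromhex(token) for token in parts[1:])
    except ValueError:
        return None  # a token was not valid hex (e.g. another opcode)
    if not payload:
        return None  # OP_RETURN with no data attached
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        return None  # binary payload, not text
    return text if text.isprintable() else None


print(decode_op_return("OP_RETURN 68656c6c6f"))  # hello
```

On the dataframe this could be applied as df["message"] = df["script_asm"].map(decode_op_return), after which the non-null rows are the candidate text messages.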




In [8]:
query5 = """
    SELECT *
    FROM `bigquery-public-data.crypto_bitcoin.outputs` AS o
    WHERE o.script_asm LIKE '%OP_RETURN%'
    LIMIT 10
"""

df = client.query(query5).to_dataframe()
print(df.columns)

Index(['transaction_hash', 'block_hash', 'block_number', 'block_timestamp',
       'index', 'script_asm', 'script_hex', 'required_signatures', 'type',
       'addresses', 'value'],
      dtype='object')




Stopped here as of 5/31. When I left off, I could see that these tables:
A. bigquery-public-data.bitcoin_blockchain.blocks table
B. bigquery-public-data.bitcoin_blockchain.transactions table
C. bigquery-public-data.crypto_bitcoin.outputs table
contained the data I need to connect block timestamps to the messages embedded in OP_RETURN outputs (the script_asm column of the outputs table). Column names differ between the two datasets:
1. blocks → has block_id (the block hash) and timestamp
2. transactions → has both transaction_id and block_id
3. outputs → has transaction_hash, block_hash, and block_timestamp
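
The outputs schema printed earlier already includes block_timestamp, so a first pass at linking OP_RETURN messages to dates may not need a join at all. The sketch below counts OP_RETURN outputs per day in a date window. The names `OP_RETURN_DAILY_SQL`, `fetch_daily_op_return_counts`, `@start_ts`, and `@end_ts` are hypothetical; `QueryJobConfig` and `ScalarQueryParameter` are part of google-cloud-bigquery. Whether the date filter reduces the bytes scanned depends on how the table is partitioned, which is an assumption worth checking before running it.

```python
OP_RETURN_DAILY_SQL = """
    SELECT
      DATE(o.block_timestamp) AS block_date,
      COUNT(*) AS op_return_outputs
    FROM `bigquery-public-data.crypto_bitcoin.outputs` AS o
    WHERE o.script_asm LIKE '%OP_RETURN%'
      AND o.block_timestamp >= @start_ts
      AND o.block_timestamp < @end_ts
    GROUP BY block_date
    ORDER BY block_date
"""


def fetch_daily_op_return_counts(client, start_ts, end_ts):
    """Run the daily-count query for [start_ts, end_ts) and return a DataFrame.

    `client` is a google.cloud.bigquery.Client; start_ts and end_ts are
    timezone-aware datetime objects.
    """
    from google.cloud import bigquery  # imported here so the SQL can be reused without it

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_ts", "TIMESTAMP", start_ts),
            bigquery.ScalarQueryParameter("end_ts", "TIMESTAMP", end_ts),
        ]
    )
    return client.query(OP_RETURN_DAILY_SQL, job_config=job_config).to_dataframe()
```

The resulting per-day counts could then be lined up against a table of event dates. A join through transactions and blocks would only be needed for fields that the outputs table does not carry.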



