# BitcoinPetl API Demo

This notebook demonstrates the core ETL helper functions in `bitcoin_petl_utils.py`.  
We’ll fetch live Bitcoin prices, generate demo data, apply a variety of PETL transformations,  
and show how to convert to pandas for further analysis.

## Imports & Setup

We’re loading the minimal runtime dependencies.

* petl for pure-ETL table transforms
* pandas for DataFrame conversion demos
* datetime for timestamp conversions
* Our three API functions from bitcoin_petl_utils.py

In [1]:
!pip install petl pandas requests 
import petl as etl
import pandas as pd
from datetime import datetime
from bitcoin_petl_utils import (
    fetch_btc_price_table,
    filter_recent,
    expand_demo_rows,
)


## Fetch & Expand to 5 Rows

In [2]:
# 1. Fetch a single live row
tbl = fetch_btc_price_table()

# 2. Expand into 5 demo rows, each 60s earlier than the next
demo_tbl = expand_demo_rows(tbl, n=5, dt=60)

print("Demo PETL table (5 rows):")
print(etl.look(demo_tbl))

Demo PETL table (5 rows):
+------------+-----------+
| timestamp  | price_usd |
| 1747612967 |    106307 |
+------------+-----------+
| 1747613027 |    106307 |
+------------+-----------+
| 1747613087 |    106307 |
+------------+-----------+
| 1747613147 |    106307 |
+------------+-----------+
| 1747613207 |    106307 |
+------------+-----------+



You should see a 5-row table with identical price values and timestamps spaced one minute apart.
This confirms that both fetch_btc_price_table() and expand_demo_rows() work.
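
The source of expand_demo_rows() is not shown in this notebook, so the following is only a minimal sketch of logic that would reproduce the output above. The function name `expand_rows_sketch` is hypothetical, and the table is modelled as a plain list of rows with a header row first, the layout PETL accepts as a table.

```python
def expand_rows_sketch(table, n=5, dt=60):
    """Clone the single data row of `table` into n rows spaced dt seconds apart.

    Assumption: the last output row keeps the original timestamp and the
    earlier rows step back in time, matching the demo output.
    """
    header, (timestamp, price) = table[0], table[1]
    rows = [[timestamp - (n - 1 - i) * dt, price] for i in range(n)]
    return [list(header)] + rows


# One fetched row, using the values printed in the demo output
single = [["timestamp", "price_usd"], [1747613207, 106307]]
demo = expand_rows_sketch(single, n=5, dt=60)
for row in demo:
    print(row)
```

The first data row lands at 1747612967, which is 4 × 60 seconds before the fetched timestamp, so five rows span four minutes in total.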

## Convert, Rename & Sort

* Convert raw UNIX timestamp into a readable time_str.

* Cast price_usd to float and rename to price_usd_float.

* Sort rows by price descending to highlight the highest values first.

In [3]:
# Convert UNIX timestamp to human string, cast price, rename cols, then sort by price desc
converted = (
    demo_tbl
    .convert('timestamp', lambda t: datetime.fromtimestamp(t).strftime('%Y-%m-%d %H:%M:%S'))
    .convert('price_usd', float)
    .rename('timestamp', 'time_str')
    .rename('price_usd', 'price_usd_float')
    .sort('price_usd_float', reverse=True)
)

print("After convert → rename → sort:")
print(etl.look(converted))

After convert → rename → sort:
+-----------------------+-----------------+
| time_str              | price_usd_float |
| '2025-05-19 00:02:47' |        106307.0 |
+-----------------------+-----------------+
| '2025-05-19 00:03:47' |        106307.0 |
+-----------------------+-----------------+
| '2025-05-19 00:04:47' |        106307.0 |
+-----------------------+-----------------+
| '2025-05-19 00:05:47' |        106307.0 |
+-----------------------+-----------------+
| '2025-05-19 00:06:47' |        106307.0 |
+-----------------------+-----------------+



The printed table shows the time_str and price_usd_float columns, sorted by price in descending order.
Because every demo row carries the same price, the sort leaves the row order unchanged.
This demonstrates chaining multiple PETL transforms in one pipeline.
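
One caveat about the timestamp conversion: `datetime.fromtimestamp(t)` without a `tz` argument returns local time, so the strings depend on the time zone of the machine running the notebook. A small sketch of a UTC-pinned variant follows; the helper name `to_utc_string` is made up for illustration.

```python
from datetime import datetime, timezone


def to_utc_string(ts):
    """Format a UNIX timestamp (seconds) as a UTC 'YYYY-MM-DD HH:MM:SS' string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


print(to_utc_string(1747613207))  # 2025-05-19 00:06:47
```

Passing this helper to `.convert('timestamp', ...)` in place of the lambda would make the output reproducible across machines.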

## Binning & Aggregation

* Floor the float price down to a multiple of $1000 and rename the column to price_k_usd.

* Group by that bucket and aggregate to count rows per bucket.

This shows grouping and summary without leaving PETL.

In [4]:
# Bin prices into $1000 buckets and count per bucket
binned = (
    converted
    # overwrite each price with its bucket floor: floor(price/1000)*1000
    .convert('price_usd_float', lambda p: int(p // 1000) * 1000)
    # rename the column to reflect its new meaning
    .rename('price_usd_float', 'price_k_usd')
)
# aggregate: count how many rows fall in each bucket
agg = etl.aggregate(
    binned,
    key='price_k_usd',
    aggregation={'count': (lambda rows: sum(1 for _ in rows))}
)
print("Price buckets and counts:")
print(etl.look(agg))


Price buckets and counts:
+-------------+-------+
| price_k_usd | count |
|      106000 |     5 |
+-------------+-------+



The output lists each bucket (here 106000) with its row count (5, because all demo rows share one price).
This illustrates PETL’s ability to bin and summarize tabular data.
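
Since the demo rows all fall into one bucket, the binning rule is easier to see on varied input. The sketch below applies the same floor rule in plain Python; the prices are made-up illustrative values, not fetched data, and `bucket_floor` is a hypothetical helper name.

```python
from collections import Counter


def bucket_floor(price, width=1000):
    """Floor a price down to the nearest multiple of `width`."""
    return int(price // width) * width


# Made-up prices chosen to straddle bucket boundaries
prices = [106307.0, 106307.0, 106999.99, 107000.0, 105500.5]
counts = Counter(bucket_floor(p) for p in prices)

for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

Note that 106999.99 still lands in the 106000 bucket: the rule floors rather than rounds.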

## Filter Recent Rows

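Demonstrate time-window filtering on a PETL table using our helper.
The cell below applies it to the single-row table tbl; the freshly fetched row falls well within the 10-minute lookback window, so it remains.
The 5-row demo table spans only four minutes, so all of its rows should also remain if you pass demo_tbl instead.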

In [5]:
# Demonstrate filter_recent on a single-row table
recent_tbl = filter_recent(tbl, lookback_min=10)
print("\nAfter filter_recent(tbl, 10):")
print(etl.look(recent_tbl))


After filter_recent(tbl, 10):
+------------+-----------+
| timestamp  | price_usd |
| 1747613207 |    106307 |
+------------+-----------+
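
The single-row case cannot show any row being dropped. As a rough illustration of the windowing idea, here is a sketch of a time-window filter over the five demo rows. It is not the actual filter_recent() implementation, which this notebook does not show: the name `filter_recent_sketch`, the `now` parameter, and the assumption that the timestamp sits in the first column are all hypothetical.

```python
import time


def filter_recent_sketch(table, lookback_min, now=None):
    """Keep rows whose timestamp is within the last `lookback_min` minutes.

    `table` is a list of rows with the header first; the timestamp is
    assumed to be the first column. `now` can be fixed for repeatable demos.
    """
    now = int(time.time()) if now is None else now
    cutoff = now - lookback_min * 60
    header, rows = table[0], table[1:]
    return [header] + [row for row in rows if int(row[0]) >= cutoff]


demo = [
    ["timestamp", "price_usd"],
    [1747612967, 106307],
    [1747613027, 106307],
    [1747613087, 106307],
    [1747613147, 106307],
    [1747613207, 106307],
]

# Pin "now" to the newest row so the result does not depend on the wall clock
recent = filter_recent_sketch(demo, lookback_min=2, now=1747613207)
for row in recent:
    print(row)
```

With a 2-minute window, only the three newest rows survive; widening the window to 10 minutes keeps all five.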



## Expand Demo Rows

In [6]:
# Re-create the 5-row demo table (same call as in cell 2) for the pandas conversion below
demo_tbl = expand_demo_rows(tbl, n=5, dt=60)
print("\nDemo table with 5 synthetic rows:")
print(etl.look(demo_tbl))


Demo table with 5 synthetic rows:
+------------+-----------+
| timestamp  | price_usd |
| 1747612967 |    106307 |
+------------+-----------+
| 1747613027 |    106307 |
+------------+-----------+
| 1747613087 |    106307 |
+------------+-----------+
| 1747613147 |    106307 |
+------------+-----------+
| 1747613207 |    106307 |
+------------+-----------+



## Convert to pandas DataFrame

Illustrate interoperability: switch from a PETL table to a pandas DataFrame.
Parse the UNIX timestamp column into a datetime index.

In [7]:
# Convert the PETL table to a pandas DataFrame
df = etl.todataframe(demo_tbl)
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
df.set_index('timestamp', inplace=True)
df.head()

                     price_usd
timestamp
2025-05-19 00:02:47     106307
2025-05-19 00:03:47     106307
2025-05-19 00:04:47     106307
2025-05-19 00:05:47     106307
2025-05-19 00:06:47     106307


df.head() shows the first rows with a datetime index and a price_usd column.
This confirms you can drop into pandas at any point.
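
Once the index is a datetime, time-based operations become available. The sketch below rebuilds the same five rows directly in pandas (so it runs without the helper module), marks the index as UTC, and computes a 3-minute rolling mean. The window length is an arbitrary choice for illustration.

```python
import pandas as pd

# The five demo rows from the output above
rows = [
    [1747612967, 106307],
    [1747613027, 106307],
    [1747613087, 106307],
    [1747613147, 106307],
    [1747613207, 106307],
]
df = pd.DataFrame(rows, columns=["timestamp", "price_usd"])

# utc=True yields a timezone-aware index instead of a naive one
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)
df = df.set_index("timestamp")

# Time-based rolling window; requires a sorted datetime index
rolling_mean = df["price_usd"].rolling("3min").mean()
print(rolling_mean)
```

Because every demo price is identical, the rolling mean equals the price in every row; with real data the same call would smooth short-term fluctuations.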

## Error Handling Example

In [8]:
# Show error handling by pointing to a bad URL
import importlib
import bitcoin_petl_utils as utils
# Temporarily break the URL
utils.CG_URL = "https://api.coingecko.invalid/foo"
try:
    _ = fetch_btc_price_table()
except Exception as e:
    print("Caught an error as expected:", e)
# Restore original module state
importlib.reload(utils)

Caught an error as expected: HTTPSConnectionPool(host='api.coingecko.invalid', port=443): Max retries exceeded with url: /foo (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ffb517b9e20>: Failed to resolve 'api.coingecko.invalid' ([Errno -2] Name or service not known)"))


<module 'bitcoin_petl_utils' from '/data/bitcoin_petl_utils.py'>
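
Reloading the module works, but if the cell is interrupted before the reload runs, the broken URL stays in place. One alternative is `unittest.mock.patch.object`, which restores the attribute when the `with` block exits, even after an exception. The sketch below uses stand-ins so it runs on its own: `utils_stub` and `fetch_stub` are hypothetical placeholders for the real module and fetch function, the example.test URL is invented, and the built-in `ConnectionError` stands in for the network error raised in the notebook.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the bitcoin_petl_utils module
utils_stub = SimpleNamespace(CG_URL="https://api.example.test/price")


def fetch_stub():
    """Pretend fetch: fails when the configured host cannot be resolved."""
    if ".invalid" in utils_stub.CG_URL:
        raise ConnectionError(f"cannot resolve {utils_stub.CG_URL}")
    return [["timestamp", "price_usd"]]


caught = None
# The attribute is swapped only for the duration of the with-block
with mock.patch.object(utils_stub, "CG_URL", "https://api.coingecko.invalid/foo"):
    try:
        fetch_stub()
    except ConnectionError as exc:
        caught = exc
        print("Caught an error as expected:", exc)

# Back to the original value without reloading anything
print(utils_stub.CG_URL)
```

The same pattern should apply to the real module by passing `utils` and `"CG_URL"` to `mock.patch.object`, assuming the fetch function reads the URL from that module-level name at call time, as the notebook's own patching suggests.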

## Wrap-Up & Takeaways

1. **`fetch_btc_price_table()`**  
   - Fetches a single, real-time Bitcoin price from CoinGecko.  
   - Returns a one-row PETL table with UNIX `timestamp` and `price_usd`.

2. **`expand_demo_rows(tbl, n, dt)`**  
   - Clones that one row into `n` rows, each shifted by `dt` seconds.  
   - Useful for showing ETL operations on multi-row data in tutorials.  
   - In this demo, we generated 5 rows spaced 1 minute apart.

3. **`filter_recent(table, lookback_min)`**  
   - Converts the `timestamp` column to integers and keeps only rows  
     within the last `lookback_min` minutes.  
   - In this demo we applied it to the single-row table; run it on the  
     5-row demo table to see filtering across multiple records.

4. **Converting PETL → pandas**  
   - `etl.todataframe()` turns a PETL table into a pandas DataFrame.  
   - After parsing the timestamps into datetime objects, you can  
     leverage pandas’ powerful time-series tools (rolling windows, plotting, etc.).

---

By walking through:

- **a real fetch**,  
- **synthetic multi-row generation**,  
- **time-based filtering**,  
- and **conversion to pandas**,  

you now have a clear recipe for integrating live BTC data into any ETL or analytics pipeline. 