# **TradeCare: Data Collection Notebook**

## Objectives
* Fetch historical Bitcoin OHLCV (Open, High, Low, Close, Volume) data from a GitHub-hosted repository that provides automated daily updates.
* Verify data loaded correctly (basic checks)
* Understand data structure and characteristics
* Document data source and live data collection strategy

## Inputs
*   **Data Source:** GitHub Repository (automated updates)
*   **URL:** https://raw.githubusercontent.com/mouadja02/bitcoin-hourly-ohclv-dataset/main/btc-hourly-price_2015_2025.csv
*   **Asset:** BTC-USD
*   **Timeframe:** 1 Hour
*   **Period:** November 2014 - present

## Outputs
* DataFrame loaded in memory for exploration
* Data understanding documented
* Live data approach: No CSV files saved (subsequent notebooks fetch fresh) 

## Additional Comments
This GitHub dataset combines several properties that suit this project:

* **Fresh & Maintained:** An automated workflow fetches current data from the CryptoCompare API daily and stores backups on GitHub. The repository contains Bitcoin hourly price data from 2015 to present with continuous updates.
* **Simple:** Direct CSV access via a single URL
* **Free:** No API keys or costs
* **Reliable:** No API keys that can expire or fail authentication (GitHub may still throttle very frequent requests)
* **Transparent:** Git history shows every change
* **Scalable:** The same URL-based fetch should work in production environments

**Live Data Strategy:**
* This notebook fetches the data from the URL and explores it.
* No CSV files are saved (live data approach).
* Each notebook in the pipeline fetches fresh data as needed.
* This keeps analysis and predictions up to date.
* Trade-off: requires an internet connection and is slightly slower, but the data is more current.
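
Because nothing is written to disk, two runs of the pipeline can see different data. One way to keep runs comparable is to record when the data was fetched and how much arrived. The following is a minimal sketch and not part of the original notebook: `fetch_with_metadata` and the metadata keys are hypothetical names, and the sketch assumes the source is anything `pd.read_csv` accepts (a URL, a local path, or a file-like object).

```python
from datetime import datetime, timezone

import pandas as pd


def fetch_with_metadata(source):
    """Load a CSV and return it together with a small record of the fetch.

    `source` can be a URL, a local path, or a file-like object,
    i.e. anything pandas.read_csv accepts.
    """
    df = pd.read_csv(source)
    if df.empty:
        raise ValueError("Fetched CSV contains no rows")

    # Hypothetical metadata layout: enough to tell two runs apart.
    metadata = {
        "fetched_at_utc": datetime.now(timezone.utc).isoformat(),
        "rows": len(df),
        "columns": list(df.columns),
    }
    return df, metadata
```

Called with the dataset URL listed under Inputs, this would return the DataFrame plus a dictionary that can be printed or logged at the top of each downstream notebook.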

---

## Change Working Directory

We need to change the working directory from its current folder to its parent folder.
* We access the current directory with `os.getcwd()`.

In [2]:
import os
current_dir = os.getcwd()
current_dir

'/Users/ilianamarquez/Documents/vscode-projects/trade-care/jupyter_notebooks'

We want to make the parent of the current directory the new current directory.
* `os.path.dirname()` gets the parent directory
* `os.chdir()` sets the new current directory

In [5]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [6]:
current_dir = os.getcwd()
current_dir

'/Users/ilianamarquez/Documents/vscode-projects/trade-care'
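
Re-running the `os.chdir` cell moves up one more level each time, which can leave the notebook outside the project folder. A guarded version avoids this. The sketch below assumes the notebooks live in a folder named `jupyter_notebooks` (as in the output above); `move_to_project_root` is a hypothetical helper name, not something defined elsewhere in this project.

```python
import os
from pathlib import Path


def move_to_project_root(notebook_dir_name="jupyter_notebooks"):
    """Change to the parent directory only when the current directory
    is the notebooks folder, so repeated calls are harmless.

    Returns the working directory after the call.
    """
    cwd = Path.cwd()
    if cwd.name == notebook_dir_name:
        os.chdir(cwd.parent)
    return Path.cwd()
```

The first call from inside `jupyter_notebooks` moves to the project root; any later call finds a different folder name and leaves the working directory unchanged.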

# Fetch Data from GitHub

## Import Required Libraries

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime

## Define Data Source   

In [11]:
# GitHub raw CSV URL
data_url = "https://raw.githubusercontent.com/mouadja02/bitcoin-hourly-ohclv-dataset/main/btc-hourly-price_2015_2025.csv"

print(f"Data Source: {data_url}")
print("Live data approach: Fetching fresh data directly from GitHub")

Data Source: https://raw.githubusercontent.com/mouadja02/bitcoin-hourly-ohclv-dataset/main/btc-hourly-price_2015_2025.csv
Live data approach: Fetching fresh data directly from GitHub


## Download Data from GitHub

In [13]:
# Fetch the CSV file directly into DataFrame
print("Fetching Bitcoin hourly data from GitHub...")

try:
    df = pd.read_csv(data_url)
    
    print("✓ Data fetched successfully")
    print(f"✓ Shape: {df.shape[0]:,} rows × {df.shape[1]} columns")
    
except Exception as e:
    print(f"✗ Error fetching data: {e}")
    raise

Fetching Bitcoin hourly data from GitHub...
✓ Data fetched successfully
✓ Shape: 96,570 rows × 9 columns


## Validate Data Integrity
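
The objectives call for basic checks that the data loaded correctly. A minimal sketch of such checks follows. The notebook does not show the dataset's column names, so the `HIGH` and `LOW` defaults are assumptions to be replaced after inspecting `df.columns`. `basic_integrity_report` is a hypothetical helper, and the sample frame is synthetic, not real market data.

```python
import pandas as pd


def basic_integrity_report(df, high_col="HIGH", low_col="LOW"):
    """Return simple integrity counts for an OHLCV DataFrame.

    The default column names are assumptions; pass the real ones
    after inspecting df.columns.
    """
    report = {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }
    # Only run the price sanity check when both columns are present.
    if high_col in df.columns and low_col in df.columns:
        report["high_below_low"] = int((df[high_col] < df[low_col]).sum())
    return report


# Tiny synthetic frame for illustration only (not real market data).
sample = pd.DataFrame(
    {
        "HIGH": [10.0, 12.0, 9.0, 9.0],
        "LOW": [8.0, 11.0, 9.5, 9.5],
        "CLOSE": [9.0, None, 9.2, 9.2],
    }
)
print(basic_integrity_report(sample))
```

On the fetched data, the same function could be called as `basic_integrity_report(df)`. Non-zero counts would be a prompt to inspect the affected rows before any modelling.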


## Load CSV Data

---

NOTE

* You may add as many sections as you want, as long as they support your project workflow.
* All notebook cells should run top-down: the notebook must not require going back to an earlier cell to perform a task, such as refreshing a variable's content.

---

# Push files to Repo

* In case you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.