<span style="font-weight:bold; font-size: 3rem; color:#333;">Part 01: Feature Backfill for Solana and Bitcoin</span>


## 🗒️ The tasks of this script
1. Download historical prices for Solana and Bitcoin as CSV files.
2. Update the CSV file paths in this notebook to point to the files you downloaded.
3. Create an account on www.hopsworks.ai and get your HOPSWORKS_API_KEY.
4. Run the notebook to upload the features to the Hopsworks feature store.



### <span style='color:#ff5f27'> 📝 Imports</span>

In [22]:
import pandas as pd
import hopsworks
from utils import *
import json
import os
import warnings
from dotenv import load_dotenv

warnings.filterwarnings("ignore")

### IF YOU WANT TO WIPE OUT ALL OF YOUR FEATURES AND MODELS, run the cell below

In [23]:
# If you haven't set the env variable 'HOPSWORKS_API_KEY', uncomment the next two lines and store your API key in that file
# with open('../../data/hopsworks-api-key.txt', 'r') as file:
#     os.environ["HOPSWORKS_API_KEY"] = file.read().rstrip()
# To purge the project, also uncomment the next two lines
# proj = hopsworks.login()
# util.purge_project(proj)

### Connect to Hopsworks and upload historical data

---

In [24]:
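load_dotenv()  # reads HOPSWORKS_API_KEY from a local .env file, if one exists
# Fail early with a clear message instead of assigning None to os.environ
if not os.getenv("HOPSWORKS_API_KEY"):
    raise RuntimeError("HOPSWORKS_API_KEY is not set; add it to .env or export it")
project = hopsworks.login()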

2024-12-22 13:58:03,339 INFO: Closing external client and cleaning up certificates.
Connection closed.
2024-12-22 13:58:03,342 INFO: Initializing external client
2024-12-22 13:58:03,343 INFO: Base URL: https://c.app.hopsworks.ai:443
2024-12-22 13:58:04,741 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1160346


### Add historical data to the Hopsworks feature store

#### Add historical Solana prices

In [25]:
hist_data_sol = pd.read_csv("data/SOL_USD Binance Historical Data.csv")
# Rename the columns, then keep only the date and the opening price
hist_data_sol.columns = ['date', 'price', 'open', 'high', 'low', 'vol', 'change']
hist_data_sol = hist_data_sol.drop(columns=['vol', 'change', 'price', 'high', 'low'])
# The CSV stores dates as month/day/year
hist_data_sol['date'] = pd.to_datetime(hist_data_sol['date'], format='%m/%d/%Y')
hist_data_sol


Unnamed: 0,date,open
0,2024-12-21,194.280
1,2024-12-20,194.310
2,2024-12-19,207.050
3,2024-12-18,223.550
4,2024-12-17,216.390
...,...,...
1588,2020-08-16,3.173
1589,2020-08-15,3.410
1590,2020-08-14,3.730
1591,2020-08-13,3.756
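
The Solana and Bitcoin cells apply the same load, rename, trim, and parse steps, so they could share one helper. The following is a minimal sketch, not the notebook's own code: `load_open_prices` is a hypothetical name, and the two-row sample (including its header names and every value other than the dates and opening prices) is made up for illustration in the seven-column layout the notebook assumes.

```python
import io

import pandas as pd


def load_open_prices(csv_source):
    """Read a price CSV and return only the `date` and `open` columns."""
    df = pd.read_csv(csv_source)
    # Assumption: the file has exactly these seven columns, in this order.
    df.columns = ['date', 'price', 'open', 'high', 'low', 'vol', 'change']
    df = df[['date', 'open']].copy()
    # Dates are expected as month/day/year, as in the notebook's CSV files.
    df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
    return df


# Hypothetical sample in the assumed layout, for illustration only.
sample_csv = io.StringIO(
    'Date,Price,Open,High,Low,Vol.,Change %\n'
    '12/21/2024,181.00,194.28,195.00,180.00,1.2M,-6.8%\n'
    '12/20/2024,194.28,194.31,198.00,190.00,1.5M,0.0%\n'
)
sample = load_open_prices(sample_csv)
print(sample)
```

Because `pd.read_csv` accepts both paths and file-like objects, the same helper would work with `"data/SOL_USD Binance Historical Data.csv"` or the Bitcoin file path.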


In [26]:
fs = project.get_feature_store() 
solana_fg = fs.get_or_create_feature_group(
    name='solana',
    description='Solana price',
    version=7,
    primary_key=["date"])

solana_fg.insert(hist_data_sol)


Uploading Dataframe: 100.00% |██████████| Rows 1593/1593 | Elapsed Time: 00:04 | Remaining Time: 00:00


(Job('solana_7_offline_fg_materialization', 'SPARK'), None)
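
Each feature group in this notebook uses `date` as its primary key, so every date should appear exactly once in the DataFrame. It may be worth confirming that before calling `insert`. The sketch below uses only pandas; `check_primary_key` is a hypothetical helper, not part of the Hopsworks API.

```python
import pandas as pd


def check_primary_key(df, key="date"):
    """Raise ValueError if `key` has missing or duplicated values."""
    if df[key].isna().any():
        raise ValueError(f"column {key!r} contains missing values")
    # keep=False marks every occurrence of a repeated value, not just the later ones
    dupes = df.loc[df[key].duplicated(keep=False), key].unique()
    if len(dupes) > 0:
        raise ValueError(f"column {key!r} has duplicated values: {list(dupes)[:5]}")
    return df


# Small illustrative frame; in the notebook this would be hist_data_sol.
ok = pd.DataFrame({
    "date": pd.to_datetime(["2024-12-21", "2024-12-20"]),
    "open": [194.28, 194.31],
})
check_primary_key(ok)  # passes silently when every date is unique
```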

In [27]:
solana_fg.update_feature_description("date", "Date")
#solana_fg.update_feature_description("price", "The price of Solana")
solana_fg.update_feature_description("open", "The opening price of Solana")
#solana_fg.update_feature_description("high", "The highest price of Solana")
#solana_fg.update_feature_description("low", "The lowest price of Solana")
#solana_fg.update_feature_description("vol", "Volume")
#solana_fg.update_feature_description("change", "Change in price")


<hsfs.feature_group.FeatureGroup at 0x164766650>

#### Add historical data for Bitcoin

In [28]:
hist_data_btc = pd.read_csv("data/BTC_USD Binance Historical Data.csv")
# Rename the columns, then keep only the date and the opening price
hist_data_btc.columns = ['date', 'price', 'open', 'high', 'low', 'vol', 'change']
hist_data_btc = hist_data_btc.drop(columns=['vol', 'change', 'price', 'high', 'low'])
# The CSV stores dates as month/day/year
hist_data_btc['date'] = pd.to_datetime(hist_data_btc['date'], format='%m/%d/%Y')

hist_data_btc

Unnamed: 0,date,open
0,2024-12-21,97786.5
1,2024-12-20,97517.6
2,2024-12-19,100220.0
3,2024-12-18,106138.2
4,2024-12-17,106083.4
...,...,...
1812,2020-01-05,7355.1
1813,2020-01-04,7340.5
1814,2020-01-03,6964.4
1815,2020-01-02,7193.4


In [29]:
fs = project.get_feature_store() 
bitcoin_fg = fs.get_or_create_feature_group(
    name='bitcoin',
    description='Bitcoin price',
    version=7,
    primary_key=["date"])

bitcoin_fg.insert(hist_data_btc)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/1160346/fs/1151049/fg/1393092


Uploading Dataframe: 100.00% |██████████| Rows 1817/1817 | Elapsed Time: 00:03 | Remaining Time: 00:00


Launching job: bitcoin_7_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1160346/jobs/named/bitcoin_7_offline_fg_materialization/executions


(Job('bitcoin_7_offline_fg_materialization', 'SPARK'), None)

In [30]:
bitcoin_fg.update_feature_description("date", "Date")
#bitcoin_fg.update_feature_description("price", "The price of Bitcoin")
bitcoin_fg.update_feature_description("open", "The opening price of Bitcoin")
#bitcoin_fg.update_feature_description("high", "The highest price of Bitcoin")
#bitcoin_fg.update_feature_description("low", "The lowest price of Bitcoin")
#bitcoin_fg.update_feature_description("vol", "Volume")
#bitcoin_fg.update_feature_description("change", "Change in price")


<hsfs.feature_group.FeatureGroup at 0x16b0cfa90>

#### Add historical data for the Fear and Greed Index

In [31]:
import requests
import pandas as pd

# URL of the API
url = "https://api.alternative.me/fng/?limit=0&format=csv"

# Fetch data from the API; the timeout keeps the cell from hanging indefinitely
response = requests.get(url, timeout=30)

if response.status_code == 200:
    content = response.text
    if "data" in content:
        # The pseudo-CSV rows sit between the first "[" and the following "]"
        start_idx = content.find("[") + 1
        end_idx = content.find("]", start_idx)
        raw_data = content[start_idx:end_idx].strip()

        # Remove single quotes and braces for easier parsing
        raw_data = raw_data.replace("'", "").replace("{", "").replace("}", "")

        records = []
        for row in raw_data.split("\n"):
            row = row.strip()
            if not row or row == "fng_value,fng_classification,date":
                # Skip blank lines and the header row
                continue

            # Each data row should hold exactly three comma-separated fields
            fields = [field.strip() for field in row.split(",")]
            if len(fields) == 3:
                records.append(fields)
            else:
                print("Skipping malformed row:", row)

        # The header names the columns in a different order, but the values in
        # each row are ordered date, fng_value, fng_classification (see output)
        fng_df = pd.DataFrame(records, columns=["date", "fng_value", "fng_classification"])
        print(fng_df.head())
    else:
        print("Data field not found in response.")
else:
    print(f"Failed to fetch data: {response.status_code}")


         date fng_value fng_classification
0  22-12-2024        73              Greed
1  21-12-2024        73              Greed
2  20-12-2024        74              Greed
3  19-12-2024        75              Greed
4  18-12-2024        81      Extreme Greed
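
The parsing step can also be isolated into a pure function, which makes it possible to test without calling the API. This is a sketch under assumptions: `parse_fng_text` is a hypothetical name, and `sample_text` is a hand-written stand-in for the response layout (a bracketed block with a header line followed by `date,value,classification` rows), using two rows from the output above.

```python
import pandas as pd


def parse_fng_text(content):
    """Parse the bracketed, comma-separated block into a DataFrame."""
    start = content.find("[") + 1
    end = content.find("]", start)
    records = []
    for line in content[start:end].splitlines():
        line = line.strip()
        # Skip blank lines and the header line
        if not line or line.startswith("fng_value"):
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 3:
            records.append(fields)
    return pd.DataFrame(records, columns=["date", "fng_value", "fng_classification"])


# Hand-written stand-in for the API response (layout is an assumption).
sample_text = (
    '{"data": [\n'
    'fng_value,fng_classification,date\n'
    '22-12-2024,73,Greed\n'
    '18-12-2024,81,Extreme Greed\n'
    ']}'
)
parsed = parse_fng_text(sample_text)
print(parsed)
```

In the notebook, the same function could be called as `parse_fng_text(response.text)` after the status check.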


In [32]:
# Parse the day-month-year strings into datetime values
fng_df['date'] = pd.to_datetime(fng_df['date'], format='%d-%m-%Y')
fng_df

Unnamed: 0,date,fng_value,fng_classification
0,2024-12-22,73,Greed
1,2024-12-21,73,Greed
2,2024-12-20,74,Greed
3,2024-12-19,75,Greed
4,2024-12-18,81,Extreme Greed
...,...,...,...
2508,2018-02-05,11,Extreme Fear
2509,2018-02-04,24,Extreme Fear
2510,2018-02-03,40,Fear
2511,2018-02-02,15,Extreme Fear
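
After parsing, `fng_value` still holds strings, because every field comes from splitting text. If a numeric feature is preferred, the column could be cast before upload. The sketch below is an optional variation, not what this notebook does; it would change the column type relative to the data uploaded here, so it is shown on a small stand-in frame (`fng_sample`, a hypothetical name) built from two rows of the output above.

```python
import pandas as pd

# Stand-in for fng_df as produced by the parsing cell (all values are strings).
fng_sample = pd.DataFrame({
    "date": ["22-12-2024", "18-12-2024"],
    "fng_value": ["73", "81"],
    "fng_classification": ["Greed", "Extreme Greed"],
})

# Parse day-month-year strings into datetime values
fng_sample["date"] = pd.to_datetime(fng_sample["date"], format="%d-%m-%Y")

# errors="raise" surfaces any non-numeric value instead of silently producing NaN
fng_sample["fng_value"] = pd.to_numeric(fng_sample["fng_value"], errors="raise").astype("int64")

print(fng_sample.dtypes)
```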


In [33]:
fs = project.get_feature_store() 
fng_fg = fs.get_or_create_feature_group(
    name='f_n_g_index',
    description='fear_and_greed_index',
    version=6,
    primary_key=["date"])

fng_fg.insert(fng_df)

Uploading Dataframe: 100.00% |██████████| Rows 2513/2513 | Elapsed Time: 00:03 | Remaining Time: 00:00


Launching job: f_n_g_index_6_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1160346/jobs/named/f_n_g_index_6_offline_fg_materialization/executions


(Job('f_n_g_index_6_offline_fg_materialization', 'SPARK'), None)

#### Enter a description for each feature in the Feature Group

In [34]:
fng_fg.update_feature_description("date", "Date of the Fear and Greed Index")
fng_fg.update_feature_description("fng_value", "Fear and Greed Index value")
fng_fg.update_feature_description("fng_classification", "Fear and Greed Index classification")


<hsfs.feature_group.FeatureGroup at 0x16c2e3bb0>