<span style="font-weight:bold; font-size: 3rem; color:#333;">Part 01: Feature Backfill for Solana and Bitcoin</span>


## 🗒️ The tasks of this notebook
1. Download historical prices for Solana and Bitcoin as CSV files.
2. Update the CSV paths in this notebook to point to the files you downloaded.
3. Create an account on www.hopsworks.ai and get your HOPSWORKS_API_KEY.
4. Run the notebook to upload the features to the Hopsworks feature store.



### <span style='color:#ff5f27'> 📝 Imports</span>

In [1]:
import pandas as pd
import hopsworks
from utils import *
import json
import os
import warnings
from dotenv import load_dotenv

warnings.filterwarnings("ignore")



### If you want to wipe out all of your features and models, run the cell below

In [35]:
# If you haven't set the env variable 'HOPSWORKS_API_KEY', uncomment the next two lines to read your API key from a file
# with open('../../data/hopsworks-api-key.txt', 'r') as file:
#     os.environ["HOPSWORKS_API_KEY"] = file.read().rstrip()
# Uncomment the next two lines to log in and purge the project
# proj = hopsworks.login()
# util.purge_project(proj)

### Connect to Hopsworks and upload historical data

---

In [2]:
# Load HOPSWORKS_API_KEY from a local .env file into os.environ
load_dotenv()
# hopsworks.login() reads HOPSWORKS_API_KEY from the environment
project = hopsworks.login()

2024-12-23 14:39:03,019 INFO: Initializing external client
2024-12-23 14:39:03,036 INFO: Base URL: https://c.app.hopsworks.ai:443
2024-12-23 14:39:04,493 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1164448
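
The two ways of supplying the API key used in this notebook (an environment variable, or a text file) can be combined in one small helper. This is a minimal sketch: `resolve_api_key` and its parameters are hypothetical names and not part of the Hopsworks library.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(env_var: str = "HOPSWORKS_API_KEY",
                    fallback_path: Optional[str] = None) -> str:
    """Return the API key from the environment, else from a text file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Fall back to a key file if one was given and exists
    if fallback_path and Path(fallback_path).is_file():
        return Path(fallback_path).read_text().strip()
    raise RuntimeError(f"{env_var} is not set and no key file was found")
```

Failing early with a clear message can be easier to debug than a login error further down the notebook.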


### Add historical data to the Hopsworks feature store

#### Add historical Solana prices

In [3]:
hist_data_sol = pd.read_csv("data/historical_solana.csv")
hist_data_sol = hist_data_sol[["TIMESTAMP", 'OPEN', 'HIGH', 'LOW', 'CLOSE', "VOLUME", 'VOLUME_BUY', 'VOLUME_SELL']]
hist_data_sol.columns = hist_data_sol.columns.str.lower()
hist_data_sol.head()

Unnamed: 0,timestamp,open,high,low,close,volume,volume_buy,volume_sell
0,1623888000,40.23,40.57,38.3,39.13,6007.61849,2885.660324,3121.958166
1,1623974400,39.13,39.36,35.0,36.62,13557.357196,7925.472309,5631.884887
2,1624060800,36.62,37.35,35.0,35.45,16986.163716,9258.361133,7727.802583
3,1624147200,35.45,35.89,31.48,35.28,38681.265775,21052.579774,17628.686001
4,1624233600,35.28,35.28,26.0,26.55,41903.613224,15380.745579,26522.867646


In [44]:
hist_data_sol

Unnamed: 0,timestamp,open,high,low,close,volume,volume_buy,volume_sell
0,1623888000,40.23,40.57,38.30,39.13,6007.618490,2885.660324,3121.958166
1,1623974400,39.13,39.36,35.00,36.62,13557.357196,7925.472309,5631.884887
2,1624060800,36.62,37.35,35.00,35.45,16986.163716,9258.361133,7727.802583
3,1624147200,35.45,35.89,31.48,35.28,38681.265775,21052.579774,17628.686001
4,1624233600,35.28,35.28,26.00,26.55,41903.613224,15380.745579,26522.867646
...,...,...,...,...,...,...,...,...
1281,1734566400,206.49,215.00,186.80,193.65,452547.729871,185623.785796,266923.944075
1282,1734652800,193.65,199.44,175.01,194.40,463451.482080,210245.812667,253205.669413
1283,1734739200,194.40,201.91,178.50,181.06,308103.063575,122417.224892,185685.838684
1284,1734825600,181.06,187.86,176.87,180.34,218352.282441,108511.764430,109840.518011
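
The `timestamp` column holds Unix epoch seconds, which are hard to read at a glance. The sketch below shows one way to view them as calendar dates with pandas, using the first two timestamps from the table above. The `datetime` column is added for inspection only and is an assumption of this example; the notebook uploads the raw integer timestamps.

```python
import pandas as pd

# First two timestamps from the Solana table above (Unix epoch seconds)
sample = pd.DataFrame({"timestamp": [1623888000, 1623974400]})

# unit="s" interprets the integers as seconds since 1970-01-01 UTC
sample["datetime"] = pd.to_datetime(sample["timestamp"], unit="s", utc=True)
print(sample)
```

Consecutive rows are 86400 seconds apart, so each row corresponds to one day.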


In [5]:
fs = project.get_feature_store() 
solana_fg = fs.get_or_create_feature_group(
    name='solana',
    description='Solana price',
    version=1,
    primary_key=["timestamp"])

solana_fg.insert(hist_data_sol)  

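# Optional: describe each feature (names must match the uploaded columns)
# solana_fg.update_feature_description("timestamp", "Unix timestamp of the daily record, in seconds")
# solana_fg.update_feature_description("open", "The opening price of Solana")
# solana_fg.update_feature_description("high", "The highest price of Solana")
# solana_fg.update_feature_description("low", "The lowest price of Solana")
# solana_fg.update_feature_description("close", "The closing price of Solana")
# solana_fg.update_feature_description("volume", "Total traded volume")
# solana_fg.update_feature_description("volume_buy", "Buy-side traded volume")
# solana_fg.update_feature_description("volume_sell", "Sell-side traded volume")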

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/1164448/fs/1155151/fg/1393151


Uploading Dataframe: 100.00% |██████████| Rows 1286/1286 | Elapsed Time: 00:02 | Remaining Time: 00:00


Launching job: solana_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1164448/jobs/named/solana_1_offline_fg_materialization/executions


(Job('solana_1_offline_fg_materialization', 'SPARK'), None)
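
The feature group above declares `timestamp` as its primary key, so it can be worth checking that the column has no missing or repeated values before inserting. This is a minimal sketch: `check_primary_key` is a hypothetical helper, and it assumes that duplicate keys are unwanted for this dataset.

```python
import pandas as pd


def check_primary_key(df: pd.DataFrame, key: str) -> None:
    """Raise ValueError if the key column has missing or duplicated values.

    Hypothetical helper, not part of Hopsworks.
    """
    if df[key].isna().any():
        raise ValueError(f"Column '{key}' contains missing values")
    duplicates = df.loc[df[key].duplicated(keep=False), key].unique()
    if len(duplicates) > 0:
        raise ValueError(
            f"Column '{key}' has duplicated values: {list(duplicates)[:5]}"
        )


# Two rows copied from the Solana table above
rows = pd.DataFrame({"timestamp": [1623888000, 1623974400],
                     "close": [39.13, 36.62]})
check_primary_key(rows, "timestamp")  # passes silently
```

In the notebook this would be called as `check_primary_key(hist_data_sol, "timestamp")` just before `solana_fg.insert(...)`.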

#### Add historical data for Bitcoin

In [8]:
hist_data_btc = pd.read_csv("data/historical_bitcoin.csv")
hist_data_btc = hist_data_btc[["TIMESTAMP", 'OPEN', 'HIGH', 'LOW', 'CLOSE', "VOLUME", 'VOLUME_BUY', 'VOLUME_SELL']]
hist_data_btc.columns = hist_data_btc.columns.str.lower()
hist_data_btc

Unnamed: 0,timestamp,open,high,low,close,volume,volume_buy,volume_sell
0,1622678400,37565.3,39471.0,37159.1,39196.6,5230.076287,2452.239397,2777.836889
1,1622764800,39196.6,39249.8,35576.7,36847.7,4991.517443,2199.007148,2792.510295
2,1622851200,36847.7,37935.8,34825.9,35534.6,5532.753260,2475.469496,3057.283764
3,1622937600,35534.6,36479.5,35251.1,35789.0,3187.546401,1668.324634,1519.221767
4,1623024000,35789.0,36796.3,33334.3,33587.6,6165.920455,2555.082410,3610.838045
...,...,...,...,...,...,...,...,...
1295,1734566400,100166.1,102750.0,95586.4,97431.4,3582.993384,1634.942176,1948.051208
1296,1734652800,97431.4,98064.7,92159.0,97781.8,3145.577624,1246.352081,1899.225543
1297,1734739200,97781.8,99575.4,96359.8,97232.6,1158.063596,550.601852,607.461744
1298,1734825600,97232.6,97321.3,94201.1,95101.9,843.506407,277.687032,565.819376
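
The Solana and Bitcoin cells repeat the same three loading steps: read the CSV, keep eight columns, and lower-case the column names. A small shared function is one way to avoid the duplication. In this sketch, `load_price_history` and `PRICE_COLUMNS` are hypothetical names, and the in-memory CSV (including its extra `UNIT` column) is made-up sample input that reuses the first Bitcoin row shown above.

```python
import io

import pandas as pd

# Columns kept by both loading cells in this notebook
PRICE_COLUMNS = ["TIMESTAMP", "OPEN", "HIGH", "LOW", "CLOSE",
                 "VOLUME", "VOLUME_BUY", "VOLUME_SELL"]


def load_price_history(source) -> pd.DataFrame:
    """Read a price CSV, keep the shared columns, and lower-case their names."""
    df = pd.read_csv(source)[PRICE_COLUMNS]
    df.columns = df.columns.str.lower()
    return df


# Tiny in-memory CSV standing in for a downloaded file (UNIT is a made-up extra column)
demo_csv = io.StringIO(
    "UNIT,TIMESTAMP,OPEN,HIGH,LOW,CLOSE,VOLUME,VOLUME_BUY,VOLUME_SELL\n"
    "DAY,1622678400,37565.3,39471.0,37159.1,39196.6,"
    "5230.076287,2452.239397,2777.836889\n"
)
demo = load_price_history(demo_csv)
print(demo.columns.tolist())
```

With the real files, the same function would be called with a path such as `"data/historical_bitcoin.csv"`.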


In [9]:
fs = project.get_feature_store() 
bitcoin_fg = fs.get_or_create_feature_group(
    name='bitcoin',
    description='Bitcoin price',
    version=7,
    primary_key=["timestamp"])

bitcoin_fg.insert(hist_data_btc)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/1164448/fs/1155151/fg/1394141


Uploading Dataframe: 100.00% |██████████| Rows 1300/1300 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: bitcoin_7_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1164448/jobs/named/bitcoin_7_offline_fg_materialization/executions


(Job('bitcoin_7_offline_fg_materialization', 'SPARK'), None)

In [None]:
# Optional: describe each feature (names must match the uploaded columns)
# bitcoin_fg.update_feature_description("timestamp", "Unix timestamp of the daily record, in seconds")
# bitcoin_fg.update_feature_description("open", "The opening price of Bitcoin")
# bitcoin_fg.update_feature_description("high", "The highest price of Bitcoin")
# bitcoin_fg.update_feature_description("low", "The lowest price of Bitcoin")
# bitcoin_fg.update_feature_description("close", "The closing price of Bitcoin")
# bitcoin_fg.update_feature_description("volume", "Total traded volume")
# bitcoin_fg.update_feature_description("volume_buy", "Buy-side traded volume")
# bitcoin_fg.update_feature_description("volume_sell", "Sell-side traded volume")


<hsfs.feature_group.FeatureGroup at 0x16b0cfa90>

#### Add historical data for the Fear and Greed Index

In [30]:
import requests
import pandas as pd
import io

# URL of the API
url = "https://api.alternative.me/fng/?limit=0&format=csv"

# Fetch data from the API (the timeout keeps the cell from hanging indefinitely)
response = requests.get(url, timeout=30)

if response.status_code == 200:
    content = response.text
    if "data" in content:
        # Locate and clean the pseudo-CSV section
        start_idx = content.find("[") + 1
        end_idx = content.find("]", start_idx)
        raw_data = content[start_idx:end_idx].strip()
        
        # Replace single quotes and braces for easier parsing
        raw_data = raw_data.replace("'", "").replace("{", "").replace("}", "")
        
        # Debug: Print raw_data to check its format
        #print("Raw data:", raw_data)
        
        # Split into rows; each data row is "date,fng_value,fng_classification"
        records = []

        for row in raw_data.split("\n"):
            row = row.strip()
            # Debug: Print each row to check format
            #print("Processing row:", row)
            # Skip blank lines and the header row
            # (the header lists the columns in a different order than the data rows)
            if not row or row == "fng_value,fng_classification,date":
                continue

            # Keep only rows with exactly three comma-separated fields
            fields = [field.strip() for field in row.split(",")]
            if len(fields) == 3:
                records.append(fields)
            else:
                print("Skipping malformed row:", row)

        # Create DataFrame with one record per row
        fng_df = pd.DataFrame(records, columns=["date", "fng_value", "fng_classification"])
        print(fng_df.head())
    else:
        print("Data field not found in response.")
else:
    print(f"Failed to fetch data: {response.status_code}")


         date fng_value fng_classification
0  23-12-2024        70              Greed
1  22-12-2024        73              Greed
2  21-12-2024        73              Greed
3  20-12-2024        74              Greed
4  19-12-2024        75              Greed


In [31]:
# Parse the DD-MM-YYYY strings into datetime values
fng_df['date'] = pd.to_datetime(fng_df['date'], format='%d-%m-%Y')
fng_df

Unnamed: 0,date,fng_value,fng_classification
0,2024-12-23,70,Greed
1,2024-12-22,73,Greed
2,2024-12-21,73,Greed
3,2024-12-20,74,Greed
4,2024-12-19,75,Greed
...,...,...,...
2509,2018-02-05,11,Extreme Fear
2510,2018-02-04,24,Extreme Fear
2511,2018-02-03,40,Fear
2512,2018-02-02,15,Extreme Fear
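
Because the index values are parsed from text, `fng_value` holds strings rather than numbers at this point. The sketch below shows one way to convert the column to integers and sanity-check it, using three rows copied from the table above. It assumes the index is reported on a 0 to 100 scale; the notebook itself uploads the column as parsed.

```python
import pandas as pd

# Three rows copied from the Fear and Greed table above; values are strings after parsing
sample = pd.DataFrame({
    "date": pd.to_datetime(["23-12-2024", "22-12-2024", "05-02-2018"],
                           format="%d-%m-%Y"),
    "fng_value": ["70", "73", "11"],
    "fng_classification": ["Greed", "Greed", "Extreme Fear"],
})

# errors="raise" stops on any value that is not a valid number
sample["fng_value"] = pd.to_numeric(sample["fng_value"], errors="raise").astype("int64")

# Assumption: index values lie between 0 and 100
if not sample["fng_value"].between(0, 100).all():
    raise ValueError("fng_value outside the expected 0-100 range")

print(sample.dtypes)
```

Storing the value as an integer lets later steps compute averages or thresholds without casting again.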


In [5]:
fs = project.get_feature_store() 
fng_fg = fs.get_or_create_feature_group(
    name='f_n_g_index',
    description='fear_and_greed_index',
    version=6,
    primary_key=["date"])

fng_fg.insert(fng_df)

Uploading Dataframe: 100.00% |██████████| Rows 2514/2514 | Elapsed Time: 00:01 | Remaining Time: 00:00


Launching job: f_n_g_index_6_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1164448/jobs/named/f_n_g_index_6_offline_fg_materialization/executions


(Job('f_n_g_index_6_offline_fg_materialization', 'SPARK'), None)

#### Enter a description for each feature in the Feature Group

In [34]:
fng_fg.update_feature_description("date", "Date of the Fear and Greed Index")
fng_fg.update_feature_description("fng_value", "Fear and Greed Index value")
fng_fg.update_feature_description("fng_classification", "Fear and Greed Index classification")


<hsfs.feature_group.FeatureGroup at 0x16c2e3bb0>