# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 02: Feature Pipeline</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/bitcoin/2_bitcoin_feature_pipeline.ipynb)


## 🗒️ This notebook is divided into the following sections:
1. Parsing Data.
2. Feature Group Insertion.

### <span style="color:#ff5f27;"> 📝 Imports</span>

In [None]:
!pip install -U hopsworks --quiet

!pip install -U unicorn-binance-rest-api --quiet
!pip install -U python-dotenv --quiet
!pip install -U textblob --quiet
!pip install -U vaderSentiment --quiet
!pip install -U tweepy --quiet

# Hosted notebook environments may not have the local features package
import os

def need_download_modules():
    if 'google.colab' in str(get_ipython()):
        return True
    if 'HOPSWORKS_PROJECT_ID' in os.environ:
        return True
    return False

if need_download_modules():
    print("Downloading modules")
    os.system('mkdir -p features')
    os.system('cd features && wget https://raw.githubusercontent.com/logicalclocks/hopsworks-tutorials/master/advanced_tutorials/bitcoin/features/bitcoin_price.py')
    os.system('cd features && wget https://raw.githubusercontent.com/logicalclocks/hopsworks-tutorials/master/advanced_tutorials/bitcoin/features/tweets.py')
else:
    print("Local environment")

In [None]:
# Uncomment and fill in if you are running on Colab
# os.environ['TWITTER_API_KEY'] = '{YOUR_API_KEY}'
# os.environ['TWITTER_API_SECRET'] = '{YOUR_API_KEY}'
# os.environ['TWITTER_ACCESS_TOKEN'] = '{YOUR_API_KEY}'
# os.environ['TWITTER_ACCESS_TOKEN_SECRET'] = '{YOUR_API_KEY}'

# os.environ['BINANCE_API_KEY'] = '{YOUR_API_KEY}'
# os.environ['BINANCE_API_SECRET'] = '{YOUR_API_KEY}'

In [None]:
import pandas as pd
from features import bitcoin_price, tweets

---
## <span style="color:#ff5f27;"> 🧙🏼‍♂️ Parsing Data</span>

You will parse time-series Bitcoin data from Binance using your own credentials, so you need a free Binance account and must [create API keys](https://www.binance.com/en/support/faq/360002502072).

You also need Twitter API keys; [request access to the Twitter API](https://developer.twitter.com/en/docs/twitter-api/getting-started/getting-access-to-the-twitter-api) to get them.


#### Don't forget to create a `.env` configuration file in this directory to store all the required environment variables:

`TWITTER_API_KEY = "YOUR_API_KEY"`

`TWITTER_API_SECRET = "YOUR_API_KEY"`

`TWITTER_ACCESS_TOKEN = "YOUR_API_KEY"`

`TWITTER_ACCESS_TOKEN_SECRET = "YOUR_API_KEY"`


`BINANCE_API_KEY = "YOUR_API_KEY"`

`BINANCE_API_SECRET = "YOUR_API_KEY"`

> If you create the `.env` file after starting this notebook, restart the Python kernel: otherwise the already-imported `features` modules (`bitcoin_price.py` and `tweets.py`) will not see the new variables.

![](images/api_keys_env_file.png)
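Before parsing, it can help to confirm that the keys are visible to the running kernel. The sketch below assumes the `features` modules read the keys from environment variables (for example after python-dotenv's `load_dotenv()` has loaded the `.env` file); `missing_env_vars` is a hypothetical helper, not part of this tutorial's code.

```python
import os

# Names taken from the .env file described above
REQUIRED_ENV_VARS = [
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
    "BINANCE_API_KEY",
    "BINANCE_API_SECRET",
]


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    # Default to the real process environment; a plain dict can be passed for testing
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If anything is reported as missing after you edit `.env`, restart the kernel as described above.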

### <span style='color:#ff5f27'> 📈 Bitcoin Data</span>

In [None]:
# Fetch 57 days: the window aggregations used in feature engineering need at least 56 days of history
df_bitcoin = bitcoin_price.parse_btc_data(number_of_days_ago=57)
df_bitcoin.head(3)
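Why fetch more than 56 days? A rolling aggregation only produces a value once its window is full, so with `n` daily rows and a window of `w` days you get `n - w + 1` complete values (and none if `n < w`). The sketch below checks this with pandas on synthetic data; the 56-day window comes from the comment above, while the actual aggregations inside `bitcoin_price.process_btc_data` are not shown here and may differ.

```python
import pandas as pd


def complete_window_rows(n_days: int, window: int) -> int:
    """Count rows where a `window`-day rolling mean has no missing value."""
    # Synthetic daily prices; only the number of rows matters here
    prices = pd.Series(range(n_days), dtype=float)
    # rolling() defaults to min_periods=window, so the first window-1 rows are NaN
    return int(prices.rolling(window).mean().notna().sum())


print(complete_window_rows(57, 56))  # 2
print(complete_window_rows(56, 56))  # 1
print(complete_window_rows(30, 56))  # 0
```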

In [None]:
df_bitcoin_processed = bitcoin_price.process_btc_data(df_bitcoin)
df_bitcoin_processed.tail(3)

In [None]:
# Cast the date column to strings before inserting it into the feature group
df_bitcoin_processed["date"] = df_bitcoin_processed["date"].astype(str)

### <span style='color:#ff5f27'> 💭 Tweets Data</span>

In [None]:
df_tweets_parsed = tweets.get_last_tweets()
df_tweets_parsed.head()

In [None]:
tweets_textblob = tweets.textblob_processing(df_tweets_parsed)

In [None]:
tweets_vader = tweets.vader_processing(df_tweets_parsed)

In [None]:
# Keep only the "YYYY-MM-DD" part of each timestamp string
tweets_textblob.date = tweets_textblob.date.apply(lambda x: x[:10])
tweets_vader.date = tweets_vader.date.apply(lambda x: x[:10])
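The `x[:10]` slice assumes every `date` value is a string that starts with `YYYY-MM-DD`. A minimal sketch with hypothetical timestamps compares it with parsing through `pd.to_datetime`, which raises on malformed values instead of silently truncating them; the exact format returned by the `tweets` module is an assumption here.

```python
import pandas as pd

# Hypothetical timestamp strings, not real tweet data
raw = pd.Series(["2022-10-05 14:03:12", "2022-10-05 23:59:59", "2022-10-06 00:00:01"])

# Approach used in the notebook: keep the first 10 characters ("YYYY-MM-DD")
sliced = raw.apply(lambda x: x[:10])

# Alternative: parse first, so an unexpected format raises an error instead of being cut
parsed = pd.to_datetime(raw).dt.strftime("%Y-%m-%d")

print(sliced.tolist())  # ['2022-10-05', '2022-10-05', '2022-10-06']
```

Both give the same result for well-formed input; parsing costs a little more but fails loudly if the source format ever changes.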

In [None]:
tweets_textblob.head()

---

### <span style="color:#ff5f27;"> 📡 Connecting to the Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

In [None]:
btc_price_fg = fs.get_or_create_feature_group(
    name='bitcoin_price',
    version=1,
)

tweets_textblob_fg = fs.get_or_create_feature_group(
    name='bitcoin_tweets_textblob',
    version=1,
)

tweets_vader_fg = fs.get_or_create_feature_group(
    name='bitcoin_tweets_vader',
    version=1,
)

---

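### <span style='color:#ff5f27'> 💫 Filling the gap in tweets</span>

The Bitcoin price feature group can contain dates for which no tweets were stored. The cells below find those dates and create placeholder sentiment rows for them (every score set to `1`), so that both sources cover the same days.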

In [None]:
btc_dates = btc_price_fg.read().date.sort_values().reset_index(drop=True).astype(str)

In [None]:
stored_tweets_df = tweets_textblob_fg.read()

In [None]:
stored_dates = stored_tweets_df.date.apply(lambda x: str(x)[:10]).drop_duplicates().sort_values().reset_index(drop=True)

In [None]:
btc_dates

In [None]:
stored_dates

In [None]:
missing_dates = list(set(btc_dates) - set(stored_dates))
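The set difference above only finds real gaps if both sides use the same representation. If `date` values came back as pandas `Timestamp` objects rather than strings (an assumption; the cells above normalize both sides with `astype(str)` and `str(x)[:10]` to be safe), a raw comparison would treat every date as missing. The sketch below uses hypothetical dates to show the difference.

```python
import pandas as pd

# Hypothetical dates: one side as strings, the other as pandas Timestamps
price_dates = pd.Series(["2022-10-01", "2022-10-02", "2022-10-03"])
stored = pd.Series(pd.to_datetime(["2022-10-01", "2022-10-03"]))

# Raw values: a str and a Timestamp hash differently, so no date matches
naive_missing = sorted(set(price_dates) - set(stored))

# Normalizing first, as the notebook does with str(x)[:10], reveals the real gap
normalized = stored.apply(lambda x: str(x)[:10])
true_missing = sorted(set(price_dates) - set(normalized))

print(naive_missing)  # ['2022-10-01', '2022-10-02', '2022-10-03']
print(true_missing)   # ['2022-10-02']
```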

In [None]:
len(missing_dates)

In [None]:
tweets_textblob_fix = pd.DataFrame(
    {
        "date": missing_dates,
        "subjectivity": [1] * len(missing_dates),
        "polarity": [1] * len(missing_dates),
    })

In [None]:
tweets_vader_fix = pd.DataFrame(
    {
        "date": missing_dates,
        "compound": [1] * len(missing_dates),
    })
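The two placeholder frames above follow the same pattern, so one helper could build both. This is a minimal sketch: `build_placeholder_rows` is a hypothetical name, and the `unix` column is still left to `tweets.convert_date_to_unix`. The helper also sorts the missing dates, because iterating a Python `set` has no guaranteed order. Note that TextBlob polarity and VADER compound scores range from -1 to 1 (TextBlob subjectivity from 0 to 1), so a placeholder of `1` reads as maximally positive sentiment; the `fill_value` parameter makes that choice explicit.

```python
import pandas as pd


def build_placeholder_rows(reference_dates, stored_dates, value_columns, fill_value=1):
    """Return one placeholder row per date in reference_dates that is absent from stored_dates."""
    # sorted() gives a deterministic row order; a bare set difference does not
    missing = sorted(set(reference_dates) - set(stored_dates))
    data = {"date": missing}
    for column in value_columns:
        data[column] = [fill_value] * len(missing)
    return pd.DataFrame(data)


# Hypothetical dates standing in for btc_dates and stored_dates
btc_dates_example = pd.Series(["2022-10-01", "2022-10-02", "2022-10-03", "2022-10-04"])
stored_dates_example = pd.Series(["2022-10-01", "2022-10-03"])

textblob_fix = build_placeholder_rows(
    btc_dates_example, stored_dates_example, ["subjectivity", "polarity"]
)
vader_fix = build_placeholder_rows(btc_dates_example, stored_dates_example, ["compound"])
print(textblob_fix)
```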

In [None]:
tweets_vader_fix

In [None]:
tweets_vader_fix["unix"] = tweets_vader_fix.date.apply(tweets.convert_date_to_unix)
tweets_textblob_fix["unix"] = tweets_textblob_fix.date.apply(tweets.convert_date_to_unix)

In [None]:
tweets_vader_fix.sort_values("date")

In [None]:
tweets_vader_batch = pd.concat([tweets_vader_fix, tweets_vader]).sort_values("date").reset_index(drop=True)
tweets_textblob_batch = pd.concat([tweets_textblob_fix, tweets_textblob]).sort_values("date").reset_index(drop=True)
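The placeholder dates are computed against the stored data only. If a freshly parsed tweet falls on one of those dates, the concatenated batch holds both a placeholder row and real rows for the same day; whether that matters depends on the feature group's primary key, which this notebook does not show. A minimal sketch, assuming you prefer to keep the real rows, filters such placeholders out before concatenating; `drop_covered_placeholders` is a hypothetical helper, not part of the `tweets` module.

```python
import pandas as pd


def drop_covered_placeholders(placeholders: pd.DataFrame, fresh: pd.DataFrame) -> pd.DataFrame:
    """Remove placeholder rows whose date already has real rows in the fresh batch."""
    covered_dates = set(fresh["date"])
    return placeholders[~placeholders["date"].isin(covered_dates)]


# Hypothetical frames: 2022-10-04 has a placeholder and also freshly parsed tweets
placeholders = pd.DataFrame({"date": ["2022-10-02", "2022-10-04"], "compound": [1.0, 1.0]})
fresh = pd.DataFrame({"date": ["2022-10-04", "2022-10-04"], "compound": [0.2, -0.1]})

batch = (
    pd.concat([drop_covered_placeholders(placeholders, fresh), fresh])
    .sort_values("date", kind="mergesort")  # stable sort keeps the input order for equal dates
    .reset_index(drop=True)
)
print(batch)
```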

---

## <span style="color:#ff5f27;">⬆️ Uploading new data to the Feature Store</span>

### <span style='color:#ff5f27'> 📈 Bitcoin Feature Group</span>

In [None]:
btc_price_fg.insert(df_bitcoin_processed)

### <span style='color:#ff5f27'> 💭 Tweets Feature Groups</span>

In [None]:
tweets_textblob_fg.insert(tweets_textblob_batch)

In [None]:
tweets_vader_fg.insert(tweets_vader_batch, wait=True)

## <span style="color:#ff5f27;">⏭️ **Next:** Part 03: Training Pipeline </span>

In the next notebook, you will create a feature view and a training dataset, train a model, and register it in the Hopsworks Model Registry.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/advanced_tutorials/bitcoin/3_bitcoin_training_pipeline.ipynb)