I chose to collect the data with Historic-Crypto at daily granularity.
The only features it provides are 'low', 'high', 'open', 'close' and 'volume', so we keep all of these columns.

Understanding the features:
**Low**: The lowest price at which the cryptocurrency traded during a given time period (e.g. 1 minute, 1 hour or 1 day; here, 1 day).

**High**: The highest price at which the cryptocurrency traded during the same period.

**Open**: The price at the start of the period, i.e. the price of the first trade in that period (for example, the opening price of a trading session).

**Close**: The price at the end of the period, i.e. the price of the last trade in that period (for example, the closing price of a trading session).

**Volume**: The total number of units of the cryptocurrency traded during the period, i.e. the amount exchanged between buyers and sellers in that timeframe.
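To make these definitions concrete, here is a minimal sketch of how daily OHLCV candles are built from individual trades. The trades below are made up for illustration; Historic-Crypto already returns aggregated candles, so this step is not part of the pipeline.

```python
import pandas as pd

# Made-up trades: execution time, price in USD and size in coin units.
trades = pd.DataFrame(
    {
        "price": [100.0, 102.5, 99.0, 101.0, 103.0, 104.5],
        "size": [0.5, 1.0, 0.25, 2.0, 1.5, 0.75],
    },
    index=pd.to_datetime(
        [
            "2024-03-15 00:05", "2024-03-15 09:30", "2024-03-15 18:45",
            "2024-03-16 01:00", "2024-03-16 12:00", "2024-03-16 23:59",
        ]
    ),
)

# One candle per calendar day: first, highest, lowest and last price, plus total size traded.
daily = trades.groupby(pd.Grouper(freq="D")).agg(
    open=("price", "first"),
    high=("price", "max"),
    low=("price", "min"),
    close=("price", "last"),
    volume=("size", "sum"),
)
print(daily)
```

Each row of `daily` has the same five fields as the Historic-Crypto data, one row per day.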

In [None]:
%pip install Historic-Crypto

In [None]:
import pandas as pd
from Historic_Crypto import HistoricalData

# Daily candles (granularity of 86400 seconds) requested from 2014-01-01; each series
# actually starts at the first date available for that pair (e.g. 2016-08-17 for LTC-USD).
BTC = HistoricalData('BTC-USD', 86400, '2014-01-01-00-00', verbose=False).retrieve_data()
ETH = HistoricalData('ETH-USD', 86400, '2014-01-01-00-00', verbose=False).retrieve_data()
LTC = HistoricalData('LTC-USD', 86400, '2014-01-01-00-00', verbose=False).retrieve_data()

In [None]:
LTC.head(15)

time,low,high,open,close,volume
2016-08-17,3.62,3.7,3.7,3.62,10.317691
2016-08-18,3.62,3.7,3.62,3.7,1.0
2016-08-19,3.7,4.0,3.7,4.0,201.532878
2016-08-21,4.0,4.0,4.0,4.0,0.249252
2016-08-22,3.99,3.99,3.99,3.99,0.624692
2016-08-23,3.61,5.49,3.99,3.95,5619.788148
2016-08-24,3.61,3.96,3.95,3.79,5183.135185
2016-08-25,3.75,3.88,3.82,3.81,4985.640642
2016-08-26,3.77,3.9,3.81,3.81,3336.893101
2016-08-27,3.62,3.85,3.85,3.78,1150.722207
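The output above has no row for 2016-08-20, so the daily series contains gaps (possibly days with no recorded trades). A minimal sketch for listing the missing days, using the dates shown above:

```python
import pandas as pd

# Dates copied from the LTC.head() output above.
idx = pd.to_datetime([
    "2016-08-17", "2016-08-18", "2016-08-19", "2016-08-21", "2016-08-22",
    "2016-08-23", "2016-08-24", "2016-08-25", "2016-08-26", "2016-08-27",
])

# Build the complete daily calendar between the first and last date, then diff it.
full_range = pd.date_range(idx.min(), idx.max(), freq="D")
missing_days = full_range.difference(idx)
print(missing_days)
```

On the real data, the same check would be `pd.date_range(...)` over `pd.to_datetime(LTC.index)`; whether to fill such gaps or leave them is a modelling choice.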


One idea for improving the model was to add sentiment analysis of tweets, but the free API only lets us collect up to 1,000 tweets.
Another idea is to add Google Trends data, so I used an API (SerpApi's Google Trends engine) to collect search-interest data for a few keywords for each coin.

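The keywords are "crypto", "buy bitcoin/litecoin/ethereum", "cryptocurrency", "Blockchain" and "Bitcoin/Litecoin/Ethereum". Google Trends reports interest on a relative 0-100 scale, and over the full history (`date = "all"`) it returns one value per month. I then merged these statistics into the original daily dataframes, so every day in a month receives that month's values.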
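The three keyword sets differ only in the coin name, so the query string can be generated. A sketch with a hypothetical helper `trends_query` that reproduces the three `q` strings used in the search cells; each set has five terms, the maximum Google Trends compares in one request.

```python
def trends_query(coin: str) -> str:
    """Hypothetical helper: build the comma-separated Google Trends query for one coin."""
    # Five terms: Google Trends compares at most five search terms at a time.
    terms = ["crypto", f"buy {coin.lower()}", "cryptocurrency", "Blockchain", coin]
    return ", ".join(terms)


for coin in ["Bitcoin", "litecoin", "ethereum"]:
    print(trends_query(coin))
```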

In [None]:
%pip install serpapi

In [None]:
import os
import serpapi

# Read the SerpApi key from an environment variable (the name SERPAPI_API_KEY is our choice)
# instead of hard-coding a secret in the notebook; the client sends it with every search.
client = serpapi.Client(api_key=os.environ['SERPAPI_API_KEY'])
search = client.search(
    engine="google_trends",
    q="crypto, buy bitcoin, cryptocurrency, Blockchain, Bitcoin",
    date="all",  # full history since 2004; Google Trends returns monthly values for this range
)


In [None]:
timeline_data_months = search["interest_over_time"]["timeline_data"]

In [None]:
# Flatten the response: one entry per month, one key per search term.
date_data = {}

for item in timeline_data_months:
    date = item['date']            # month label, e.g. 'Jul 2015'
    timestamp = item['timestamp']  # Unix time of the first day of the month
    values = item['values']
    query_data = {}
    for value_dict in values:
        query = value_dict['query']
        extracted_value = value_dict['extracted_value']  # relative interest score (0-100)
        query_data[query] = extracted_value
    date_data[date] = {'timestamp': timestamp, **query_data}

# Rows = months; columns = 'date', 'timestamp', then the search terms in sorted order.
df_months = pd.DataFrame(date_data).T.reset_index()
df_months.rename(columns={'index': 'date'}, inplace=True)
df_months_btc = df_months[['date', 'timestamp'] + sorted(df_months.columns.difference(['date', 'timestamp']))]


In [None]:
search = client.search(
    engine="google_trends",
    q="crypto, buy litecoin, cryptocurrency, Blockchain, litecoin",
    date="all",
)
timeline_data_months = search["interest_over_time"]["timeline_data"]
date_data = {}

for item in timeline_data_months:
    date = item['date']
    timestamp = item['timestamp']
    values = item['values']
    query_data = {}
    for value_dict in values:
        query = value_dict['query']
        extracted_value = value_dict['extracted_value']
        query_data[query] = extracted_value
    date_data[date] = {'timestamp': timestamp, **query_data}
del df_months
df_months = pd.DataFrame(date_data).T.reset_index()
df_months.rename(columns={'index': 'date'}, inplace=True)
df_months_ltc = df_months[['date', 'timestamp'] + sorted(df_months.columns.difference(['date', 'timestamp']))]

In [None]:
search = client.search(
    engine="google_trends",
    q="crypto, buy ethereum, cryptocurrency, Blockchain, ethereum",
    date="all",
)
timeline_data_months = search["interest_over_time"]["timeline_data"]
date_data = {}
del df_months
for item in timeline_data_months:
    date = item['date']
    timestamp = item['timestamp']
    values = item['values']
    query_data = {}
    for value_dict in values:
        query = value_dict['query']
        extracted_value = value_dict['extracted_value']
        query_data[query] = extracted_value
    date_data[date] = {'timestamp': timestamp, **query_data}

df_months = pd.DataFrame(date_data).T.reset_index()
df_months.rename(columns={'index': 'date'}, inplace=True)
df_months_eth = df_months[['date', 'timestamp'] + sorted(df_months.columns.difference(['date', 'timestamp']))]
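The three cells above repeat the same parsing loop. Below is a sketch of a hypothetical helper, `timeline_to_frame`, that could replace it. It assumes the response layout used above (`date`, `timestamp` and `values`, each value carrying `query` and `extracted_value`) and builds a list of flat records instead of a dict of dicts plus a transpose, so the score columns stay integers. In the sample input, the Jul 2015 values match the BTC_2 output; the Aug 2015 values are made up.

```python
import pandas as pd


def timeline_to_frame(timeline_data: list) -> pd.DataFrame:
    """Hypothetical helper: one row per period, one integer column per search term.

    Assumes the layout used above: each item has 'date', 'timestamp'
    and 'values', and each value has 'query' and 'extracted_value'.
    """
    records = []
    for item in timeline_data:
        row = {"date": item["date"], "timestamp": int(item["timestamp"])}
        for value_dict in item["values"]:
            row[value_dict["query"]] = int(value_dict["extracted_value"])
        records.append(row)
    # Flat records keep per-column dtypes (no transpose, so no object columns).
    df = pd.DataFrame.from_records(records)
    query_cols = sorted(df.columns.difference(["date", "timestamp"]))
    return df[["date", "timestamp"] + query_cols]


# Illustrative input: Jul 2015 matches BTC_2, Aug 2015 values are made up.
sample = [
    {"date": "Jul 2015", "timestamp": "1435708800",
     "values": [{"query": "Bitcoin", "extracted_value": 3},
                {"query": "crypto", "extracted_value": 0}]},
    {"date": "Aug 2015", "timestamp": "1438387200",
     "values": [{"query": "Bitcoin", "extracted_value": 4},
                {"query": "crypto", "extracted_value": 1}]},
]
df_sample = timeline_to_frame(sample)
print(df_sample)

# With a live response: df_months_btc = timeline_to_frame(search["interest_over_time"]["timeline_data"])
```

Because the columns come out as integers, the later `astype(int)` conversion would no longer be needed with this variant.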

In [None]:
BTC = BTC.reset_index()
LTC = LTC.reset_index()
ETH = ETH.reset_index()

In [None]:
BTC['time'] = pd.to_datetime(BTC['time'])
ETH['time'] = pd.to_datetime(ETH['time'])
LTC['time'] = pd.to_datetime(LTC['time'])

In [None]:
# Month label such as 'Jul 2015', matching the Google Trends 'date' column; used as the join key.
BTC['convert'] = BTC['time'].dt.strftime('%b %Y')
ETH['convert'] = ETH['time'].dt.strftime('%b %Y')
LTC['convert'] = LTC['time'].dt.strftime('%b %Y')

In [None]:
ETH

,time,low,high,open,close,volume,convert
0,2016-05-18,12.50,14.93,12.50,13.18,482.521827,May 2016
1,2016-05-19,13.00,14.90,13.18,14.90,950.441205,May 2016
2,2016-05-20,13.71,14.82,14.82,14.17,254.166617,May 2016
3,2016-05-23,13.61,13.91,13.86,13.61,180.786712,May 2016
4,2016-05-24,12.00,13.74,13.68,12.77,2753.239298,May 2016
...,...,...,...,...,...,...,...
2856,2024-03-15,3566.38,3932.39,3880.45,3742.95,252124.799207,Mar 2024
2857,2024-03-16,3466.55,3781.40,3743.29,3520.10,116692.681774,Mar 2024
2858,2024-03-17,3410.78,3676.76,3520.24,3641.02,109583.542242,Mar 2024
2859,2024-03-18,3454.30,3642.56,3641.51,3520.66,119847.052306,Mar 2024


In [None]:
# Attach each day's monthly Google Trends values by matching month labels ('Jul 2015').
BTC_2 = pd.merge(BTC, df_months_btc, left_on='convert', right_on='date', how='left')

In [None]:
LTC_2 = pd.merge(LTC, df_months_ltc, left_on='convert', right_on='date', how='left')
ETH_2 = pd.merge(ETH, df_months_eth, left_on='convert', right_on='date', how='left')
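A variant of this merge, sketched on toy data with illustrative values: join on monthly `Period` keys instead of formatted strings, with `validate="many_to_one"` to guarantee each month appears only once in the Trends table and `indicator=True` to confirm every daily row found its month. This is an alternative to the string-label join, not what the notebook does.

```python
import pandas as pd

# Toy inputs with illustrative values.
daily = pd.DataFrame({
    "time": pd.to_datetime(["2015-07-30", "2015-07-31", "2015-08-01"]),
    "close": [288.0, 284.0, 281.0],
})
monthly = pd.DataFrame({"date": ["Jul 2015", "Aug 2015"], "Bitcoin": [3, 4]})

# Month-level Period keys on both sides of the join.
daily["month"] = daily["time"].dt.to_period("M")
monthly["month"] = pd.to_datetime(monthly["date"], format="%b %Y").dt.to_period("M")

merged = daily.merge(
    monthly[["month", "Bitcoin"]],
    on="month",
    how="left",
    validate="many_to_one",  # raises if a month appears more than once in `monthly`
    indicator=True,          # adds a '_merge' column: 'both' when a day found its month
)
print(merged)
```

A quick `(merged["_merge"] == "both").all()` then confirms that no day was left without Trends values.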

In [None]:
BTC_2

,time,low,high,open,close,volume,time2,convert,date,timestamp,Bitcoin,Blockchain,buy bitcoin,crypto,cryptocurrency
0,2015-07-20,277.37,280.00,277.98,280.00,782.883420,2015-07-20,Jul 2015,Jul 2015,1435708800,3,0,0,0,0
1,2015-07-21,276.85,281.27,279.96,277.32,4943.559434,2015-07-21,Jul 2015,Jul 2015,1435708800,3,0,0,0,0
2,2015-07-22,275.01,278.54,277.33,277.89,4687.909383,2015-07-22,Jul 2015,Jul 2015,1435708800,3,0,0,0,0
3,2015-07-23,276.28,279.75,277.96,277.39,5306.919575,2015-07-23,Jul 2015,Jul 2015,1435708800,3,0,0,0,0
4,2015-07-24,276.43,291.52,277.23,289.12,7362.469083,2015-07-24,Jul 2015,Jul 2015,1435708800,3,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3161,2024-03-15,65565.70,72414.96,71364.58,69506.88,40195.842312,2024-03-15,Mar 2024,Mar 2024,1709251200,44,2,1,17,2
3162,2024-03-16,64774.05,70050.00,69514.87,65254.22,16532.825303,2024-03-16,Mar 2024,Mar 2024,1709251200,44,2,1,17,2
3163,2024-03-17,64505.00,68877.82,65254.22,68343.64,12631.850841,2024-03-17,Mar 2024,Mar 2024,1709251200,44,2,1,17,2
3164,2024-03-18,66562.65,68933.71,68354.29,67613.04,20057.375800,2024-03-18,Mar 2024,Mar 2024,1709251200,44,2,1,17,2


Next, we convert each date to a Unix timestamp (seconds since 1970-01-01 UTC) so that time is available as a numeric feature.

In [None]:
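# Nanoseconds since the epoch divided down to seconds; explicit 'int64' so the values
# are never cast to a 32-bit int on platforms where Python's int maps to int32.
LTC_2['unix_timestamp'] = LTC_2['time'].astype('int64') // 10**9
BTC_2['unix_timestamp'] = BTC_2['time'].astype('int64') // 10**9
ETH_2['unix_timestamp'] = ETH_2['time'].astype('int64') // 10**9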
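As a worked check, 2015-07-20 and 2024-03-18 should map to 1437350400 and 1710720000, the values in the BTC_2 output below. The sketch uses a hypothetical helper, `to_unix_seconds`, based on timedelta arithmetic. Unlike the `// 10**9` trick, it does not assume the datetimes are stored in nanoseconds. It treats naive datetimes as UTC.

```python
import pandas as pd


def to_unix_seconds(times: pd.Series) -> pd.Series:
    """Hypothetical helper: seconds since 1970-01-01 for naive datetimes (treated as UTC)."""
    return (times - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")


times = pd.Series(pd.to_datetime(["2015-07-20", "2024-03-18"]))
print(to_unix_seconds(times).tolist())  # [1437350400, 1710720000]
```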

In [None]:
LTC_2.drop(['convert', 'date', 'timestamp'], axis=1, inplace=True)
BTC_2.drop(['convert', 'date', 'timestamp'], axis=1, inplace=True)
ETH_2.drop(['convert', 'date', 'timestamp'], axis=1, inplace=True)

In [None]:
BTC_2

,time,low,high,open,close,volume,time2,Bitcoin,Blockchain,buy bitcoin,crypto,cryptocurrency,unix_timestamp
0,2015-07-20,277.37,280.00,277.98,280.00,782.883420,2015-07-20,3,0,0,0,0,1437350400
1,2015-07-21,276.85,281.27,279.96,277.32,4943.559434,2015-07-21,3,0,0,0,0,1437436800
2,2015-07-22,275.01,278.54,277.33,277.89,4687.909383,2015-07-22,3,0,0,0,0,1437523200
3,2015-07-23,276.28,279.75,277.96,277.39,5306.919575,2015-07-23,3,0,0,0,0,1437609600
4,2015-07-24,276.43,291.52,277.23,289.12,7362.469083,2015-07-24,3,0,0,0,0,1437696000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3161,2024-03-15,65565.70,72414.96,71364.58,69506.88,40195.842312,2024-03-15,44,2,1,17,2,1710460800
3162,2024-03-16,64774.05,70050.00,69514.87,65254.22,16532.825303,2024-03-16,44,2,1,17,2,1710547200
3163,2024-03-17,64505.00,68877.82,65254.22,68343.64,12631.850841,2024-03-17,44,2,1,17,2,1710633600
3164,2024-03-18,66562.65,68933.71,68354.29,67613.04,20057.375800,2024-03-18,44,2,1,17,2,1710720000


In [None]:
BTC_2.isna().any()

time              False
low               False
high              False
open              False
close             False
volume            False
time2             False
Bitcoin           False
Blockchain        False
buy bitcoin       False
crypto            False
cryptocurrency    False
unix_timestamp    False
dtype: bool

In [None]:
LTC_2.isna().any()

time              False
low               False
high              False
open              False
close             False
volume            False
Blockchain        False
buy litecoin      False
crypto            False
cryptocurrency    False
litecoin          False
unix_timestamp    False
dtype: bool

In [None]:
ETH_2.isna().any()

time              False
low               False
high              False
open              False
close             False
volume            False
Blockchain        False
buy ethereum      False
crypto            False
cryptocurrency    False
ethereum          False
unix_timestamp    False
dtype: bool

In [None]:
LTC_2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2771 entries, 0 to 2770
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   time            2771 non-null   datetime64[ns]
 1   low             2771 non-null   float64       
 2   high            2771 non-null   float64       
 3   open            2771 non-null   float64       
 4   close           2771 non-null   float64       
 5   volume          2771 non-null   float64       
 6   Blockchain      2771 non-null   object        
 7   buy litecoin    2771 non-null   object        
 8   crypto          2771 non-null   object        
 9   cryptocurrency  2771 non-null   object        
 10  litecoin        2771 non-null   object        
 11  unix_timestamp  2771 non-null   int64         
dtypes: datetime64[ns](1), float64(5), int64(1), object(5)
memory usage: 281.4+ KB


The Google Trends columns are of type object, so we convert them to int.

In [None]:
BTC_2[['Bitcoin', 'Blockchain', 'buy bitcoin', 'crypto', 'cryptocurrency']] = BTC_2[['Bitcoin', 'Blockchain', 'buy bitcoin', 'crypto', 'cryptocurrency']].astype(int)
LTC_2[['litecoin', 'Blockchain', 'buy litecoin', 'crypto', 'cryptocurrency']] = LTC_2[['litecoin', 'Blockchain', 'buy litecoin', 'crypto', 'cryptocurrency']].astype(int)
ETH_2[['ethereum', 'Blockchain', 'buy ethereum', 'crypto', 'cryptocurrency']] = ETH_2[['ethereum', 'Blockchain', 'buy ethereum', 'crypto', 'cryptocurrency']].astype(int)
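Why these columns end up as object: each month's dict mixes the string-valued 'timestamp' with integer scores, so every column of `pd.DataFrame(date_data)` holds mixed types, and the transpose keeps them as object. A small reproduction with illustrative values, followed by `pd.to_numeric` as an alternative to `astype(int)`:

```python
import pandas as pd

# Same construction as the parsing cells, with illustrative values: the timestamp
# arrives as a string while the search score is an integer.
date_data = {
    "Jul 2015": {"timestamp": "1435708800", "Bitcoin": 3},
    "Aug 2015": {"timestamp": "1438387200", "Bitcoin": 4},
}
df = pd.DataFrame(date_data).T.reset_index()
dtypes_before = df.dtypes
print(dtypes_before)  # 'timestamp' and 'Bitcoin' are object: each original column mixed str and int

# pd.to_numeric turns numeric strings and Python ints alike into int64.
for col in ["timestamp", "Bitcoin"]:
    df[col] = pd.to_numeric(df[col])
print(df.dtypes)
```

`pd.to_numeric(..., errors="coerce")` would instead turn any unexpected non-numeric entry into NaN, which can then be inspected before casting.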

In [None]:
ETH_2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2861 entries, 0 to 2860
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   time            2861 non-null   datetime64[ns]
 1   low             2861 non-null   float64       
 2   high            2861 non-null   float64       
 3   open            2861 non-null   float64       
 4   close           2861 non-null   float64       
 5   volume          2861 non-null   float64       
 6   Blockchain      2861 non-null   int64         
 7   buy ethereum    2861 non-null   int64         
 8   crypto          2861 non-null   int64         
 9   cryptocurrency  2861 non-null   int64         
 10  ethereum        2861 non-null   int64         
 11  unix_timestamp  2861 non-null   int64         
dtypes: datetime64[ns](1), float64(5), int64(6)
memory usage: 290.6 KB


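The correlation matrix below shows that the price columns ('low', 'high', 'open', 'close') are strongly correlated with the 'crypto' search term (about 0.91) and moderately correlated with 'ethereum' (about 0.70), while 'cryptocurrency' (about 0.47), 'Blockchain' (about 0.33) and 'buy ethereum' (about 0.10) are much weaker. The Google Trends features, 'crypto' in particular, therefore look like promising inputs for the model, although correlation alone does not guarantee better predictions.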

In [None]:
ETH_2.corr(numeric_only=True)


,low,high,open,close,volume,Blockchain,buy ethereum,crypto,cryptocurrency,ethereum,unix_timestamp
low,1.0,0.998126,0.998383,0.99898,0.148806,0.328726,0.098282,0.907232,0.460551,0.694923,0.720842
high,0.998126,1.0,0.999274,0.999089,0.185425,0.342756,0.114494,0.918632,0.482954,0.71815,0.711432
open,0.998383,0.999274,1.0,0.998159,0.17442,0.337291,0.107998,0.914427,0.473799,0.708677,0.714839
close,0.99898,0.999089,0.998159,1.0,0.168392,0.337085,0.109356,0.914055,0.474731,0.709313,0.715093
volume,0.148806,0.185425,0.17442,0.168392,1.0,0.276967,0.30983,0.300279,0.348528,0.468822,0.16823
Blockchain,0.328726,0.342756,0.337291,0.337085,0.276967,1.0,0.536806,0.471092,0.830916,0.574073,0.082901
buy ethereum,0.098282,0.114494,0.107998,0.109356,0.30983,0.536806,1.0,0.296179,0.663681,0.549851,-0.169114
crypto,0.907232,0.918632,0.914427,0.914055,0.300279,0.471092,0.296179,1.0,0.662436,0.856573,0.605166
cryptocurrency,0.460551,0.482954,0.473799,0.474731,0.348528,0.830916,0.663681,0.662436,1.0,0.811209,0.096172
ethereum,0.694923,0.71815,0.708677,0.709313,0.468822,0.574073,0.549851,0.856573,0.811209,1.0,0.366444
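To see which Google Trends columns track the price most closely, the correlations with 'close' can be ranked by magnitude. A minimal sketch on a made-up frame; the columns 'crypto' and 'buy' stand in for the real search terms, and the values are not from the data:

```python
import pandas as pd

# Toy frame standing in for ETH_2 (made-up values).
df = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=5, freq="D"),
    "close": [1.0, 2.0, 3.0, 4.0, 5.0],
    "crypto": [2, 4, 6, 8, 11],
    "buy": [3, 1, 4, 1, 5],
})

# Correlation of every numeric column with "close", strongest (by magnitude) first.
ranking = (
    df.corr(numeric_only=True)["close"]
    .drop("close")
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

On the real data this would start from `ETH_2.corr(numeric_only=True)["close"]`. Prices and search interest both drift over time ('low' and 'unix_timestamp' correlate at about 0.72), so also comparing correlations of day-over-day changes may help separate genuine co-movement from a shared trend.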


In [None]:
BTC_2.to_csv('Bitcoin.csv', index=False)
LTC_2.to_csv('Litecoin.csv', index=False)
ETH_2.to_csv('Ethereum.csv', index=False)
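When these CSV files are loaded back, the 'time' column is read as plain text unless it is parsed explicitly. A minimal round-trip sketch using an in-memory buffer instead of the real files; the two rows are taken from the BTC_2 output above:

```python
import io

import pandas as pd

# Two rows from the BTC_2 output (subset of columns).
df = pd.DataFrame({
    "time": pd.to_datetime(["2024-03-17", "2024-03-18"]),
    "close": [68343.64, 67613.04],
    "Bitcoin": [44, 44],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Without parse_dates, "time" would come back as strings (object dtype).
restored = pd.read_csv(buffer, parse_dates=["time"])
print(restored.dtypes)
```

For the real files, the equivalent would be `pd.read_csv('Bitcoin.csv', parse_dates=['time'])`.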