# Final Assignment 3: Cryptocurrency

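**Dataset**: weekly cryptocurrency price data  
Note: For each week, the closing prices of the first six days are used as input, and the closing price of the last day is the value to predict. This rests on the hypothesis that the prices within a week depend on one another, so a model that learns the trend correctly should be able to predict the seventh day's price from the first six.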

**Pass criterion**: RMSE below 50
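
For reference, RMSE is the square root of the mean squared difference between the true and predicted values. The following is a minimal sketch of the metric using NumPy; the helper name `rmse` and the sample values are made up for illustration and are not part of the assignment data.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up prices, for illustration only
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 330.0]

score = rmse(y_true, y_pred)
print(score, score < 50)
```

Because the errors are squared before averaging, a single large miss (30 in this made-up sample) weighs more heavily than several small ones.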
  

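Supplementary note (about cryptocurrencies)  
Cryptocurrencies, like stocks, consist of individual assets (such as Bitcoin and Ethereum) that are traded daily on cryptocurrency markets. These markets operate much like foreign exchange markets: for a major cryptocurrency such as Bitcoin, the price is quoted in the form 1 bitcoin = XX dollars and fluctuates continuously.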

# Downloading the Data

In [1]:
# Download the training data
!wget 'https://drive.google.com/uc?export=download&id=1kUfPb8qikA8rdQ26iVUxpod2Qjw3ct_O' -O crypto_train.csv

# Download the test data
!wget 'https://drive.google.com/uc?export=download&id=1VhzCcjNSDxGRG86Za653zHHpjVCdSPD3' -O crypto_test_x.csv

--2025-01-27 04:22:37--  https://drive.google.com/uc?export=download&id=1kUfPb8qikA8rdQ26iVUxpod2Qjw3ct_O
Resolving drive.google.com (drive.google.com)... 142.251.222.14
Connecting to drive.google.com (drive.google.com)|142.251.222.14|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://drive.usercontent.google.com/download?id=1kUfPb8qikA8rdQ26iVUxpod2Qjw3ct_O&export=download [following]
--2025-01-27 04:22:38--  https://drive.usercontent.google.com/download?id=1kUfPb8qikA8rdQ26iVUxpod2Qjw3ct_O&export=download
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 142.251.42.193
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|142.251.42.193|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24244 (24K) [application/octet-stream]
Saving to: 'crypto_train.csv'


2025-01-27 04:22:41 (511 KB/s) - 'crypto_train.csv' saved [24244/24244]

--2025-01-27 04:22:41--  https://drive.google.com/uc?export=download&id=1VhzCcjNSDxGRG86Za653zHHpjVCdSPD3
Resolving drive.google.com (drive.goog

# Inspecting the Data

In [190]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [191]:
# Inspect the training data
train = pd.read_csv('crypto_train.csv', index_col=0)
train

Unnamed: 0,Mon,Tue,Wed,Thu,Fri,Sat,Sun
0,144.539993,139.000000,116.989998,105.209999,97.750000,112.500000,115.910004
1,112.300003,111.500000,113.566002,112.669998,117.199997,115.242996,115.000000
2,117.980003,111.500000,114.220001,118.760002,123.014999,123.498001,121.989998
3,122.000000,122.879997,123.889000,126.699997,133.199997,131.979996,133.479996
4,129.744995,129.000000,132.300003,128.798996,129.000000,129.300003,122.292000
...,...,...,...,...,...,...,...
186,739.247986,751.346985,744.593994,740.289001,741.648987,735.382019,732.034973
187,735.812988,735.604004,745.690979,756.773987,777.943970,771.155029,773.872009
188,758.700012,764.223999,768.132019,770.809998,772.794006,774.650024,769.731018
189,780.086975,780.556030,781.481018,778.088013,784.906982,790.828979,790.530029


In [192]:
# Inspect the test data
test = pd.read_csv('crypto_test_x.csv', index_col=0)
test

Unnamed: 0,Mon,Tue,Wed,Thu,Fri,Sat
0,1021.75,1043.839966,1154.72998,1013.380005,902.200989,908.585022
1,902.828003,907.679016,777.757019,804.833984,823.984009,818.411987
2,831.533997,907.937988,886.617981,899.072998,895.026001,921.789001
3,921.012024,892.687012,901.541992,917.585999,919.75,921.590027
4,920.382019,970.403015,989.02301,1011.799988,1029.910034,1042.900024
5,1038.150024,1061.349976,1063.069946,994.382996,988.674011,1004.450012
6,990.642029,1004.549988,1007.47998,1027.439941,1046.209961,1054.420044
7,1079.97998,1115.300049,1117.439941,1166.719971,1173.680054,1143.839966
8,1179.969971,1179.969971,1222.5,1251.01001,1274.98999,1255.150024
9,1272.829956,1223.540039,1150.0,1188.48999,1116.719971,1175.829956


# Building the Model

In [193]:
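# Normalization
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler = scaler.fit(train.values)

# Normalize (pass the NumPy array, matching what the scaler was fitted on)
s_train = scaler.transform(train.values)
# To convert back to the original values
inv_train = scaler.inverse_transform(s_train)

# Build supervised (input, target) pairs from the sequence data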
def make_seq(data, seq_len=6):
    X = data[:, 0:seq_len]
    X = np.expand_dims(X, axis=2)
    Y = data[:, seq_len:]
    return X, Y

X, Y = make_seq(s_train)

def train_val_split(X, Y, val_rate):
  rate = int(X.shape[0]*(1-val_rate))
  train_X, train_Y, val_X, val_Y = X[:rate], Y[:rate], X[rate:], Y[rate:]
  return train_X, train_Y, val_X, val_Y

train_X, train_Y, val_X, val_Y = train_val_split(X, Y, 0.1)



In [194]:
train_X.shape, train_Y.shape, val_X.shape, val_Y.shape

((171, 6, 1), (171, 1), (20, 6, 1), (20, 1))
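
`MinMaxScaler` with its default settings rescales each column independently to the range [0, 1], so the `Sun` target column has its own minimum and maximum. The sketch below reproduces that per-column scaling and the input/target split in plain NumPy, and shows how a scaled target can be mapped back to prices using only the statistics of the target column. The array `weeks` is made-up data, not the course dataset.

```python
import numpy as np

# Made-up weekly closing prices (rows = weeks, columns = Mon..Sun)
weeks = np.array([
    [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0],
    [200.0, 190.0, 210.0, 220.0, 230.0, 240.0, 250.0],
    [300.0, 310.0, 320.0, 330.0, 340.0, 350.0, 360.0],
])

# Per-column min-max scaling: x' = (x - min) / (max - min)
col_min = weeks.min(axis=0)
col_max = weeks.max(axis=0)
scaled = (weeks - col_min) / (col_max - col_min)

# First six days as a (samples, seq_len, features) input, last day as target
X = scaled[:, :6, None]
Y = scaled[:, 6:]

# Map the scaled target back to prices using the Sun column's statistics only
sun_restored = Y[:, 0] * (col_max[6] - col_min[6]) + col_min[6]

print(X.shape, Y.shape)
print(sun_restored)
```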

In [195]:
import torch.nn as nn
import torch

class LSTMModel(nn.Module):
    """Two-layer LSTM model"""
    def __init__(self, input_size=1, hidden_size_1=64, hidden_size_2=32, output_size=1):
        super(LSTMModel, self).__init__()
        self.hidden_size_1 = hidden_size_1
        self.hidden_size_2 = hidden_size_2
        
        self.lstm1 = nn.LSTM(input_size, hidden_size_1, batch_first=True)
        self.dropout1 = nn.Dropout(0.2)
        self.lstm2 = nn.LSTM(hidden_size_1, hidden_size_2, batch_first=True)
        self.dropout2 = nn.Dropout(0.2)
        self.fc = nn.Linear(hidden_size_2, output_size)
        
        self.criterion = nn.MSELoss()
        # Optimize all parameters of this module
        self.optimizer = torch.optim.Adam(self.parameters(), lr=0.001)
    
    def forward(self, x):
        # First LSTM layer
        out, _ = self.lstm1(x)
        out = self.dropout1(out)
        
        # Second LSTM layer
        out, _ = self.lstm2(out)
        out = self.dropout2(out[:, -1, :])  # Get last output
        
        # Output layer
        out = self.fc(out)
        return out
    
    def predict(self, x):
        self.eval()
        with torch.no_grad():  # disable gradient tracking for inference
            return self.forward(x)
    
    def train_model(self, train_x, train_y, val_x, val_y, epochs):
        """Train the model on the full batch each epoch and return per-epoch losses"""
        train_losses = []
        val_losses = []
        
        for epoch in range(epochs):
            # Training phase
            self.train()
            self.optimizer.zero_grad()
            
            # Forward pass
            outputs = self.forward(train_x)
            loss = self.criterion(outputs, train_y)
            
            # Backward pass
            loss.backward()
            self.optimizer.step()
            
            # Validation phase
            self.eval()
            with torch.no_grad():
                val_outputs = self.forward(val_x)
                val_loss = self.criterion(val_outputs, val_y)
            
            # Loss values
            train_losses.append(loss.item())
            val_losses.append(val_loss.item())
            
            # Print progress
            if (epoch + 1) % 10 == 0:
                print(f'Epoch [{epoch+1}/{epochs}], Train Loss: {loss.item():.4f}, Val Loss: {val_loss.item():.4f}')
        
        return train_losses, val_losses
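
# --- Sketch: relating the scaled MSE loss to RMSE in price units ---

The loss in `train_model` is an MSE on min-max-scaled values, while the pass criterion is an RMSE that is presumably measured in price units. Since min-max scaling of the target column is an affine map, the two are related by RMSE_price = sqrt(MSE_scaled) × (max − min) of the target column, provided predictions and targets are inverted with the same statistics. The sketch below checks this numerically; the function name `scaled_mse_to_price_rmse`, the bounds, and the random data are assumptions for illustration.

```python
import numpy as np


def scaled_mse_to_price_rmse(mse_scaled, target_min, target_max):
    """Convert an MSE measured on min-max-scaled targets to an RMSE in price units."""
    return float(np.sqrt(mse_scaled) * (target_max - target_min))


# Hypothetical bounds of the target column and made-up prices
sun_min, sun_max = 100.0, 800.0
rng = np.random.default_rng(0)
y_true = rng.uniform(sun_min, sun_max, size=50)
y_pred = y_true + rng.normal(0.0, 20.0, size=50)


def scale(values):
    """Min-max scale values with the hypothetical target bounds."""
    return (values - sun_min) / (sun_max - sun_min)


mse_scaled = np.mean((scale(y_true) - scale(y_pred)) ** 2)
rmse_price = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
converted = scaled_mse_to_price_rmse(mse_scaled, sun_min, sun_max)

print(rmse_price, converted)
```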

In [196]:
# Instantiate the model
model = LSTMModel(input_size=1, hidden_size_1=64, hidden_size_2=32, output_size=1)

# Convert the data to PyTorch tensors
train_X = torch.FloatTensor(train_X)
train_Y = torch.FloatTensor(train_Y)
val_X = torch.FloatTensor(val_X)
val_Y = torch.FloatTensor(val_Y)

# Train the model
train_losses, val_losses = model.train_model(train_X, train_Y, val_X, val_Y, epochs=5000)

Epoch [10/5000], Train Loss: 0.1361, Val Loss: 0.3516
Epoch [20/5000], Train Loss: 0.0510, Val Loss: 0.0497
Epoch [30/5000], Train Loss: 0.0461, Val Loss: 0.0792
Epoch [40/5000], Train Loss: 0.0404, Val Loss: 0.1072
Epoch [50/5000], Train Loss: 0.0354, Val Loss: 0.0459
Epoch [60/5000], Train Loss: 0.0281, Val Loss: 0.0433
Epoch [70/5000], Train Loss: 0.0125, Val Loss: 0.0089
Epoch [80/5000], Train Loss: 0.0091, Val Loss: 0.0029
Epoch [90/5000], Train Loss: 0.0087, Val Loss: 0.0016
Epoch [100/5000], Train Loss: 0.0078, Val Loss: 0.0005
Epoch [110/5000], Train Loss: 0.0047, Val Loss: 0.0007
Epoch [120/5000], Train Loss: 0.0067, Val Loss: 0.0014
Epoch [130/5000], Train Loss: 0.0064, Val Loss: 0.0013
Epoch [140/5000], Train Loss: 0.0054, Val Loss: 0.0005
Epoch [150/5000], Train Loss: 0.0045, Val Loss: 0.0005
Epoch [160/5000], Train Loss: 0.0045, Val Loss: 0.0004
Epoch [170/5000], Train Loss: 0.0056, Val Loss: 0.0005
Epoch [180/5000], Train Loss: 0.0036, Val Loss: 0.0006
Epoch [190/5000], T
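
The validation loss in the log above fluctuates rather than decreasing monotonically, so the last epoch is not necessarily the best one. One common way to handle this is patience-based early stopping. The sketch below applies that rule to the validation losses logged at epochs 10 to 180; the function `early_stop_index` is a hypothetical helper, not part of the notebook, and it works on a list of recorded losses rather than inside the training loop.

```python
def early_stop_index(val_losses, patience=3):
    """Return (best_index, stop_index) under patience-based early stopping.

    Stops once `patience` consecutive checks pass without a strictly lower loss.
    """
    best_loss = float("inf")
    best_index = -1
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_index = i
        elif i - best_index >= patience:
            return best_index, i
    return best_index, len(val_losses) - 1


# Validation losses copied from the log above (one entry per 10 epochs)
logged_val_losses = [
    0.3516, 0.0497, 0.0792, 0.1072, 0.0459, 0.0433, 0.0089, 0.0029, 0.0016,
    0.0005, 0.0007, 0.0014, 0.0013, 0.0005, 0.0005, 0.0004, 0.0005, 0.0006,
]

best_index, stop_index = early_stop_index(logged_val_losses, patience=3)
# Each entry corresponds to 10 epochs, so index i maps to epoch (i + 1) * 10
print("best epoch:", (best_index + 1) * 10, "stopped at epoch:", (stop_index + 1) * 10)
```

A small patience stops early on a temporary plateau; a larger one trains longer but can find a slightly lower loss later in the log.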

# Submission Format

In [197]:
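# Prediction

# Scaling
# Note: fit_transform refits the scaler on the six test columns, so the
# training-time scaling is replaced by statistics computed from the test data
test_x = scaler.fit_transform(test.values)
# Reshape the test data to (16, 6, 1)
test_x = test_x.reshape(test_x.shape[0], test_x.shape[1], 1)

# Convert to a PyTorch tensor
test_x = torch.FloatTensor(test_x)

# Run the prediction
model.eval()  # put the model in evaluation mode
with torch.no_grad():
    pred = model(test_x)

# Convert the predictions to a NumPy array
pred = pred.numpy()

# Create a new array for the inverse scaling
scaled_pred = np.zeros((pred.shape[0], 6))  # shape (16, 6)
scaled_pred[:, -1] = pred.flatten()  # put the predictions in the last column

# Convert the predictions back to the original scale with the scaler
# (the last column uses the Sat statistics of the refitted scaler)
scaled_pred = scaler.inverse_transform(scaled_pred)

# Extract only the last column (the predictions)
final_predictions = scaled_pred[:, -1]

# Check the result
print("Shape of final predictions:", final_predictions.shape)
print("\nPredictions:")
print(final_predictions)

Shape of final predictions: (16,)

Predictions: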
[ 917.4505768   817.13749242  937.39630651  948.76724159 1102.65127572
 1023.53768176 1110.48132536 1217.05832581 1258.03890898 1247.34642289
 1092.50287642  972.64303589 1131.27090244 1240.19059209 1240.47824144
 1264.04138303]
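
The prediction cell calls `fit_transform` on the test data, so the test inputs are scaled with statistics that differ from those used in training, and the output is inverted with the last test column's range. An alternative is to reuse the training statistics: scale the six test columns with the first six training columns' minimum and maximum, and invert the prediction with the `Sun` column's. The sketch below shows that variant in plain NumPy on made-up arrays, with `pred_scaled` standing in for the model output. The notebook does not establish whether this scores better on this dataset; the displayed test prices lie above the displayed training prices, so inputs scaled this way would exceed 1.

```python
import numpy as np

# Made-up stand-ins: training weeks (Mon..Sun) and test weeks (Mon..Sat)
train_weeks = np.array([
    [90.0, 95.0, 100.0, 105.0, 110.0, 115.0, 100.0],
    [140.0, 145.0, 150.0, 155.0, 160.0, 165.0, 150.0],
    [190.0, 195.0, 200.0, 205.0, 210.0, 215.0, 200.0],
])
test_weeks = np.array([
    [200.0, 210.0, 220.0, 230.0, 240.0, 250.0],
    [100.0, 120.0, 140.0, 160.0, 180.0, 200.0],
])

# Statistics come from the training data only
col_min = train_weeks.min(axis=0)
col_span = train_weeks.max(axis=0) - col_min

# Scale the test inputs with the first six training columns' statistics
test_scaled = (test_weeks - col_min[:6]) / col_span[:6]
test_x = test_scaled[:, :, None]  # (samples, seq_len, features)

# Hypothetical scaled model outputs, one per test week
pred_scaled = np.array([[0.5], [1.2]])

# Invert with the Sun column's training statistics
pred_price = pred_scaled[:, 0] * col_span[6] + col_min[6]

print(test_x.shape, test_scaled.max())
print(pred_price)
```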


In [198]:
# Check the submission format
pred = pd.DataFrame(final_predictions, columns=['Sun'])
# Confirm that the shape is (16, 1).
print(pred.shape)
pred

(16, 1)


Unnamed: 0,Sun
0,917.450577
1,817.137492
2,937.396307
3,948.767242
4,1102.651276
5,1023.537682
6,1110.481325
7,1217.058326
8,1258.038909
9,1247.346423


In [199]:
# Please submit the predictions in CSV format.
pred.to_csv('crypto_pred.csv')
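
Before submitting, it can be worth reading the file back to confirm it has the expected shape and column name. The sketch below does this with a temporary file and placeholder values; `dummy_predictions` is a hypothetical stand-in for the real predictions.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for the 16 predicted prices
dummy_predictions = np.linspace(900.0, 1250.0, 16)
submission = pd.DataFrame(dummy_predictions, columns=["Sun"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "crypto_pred.csv")
    submission.to_csv(path)
    # index_col=0 restores the row index that to_csv wrote as the first column
    reloaded = pd.read_csv(path, index_col=0)

print(reloaded.shape)
print(list(reloaded.columns))
```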