
# 📈 Predicting Bitcoin's Next-Day Closing Price

**Course: Machine Learning**  
**Student: [Fill in your name]**  

In this notebook, we use machine learning techniques to predict Bitcoin's closing price (`close`) for the following day, based on historical price data and economic factors such as the federal funds rate (`FEDFUNDS`) and the US dollar index (`US_Dollar_Price`).


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
import seaborn as sns


## 1. Load and preprocess the data

In [None]:

# Load the data
df = pd.read_csv("your_dataset.csv")  # Replace with your file name
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.sort_values('timestamp').reset_index(drop=True)

# Create the label: the next day's close
df['target_close'] = df['close'].shift(-1)
df = df.dropna()

# Extract calendar features from the timestamp
df['day'] = df['timestamp'].dt.day
df['month'] = df['timestamp'].dt.month
df['weekday'] = df['timestamp'].dt.weekday

# Engineer a few additional price-movement features
df['range_high_low'] = df['high'] - df['low']
df['change_open_close'] = df['close'] - df['open']
df['close_pct_change'] = df['close'].pct_change().fillna(0)
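
To see how the `shift(-1)` label lines up with each row, here is a minimal sketch on a made-up five-day price series. The values and the `toy` name are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

# Hypothetical closing prices for five consecutive days (illustrative values only).
toy = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=5, freq="D"),
    "close": [100.0, 102.0, 101.0, 105.0, 107.0],
})

# Each row's label is the close of the following day.
toy["target_close"] = toy["close"].shift(-1)

# The last day has no "tomorrow", so its label is NaN and the row is dropped.
toy = toy.dropna(subset=["target_close"]).reset_index(drop=True)

print(toy)
```

Dropping only on `target_close` (rather than on every column) is one possible choice here; it keeps rows whose other columns happen to be missing.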


## 2. Select features and standardize

In [None]:

# Note: 'day', 'month', 'weekday' are ordinal features; they are usually standardized unless tree-based models are used.
features = [
    'open', 'high', 'low', 'close', 'volume', 'marketCap',
    'FEDFUNDS', 'US_Dollar_Price', 'day', 'month', 'weekday',
    'range_high_low', 'change_open_close', 'close_pct_change'
]

X = df[features]
y = df['target_close']

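# Fit the scaler on the training portion only (the first 80% of rows, matching the
# chronological 80/20 split in the next section) so test-set statistics do not leak in.
n_train = len(X) - int(np.ceil(0.2 * len(X)))
scaler = StandardScaler()
scaler.fit(X.iloc[:n_train])
X_scaled = scaler.transform(X)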
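
As a quick sanity check of what standardization does, the sketch below compares `StandardScaler` against the formula z = (x - mean) / std computed by hand. The `X_toy` matrix is a hypothetical stand-in for the real features.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 4 rows, 2 columns on very different scales.
X_toy = np.array([[1.0, 1000.0],
                  [2.0, 2000.0],
                  [3.0, 3000.0],
                  [4.0, 4000.0]])

scaler_toy = StandardScaler()
X_toy_scaled = scaler_toy.fit_transform(X_toy)

# StandardScaler computes z = (x - mean) / std per column,
# using the population standard deviation (ddof=0, NumPy's default).
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(np.allclose(X_toy_scaled, manual))
print(X_toy_scaled.mean(axis=0), X_toy_scaled.std(axis=0))
```

After scaling, both columns have mean 0 and standard deviation 1, so features such as `volume` and `close_pct_change` contribute on comparable scales to models like linear regression.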


## 3. Split into train and test sets

In [None]:

# Do not shuffle, because this is time-series data
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, shuffle=False)
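
A single chronological hold-out gives one estimate of performance. As a possible complement, scikit-learn's `TimeSeriesSplit` produces several walk-forward folds in which the training indices always precede the test indices. The sketch below uses a hypothetical ten-sample array (`X_demo`) rather than the real data.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the feature matrix: 10 time-ordered samples.
X_demo = np.arange(10).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = []
for train_idx, test_idx in tscv.split(X_demo):
    # Every training index comes before every test index, so no future data is used.
    folds.append((train_idx.tolist(), test_idx.tolist()))
    print("train:", train_idx.tolist(), "test:", test_idx.tolist())
```

Each fold extends the training window forward in time, which mimics retraining the model as new days of data arrive.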


## 4. Train and evaluate the models

In [None]:

models = {
    "Gradient Boosting": GradientBoostingRegressor(n_estimators=100, random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Linear Regression": LinearRegression()
}

results = {}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    results[name] = {"MSE": mse, "MAE": mae, "R2": r2}
    print(f"--- {name} ---")
    print(f"MSE: {mse:.2f} | MAE: {mae:.2f} | R2: {r2:.4f}\n")
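
The printed metrics can be easier to compare side by side in a table. The sketch below shows one way to do that, using made-up numbers in a dictionary (`results_demo`) shaped like the `results` dictionary built in the loop; the model names and values are placeholders, not actual results.

```python
import pandas as pd

# Hypothetical metric values, shaped like the `results` dict built in the loop above.
results_demo = {
    "Model A": {"MSE": 4.0, "MAE": 1.5, "R2": 0.90},
    "Model B": {"MSE": 9.0, "MAE": 2.5, "R2": 0.75},
    "Model C": {"MSE": 1.0, "MAE": 0.8, "R2": 0.97},
}

# One row per model, sorted so the lowest MAE comes first.
summary = pd.DataFrame(results_demo).T.sort_values("MAE")

# RMSE is in the same units as the price, which makes it easier to interpret than MSE.
summary["RMSE"] = summary["MSE"] ** 0.5

best_name = summary.index[0]
print(summary)
print("Lowest MAE:", best_name)
```

Passing the real `results` dictionary in place of `results_demo` would let the best model be picked from the metrics instead of being hard-coded.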


## 5. Compare predictions with actual prices

In [None]:

# Plot predictions from Gradient Boosting (assumed best; confirm against the metrics printed above)
# Reuse the model already fitted in the training loop instead of retraining it
best_model = models["Gradient Boosting"]
y_pred = best_model.predict(X_test)

plt.figure(figsize=(12, 5))
plt.plot(y_test.values, label='Actual price')
plt.plot(y_pred, label='Predicted price (GBR)')
plt.title('Actual vs. predicted Close price')
plt.xlabel("Day")
plt.ylabel("Close price")
plt.legend()
plt.show()



## ✅ Conclusion

- The data is preprocessed with standardization plus engineered price-movement features (high-low range, open-to-close change, percent change) and calendar features (day, month, weekday).
- Gradient Boosting Regressor gave the best results among the three models, in terms of both error (MSE, MAE) and goodness of fit (R²).
- Possible further improvements:
  - Add time-series features (`rolling mean`, `volatility`...)
  - Use a deep learning model such as LSTM if a longer history of data is available

**👉 This is a complete machine learning pipeline, suitable for submission as a course assignment.**
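
The time-series features suggested above could be sketched as follows. The price values, the `window` length, and the column names `close_roll_mean`, `return`, and `volatility` are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical closing prices (illustrative values only).
demo = pd.DataFrame({"close": [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]})

window = 3  # assumed window length; tune for the real data

# Rolling mean of the close over the last `window` days (including the current day).
demo["close_roll_mean"] = demo["close"].rolling(window).mean()

# Rolling volatility: standard deviation of daily returns over the same window.
demo["return"] = demo["close"].pct_change()
demo["volatility"] = demo["return"].rolling(window).std()

print(demo)
```

Both features use only the current and past days, so they do not leak the next-day target. The first few rows are NaN until a full window is available and would need to be dropped or filled before training.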
