<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Задача" data-toc-modified-id="Задача-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Задача</a></span></li><li><span><a href="#Загрузки" data-toc-modified-id="Загрузки-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Загрузки</a></span></li><li><span><a href="#EDA" data-toc-modified-id="EDA-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>EDA</a></span></li><li><span><a href="#Убираем-дырки" data-toc-modified-id="Убираем-дырки-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Убираем дырки</a></span></li><li><span><a href="#Строим-графики" data-toc-modified-id="Строим-графики-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Строим графики</a></span></li></ul></div>

<div class="alert alert-info">
<font size="4", color = "black"><b>✍ Вопрос</b></font>
    <br /> 
    <font size="3", color = "black">
<br /> Добрый день. Немного запутался в задаче и в разных результатах. Буду благодарен за помощь.


А еще будет отлично, если подскатеже по структуре и в целом, как это должно выглядеть в идеале.

# Предварительный анализ задачи торговой стратегии

## Задача

1. Загрузить данные о котировках ценных бумаг из списка SnP500 и котировки криптовалют (BTC, ETH, SOL, XRP).

2. Подготовьте автоматическое отображение графиков текущей ситуации.

3. Проверьть пропуски и ошибки. 

4. Проанализировать выбросы. Определить, что это: выбросы или реальные данные, с которыми предстоит работать.

## Загрузки

In [1]:
'''Системные'''
import os
from datetime import datetime, timedelta
from tqdm import tqdm 

'''База'''
import talib
import yfinance as yf
import pandas as pd
import numpy as np

'''Графики'''
from plotly.subplots import make_subplots
import plotly.express as px
import plotly.graph_objects as go
import dash
from dash import dcc, html
import plotly.graph_objects as go

'''Обучение'''
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report, accuracy_score
from catboost import CatBoostClassifier
import optuna




  from .autonotebook import tqdm as notebook_tqdm


In [2]:
tickers_crypt = ['BTC-USD', 'ETH-USD', 'SOL-USD', 'XRP-USD']
output_crypt_file = 'crypto_data.csv'
output_file = 'snp500_data.csv'
all_data = []
all_data_crypt = []
end_date = datetime.today().strftime('%Y-%m-%d')
start_date = (datetime.today() - timedelta(days=2 * 365)).strftime('%Y-%m-%d')

In [3]:
if os.path.exists(output_file):
    data = pd.read_csv(output_file, index_col=0)
    print("Данные успешно загружены:")
    display(data.head()) 

else:

    url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

    table = pd.read_html(url)[0]
    tickers = table['Symbol'].tolist()[:2]
    print(f"Всего {len(tickers)} тикеров.")

    for ticker in tqdm(tickers, desc="Загрузка данных", unit="тикер"):
        try:
            data_temp = yf.download(ticker, start=start_date, end=end_date, progress=False)
            if isinstance(data_temp.columns, pd.MultiIndex):
                data_temp.columns = data_temp.columns.droplevel([1])
            data_temp = data_temp.reset_index(drop=False)
#             if 'Price' in data_temp.index.names:
#                 data_temp.index = data_temp.index.droplevel('Price')
            data_temp.index.name = 'Price'
#             data_temp.rename_axis(None, inplace=True)
            data_temp.columns.name = None
            data_temp['Ticker'] = ticker
            data_temp = data_temp.reset_index(drop=True)
#             data_temp = data_temp.set_index('Date')
#             display(data_temp)
            all_data.append(data_temp)
        except Exception as e:
            print(f"Ошибка в тикере {ticker}: {e}")

    if all_data:
        data = pd.concat(all_data, axis=0, ignore_index=True)
        data.to_csv(output_file)
        print(f"Данные сохранены в {output_file}")
    else:
        print("Нет данных.")
        

Данные успешно загружены:


Unnamed: 0,Date,Close,High,Low,Open,Volume,Ticker
0,2022-12-27,92.099693,92.567009,91.287638,92.038408,2166195,MMM
1,2022-12-28,90.621147,92.697265,90.590508,92.199303,2345356,MMM
2,2022-12-29,92.367836,92.590005,90.782026,91.06548,2464717,MMM
3,2022-12-30,91.869873,91.954149,90.789687,91.663034,2506816,MMM
4,2023-01-03,93.82341,93.953648,92.214616,93.095624,3124909,MMM


In [4]:
if os.path.exists(output_crypt_file):
    data_crypt = pd.read_csv(output_crypt_file, index_col=0)
    print("Данные успешно загружены:")
    print(data_crypt.head())
else:
#     all_data_crypt = [] 
    for ticker in tqdm(tickers_crypt, desc="Загрузка данных", unit="тикер"):
        try:
            data_temp_crypt = yf.download(ticker, start=start_date, end=end_date, progress=False)
            
            if isinstance(data_temp_crypt.columns, pd.MultiIndex):
                data_temp_crypt.columns = data_temp_crypt.columns.droplevel([1])
            
            data_temp_crypt = data_temp_crypt.reset_index(drop=False)
            data_temp_crypt.columns.name = None
            data_temp_crypt['Ticker'] = ticker
            
#             print(ticker, ' ', len(data_temp_crypt))
            if not data_temp_crypt.empty:
                all_data_crypt.append(data_temp_crypt)
        except Exception as e:
            print(f"Ошибка в тикере {ticker}: {e}")

    if all_data_crypt: 
        data_crypt = pd.concat(all_data_crypt, axis=0, ignore_index=True)
        print("Данные объединены.")
    else:
        print("Нет данных.")

Загрузка данных: 100%|██████████| 4/4 [00:01<00:00,  3.32тикер/s]

Данные объединены.





In [5]:
df = pd.concat([data, data_crypt], axis=0, ignore_index=True)
df

Unnamed: 0,Date,Close,High,Low,Open,Volume,Ticker
0,2022-12-27,92.099693,92.567009,91.287638,92.038408,2166195,MMM
1,2022-12-28,90.621147,92.697265,90.590508,92.199303,2345356,MMM
2,2022-12-29,92.367836,92.590005,90.782026,91.065480,2464717,MMM
3,2022-12-30,91.869873,91.954149,90.789687,91.663034,2506816,MMM
4,2023-01-03,93.823410,93.953648,92.214616,93.095624,3124909,MMM
...,...,...,...,...,...,...,...
3915,2024-12-20 00:00:00,2.276886,2.346128,1.969547,2.248882,26858734292,XRP-USD
3916,2024-12-21 00:00:00,2.238112,2.381590,2.203753,2.276900,14987697762,XRP-USD
3917,2024-12-22 00:00:00,2.199332,2.290921,2.167088,2.237794,11275662705,XRP-USD
3918,2024-12-23 00:00:00,2.257262,2.268452,2.135262,2.199439,10144129668,XRP-USD


## EDA

In [6]:
def viewing_statistics(df_list):
    print('Посмотрим на данные:')
    for i in table:
        if len(i) >= 3:
            display(i.sample(3))
        else:
            display(i)
        display(i.info())
        display(i.columns)
        print('\n')
table = [df]   
viewing_statistics(table)

Посмотрим на данные:


Unnamed: 0,Date,Close,High,Low,Open,Volume,Ticker
568,2023-04-03,66.86293,67.183408,66.212269,66.921204,860100,AOS
703,2023-10-16,70.423363,70.580073,69.06191,69.532048,1238800,AOS
1211,2023-07-22 00:00:00,29771.802734,29991.615234,29664.121094,29908.697266,7873300598,BTC-USD


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3920 entries, 0 to 3919
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    3920 non-null   object 
 1   Close   3920 non-null   float64
 2   High    3920 non-null   float64
 3   Low     3920 non-null   float64
 4   Open    3920 non-null   float64
 5   Volume  3920 non-null   int64  
 6   Ticker  3920 non-null   object 
dtypes: float64(4), int64(1), object(2)
memory usage: 214.5+ KB


None

Index(['Date', 'Close', 'High', 'Low', 'Open', 'Volume', 'Ticker'], dtype='object')





In [7]:
print('Проверим пропуски:')
for i in table:
    display(i.isnull().mean().sort_values())

Проверим пропуски:


Date      0.0
Close     0.0
High      0.0
Low       0.0
Open      0.0
Volume    0.0
Ticker    0.0
dtype: float64

## Заполняем пропуски
пока нет, что бы не забыть оставлю.

In [8]:
df = df.fillna(method='ffill')
df = pd.DataFrame(df.replace(to_replace=0, method='ffill'))

  df = df.fillna(method='ffill')
  df = pd.DataFrame(df.replace(to_replace=0, method='ffill'))


In [9]:
# pip install nbformat

## Строим графики

In [10]:
fig = go.Figure()

for ticker in df['Ticker'].unique():
    _ = df[df['Ticker'] == ticker].copy()
    
    '''Убираю ошибку с лишней линией, проверю дубликаты и пропуски после'''
    _ = _.drop_duplicates(subset=['Date']).sort_values(by='Date')
    if _['Close'].isnull().any():
        _['Close'] = _['Close'].fillna(method='ffill')  # Заполнение пропусков предыдущим значением
    

    _['Growth'] = (_['Close'] / _['Close'].iloc[0]) * 100  # Нормализация, первое значение = 100%
    fig.add_trace(go.Scatter(
        x=_['Date'],
        y=_['Growth'],
        mode='lines',
        name=ticker,
    ))

fig.update_layout(
    template="plotly_dark",
    title="Темпы прироста всех тикеров (нормализация к 100%)",
    title_x=0.5,
    xaxis_title="Дата",
    yaxis_title="Темп прироста (%)",
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False),
    font=dict(size=14),
)

fig.show()


In [11]:
for ticker in df['Ticker'].unique():
    _ = df[df['Ticker'] == ticker]
    
    '''Убираю ошибку с лишней линией, проверю дубликаты и пропуски после'''
    _ = _.drop_duplicates(subset=['Date']).sort_values(by='Date')
    if _['Close'].isnull().any():
        _['Close'] = _['Close'].fillna(method='ffill')  #'''Заполнение пропусков предыдущим значением'''
    
    '''Вычисление максимума и минимума'''
    max_row = _.loc[_['Close'].idxmax()]
    min_row = _.loc[_['Close'].idxmin()]
    

    fig = px.line(
        _,
        x='Date',
        y='Close',
        title=f'Временной ряд для {ticker}',
        labels={'Close': 'Цена закрытия', 'Date': 'Дата'},
    )
    
    fig.add_annotation(
        x=max_row['Date'],
        y=max_row['Close'],
        text=f"Макс: {max_row['Close']:.2f}",
        showarrow=True,
        arrowhead=2,
        ax=20,
        ay=-30,
        bgcolor="green",
        font=dict(color="white"),
    )
    fig.add_annotation(
        x=min_row['Date'],
        y=min_row['Close'],
        text=f"Мин: {min_row['Close']:.2f}",
        showarrow=True,
        arrowhead=2,
        ax=20,
        ay=30,
        bgcolor="red",
        font=dict(color="white"),
    )
    

    fig.update_layout(
        template="plotly_dark",
        title=dict(x=0.5),
        xaxis=dict(showgrid=False),
        yaxis=dict(showgrid=False),
        font=dict(size=14),
    )

    fig.show()

In [12]:
df['Ticker'].unique()


array(['MMM', 'AOS', 'BTC-USD', 'ETH-USD', 'SOL-USD', 'XRP-USD'],
      dtype=object)

<div class="alert alert-info">
<font size="4", color = "black"><b>✍ Вопрос</b></font>
    <br /> 
    <font size="3", color = "black">
<br /> Добавил данные из теории, что бы на них опереться, по итогу я их не использовал, но пока решил оставить в коде.

In [13]:
macd_df = df[df['Ticker'] == df['Ticker'].unique().tolist()[1]]
macd_df['Date'] = pd.to_datetime(macd_df['Date'])
macd_df.set_index('Date', inplace=True)
macd_df['MACD'], macd_df['MACD_Signal'], macd_df['MACD_Hist'] = talib.MACD(macd_df["Close"], fastperiod=12, slowperiod=27, signalperiod=9)

'''Инициализация сигналов'''
macd_df['Signal'] = 0

'''Логика сигналов'''
macd_df.loc[macd_df['MACD'] > macd_df['MACD_Signal'], 'Signal'] = 1
macd_df.loc[macd_df['MACD'] < macd_df['MACD_Signal'], 'Signal'] = -1


'''Сигналы позиций'''
buy_signals = macd_df[macd_df['Signal'] == 1]
sell_signals = macd_df[macd_df['Signal'] == -1]



'''Создание графика Plotly'''
fig = make_subplots(rows=2, cols=1, shared_xaxes=True, 
                    vertical_spacing=0.2, 
                    subplot_titles=('Price with Buy/Sell Signals', 'MACD'))

'''Добавление линии цены закрытия'''
fig.add_trace(go.Scatter(x=macd_df.index, y=macd_df['Close'], mode='lines', name='Close Price'), row=1, col=1)

'''Добавление точек для Long сигналов'''
fig.add_trace(go.Scatter(x=buy_signals.index, y=buy_signals['Close'], mode='markers', 
                         marker=dict(color='green', size=5), name='Buy Signal'), row=1, col=1)

'''Добавление точек для Short сигналов'''
fig.add_trace(go.Scatter(x=sell_signals.index, y=sell_signals['Close'], mode='markers', 
                         marker=dict(color='red', size=5), name='Sell Signal'), row=1, col=1)

'''Добавление MACD линии'''
fig.add_trace(go.Scatter(x=df.index, y=macd_df['MACD'], mode='lines', name='MACD', line=dict(color='blue')), row=2, col=1)

'''Добавление сигнальной линии MACD'''
fig.add_trace(go.Scatter(x=df.index, y=macd_df['MACD_Signal'], mode='lines', name='MACD Signal', line=dict(color='orange')), row=2, col=1)

'''Настройки оформления графика'''
fig.update_layout(title='Price and MACD with Buy/Sell Signals',
                  xaxis_title='Date',
                  yaxis_title='Price',
                  xaxis2_title='Date',
                  yaxis2_title='MACD',
                  legend=dict(x=1, y=1),
                  width=800,
                  height=600
                  
)
# Показать график
fig.show()



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

In [14]:
tema_df = df[df['Ticker'] == df['Ticker'].unique().tolist()[3]]
tema_df['Date'] = pd.to_datetime(tema_df['Date'])
tema_df.set_index('Date', inplace=True)
period = 30

'''EMA: Простая EMA от цены'''
tema_df['EMA'] = talib.EMA(tema_df['Close'], timeperiod=period)

'''Удаление NaN значений, что бы не ломалось, потом нужно будет сделать заполнение .fillna(method='ffill')'''
tema_df = tema_df.dropna()

'''Инициализация сигналов'''
tema_df['Signal'] = 0

'''Логика для входа в короткую позицию (Short)'''
tema_df.loc[tema_df['Close'] < tema_df['EMA'], 'Signal'] = -1

'''Логика для входа в длинную позицию (Long)'''
tema_df.loc[tema_df['Close'] > tema_df['EMA'], 'Signal'] = 1

buy_signals = tema_df[tema_df['Signal'] == 1]
sell_signals = tema_df[tema_df['Signal'] == -1]

'''Визуализация сигналов'''
fig = go.Figure()
fig.add_trace(go.Scatter(x=tema_df.index, y=tema_df['Close'], mode='lines', name='Close Price'))
fig.add_trace(go.Scatter(x=tema_df.index, y=tema_df['EMA'], mode='lines', name='EMA', line=dict(color='blue')))
fig.add_trace(go.Scatter(x=buy_signals.index, y=buy_signals['Close'], mode='markers', marker=dict(color='green', size=10), name='Buy Signal'))
fig.add_trace(go.Scatter(x=sell_signals.index, y=sell_signals['Close'], mode='markers', marker=dict(color='red', size=10), name='Sell Signal'))
fig.update_layout(title='Simple EMA Strategy', xaxis_title='Date', yaxis_title='Price')
fig.show()



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



In [15]:
tema_df = df[df['Ticker'] == df['Ticker'].unique().tolist()[3]]
tema_df['Date'] = pd.to_datetime(tema_df['Date'])
tema_df.set_index('Date', inplace=True)

'''Параметры для двух EMA'''
fast_period = 3  # Период быстрой EMA
slow_period = 25  # Период медленной EMA

'''Расчет двух EMA'''
tema_df['Fast_EMA'] = talib.EMA(tema_df['Close'], timeperiod=fast_period)
tema_df['Slow_EMA'] = talib.EMA(tema_df['Close'], timeperiod=slow_period)

'''Удаление NaN значений, сделаем пока в качестве заглушки, но нужно думать'''
tema_df = tema_df.dropna()

'''Инициализация сигналов'''
tema_df['Signal'] = 0

'''Логика для входа в длинную позицию (Long) — когда быстрая EMA пересекает медленную EMA снизу вверх'''
tema_df.loc[tema_df['Fast_EMA'] > tema_df['Slow_EMA'], 'Signal'] = 1

'''Логика для входа в короткую позицию (Short) — когда быстрая EMA пересекает медленную EMA сверху вниз'''
tema_df.loc[tema_df['Fast_EMA'] < tema_df['Slow_EMA'], 'Signal'] = -1

'''Сигналы на покупку и продажу'''
buy_signals = tema_df[tema_df['Signal'] == 1]
sell_signals = tema_df[tema_df['Signal'] == -1]

'''Визуализация сигналов'''
fig = go.Figure()
fig.add_trace(go.Scatter(x=tema_df.index, y=tema_df['Close'], mode='lines', name='Close Price'))
fig.add_trace(go.Scatter(x=tema_df.index, y=tema_df['Fast_EMA'], mode='lines', name='Fast EMA', line=dict(color='blue')))
fig.add_trace(go.Scatter(x=tema_df.index, y=tema_df['Slow_EMA'], mode='lines', name='Slow EMA', line=dict(color='red')))
fig.add_trace(go.Scatter(x=buy_signals.index, y=buy_signals['Close'], mode='markers', marker=dict(color='green', size=10), name='Buy Signal'))
fig.add_trace(go.Scatter(x=sell_signals.index, y=sell_signals['Close'], mode='markers', marker=dict(color='red', size=10), name='Sell Signal'))
fig.update_layout(title='EMA Crossover Strategy', xaxis_title='Date', yaxis_title='Price')
fig.show()



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



In [16]:
'''Создание дополнительных индикаторов для анализа'''
tema_df['RSI'] = talib.RSI(tema_df['Close'], timeperiod=14)  # Индикатор определяет отношение среднего роста к среднему падению цены актива
tema_df['MACD'], tema_df['MACD_signal'], _ = talib.MACD(tema_df['Close'], fastperiod=12, slowperiod=26, signalperiod=9)

'''Удаляем NaN значения'''
tema_df = tema_df.dropna()

'''Добавляем целевую переменную (1 - рост, 0 - падение)'''
tema_df['Target'] = np.where(tema_df['Close'].shift(-1) > tema_df['Close'], 1, 0)

'''Разделение данных на обучающую, тестовую и валидационную выборки'''
train_ratio = 0.7
val_ratio = 0.2
test_ratio = 0.1

n = len(tema_df)
train_end = int(n * train_ratio)
val_end = train_end + int(n * val_ratio)

train_data = tema_df.iloc[:train_end]
val_data = tema_df.iloc[train_end:val_end]
test_data = tema_df.iloc[val_end:]

'''Выделение признаков и целевой переменной'''
features = ['Fast_EMA', 'Slow_EMA', 'RSI', 'MACD', 'MACD_signal']
X_train, y_train = train_data[features], train_data['Target']
X_val, y_val = val_data[features], val_data['Target']
X_test, y_test = test_data[features], test_data['Target']

'''Создание и обучение модел'''
rf_model = RandomForestClassifier(random_state=42)
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}
grid_search = GridSearchCV(rf_model, param_grid, cv=3, scoring='accuracy')
grid_search.fit(X_train, y_train)

best_model = grid_search.best_estimator_

'''Тестирование моделей'''
y_pred_val = best_model.predict(X_val)
print("Validation Accuracy:", accuracy_score(y_val, y_pred_val))
print(classification_report(y_val, y_pred_val))

y_pred_test = best_model.predict(X_test)
print("Test Accuracy:", accuracy_score(y_test, y_pred_test))
print(classification_report(y_test, y_pred_test))

'''Визуализация стратегии'''
val_data['Prediction'] = y_pred_val
val_data['Strategy_Returns'] = np.where(val_data['Prediction'] == 1, 
                                        val_data['Close'].pct_change(), 0).cumsum()

'''Дашборд с Dash'''
app = dash.Dash(__name__)

app.layout = html.Div([
    html.H1("Торговые стратегии: Эффективность"),
    dcc.Graph(
        id="performance_chart",
        figure={
            'data': [
                go.Scatter(
                    x=val_data.index, 
                    y=val_data['Strategy_Returns'], 
                    mode='lines', 
                    name='Strategy Returns'
                ),
                go.Scatter(
                    x=val_data.index, 
                    y=val_data['Close'].pct_change().cumsum(), 
                    mode='lines', 
                    name='Market Returns'
                )
            ],
            'layout': go.Layout(
                title="Доходность стратегии против рынка",
                xaxis_title="Дата",
                yaxis_title="Кумулятивная доходность",
            )
        }
    )
])

if __name__ == '__main__':
    app.run_server(debug=True)

Validation Accuracy: 0.5223880597014925
              precision    recall  f1-score   support

           0       0.55      0.27      0.36        67
           1       0.51      0.78      0.62        67

    accuracy                           0.52       134
   macro avg       0.53      0.52      0.49       134
weighted avg       0.53      0.52      0.49       134

Test Accuracy: 0.5294117647058824
              precision    recall  f1-score   support

           0       0.50      0.31      0.38        32
           1       0.54      0.72      0.62        36

    accuracy                           0.53        68
   macro avg       0.52      0.52      0.50        68
weighted avg       0.52      0.53      0.51        68





A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



In [17]:
'''Подготовка данных'''
tema_df['RSI'] = talib.RSI(tema_df['Close'], timeperiod=14)  # Индикатор RSI
tema_df['MACD'], tema_df['MACD_signal'], _ = talib.MACD(tema_df['Close'], fastperiod=12, slowperiod=26, signalperiod=9)

'''Удаляем NaN значения'''
tema_df = tema_df.dropna()

'''Добавляем целевую переменную (1 - рост, 0 - падение)'''
tema_df['Target'] = np.where(tema_df['Close'].shift(-1) > tema_df['Close'], 1, 0)

'''Разделение данных на обучающую, тестовую и валидационную выборки'''
train_ratio = 0.7
val_ratio = 0.2
test_ratio = 0.1

n = len(tema_df)
train_end = int(n * train_ratio)
val_end = train_end + int(n * val_ratio)

train_data = tema_df.iloc[:train_end]
val_data = tema_df.iloc[train_end:val_end]
test_data = tema_df.iloc[val_end:]

'''Выделение признаков и целевой переменной'''
features = ['Fast_EMA', 'Slow_EMA', 'RSI', 'MACD', 'MACD_signal']
X_train, y_train = train_data[features], train_data['Target']
X_val, y_val = val_data[features], val_data['Target']
X_test, y_test = test_data[features], test_data['Target']

'''Оптимизация гиперпараметров через Optuna'''
def objective(trial):
    params = {
        'iterations': trial.suggest_int('iterations', 100, 1000),
        'depth': trial.suggest_int('depth', 4, 10),
        'learning_rate': trial.suggest_loguniform('learning_rate', 0.01, 0.3),
        'l2_leaf_reg': trial.suggest_loguniform('l2_leaf_reg', 1, 10),
        'random_strength': trial.suggest_uniform('random_strength', 1e-9, 10),
        'border_count': trial.suggest_int('border_count', 32, 255),
        'loss_function': 'Logloss',
        'random_seed': 42,
        'logging_level': 'Silent'
    }

    model = CatBoostClassifier(**params)
    model.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=50, verbose=False)

    preds = model.predict(X_val)
    return accuracy_score(y_val, preds)

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)

best_params = study.best_params

'''Обучение модели с лучшими параметрами'''
model = CatBoostClassifier(**best_params)
model.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=50, verbose=False)

'''Тестирование модели'''
y_pred_val = model.predict(X_val)
print("Validation Accuracy:", accuracy_score(y_val, y_pred_val))
print(classification_report(y_val, y_pred_val))

y_pred_test = model.predict(X_test)
print("Test Accuracy:", accuracy_score(y_test, y_pred_test))
print(classification_report(y_test, y_pred_test))

'''ЦУчет затрат и визуализация стратегии'''
val_data['Prediction'] = y_pred_val
commission_buy = 0.0003
commission_sell = 0.0002

def calculate_strategy_returns(data):
    returns = [0]  # Начальное значение для первого элемента
    for i in range(1, len(data)):
        if data['Prediction'].iloc[i - 1] == 1:  # Покупка
            trade_return = data['Close'].iloc[i] / data['Close'].iloc[i - 1] - 1
            trade_return -= commission_buy + commission_sell
            returns.append(trade_return)
        else:
            returns.append(0)
    return np.cumsum(returns)

val_data['Strategy_Returns'] = calculate_strategy_returns(val_data)
val_data['Market_Returns'] = val_data['Close'].pct_change().cumsum()

'''Дашборд Dash'''
app = dash.Dash(__name__)

app.layout = html.Div([
    html.H1("Торговые стратегии: Эффективность"),
    dcc.Graph(
        id="performance_chart",
        figure={
            'data': [
                go.Scatter(
                    x=val_data.index, 
                    y=val_data['Strategy_Returns'], 
                    mode='lines', 
                    name='Strategy Returns'
                ),
                go.Scatter(
                    x=val_data.index, 
                    y=val_data['Market_Returns'], 
                    mode='lines', 
                    name='Market Returns'
                )
            ],
            'layout': go.Layout(
                title="Доходность стратегии против рынка",
                xaxis_title="Дата",
                yaxis_title="Кумулятивная доходность",
            )
        }
    )
])

if __name__ == '__main__':
    app.run_server(debug=True)


[I 2024-12-26 13:15:48,217] A new study created in memory with name: no-name-2c32bf14-151d-4060-bfa1-65b96bfbaacc

suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.


suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.


suggest_uniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float instead.

[I 2024-12-26 13:15:48,565] Trial 0 finished with value: 0.5118110236220472 and parameters: {'iterations': 998, 'depth': 6, 'learning_rate': 0.14480151948529785, 'l2_leaf_reg': 1.0074826631335627, 'random_strength': 8.28087006332455, 'border_count': 138}. Best is trial 0 with value: 0.5118110236220472.

suggest_loguniform has 

Validation Accuracy: 0.5118110236220472
              precision    recall  f1-score   support

           0       0.67      0.06      0.11        64
           1       0.50      0.97      0.66        63

    accuracy                           0.51       127
   macro avg       0.59      0.52      0.39       127
weighted avg       0.59      0.51      0.39       127

Test Accuracy: 0.5384615384615384
              precision    recall  f1-score   support

           0       0.52      0.45      0.48        31
           1       0.55      0.62      0.58        34

    accuracy                           0.54        65
   macro avg       0.54      0.53      0.53        65
weighted avg       0.54      0.54      0.54        65





A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



In [19]:
tema_df['RSI'] = talib.RSI(tema_df['Close'], timeperiod=14)  # Индикатор RSI
tema_df['MACD'], tema_df['MACD_signal'], _ = talib.MACD(tema_df['Close'], fastperiod=3, slowperiod=12, signalperiod=9)

'''Удаляем NaN значения'''
tema_df = tema_df.dropna()

'''Добавляем целевую переменную (1 - рост, 0 - падение)'''
tema_df['Target'] = np.where(tema_df['Close'].shift(-1) > tema_df['Close'], 1, 0)

'''Функция для скользящего обучения и тестирования'''
commission_buy = 0.0003
commission_sell = 0.0002

def sliding_window_backtest(data, features, period_train=90, period_val=10, period_test=30):
    start = 0
    strategy_returns = []
    market_returns = []

    while start + period_train + period_val + period_test <= len(data):
        '''Разделение данных на обучающую, валидационную и тестовую выборки'''
        train_data = data.iloc[start:start + period_train]
        val_data = data.iloc[start + period_train:start + period_train + period_val]
        test_data = data.iloc[start + period_train + period_val:start + period_train + period_val + period_test]

        X_train, y_train = train_data[features], train_data['Target']
        X_val, y_val = val_data[features], val_data['Target']
        X_test, y_test = test_data[features], test_data['Target']

        '''Оптимизация гиперпараметров через Optuna'''
        def objective(trial):
            params = {
                'iterations': trial.suggest_int('iterations', 100, 1000),
                'depth': trial.suggest_int('depth', 4, 10),
                'learning_rate': trial.suggest_loguniform('learning_rate', 0.01, 0.3),
                'l2_leaf_reg': trial.suggest_loguniform('l2_leaf_reg', 1, 10),
                'random_strength': trial.suggest_uniform('random_strength', 1e-9, 10),
                'border_count': trial.suggest_int('border_count', 32, 255),
                'loss_function': 'Logloss',
                'random_seed': 42,
                'logging_level': 'Silent'
            }

            model = CatBoostClassifier(**params)
            model.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=50, verbose=False)

            preds = model.predict(X_val)
            return accuracy_score(y_val, preds)

        study = optuna.create_study(direction='maximize')
        study.optimize(objective, n_trials=50)

        best_params = study.best_params

        '''Обучение модели с лучшими параметрами'''
        model = CatBoostClassifier(**best_params)
        model.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=50, verbose=False)

        '''Тестирование модели'''
        test_data['Prediction'] = model.predict(X_test)

        '''Расчет доходности стратегии'''
        def calculate_returns(data):
            returns = []
            for i in range(1, len(data)):
                if data['Prediction'].iloc[i - 1] == 1:  # Покупка
                    trade_return = data['Close'].iloc[i] / data['Close'].iloc[i - 1] - 1
                    trade_return -= commission_buy + commission_sell
                    returns.append(trade_return)
                else:
                    returns.append(0)
            return returns

        strategy_returns.extend(calculate_returns(test_data))
        market_returns.extend(test_data['Close'].pct_change().fillna(0).tolist())

        '''Сдвиг окна'''
        start += period_test

    return np.cumsum(strategy_returns), np.cumsum(market_returns)

'''Запуск цепного расчета'''
features = ['Fast_EMA', 'Slow_EMA', 'RSI', 'MACD', 'MACD_signal']
strategy_returns, market_returns = sliding_window_backtest(tema_df, features)

'''Дашборд Dash'''
app = dash.Dash(__name__)

app.layout = html.Div([
    html.H1("Торговые стратегии: Скользящее обучение"),
    dcc.Graph(
        id="performance_chart",
        figure={
            'data': [
                go.Scatter(
                    x=np.arange(len(strategy_returns)), 
                    y=strategy_returns, 
                    mode='lines', 
                    name='Strategy Returns'
                ),
                go.Scatter(
                    x=np.arange(len(market_returns)), 
                    y=market_returns, 
                    mode='lines', 
                    name='Market Returns'
                )
            ],
            'layout': go.Layout(
                title="Доходность стратегии против рынка (скользящее обучение)",
                xaxis_title="Период",
                yaxis_title="Кумулятивная доходность",
            )
        }
    )
])

if __name__ == '__main__':
    app.run_server(debug=True)

[I 2024-12-26 13:18:46,397] A new study created in memory with name: no-name-38608849-bd88-49e4-bdd8-8cea60ce49af

suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.


suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.


suggest_uniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float instead.

[I 2024-12-26 13:18:47,180] Trial 0 finished with value: 0.7 and parameters: {'iterations': 265, 'depth': 7, 'learning_rate': 0.011093782077792994, 'l2_leaf_reg': 1.3417962803056311, 'random_strength': 9.770733201445946, 'border_count': 226}. Best is trial 0 with value: 0.7.

suggest_loguniform has been deprecated in v3.0.0. T

In [20]:
'''Индикатор RSI'''
tema_df['RSI'] = talib.RSI(tema_df['Close'], timeperiod=14)
tema_df['MACD'], tema_df['MACD_signal'], _ = talib.MACD(tema_df['Close'], fastperiod=3, slowperiod=12, signalperiod=9)

'''Удаляем NaN значения'''
tema_df = tema_df.dropna()

'''Добавляем целевую переменную (1 - рост, 0 - падение)'''
tema_df['Target'] = np.where(tema_df['Close'].shift(-1) > tema_df['Close'], 1, 0)

'''Функция для скользящего обучения и тестирования'''
commission_buy = 0.0003
commission_sell = 0.0002

def sliding_window_backtest(data, features, period_train=90, period_val=10, period_test=30):
    start = 0
    strategy_returns = []
    market_returns = []

    while start + period_train + period_val + period_test <= len(data):
        '''Разделение данных на обучающую, валидационную и тестовую выборки'''
        train_data = data.iloc[start:start + period_train]
        val_data = data.iloc[start + period_train:start + period_train + period_val]
        test_data = data.iloc[start + period_train + period_val:start + period_train + period_val + period_test]

        X_train, y_train = train_data[features], train_data['Target']
        X_val, y_val = val_data[features], val_data['Target']
        X_test, y_test = test_data[features], test_data['Target']

        '''Оптимизация гиперпараметров через Optuna'''
        def objective(trial):
            params = {
                'iterations': trial.suggest_int('iterations', 100, 1000),
                'depth': trial.suggest_int('depth', 4, 10),
                'learning_rate': trial.suggest_loguniform('learning_rate', 0.01, 0.3),
                'l2_leaf_reg': trial.suggest_loguniform('l2_leaf_reg', 1, 10),
                'random_strength': trial.suggest_uniform('random_strength', 1e-9, 10),
                'border_count': trial.suggest_int('border_count', 32, 255),
                'loss_function': 'Logloss',
                'random_seed': 42,
                'logging_level': 'Silent'
            }

            model = CatBoostClassifier(**params)
            model.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=50, verbose=False)

            preds = model.predict(X_val)
            return accuracy_score(y_val, preds)

        study = optuna.create_study(direction='maximize')
        study.optimize(objective, n_trials=50)

        best_params = study.best_params

        '''Обучение модели с лучшими параметрами'''
        model = CatBoostClassifier(**best_params)
        model.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=50, verbose=False)

        '''Тестирование модели'''
        test_data['Prediction'] = model.predict(X_test)

        '''Расчет доходности стратегии'''
        def calculate_returns(data):
            returns = []
            for i in range(1, len(data)):
                if data['Prediction'].iloc[i - 1] == 1:  # Покупка
                    trade_return = data['Close'].iloc[i] / data['Close'].iloc[i - 1] - 1
                    trade_return -= commission_buy + commission_sell
                    returns.append(trade_return)
                else:
                    returns.append(0)
            return returns

        strategy_returns.extend(calculate_returns(test_data))
        market_returns.extend(test_data['Close'].pct_change().fillna(0).tolist())

        '''Сдвиг окна'''
        start += period_test

    return np.cumsum(strategy_returns), np.cumsum(market_returns)

'''Запуск цепного расчета'''
features = ['Fast_EMA', 'Slow_EMA', 'RSI', 'MACD', 'MACD_signal']
strategy_returns, market_returns = sliding_window_backtest(tema_df, features)

'''Промежуточные результаты'''
if len(strategy_returns) == 0:
    print("Стратегия не сработала, нет данных для доходности стратегии.")
else:
    print("Кумулятивная доходность стратегии:", strategy_returns[-1])

if len(market_returns) == 0:
    print("Нет данных для доходности рынка.")
else:
    print("Кумулятивная доходность рынка:", market_returns[-1])

'''Визуализация результатов Plotly'''
if len(strategy_returns) > 0 and len(market_returns) > 0:
    fig = go.Figure()

    '''Добавление графика доходности стратегии'''
    fig.add_trace(go.Scatter(
        x=np.arange(len(strategy_returns)), 
        y=strategy_returns, 
        mode='lines', 
        name='Strategy Returns'
    ))

    '''Добавление графика доходности рынка'''
    fig.add_trace(go.Scatter(
        x=np.arange(len(market_returns)), 
        y=market_returns, 
        mode='lines', 
        name='Market Returns'
    ))

    '''Настройка макета графика'''
    fig.update_layout(
        title="Доходность стратегии против рынка (скользящее обучение)",
        xaxis_title="Период",
        yaxis_title="Кумулятивная доходность",
        legend=dict(x=0, y=1)
    )

    '''Отображение графика'''
    fig.show()


[I 2024-12-26 13:25:05,527] A new study created in memory with name: no-name-b85ddcc3-1011-4436-b796-68549a2cfa6f

suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.


suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.


suggest_uniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float instead.

[I 2024-12-26 13:25:05,896] Trial 0 finished with value: 0.5 and parameters: {'iterations': 950, 'depth': 8, 'learning_rate': 0.011546197391874232, 'l2_leaf_reg': 1.8133325813470724, 'random_strength': 1.615677850019351, 'border_count': 191}. Best is trial 0 with value: 0.5.

suggest_loguniform has been deprecated in v3.0.0. T

Кумулятивная доходность стратегии: -0.05648373605049545
Кумулятивная доходность рынка: 1.0121408959789675




A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



In [21]:
tema_df['RSI'] = talib.RSI(tema_df['Close'], timeperiod=14)  # Индикатор RSI
tema_df['MACD'], tema_df['MACD_signal'], _ = talib.MACD(tema_df['Close'], fastperiod=12, slowperiod=26, signalperiod=9)
tema_df['SMA'] = talib.SMA(tema_df['Close'], timeperiod=20)  # Добавляем SMA

tema_df = tema_df.dropna()

'''Добавляем целевую переменную (1 - рост, 0 - падение)'''
tema_df['Target'] = np.where(tema_df['Close'].shift(-1) > tema_df['Close'], 1, 0)

# Функция для скользящего обучения и тестирования
commission_buy = 0.0003
commission_sell = 0.0002

def train_and_evaluate_model(data, features, model_name):
    strategy_returns, market_returns = sliding_window_backtest(data, features)
    return {
        'model_name': model_name,
        'strategy_returns': strategy_returns,
        'market_returns': market_returns
    }

'''Построение трех моделей на основе паттернов'''
def build_models(data):
    models = []

    '''Модель с RSI'''
    models.append(train_and_evaluate_model(data, ['RSI'], 'RSI-Based Model'))

    '''Модель с MACD'''
    models.append(train_and_evaluate_model(data, ['MACD', 'MACD_signal'], 'MACD-Based Model'))

    '''Модель с SMA'''
    models.append(train_and_evaluate_model(data, ['SMA'], 'SMA-Based Model'))

    return models

'''Обучение моделей'''
models_results = build_models(tema_df)

'''Построение дашборда для сравнения стратегий'''
app = dash.Dash(__name__)

app.layout = html.Div([
    html.H1("Сравнение торговых стратегий"),
    dcc.Graph(
        id="performance_chart",
        figure={
            'data': [
                go.Scatter(
                    x=np.arange(len(model['strategy_returns'])), 
                    y=model['strategy_returns'], 
                    mode='lines', 
                    name=f"{model['model_name']} Strategy Returns"
                ) for model in models_results
            ] + [
                go.Scatter(
                    x=np.arange(len(model['market_returns'])), 
                    y=model['market_returns'], 
                    mode='lines', 
                    name=f"{model['model_name']} Market Returns"
                ) for model in models_results
            ],
            'layout': go.Layout(
                title="Эффективность стратегий во времени",
                xaxis_title="Период",
                yaxis_title="Кумулятивная доходность",
            )
        }
    )
])

if __name__ == '__main__':
    app.run_server(debug=True)

[I 2024-12-26 14:46:59,289] A new study created in memory with name: no-name-9c36521f-1884-4e4e-b509-ef2136a6ae85

suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.


suggest_loguniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float(..., log=True) instead.


suggest_uniform has been deprecated in v3.0.0. This feature will be removed in v6.0.0. See https://github.com/optuna/optuna/releases/tag/v3.0.0. Use suggest_float instead.

[I 2024-12-26 14:46:59,488] Trial 0 finished with value: 0.5 and parameters: {'iterations': 844, 'depth': 5, 'learning_rate': 0.2805436656706344, 'l2_leaf_reg': 4.36359398326834, 'random_strength': 9.682990661369011, 'border_count': 160}. Best is trial 0 with value: 0.5.

suggest_loguniform has been deprecated in v3.0.0. This 