# Collecting the data
First we need to collect all the required data. We use web3py with the PancakeSwap Prediction contract (0x0E3A8078EDD2021dadcdE733C6b4a86E51EE8f07) and requests with the BscScan API.

Add the project's module folders to the Python path:

In [None]:
import sys
import os

module_path = os.path.abspath('..')
src_path = os.path.join(module_path, 'src')

# Make the project root and its src/ folder importable from the notebook
for path in (module_path, src_path):
    if path not in sys.path:
        sys.path.append(path)

# os.chdir("..")

print(os.getcwd())

## Getting active players
Use get_active_players.py with a start and end time (the cell below calls its make_active_players_file() function). You may need to change the loop in its main() function so that it iterates through more past transactions.
Also set CURRENT_BLOCK_NUMBER to the current block number, since transactions are checked starting from this block and going backwards.

In [None]:
import datetime

import get_active_players

# The same timeframe as for the training set
time_from_active_players = int(datetime.datetime(2022, 10, 28, 0, 0).timestamp())
time_to_active_players = int(datetime.datetime(2022, 12, 28, 0, 0).timestamp())
active_players_file = '../data/active_players_2months.csv'

get_active_players.make_active_players_file(time_from_active_players, time_to_active_players, 70, active_players_file)
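The internals of get_active_players are not shown in this notebook. As a rough illustration of what counting active players involves, the sketch below tallies transactions per sender wallet inside a half-open time window. It assumes BscScan-style transaction records whose `from` and `timeStamp` fields are strings (the shape the BscScan `txlist` endpoint returns); the function name `count_active_players` and its `min_count` filter are hypothetical, not the module's API.

```python
from collections import Counter

import pandas as pd


def count_active_players(transactions, time_from, time_to, min_count=1):
    """Count transactions per sender wallet within [time_from, time_to).

    `transactions` is an iterable of dicts with BscScan-style string fields
    'from' (sender address) and 'timeStamp' (Unix seconds).
    Returns a DataFrame with 'player' and 'count' columns, most active first.
    """
    counts = Counter(
        tx["from"].lower()  # normalise checksummed vs lower-case addresses
        for tx in transactions
        if time_from <= int(tx["timeStamp"]) < time_to
    )
    rows = [(player, n) for player, n in counts.most_common() if n >= min_count]
    return pd.DataFrame(rows, columns=["player", "count"])


# Usage with made-up transactions
txs = [
    {"from": "0xAAA", "timeStamp": "100"},
    {"from": "0xaaa", "timeStamp": "150"},
    {"from": "0xbbb", "timeStamp": "120"},
    {"from": "0xccc", "timeStamp": "999"},  # outside the window, ignored
]
counts_df = count_active_players(txs, time_from=100, time_to=200)
print(counts_df)
```

Lower-casing the addresses keeps the same wallet from being counted twice when checksummed and lower-case spellings are mixed.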

List of active players and the number of times each one occurs within the given timeframe:

In [None]:
import pandas as pd
active_players_df = pd.read_csv(active_players_file, index_col=False)

active_players_df

Let's select only players that have made more than 100 bets in the selected timeframe.

In [None]:
# List of players to download data of (you can get them from the "Getting active players" section or from the Pancake Prediction leaderboard)
main_wallets_list = active_players_df.loc[active_players_df['count'] > 100, 'player'].tolist()
print(f"Selected {len(main_wallets_list)} wallets")


## Downloading and preparing the data

### All needed variables

In [None]:
# Timeframe to do all the computing (Training + Validation + Test sets)
time_from_get_bets = int(datetime.datetime(2022, 10, 28, 0, 0).timestamp())
time_to_get_bets = int(datetime.datetime(2023, 1, 28, 0, 0).timestamp())

# Block range used to fetch all the transactions from the BscScan API (must cover time_from_get_bets to time_to_get_bets)
block_from_get_bets = 22394098
block_to_get_bets = 25624791

# Downloading players bet history settings
players_data_folder = '../data/players_data/'

# Download rounds settings (check the rounds range according to the time_from_get_bets and time_to_get_bets values in the blockchain)
from_round = 36000
to_round = 65000
rounds_file = '../data/rounds_data/final_rounds_data.csv'

# Final data
final_data_folder = '../data/merged_data/'
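The block numbers and the round range above are hard-coded and have to match the chosen timestamps. One way to derive them, sketched here under the assumption that timestamps never decrease as the block number (or round epoch) grows, is a binary search over a timestamp lookup function. With web3py the lookup could be something like `lambda n: w3.eth.get_block(n)["timestamp"]` for a connected `Web3` instance `w3`; BscScan's `getblocknobytime` endpoint is an alternative for blocks. The helper name `first_index_at_or_after` is hypothetical.

```python
def first_index_at_or_after(target_ts, lo, hi, get_timestamp):
    """Return the smallest index in [lo, hi] whose timestamp is >= target_ts.

    Assumes get_timestamp(i) is non-decreasing in i (true for block numbers
    and round epochs). Returns hi + 1 if no index in the range qualifies.
    """
    while lo <= hi:
        mid = (lo + hi) // 2
        if get_timestamp(mid) >= target_ts:
            hi = mid - 1  # mid qualifies; look for an earlier one
        else:
            lo = mid + 1  # mid is too early
    return lo


# Usage with a fake chain producing one block every 3 seconds
def fake_ts(n):
    return 1_600_000_000 + 3 * n


print(first_index_at_or_after(1_600_000_030, 0, 1000, fake_ts))  # 10
print(first_index_at_or_after(1_600_000_031, 0, 1000, fake_ts))  # 11
```

Each probe costs one lookup (one RPC call for real blocks), so searching a range of tens of millions of blocks takes only about 25 calls.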

### Downloading players bet history
We have to collect the transaction data of every selected player and save it to separate JSON files.

In [None]:
import download_players_bets
print(len(main_wallets_list))
download_players_bets.main_concurrent(main_wallets_list, players_data_folder, block_from_get_bets, block_to_get_bets)
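The implementation of main_concurrent is not shown here. A minimal sketch of the same idea with the standard library's `ThreadPoolExecutor`: one task per wallet, one JSON file per wallet, and failed wallets collected instead of aborting the whole run. `fetch_bets` stands in for whatever call actually queries BscScan and is hypothetical, as is the `<wallet>.json` file naming.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(wallets, out_folder, fetch_bets, max_workers=4):
    """Fetch each wallet's bets concurrently and save one JSON file per wallet.

    `fetch_bets(wallet)` is a hypothetical callable returning a
    JSON-serialisable list of bets. Returns the wallets that failed.
    """
    os.makedirs(out_folder, exist_ok=True)
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_bets, w): w for w in wallets}
        for future in as_completed(futures):
            wallet = futures[future]
            try:
                bets = future.result()
            except Exception as exc:  # keep going if one wallet fails
                print(f"{wallet}: {exc}")
                failed.append(wallet)
                continue
            with open(os.path.join(out_folder, f"{wallet}.json"), "w") as f:
                json.dump(bets, f)
    return failed


# Usage with a fake fetcher: one wallet succeeds, one raises
def fake_fetch(wallet):
    if wallet == "0xbad":
        raise RuntimeError("request timed out")
    return [{"epoch": 1, "wallet": wallet}]


with tempfile.TemporaryDirectory() as folder:
    failed = download_all(["0xaaa", "0xbad"], folder, fake_fetch)
    saved = sorted(os.listdir(folder))
print(failed, saved)  # ['0xbad'] ['0xaaa.json']
```

BscScan API keys are rate limited, so `max_workers` should stay small; the returned list of failed wallets can simply be passed back in for a second attempt.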

### Downloading rounds history
Use web3py to download the information about every round in the range and save it to a file.

In [None]:
import download_rounds
downloader = download_rounds.RoundsDownloader()

data = downloader.download_rounds(from_round, to_round)  # Download rounds FROM-TO, we only need the last few months
downloader.save_rounds(rounds_file, data)
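The range above covers roughly 29,000 rounds, and each round is typically one contract call (for example `contract.functions.rounds(epoch).call()` on the prediction contract), so transient RPC errors are likely over a full run. Whether RoundsDownloader already handles them is not shown; a generic retry-with-backoff wrapper could look like this sketch, where `call_with_retries` is a hypothetical helper.

```python
import time


def call_with_retries(fn, *args, retries=3, base_delay=1.0, exceptions=(Exception,)):
    """Call fn(*args), retrying with exponential backoff on failure.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts and
    re-raises the last error once `retries` extra attempts are used up.
    """
    for attempt in range(retries + 1):
        try:
            return fn(*args)
        except exceptions:
            if attempt == retries:
                raise
            time.sleep(base_delay * 2 ** attempt)


# Usage with a fake RPC call that fails twice before succeeding
calls = {"n": 0}


def flaky_round(epoch):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("RPC timeout")
    return {"epoch": epoch}


result = call_with_retries(flaky_round, 36000, base_delay=0.0)
print(result, calls["n"])  # {'epoch': 36000} 3
```

Narrowing `exceptions` to network errors (for example `(ConnectionError, TimeoutError)`) keeps genuine bugs from being retried silently.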

## Preparing the data
Before using the data, we have to gather everything we need in one place: dataframes saved as CSV files.
The data is stored in 2 separate CSV files: one containing the bet type of each analyzed player, the other one their bet size.

In [None]:
import analyze_players

rounds_df = analyze_players.load_rounds_data(rounds_file)
    
analyze_players.create_final_csv_files(players_data_folder, final_data_folder, rounds_df)
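How create_final_csv_files builds its output is not shown here. Conceptually, turning per-player bet histories into one column per player is a pivot from long format (one row per bet) to wide format (one row per round), followed by a left join onto the rounds table. The sketch below uses made-up column names (`player`, `bet`, `amount`) and `Bull`/`Bear` labels; only `epoch` and `position` also appear in the notebook's data.

```python
import pandas as pd

# Hypothetical long-format inputs: one row per round, one row per (round, player) bet
rounds = pd.DataFrame({
    "epoch": [1, 2, 3],
    "position": ["Bull", "Bear", "Bull"],  # winning side of each round
})
bets = pd.DataFrame({
    "epoch": [1, 1, 2, 3],
    "player": ["0xaaa", "0xbbb", "0xaaa", "0xbbb"],
    "bet": ["Bull", "Bear", "Bear", "Bull"],
    "amount": [1.0, 2.0, 1.5, 0.5],
})

# One column per player: which side they bet on, and how much
player_bet = bets.pivot(index="epoch", columns="player", values="bet").reset_index()
bet_amount = bets.pivot(index="epoch", columns="player", values="amount").reset_index()

# Left join onto all rounds so rounds without a given player's bet get NaN
final_player_bet = rounds.merge(player_bet, on="epoch", how="left")
final_bet_amount = rounds.merge(bet_amount, on="epoch", how="left")
print(final_player_bet)
```

`pivot` raises an error if the same (round, player) pair appears twice; in that case `pivot_table` with an explicit aggregation would be needed.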

final_bet_amount.csv contains data about all the rounds, with a separate column for each player's bet amount.

In [None]:
final_bet_amount_df = pd.read_csv(f'{final_data_folder}final_bet_amount.csv')
final_bet_amount_df

final_player_bet.csv contains data about all the rounds, with a separate column for each player's bet type.

In [None]:
final_player_bet_df = pd.read_csv(f'{final_data_folder}final_player_bet.csv', low_memory=False)
final_player_bet_df

## Selecting the best players
We have collected data about all the active players, but not all of them are worth following. Let's check which players have a high win ratio or have made the most money.

### Selecting the timeframe for the training, validation and test sets
We have downloaded the data for the timeframe 2022.10.28 - 2023.01.28.

Let's use the first 2 months as the training set, the next half month as the validation set and the last half month as the test set.

In [None]:
import datetime
import pandas as pd

time_from_training = int(datetime.datetime(2022, 10, 28, 0, 0).timestamp())
time_to_training = int(datetime.datetime(2022, 12, 28, 0, 0).timestamp())

time_from_validation = int(datetime.datetime(2022, 12, 28, 0, 0).timestamp())
time_to_validation = int(datetime.datetime(2023, 1, 13, 0, 0).timestamp())

time_from_test = int(datetime.datetime(2023, 1, 13, 0, 0).timestamp())
time_to_test = int(datetime.datetime(2023, 1, 28, 0, 0).timestamp())
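The boundaries above are shared (time_to_training equals time_from_validation, and so on), so a round that closes exactly on a boundary must land in only one set. How utils.load_players_data treats its bounds is not shown; the sketch below assumes half-open windows `[start, end)` on a `close_timestamp` column, which guarantees no overlap. `split_by_time` is a hypothetical helper.

```python
import pandas as pd


def split_by_time(df, boundaries, time_col="close_timestamp"):
    """Split df into consecutive half-open windows [t0, t1), [t1, t2), ...

    `boundaries` is a sorted list of Unix timestamps; returns one DataFrame
    per window, so each row falls into at most one window.
    """
    return [
        df[(df[time_col] >= start) & (df[time_col] < end)]
        for start, end in zip(boundaries[:-1], boundaries[1:])
    ]


# Usage with made-up rounds closing every 100 seconds
rounds = pd.DataFrame({"epoch": range(10), "close_timestamp": [100 * i for i in range(10)]})
train, val, test = split_by_time(rounds, [0, 600, 800, 1000])
print(len(train), len(val), len(test))  # 6 2 2
```

With the notebook's variables this could be called as `split_by_time(final_bet_amount_df, [time_from_training, time_to_training, time_to_validation, time_to_test])`, assuming that dataframe has a `close_timestamp` column in Unix seconds.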


### Players metrics
Let's check what fraction of each player's bets were winning bets (win ratio) and how many CAKE tokens each player earned in total and per bet. To avoid leaking information from the validation and test sets, only data from the training set timeframe is considered.

In [None]:
import check_players_results
import utils

player_bet_df, bet_amount_df = utils.load_players_data(time_from_training, time_to_training)

players_metrics_df = check_players_results.get_players_metrics(player_bet_df, bet_amount_df)
players_metrics_df.sort_values(by='total_profit', ascending=False)

In [None]:
players_metrics_df[['total_profit', 'profit_per_bet']].hist(bins=20)
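check_players_results.get_players_metrics is not shown here. The metrics can be sketched under the assumption of a pari-mutuel payout rule like PancakeSwap Prediction's: after a fee, the whole pool is shared among the winners in proportion to their stakes, so a winning stake a on a side pool S of a total pool T pays a·T·(1 - f)/S. The 3% fee, the `Bull`/`Bear` labels and the helper names `bet_profit` and `player_metrics` are assumptions; cancelled or tied rounds are ignored.

```python
import pandas as pd

TREASURY_FEE = 0.03  # assumption: share of the pool kept by the contract


def bet_profit(side, amount, winner, total, bull, bear, fee=TREASURY_FEE):
    """Net profit of one bet; winners split the pool (after the fee) pro rata."""
    if side != winner:
        return -amount
    winning_pool = bull if winner == "Bull" else bear
    return amount * total * (1 - fee) / winning_pool - amount


def player_metrics(rounds, player_bet, bet_amount, wallet):
    """Win ratio, total profit and profit per bet for one wallet.

    All three DataFrames share the same row index (one row per round).
    player_bet[wallet] holds "Bull"/"Bear", or NaN when the wallet skipped the round.
    """
    played = player_bet[wallet].notna()
    r = rounds[played]
    sides = player_bet.loc[played, wallet]
    amounts = bet_amount.loc[played, wallet]
    profits = [
        bet_profit(s, a, w, t, bu, be)
        for s, a, w, t, bu, be in zip(sides, amounts, r["position"], r["total_amount"],
                                      r["bull_amount"], r["bear_amount"])
    ]
    n = len(profits)
    total_profit = sum(profits)
    return {
        "wallet": wallet,
        "bets": n,
        "win_ratio": float((sides == r["position"]).mean()) if n else float("nan"),
        "total_profit": total_profit,
        "profit_per_bet": total_profit / n if n else float("nan"),
    }


# Usage with made-up numbers: one win (round 0), one loss (round 1), one skipped round
rounds = pd.DataFrame({
    "position": ["Bull", "Bear", "Bull"],
    "total_amount": [10.0, 8.0, 5.0],
    "bull_amount": [4.0, 3.0, 2.0],
    "bear_amount": [6.0, 5.0, 3.0],
})
player_bet = pd.DataFrame({"0xaaa": ["Bull", "Bull", None]})
bet_amount = pd.DataFrame({"0xaaa": [1.0, 2.0, None]})
metrics = player_metrics(rounds, player_bet, bet_amount, "0xaaa")
print(metrics)
```

Historical pools already contain the player's own stake, which is what the payout formula expects, so no adjustment is needed when evaluating a player alone.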

# Training the models

Now we can use all the collected data for simulation purposes.


## Simulating the environment
Since placing a bet affects the game environment (adding money to a pool changes the payout multipliers), we have to use a simulator.

### Copytrading each player
Copytrading one player at a time works similarly to calculating that player's overall profit, except that every bet we place changes the environment by adding to the pool. So if we place the same bet as player X, both of us make less money (assuming we win) than player X would make playing alone.

In [None]:
import simulator
player_bet_df, bet_amount_df = utils.load_players_data(time_from_training, time_to_training)

# Get all wallets from player_bet_df
wallets = player_bet_df.columns.to_list()
not_players_list = ['epoch', 'start_timestamp', 'lock_timestamp', 'close_timestamp', 'lock_price', 'close_price',
                    'total_amount', 'bull_amount', 'bear_amount', 'position']
wallets = [wallet for wallet in wallets if wallet not in not_players_list]

data = []
for wallet in wallets:
    trades_data = simulator.copy_trade_player(player_bet_df, bet_amount_df, wallet)
    data.append({'wallet': wallet, 'profit': trades_data['profit'].sum()})

df = pd.DataFrame(data)

df.sort_values(by='profit', ascending=False)
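The dilution effect can be checked directly. Assuming the pari-mutuel payout a·T·(1 - f)/S for a winning stake a on side pool S of total pool T, adding our stake b to the same side changes the multiplier from T(1 - f)/S to (T + b)(1 - f)/(S + b), and the second is smaller exactly when S < T, i.e. whenever anyone bet on the other side. The sketch assumes a 3% fee and a fixed copy stake; how simulator.copy_trade_player sizes its stakes is not shown, and the helper names are hypothetical.

```python
TREASURY_FEE = 0.03  # assumption: share of the pool kept by the contract


def winning_profit(stake, side_pool, total_pool, fee=TREASURY_FEE):
    """Net profit of a winning stake that is already included in both pools."""
    return stake * total_pool * (1 - fee) / side_pool - stake


def copy_trade_round(player_stake, our_stake, side_pool, total_pool, fee=TREASURY_FEE):
    """(player_profit, our_profit) for a winning round when we copy the player's side.

    side_pool and total_pool are the historical pools (they already contain
    the player's stake); our stake is added to both before computing payouts.
    """
    new_side = side_pool + our_stake
    new_total = total_pool + our_stake
    return (winning_profit(player_stake, new_side, new_total, fee),
            winning_profit(our_stake, new_side, new_total, fee))


# Usage: 4 CAKE on the winning side out of a 10 CAKE pool, player staked 1 CAKE
alone = winning_profit(1.0, side_pool=4.0, total_pool=10.0)
player, us = copy_trade_round(1.0, 1.0, side_pool=4.0, total_pool=10.0)
print(round(alone, 4), round(player, 4), round(us, 4))  # 1.425 1.134 1.134
```

In a losing round nothing changes for the player (the stake is lost either way), so copying only ever reduces the followed player's winnings.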

In [None]:
# Histogram of the profit results; the distribution looks Gaussian-like
df['profit'].hist(bins=50)
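The histogram is a visual check; the Gaussian-like impression can be backed with a few numbers. For normally distributed data, skewness and excess kurtosis are both close to 0 (pandas' `kurt()` uses Fisher's definition, so a normal distribution gives about 0), while a few very large winners would show up as positive skew and kurtosis. The helper `shape_summary` is hypothetical and is demonstrated on synthetic data, not on the notebook's results; on the real data it would be called as `shape_summary(df['profit'])`.

```python
import numpy as np
import pandas as pd


def shape_summary(values):
    """Quick numeric check of how bell-shaped a distribution is.

    Skewness and excess kurtosis near 0 are consistent with a normal
    distribution; a mean far from the median hints at asymmetry.
    """
    s = pd.Series(values, dtype=float)
    return {
        "mean": s.mean(),
        "median": s.median(),
        "std": s.std(),
        "skew": s.skew(),
        "excess_kurtosis": s.kurt(),  # Fisher's definition: normal -> 0
    }


# Usage on synthetic normal data with a fixed seed
rng = np.random.default_rng(0)
summary = shape_summary(rng.normal(loc=0.0, scale=1.0, size=10_000))
print(summary)
```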