### Exchange Market / Increasing Volume

A `database` is excellent for retrieving specific historical data.   
A decision tree brings the ability to make predictions and uncover `patterns`.  
Those patterns are `not` explicitly stated in the data from the database.  

We can predict, for example, whether a trader is likely to `increase` their trading volume in the next month.  
This is a `month` for which we don't have data in the training dataset.  
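
The difference between retrieval and prediction can be sketched as follows. This is a minimal illustration, not the notebook's own cell: it assumes the already-encoded values of the training table (Volatile=1, Stable=0; Yes=1, No=0), drops the two constant columns, and fixes `random_state=0` only for repeatability.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

features = ["Month", "Average Trading Volume",
            "Change in Trading Volume (%)", "Market Condition"]
target = "Likelihood of Increasing Trading Volume"

# Encoded training values (assumption: Volatile=1, Stable=0; Yes=1, No=0)
df = pd.DataFrame({
    "Month": list(range(1, 11)),
    "Average Trading Volume": [5000, 5200, 4800, 5000, 5500, 5300, 5100, 5200, 5400, 5600],
    "Change in Trading Volume (%)": [0, 4, -8, 4, 10, -4, -4, 2, 4, 4],
    "Market Condition": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    target: [0, 1, 0, 1, 1, 0, 0, 1, 1, 1],
})

model = DecisionTreeClassifier(random_state=0).fit(df[features], df[target])

# Database-style lookup: Month 11 was never stored, so nothing is returned
lookup = df[df["Month"] == 11]
print("Rows found for Month 11:", len(lookup))  # 0

# Model query: the tree still produces an estimate for the unseen month
month_11 = pd.DataFrame([{
    "Month": 11,
    "Average Trading Volume": 5880,
    "Change in Trading Volume (%)": 4,
    "Market Condition": 0,
}])[features]
prediction = model.predict(month_11)[0]
print("Predicted increase:", "Yes" if prediction == 1 else "No")
```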

When you ask a Decision Tree about Trader 101's `likelihood` of increasing trading volume in June,  
it's not just looking at what happened in June; it's analyzing patterns from `all` months to make a prediction.
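
One way to see which patterns the tree extracted from all months is to print its decision rules. The sketch below assumes the encoded training values (Volatile=1, Stable=0; Yes=1, No=0) and uses scikit-learn's `export_text`; the exact rules depend on how the tree breaks ties between equally good splits, so no output is shown.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

features = ["Month", "Average Trading Volume",
            "Change in Trading Volume (%)", "Market Condition"]
target = "Likelihood of Increasing Trading Volume"

# Encoded training values (assumption: Volatile=1, Stable=0; Yes=1, No=0)
df = pd.DataFrame({
    "Month": list(range(1, 11)),
    "Average Trading Volume": [5000, 5200, 4800, 5000, 5500, 5300, 5100, 5200, 5400, 5600],
    "Change in Trading Volume (%)": [0, 4, -8, 4, 10, -4, -4, 2, 4, 4],
    "Market Condition": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    target: [0, 1, 0, 1, 1, 0, 0, 1, 1, 1],
})

model = DecisionTreeClassifier(random_state=0).fit(df[features], df[target])

# Each indented line is a threshold test learned from the ten training rows;
# leaves end in "class: 0" (No) or "class: 1" (Yes)
rules = export_text(model, feature_names=features)
print(rules)
```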

We take June's data as a `base` and modify it for July (Month 11).  
We can `adjust` certain features to reflect new conditions for the prediction.  

If you use the `exact` data from a past month (like June) without any changes, the prediction will   
mirror the `known` outcome of that past month.  
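
This can be checked directly. The sketch below assumes the encoded training values (Volatile=1, Stable=0; Yes=1, No=0) and an unpruned tree; June is taken to be Month 10, as in the notebook's code. Because every training row is distinct, a fully grown tree reproduces each training label exactly.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

features = ["Month", "Average Trading Volume",
            "Change in Trading Volume (%)", "Market Condition"]
target = "Likelihood of Increasing Trading Volume"

# Encoded training values (assumption: Volatile=1, Stable=0; Yes=1, No=0)
df = pd.DataFrame({
    "Month": list(range(1, 11)),
    "Average Trading Volume": [5000, 5200, 4800, 5000, 5500, 5300, 5100, 5200, 5400, 5600],
    "Change in Trading Volume (%)": [0, 4, -8, 4, 10, -4, -4, 2, 4, 4],
    "Market Condition": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    target: [0, 1, 0, 1, 1, 0, 0, 1, 1, 1],
})

model = DecisionTreeClassifier(random_state=0).fit(df[features], df[target])

# Feed June's row (Month 10) back in unchanged
june = df[df["Month"] == 10]
june_prediction = model.predict(june[features])[0]
june_known = june[target].iloc[0]
print("Known:", june_known, "Predicted:", june_prediction)  # Known: 1 Predicted: 1

# The same holds for every training row of an unpruned tree
train_accuracy = model.score(df[features], df[target])
print("Training accuracy:", train_accuracy)  # 1.0
```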

In `practice`, you would modify certain input features to reflect new conditions or scenarios.  
This way, you can `simulate` different potential situations and get the model's estimation for these scenarios.   
These adjustments are what make predictive modeling `valuable` - they allow you to explore "what-if" scenarios.  
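
A "what-if" exploration might look like the sketch below. The scenario grid (volume changes of -5%, 0%, and +5% under both market conditions) is hypothetical and chosen only for illustration; the training values are the encoded ones (Volatile=1, Stable=0; Yes=1, No=0). Predictions are not listed because they depend on the splits the tree happens to choose.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

features = ["Month", "Average Trading Volume",
            "Change in Trading Volume (%)", "Market Condition"]
target = "Likelihood of Increasing Trading Volume"

# Encoded training values (assumption: Volatile=1, Stable=0; Yes=1, No=0)
df = pd.DataFrame({
    "Month": list(range(1, 11)),
    "Average Trading Volume": [5000, 5200, 4800, 5000, 5500, 5300, 5100, 5200, 5400, 5600],
    "Change in Trading Volume (%)": [0, 4, -8, 4, 10, -4, -4, 2, 4, 4],
    "Market Condition": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    target: [0, 1, 0, 1, 1, 0, 0, 1, 1, 1],
})

model = DecisionTreeClassifier(random_state=0).fit(df[features], df[target])

# Start from the latest known month and vary two inputs for Month 11
base_volume = int(df.loc[df["Month"] == 10, "Average Trading Volume"].iloc[0])
scenarios = []
for pct_change in (-5, 0, 5):          # hypothetical volume changes, in percent
    for market in (0, 1):              # 0 = Stable, 1 = Volatile
        scenarios.append({
            "Month": 11,
            "Average Trading Volume": base_volume * (100 + pct_change) // 100,
            "Change in Trading Volume (%)": pct_change,
            "Market Condition": market,
        })

what_if = pd.DataFrame(scenarios)[features]
what_if["Predicted"] = model.predict(what_if[features])
print(what_if)
```

Each row is one scenario, so the table shows how the estimate responds when only the adjusted inputs change.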

The model has a `binary` outcome (Yes/No).  
The dataset is split into training and `testing` sets for model evaluation.  
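
The mechanics of the split can be sketched as below, assuming the same `test_size=0.2` and `random_state=0` used in the notebook and the encoded training values. With ten rows, the test set holds two rows, so the reported accuracy can only be 0.0, 0.5, or 1.0.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

features = ["Month", "Average Trading Volume",
            "Change in Trading Volume (%)", "Market Condition"]
target = "Likelihood of Increasing Trading Volume"

# Encoded training values (assumption: Volatile=1, Stable=0; Yes=1, No=0)
df = pd.DataFrame({
    "Month": list(range(1, 11)),
    "Average Trading Volume": [5000, 5200, 4800, 5000, 5500, 5300, 5100, 5200, 5400, 5600],
    "Change in Trading Volume (%)": [0, 4, -8, 4, 10, -4, -4, 2, 4, 4],
    "Market Condition": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    target: [0, 1, 0, 1, 1, 0, 0, 1, 1, 1],
})

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.2, random_state=0
)
print("Train rows:", len(X_train), "Test rows:", len(X_test))  # Train rows: 8 Test rows: 2

# Accuracy = correct predictions / test rows, so only these values are possible
possible_accuracies = [k / len(X_test) for k in range(len(X_test) + 1)]
print("Possible accuracy values:", possible_accuracies)  # [0.0, 0.5, 1.0]
```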

In [14]:
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd

data = {
    "Trader ID": [101, 101, 101, 101, 101, 101, 101, 101, 101, 101],
    "Month": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Average Trading Volume": [5000, 5200, 4800, 5000, 5500, 5300, 5100, 5200, 5400, 5600],
    "Change in Trading Volume (%)": [0, 4, -8, 4, 10, -4, -4, 2, 4, 4],
    "Preferred Currency Pair": ["EURUSD"] * 10,
    "Trading Frequency": ["Daily"] * 10,
    "Market Condition": [
        "Volatile", "Stable", "Volatile", "Stable", "Volatile", 
        "Stable", "Volatile", "Stable", "Volatile", "Stable",
    ],
    "Likelihood of Increasing Trading Volume": [
        "No", "Yes", "No", "Yes", "Yes", 
        "No", "No", "Yes", "Yes", "Yes",
    ]
}

df = pd.DataFrame(data)
print("Training dataset")
display(df)

# ----------------------------------------------------------------------------

# Encode labels for categorical data
df_encoded = pd.DataFrame()
for col in df.columns:

    # Categorical data is encoded to be understandable by the machine learning model
    if df[col].dtype == 'object':
        df_encoded[col] = LabelEncoder().fit_transform(df[col])
    else:
        # Numerical columns are left as they are 
        df_encoded[col] = df[col] 

print("Encoded:")
display(df_encoded)

# ----------------------------------------------------------------------------

# Define the feature columns and target column
features = [
    "Month",
    "Average Trading Volume", 
    "Change in Trading Volume (%)",
    "Preferred Currency Pair", 
    "Trading Frequency", 
    "Market Condition",
]
target = "Likelihood of Increasing Trading Volume"

# Train data
X = df_encoded[features]
y = df_encoded[target]

# Fitting the model
dtree_model = DecisionTreeClassifier()
dtree_model.fit(X, y)

# ----------------------------------------------------------------------------

# Predict for July (Month 11)

# We create a hypothetical data point for July, basing it on June's data (Month 10)
x_unknown = df[df['Month'] == 10].iloc[-1].copy()
x_unknown['Month'] = 11
x_unknown['Average Trading Volume'] = int(x_unknown['Average Trading Volume'] * 1.05)


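# Encoding the new data point. Refitting each LabelEncoder on the training column
# reproduces the training mapping, because LabelEncoder sorts its classes.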
for col in ['Preferred Currency Pair', 'Trading Frequency', 'Market Condition']:
    le = LabelEncoder()
    le.fit(df[col])
    x_unknown[col] = le.transform([x_unknown[col]])[0]

x_unknown_encoded = pd.DataFrame([x_unknown[features]])
print("\nData for July (Month 11):")
display(x_unknown)

# Prediction
y_pred_unknown = dtree_model.predict(x_unknown_encoded)[0]
print("Likelihood of Increasing Trading Volume:", "Yes" if y_pred_unknown == 1 else "No")

# ----------------------------------------------------------------------------

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    df_encoded[features], df_encoded[target], test_size=0.2, random_state=0
)

# Fitting the model
dtree_model = DecisionTreeClassifier()
dtree_model.fit(X_train, y_train)

# Making predictions and evaluating the model
predictions = dtree_model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Model Accuracy:", accuracy)

Training dataset


|   | Trader ID | Month | Average Trading Volume | Change in Trading Volume (%) | Preferred Currency Pair | Trading Frequency | Market Condition | Likelihood of Increasing Trading Volume |
|---|---|---|---|---|---|---|---|---|
| 0 | 101 | 1 | 5000 | 0 | EURUSD | Daily | Volatile | No |
| 1 | 101 | 2 | 5200 | 4 | EURUSD | Daily | Stable | Yes |
| 2 | 101 | 3 | 4800 | -8 | EURUSD | Daily | Volatile | No |
| 3 | 101 | 4 | 5000 | 4 | EURUSD | Daily | Stable | Yes |
| 4 | 101 | 5 | 5500 | 10 | EURUSD | Daily | Volatile | Yes |
| 5 | 101 | 6 | 5300 | -4 | EURUSD | Daily | Stable | No |
| 6 | 101 | 7 | 5100 | -4 | EURUSD | Daily | Volatile | No |
| 7 | 101 | 8 | 5200 | 2 | EURUSD | Daily | Stable | Yes |
| 8 | 101 | 9 | 5400 | 4 | EURUSD | Daily | Volatile | Yes |
| 9 | 101 | 10 | 5600 | 4 | EURUSD | Daily | Stable | Yes |


Encoded:


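|   | Trader ID | Month | Average Trading Volume | Change in Trading Volume (%) | Preferred Currency Pair | Trading Frequency | Market Condition | Likelihood of Increasing Trading Volume |
|---|---|---|---|---|---|---|---|---|
| 0 | 101 | 1 | 5000 | 0 | 0 | 0 | 1 | 0 |
| 1 | 101 | 2 | 5200 | 4 | 0 | 0 | 0 | 1 |
| 2 | 101 | 3 | 4800 | -8 | 0 | 0 | 1 | 0 |
| 3 | 101 | 4 | 5000 | 4 | 0 | 0 | 0 | 1 |
| 4 | 101 | 5 | 5500 | 10 | 0 | 0 | 1 | 1 |
| 5 | 101 | 6 | 5300 | -4 | 0 | 0 | 0 | 0 |
| 6 | 101 | 7 | 5100 | -4 | 0 | 0 | 1 | 0 |
| 7 | 101 | 8 | 5200 | 2 | 0 | 0 | 0 | 1 |
| 8 | 101 | 9 | 5400 | 4 | 0 | 0 | 1 | 1 |
| 9 | 101 | 10 | 5600 | 4 | 0 | 0 | 0 | 1 |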



Data for July (Month 11):


Trader ID                                   101
Month                                        11
Average Trading Volume                     5880
Change in Trading Volume (%)                  4
Preferred Currency Pair                       0
Trading Frequency                             0
Market Condition                              0
Likelihood of Increasing Trading Volume     Yes
Name: 9, dtype: object

Likelihood of Increasing Trading Volume: Yes
Model Accuracy: 1.0
