<a href="https://colab.research.google.com/github/lcbjrrr/quant/blob/master/J_Class.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Topic:** AI/ML

**Title:** Classifiers (Decision Tree)

**Author:** Luiz Barboza

**Date:** 20/dec/22

**Lang:** Julia

**Site:** https://quant-research.group/

**Email:** contato@quant-research.group


# Julia Installation

In [17]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.8.3" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia BenchmarkTools"
JULIA_PACKAGES_IF_GPU="CUDA" # or CuArrays for older Julia versions
JULIA_NUM_THREADS=2
#---------------------------------------------------#

if [ -z `which julia` ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  nvidia-smi -L &> /dev/null && export GPU=1 || export GPU=0
  if [ $GPU -eq 1 ]; then
    JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia  

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi



In [18]:
versioninfo()

Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 2 × Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
  Threads: 2 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/nvidia/lib:/usr/local/nvidia/lib64
  LD_PRELOAD = /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
  JULIA_NUM_THREADS = 2


# Classifiers

In [None]:
import Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("Lathe") 
Pkg.add("DecisionTree") 
Pkg.add("StatsBase") 
Pkg.add("EvalMetrics") 

using CSV
using DataFrames
using Lathe
using Lathe.preprocess: TrainTestSplit
using DecisionTree
using StatsBase
using EvalMetrics

In [None]:
;wget https://raw.githubusercontent.com/lcbjrrr/data/main/banking%20-%20train.csv

In [None]:
;wget https://raw.githubusercontent.com/lcbjrrr/data/main/banking%20-%20test.csv

In [66]:
#read csv
train=CSV.read("banking - train.csv", DataFrame)
first(train,5)

| Row | salary | balance | default |
|-----|--------|---------|---------|
|     | Float64 | Float64 | Int64 |
| 1 | 1000.0 | 100.0 | 0 |
| 2 | 950.0 | 95.0 | 0 |
| 3 | 800.0 | 85.0 | 0 |
| 4 | 1100.0 | 5500.0 | 0 |
| 5 | 999.0 | 6000.0 | 0 |


In [67]:
test=CSV.read("banking - test.csv", DataFrame)
first(test,5)

| Row | salary | balance | default |
|-----|--------|---------|---------|
|     | Float64 | Float64 | Int64 |
| 1 | 1050.0 | 105.0 | 0 |
| 2 | 997.5 | 99.75 | 0 |
| 3 | 1048.95 | 6300.0 | 0 |
| 4 | 932.4 | 6825.0 | 0 |
| 5 | 2415.0 | 4200.0 | 0 |


## Decision Tree

In [68]:
tree = DecisionTreeClassifier(max_depth=3)
DecisionTree.fit!(tree, Matrix(train[:,[:salary,:balance]]), train[:,:default])     

DecisionTreeClassifier
max_depth:                3
min_samples_leaf:         1
min_samples_split:        2
min_purity_increase:      0.0
pruning_purity_threshold: 1.0
n_subfeatures:            0
classes:                  [0, 1]
root:                     Decision Tree
Leaves: 3
Depth:  2
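
A decision tree chooses each split by comparing how pure the resulting groups of labels are. The sketch below illustrates that idea in plain Julia, assuming an entropy-based criterion and a made-up one-feature dataset. `entropy_of` and `split_gain` are hypothetical names, not DecisionTree.jl functions, and the package's internal procedure may differ.

```julia
# Shannon entropy (in bits) of a vector of class labels
function entropy_of(labels)
    n = length(labels)
    n == 0 && return 0.0
    h = 0.0
    for c in unique(labels)
        p = count(==(c), labels) / n   # share of class c, always > 0 here
        h -= p * log2(p)
    end
    return h
end

# Information gain obtained by splitting the rows at `x < threshold`
function split_gain(x, y, threshold)
    left  = y[x .< threshold]
    right = y[x .>= threshold]
    n = length(y)
    weighted = length(left) / n * entropy_of(left) +
               length(right) / n * entropy_of(right)
    return entropy_of(y) - weighted
end

# Made-up data for illustration only
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [0, 0, 0, 1, 1, 1]

gain_good = split_gain(x, y, 5.0)   # separates the two classes completely
gain_poor = split_gain(x, y, 2.5)   # leaves a mixed group on the right
println("gain at 5.0: ", gain_good, ", gain at 2.5: ", gain_poor)
```

A threshold with a higher gain produces purer child nodes, which is why a tree would prefer the split at 5.0 in this toy case.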

In [69]:
function accuracy(y,pred)
  acc = sum(pred.==y)/length(y)
  return acc
end

accuracy (generic function with 1 method)

In [45]:
pred_train = DecisionTree.predict(tree,Matrix(train[:,[:salary,:balance]]))
acc_train = accuracy(train.default,pred_train)
print("Accuracy (Train): ",acc_train)

Accuracy (Train): 1.0

In [47]:
pred_test=DecisionTree.predict(tree,Matrix(test[:,[:salary,:balance]]))
acc_test = accuracy(test.default,pred_test)
print("Accuracy (Test): ",acc_test)

Accuracy (Test): 1.0

# Classification Metrics

In [49]:
# Confusion matrix: rows are the actual classes, columns are the predicted classes
counts(test.default,pred_test)

2×2 Matrix{Int64}:
 8  0
 0  2
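
To make the layout of that matrix concrete, here is a minimal sketch in plain Julia of how a 2×2 table for labels coded 0/1 can be tallied. The helper `confusion_matrix` is hypothetical and the labels are made up; it is not the StatsBase implementation.

```julia
# Hypothetical helper: tally a 2×2 confusion matrix for labels coded 0/1.
# Rows index the actual class, columns index the predicted class.
function confusion_matrix(actual::AbstractVector{<:Integer},
                          predicted::AbstractVector{<:Integer})
    length(actual) == length(predicted) || throw(ArgumentError("length mismatch"))
    cm = zeros(Int, 2, 2)
    for (a, p) in zip(actual, predicted)
        cm[a + 1, p + 1] += 1   # shift 0/1 labels to 1-based indices
    end
    return cm
end

# Made-up labels for illustration only
actual    = [0, 0, 0, 1, 1, 0, 1, 0]
predicted = [0, 0, 1, 1, 0, 0, 1, 0]

cm = confusion_matrix(actual, predicted)
println(cm)   # prints [4 1; 1 2]
```

The diagonal holds the correct predictions, so a perfect classifier leaves both off-diagonal cells at zero.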

In [51]:
# Precision/Recall
println("Precision: ",precision(test.default,pred_test))
println("Recall: " , recall(test.default,pred_test))

Precision: 1.0
Recall: 1.0
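
Precision is the share of predicted positives that are truly positive, and recall is the share of actual positives that the model found. A minimal sketch in plain Julia, assuming labels coded 0/1 with 1 as the positive class; `precision_recall` is a hypothetical helper and the labels are made up.

```julia
# Hypothetical helper: precision and recall for the positive class (label 1)
function precision_recall(actual, predicted)
    tp = sum((actual .== 1) .& (predicted .== 1))   # true positives
    fp = sum((actual .== 0) .& (predicted .== 1))   # false positives
    fn = sum((actual .== 1) .& (predicted .== 0))   # false negatives
    # Note: either ratio is NaN when its denominator is zero
    return (precision = tp / (tp + fp), recall = tp / (tp + fn))
end

# Made-up labels for illustration only
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 0]

m = precision_recall(actual, predicted)
println("Precision: ", m.precision)
println("Recall: ", m.recall)
```

Here two of the three predicted positives are correct, and two of the four actual positives are found.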


# Second Example: Gender

In [None]:
import Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("Lathe") 
Pkg.add("DecisionTree") 
Pkg.add("StatsBase") 
Pkg.add("EvalMetrics") 

using CSV
using DataFrames
using Lathe
using Lathe.preprocess: TrainTestSplit
using DecisionTree
using StatsBase
using EvalMetrics

In [None]:
;wget https://raw.githubusercontent.com/lcbjrrr/data/main/gender%20-%20all.csv

In [53]:
#read csv
df=CSV.read("gender - all.csv", DataFrame)
first(df,5)

| Row | G | H | W |
|-----|---|---|---|
|     | Int64 | Float64 | Float64 |
| 1 | 0 | 187.571 | 109.952 |
| 2 | 0 | 174.706 | 73.7775 |
| 3 | 0 | 188.24 | 96.7004 |
| 4 | 0 | 182.197 | 100.019 |
| 5 | 0 | 177.5 | 93.7954 |


In [None]:
train, test = TrainTestSplit(df,.80)
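
The split is random, so the row counts and accuracies in this example can differ between runs. As a rough illustration of the idea, the sketch below shuffles row indices and cuts them at a given fraction using only the `Random` standard library. `split_indices` is a hypothetical helper, and this is not necessarily how Lathe implements `TrainTestSplit`.

```julia
using Random

# Hypothetical helper: shuffle the row indices 1:n and cut them at `frac`
function split_indices(n::Integer, frac::Real; rng = Random.default_rng())
    idx = shuffle(rng, 1:n)
    ntrain = round(Int, frac * n)
    return idx[1:ntrain], idx[ntrain+1:end]
end

# A fixed seed makes the split reproducible
train_idx, test_idx = split_indices(10, 0.80; rng = MersenneTwister(42))
println(length(train_idx), " train rows, ", length(test_idx), " test rows")
```

With a DataFrame, the two index vectors could then be used as `df[train_idx, :]` and `df[test_idx, :]`.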

In [55]:
tree = DecisionTreeClassifier(max_depth=3)
DecisionTree.fit!(tree, Matrix(train[:,[:H,:W]]), train[:,:G])   

DecisionTreeClassifier
max_depth:                3
min_samples_leaf:         1
min_samples_split:        2
min_purity_increase:      0.0
pruning_purity_threshold: 1.0
n_subfeatures:            0
classes:                  [0, 1]
root:                     Decision Tree
Leaves: 8
Depth:  3

In [57]:
function accuracy(y,pred)
  acc = sum(pred.==y)/length(y)
  return acc
end
pred_train = DecisionTree.predict(tree,Matrix(train[:,[:H,:W]]))
acc_train = accuracy(train.G,pred_train)
print("Accuracy (Train): ",acc_train)

Accuracy (Train): 0.9101137642205276

In [58]:
pred_test=DecisionTree.predict(tree,Matrix(test[:,[:H,:W]]))
acc_test = accuracy(test.G,pred_test)
print("Accuracy (Test): ",acc_test)

Accuracy (Test): 0.9010494752623688

In [60]:
counts(test.G,pred_test)

2×2 Matrix{Int64}:
 903  113
  85  900

In [61]:
println("Precision: ",precision(test.G,pred_test))
println("Recall: " , recall(test.G,pred_test))

Precision: 0.8884501480750246
Recall: 0.9137055837563451
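
The accuracy, precision, and recall reported for this example can be recovered directly from the confusion matrix, reading rows as actual classes and columns as predicted classes. A short check in plain Julia, with the matrix copied from the output of this example:

```julia
# Confusion matrix from the gender example (rows: actual, columns: predicted)
cm = [903 113;
       85 900]

tn, fp = cm[1, 1], cm[1, 2]   # actual 0: predicted 0, predicted 1
fn, tp = cm[2, 1], cm[2, 2]   # actual 1: predicted 0, predicted 1

acc  = (tp + tn) / sum(cm)    # 1803 / 2001
prec = tp / (tp + fp)         # 900 / 1013
rec  = tp / (tp + fn)         # 900 / 985

println("Accuracy: ", acc)
println("Precision: ", prec)
println("Recall: ", rec)
```

The three ratios agree with the test accuracy, precision, and recall printed by the notebook cells for this example.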


# Third Example: Play

In [None]:
import Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("Lathe") 
Pkg.add("DecisionTree") 
Pkg.add("StatsBase") 
Pkg.add("EvalMetrics") 

using CSV
using DataFrames
using Lathe
using Lathe.preprocess: TrainTestSplit
using DecisionTree
using StatsBase
using EvalMetrics

In [None]:
;wget https://raw.githubusercontent.com/lcbjrrr/data/main/play%20-%20tr.csv

In [None]:
;wget https://raw.githubusercontent.com/lcbjrrr/data/main/play%20-%20ts.csv

In [70]:
#read csv
train=CSV.read("play - tr.csv", DataFrame)
first(train,5)

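| Row | sunny | hot | humid | windy | play |
|-----|-------|-----|-------|-------|------|
|     | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | 1 | 1 | 0 | 0 |
| 2 | 1 | 1 | 1 | 1 | 0 |
| 3 | 0 | 1 | 1 | 0 | 1 |
| 4 | 0 | 0 | 1 | 0 | 1 |
| 5 | 0 | 0 | 0 | 0 | 1 |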


In [71]:
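#read csv
# Note: this cell reads the training file "play - tr.csv" again, so the "test"
# metrics below are computed on the training data. To evaluate on held-out
# data, read "play - ts.csv" (downloaded above) here instead.
test=CSV.read("play - tr.csv", DataFrame)
first(test,5)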

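| Row | sunny | hot | humid | windy | play |
|-----|-------|-----|-------|-------|------|
|     | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | 1 | 1 | 0 | 0 |
| 2 | 1 | 1 | 1 | 1 | 0 |
| 3 | 0 | 1 | 1 | 0 | 1 |
| 4 | 0 | 0 | 1 | 0 | 1 |
| 5 | 0 | 0 | 0 | 0 | 1 |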


In [73]:
tree = DecisionTreeClassifier(max_depth=3)
DecisionTree.fit!(tree, Matrix(train[:,[:sunny,:hot,:humid,:windy]]), train[:,:play])

DecisionTreeClassifier
max_depth:                3
min_samples_leaf:         1
min_samples_split:        2
min_purity_increase:      0.0
pruning_purity_threshold: 1.0
n_subfeatures:            0
classes:                  [0, 1]
root:                     Decision Tree
Leaves: 6
Depth:  3

In [74]:
function accuracy(y,pred)
  acc = sum(pred.==y)/length(y)
  return acc
end
pred_train = DecisionTree.predict(tree,Matrix(train[:,[:sunny,:hot,:humid,:windy]]))
acc_train = accuracy(train.play,pred_train)
print("Accuracy (Train): ",acc_train)

Accuracy (Train): 0.8571428571428571

In [76]:
pred_test=DecisionTree.predict(tree,Matrix(test[:,[:sunny,:hot,:humid,:windy]]))
acc_test = accuracy(test.play,pred_test)
print("Accuracy (Test): ",acc_test)

Accuracy (Test): 0.8571428571428571

In [77]:
counts(test.play,pred_test)

2×2 Matrix{Int64}:
 5  0
 2  7

In [78]:
println("Precision: ",precision(test.play,pred_test))
println("Recall: " , recall(test.play,pred_test))

Precision: 1.0
Recall: 0.7777777777777778
