
Welcome to the stockER wiki!

The Model I Used

* Added one more convolutional layer and one more fully connected layer.


Various Sets

  1. Training Set: used to train the model

  2. Test Set: used for the final test

  3. Validation Set: used before the test for a validation check (a minimal split sketch follows this list)
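
A minimal sketch of such a split, assuming the data lives in a NumPy feature matrix `X` with labels `y`. All names and the 70/15/15 ratio are illustrative assumptions, not from this project:

```python
import numpy as np

def split_dataset(X, y, train=0.7, valid=0.15, seed=0):
    """Shuffle once, then cut the data into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train * len(X))
    n_valid = int(valid * len(X))
    tr = idx[:n_train]
    va = idx[n_train:n_train + n_valid]
    te = idx[n_train + n_valid:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

# Example with dummy data
X = np.random.randn(100, 5)
y = np.random.randint(0, 2, size=100)
train_set, valid_set, test_set = split_dataset(X, y)
```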


Training Strategies

  1. k-fold Cross-Validation (see the resampling sketch after this list)

  2. LOOCV (Leave-One-Out Cross-Validation):

  • Advantage: less bias
  • Disadvantage: too many computations, one model fit per sample (and by the No Free Lunch (NFL) theorem, the extra cost does not guarantee better results)

  3. Bootstrapping: sampling randomly with replacement from set D, m times, to form D' (n(D) = n(D') = m)
  • So-called out-of-bag prediction: about 36.8% of the samples (the limit of (1 - 1/m)^m) are never drawn, so they can be used for the testing phase instead of training
  • Useful when the data set is small or hard to split into groups

  4. Recall and Precision
  • Precision = True Positive / (True Positive + False Positive): the ability of a positive prediction to actually be positive
  • Recall = True Positive / (True Positive + False Negative): the ability to identify actual positives as positive
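
A minimal NumPy sketch of both resampling strategies above (all names are illustrative; the ~36.8% out-of-bag fraction is the limit of (1 - 1/m)^m, i.e. about 1/e):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000                      # dataset size
indices = np.arange(m)

# k-fold cross-validation: each fold is held out once for validation
k = 5
folds = np.array_split(rng.permutation(indices), k)
for i, valid_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print(f"fold {i}: train on {len(train_idx)}, validate on {len(valid_idx)}")

# Bootstrapping: draw m samples with replacement from D to form D'
boot = rng.integers(0, m, size=m)        # sampled indices (D')
oob = np.setdiff1d(indices, boot)        # never-sampled = out-of-bag
print(f"out-of-bag fraction: {len(oob) / m:.3f}")   # ~0.368, about 1/e
```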

The area under the precision-recall (PR) curve could serve as the discrimination factor, but the F1 score is used instead (a minimal sketch follows).
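
A minimal sketch computing precision, recall, and the F1 score (the harmonic mean of precision and recall); the labels below are made up for illustration:

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

y_true = np.array([1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1])
print(precision_recall_f1(y_true, y_pred))   # (0.75, 0.75, 0.75)
```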

  5. Bias-Variance Decomposition
  • Error = bias² + variance + eps (noise); the full decomposition is written out below
  6. t-test: content to be added...
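
Written out for squared error (the standard form of the decomposition; here f̄(x) is the expected prediction over training sets D, y the true label, and y_D the observed noisy label):

```latex
E(f; D) \;=\; \underbrace{\bigl(\bar{f}(x) - y\bigr)^2}_{\mathrm{bias}^2}
\;+\; \underbrace{\mathbb{E}_D\!\bigl[\bigl(f(x; D) - \bar{f}(x)\bigr)^2\bigr]}_{\mathrm{variance}}
\;+\; \underbrace{\mathbb{E}_D\!\bigl[(y_D - y)^2\bigr]}_{\varepsilon^2\ (\mathrm{noise})}
```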

Linear Models

**Projection onto the hyperplane <-> least square method <-> Euclidean distance minimization**
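
One way to see this equivalence (standard least-squares algebra, added here for clarity): minimizing the squared Euclidean distance between y and the hyperplane spanned by the columns of X yields the normal equations, and the fitted value is exactly the projection of y onto that hyperplane.

```latex
\min_{w}\ \|y - Xw\|_2^2
\;\Longrightarrow\; X^{\top}(y - Xw) = 0
\;\Longrightarrow\; \hat{w} = (X^{\top}X)^{-1}X^{\top}y,
\qquad
\hat{y} = X\hat{w} = \underbrace{X(X^{\top}X)^{-1}X^{\top}}_{\text{projection onto } \operatorname{col}(X)}\, y
```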

  1. Multivariable Linear Regression: least square method <-> projection onto a hyperplane in Euclidean space; f(x) = Wx + B ~ y

  2. Log-Linear Regression: f(x) = Wx + B ~ ln(y)

  3. Logistic Regression: uses the sigmoid function as a surrogate for the unit step function; f(x) = Wx + B ~ ln(y / (1 - y))

  • W and B are determined easily by maximum likelihood estimation (a fitting sketch follows this list)
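
A minimal sketch of models 1 and 3: the least-squares fit via the normal equations (the projection above), and logistic regression fitted by maximizing the log-likelihood with gradient ascent. All names, the synthetic data, and the learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=(200, 3))]    # bias column + 3 features

# 1. Multivariable linear regression: closed-form least squares
y_lin = X @ np.array([0.5, 1.0, -2.0, 3.0]) + rng.normal(scale=0.1, size=200)
w_ls = np.linalg.solve(X.T @ X, X.T @ y_lin)          # normal equations

# 3. Logistic regression: sigmoid surrogate, fitted by maximum likelihood
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y_cls = (X @ np.array([0.0, 1.0, -1.0, 2.0]) + rng.logistic(size=200) > 0).astype(float)
w = np.zeros(X.shape[1])
for _ in range(2000):
    grad = X.T @ (y_cls - sigmoid(X @ w))   # gradient of the log-likelihood
    w += 0.01 * grad / len(X)               # gradient ascent step
```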

LDA - Linear Discriminant Analysis: minimizing the within-class covariance of the sample sets while separating the class means; solvable via (a projection sketch follows this list):

  1. LMM (Lagrange Multiplier Method)
  2. SVD (Singular Value Decomposition)
  3. GRQ (Generalized Rayleigh Quotient)
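
A minimal two-class sketch on synthetic data (names illustrative): with within-class scatter S_w, the generalized Rayleigh quotient wᵀS_b w / wᵀS_w w is maximized in closed form by w ∝ S_w⁻¹(μ0 − μ1):

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0, 0], scale=1.0, size=(100, 2))   # class 0 samples
X1 = rng.normal(loc=[3, 2], scale=1.0, size=(100, 2))   # class 1 samples

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
# Within-class scatter S_w: sum of (unnormalized) per-class covariances
Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
# Optimal projection direction: w proportional to S_w^{-1} (mu0 - mu1)
w = np.linalg.solve(Sw, mu0 - mu1)

# Projecting both classes onto w separates them along a single axis
print(X0 @ w, X1 @ w)
```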

Class Imbalance

One class heavily outnumbers the other (too many positive samples relative to negative ones, or vice versa).
Thus,

  1. Change the system's decision threshold --> rescaling method (see the sketch after this list)

  2. Undersampling: removing samples from the over-represented class

  3. Oversampling: adding samples to the under-represented class
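
A minimal sketch of the rescaling idea: rather than predicting positive whenever the odds y/(1-y) exceed 1, compare them against the training-set class ratio m⁺/m⁻ (a standard threshold-moving formulation; the names below are illustrative):

```python
import numpy as np

def rescaled_predictions(probs, n_pos, n_neg):
    """Predict positive when the odds exceed the training-set class ratio."""
    odds = probs / (1.0 - probs)                 # y / (1 - y)
    return (odds > n_pos / n_neg).astype(int)

probs = np.array([0.30, 0.55, 0.80, 0.05])       # model outputs P(positive)
# With a 1:9 positive:negative imbalance, the threshold drops to odds > 1/9
print(rescaled_predictions(probs, n_pos=100, n_neg=900))   # [1 1 1 0]
```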

Multilabel Learning: content to be added...

