Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscellaneous Plotting Errors with the "GA" Library in R (Genetic Algorithm) #54

Closed
swaheera opened this issue Jul 8, 2021 · 1 comment

Comments

@swaheera
Copy link

swaheera commented Jul 8, 2021

I am working with R. I am following this tutorial (https://cran.r-project.org/web/packages/GA/vignettes/GA.html) and am learning how to optimize functions using the "genetic algorithm".

The entire process is illustrated in the code below:

Part 1: Generate some sample data ("train_data")

Part 2: Define the "fitness function" : the objective of my problem is to generate 7 random numbers :

  • "random_1" (between 80 and 120)
  • "random_2" (between "random_1" and 120)
  • "random_3" (between 85 and 120)
  • "random_4" (between random_2 and 120)
  • "split_1" (between 0 and 1)
  • "split_2" (between 0 and 1)
  • "split_3" (between 0 and 1 )

and use these numbers to perform a series of data manipulation procedures on the train data. At the end of these data manipulation procedures, a "total" mean variable is calculated.

Part 3: The purpose of the "genetic algorithm" is to find the set of these 7 numbers that produce the largest value of the "total".

Below, I illustrate this entire process :

Part 1

#load libraries
library(dplyr)
library(GA)

# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)

Part 2

#define fitness function
fitness <- function(random_1, random_2, random_3, random_4, split_1, split_2, split_3) {

    #bin data according to random criteria
    train_data <- train_data %>% mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a", ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))
    
    train_data$cat = as.factor(train_data$cat)
    
    #new splits
    a_table = train_data %>%
        filter(cat == "a") %>%
        select(a1, b1, c1, cat)
    
    b_table = train_data %>%
        filter(cat == "b") %>%
        select(a1, b1, c1, cat)
    
    c_table = train_data %>%
        filter(cat == "c") %>%
        select(a1, b1, c1, cat)
    
    
    
    #calculate  quantile ("quant") for each bin
    
    table_a = data.frame(a_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_1)))
    
    table_b = data.frame(b_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_2)))
    
    table_c = data.frame(c_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_3)))
    
    
    
    
    #create a new variable ("diff") that measures if the quantile is bigger tha the value of "c1"
    table_a$diff = ifelse(table_a$quant > table_a$c1,1,0)
    table_b$diff = ifelse(table_b$quant > table_b$c1,1,0)
    table_c$diff = ifelse(table_c$quant > table_c$c1,1,0)
    
    #group all tables
    
    final_table = rbind(table_a, table_b, table_c)
# calculate the total mean : this is what needs to be optimized
    mean = mean(final_table$diff)
    
    
}

Part 3


#run the genetic algorithm (20 times to keep it short):
GA <- ga(type = "real-valued", 
         fitness = function(x)  fitness(x[1], x[2], x[3], x[4], x[5], x[6], x[7]),
         lower = c(80, 80, 80, 80, 0,0,0), upper = c(120, 120, 120, 120, 1,1,1), 
         popSize = 50, maxiter = 20, run = 20)

The above code (Part 1, Part 2, Part 3) all work fine.

Problem: Now, I am trying to produce some the of the visual plots from the tutorial:

First Plot - This Works:

plot(GA)

But I can't seem to produce the other plots from the tutorial:

Second Plot: Does Not Work

lbound <- 80
ubound <- 120

curve(fitness, from = lbound, to = ubound, n = 1000)
points(GA@solution, GA@fitnessValue, col = 2, pch = 19)

 Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred. 

Error in xy.coords(x, y) : 'x' and 'y' lengths differ

Third Plot : Does Not Work

random_1 <- random_2 <- seq(80, 120, by = 0.1)
f <- outer(x1, x2, fitness)
persp3D(x1, x2, fitness, theta = 50, phi = 20, col.palette = bl2gr.colors)

Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
 Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred. 

Error in z[-1, -1] : object of type 'closure' is not subsettable

Fourth Plot: Does Not Work

filled.contour(random_1, random_2, fitness, color.palette = bl2gr.colors)

Error in min(x, na.rm = na.rm) : invalid 'type' (list) of argument

Can someone please show me how to fix these errors?

Thanks

@luca-scr
Copy link
Owner

luca-scr commented Jul 8, 2021

The issue is that you try to adapt R code for a one-dimensional and two-dimensional problems to your case, which is a 7-dimensional problem. Simply you can't do that and when you try R will give errors.
For instance, when you use

curve(fitness, from = lbound, to = ubound, n = 1000)

you are trying to plot the one-dimensional function fitness from lbound to ubound on a regular grid of n points. But fitness has 7 input arguments. How you can do that?

@luca-scr luca-scr closed this as completed Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants