This project aims to determine the temperature at the surface of a new unknown star at any time, from a few observations. It is known that the temperature follows a function called Weierstrass, defined as follows:
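(The original shows the formula as an image; presumably it is a truncated Weierstrass series of the form below, where t denotes the modelled temperature.)

$$t(i) = \sum_{n=1}^{c} a^{n} \cos\left(b^{n} \pi i\right)$$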
where t(i) is the star temperature at a given time i, with the following set of parameters: a, a real number in ]0, 1[, and b and c, positive integers (each at most 20, see the search space below).
To reach this objective, we need to find the appropriate (a,b,c) coefficients.
We have at our disposal a set of temperature points measured at several instants i.
The search space has size ]0, 1[ × 20 × 20.
However, a Python float only carries about 15 significant decimal digits, so a can take roughly 10^15 distinct values. The search space therefore contains about 10^15 × 20 × 20 = 4 × 10^17 possibilities.
I have chosen the following function for fitness:
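(The fitness formula is an image in the original; from the description below, it is presumably the L2 norm of the residuals, writing T_i for the i-th observed temperature, a notation assumed here.)

$$\text{fitness}(a, b, c) = \sqrt{\sum_{i=1}^{n} \left(T_i - t(i)\right)^{2}}$$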
where t is the Weierstrass function with the parameters a,b,c and n the number of observations.
This function corresponds to the norm 2 of the vector of the difference between the observations and the values found.
Compared to the norm 1, the norm 2 penalizes large deviations more strongly, which makes the fitness better at discriminating between candidate solutions.
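As a minimal sketch (assuming the observations are in a NumPy array and that a hypothetical weierstrass(a, b, c, i) helper evaluates the model at time i), the fitness could be computed as follows:

```python
import numpy as np

def fitness(a, b, c, observations, weierstrass):
    """L2 norm between observed temperatures and the model's values.

    `weierstrass(a, b, c, i)` is assumed to return the model temperature
    at time i for the candidate parameters (a, b, c).
    """
    predictions = np.array([weierstrass(a, b, c, i)
                            for i in range(len(observations))])
    return np.sqrt(np.sum((observations - predictions) ** 2))
```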
In order to find the best parameters of the function, I decided to use different operators for the integers b and c and for the real number a. I will therefore present the operators separately.
The integers are converted into binary numbers and then concatenated to obtain a chromosome.
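For illustration, a minimal sketch of this encoding (assuming 5 bits per integer, which covers the 1-20 range; the helper names are hypothetical):

```python
def encode(b, c, n_bits=5):
    """Concatenate the binary representations of b and c into one chromosome."""
    return format(b, f"0{n_bits}b") + format(c, f"0{n_bits}b")

def decode(chromosome, n_bits=5):
    """Split the chromosome back into the two integers."""
    return int(chromosome[:n_bits], 2), int(chromosome[n_bits:], 2)

# encode(15, 3) -> '0111100011', decode('0111100011') -> (15, 3)
```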
Method used: Two point crossover.
This method consists in choosing two points at random in the chromosome and swapping, between the two parents, the gene sequences located between these two points.
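A minimal sketch of this operator on two chromosome strings (function name hypothetical):

```python
import random

def two_point_crossover(parent1, parent2):
    """Swap the gene segment between two random cut points."""
    point1, point2 = sorted(random.sample(range(1, len(parent1)), 2))
    child1 = parent1[:point1] + parent2[point1:point2] + parent1[point2:]
    child2 = parent2[:point1] + parent1[point1:point2] + parent2[point2:]
    return child1, child2
```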
Method used: Simulated binary crossover.
This method imitates the behaviour of the single point crossover for real-valued genes. One of its properties is that the average of the parents' values is equal to the average of the children's values.
The two children are created from the two parents using the following formula:
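(The formula is an image in the original; presumably the standard SBX form, with p1 and p2 the parents and c1 and c2 the children.)

$$c_1 = \frac{1}{2}\left[(1+\beta)\, p_1 + (1-\beta)\, p_2\right] \qquad c_2 = \frac{1}{2}\left[(1-\beta)\, p_1 + (1+\beta)\, p_2\right]$$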
Where β, the spreading factor, is defined as follows:
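(Again, presumably the standard definition:)

$$\beta = \begin{cases} (2u)^{\frac{1}{\eta+1}} & \text{if } u \le 0.5 \\ \left(\frac{1}{2(1-u)}\right)^{\frac{1}{\eta+1}} & \text{otherwise} \end{cases}$$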
With the following parameters:
- η: between 2 and 5 (2: children spread further from the parents, 5: children closer to the parents)
- u : uniformly distributed on the interval [0, 1]
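A minimal sketch of this crossover for the real gene a, following the formulas above (function name hypothetical):

```python
import random

def simulated_binary_crossover(p1, p2, eta=2):
    """Simulated binary crossover for one real-valued gene."""
    u = random.random()
    if u <= 0.5:
        beta = (2 * u) ** (1 / (eta + 1))
    else:
        beta = (1 / (2 * (1 - u))) ** (1 / (eta + 1))
    child1 = 0.5 * ((1 + beta) * p1 + (1 - beta) * p2)
    child2 = 0.5 * ((1 - beta) * p1 + (1 + beta) * p2)
    return child1, child2
```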
The method consists in multiplying the parent's value by a random factor between 0.95 and 1.05.
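A one-line sketch of this mutation (function name hypothetical; the real project may additionally clamp the result to ]0, 1[):

```python
import random

def mutate_real(a):
    """Multiply the real gene by a random factor in [0.95, 1.05]."""
    return a * random.uniform(0.95, 1.05)
```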
For the selection, I have chosen to keep both the best and the worst individuals.
The proportion is as follows:
- Best individuals: 5/15
- Worst individuals: 2/15
- New individuals : 8/15
I have used this simple method because it keeps the best individuals at each generation and is less computationally intensive, while giving, in my case, results equivalent to other selection schemes (roulette wheel, stochastic universal sampling, tournament, rank-based, ...).
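A minimal sketch of this selection step (assuming the population is already sorted by fitness, best first, and a hypothetical random_individual() factory that creates a new random individual):

```python
def select(sorted_population, random_individual):
    """Keep the best 5/15 and the worst 2/15, fill the rest with new individuals."""
    size = len(sorted_population)
    n_best = size * 5 // 15
    n_worst = size * 2 // 15
    kept = sorted_population[:n_best] + sorted_population[-n_worst:]
    new = [random_individual() for _ in range(size - len(kept))]
    return kept + new
```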
I found the following values:
- a = 0.3415035442650213
- b = 15
- c = 3
Parameters used and basic statistics:
- Population size: 100 individuals
- Average number of iterations to converge: 57 iterations
- Running time (measured over 30 runs):
  - Fitness threshold 0.19: average 6.5 seconds, standard deviation 4.5 seconds
  - Fitness threshold 0.1813: average 21 seconds, standard deviation 27 seconds
Note: The execution time is quite long because of the high accuracy required. By relaxing the requirements (for example, a precision of 10^-2), the execution time becomes much shorter.
The requirements.txt file lists all the Python libraries that the project depends on.
To install, use the following command:
pip install -r requirements.txt
There are two ways to use this project:
Simply use the link located at the top of this page
- Clone this repository
- Install the associated libraries (see requirements)
- Two options:
- Open the notebook
- Launch the Python program from your command prompt:
python path/to/cloned/repository/Genetic_algorithm.py
I have tested different configurations for this problem, each one having its advantages and drawbacks.
Description:
I have tested a method consisting in converting a, b, and c into binary and putting them side by side to form a chromosome.
Note: For a, we choose a precision (10^-2 for example).
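As an illustration, a could be encoded with a fixed precision of 10^-2 like this (helper names hypothetical; see the advantages and drawbacks below):

```python
def encode_a(a, precision=2, n_bits=7):
    """Encode a real in ]0, 1[ as the binary form of round(a * 10**precision)."""
    return format(round(a * 10 ** precision), f"0{n_bits}b")

def decode_a(bits, precision=2):
    """Recover the real value, limited to the chosen precision."""
    return int(bits, 2) / 10 ** precision

# encode_a(0.34) -> '0100010', decode_a('0100010') -> 0.34
```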
Advantages:
- Very fast
- Very precise if you know the precision in advance
Drawbacks:
- Not very stable
- Low accuracy if you don't choose the right precision
Increasing the size of the population (while proportionally decreasing the number of iterations) does not result in better performance.
Different proportions have been tested, for example pure elitism (keeping only the best individuals). This proved not to be very efficient (divergence).
I have tested the single point crossover. However, this method was less efficient than the two point crossover.
I have tested the Michalewicz non-uniform mutation. However, the gain in solution accuracy relative to the extra computation was too small to justify its use.
Flavien DESEURE--CHARRON - flavien.deseure@gmail.com - LinkedIn