In this repo, I will try to make an open-source implementation of FunSearch. There are several problems which need to be addressed:

- Sampling needs to be done on the order of 10^6 LLM calls. At roughly $0.0002 per 1k tokens for GPT-3.5, a million samples of about 1k tokens each works out to about $200 per answer (10^6 × 1k tokens × $0.0002/1k ≈ $200). I don't have Google's budget, so let us try to do it locally with massive speed.
- The reference code is single-threaded. Most of it has to be modified to run multithreaded; since the pipeline does not rely much on the GPU, this can be done on a server with many CPU cores (see the sketch after this list).
- The prompt is basically two `priority` versions combined into one: the model is shown two earlier versions and asked to write the next one. IMO this is not efficient.
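To make the single-threaded point concrete, here is a minimal sketch of fanning the CPU-bound evaluation step out across cores with the standard library. `score_program` and `score_batch` are hypothetical names, and the dummy scoring body only stands in for the real sandboxed evaluator from `implementation/`:

```python
import concurrent.futures as cf


def score_program(program_text: str) -> float:
    """Placeholder for the sandboxed evaluation of one sampled program.

    Swap in the real evaluator (run the candidate `priority` on the problem
    inputs and return its score); the dummy body only keeps the sketch runnable.
    """
    return float(len(program_text))


def score_batch(candidates: list[str], max_workers: int = 32) -> list[float]:
    """Evaluate many candidate programs in parallel on a many-core CPU server."""
    with cf.ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_program, candidates))


if __name__ == "__main__":
    print(score_batch(["def priority(x):\n    return 0.0"] * 4))
```

The sampling side is a separate concern: with a locally hosted model it is usually better to batch prompts inside the inference engine than to spawn many Python threads around a single loaded model.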
We need an LLM that can be called millions of times very quickly and hosted locally. The candidate models are:
- DeepSeek-Coder-instruct (1.3B/6.7B), specifically the deepseek-coder-6.7b-base.Q4_K_M.gguf file
- Salesforce CodeGen2.5-7B-instruct
- Deploying whichever code generation model is chosen with massive inference speed, borrowing techniques from llama.cpp (see the sketch below)
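As a starting point for the local-hosting bullet, here is a minimal sketch using the llama-cpp-python bindings to serve the quantized DeepSeek GGUF file named above. The model path, thread count, stop strings, and sampling parameters are assumptions to tune, not values from the original code:

```python
from llama_cpp import Llama

# Assumed local location of the GGUF file mentioned above.
MODEL_PATH = "models/deepseek-coder-6.7b-base.Q4_K_M.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=2048,    # context window large enough for the FunSearch prompt
    n_threads=16,  # tune to the number of physical CPU cores
)


def draw_sample(prompt: str) -> str:
    """Return one continuation of the prompt, i.e. one candidate function body."""
    out = llm(
        prompt,
        max_tokens=256,
        temperature=0.8,
        stop=["\ndef ", "\nclass "],  # cut off before a new top-level block starts
    )
    return out["choices"][0]["text"]
```

llama.cpp can also be run as a standalone server and called over HTTP, which makes it easier to share one loaded model between many sampler workers.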
This repository accompanies the publication:
Romera-Paredes, B. et al. Mathematical discoveries from program search with large language models. Nature (2023)
There are 6 independent directories:

- cap_set contains functions discovered by FunSearch that construct large cap sets, and we also provide those cap sets in a numerical format for convenience.
- admissible_set contains functions discovered by FunSearch that construct large admissible sets, and we also provide those admissible sets in a numerical format for convenience.
- bin_packing contains heuristics discovered by FunSearch for online 1D bin packing problems, and an evaluation suite to reproduce the results reported in the paper.
- cyclic_graphs contains functions discovered by FunSearch that construct large independent sets in strong products of cyclic graphs, and we also provide those sets in a numerical format for convenience.
- corner_free_sets contains the discovered sets of indices, in numerical format, satisfying the combinatorial degeneration constraints described for the corners-free problem in the Supplementary Information.
- implementation contains an implementation of the evolutionary algorithm, code manipulation routines, and a single-threaded implementation of the FunSearch pipeline. It does not contain language models for generating new programs, the sandbox for executing untrusted code, nor the infrastructure for running FunSearch on our distributed system. This directory is intended to be useful for understanding the details of our method, and for adapting it for use with any available language models, sandboxes, and distributed systems.
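To adapt the implementation directory to the local setup sketched above, the two hooks to fill in appear to be the language-model and sandbox abstractions. The sketch below assumes interfaces along the lines of sampler.LLM._draw_sample and evaluator.Sandbox.run (check the actual class and method signatures in implementation/ before using it); the my_local_llm import is a hypothetical module wrapping the llama.cpp helper above:

```python
from typing import Any

from implementation import evaluator, sampler  # assumes implementation/ is importable as a package
from my_local_llm import draw_sample  # hypothetical wrapper around the llama.cpp sketch above


class LocalLLM(sampler.LLM):
    """Sampler backed by the locally hosted code model."""

    def _draw_sample(self, prompt: str) -> str:
        return draw_sample(prompt)


class UnsafeSandbox(evaluator.Sandbox):
    """Executes candidate programs in-process, with no isolation and no timeout.

    Only useful for getting the pipeline running end to end; real runs should
    use subprocesses or containers, as the paper's sandbox does.
    """

    def run(self, program: str, function_to_run: str, test_input: Any,
            timeout_seconds: int) -> tuple[Any, bool]:
        namespace: dict[str, Any] = {}
        try:
            exec(program, namespace)  # candidate programs are untrusted code!
            return namespace[function_to_run](test_input), True
        except Exception:
            return None, False
```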
No installation is required. All notebooks can be opened and run in Google Colab.
- admissible_set: the notebook admissible_set.ipynb can be opened and run in Google Colab.
- bin_packing: the notebook bin_packing.ipynb can be opened and run in Google Colab.
- cyclic_graphs: the notebook cyclic_graphs.ipynb can be opened and run in Google Colab.
If you use the code or data in this package, please cite:
@Article{FunSearch2023,
author = {Romera-Paredes, Bernardino and Barekatain, Mohammadamin and Novikov, Alexander and Balog, Matej and Kumar, M. Pawan and Dupont, Emilien and Ruiz, Francisco J. R. and Ellenberg, Jordan and Wang, Pengming and Fawzi, Omar and Kohli, Pushmeet and Fawzi, Alhussein},
journal = {Nature},
title = {Mathematical discoveries from program search with large language models},
year = {2023},
doi = {10.1038/s41586-023-06924-6}
}
Copyright 2023 DeepMind Technologies Limited
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0
All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
This is not an official Google product.