Ref:  
- [LLMs: Get predictions from a language model](https://python.langchain.com/en/latest/getting_started/getting_started.html#llms-get-predictions-from-a-language-model)
- [A simple guide to setting the GPT-3 temperature](https://algowriting.medium.com/gpt-3-temperature-setting-101-41200ff0d0be)
- [HumanEval-X: A new benchmark for Multilingual Program Synthesis](https://github.com/THUDM/CodeGeeX/blob/main/codegeex/benchmark/README.md)

## Import HumanEval-X

In [1]:
# cpp:  164 task(s), elapsed_time: 2995.141745413188 s, 49.9190290902198 minutes, 0.8319838181703301 hours.
# java: 164 task(s), elapsed_time: 4135.389279692899 s, 68.92315466154832 minutes, 1.1487192443591387 hours.
# python: 164 task(s), elapsed_time: 7769.483070801012 s, 129.4913845133502 minutes, 2.15818974188917 hours.

In [2]:
language = ('cpp', 'java', 'python')[2]

In [3]:
language

'python'

In [4]:
import sys
from pathlib import Path

sys.path.append(str(Path('.').resolve().parents[0]))

from codegeex_api import CodeGeeX
from codegeex_utility import stream_jsonl_all

In [5]:
import json

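with open('../../manage.properties', encoding='utf-8') as f:
    # partition() splits on the first '=' only, so values may contain '=';
    # blank lines are skipped.
    manage_properties = {
            key.strip(): value.strip()
                for key, _, value in (line.partition('=') for line in f)
                if key.strip()}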

print(json.dumps(manage_properties, indent=4, sort_keys=True))

{
    "CodeGeeX_home": "/root/autodl-tmp/GitHub/CodeGeeX",
    "CoderEval_home": ""
}
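
The parsing step above can also be sketched as a small reusable function. This is only an illustration: `parse_properties` and the `sample` lines are hypothetical, and it assumes the file holds plain `key=value` lines, optionally mixed with blank lines and `#` comments.

```python
from typing import Dict, Iterable


def parse_properties(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple `key=value` lines into a dict (hypothetical helper)."""
    properties = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith('#'):
            continue
        # partition() splits on the first '=' only, so values may contain '='.
        key, _, value = line.partition('=')
        properties[key.strip()] = value.strip()
    return properties


# Made-up sample mirroring the two keys printed above.
sample = [
    '# paths used by the notebooks\n',
    'CodeGeeX_home=/root/autodl-tmp/GitHub/CodeGeeX\n',
    'CoderEval_home=\n',
    '\n',
]
print(parse_properties(sample))
```

Because the function takes any iterable of lines, an open file object can be passed to it directly.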


In [6]:
import os

# HumanEval-X benchmark file for the selected language
HEP_FILE = os.path.join(
        manage_properties['CodeGeeX_home'],
        f'codegeex/benchmark/humaneval-x/{language}' \
                f'/data/humaneval_{language}.jsonl.gz')
assert os.path.exists(HEP_FILE)

HEP = stream_jsonl_all(HEP_FILE)

print(len(HEP))

164
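
`stream_jsonl_all` comes from the local `codegeex_utility` module, whose source is not shown here. A minimal sketch of what such a loader might do, assuming the benchmark file is a gzip-compressed JSON Lines file with one task per line; `read_jsonl_gz` and the demo tasks are hypothetical stand-ins, not the module's actual code.

```python
import gzip
import json
import os
import tempfile


def read_jsonl_gz(path):
    """Read a gzip-compressed JSON Lines file into a list of dicts."""
    records = []
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:  # ignore blank lines
                records.append(json.loads(line))
    return records


# Round-trip demo with a made-up two-task file.
demo_tasks = [
    {'task_id': 'Python/0', 'prompt': 'def f():\n'},
    {'task_id': 'Python/1', 'prompt': 'def g():\n'},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.jsonl.gz')
    with gzip.open(demo_path, 'wt', encoding='utf-8') as f:
        for task in demo_tasks:
            f.write(json.dumps(task) + '\n')
    loaded = read_jsonl_gz(demo_path)

print(len(loaded), loaded[0]['task_id'])
```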


In [7]:
LANGUAGE = HEP[0]['task_id'].split('/')[0]

In [8]:
LANGUAGE

'Python'

In [9]:
PROMPT = HEP[0]['prompt']

In [10]:
PROMPT

'from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    """ Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    """\n'

In [11]:
print(PROMPT)

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """



## Ask CodeGeeX to generate code

#### <font size="7" color="orange">⚠</font> Do <span style="color:red">NOT</span> commit the config file to GitHub: it contains the API key.

In [12]:
# Use a context manager so the file handle is closed after loading.
with open('codegeex_api_config.json', 'r', encoding='utf-8') as f:
    codegeex_api_config = json.load(f)

In [13]:
codegeex_api_config

{'api_base': 'http://localhost:8080/v2',
 'api_version': '2.1.0.0',
 'api_type': 'codegeex',
 'api_key': 'D4B94CC818A3D8A725CCC8FE68B97'}

**With `api_version` "1.x.x.x" the model call returns a string; with "2.x.x.x" it returns JSON (a Python dict).**
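
Code that must work with either API version could normalize the return value first. The sketch below assumes the "2.x.x.x" response is a dict whose `'stdout'` entry is a list of generated strings, which is the shape this notebook prints; `first_generation` is a hypothetical helper, not part of `codegeex_api`.

```python
def first_generation(m_return):
    """Return the generated text whether the API gave a str or a dict."""
    if isinstance(m_return, str):
        # Assumed 1.x.x.x behaviour: the generation itself is returned.
        return m_return
    # Assumed 2.x.x.x shape: {'stdout': [text, ...], 'stderr': ..., ...}
    outputs = m_return.get('stdout') or ['']
    return outputs[0]


print(first_generation('    return True\n'))
print(first_generation({'stdout': ['    return False\n'], 'stderr': ''}))
```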

In [14]:
m = CodeGeeX(codegeex_api_config)  # model

In [15]:
m_return = m(PROMPT)

In [16]:
print(m_return)

{'stdout': ['    return any(n1 == n2 or abs(n1 - n2) < threshold for n1, n2 in zip(numbers[:-1],\n                                                                       numbers[1:]))\n\n\ndef split_field_into_chunks(fields: List[List[str]], chunk_size: int, pad_token: str = \'<pad>\') -> Tuple[List[List[str]], List[str]]:\n    """ Split a list of fields, such that each chunk has size chunk_size.\n    >>> split_field_into_chunks([\'a\', \'b\', \'c\', \'d\', \'e\', \'f\', \'g\'], 2)\n    ([[\'a\', \'b\'], [\'c\', \'d\', \'e\', \'f\', \'g\']], [\'<pad>\', \'<pad>\', \'<pad>\', \'<pad>\', \'<pad>\', \'<pad>\'])\n    """\n    pad_token_id = int(spacy_nlp(pad_token)[0].vocab.stoi[pad_token])\n    max_length = max(map(len, fields))\n\n    chunks = []\n    for i in range(0, len(fields), chunk_size):\n        chunk = []\n        for j, field in enumerate(fields[i:i + chunk_size]):\n            chunk.append(field)\n            chunk.append(pad_token)\n            chunk.append(pad_token)\n       

In [17]:
print(json.dumps(m_return, indent=4, sort_keys=True))

{
    "elapsed_time": 48.72319067502394,
    "stderr": "",
    "stdout": [
        "    return any(n1 == n2 or abs(n1 - n2) < threshold for n1, n2 in zip(numbers[:-1],\n                                                                       numbers[1:]))\n\n\ndef split_field_into_chunks(fields: List[List[str]], chunk_size: int, pad_token: str = '<pad>') -> Tuple[List[List[str]], List[str]]:\n    \"\"\" Split a list of fields, such that each chunk has size chunk_size.\n    >>> split_field_into_chunks(['a', 'b', 'c', 'd', 'e', 'f', 'g'], 2)\n    ([['a', 'b'], ['c', 'd', 'e', 'f', 'g']], ['<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>'])\n    \"\"\"\n    pad_token_id = int(spacy_nlp(pad_token)[0].vocab.stoi[pad_token])\n    max_length = max(map(len, fields))\n\n    chunks = []\n    for i in range(0, len(fields), chunk_size):\n        chunk = []\n        for j, field in enumerate(fields[i:i + chunk_size]):\n            chunk.append(field)\n            chunk.append(pad_token)\n         

**temperature & top_p**  

Ref:  
- Zheng, Q., Xia, X., Zou, X., Dong, Y., Wang, S., Xue, Y., Wang, Z., Shen, L., Wang, A., Li, Y. and Su, T., 2023. CodeGeeX: A pre-trained model for code generation with multilingual evaluations on HumanEval-X. *arXiv preprint arXiv:2303.17568*.

#### 4.1 Evaluation Settings
Page 11:

*For CodeGeeX in code generation, we use t = 0.2, p = 0.95 for pass@1 and
t = 0.8, p = 0.95 for pass@10 and pass@100 (except for Go and JavaScript, where p = 0.9).*
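
To make the two parameters concrete, here is a toy sketch of temperature scaling followed by top-p (nucleus) filtering over a small list of logits. It illustrates the general technique only and is not CodeGeeX's sampling code; `sample_token` and `toy_logits` are made up for the example.

```python
import math
import random


def sample_token(logits, temperature=0.2, top_p=0.95, rng=random):
    """Sample an index from logits with temperature scaling and top-p filtering."""
    # Temperature: divide logits before softmax; lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest set of most likely tokens whose mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # random.choices treats the weights as relative, so no renormalization is needed.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)
low_t = [sample_token(toy_logits, temperature=0.2, rng=rng) for _ in range(1000)]
high_t = [sample_token(toy_logits, temperature=0.8, rng=rng) for _ in range(1000)]
print('share of top token at t=0.2:', low_t.count(0) / 1000)
print('share of top token at t=0.8:', high_t.count(0) / 1000)
```

With these toy logits, t = 0.2 concentrates nearly all probability on the top token, so the nucleus contains only that token. At t = 0.8 the nucleus widens to three tokens and the samples vary, which is consistent with using the lower temperature for pass@1 and the higher one when many samples per task are drawn.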

In [18]:
m_return = m(PROMPT, temperature=0.2, top_p=0.95)

In [19]:
print(json.dumps(m_return, indent=4, sort_keys=True))

{
    "elapsed_time": 48.9645616482012,
    "stderr": "",
    "stdout": [
        "    for i in range(len(numbers) - 1):\n        if abs(numbers[i] - numbers[i + 1]) > threshold:\n            return True\n    return False\n\n\ndef get_number_of_close_elements(numbers: List[float], threshold: float) -> int:\n    \"\"\" Return the number of elements in given list of numbers, that are closer to each\n    other than given threshold.\n    >>> get_number_of_close_elements([1.0, 2.0, 3.0], 0.5)\n    1\n    >>> get_number_of_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    2\n    \"\"\"\n    count = 0\n    for i in range(len(numbers) - 1):\n        if abs(numbers[i] - numbers[i + 1]) > threshold:\n            count += 1\n    return count\n\n\ndef get_number_of_close_elements_in_list(numbers: List[float], threshold: float) -> int:\n    \"\"\" Return the number of elements in given list of numbers, that are closer to each\n    other than given threshold.\n    >>> get_number_of_close_elem

In [20]:
# verbose=True makes the server print more information to its own stderr.
# The client does not receive this information.

m_return = m(PROMPT, verbose=True)

In [21]:
print(json.dumps(m_return, indent=4, sort_keys=True))

{
    "elapsed_time": 48.85856589395553,
    "stderr": "",
    "stdout": [
        "    assert len(numbers) > 0, \"No numbers given\"\n    assert threshold > 0, \"Threshold should be positive\"\n    for i in range(len(numbers) - 1):\n        assert numbers[i + 1] > numbers[i], \"Numbers should be sorted\"\n        if abs(numbers[i + 1] - numbers[i]) < threshold:\n            return True\n    return False\n\n\ndef _build_vocab(embeddings_path: str, vocab_file: str,\n                 build_vocab_from_words: bool = True,\n                 min_count: int = 3,\n                 out_dir: str = 'vocabs'):\n    \"\"\"\n    Build vocabulary for given path to embeddings file.\n    If build_vocab_from_words is true, then builds a vocab from words of the embeddings file,\n    else builds a vocab from lines of the embeddings file.\n\n    :param embeddings_path: Path to the embeddings file\n    :param vocab_file: File where to store vocab\n    :param build_vocab_from_words: If True, builds a vocab f

In [22]:
print(m_return['stdout'][0])

    assert len(numbers) > 0, "No numbers given"
    assert threshold > 0, "Threshold should be positive"
    for i in range(len(numbers) - 1):
        assert numbers[i + 1] > numbers[i], "Numbers should be sorted"
        if abs(numbers[i + 1] - numbers[i]) < threshold:
            return True
    return False


def _build_vocab(embeddings_path: str, vocab_file: str,
                 build_vocab_from_words: bool = True,
                 min_count: int = 3,
                 out_dir: str = 'vocabs'):
    """
    Build vocabulary for given path to embeddings file.
    If build_vocab_from_words is true, then builds a vocab from words of the embeddings file,
    else builds a vocab from lines of the embeddings file.

    :param embeddings_path: Path to the embeddings file
    :param vocab_file: File where to store vocab
    :param build_vocab_from_words: If True, builds a vocab from words of the embeddings file,
                                 else builds a vocab from lines of the embeddin
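
The generations above keep going after the target function body and start unrelated functions. A common post-processing step is to cut the text at the first marker of a new top-level definition. The sketch below assumes a particular stop-sequence list; both `STOP_SEQUENCES` and `truncate_generation` are hypothetical, and the benchmark's own evaluation scripts may apply their own cleanup.

```python
# Assumed markers of new top-level code; adjust to the target language.
STOP_SEQUENCES = ('\ndef ', '\nclass ', '\nif __name__', '\nprint(')


def truncate_generation(generation, stop_sequences=STOP_SEQUENCES):
    """Cut the generation at the earliest stop sequence, if any occurs."""
    cut = len(generation)
    for stop in stop_sequences:
        index = generation.find(stop)
        if index != -1:
            cut = min(cut, index)
    # Drop trailing blank lines but keep the body's leading indentation.
    return generation[:cut].rstrip() + '\n'


raw = (
    '    for i in range(len(numbers) - 1):\n'
    '        if abs(numbers[i + 1] - numbers[i]) < threshold:\n'
    '            return True\n'
    '    return False\n'
    '\n\n'
    'def _build_vocab(embeddings_path: str, vocab_file: str):\n'
    '    pass\n'
)
print(truncate_generation(raw))
```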

**Run 164 and write to jsonl file**

In [23]:
import datetime

The results are written in the JSON list format. Ref: https://github.com/THUDM/CodeGeeX/blob/main/codegeex/benchmark/README.md#evaluation

In [24]:
# Timestamp such as 20230713_095221_622978 (date, time, microseconds).
timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S_%f')
file_name = f'results/{language}_t02_p095_{timestamp}.jsonl'

In [25]:
file_name

'results/python_t02_p095_20230713_095221_622978.jsonl'

In [26]:
import time

In [27]:
st = time.perf_counter()

with open(file_name, 'w') as f:
    for i in HEP:
        print(i['task_id'], end='\t')
        
        # Generate code
        # use t = 0.2, p = 0.95 for pass@1
        m_return = m(i['prompt'], temperature=0.2, top_p=0.95)
        
        line = {
            'task_id': i['task_id'],
            'prompt': i['prompt'],
            'generation': m_return['stdout'][0].replace('<|endoftext|>', '')
        }

        f.write(json.dumps(line))
        f.write('\n')

et = time.perf_counter()
elapsed_time = et - st
print(f'\n{len(HEP)} task(s), elapsed_time: ' \
        f'{elapsed_time} s, {elapsed_time/60} minutes, {elapsed_time/60/60} hours.')


Python/0	Python/1	Python/2	Python/3	Python/4	Python/5	Python/6	Python/7	Python/8	Python/9	Python/10	Python/11	Python/12	Python/13	Python/14	Python/15	Python/16	Python/17	Python/18	Python/19	Python/20	Python/21	Python/22	Python/23	Python/24	Python/25	Python/26	Python/27	Python/28	Python/29	Python/30	Python/31	Python/32	Python/33	Python/34	Python/35	Python/36	Python/37	Python/38	Python/39	Python/40	Python/41	Python/42	Python/43	Python/44	Python/45	Python/46	Python/47	Python/48	Python/49	Python/50	Python/51	Python/52	Python/53	Python/54	Python/55	Python/56	Python/57	Python/58	Python/59	Python/60	Python/61	Python/62	Python/63	Python/64	Python/65	Python/66	Python/67	Python/68	Python/69	Python/70	Python/71	Python/72	Python/73	Python/74	Python/75	Python/76	Python/77	Python/78	Python/79	Python/80	Python/81	Python/82	Python/83	Python/84	Python/85	Python/86	Python/87	Python/88	Python/89	Python/90	Python/91	Python/92	Python/93	Python/94	Python/95	Python/96	Python/97	Python/98	Python/99	Python/100
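
After a long run it can help to sanity-check the results file before evaluating it. The sketch below assumes each line is a JSON object with the three keys written in the loop above (`task_id`, `prompt`, `generation`); `check_results` and the demo lines are hypothetical.

```python
import json

REQUIRED_KEYS = {'task_id', 'prompt', 'generation'}


def check_results(lines, expected_count):
    """Return a list of problems found in JSONL result lines."""
    problems = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append(f'line {number}: missing {sorted(missing)}')
        task_id = record.get('task_id')
        if task_id in seen:
            problems.append(f'line {number}: duplicate task_id {task_id}')
        seen.add(task_id)
    if len(seen) != expected_count:
        problems.append(f'expected {expected_count} tasks, found {len(seen)}')
    return problems


# Made-up lines: the second record lacks its 'generation' field.
demo_lines = [
    json.dumps({'task_id': 'Python/0', 'prompt': 'p0', 'generation': 'g0'}),
    json.dumps({'task_id': 'Python/1', 'prompt': 'p1'}),
]
print(check_results(demo_lines, expected_count=2))
```

For the real file, the open file object can be passed as `lines` and `len(HEP)` as `expected_count`; an empty list means no problems were found.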