Ref:  
- [LLMs: Get predictions from a language model](https://python.langchain.com/en/latest/getting_started/getting_started.html#llms-get-predictions-from-a-language-model)
- [A simple guide to setting the GPT-3 temperature](https://algowriting.medium.com/gpt-3-temperature-setting-101-41200ff0d0be)
- [HumanEval-X: A new benchmark for Multilingual Program Synthesis](https://github.com/THUDM/CodeGeeX/blob/main/codegeex/benchmark/README.md)

## Import HumanEval-X (Python)

In [1]:
import json

In [2]:
HEP_FILE = 'datasets/HumanEval-X/humaneval_python.jsonl'  # HumanEval-X, Python split

In [3]:
with open(HEP_FILE, 'r', encoding='utf-8') as f:
    HEP = [json.loads(line) for line in f if line.strip()]

In [4]:
LANGUAGE = HEP[0]['task_id'].split('/')[0]

In [5]:
LANGUAGE

'Python'

In [6]:
PROMPT = HEP[0]['prompt']

In [7]:
PROMPT

'from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    """ Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    """\n'

In [8]:
print(PROMPT)

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """



## Ask CodeGeeX to generate code

In [9]:
import sys
from pathlib import Path

sys.path.append(str(Path('.').resolve().parents[0]))

from codegeex_api import CodeGeeX

#### <font size="7" color="orange">⚠</font> Do <span style="color:red">NOT</span> commit the config file to GitHub, because it contains the API key.
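
One way to keep the key out of the repository is to read it from the environment and let it override whatever the config file holds. The following is a minimal sketch under stated assumptions: `load_config` is a hypothetical helper and `CODEGEEX_API_KEY` is a hypothetical variable name; neither comes from `codegeex_api`.

```python
import json
import os


def load_config(path, env_var='CODEGEEX_API_KEY'):
    """Load the JSON config; an environment variable, if set, overrides 'api_key'.

    The variable name is an assumption made for this sketch.
    """
    with open(path, 'r', encoding='utf-8') as f:
        config = json.load(f)
    key = os.environ.get(env_var)
    if key:
        config['api_key'] = key
    return config
```

With this approach the committed file can carry a placeholder key, while the real one lives only in the shell environment.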

In [10]:
with open('codegeex_api_config.json', 'r', encoding='utf-8') as f:
    codegeex_api_config = json.load(f)

In [11]:
codegeex_api_config

{'api_base': 'http://localhost:8080/v1',
 'api_version': '2.1.0.0',
 'api_type': 'codegeex',
 'api_key': 'D4B94CC818A3D8A725CCC8FE68B97'}

**With `api_version` "1.x.x.x" the call returns a string; with "2.x.x.x" it returns JSON (a dict in Python).**
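
Since the return type depends on the API version, downstream code may want a single accessor that handles both. A minimal sketch, assuming the 2.x response is a dict with a `stdout` list of completions (as in the responses printed in this notebook); `first_completion` is a hypothetical helper, not part of `codegeex_api`:

```python
def first_completion(result):
    """Return the first generated completion as a string.

    Assumes a 1.x.x.x API returns a plain string and a 2.x.x.x API returns
    a dict whose 'stdout' entry is a list of completions.
    """
    if isinstance(result, str):
        return result
    return result['stdout'][0]
```

For example, `first_completion(m(PROMPT))` would give the generated text regardless of which version the server reports.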

In [12]:
m = CodeGeeX(codegeex_api_config)  # model

In [13]:
m_return = m(PROMPT)

In [14]:
print(m_return)

{'stdout': ['    return len(\n        [\n            1\n            for num in numbers\n            if abs(num - numbers[0]) < abs(num - numbers[1]) and abs(num - numbers[0])\n            < threshold\n        ]\n    ) == 1\n\n\ndef get_language_code(language_name: str) -> str:\n    """ Returns the language code of a given language name. """\n    language_name_parts = language_name.split("_")\n    if len(language_name_parts) > 1:\n        return language_name_parts[0]\n    return language_name\n\n\ndef extract_track_name(track: str, metadata: Metadata) -> str:\n    """\n    Extracts track name from metadata and returns it.\n    """\n    track_name_key = "track_name"\n    if metadata.has_key(track_name_key):\n        track_name = metadata[track_name_key]\n    else:\n        # Get track name from track id\n        track_name = track.split("/")[-1].replace("+", " ")\n    return track_name\n\n\ndef extract_album_name(track: str, metadata: Metadata) -> str:\n    """\n    Extracts album name 

In [15]:
print(json.dumps(m_return, indent=4, sort_keys=True))

{
    "elapsed_time": 48.78208860009909,
    "stderr": "",
    "stdout": [
        "    return len(\n        [\n            1\n            for num in numbers\n            if abs(num - numbers[0]) < abs(num - numbers[1]) and abs(num - numbers[0])\n            < threshold\n        ]\n    ) == 1\n\n\ndef get_language_code(language_name: str) -> str:\n    \"\"\" Returns the language code of a given language name. \"\"\"\n    language_name_parts = language_name.split(\"_\")\n    if len(language_name_parts) > 1:\n        return language_name_parts[0]\n    return language_name\n\n\ndef extract_track_name(track: str, metadata: Metadata) -> str:\n    \"\"\"\n    Extracts track name from metadata and returns it.\n    \"\"\"\n    track_name_key = \"track_name\"\n    if metadata.has_key(track_name_key):\n        track_name = metadata[track_name_key]\n    else:\n        # Get track name from track id\n        track_name = track.split(\"/\")[-1].replace(\"+\", \" \")\n    return track_name\n\n\ndef 

**temperature & top_p**  

Ref:  
- Zheng, Q., Xia, X., Zou, X., Dong, Y., Wang, S., Xue, Y., Wang, Z., Shen, L., Wang, A., Li, Y. and Su, T., 2023. CodeGeeX: A pre-trained model for code generation with multilingual evaluations on HumanEval-X. *arXiv preprint arXiv:2303.17568*.

#### 4.1 Evaluation Settings
Page 11:

*For CodeGeeX in code generation, we use t = 0.2, p = 0.95 for pass@1 and
t = 0.8, p = 0.95 for pass@10 and pass@100 (except for Go and JavaScript, where p = 0.9).*
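
To make the two settings concrete, here is a toy sketch of what temperature and top-p (nucleus) sampling do to a next-token distribution. It illustrates the general technique only; it is not CodeGeeX's sampling code, and the three logits are made-up numbers.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p.

    Returns a dict {token_index: renormalized_probability}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


logits = [2.0, 1.0, 0.1]  # toy scores for three candidate tokens
for t in (0.2, 0.8):
    probs = apply_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], sorted(top_p_filter(probs, 0.95)))
```

At t = 0.2 almost all probability mass sits on the top token, so sampling is close to greedy; at t = 0.8 the mass is spread over more tokens, which gives the variety that pass@10 and pass@100 need.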

In [16]:
m_return = m(PROMPT, temperature=0.2, top_p=0.95)

In [17]:
print(json.dumps(m_return, indent=4, sort_keys=True))

{
    "elapsed_time": 48.87400425039232,
    "stderr": "",
    "stdout": [
        "    return any(abs(n1 - n2) < threshold for n1, n2 in zip(numbers[:-1], numbers[1:]))\n\n\ndef get_text_from_html(html: str) -> str:\n    \"\"\" Get text from html.\n    >>> get_text_from_html('<p>Hello</p>')\n    'Hello'\n    \"\"\"\n    soup = BeautifulSoup(html, \"html.parser\")\n    return soup.get_text()\n\n\ndef get_text_from_html_with_links(html: str) -> str:\n    \"\"\" Get text from html with links.\n    >>> get_text_from_html_with_links('<p>Hello</p><a href=\"https://www.google.com\">Google</a>')\n    'Hello Google'\n    \"\"\"\n    soup = BeautifulSoup(html, \"html.parser\")\n    return soup.get_text()\n\n\ndef get_text_from_html_with_links_and_images(html: str) -> str:\n    \"\"\" Get text from html with links and images.\n    >>> get_text_from_html_with_links_and_images('<p>Hello</p><a href=\"https://www.google.com\">Google</a><img src=\"https://www.google.com/images/srpr/logo3w.png\" alt=\

In [18]:
# verbose=True makes the server print more information to its stderr.
# The client does not receive this information.

m_return = m(PROMPT, verbose=True)

In [19]:
print(json.dumps(m_return, indent=4, sort_keys=True))

{
    "elapsed_time": 48.80143885361031,
    "stderr": "",
    "stdout": [
        "    return any(abs(numbers[i] - numbers[i - 1]) <= threshold for i in range(1, len(numbers)))\n\n\ndef get_file_paths(path: str) -> List[str]:\n    \"\"\" Get list of file paths in given directory.\n    >>> get_file_paths(\"datasets/drinking_quality\")\n    ['datasets/drinking_quality/drinks.csv', 'datasets/drinking_quality/people.csv', 'datasets/drinking_quality/sex.csv']\n    \"\"\"\n    file_paths = []\n    for root, directories, filenames in os.walk(path):\n        for filename in filenames:\n            if filename.endswith(\".csv\"):\n                file_paths.append(os.path.join(root, filename))\n    return file_paths\n\n\ndef get_file_name(path: str) -> str:\n    \"\"\" Get file name from given path.\n    >>> get_file_name(\"datasets/drinking_quality/drinks.csv\")\n    'drinks.csv'\n    \"\"\"\n    return os.path.basename(path)\n\n\ndef get_file_extension(path: str) -> str:\n    \"\"\" Get file

In [20]:
print(m_return['stdout'][0])

    return any(abs(numbers[i] - numbers[i - 1]) <= threshold for i in range(1, len(numbers)))


def get_file_paths(path: str) -> List[str]:
    """ Get list of file paths in given directory.
    >>> get_file_paths("datasets/drinking_quality")
    ['datasets/drinking_quality/drinks.csv', 'datasets/drinking_quality/people.csv', 'datasets/drinking_quality/sex.csv']
    """
    file_paths = []
    for root, directories, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith(".csv"):
                file_paths.append(os.path.join(root, filename))
    return file_paths


def get_file_name(path: str) -> str:
    """ Get file name from given path.
    >>> get_file_name("datasets/drinking_quality/drinks.csv")
    'drinks.csv'
    """
    return os.path.basename(path)


def get_file_extension(path: str) -> str:
    """ Get file extension from given path.
    >>> get_file_extension("datasets/drinking_quality/drinks.csv")
    '.csv'
    """
    return os.pa
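
As the output above shows, the model keeps generating unrelated functions after it has completed the requested body. A minimal sketch of trimming a completion back to that body follows; `truncate_to_function_body` is a hypothetical helper, and the rule it uses (stop at the first non-blank line that starts at column 0) is a simplifying assumption that would misfire on bodies containing unindented lines, such as some multi-line strings. The benchmark's own evaluation scripts may apply their own cleanup.

```python
def truncate_to_function_body(generation):
    """Keep only the leading indented lines, i.e. the body that completes the prompt.

    Stops at the first non-blank line that starts at column 0, which is
    taken to be a new top-level statement such as another `def`.
    """
    kept = []
    for line in generation.split('\n'):
        if line.strip() and not line[0].isspace():
            break
        kept.append(line)
    return '\n'.join(kept).rstrip() + '\n'


sample = (
    '    return any(abs(numbers[i] - numbers[i - 1]) <= threshold '
    'for i in range(1, len(numbers)))\n\n\n'
    'def get_file_paths(path: str) -> List[str]:\n'
    '    return []\n'
)
print(truncate_to_function_body(sample))
```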

**Run all 164 tasks and write the results to a JSONL file**

In [21]:
import datetime

The output file uses the JSON list format (one JSON object per line). Ref: https://github.com/THUDM/CodeGeeX/blob/main/codegeex/benchmark/README.md#evaluation
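
A small sketch of what one line of such a file looks like and how it reads back, using the same three keys as the loop in this notebook (`task_id`, `prompt`, `generation`). The prompt and completion below are placeholders, not dataset content, and an in-memory buffer stands in for the results file.

```python
import io
import json

record = {
    'task_id': 'Python/0',
    'prompt': 'def add(a, b):\n',        # placeholder prompt, not from the dataset
    'generation': '    return a + b\n',  # placeholder completion
}

buffer = io.StringIO()  # stands in for the results file
buffer.write(json.dumps(record) + '\n')  # one JSON object per line

buffer.seek(0)
loaded = [json.loads(line) for line in buffer if line.strip()]
print(loaded[0]['task_id'], len(loaded))
```

`json.dumps` escapes the newlines inside the strings, so each record stays on a single physical line.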

In [22]:
file_name = f'results/python_t02_p095_{datetime.datetime.now():%Y%m%d_%H%M%S_%f}.jsonl'

In [23]:
file_name

'results/python_t02_p095_20230712_104047_994438.jsonl'

In [24]:
import time

In [25]:
st = time.perf_counter()

with open(file_name, 'w') as f:
    for i in HEP:
        print(i['task_id'])
        
        # Generate code
        # Use t = 0.2, p = 0.95 for pass@1
        m_return = m(i['prompt'], temperature=0.2, top_p=0.95)
        
        line = {
            'task_id': i['task_id'],
            'prompt': i['prompt'],
            'generation': m_return['stdout'][0].replace('<|endoftext|>', '')
        }

        f.write(json.dumps(line))
        f.write('\n')

et = time.perf_counter()
elapsed_time = et - st
print(f'{len(HEP)} task(s), elapsed_time: {elapsed_time} s, {elapsed_time/60} minutes, {elapsed_time/60/60} hours.')


Python/0
Python/1
Python/2
Python/3
Python/4
Python/5
Python/6
Python/7
Python/8
Python/9
Python/10
Python/11
Python/12
Python/13
Python/14
Python/15
Python/16
Python/17
Python/18
Python/19
Python/20
Python/21
Python/22
Python/23
Python/24
Python/25
Python/26
Python/27
Python/28
Python/29
Python/30
Python/31
Python/32
Python/33
Python/34
Python/35
Python/36
Python/37
Python/38
Python/39
Python/40
Python/41
Python/42
Python/43
Python/44
Python/45
Python/46
Python/47
Python/48
Python/49
Python/50
Python/51
Python/52
Python/53
Python/54
Python/55
Python/56
Python/57
Python/58
Python/59
Python/60
Python/61
Python/62
Python/63
Python/64
Python/65
Python/66
Python/67
Python/68
Python/69
Python/70
Python/71
Python/72
Python/73
Python/74
Python/75
Python/76
Python/77
Python/78
Python/79
Python/80
Python/81
Python/82
Python/83
Python/84
Python/85
Python/86
Python/87
Python/88
Python/89
Python/90
Python/91
Python/92
Python/93
Python/94
Python/95
Python/96
Python/97
Python/98
Python/99
Python/100