Sample code for fine-tuning GPT-3.

### **Mount Google Drive**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### **Install openai**

In [None]:
!pip install openai==0.25.0 wandb

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting openai==0.25.0
  Downloading openai-0.25.0.tar.gz (44 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting wandb
  Downloading wandb-0.13.9-py2.py3-none-any.whl (2.0 MB)
Collecting pandas-stubs>=1.1.0.11
  Downloading pandas_stubs-1.5.2.230105-py3-none-any.whl (148 kB)
Collecting docker-pycreds>=0.4.0
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting GitPython>=1

### **Set the OpenAI API key and the training data file name**

In [None]:
import os
import openai
import pandas as pd
import codecs

openai.api_key = "YOUR_API_KEY"  # Replace with your own API key; do not publish a real key in a shared notebook
os.environ['OPENAI_API_KEY'] = openai.api_key
train_name = "kanjimaster1"    # CSV file name without the .csv extension
os.environ['TRAIN_NAME'] = train_name
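
Writing the key directly into a cell makes it easy to leak when the notebook is shared. As a minimal sketch of an alternative, the key can be read from an environment variable that was set before the notebook started. `load_api_key` is a hypothetical helper introduced here for illustration and is not part of the `openai` package; the variable name `OPENAI_API_KEY` matches the one this notebook already exports.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper: raises instead of silently using an empty key.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "export your API key before running the notebook."
        )
    return key


# Example with a dummy variable so the sketch runs without a real key.
os.environ["DEMO_OPENAI_KEY"] = "sk-dummy-value"
demo_key = load_api_key("DEMO_OPENAI_KEY")
print(demo_key.startswith("sk-"))
```

With such a helper, the assignment in the cell above could become `openai.api_key = load_api_key()`, which keeps the secret out of the saved notebook.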

### **Convert the training data to a JSONL file**
- For detailed instructions on how to create training data, see https://beta.openai.com/docs/guides/fine-tuning/preparing-your-dataset

In [None]:
# Read the CSV file (first column: prompt, second column: completion)
import json

df = pd.read_csv("/content/drive/MyDrive/" + train_name + ".csv", encoding="utf8", dtype=str) # change encoding if your CSV is not UTF-8

# Build one record per row; " -> " marks the end of the prompt and "END" the end of the completion
records = [
  {"prompt": str(row.iloc[0]) + " -> ", "completion": str(row.iloc[1]) + "END"}
  for _, row in df.iterrows()
]

# Write the records to a JSON file; json.dump escapes quotes, newlines, tabs and backslashes
with open("/content/drive/MyDrive/" + train_name + ".json", "w", encoding="utf8") as f:
  json.dump(records, f, ensure_ascii=False, indent=0)

# Convert to a JSONL file
! openai tools fine_tunes.prepare_data -f "/content/drive/MyDrive/${TRAIN_NAME}.json"


Analyzing...

- Your file appears to be in a .JSON format. Your file will be converted to JSONL format
- Your file contains 250 prompt-completion pairs
- All prompts end with suffix ` -> `
- All completions end with suffix `END`
- The completion should start with a whitespace character (` `). This tends to produce better results due to the tokenization we use. See https://beta.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more details

Based on the analysis we will perform the following actions:
- [Necessary] Your format `JSON` will be converted to `JSONL`
- [Recommended] Add a whitespace character to the beginning of the completion [Y/n]: y


Your data will be written to a new JSONL file. Proceed [Y/n]: y

Wrote modified file to `/content/drive/MyDrive/kanjimaster1_prepared.jsonl`
Feel free to take a look!

Now use that file when fine-tuning:
> openai api fine_tunes.create -t "/content/drive/MyDrive/kanjimaster1_prepared.jsonl"

After you’ve fine-tuned a model, remembe
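
The analysis above reports three conventions for this dataset: every prompt ends with ` -> `, every completion ends with `END`, and the tool recommends a leading space on each completion. As a rough sketch of what the prepared file contains, the same records can be produced directly in Python. `to_jsonl` is a hypothetical helper and the two sample rows are made up for illustration; they are not taken from the actual training data.

```python
import json

PROMPT_SUFFIX = " -> "      # separator reported by prepare_data
COMPLETION_SUFFIX = "END"   # completion terminator reported by prepare_data


def to_jsonl(pairs):
    """Return JSONL text with one {"prompt", "completion"} object per line."""
    lines = []
    for prompt, completion in pairs:
        record = {
            "prompt": prompt + PROMPT_SUFFIX,
            # Leading space, as recommended by the prepare_data analysis.
            "completion": " " + completion + COMPLETION_SUFFIX,
        }
        # json.dumps escapes embedded newlines, so each record stays on one line.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Made-up sample rows, for illustration only.
sample_pairs = [
    ("one", "a single horizontal stroke"),
    ("two", "first line\nsecond line"),
]
jsonl_text = to_jsonl(sample_pairs)
print(jsonl_text)
```

Each printed line is an independent JSON object, which is the difference between JSONL and the bracketed JSON array written by the cell above.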

### **Start fine-tuning**
- You can find your API key at https://beta.openai.com/account/api-keys
- For more information on fine-tuning, see https://beta.openai.com/docs/guides/fine-tuning/create-a-fine-tuned-model

In [None]:
! openai api fine_tunes.create -t "/content/drive/MyDrive/${TRAIN_NAME}_prepared.jsonl" --batch_size 1 

Upload progress: 100% 259k/259k [00:00<00:00, 340Mit/s]
Uploaded file from /content/drive/MyDrive/kanjimaster1_prepared.jsonl: file-RcAQ5C8c7oHBKCuHQgQ2f2r9
Created fine-tune: ft-NjbnvklOe97hEAPqPyDK6Lyu
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-01-31 00:47:46] Created fine-tune: ft-NjbnvklOe97hEAPqPyDK6Lyu

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-NjbnvklOe97hEAPqPyDK6Lyu



In [None]:
!openai api fine_tunes.follow -i ft-NjbnvklOe97hEAPqPyDK6Lyu

[2023-01-31 00:47:46] Created fine-tune: ft-NjbnvklOe97hEAPqPyDK6Lyu

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-NjbnvklOe97hEAPqPyDK6Lyu



### **Run the fine-tuned model from the command line**

In [None]:
# Replace curie:ft-personal-2023-01-30-08-25-49 below with the name of your own model, and try changing the prompt text as well
! openai api completions.create -m curie:ft-personal-2023-01-30-08-25-49 -p "one " -M 500 -t 0.7 --stop END

one _________ (bottom radical)
PK
You can use it to mean "bottom" or "dwarf," but it's more often used to mean "one," like 'one person,' or 'one time.'
Used In
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
I don't know the kanji, but I think it basically means, "one time."
Used In
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

'one' - you don't need to know the kanji for this one.

Lookalikes


### **Run the fine-tuned model from Python**
- For more information, see https://beta.openai.com/docs/guides/fine-tuning/use-a-fine-tuned-model

In [None]:
import re

test_prompt = "above"
model_name = "curie:ft-personal-2023-01-30-08-25-49" # Replace with the name of your own model

response = openai.Completion.create(
    model = model_name,
    prompt = test_prompt,
    max_tokens = 500,
    stop = ["END"],
    )

#result = re.search('-> (.+?)end', response['choices'][0]['text'])
#if result:
  #print(result.group(1))
print(response['choices'][0]['text'])


-the-fold text in the Washington Post more on Inauguration Day 2017 than for Donald Trump’s inauguration three months earlier. Apparently. Plus, beyond the novelty of it all, is there a temerity in only putting this in the upper left-hand quadrant of your newspaper? I presume you saved this for a spot above the fold because it encapsulates so much of what’s gone wrong with politics and media in the United States that year. And maybe, by putting it above the fold, you demonstrate for those of us who are still catching up that you know how to cover the Trump phenomenon as a writer and that you are not in this just as a journalist, that is, attached to the same wound but trying to help slough it off a little with essential blood. Anyway, now that you mentioned it, I find it stuck out. There are many things stuck out in it. If you read them out loud they sound like repeated tacky hip-hop lyrics that are repeated over and over and over again.
-Sometimes, the phrase comes at the end of the f
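
The training prompts all ended with the separator ` -> ` (see the `prepare_data` analysis earlier), whereas the test prompt here is the bare word `above`. Fine-tuned completion models generally behave best when the inference prompt has the same shape as the training prompts, so the missing separator is one plausible reason the output drifts into unrelated text. The sketch below shows one way to keep the two in sync; `build_prompt` and `clean_completion` are hypothetical helpers, not part of the `openai` package.

```python
PROMPT_SUFFIX = " -> "   # separator used when the training data was prepared
STOP_SEQUENCE = "END"    # terminator appended to every training completion


def build_prompt(text):
    """Format a query the same way the training prompts were formatted."""
    return text.strip() + PROMPT_SUFFIX


def clean_completion(text):
    """Drop anything from the stop sequence onward and trim whitespace."""
    head, _, _ = text.partition(STOP_SEQUENCE)
    return head.strip()


print(repr(build_prompt("above")))
print(clean_completion(" up, over END leftover text"))
```

In the cell above, this would mean passing `prompt = build_prompt(test_prompt)` to `openai.Completion.create` while keeping `stop = ["END"]`.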

### **Negative-form test**

In [None]:
! openai api completions.create -m curie:ft-personal-2023-01-30-08-25-49 -p "one " -M 500; echo
! openai api completions.create -m curie:ft-personal-2023-01-30-08-25-49 -p "two " -M 500; echo

! openai api completions.create -m curie:ft-personal-2023-01-30-08-25-49 -p "three " -M 500; echo
! openai api completions.create -m curie:ft-personal-2023-01-30-08-25-49 -p "above " -M 500; echo

! openai api completions.create -m curie:ft-personal-2023-01-30-08-25-49 -p "good" -M 500; echo

one _____ EMPI (same-color radical)
Onyomi
SEI	
Mnemonic
There's a ___ on the EYE of your LEFT eye and a ___in the same-color spot on your RIGHT one!

Kunyomi
__*_	one person's idea or one type of q-tip's tip is the same color as another person's
_______
__*__	it's the same stuff, ('every time____in 4 kanji')
_____
Jukugo
________	
unfair treatment _____
_ (one) + _ (protest) = ___ (unfair treatment)

Lookalikes
Meaning	Hint	Radical
_	one person's idea or one type of q-tip's tip is the same color as another person's	BOTTOM CROTCH	_
_	swirl	SPINE-MUFFIN	_
My crazy, swirl-muffin has a guy's face, FACE-DOWN in the center, and he's got MOUTH and EYES at the bottom.ENDERFINISHING.ENDERFINISHERS AND NOSEbecause life is MCRUSH: we got a CRUNCHENDERTER on the bottom,COMES IN FACE-UP, and THENLIFE sneaks up ON US AND CARESS US

Used In
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Synonyms
unfair treatment
_______    ______   ENDER ELITE'S NECESSITY
_    __    __    _   ____

### **Prompt with Gradio**

In [None]:
!pip install gradio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gradio
  Downloading gradio-3.17.0-py3-none-any.whl (14.2 MB)
Collecting websockets>=10.0
  Downloading websockets-10.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (106 kB)
Collecting markdown-it-py[linkify,plugins]
  Downloading markdown_it_py-2.1.0-py3-none-any.whl (84 kB)
Collecting orjson
  Downloading orjson-3.8.5-cp38-cp38-manylinux_2_28_x86_64.whl (140 kB)

In [None]:
import gradio as gr

def greet(test_prompt):
  model_name = "curie:ft-personal-2023-01-30-08-25-49" # Replace with the name of your own model

  response = openai.Completion.create(
    model = model_name,
    prompt = test_prompt,
    max_tokens = 500,
    stop = ["END"],
    )

  # Return the generated text so Gradio can display it (print() returns None)
  return response['choices'][0]['text']

# Use the configured textbox as the input component
textbox = gr.Textbox(label="Type your query here:", placeholder="Your Query", lines=2)
gr.Interface(fn=greet, inputs=textbox, outputs="text").launch()

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.



