Implement with the OpenAI API #17

guesung · 2023-05-27T15:38:39Z

No description provided.

guesung · 2023-05-27T15:38:55Z

guesung · 2023-05-28T02:40:22Z

OpenAI API

guesung · 2023-05-28T03:27:35Z

Fine-tuning looks like the way to go.

guesung · 2023-05-28T03:29:23Z

Format:

{"prompt":"Company: BHFF insurance\nProduct: allround insurance\nAd:One stop shop for all your insurance needs!\nSupported:", "completion":" yes"}
{"prompt":"Company: Loft conversion specialists\nProduct: -\nAd:Straight teeth in weeks!\nSupported:", "completion":" no"}

guesung · 2023-05-28T03:43:53Z

I'm trying to upload a file for fine-tuning, but I keep getting a 'no such file or directory, open~' error.

const createFineTune = async () => {
  const response = await openai.createFile(
    fs.createReadStream("data2.csv"),
    "fine-tune"
  );
  console.log(response);
};
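
One frequent cause of this error is that fs.createReadStream resolves a relative path against process.cwd() rather than against the script's own directory. The thread does not record the actual root cause, so treat this as an assumption. A minimal sketch that resolves the path explicitly and fails early with a readable message; resolveDataFile is a hypothetical helper, not part of the OpenAI SDK:

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: resolve fileName against baseDir (not process.cwd())
// and fail early if the file is missing.
function resolveDataFile(baseDir, fileName) {
  const fullPath = path.resolve(baseDir, fileName);
  if (!fs.existsSync(fullPath)) {
    throw new Error(`File not found: ${fullPath} (cwd: ${process.cwd()})`);
  }
  return fullPath;
}

// Usage (assumes data2.csv sits next to this script):
// fs.createReadStream(resolveDataFile(__dirname, "data2.csv"));
```

Printing the resolved path and the current working directory in the error message makes it easy to see which directory the relative path was actually looked up in.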

guesung · 2023-05-28T05:15:35Z

I used CSV to JSONL to easily convert the CSV into JSONL and saved it. Now let's upload this data with createFineTune.

guesung · 2023-05-28T05:19:48Z

Using the JSONL file converted on the site above as-is still produces a 400 error.
I suspect the error occurs because the converted data does not follow the prompt:~, completion:~ format.
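
The expected shape can be checked locally before uploading. A minimal sketch, assuming the prompt/completion format shown earlier in this thread, where every line is a JSON object with string prompt and completion fields; findInvalidJsonlLines is a hypothetical helper name:

```javascript
// Return the 1-based line numbers that are not valid
// {"prompt": string, "completion": string} JSON objects.
function findInvalidJsonlLines(text) {
  const invalid = [];
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines
    try {
      const record = JSON.parse(line);
      const ok =
        record !== null &&
        typeof record === "object" &&
        typeof record.prompt === "string" &&
        typeof record.completion === "string";
      if (!ok) invalid.push(index + 1);
    } catch (err) {
      invalid.push(index + 1); // not parseable as JSON at all
    }
  });
  return invalid;
}
```

An empty result means every non-blank line has the two expected fields; it does not guarantee that the API will accept the file for other reasons.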

guesung · 2023-05-28T05:20:06Z

So what should go into prompt and completion?

guesung · 2023-05-28T05:21:17Z

Here is an example.

{"prompt":"Company: BHFF insurance\nProduct: allround insurance\nAd:One stop shop for all your insurance needs!\nSupported:", "completion":" yes"}
{"prompt":"Company: Loft conversion specialists\nProduct: -\nAd:Straight teeth in weeks!\nSupported:", "completion":" no"}

guesung · 2023-05-28T05:26:11Z

The guide introduces ways to build datasets for a variety of cases: detecting untrue statements, sentiment analysis, categorization for email triage, conditional generation, writing an engaging ad based on a Wikipedia article, and so on.

guesung · 2023-05-28T05:53:35Z

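Which of these should I use?
"Product description based on a technical list of properties" looks like a good fit.

My data looks like the above, so I'll put the columns from 성분명 (ingredient name) through 공고일자 (announcement date) into prompt and 약품상세정보 (drug details) into completion.
I looked for a tool that converts CSV to JSONL in this custom shape but couldn't find one, so I wrote the conversion myself.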

guesung · 2023-05-28T05:54:13Z

const { CSVLoader } = require("langchain/document_loaders/fs/csv");
const fs = require("fs");

// Convert each CSV row into one {"prompt", "completion"} JSONL line.
const csvToJsonl = async () => {
  const loader = new CSVLoader("public/data.csv");
  const data = await loader.load();
  const lines = data.map((doc) => {
    // pageContent holds one field per line; the last field is the drug details.
    const fields = doc.pageContent.split("\n");
    const completion = fields.pop();
    // JSON.stringify escapes quotes and backslashes so every line stays valid JSON.
    return JSON.stringify({ prompt: fields.join(","), completion });
  });
  fs.writeFileSync("public/data3.jsonl", lines.join("\n"));
};
csvToJsonl();

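Who knew the algorithm practice from coding tests would come in handy like this. After grinding through really hard coding-test problems, a simple algorithm like this was easy to implement.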
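
A small standalone check of the per-row step: it shows that serializing with JSON.stringify keeps a line valid JSON even when a field contains double quotes. The sample row is made up for illustration and only mimics a one-field-per-line pageContent layout; rowToJsonlLine is a hypothetical name.

```javascript
// Convert one row (one field per line, last field = details)
// into a single JSONL line.
function rowToJsonlLine(pageContent) {
  const fields = pageContent.split("\n");
  const completion = fields.pop();
  return JSON.stringify({ prompt: fields.join(","), completion });
}

// Made-up sample row; the embedded quotes would break manual string building.
const sampleRow = 'name: Sample "A" tablet\ncode: 000000\ndetails: take with water';
const line = rowToJsonlLine(sampleRow);
console.log(JSON.parse(line).completion); // prints: details: take with water
```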

guesung · 2023-05-28T05:55:25Z

The data is now in JSONL format! Let's feed it to OpenAI.

guesung · 2023-05-28T06:03:08Z

Success!!!!

guesung · 2023-05-28T06:09:14Z

Running List files confirmed that the upload went through.

guesung · 2023-05-28T06:10:29Z

I then went on to create the fine-tune with Create fine-tune.

guesung · 2023-05-28T07:18:54Z

Filtering on status only: 2 have failed and 4 are pending.

Is it failing because the data is too large?
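
Filtering on status can be done with a small helper. A minimal sketch, assuming the jobs are available as an array of objects that each carry a status field; the sample array is illustrative and is not the actual response from this thread:

```javascript
// Count fine-tune jobs per status value.
function countByStatus(fineTunes) {
  const counts = {};
  for (const job of fineTunes) {
    counts[job.status] = (counts[job.status] || 0) + 1;
  }
  return counts;
}

// Illustrative input only.
const sampleJobs = [
  { id: "ft-1", status: "failed" },
  { id: "ft-2", status: "pending" },
  { id: "ft-3", status: "failed" },
];
console.log(countByStatus(sampleJobs));
```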

guesung · 2023-05-28T07:19:34Z

These are the values filtered on status only. All of them show failed. Maybe the data is too large. Let's try again with less data.

guesung · 2023-05-28T07:58:28Z

It worked after reducing the data! status = succeeded, and a fine_tuned_model was assigned.

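guesung · 2023-05-28T08:16:52Z

I sent the request "에이서캡슐에 대해 알려줘" ("Tell me about 에이서캡슐").

A strange message came back. Is it because the prompt is in Korean? The fine-tuned model I'm currently using is based on curie.
`Curie is very powerful, yet very fast. Davinci is stronger at analyzing complicated text, but Curie can handle many nuanced tasks such as sentiment classification and summarization. Curie is also very good at answering questions, performing Q&A, and acting as a general service chatbot. Good at: language translation, complex classification, text sentiment, summarization`
Reference: link
And yet the curie model is described as being good at language translation too.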

guesung · 2023-05-28T08:36:52Z

This approach doesn't work; it looks like every question-and-answer pair has to be trained individually.
For example, prompt: What is JavaScript? completion: JavaScript is a language that runs on websites.
I can't rewrite the whole CSV file that way... should I go back to LangChain?
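
For reference, pairs of that kind could in principle be generated from each row instead of being written by hand. A minimal sketch, assuming a row is available as a plain object keyed by column name; rowToQaPairs and the question template are hypothetical, and whether such pairs would actually improve the answers is not tested here:

```javascript
// Build one question/answer training pair per column of a row.
// The question template is an assumption, not a format required by OpenAI.
function rowToQaPairs(row, nameColumn) {
  const name = row[nameColumn];
  return Object.entries(row)
    .filter(([column]) => column !== nameColumn)
    .map(([column, value]) => ({
      prompt: `What is the ${column} of ${name}?`,
      completion: ` ${value}`, // leading space, as in the format example earlier in this thread
    }));
}
```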

guesung · 2023-05-28T09:14:22Z

I tried supplying the data through messages, the same way I did with LangChain, so that the model receives it as context.

const createChatCompletion = async () => {
  const { data } = await axios("/api/get-data");
  const contents = data.map((it: any) => it.pageContent);
  const messageData = [];
  messageData.push({
    role: "system",
    content: `Given the following extracted parts of a long document and a question, create a final answer.
  If you don't know the answer, just say that you don't know. Don't try to make up an answer.`,
  });
  // Each system message carries 10 rows of the data as context.
  for (let i = 0; i < 5; i++) {
    messageData.push({
      role: "system",
      content: contents.slice(i * 10, (i + 1) * 10).join("\n\n"),
    });
  }
  messageData.push({
    role: "user",
    content: "대웅바이오아세클로페낙정의 성분코드에 대해 알려줘",
  });
  console.log(messageData);

  const completion = await openai.createChatCompletion({
    model: "gpt-3.5-turbo",
    messages: messageData,
  });
  console.log(completion.data.choices[0].message);
};

guesung · 2023-05-28T09:16:38Z

Each content bundles 10 rows, and I sent 2 such bundles, so about 20 rows were sent in total.
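
The grouping described here can be written as a small helper so that the number of rows sent is easy to vary. A minimal sketch; buildContextMessages is a hypothetical name, and the row contents are assumed to be plain strings:

```javascript
// Group rows into system messages of groupSize rows each,
// using at most groupCount groups.
function buildContextMessages(contents, groupSize, groupCount) {
  const messages = [];
  for (let i = 0; i < groupCount; i++) {
    const group = contents.slice(i * groupSize, (i + 1) * groupSize);
    if (group.length === 0) break; // ran out of rows
    messages.push({ role: "system", content: group.join("\n\n") });
  }
  return messages;
}

// Usage: 2 groups of 10 rows, i.e. about 20 rows in total.
// const contextMessages = buildContextMessages(contents, 10, 2);
```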

guesung · 2023-05-28T09:17:12Z

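Up to 20 rows went through fine, but from 30 rows on a 400 error occurred.
The error says the messages contained 5782 tokens while the maximum is 4097 tokens. That's right: GPT-3.5 has a maximum of 4097 tokens.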
#19
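
The request was over the limit by 5782 - 4097 = 1685 tokens, so the rows have to be trimmed before sending. A minimal sketch of budgeting rows against a token limit; takeRowsWithinBudget is a hypothetical helper, and roughEstimate is a crude stand-in that counts characters, so a real tokenizer should replace it before relying on the numbers:

```javascript
// Keep adding rows until the estimated token count would exceed the budget.
function takeRowsWithinBudget(rows, maxTokens, estimateTokens) {
  const kept = [];
  let used = 0;
  for (const row of rows) {
    const cost = estimateTokens(row);
    if (used + cost > maxTokens) break;
    kept.push(row);
    used += cost;
  }
  return { kept, used };
}

// Crude stand-in for a tokenizer: counts characters, not real tokens.
const roughEstimate = (text) => text.length;

// Small demonstration with a budget of 10 "tokens".
console.log(takeRowsWithinBudget(["aaaa", "bbbb", "cccc"], 10, roughEstimate));

// Real usage would leave room for the instructions, the question, and the
// answer, e.g. takeRowsWithinBudget(contents, 4097 - reserve, tokenCounter),
// where reserve and tokenCounter are values you choose.
```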

guesung self-assigned this May 27, 2023

guesung mentioned this issue May 28, 2023

Cost issue #12

Closed

guesung mentioned this issue May 28, 2023

openai.createFile -> 'no such file or directory, open~' error #18

Closed

guesung closed this as completed May 28, 2023

guesung linked a pull request May 28, 2023 that will close this issue

Implement DB search using the vanilla OpenAI API #21

Merged
