In [1]:
import os
import openai

In [2]:
openai.api_key = os.getenv('OPENAI_API_KEY')

***

In [5]:
import pandas as pd

In [6]:
qa_df = pd.read_csv('./python_qa.csv')

In [7]:
qa_df.head()

Unnamed: 0,Id,OwnerUserId,CreationDate,ClosedDate,Score,Title,Body,ParentId,Answer
0,11060,912.0,2008-08-14T13:59:21Z,,18,How should I unit test a code-generator?,This is a difficult and open-ended question I ...,11060,I started writing up a summary of my experienc...
1,17250,394.0,2008-08-20T00:16:40Z,,24,Create an encrypted ZIP file in Python,I'm creating an ZIP file with ZipFile in Pytho...,17250,I created a simple library to create a passwor...
2,31340,242853.0,2008-08-27T23:44:47Z,,71,"How do threads work in Python, and what are co...",I've been trying to wrap my head around how th...,31340,"Yes, because of the Global Interpreter Lock (G..."
3,34020,3561.0,2008-08-29T05:43:16Z,,17,Are Python threads buggy?,A reliable coder friend told me that Python's ...,34020,Python threads are good for concurrent I/O pro...
4,34570,577.0,2008-08-29T16:10:41Z,2011-11-08T16:11:43Z,13,What is the best quick-read Python book out th...,I am taking a class that requires Python. We w...,34570,"I loved Dive Into Python, especially if you're..."


***

In [8]:
questions, answers = qa_df['Body'], qa_df['Answer']

In [9]:
questions

0       This is a difficult and open-ended question I ...
1       I'm creating an ZIP file with ZipFile in Pytho...
2       I've been trying to wrap my head around how th...
3       A reliable coder friend told me that Python's ...
4       I am taking a class that requires Python. We w...
                              ...                        
4424    I am trying to determine what percentage of th...
4425    How can we make a class represent itself as a ...
4426    I thought I could make my python (2.7.10) code...
4427    Say, I have given a DataFrame with most of the...
4428    Let's say I have the following code:\n\na = [1...
Name: Body, Length: 4429, dtype: object

In [10]:
answers

0       I started writing up a summary of my experienc...
1       I created a simple library to create a passwor...
2       Yes, because of the Global Interpreter Lock (G...
3       Python threads are good for concurrent I/O pro...
4       I loved Dive Into Python, especially if you're...
                              ...                        
4424    setup\ncreate 2 time series\n\nfrom StringIO i...
4425    TLDR: It's impossible to make custom classes r...
4426    You are not indexing. You are yielding a list;...
4427    You can create a look up data frame from the d...
4428    Use itertools.product within a list comprehens...
Name: Answer, Length: 4429, dtype: object

In [11]:
qa_openai_format = [{'prompt':q, 'completion':a} for q,a in zip(questions,answers)]

In [12]:
qa_openai_format[4]

{'prompt': 'I am taking a class that requires Python. We will review the language in class next week, and I am a quick study on new languages, but I was wondering if there are any really great Python books I can grab while I am struggling through the basics of setting up my IDE, server environment and all those other "gotchas" that come with a new programming language. Suggestions?\n',
 'completion': "I loved Dive Into Python, especially if you're a quick study.  The beginning basics are all covered (and may move slowly for you), but the latter few chapters are great learning tools.\n\nPlus, Pilgrim is a pretty good writer.\n"}

In [13]:
len(qa_openai_format)

4429
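
The legacy prompt/completion fine-tuning format tends to behave better when every prompt ends with a fixed separator and every completion starts with a space and ends with a stop sequence. The sketch below shows one way to apply that to the pairs built above. The separator `\n\n###\n\n`, the stop sequence `\n###END`, and the helper name `format_pair` are illustrative assumptions, not something this notebook used for its training run.

```python
SEPARATOR = "\n\n###\n\n"  # assumed marker for the end of a prompt
STOP = "\n###END"          # assumed stop sequence closing a completion


def format_pair(prompt, completion, separator=SEPARATOR, stop=STOP):
    """Return a prompt/completion dict with a separator and a stop sequence."""
    return {
        "prompt": prompt.strip() + separator,
        # Completions start with a space and end with the stop sequence.
        "completion": " " + completion.strip() + stop,
    }


example = format_pair(
    "Are Python threads buggy?\n",
    "Python threads are good for concurrent I/O.\n",
)
print(example)
```

At inference time the same separator would be appended to the prompt and `stop=STOP` passed to `openai.Completion.create`, so generation ends where a training completion would have ended.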

In [15]:
response = openai.Completion.create(model = 'text-babbage-001',
                                    prompt = qa_openai_format[4]['prompt'],
                                    temperature = 0,
                                    max_tokens = 250)

In [18]:
print(response['choices'][0]['text'])


There are a few great Python books that you could consider while you are learning Python. One book that is particularly helpful is "Python for Data Science" by Geoffrey Hinton. This book is packed with information on data science and Python, and it is a great resource for anyone who wants to learn Python for data science purposes. Another great book to consider is "Python for Data Science Mastery" by Michael Nielsen. This book is designed to help you learn more about data science and Python, and it is a great resource for anyone who wants to learn more about Python for data science purposes.


In [19]:
response = openai.Completion.create(model = 'text-davinci-003',
                                    prompt = qa_openai_format[4]['prompt'],
                                    temperature = 0,
                                    max_tokens = 250)

In [20]:
print(response['choices'][0]['text'])


Some great Python books to consider include:

1. Automate the Boring Stuff with Python by Al Sweigart
2. Python Crash Course by Eric Matthes
3. Python for Data Analysis by Wes McKinney
4. Python Cookbook by David Beazley and Brian K. Jones
5. Learning Python by Mark Lutz
6. Fluent Python by Luciano Ramalho
7. Python in a Nutshell by Alex Martelli
8. Python Pocket Reference by Mark Lutz
9. Python for Kids by Jason R. Briggs
10. Python Essential Reference by David Beazley


***

In [22]:
# !pip install tiktoken

***

In [23]:
import tiktoken

In [24]:
def no_tokens_from_string(string, encoding_name):
    """Return the number of tokens in `string` under the named tiktoken encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

In [25]:
import json

In [26]:
dataset_size = 500
# Write one JSON object per line (JSON Lines), despite the .json extension.
with open('my_example_training_data.json','w') as f:
    for entry in qa_openai_format[:dataset_size]:
        f.write(json.dumps(entry))
        f.write("\n")
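
Before uploading, it can help to read the file back and confirm that every line parses and carries exactly the two expected fields. The following is a minimal sketch of such a check; `check_jsonl` is a hypothetical helper, and the demo writes its own temporary file rather than touching the training data. A row whose answer was missing in the DataFrame would be written as a bare `NaN` and fail the string check here.

```python
import json
import os
import tempfile


def check_jsonl(path):
    """Return the number of records, raising ValueError on a malformed line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # JSONDecodeError is a ValueError
            if set(record) != {"prompt", "completion"}:
                raise ValueError(f"line {line_no}: unexpected keys {sorted(record)}")
            if not all(isinstance(v, str) and v for v in record.values()):
                raise ValueError(f"line {line_no}: empty or non-string field")
            count += 1
    return count


# Demo on a throwaway file with a single well-formed record.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"prompt": "q", "completion": "a"}) + "\n")
    n_records = check_jsonl(demo_path)

print(n_records)
```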

In [35]:
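token_counter = 0

# Count only the prompt and completion texts. Iterating over .items() would
# also tokenize the key names 'prompt' and 'completion' for every record,
# slightly inflating the total (the total recorded below came from a run
# that included them).
for prompt_completion in qa_openai_format[:dataset_size]:
    token_counter += no_tokens_from_string(prompt_completion['prompt'], 'p50k_base')
    token_counter += no_tokens_from_string(prompt_completion['completion'], 'p50k_base')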

token_counter

186352

In [36]:
(token_counter/1000)*0.0006*4  # babbage: $0.0006 per 1,000 tokens, times 4 epochs

0.44724479999999994

In [37]:
(token_counter/1000)*0.0006*4*82  # same estimate in rupees, assuming Rs 82 per US dollar

36.67407359999999

In [39]:
print(f"There are {token_counter} tokens")
print("Fine tuning using babbage costs $0.0006 per 1000 tokens")
print(f"Estimated price: ${(4*token_counter / 1000) * 0.0006}")
print(f"Estimated price: Rs{(4*token_counter / 1000) * 0.0006*82}")

There are 186352 tokens
Fine tuning using babbage costs $0.0006 per 1000 tokens
Estimated price: $0.44724479999999994
Estimated price: Rs36.67407359999999
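
The arithmetic above can be wrapped in a small function so the price, epoch count, and exchange rate are explicit. This is a sketch only: `estimate_finetune_cost` is a hypothetical name, the defaults are the figures used in this notebook ($0.0006 per 1,000 tokens, 4 epochs), and Rs 82 per US dollar is the notebook's assumed exchange rate.

```python
def estimate_finetune_cost(n_tokens, usd_per_1k_tokens=0.0006, n_epochs=4, inr_per_usd=82):
    """Return (cost in USD, cost in INR) for training on n_tokens for n_epochs."""
    usd = n_tokens / 1000 * usd_per_1k_tokens * n_epochs
    return usd, usd * inr_per_usd


usd, inr = estimate_finetune_cost(186352)
print(f"Estimated price: ${usd:.2f} (Rs {inr:.2f})")
# Estimated price: $0.45 (Rs 36.67)
```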


***

### Fine-tuning the model

In [40]:
!openai api fine_tunes.create -t my_example_training_data.json -m babbage

Uploaded file from my_example_training_data.json: file-Lrgp1dDEBdMjhj4MjEgXcjUe
Created fine-tune: ft-wSQtSNGKNpJZ4SzlHa7B2AJU
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-05-07 18:50:53] Created fine-tune: ft-wSQtSNGKNpJZ4SzlHa7B2AJU
[2023-05-07 18:51:27] Fine-tune costs $0.47
[2023-05-07 18:51:27] Fine-tune enqueued. Queue number: 0




Upload progress:   0%|          | 0.00/707k [00:00<?, ?it/s]
Upload progress: 100%|##########| 707k/707k [00:00<00:00, 465Mit/s]


In [41]:
!openai api fine_tunes.list

{
  "data": [
    {
      "created_at": 1683465653,
      "fine_tuned_model": null,
      "hyperparams": {
        "batch_size": 1,
        "learning_rate_multiplier": 0.1,
        "n_epochs": 4,
        "prompt_loss_weight": 0.01
      },
      "id": "ft-wSQtSNGKNpJZ4SzlHa7B2AJU",
      "model": "babbage",
      "object": "fine-tune",
      "organization_id": "org-rvv0qZo6yHiAv5ITLnbEBTOY",
      "result_files": [],
      "status": "pending",
      "training_files": [
        {
          "bytes": 706683,
          "created_at": 1683465653,
          "filename": "my_example_training_data.json",
          "id": "file-Lrgp1dDEBdMjhj4MjEgXcjUe",
          "object": "file",
          "purpose": "fine-tune",
          "status": "processed",
          "status_details": null
        }
      ],
      "updated_at": 1683465687,
      "validation_files": []
    }
  ],
  "object": "list"
}
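
The `created_at` and `updated_at` fields in this record are Unix timestamps in seconds. A short sketch for turning them into readable UTC times is shown here; `to_utc` is a hypothetical helper, and the `job` dict copies only three fields from the output above. The bracketed times printed by the CLI stream appear to be local time, so they will differ from these UTC values by the local offset.

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds):
    """Format a Unix timestamp (seconds) as a UTC date-time string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Fields copied from the fine_tunes.list output above.
job = {"created_at": 1683465653, "updated_at": 1683465687, "status": "pending"}

created = to_utc(job["created_at"])
updated = to_utc(job["updated_at"])
print(f"{job['status']}: created {created} UTC, updated {updated} UTC")
```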


In [46]:
!openai api fine_tunes.get -i ft-wSQtSNGKNpJZ4SzlHa7B2AJU  # Check the status

{
  "created_at": 1683465653,
  "events": [
    {
      "created_at": 1683465653,
      "level": "info",
      "message": "Created fine-tune: ft-wSQtSNGKNpJZ4SzlHa7B2AJU",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1683465687,
      "level": "info",
      "message": "Fine-tune costs $0.47",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1683465687,
      "level": "info",
      "message": "Fine-tune enqueued. Queue number: 0",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1683466289,
      "level": "info",
      "message": "Fine-tune started",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1683466401,
      "level": "info",
      "message": "Completed epoch 1/4",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1683466493,
      "level": "info",
      "message": "Completed epoch 2/4",
      "object": "fine-tune-event"
    },
    {
      "created_at": 1683466585,
      "level": "info",


In [47]:
fine_tuned_model = "babbage:ft-personal-2023-05-07-13-38-24"

In [49]:
response = openai.Completion.create(model = 'text-babbage-001',
                                    prompt = "What are good Python Books ?",
                                    max_tokens=128,
                                    temperature=0)

In [51]:
print(response['choices'][0]['text'])



There are many good Python books out there, but some of the most popular choices include "Python for Data Science" by Geoffrey Hinton, "Python for Data Science Mastery" by Michael Nielsen, and "Python for Data Science Mastery 2nd Edition" by Michael Nielsen.


In [52]:
response = openai.Completion.create(model = fine_tuned_model,
                                    prompt = "What are good Python Books ?",
                                    max_tokens=128,
                                    temperature=0)

In [53]:
print(response['choices'][0]['text'])


You can't go wrong with the first one on the list, but there are others.

The Python Cookbook is a good general reference.

The Python Design and Implementation Guide is a good reference for design.

The Python UDFs Cookbook is a good reference for Python UDFs.

The Python in Practice is a good reference for Python in general.

The Python in Depth is a good reference for advanced topics.

The Python in Depth 2 is a good reference for advanced topics.

The Python in Depth 3 is a good reference for advanced topics.

The Python
