# ChatGPT and the era of prompting
This talk covers what ChatGPT is and how to use it.

* ChatGPT Evolution : From neural network to large language model
* ChatGPT Usage     : From fine-tuning to zero-shot learning




## ChatGPT Evolution
We will introduce the notion of a function and then show that GPT and ChatGPT are complex functions that can do certain tasks for us.
* Function(Traditional vs AI)
* GPT & ChatGPT as a function

### Function(Traditional)
We typically think of an application as a collection of tasks. Each task can be implemented by writing a function. A function is a very general concept, and we have been writing functions for ages. A function takes an input and produces an output. We typically write the function body to perform that transformation.
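
As a minimal illustration of a traditional function, the sketch below spells out the transformation explicitly in the function body. The name `word_count` and the task it performs are illustrative assumptions, not something taken from the talk.

```python
def word_count(text: str) -> int:
    """Hand-written function: the body states the transformation explicitly."""
    # Split on whitespace and count the resulting pieces.
    return len(text.split())


print(word_count("A function takes an input and produces an output"))
```

Every rule here was decided by the programmer; nothing is learnt from data.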

### Function(AI)
With the advent of deep learning, we have a new way to write a function. Instead of writing the function body explicitly, we collect pairs of inputs (**x**) and outputs (**y**). We feed x to an untrained deep learning model **f**, which transforms x into a prediction **y'**. We measure the gap between y (ground truth) and y' (prediction) and use it as a feedback signal. When we say the model is learning, we mean it is tweaking its internal parameters **w** to produce outputs closer to the ground truth. So f is in fact f(w,x), and we are learning the right w such that y' = f(w,x) is very close to y. Once training is complete, w is frozen and we can run inference on a new x to produce a new y.

<br/>

![NeuralNet](https://drive.google.com/uc?export=view&id=1Tiwy1JJdP3Mu6URLVn1xjL0r8JQ0hbcv)
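
The learning loop described above can be sketched with a deliberately tiny model. This is a toy under stated assumptions: f has a single parameter w, the gap is the mean squared error, and the data pairs, learning rate and epoch count are made up for illustration. Real deep learning models have many parameters and use a framework to compute gradients.

```python
def f(w: float, x: float) -> float:
    """The model: a single-parameter function y' = w * x."""
    return w * x


def train(pairs, w=0.0, lr=0.01, epochs=200):
    """Tune w by gradient descent on the mean squared gap between y and y'."""
    for _ in range(epochs):
        grad = 0.0
        for x, y in pairs:
            y_pred = f(w, x)
            # Derivative of (y_pred - y)^2 with respect to w.
            grad += 2 * (y_pred - y) * x
        # Move w against the average gradient (the feedback step).
        w -= lr * grad / len(pairs)
    return w


# Illustrative (x, y) pairs generated from the hidden rule y = 3x.
pairs = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
w = train(pairs)        # training: w is learnt from the pairs
print(w, f(w, 5.0))     # inference: w is frozen and applied to a new x
```

Nobody wrote "multiply by 3" in the function body; the value of w was recovered from the examples.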



### Pretrained Models(aka functions)
Since we have a general mechanism for writing functions, practitioners have built models (functions) that perform certain kinds of mapping. Just like container images, these models can be downloaded from a model hub and used for inference. This, however, assumes that the task you want to achieve is already covered by a model in the hub.

|Task|Pre-trained model(f)|x|y|
|---|---|---|--- |
| Sentiment Analysis| distilbert-base-uncased-finetuned-sst-2-english  | text  |  sentiment(+ve, -ve) |
| Summarizer| sshleifer/distilbart-cnn-12-6  | text  |  summary |
| Generator| gpt2  | text  |  continuation text |

<br/>
<br/>

Typically, our applications require us to do novel tasks which may not be directly available in the hub. What are our options?

|Function metaphor|AI task| Considerations |
|---|---| ---|
| Write _new_ function| Training from scratch | Costly in compute and data  | 
| _Refactor_ existing function| Fine-tuning existing model| Moderate requirements on new data and compute  | 
| _Pass custom function_ to pre-existing function| Few-shot learning, inference-time conditioning | Cheapest; requires a large model; no training  |

<br/>

GPT and ChatGPT are models that do not require new training for a novel task. They can simply be conditioned at inference time by providing the right prompt. These models belong to a family called LLMs (Large Language Models). Let us get to know these models (functions) a little better.

<br/>


### GPT

GPT is a large language model. 

* Large : GPT is large because it has a large number of parameters; GPT-3 has 175 billion. Large models have been shown to be adept at tasks other than those they were trained on. This is achieved via few-shot learning.

* Language model : GPT is a language model trained on a large amount of text from the internet. The training task is to predict the next token given the preceding context. Once training is complete, GPT can produce meaningful continuations of a starting text.

So GPT is a function that takes an input prompt and produces a response that continues it. For example, given the input _The sun is a huge_, GPT may produce the continuation _star present at the centre of the solar system_.
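
To make "predict the next token, then continue the text" concrete, here is a toy sketch. It is not how GPT works internally: it assumes a bigram count table instead of a Transformer, whitespace-separated words instead of real tokens, and a two-sentence corpus invented for illustration. Only the overall shape (train on text, then continue a prompt) mirrors the description above.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str):
    """Count, for every token, which tokens follow it in the corpus."""
    tokens = corpus.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def continue_text(following, prompt: str, max_new_tokens: int = 5) -> str:
    """Greedy decoding: repeatedly append the most frequent next token."""
    tokens = prompt.lower().split()
    for _ in range(max_new_tokens):
        candidates = following.get(tokens[-1])
        if not candidates:
            break  # the last token was never seen with a successor
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


# Tiny illustrative corpus; a real model is trained on internet-scale text.
corpus = "the sun is a huge star . a huge star shines ."
model = train_bigram(corpus)
print(continue_text(model, "the sun is a", max_new_tokens=2))
```

The "model" here is just the table of counts; in GPT the same role is played by billions of learnt parameters.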

GPT stands for _Generative Pretrained Transformer_. It is generative because it generates new text. It is pre-trained because the model has already been trained on a large corpus, and its weights are frozen before we use it. Transformer refers to the deep learning architecture that makes large language models like GPT possible.

<br/>






### GPT vs ChatGPT
A GPT model continues a given text, but on its own that is not very useful: what we would like is an AI assistant. So we need a modified language model, which is obtained by fine-tuning GPT in two phases.

* Supervised fine tuning: SFT
* RL based fine tuning : RLHF or Reinforcement Learning From Human Feedback

First, we teach GPT to chat. This is done by feeding it conversation data, say from Reddit, and asking the model to continue the text like a conversation. After this phase, the model can chat.

Second, we want the model to follow instructions and respond in a way that humans value highly. This is done via RLHF, where the model receives a high score for producing helpful, honest and harmless responses.

At the end of these phases, we have transformed GPT into ChatGPT. Once ChatGPT is ready, all weights are frozen. 

## ChatGPT Usage

Let us revisit the ways of obtaining a function for a novel task. How can we go about leveraging ChatGPT?

|Options| Considerations |
|---|---|
| Write _new_ ChatGPT| Very costly in data, compute and expertise (more than $50M). Only big companies are doing it.  | 
| _Fine tune_ ChatGPT| Less costly, but not readily available. You take a copy of the model into your own cluster and fine-tune it on your data. | 
| _Prompt_ ChatGPT   | Cheapest; requires no training, but you need to learn how to prompt.  |

### Few shot learning
Large language models exhibit the property that they do a good job on novel tasks without retraining. You can provide a prompt at inference time to guide the model towards a certain task. It is as if the model has already learnt a myriad of tasks and is waiting for you to provide the right prompt.

![Few-Shot](https://drive.google.com/uc?export=view&id=1c4EnbgGit9hUzN7GWmObjjUMCudH0GxJ)

### Prompt Anatomy

A prompt is a way to elicit the desired response from ChatGPT. ChatGPT can be treated like an alien that communicates in a certain way, and the prompt is how we communicate with it. A prompt has the following components, not all of which are mandatory.

* __Instruction__: a specific task or instruction you want the model to perform
* __Context__: can involve external information or additional context that can steer the model to better responses
* __Input Data__: the input or question that we want a response for
* __Output Indicator__: indicates the type or format of the output.

<br/>
For example, the few-shot example above has the following pieces.

* Instruction: Translate English to French:
* Context: sample examples like sea otter => loutre de mer
* Input Data: cheese =>
* Output Indicator: Not used
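
The pieces listed above can be assembled mechanically. The sketch below is one possible way to do it, not a prescribed format: the function name `build_prompt`, its parameters and the `=>` separator are assumptions modelled on the translation example.

```python
def build_prompt(instruction, examples, input_data, output_indicator=""):
    """Assemble a few-shot prompt from the four components described above."""
    lines = [instruction]                      # Instruction
    for source, target in examples:            # Context: worked examples
        lines.append(f"{source} => {target}")
    last = f"{input_data} =>"                  # Input Data, left open for the model
    if output_indicator:                       # Output Indicator (optional)
        last += f" {output_indicator}"
    lines.append(last)
    return "\n".join(lines)


prompt = build_prompt(
    instruction="Translate English to French:",
    examples=[("sea otter", "loutre de mer")],
    input_data="cheese",
)
print(prompt)
```

The resulting string is what gets sent to the model; adding more pairs to `examples` turns a one-shot prompt into a few-shot one.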

For guidance on how to build prompts, see the [Prompting Guide](https://www.promptingguide.ai/).

### Prompt samples 
Let us review some sample prompts and responses from ChatGPT. Keep an eye on the prompt anatomy. Visit https://chat.openai.com/chat to try more.

#### Prompt-Fact

![Factoid](https://drive.google.com/uc?export=view&id=1V02_Abcug80RdSRi7HNyr1VB73nDld7m)



#### Prompt-Sentiment Analysis

![Sentiment Analysis](https://drive.google.com/uc?export=view&id=1gPJKAFiJTz4AmkucZzjFFx0RVSeEIjG_)


#### Prompt-Reading Comprehension

![Reading Comprehension](https://drive.google.com/uc?export=view&id=10i95JvM3-OCqSDeJnwQvZSGxbWfHWOb8)


#### Prompt-Maths Reasoning

![Maths Reasoning](https://drive.google.com/uc?export=view&id=1rBLnRcZoYWoaVgQ082yRaHRiKDz66VxZ)


#### Prompt-Coding

![Coding](https://drive.google.com/uc?export=view&id=1JghmGZx150xf_pUd2iNePp4KXiPPpytD)


### Limitations
ChatGPT gives us impressive capabilities without retraining the initial model. That said, it still has a few challenges.
  * __Prompt engineering__ : We are still learning how to elicit the best response from ChatGPT through appropriate prompt engineering. Users may not be able to talk in the language of ChatGPT, so we need to _design_ the right prompt for the right task, which can be cumbersome. There are therefore various efforts towards _**Automatic** Prompt Engineering_, which aims to automate the generation of prompts.

  * __Prompt size__ : ChatGPT limits the prompt size to about 4000 tokens, which amounts to around 3000 words. This limits how much context can be provided at inference time. We expect this limitation to be relaxed in the future.

  * __Hallucination__ : ChatGPT has no notion of a session. In a multi-turn scenario, we pass a snapshot of the conversation so far to ChatGPT to maintain the session. In longer sessions, ChatGPT has been observed to make up facts, which is commonly termed hallucination. One way to mitigate this problem is to teach ChatGPT to call APIs for tasks it is not good at. OpenAI has released [plugins](https://openai.com/blog/chatgpt-plugins) to do that. With the help of ChatGPT plugins, a user can interact with ChatGPT in natural language, and ChatGPT can talk to [WolframAlpha](https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its-wolfram-superpowers/) to help you learn Physics, Chemistry & Maths. You may watch the [chemistry example](https://www.youtube.com/watch?v=EOQV9VakBgE&t=1948s).
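
The prompt-size limit and the "pass the conversation so far" approach interact: as a session grows, older turns have to be dropped to stay within the budget. The sketch below shows one simple policy under loud assumptions: token counts are estimated from word counts using the 4000-tokens-to-3000-words ratio quoted above (a real system would use the model's tokenizer), and the names `estimate_tokens` and `build_context` are hypothetical.

```python
TOKENS_PER_WORD = 4000 / 3000  # rough ratio taken from the figures above


def estimate_tokens(text: str) -> int:
    """Crude estimate: word count scaled by tokens-per-word (not a real tokenizer)."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def build_context(history, new_message, budget=4000):
    """Keep the newest turns that fit the budget, then append the new message."""
    remaining = budget - estimate_tokens(new_message)
    kept = []
    for turn in reversed(history):       # walk from the most recent turn backwards
        cost = estimate_tokens(turn)
        if cost > remaining:
            break                        # older turns no longer fit
        kept.append(turn)
        remaining -= cost
    return list(reversed(kept)) + [new_message]


history = [
    "User: hi there",
    "Assistant: hello, how can I help?",
    "User: tell me about the sun",
]
new_message = "User: and the moon?"
print(build_context(history, new_message, budget=15))
```

With a small budget only the latest turn survives, which is one reason long sessions can lose earlier context.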

## Questions?
