# Getting off the Ground with LLMs

In this section, we will walk through how to access cutting-edge LLMs from your Python code.
We will cover the basics of commercial APIs and open-source models, and a bit about their relative capabilities.

## Commercial APIs

If you want to build a language model-driven solution but don't yet have a plan, commercial APIs are a good way to start.

This section shows you how to get started with the leading commercial APIs as of October 2023:
- [OpenAI](https://platform.openai.com/docs/api-reference)
- [Cohere](https://docs.cohere.com/reference/about)
- [Jurassic-2](https://docs.ai21.com/reference/python-sdk) from [AI21 Labs](https://www.ai21.com/)
- [Claude](https://github.com/anthropics/anthropic-sdk-python) from Anthropic

### Dependencies

In [1]:
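# Version pins keep the SDK interfaces used below (as of October 2023); torch and accelerate are needed to load the open-source models.
! pip install -qqq "openai<1" "cohere<5" "ai21<2" transformers torch accelerate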

In [4]:
### Notebook display
from IPython.display import display, Markdown

### Data processing
import pandas as pd

### Commercial LLM providers
import openai
import cohere
import ai21

### Open-source LLMs
from transformers import AutoModel, pipeline



### Set API Keys
 
Go to these links to find your tokens (after signing up):
- [OpenAI](https://platform.openai.com/account/api-keys)
- [Cohere](https://dashboard.cohere.com/api-keys)
- [AI21 Labs](https://studio.ai21.com/account/api-key)

In [2]:
openai_key = ...
cohere_key = ...
ai21_key = ...
# Note: Did not include Claude here, as SDK/API access is gated: https://docs.anthropic.com/claude/docs/getting-access-to-claude
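
Hardcoding keys in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the keys are exported as environment variables. The variable names `OPENAI_API_KEY`, `COHERE_API_KEY`, and `AI21_API_KEY` and the helper `load_key` are illustrative choices, not names this notebook requires:

```python
import os


def load_key(env_var: str) -> str:
    """Return the API key stored in `env_var`, or raise a clear error if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running this notebook.")
    return key


# Hypothetical variable names; use whatever names your shell or secrets manager exports.
KEY_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "cohere": "COHERE_API_KEY",
    "ai21": "AI21_API_KEY",
}
```

With this in place, a line such as `openai_key = load_key(KEY_ENV_VARS["openai"])` could stand in for a pasted key.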

### The common prompt

In [3]:
PROMPT = "How is generative AI affecting the infrastructure machine learning developers need access to?"

### Hello OpenAI API

In [4]:
openai.api_key = openai_key
gpt35_completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[{"role": "user", "content": PROMPT}]
)
gpt35_text_response = gpt35_completion.to_dict()['choices'][0]['message']['content'].strip()

### Hello Cohere API

In [5]:
co = cohere.Client(cohere_key)
cohere_cmd_completion = co.generate(prompt=PROMPT, model="command")
cohere_cmd_response = cohere_cmd_completion.data[0].text.strip()

### Hello AI21 Labs API

In [8]:
ai21.api_key = ai21_key
jurassic2_completion = ai21.Completion.execute(
    model="j2-mid", 
    prompt=PROMPT,
    maxTokens=250,
)
jurassic2_response = jurassic2_completion.completions[0]['data']['text']

In [9]:
from IPython.display import Markdown

Markdown(f"""

**OpenAI GPT3.5**: {gpt35_text_response}

{"="*160}

**Cohere Command**: {cohere_cmd_response}

{"="*160}

**AI21 Jurassic2**: {jurassic2_response}
""")



**OpenAI GPT3.5**: Generative AI is significantly impacting the infrastructure requirements for machine learning developers. Traditionally, machine learning development focused on supervised learning, where large amounts of labeled data were required. This necessitated the need for extensive compute resources and storage to handle the data processing and training tasks.

However, generative AI, which includes techniques like generative adversarial networks (GANs) and transformers, has revolutionized this paradigm. It enables the generation of new data samples, such as images, text, or even complete programs, without the need for extensive manual labeling. This has several implications for infrastructure:

1. Computing power: Training generative models can be computationally intensive and often requires powerful hardware, such as GPUs or even specialized hardware like TPUs (Tensor Processing Units). Developers need access to these resources to train and fine-tune generative models effectively.

2. Data storage and management: Generative models typically require large-scale datasets, which need to be stored and efficiently managed. Handling significant amounts of data necessitates scalable storage systems that provide fast access and retrieval speeds.

3. Data preprocessing: Preparing data for generative AI involves various preprocessing steps like cleaning, augmentation, and formatting. This requires efficient data processing frameworks that can handle these operations at scale.

4. Model sharing and collaboration: As the field advances, developers need platforms and infrastructure to share pretrained models, collaborate on model development, and perform model evaluations. This requires robust infrastructure to host and distribute the models and frameworks to organize collaboration effectively.

5. Real-time inference: Deploying generative models in real-time applications necessitates low-latency inference. This requires efficient model serving infrastructure that can handle high-throughput requests and respond quickly.

Overall, generative AI has increased the demand for infrastructure resources, including powerful compute, efficient data management, preprocessing tools, collaborative platforms, and real-time deployment capabilities. Machine learning developers need access to such resources to effectively experiment, train, and deploy generative models.

================================================================================================================================================================

**Cohere Command**: As generative AI grows in popularity, it is placing greater emphasis on the infrastructure that machine learning developers require. In particular, the increasing demand for high-performance computing resources like GPUs and TPUs is necessitating a rethinking of how these resources are delivered.

Currently, cloud providers such as Google, Microsoft, and Amazon are bolstering their infrastructure to better support generative AI. This includes upgrades to their hardware as well as the development of new services and products that are specifically tailored for generative AI workloads. For instance, Google has released the generative AI cloud service AI Hub, while Microsoft has created the Azure AI Platform. 

In addition to cloud providers, hardware manufacturers like NVIDIA and AMD are developing more specialized hardware for generative AI. This includes chips that are designed specifically for AI workloads, such as NVIDIA's A100 and AMD's Radeon Instinct.

As the demand for generative AI continues to rise, it is likely that we will see further developments in the infrastructure that machine learning developers need to access. This includes improvements to both hardware and software, as well as the creation of new tools and services that are specifically designed for generative AI.

================================================================================================================================================================

**AI21 Jurassic2**: 
Generative AI is a branch of machine learning that aims to create artificial models that can generate new synthetic data or patterns based on learned data. This technology has the potential to significantly impact the infrastructure that machine learning developers need access to, as it requires large amounts of data and complex computational resources. Here are some ways in which generative AI is affecting the infrastructure required by machine learning developers:

1. Enhanced Data Collection: Generative AI requires large amounts of data to learn from and create new patterns. As a result, there is a growing demand for data collection efforts and the ability to store and manage vast amounts of data. This requires robust data storage infrastructure, including databases and file systems that can handle the large volumes of data.

2. Increased Computing Power: Generative AI models are highly computationally intensive and require very powerful computing infrastructure, such as high-performance computing (HPC) systems or cloud clusters. These computing resources are used to train and fine-tune the models, as well as to generate new patterns and data. Machine learning developers need access to adequate computing infrastructure to train and run these models effectively.

3. Data Processing and Preprocessing: Generative AI models are complex and require preprocessing and manipulation of raw data. This includes tasks such as normalizing the data, handling missing values, and converting data types. Developers need infrastructure that can handle these preprocessing tasks quickly and efficiently, and that enables the integration of various data sources.

4. Model Training and Evaluation: Training and evaluating generative AI models is computationally intensive and requires significant resources. This includes access to computing infrastructure that can handle the iterative nature of model training, as well as the parallelization of computations to maximize performance.
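
Reading three long answers side by side is easier with a few summary numbers. A minimal sketch using only the standard library: `summarize_responses` is a hypothetical helper, and the short strings below are placeholders standing in for `gpt35_text_response`, `cohere_cmd_response`, and `jurassic2_response`. The "ends cleanly" flag is a crude heuristic for spotting answers that may have been cut off by a token limit such as `maxTokens`; it will miss truncations that happen to land on a sentence boundary.

```python
def summarize_responses(responses: dict[str, str]) -> list[dict]:
    """Return one row per model with a word count and a crude 'ends cleanly' flag."""
    rows = []
    for model, text in responses.items():
        stripped = text.strip()
        rows.append({
            "model": model,
            "words": len(stripped.split()),
            # Heuristic only: a reply that stops mid-sentence may have hit a token limit.
            "ends_cleanly": stripped.endswith((".", "!", "?")),
        })
    return rows


# Placeholder strings; in the notebook, pass the three response variables instead.
demo = {
    "OpenAI GPT3.5": "Generative AI raises compute and storage needs.",
    "AI21 Jurassic2": "Training these models requires access to",
}
for row in summarize_responses(demo):
    print(row)
```

The rows can be handed to `pd.DataFrame(rows)` for display, since `pandas` is already imported above.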




## Endpoint support across APIs

The table below gives a **rough** picture of which endpoints these APIs offered as of October 20, 2023, each reachable with little more effort than the calls you just saw.

> Note: You can coax each of these models into doing almost anything, which muddies any comparison. <br/>The point of this table is to highlight which of these APIs have documented endpoints for certain tasks.

<center>

| Endpoint / API | OpenAI | Cohere | Claude | AI21 |
| :---: | :---: | :---: | :---: | :---: |
| Prompt-to-response | ✅ | ✅ | ✅ | ✅ |
| Chat-to-response | ✅ | ✅ | ✅ | ✅ |
| Text embeddings | ✅ | ✅ | ❌ | ✅ |
| Fine-tuning | ✅ | ✅ | ❌ | ✅ |
| Language detection | ❌ | ✅ | ❌ | ❌ |
| Raw document processing | ✅ | ✅ | ❌ | ✅ | 
| Rerank / document relevance | ❌ | ✅ | ❌ | ✅ |
| Text/image to image | ✅ | ❌ | ❌ | ❌ |
| Audio-to-text | ✅ | ❌ | ❌ | ❌ |
| Moderations / toxicity | ✅ | ✅ | ❌ | ❌ |

</center>
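
The same table can be kept as data so that a script can ask which providers document a given endpoint. A minimal sketch that transcribes the table above into a dictionary; the names `ENDPOINT_SUPPORT` and `providers_for` are illustrative, and the contents reflect only this snapshot of the table:

```python
PROVIDERS = ("OpenAI", "Cohere", "Claude", "AI21")

# One row per endpoint, one flag per provider, in the same order as PROVIDERS.
ENDPOINT_SUPPORT = {
    "Prompt-to-response":          (True,  True,  True,  True),
    "Chat-to-response":            (True,  True,  True,  True),
    "Text embeddings":             (True,  True,  False, True),
    "Fine-tuning":                 (True,  True,  False, True),
    "Language detection":          (False, True,  False, False),
    "Raw document processing":     (True,  True,  False, True),
    "Rerank / document relevance": (False, True,  False, True),
    "Text/image to image":         (True,  False, False, False),
    "Audio-to-text":               (True,  False, False, False),
    "Moderations / toxicity":      (True,  True,  False, False),
}


def providers_for(endpoint: str) -> list[str]:
    """Return the providers marked as supporting `endpoint` in the table."""
    flags = ENDPOINT_SUPPORT[endpoint]
    return [name for name, supported in zip(PROVIDERS, flags) if supported]


print(providers_for("Rerank / document relevance"))  # ['Cohere', 'AI21']
```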

Some opinions related to this table:
- If you want to pay for the **best chat model** --> 
    - OpenAI's GPT-4 API is the current gold standard.
- If you want **multimodal** --> 
    - OpenAI APIs are great - [Images](https://platform.openai.com/docs/api-reference/images), [Audio](https://platform.openai.com/docs/api-reference/audio)
    - Stability AI has some nice products not listed here.
    - Check out this recent survey paper: [The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)](https://arxiv.org/abs/2309.17421)
- If you care a lot about **content moderation** --> 
    - [Cohere](https://docs.cohere.com/docs/content-moderation-with-classify) and [OpenAI](https://platform.openai.com/docs/api-reference/moderations) have the most API support
- If you want fine-grained **multilingual models** --> 
    - Try Cohere's [Multilingual Embedding](https://docs.cohere.com/docs/multilingual-language-models) APIs
- If you want a model that can **analyze grammar** --> 
    - Try AI21's [Text Improvements](https://docs.ai21.com/reference/text-improvements-api-ref) and [Grammatical Error Corrections](https://docs.ai21.com/reference/gec-api-ref) APIs
- The public Claude product is a personal favorite; however, its API access and feature support lag behind the others in this list.

Of course, you can also try meshing them together if you have the budget and engineering will!
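
Meshing providers together usually comes down to routing each task to a preferred API and falling back to another when a call fails. A minimal sketch of that idea: the `call_*` functions are stand-ins that return canned strings rather than real SDK calls, and the routing order is an illustrative assumption loosely based on the opinions above.

```python
from typing import Callable


def call_openai(prompt: str) -> str:
    """Stand-in for an OpenAI request; raises to simulate an outage."""
    raise ConnectionError("simulated outage")


def call_cohere(prompt: str) -> str:
    """Stand-in for a Cohere request; returns a canned string."""
    return f"[cohere] {prompt}"


# Hypothetical routing table: providers to try, in order, for each task.
ROUTES: dict[str, list[Callable[[str], str]]] = {
    "chat": [call_openai, call_cohere],
    "moderation": [call_cohere, call_openai],
}


def route(task: str, prompt: str) -> str:
    """Try each provider registered for `task` in order and return the first success."""
    errors = []
    for provider in ROUTES[task]:
        try:
            return provider(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{provider.__name__}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


print(route("chat", "hello"))  # falls back to the Cohere stand-in
```

Keeping the routing table separate from the call logic means a provider can be reordered or swapped without touching `route` itself.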

# Getting Started with Open-source Models

## Discussion
- Why would you want to use open-source LLMs?
- Will they ever really be competitive? 
    - What drives the competition if OpenAI's models are 10x bigger and performance keeps scaling with model size?

## Load Model

In this section we will see how to load a pre-trained model from the HuggingFace Hub. 
You can shop for models [here](https://huggingface.co/models).

Afterwards, you'll see how to use these models for text classification and text generation, which resembles the core mechanism the commercial APIs above use to generate text.

In [10]:
# T5 paper: https://arxiv.org/pdf/1910.10683.pdf (FLAN-T5, the instruction-tuned follow-up: https://arxiv.org/pdf/2210.11416.pdf)
model_name = "t5-small"
model = AutoModel.from_pretrained(model_name, device_map="auto")

In [11]:
print(model)

T5Model(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 512)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=512, out_features=512, bias=False)
              (k): Linear(in_features=512, out_features=512, bias=False)
              (v): Linear(in_features=512, out_features=512, bias=False)
              (o): Linear(in_features=512, out_features=512, bias=False)
              (relative_attention_bias): Embedding(32, 8)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=512, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=512, bias=False)
              (dropout): Dropout(p=0.1, inplace=
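
The shapes in the printout are enough to estimate where the parameters of `t5-small` live. A rough sketch that counts only the bias-free `Linear` weights and the shared embedding shown above. It ignores layer norms and the relative attention bias, and it makes no assumption about how many blocks the model stacks, so treat it as arithmetic on the printed shapes rather than an exact parameter count:

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weight count of a Linear layer with bias=False, as printed above."""
    return in_features * out_features


d_model, d_ff, vocab = 512, 2048, 32128  # values read off the printout

embedding = vocab * d_model                            # shared token embedding
self_attention = 4 * linear_params(d_model, d_model)   # q, k, v, o projections
feed_forward = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)  # wi, wo
encoder_block = self_attention + feed_forward

print(f"embedding:      {embedding:,}")
print(f"self-attention: {self_attention:,}")
print(f"feed-forward:   {feed_forward:,}")
print(f"encoder block:  {encoder_block:,}")
```

In the notebook itself, `sum(p.numel() for p in model.parameters())` gives the exact total for the loaded model.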

Want to learn more about transformers like the BERT and GPT family and how they work? 
- Sebastian Raschka recently gave his description of the history of the transformer in this concise [post](https://www.linkedin.com/posts/sebastianraschka_llms-largelanguagemodels-ai-activity-7121484400701186048--47t?utm_source=share&utm_medium=member_desktop).
- Check out the amazing [Bertviz](https://github.com/jessevig/bertviz) tool by [jessevig](https://github.com/jessevig/). You can see a pre-loaded demo [here](https://colab.research.google.com/drive/1hXIQ77A4TYS4y3UthWF-Ci7V7vVUoxmQ?usp=sharing#scrollTo=twSVFOM9SopW).

## HuggingFace Pipeline API

In the previous section we saw how to load a model. In this section we look at the easiest way to run inference with HuggingFace models, much like the earlier examples that used commercial APIs.

You will see how the [HuggingFace Pipeline API](https://huggingface.co/docs/transformers/v4.34.0/en/main_classes/pipelines) performs tasks including:
* [Text Classification](#text-classification)
* [Text Generation](#text-generation)
* many more tasks [here](https://huggingface.co/tasks)

## Text Generation
https://huggingface.co/tasks/text-generation

In [None]:
model_name = "bigscience/bloom-560m" # https://huggingface.co/bigscience/bloom-560m
generator = pipeline("text-generation", model=model_name, device_map="auto")

prompt = "The Generative AI World Summit is a" 
response = generator(prompt, do_sample=False, max_new_tokens=25)

In [None]:
Markdown(f"""
**Prompt**: {prompt}

**{model_name}'s continuation**: {response[0]['generated_text']}...
""")

## Text Classification
https://huggingface.co/tasks/text-classification

In [12]:
# More text classification models: https://huggingface.co/models?pipeline_tag=text-classification&sort=trending
model_name = "SamLowe/roberta-base-go_emotions" 

# Create a text classification pipeline using HuggingFace transformers pipeline.
classifier_pipe = pipeline("text-classification", model=model_name)

# Sample sentences to classify by emotion.
sentences = [
    "I am feeling inspired today. What a time to be alive!",
    "This talk is informative, but a bit high-level, where I can find more details?",
    "I wonder about all the hype around Generative AI, is it smoke and mirrors?",
    "Building production-grade machine learning systems is challenging."
]

# Run the pipeline!
classifier_pipe(sentences)

[{'label': 'excitement', 'score': 0.24082745611667633},
 {'label': 'admiration', 'score': 0.5622110366821289},
 {'label': 'curiosity', 'score': 0.5444050431251526},
 {'label': 'neutral', 'score': 0.5019526481628418}]
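
The pipeline returns one dictionary per input, in the same order as `sentences`, so results can be paired back with their text and filtered by confidence. A minimal sketch using the output printed above; the 0.5 threshold and the helper name `confident_predictions` are arbitrary choices for illustration:

```python
sentences = [
    "I am feeling inspired today. What a time to be alive!",
    "This talk is informative, but a bit high-level, where I can find more details?",
    "I wonder about all the hype around Generative AI, is it smoke and mirrors?",
    "Building production-grade machine learning systems is challenging.",
]

# Copied from the output above so the sketch runs without loading the model.
results = [
    {"label": "excitement", "score": 0.24082745611667633},
    {"label": "admiration", "score": 0.5622110366821289},
    {"label": "curiosity", "score": 0.5444050431251526},
    {"label": "neutral", "score": 0.5019526481628418},
]


def confident_predictions(texts, predictions, threshold=0.5):
    """Pair each text with its top label and keep only scores at or above `threshold`."""
    return [
        (text, pred["label"], round(pred["score"], 3))
        for text, pred in zip(texts, predictions)
        if pred["score"] >= threshold
    ]


for text, label, score in confident_predictions(sentences, results):
    print(f"{label:<11} {score:.3f}  {text}")
```

The first sentence drops out here because its top label scored only about 0.24, a hint that the model spread its probability across several emotions.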

# Summary

In this lesson, you've learned:
- how to programmatically query the leading commercial generative AI APIs
- which endpoints are supported by the leading generative AI APIs
- how to get started with replicating the core modeling loops of generative AI using open-source models

In the next lessons we will discuss methods for increasing the relevance of LLM responses, starting with basic prompt engineering, then moving to retrieval-augmented generation (RAG), and finally changing the model itself through fine-tuning and serving it behind an API you can control.