# Getting off the Ground with Commercial APIs

If you want to build a language-model-driven solution but don't yet have a plan, commercial APIs are a good place to start.

This section shows you how to get started with the leading commercial APIs as of October 2023:
- [OpenAI](https://platform.openai.com/docs/api-reference)
- [Cohere](https://docs.cohere.com/reference/about)
- [Jurassic-2](https://docs.ai21.com/reference/python-sdk) from [AI21 Labs](https://www.ai21.com/)
- [Claude](https://github.com/anthropics/anthropic-sdk-python) from Anthropic

### Dependencies

In [None]:
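# The OpenAI cell below uses the pre-1.0 `openai` interface (`openai.ChatCompletion`), so pin it.
# The cohere and ai21 clients have also changed since October 2023; pin older releases if those cells fail.
! pip install -qqq "openai<1" cohere ai21 transformers optimum accelerate pandas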

In [1]:
### Notebook display
from IPython.display import display, Markdown

### Data processing
import pandas as pd

### Commercial LLM providers
import openai
import cohere

### Open-source LLMs
from transformers import AutoModel, pipeline
from optimum.bettertransformer import BetterTransformer

### Set API Keys
 
Go to these links to find your tokens (after signing up):
- [OpenAI](https://platform.openai.com/account/api-keys)
- [Cohere](https://dashboard.cohere.com/api-keys)
- [AI21 Labs](https://studio.ai21.com/account/api-key)

In [2]:
openai_key = ...
cohere_key = ...
ai21_key = ...
# Note: Did not include Claude here, as SDK/API access is gated: https://docs.anthropic.com/claude/docs/getting-access-to-claude
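
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming you export the keys as environment variables before starting Jupyter. The variable names (`OPENAI_API_KEY`, `COHERE_API_KEY`) and the helper `read_key` are illustrative choices for this example, not something the cells in this notebook depend on.

```python
import os

def read_key(var_name, env=None):
    """Return the API key stored in `var_name`, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise KeyError(f"Set the {var_name} environment variable before running this notebook.")
    return key

# Stand-in mapping so this cell runs without real credentials (hypothetical values).
demo_env = {"OPENAI_API_KEY": "sk-demo", "COHERE_API_KEY": " co-demo "}
openai_demo_key = read_key("OPENAI_API_KEY", env=demo_env)
cohere_demo_key = read_key("COHERE_API_KEY", env=demo_env)
```

With real credentials you would call `read_key("OPENAI_API_KEY")` without the `env` argument and assign the result to `openai_key`.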

### The common prompt

In [3]:
PROMPT = "How is generative AI affecting the infrastructure machine learning developers need access to?"

### Hello OpenAI API

In [5]:
openai.api_key = openai_key
gpt35_completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[{"role": "user", "content": PROMPT}]
)
gpt35_text_response = gpt35_completion.to_dict()['choices'][0]['message']['content'].strip()
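
The indexing chain in the cell above follows the shape of the chat completion payload: a list of `choices`, each holding a `message` with a `content` string. Here is a sketch of the same extraction on a hand-written dictionary. `sample_completion` is a made-up stand-in, not real API output, and `first_message_text` is a hypothetical helper.

```python
def first_message_text(completion_dict):
    """Pull the assistant text out of a chat-completion-shaped dictionary."""
    choices = completion_dict.get("choices", [])
    if not choices:
        # No choices came back, so there is no text to return.
        return ""
    return choices[0]["message"]["content"].strip()

# Hand-written stand-in that mimics the payload shape used above.
sample_completion = {
    "choices": [
        {"message": {"role": "assistant", "content": "  Hello from a stand-in response.  "}}
    ]
}
sample_text = first_message_text(sample_completion)
```

Guarding against an empty `choices` list avoids an `IndexError` if a request returns no completions.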

### Hello Cohere API

In [6]:
co = cohere.Client(cohere_key)
cohere_cmd_completion = co.generate(prompt=PROMPT, model="command")
cohere_cmd_response = cohere_cmd_completion.data[0].text.strip()

### Hello AI21 Labs API

In [7]:
import ai21
ai21.api_key = ai21_key
jurassic2_completion = ai21.Completion.execute(
    model="j2-mid", 
    prompt=PROMPT,
    maxTokens=250,
)
jurassic2_response = jurassic2_completion.completions[0]['data']['text']

In [8]:
from IPython.display import Markdown

Markdown(f"""

**OpenAI GPT3.5**: {gpt35_text_response}

{"="*160}

**Cohere Command**: {cohere_cmd_response}

{"="*160}

**AI21 Jurassic2**: {jurassic2_response}
""")



**OpenAI GPT3.5**: Generative AI is significantly impacting the infrastructure required by machine learning developers. Here are a few key ways:

1. Increased computational demands: Generative AI models, such as deep generative models and variational autoencoders, are often computationally intensive. These models require access to high-performance GPUs or even specialized hardware like TPUs (Tensor Processing Units) to accelerate training or inference. As a result, developers need access to powerful hardware infrastructure to train and deploy generative AI models effectively.

2. Large-scale training data: Many generative AI models require a massive amount of training data to learn effectively. For instance, generative adversarial networks (GANs) need substantial datasets to capture the underlying distribution accurately. Machine learning developers need access to large, high-quality datasets, often stored in distributed file systems or cloud storage, and efficient data processing capabilities to train generative models.

3. Advanced model architectures: Generative AI has introduced complex architectures like GANs, transformers, and autoregressive models that require sophisticated infrastructure for their development and deployment. These models often involve intricate architectures with numerous layers and attention mechanisms. Developers need access to capable deep learning frameworks and libraries, such as TensorFlow or PyTorch, and compatible infrastructure that supports these advanced model architectures.

4. Efficient hyperparameter tuning: Generative AI models often have numerous hyperparameters, such as learning rates, batch sizes, or regularization terms. Tuning these hyperparameters is critical to achieving optimal performance. Machine learning developers require access to infrastructure that enables efficient hyperparameter optimization techniques like grid search, random search, or Bayesian optimization. This may involve distributed computing resources to parallelize hyperparameter search and training processes.

5. Real-time or interactive inference: Some generative AI applications, like style transfer or text-to-image synthesis, require real-time or interactive inference capabilities. Developers need infrastructure with low-latency processing and high throughput to serve these applications quickly. This may involve deploying models on cloud-based systems, edge devices, or specialized hardware like GPUs or FPGAs (Field-Programmable Gate Arrays) to ensure responsive and interactive generative AI experiences.

In summary, generative AI has driven the need for powerful computational resources, large-scale datasets, advanced model architectures, efficient hyperparameter tuning, and real-time inference capabilities—shaping the infrastructure requirements for machine learning developers.

================================================================================================================================================================

**Cohere Command**: As generative AI emerges as a powerful technology for creating new content and ideas, it is having a significant impact on the infrastructure and resources that machine learning (ML) developers need to create and train AI models. Here are some key ways in which generative AI is affecting the ML infrastructure:

Data storage and management: Generative AI models can produce a large volume of data, which requires developers to have sufficient data storage and management infrastructure. This can include cloud-based storage solutions, as well as data processing and analysis tools to handle the large volume of data generated.

Computational power: Training generative AI models requires significant computational power, including access to high-performance GPUs and other specialized hardware. This has led to the development of specialized cloud-based services and infrastructure for training AI models, such as Amazon Web Services (AWS) SageMaker and Google Cloud AI Platform.

Model development and training: The development and training of generative AI models requires developers to have access to advanced machine learning frameworks and tools, such as TensorFlow, PyTorch, and MXNet. These frameworks provide the necessary infrastructure for building and training complex AI models, as well as for deploying them into production.

Data preprocessing and cleaning: The quality of the data used to train generative AI models is critical to their performance. This has led to the development of new tools and techniques for data preprocessing and cleaning, including data augmentation and data normalization.

Security and privacy: As generative AI models become more powerful, they may also become more vulnerable to security and privacy threats. This has led to the development of new security and privacy measures, such as secure data storage and encryption, as well as new techniques for protecting the privacy of users.

Overall, the rise of generative AI is driving the development of new infrastructure and resources for machine learning developers, including advanced data storage and management solutions, high-performance computing resources, and advanced machine learning frameworks and tools.

================================================================================================================================================================

**AI21 Jurassic2**: 
Generative AI is impacting the infrastructure that machine learning developers need access to in several ways:

1. Faster Training: Generative models require large amounts of data to train, and the availability of powerful computing resources is crucial for training them efficiently. With advancements in hardware and the emergence of specialized hardware like GPUs and TPUs, it is now possible to train generative models on larger datasets in shorter timeframes.
2. Enhanced Storage: Generative models generate new data, which can significantly increase the demand for storage. As generative models become more prevalent in various applications, the need for specialized storage systems that can handle large datasets efficiently continues to grow.
3. Improved Networking: Training generative models can involve the transfer of large amounts of data between the model and the training infrastructure. Faster networking infrastructure that can efficiently handle large data transfers is necessary for facilitating the smooth training of generative models.
4. Security and Privacy Concerns: Generative models involve the processing of sensitive personal data, so the infrastructure and systems supporting them need to incorporate robust security and privacy measures. This includes encrypting data at rest and in transit, implementing access controls, and conducting regular security audits.
5. Cloud Adoption: Generative models are computationally intensive and often require specialized hardware, which can be challenging to implement on-premises. As a result, many organizations are moving to cloud-based infrastructure, which offers scalable and easily accessible computing resources. Cloud providers often provide preconfigured environments for deep learning that make it easier for developers to train generative models.
6. Collaboration and Sharing: Generative models often benefit from the collective efforts of


## Endpoint support across APIs

Here is a **rough** picture of the endpoints these APIs document as of October 20, 2023. Each one takes not much more effort to use than the calls you just saw.

> Note: You can make each of these models do almost anything, making this all muddy. <br/>The point of this table is to highlight which of these APIs have documented endpoints for certain tasks.

<center>

| Endpoint / API | OpenAI | Cohere | Claude | AI21 |
| :---: | :---: | :---: | :---: | :---: |
| Prompt-to-response | ✅ | ✅ | ✅ | ✅ |
| Chat-to-response | ✅ | ✅ | ✅ | ✅ |
| Text embeddings | ✅ | ✅ | ❌ | ✅ |
| Fine-tuning | ✅ | ✅ | ❌ | ✅ |
| Language detection | ❌ | ✅ | ❌ | ❌ |
| Raw document processing | ✅ | ✅ | ❌ | ✅ | 
| Rerank / document relevance | ❌ | ✅ | ❌ | ✅ |
| Text/image to image | ✅ | ❌ | ❌ | ❌ |
| Audio-to-text | ✅ | ❌ | ❌ | ❌ |
| Moderations / toxicity | ✅ | ✅ | ❌ | ❌ |

</center>
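
If you want to query the table programmatically, one option is to transcribe it into a dictionary. The sketch below copies the rows as they stood in October 2023. The names `PROVIDERS`, `ENDPOINT_SUPPORT`, `providers_with`, and `tasks_for` are made up for this example.

```python
PROVIDERS = ("OpenAI", "Cohere", "Claude", "AI21")

# One row per task, one flag per provider, in the same order as PROVIDERS.
ENDPOINT_SUPPORT = {
    "Prompt-to-response": (True, True, True, True),
    "Chat-to-response": (True, True, True, True),
    "Text embeddings": (True, True, False, True),
    "Fine-tuning": (True, True, False, True),
    "Language detection": (False, True, False, False),
    "Raw document processing": (True, True, False, True),
    "Rerank / document relevance": (False, True, False, True),
    "Text/image to image": (True, False, False, False),
    "Audio-to-text": (True, False, False, False),
    "Moderations / toxicity": (True, True, False, False),
}

def providers_with(task):
    """List the providers that document an endpoint for `task`."""
    flags = ENDPOINT_SUPPORT[task]
    return [name for name, supported in zip(PROVIDERS, flags) if supported]

def tasks_for(provider):
    """List the tasks for which `provider` documents an endpoint."""
    idx = PROVIDERS.index(provider)
    return [task for task, flags in ENDPOINT_SUPPORT.items() if flags[idx]]

rerank_providers = providers_with("Rerank / document relevance")
claude_tasks = tasks_for("Claude")
```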

Some opinions underlying this table:
- If you want to pay for the best chat model --> OpenAI's GPT-4 API is the gold standard
- If you want multimodal --> OpenAI's APIs are great. Stability AI has some nice products not listed here
- If you care a lot about content moderation --> Cohere and OpenAI have the most API support
- If you want fine-grained multilingual models --> Try Cohere's [Multilingual Embedding](https://docs.cohere.com/docs/multilingual-language-models) APIs
- If you want to build a model that can analyze grammar carefully --> Try AI21's [Text Improvements](https://docs.ai21.com/reference/text-improvements-api-ref) and [Grammatical Error Corrections](https://docs.ai21.com/reference/gec-api-ref) APIs
- The public Claude product is a personal favorite; however, its API access and feature support lag behind the others in this list

Of course, you can also try meshing them together if you have the budget and engineering will!
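
One way to mesh providers is to hide each one behind a function with the same signature and fan a single prompt out to all of them. This is a minimal sketch under that assumption. `ask_all` and the two stand-in providers are hypothetical and make no network calls. In practice each callable would wrap one of the client calls shown earlier.

```python
def ask_all(prompt, providers):
    """Send one prompt to every provider callable and record text or error per provider."""
    results = {}
    for name, ask in providers.items():
        try:
            results[name] = {"ok": True, "text": ask(prompt)}
        except Exception as err:
            # Keep going if a single provider fails, and record what went wrong.
            results[name] = {"ok": False, "text": f"{type(err).__name__}: {err}"}
    return results

def echo_provider(prompt):
    """Stand-in for a working provider: returns the prompt back."""
    return f"echo: {prompt}"

def failing_provider(prompt):
    """Stand-in for a provider that errors out."""
    raise TimeoutError("stand-in provider timed out")

mesh_results = ask_all("ping", {"echo": echo_provider, "flaky": failing_provider})
```

Catching errors per provider means one outage does not discard the answers you already paid for from the others.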

# Getting Started with Open-source Models

## Load Model

In this section we will see how to load a pre-trained model from the HuggingFace Hub. 
You can shop for models [here](https://huggingface.co/models).

After that, you'll see how to use these models for text classification and text generation. The latter is similar to the core mechanism the commercial APIs above use to generate text.

In [9]:
model_name = "roberta-base" 
model = AutoModel.from_pretrained(model_name, device_map="auto")

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
model

Want to learn more about transformers like the BERT and GPT families and how they work? Check out the amazing [Bertviz](https://github.com/jessevig/bertviz) tool by [jessevig](https://github.com/jessevig/). You can see a pre-loaded demo [here](https://colab.research.google.com/drive/1hXIQ77A4TYS4y3UthWF-Ci7V7vVUoxmQ?usp=sharing#scrollTo=twSVFOM9SopW).

## HuggingFace Pipeline API

The previous section showed how to load a model. This section shows the easiest way to use HuggingFace models for inference, much like the earlier examples with commercial APIs.

You will see how the [HuggingFace Pipeline API](https://huggingface.co/docs/transformers/v4.34.0/en/main_classes/pipelines) performs tasks including:
* [Text Classification](#text-classification)
* [Text Generation](#text-generation)
* [Text Mask Fill - Optimum](#optimum-for-faster-latency)

## Text Classification

This section uses a popular model for classifying the emotion expressed in short pieces of text.

In [None]:
# More text classification models: https://huggingface.co/models?pipeline_tag=text-classification&sort=trending
model_name = "SamLowe/roberta-base-go_emotions" 
classifier_pipe = pipeline("text-classification", model=model_name)

In [None]:
sentences = [
    "I am feeling inspired today.",
    "This talk is informative, but a bit high-level. Where can I find more details?",
    "I wonder about all the hype around Generative AI. Is it smoke and mirrors?",
    "Building production machine learning systems is challenging."
]

In [None]:
classifier_pipe(sentences)
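
When given a list of texts, a text-classification pipeline returns one dictionary per input with a `label` and a `score`. The sketch below pairs each text with its prediction and flags low-confidence results. The predictions are placeholder values written by hand to show the shape, not real model output, and `pair_predictions` and its `min_score` threshold are hypothetical.

```python
def pair_predictions(texts, predictions, min_score=0.5):
    """Pair each text with its predicted label, marking low-score predictions as uncertain."""
    rows = []
    for text, pred in zip(texts, predictions):
        label = pred["label"] if pred["score"] >= min_score else "uncertain"
        rows.append((text, label, round(pred["score"], 3)))
    return rows

# Placeholder inputs and predictions (hand-written, not produced by the model).
demo_texts = ["text A", "text B"]
demo_predictions = [
    {"label": "optimism", "score": 0.91},
    {"label": "confusion", "score": 0.32},
]
demo_rows = pair_predictions(demo_texts, demo_predictions)
```

With the real pipeline you would pass `sentences` and the result of `classifier_pipe(sentences)` instead of the placeholders.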

## Text Generation

In [None]:
model_name = "bigscience/bloom-560m" # https://huggingface.co/bigscience/bloom-560m
generator = pipeline("text-generation", model=model_name, device_map="auto")

In [None]:
prompt = "The Generative AI World Summit is a"
response = generator(prompt, do_sample=False, max_new_tokens=25)

In [None]:
Markdown(f"""
**Prompt**: {prompt}

**{model_name}'s continuation**: {response[0]['generated_text']}...
""")
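
Passing `do_sample=False` asks for deterministic decoding: with the default single beam, the generator picks the highest-scoring next token at every step. Here is a toy sketch of that greedy loop. The score table is invented for illustration, whereas a real model computes scores over its whole vocabulary with a neural network. `greedy_continue` is a hypothetical helper.

```python
# Hypothetical next-token scores keyed by the previous token.
TOY_NEXT_TOKEN_SCORES = {
    "the": {"summit": 0.6, "model": 0.4},
    "summit": {"is": 0.7, "was": 0.3},
    "is": {"a": 0.8, "the": 0.2},
    "a": {"conference": 0.9, "model": 0.1},
}

def greedy_continue(start_token, max_new_tokens):
    """Extend a sequence by repeatedly appending the highest-scoring next token."""
    tokens = [start_token]
    for _ in range(max_new_tokens):
        candidates = TOY_NEXT_TOKEN_SCORES.get(tokens[-1])
        if not candidates:
            # No known continuation for this token, so stop early.
            break
        tokens.append(max(candidates, key=candidates.get))
    return tokens

toy_tokens = greedy_continue("the", max_new_tokens=25)
short_tokens = greedy_continue("the", max_new_tokens=2)
```

Because nothing is sampled, running the loop twice on the same start token gives the same continuation, which is why the generation cell above is reproducible.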

## Optimum for Faster Latency

In [None]:
# Note: this import shadows the transformers `pipeline` imported at the top of the notebook.
from optimum.pipelines import pipeline

model_name = "distilbert-base-uncased"
prompt = "I am attending the Generative AI Summit and I am a practicing [MASK]."

unmasked_optimum_pipeline = pipeline(task="fill-mask", model=model_name, accelerator="bettertransformer")
response = unmasked_optimum_pipeline(prompt)

In [None]:
pd.set_option('display.max_colwidth', 0)
col_mapping = {"score": "Score", "token_str": "Token mask fill", "token": "Token ID", "sequence": "Full generated text"}
pd.DataFrame(response).rename(columns=col_mapping)
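
To check whether an accelerated pipeline is actually faster on your hardware, time it against the plain one. This is a minimal timing sketch using only the standard library. `median_latency_ms` is a hypothetical helper, and `stand_in_pipeline` is a trivial placeholder so the cell runs without downloading a model.

```python
import statistics
import time

def median_latency_ms(fn, *args, repeats=5, warmup=1):
    """Call fn(*args) a few times and return the median wall-clock latency in milliseconds."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def stand_in_pipeline(text):
    """Placeholder for a real pipeline call."""
    return text.upper()

latency_ms = median_latency_ms(stand_in_pipeline, "hello", repeats=3)
```

To compare real pipelines, you would pass each pipeline object and the same prompt, for example `median_latency_ms(unmasked_optimum_pipeline, prompt)`. The median is used because the first few calls and occasional stalls can skew a mean.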