<a href="https://colab.research.google.com/github/saad-a-li/Cohere/blob/main/Cohere_API_Workshop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Cohere API Workshop**
University of Waterloo Data Science Club | June 2023

This notebook was created to complement the LLM API slide deck. It explores the different Cohere API endpoints as well as LangChain.

**Before running the cells:**
1. Create a copy of this notebook by navigating to: File > Save a copy in Drive. This allows you to edit the notebook.

2. Create a Cohere account [here](https://dashboard.cohere.ai/welcome/register?__hstc=14363112.b9ea3a4df23480959007724c0be23db2.1684965394606.1685218260676.1686410973933.3&__hssc=14363112.1.1686410973933&__hsfp=2642313323).

3. Once you have logged in to the Cohere platform, navigate to SETTINGS > API Keys on the left panel. Then scroll down to Trial keys and press the copy icon next to your default key.

4. Paste your API key, in quotation marks, in place of the 'api_key' string two cells down. Now you are ready to run the code!

Note that there is also a way to use some of the functionality of Cohere's API without needing to use any code at all. To explore this press the PLAYGROUND button on the top right hand corner once you are logged in. As of right now, you can use the generate, classify, embed, and summarize endpoints. Once you have found a setup you like, you can press the View code button below the PLAYGROUND button and this will automatically create the code you need to get the job done.

## Basic Setup
There is a Python library for interfacing with the Cohere API, and it can be installed with pip. Once it is installed, we import the libraries we need and authenticate using the API key generated in the steps above.

In [None]:
! pip install cohere

In [None]:
import time
import cohere
from cohere.responses.classify import Example

api_key = 'api_key'  # Add your API key here instead of api_key
co = cohere.Client(api_key)
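
Hard-coding the key makes it easy to leak when a notebook is shared. The block below is a minimal sketch of an alternative, assuming the key is stored in an environment variable named `COHERE_API_KEY` (the same name set later in the LangChain section). The helper `load_api_key` and its parameters are hypothetical and not part of the Cohere library.

```python
import os
from getpass import getpass


def load_api_key(env=None, var_name="COHERE_API_KEY", ask=getpass):
    """Return the key from the environment, or prompt for it without echoing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        # Fall back to an interactive prompt that does not echo the key.
        key = ask(f"Enter your {var_name}: ").strip()
    if not key:
        raise ValueError(f"No API key provided for {var_name}")
    return key


# Demonstration with a stand-in mapping, so no real key is needed here.
demo_key = load_api_key(env={"COHERE_API_KEY": "demo-key"})
print(demo_key)
```

In the notebook, `api_key = load_api_key()` could then take the place of the hard-coded string.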

## Generate
Generates a text completion based on a prompt. This is the most basic endpoint, but it can be used for all sorts of interesting and creative purposes.

The only required argument is the prompt, as shown below.

In [None]:
response = co.generate(prompt='Generate a list of reasons why learning data science is good')

print(response)

[cohere.Generation {
	id: 55261d17-648d-4312-a52f-897c8e91c597
	prompt: Generate a list of reasons why learning data science is good
	text: 
1. Learning data science will help you understand how to use data to make decisions
2.
	likelihood: None
	token_likelihoods: None
}]


The response contains several additional pieces of information that make it hard to see the completion itself, so we can print just the text, as shown in the code block below.

In [None]:
print(response.generations[0].text) # To just see the text only


1. Learning data science will help you understand how to use data to make decisions
2.


Unfortunately, the default completion is short and cut off mid-sentence. Outputs are cut off like this when the max_tokens limit is reached. The default value is 20, but it can be changed. Below I set it to 200 for a longer output.

In [None]:
response = co.generate(
  prompt='Generate a list of reasons why learning data science is good',
  max_tokens=200)
print(response.generations[0].text)



- Data science is a high-paying field
- The skills learned in data science are transferable to other fields
- The demand for data scientists is high
- The work is rewarding
- The skills learned in data science are useful in a variety of industries
- The field is constantly changing, so there is always something new to learn
- The field is interdisciplinary, so you can apply your skills in a variety of ways
- The field is international, so you can work anywhere in the world
- The field is in demand, so there are many job opportunities


If you are interested in learning about how you can change other default parameters, use the [API reference on the Cohere documentation website](https://docs.cohere.com/reference/generate).

One of the most influential remaining parameters is temperature. The function below calls the generate endpoint with several defaults changed, including the model used and the number of outputs generated.

In [None]:
# Create a function to call the endpoint
def generate_text(prompt, temperature, num_gens=3):
  response = co.generate(
    model='command-nightly',
    prompt=prompt,
    max_tokens=100,
    temperature=temperature,
    num_generations = num_gens,
    return_likelihoods='GENERATION')
  return response

Now we can set a range of temperatures to explore and a prompt to use. As you can see from the outputs of the cell below, as temperature is increased the outputs become more creative and different from one another.

In [None]:
# Define the prompt
prompt='Generate a concise but catchy tagline for the University of Waterloo Data Science Club.'

# Define the range of temperature values (0.0 to 5.0 in steps of 1.0)
temperatures = [x / 10.0 for x in range(0, 60, 10)]

# Iterate generation over the range of temperature values
for temperature in temperatures:
  response = generate_text(prompt, temperature)
  print("-"*10)
  print(f'Temperature: {temperature}')
  print("-"*10)
  for i, generation in enumerate(response.generations):
    text = generation.text
    likelihood = generation.likelihood
    print(f'Generation #{i+1}')
    print(f'Text: {text}\n')
    print(f'Likelihood: {likelihood}\n')

  # Pause due to rate limiting
  time.sleep(60)

----------
Temperature: 0.0
----------
Generation #1
Text: 

Data Science Club | University of Waterloo

Likelihood: -0.3078624

Generation #2
Text: 

Data Science Club | University of Waterloo

Likelihood: -0.30933592

Generation #3
Text: 

Data Science Club | University of Waterloo

Likelihood: -0.3081709

----------
Temperature: 1.0
----------
Generation #1
Text: 

Data Science. Knowledge Discovery. Empowering tomorrow.

Likelihood: -1.8021718

Generation #2
Text: 

Data Science Club | University of Waterloo

Likelihood: -0.30933592

Generation #3
Text: 

Data Science Club | Data Science for All

Likelihood: -0.69838613

----------
Temperature: 2.0
----------
Generation #1
Text: 

Data Science Club | University of Waterloo

Likelihood: -0.3078624

Generation #2
Text: 

The University of Waterloo Data Science Club: Building a community of data science experts.

Likelihood: -0.5085161

Generation #3
Text: 

Data Science. intersects. Everything.

Likelihood: -1.19391

----------
Temper
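
To see why higher temperatures give more varied outputs, it can help to look at how temperature is commonly applied when sampling from a language model: the model's raw scores are divided by the temperature before being turned into probabilities. The block below is a sketch of that textbook formulation. The scores are made up, and the notebook does not state how Cohere implements sampling internally, so treat this as an illustration and not a description of the API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, rescaled by temperature."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: all mass on the top score.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.0]
for t in [0.0, 0.5, 1.0, 2.0, 5.0]:
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature grows, the probabilities move closer together, so less likely tokens are sampled more often. This is consistent with the identical generations at temperature 0.0 above.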

In the examples above, I've used this endpoint in a simple way. At some point, however, you may need more specific kinds of outputs to complete more challenging tasks. This is where prompt engineering comes into play. In the example below, the background is taken from an [article](https://www.therecord.com/news/waterloo-region/2015/04/21/wild-turkey-smashes-uw-window-after-canada-geese-attack.html) in The Record called "Wild turkey smashes UW window after Canada geese attack".

In [None]:
context = "Background on the recent Waterloo geese attack:\n WATERLOO — The Spawn of Satan, as deviously dubbed by his detractors,\
 was missing from the big bird's usual lunch-hour perch as rain began to pour on Tuesday. Neither the ghoulish goose, nor his less-notorious \
 posse, peered over the fat lip of the lower roof beside the University of Waterloo's Humanities Theatre. Maybe the Canada geese were nursing \
 a gaggle of guilty consciences. It had been a wild 24 hours in the Hagey Hall heart of a goose-friendly campus — with its creek, little lakes \
 and marshlands — that pushes stuffed geese and I-survived-nesting-season T-shirts at the school swag shop. It started with a few bangs above the \
 mostly enclosed courtyard. The wild turkey, who had inexplicably taken up residence in the area late last week appeared to have been harassed and \
 hemmed into the square by territorial goose tormentors, was trying to fly to freedom, school officials confirm. The bullied butterball made it \
 three stories high before crashing into an empty philosophy seminar room."

answer_start = "Poem inspired from the text above: Oh the waterloo geese,"

response = co.generate(
  prompt=context+answer_start,
  max_tokens=200)
print(response.generations[0].text)



Oh the waterloo geese,
A wild turkey in their sights,
They chased him from the yard,
And he tried to fly away,
But the turkey wasn't quick enough,
And he crashed into a room,
Where he lay in a heap,
Until the end of his days.
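
In the cell above, `context + answer_start` joins the two strings directly, so the last sentence of the article runs straight into the instruction with no whitespace between them. The block below sketches one way to assemble such a prompt with explicit separators. The helper `build_prompt` and its argument names are hypothetical, and the article text is replaced by a placeholder.

```python
def build_prompt(background, task, answer_start=""):
    """Join the parts of a prompt with blank lines so they do not run together."""
    parts = [background.strip(), task.strip()]
    prompt = "\n\n".join(part for part in parts if part)
    if answer_start:
        # Ending the prompt with the first words of the answer steers the completion.
        prompt += "\n" + answer_start
    return prompt


prompt = build_prompt(
    background="Background on the recent Waterloo geese attack:\n<article text here>",
    task="Poem inspired from the text above:",
    answer_start="Oh the waterloo geese,",
)
print(prompt)
```

The resulting string could be passed as the `prompt` argument to `co.generate` in the same way as before.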


## Classify
The second endpoint classifies pieces of text into labels that you define. For this to work, you need to give it some labelled examples.

Below, I create a few example reviews of our UWDSC workshops and label each as good or bad. I then pass a list of unlabelled reviews to the classify endpoint so that I can find out whether they are good or bad without reading them all.

In [None]:
example_lst = [Example("the uwdsc is the worst", "bad"), Example("the uwdsc is the best", "good"), Example("a waste of time", "bad"), Example("incredibly useful", "good")] # can also add examples from a file
input_lst = ["i hated the data science club workshops", "the workshop sucked", "the workshop was helpful", "the data science club workshops are the best thing ever", "it was okay", "would go again"]

response = co.classify(
  model='embed-english-v2.0',
  inputs=input_lst,
  examples=example_lst)

In [None]:
for r, text in zip(response.classifications, input_lst):  # text, not input, to avoid shadowing the built-in
  print('################')
  print(text)
  print(r.prediction, 'with confidence', round(r.confidence,3))

################
i hated the data science club workshops
bad with confidence 0.97
################
the workshop sucked
bad with confidence 0.96
################
the workshop was helpful
good with confidence 0.993
################
the data science club workshops are the best thing ever
good with confidence 0.996
################
it was okay
bad with confidence 0.897
################
would go again
good with confidence 0.903
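
The comment in the classify cell mentions that examples can also come from a file. The block below is a minimal sketch of that idea, assuming a CSV with `text` and `label` columns. The column names, the file name `reviews.csv`, and the helper `read_labelled_examples` are all hypothetical. An in-memory string stands in for the file so the block runs on its own.

```python
import csv
import io

# Stand-in for a file on disk; with a real file, use open("reviews.csv", newline="").
csv_text = """text,label
the uwdsc is the worst,bad
the uwdsc is the best,good
a waste of time,bad
incredibly useful,good
"""


def read_labelled_examples(handle):
    """Return (text, label) pairs from a CSV with 'text' and 'label' columns."""
    reader = csv.DictReader(handle)
    return [(row["text"].strip(), row["label"].strip()) for row in reader]


pairs = read_labelled_examples(io.StringIO(csv_text))
print(pairs)

# In the notebook, each pair could then be wrapped as in the classify cell:
# example_lst = [Example(text, label) for text, label in pairs]
```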


## Summarize
Summarizes a chunk of text. This example produces a very short summary, but you can make it longer by changing length from 'short' to 'medium' or 'long'.

Text in the example is copied from: https://www.britannica.com/biography/Taylor-Swift

In [None]:
response = co.summarize(
  text='Early life\nSwift showed an interest in music at an early age, and she progressed quickly from roles in children’s theatre to her first appearance before a crowd of thousands. She was age 11 when she sang “The Star-Spangled Banner” before a Philadelphia 76ers basketball game, and the following year she picked up the guitar and began to write songs. Taking her inspiration from country music artists such as Shania Twain and the Dixie Chicks, Swift crafted original material that reflected her experiences of tween alienation. When she was 13, Swift’s parents sold their farm in Pennsylvania to move to Hendersonville, Tennessee, so that she could devote more of her time to courting country labels in nearby Nashville.\n\n(Left) Luis Fonsi and Daddy Yankee (Ramon Luis Ayala Rodriguez) perform during the 2017 Billboard Latin Music Awards and Show at the Bank United Center, University of Miami, Miami, Florida on April 27, 2017. (music)\nBritannica Quiz\n2010s Music Quiz\nA development deal with RCA Records allowed Swift to make the acquaintance of recording-industry veterans, and in 2004, at age 14, she signed with Sony/ATV as a songwriter. At venues in the Nashville area, she performed many of the songs she had written, and it was at one such performance that she was noticed by record executive Scott Borchetta. Borchetta signed Swift to his fledgling Big Machine label, and her first single, “Tim McGraw” (inspired by and prominently referencing a song by Swift’s favourite country artist), was released in the summer of 2006.\n\nDebut album and Fearless\nTaylor Swift\nTaylor Swift\nThe song was an immediate success, spending eight months on the Billboard country singles chart. Now age 16, Swift followed with a self-titled debut album, and she went on tour, opening for Rascal Flatts. 
Taylor Swift was certified platinum in 2007, having sold more than one million copies in the United States, and Swift continued a rigorous touring schedule, opening for artists such as George Strait, Kenny Chesney, Tim McGraw, and Faith Hill. That November Swift received the Horizon Award for best new artist from the Country Music Association (CMA), capping the year in which she emerged as country music’s most-visible young star.\n\nOn Swift’s second album, Fearless (2008), she demonstrated a refined pop sensibility, managing to court the mainstream pop audience without losing sight of her country roots. With sales of more than half a million copies in its first week, Fearless opened at number one on the Billboard 200 chart. It ultimately spent more time atop that chart than any other album released that decade. Singles such as “You Belong with Me” and “Love Story” were popular in the digital market as well, the latter accounting for more than four million paid downloads.\n\nKanye West incident at the VMAs, Red, and 1989\nTaylor Swift\nTaylor Swift\nIn 2009 Swift embarked on her first tour as a headliner, playing to sold-out venues across North America. That year also saw Swift dominate the industry award circuit. Fearless was recognized as album of the year by the Academy of Country Music in April, and she topped the best female video category for “You Belong with Me” at the MTV Video Music Awards (VMAs) in September. During her VMA acceptance speech, Swift was interrupted by rapper Kanye West, who protested that the award should have gone to Beyoncé for what he called “one of the best videos of all time.” Later in the program, when Beyoncé was accepting the award for video of the year, she invited Swift onstage to conclude her speech, a move that drew a standing ovation for both performers. At the CMA Awards that November, Swift won all four categories in which she was nominated. 
Her recognition as CMA entertainer of the year made her the youngest-ever winner of that award, as well as the first female solo artist to win since 1999. She began 2010 with an impressive showing at the Grammy Awards, where she collected four honours, including best country song, best country album, and the top prize of album of the year.\n\n\n\nGet a Britannica Premium subscription and gain access to exclusive content.\nLater that year Swift made her feature-film debut in the romantic comedy Valentine’s Day and was named the new spokesperson for CoverGirl cosmetics. Although Swift avoided discussing her personal life in interviews, she was surprisingly frank in her music. Her third album, Speak Now (2010), was littered with allusions to romantic relationships with John Mayer, Joe Jonas of the Jonas Brothers, and Twilight series actor Taylor Lautner. Swift reclaimed the CMA entertainer of the year award in 2011, and the following year she won Grammys for best country solo performance and best country song for “Mean,” a single from Speak Now.\n\nTaylor Swift\nTaylor Swift\nTaylor Swift\nTaylor Swift\nSwift continued her acting career with a voice role in the animated Dr. Seuss’ The Lorax (2012) before releasing her next collection of songs, Red (2012). While she remained focused on the vagaries of young love, her songwriting reflected a deepened perspective on the subject, and much of the album embraced a bold pop-rock sound. In its first week on sale in the United States, Red sold 1.2 million copies—the highest one-week total in 10 years. In addition, its lead single, the gleeful “We Are Never Ever Getting Back Together,” gave Swift her first number-one hit on the Billboard pop singles chart.\n\nIn 2014 Swift released 1989, an album titled after the year of her birth and reportedly inspired by the music of that era. 
Although Swift had already been steadily moving away from the traditional country signifiers that marked her early work—“I Knew You Were Trouble,” the second single from Red, even flirted with electronic dance music—she called 1989 her first “official pop album.” On the strength of the upbeat “Shake It Off,” the album proved to be another blockbuster for Swift, with its first-week sales surpassing those of Red. It went on to sell more than five million copies in the United States and earned Swift her second Grammy for album of the year. In 2014 Swift also appeared in a supporting role in The Giver, a film adaptation of Lois Lowry’s dystopian novel for young readers.\n\nMichael Ray\nReputation, Lover, Folklore, Evermore, Midnights, and controversies\nTaylor Swift\nTaylor Swift\nIn 2016 Swift’s feud with Kanye West resumed after he released the single “Famous.” The song included a lyric in which Swift was referred to as a “bitch,” and she alleged that it was misogynistic. The public spat escalated after West’s wife, Kim Kardashian, released a recording of a phone call in which Swift gave her approval for the line, though West made no mention of calling her a bitch. Swift’s controversies continued as she took part in a widely publicized civil trial in August 2017, after former radio host David Mueller sued the singer, her mother, and a promoter, claiming that Swift had falsely accused him of sexually groping her in 2013 during the taking of a photograph and thus destroyed his career. She countersued, maintaining that the assault had taken place. At the trial, Swift was removed from Mueller’s suit and the other two defendants were found not liable as the jury found in favour of Swift’s countersuit. Shortly thereafter Swift released the hit song “Look What You Made Me Do,” and her album Reputation became the top-selling American LP of 2017.\n\nIn 2018 Swift left Big Machine and signed with Republic Records and Universal Music Group. 
The following year her former label, which owned the master recordings of her six albums, was sold to Scooter Braun, a talent manager whose clients had included Kanye West. Swift publicly spoke out against the deal, claiming that Borchetta had rejected her attempts to acquire the master tapes and that Braun had bullied her over the years. She subsequently tried to negotiate a deal with Braun, but he sold her back catalog to a private investment firm in 2020. Against this backdrop, Swift began rerecording her early material in an effort to gain control of it—the hope being that her remade songs and not the originals would be sought out for licensing deals—and in 2021 Fearless (Taylor’s Version) and Red (Taylor’s Version) appeared. They were remakes of earlier albums with several previously unreleased tracks.\n\nTaylor Swift\nTaylor Swift\nIn 2019 Swift released her seventh album, Lover, which she described as “a love letter to love itself.” That year she also appeared in the musical Cats, a film adaptation of Andrew Lloyd Webber’s hugely successful stage production. Miss Americana (2020) is a documentary about her life and career. With little advance notice, she released Folklore in 2020. A departure from her previous pop-inspired work, Swift’s eighth studio album drew praise for its introspection and restraint, and it won the Grammy for album of the year. The “sister record,” Evermore, appeared later in 2020. Swift adopted a synth-pop sound for the candid Midnights (2022), which she described as “the story of 13 sleepless nights scattered throughout my life.”',
  length='short',
  format='auto',
  model='summarize-xlarge',
  additional_command='',
  temperature=0.1,
)

print('Summary:', response.summary)

Summary: Taylor Swift is a popular American singer-songwriter who has won numerous awards for her work.


## Rerank
Returns the most relevant search results from a list of text chunks, ordered by their relevance to a query.


In [None]:
docs = ['Carson City is the capital city of the American state of Nevada.',
'The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.',
'Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.',
'Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states.']

response = co.rerank(
  model = 'rerank-english-v2.0',
  query = 'What is the capital of the United States?',
  documents = docs,
  top_n = 3,
)
print(response)

[RerankResult<document['text']: Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district., index: 2, relevance_score: 0.98005307>, RerankResult<document['text']: Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states., index: 3, relevance_score: 0.27904198>, RerankResult<document['text']: Carson City is the capital city of the American state of Nevada., index: 0, relevance_score: 0.10194652>]
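
The printed results carry an `index` into the original list and a `relevance_score`. The block below sketches how those two values could be used to keep only the strong matches. The (index, score) pairs are copied from the output above, the documents are shortened stand-ins, and both the helper `keep_relevant` and the 0.5 cut-off are hypothetical choices for illustration.

```python
# (index, relevance_score) pairs copied from the printed output above.
results = [(2, 0.98005307), (3, 0.27904198), (0, 0.10194652)]

# Shortened stand-ins for the four documents, in their original order.
docs = [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands ... Its capital is Saipan.",
    "Washington, D.C. ... is the capital of the United States.",
    "Capital punishment (the death penalty) has existed in the United States ...",
]


def keep_relevant(results, docs, min_score=0.5):
    """Map result indices back to documents and drop low-scoring matches."""
    return [(docs[i], score) for i, score in results if score >= min_score]


for doc, score in keep_relevant(results, docs):
    print(f"{score:.2f}  {doc}")
```

A cut-off like this is useful here because the second-ranked result is about capital punishment, which matches the word "capital" but does not answer the query.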


## Detect Language
This endpoint returns the language of each provided text.

In [None]:
text_lst = ["Hello world", "'Здравствуй, Мир'", "Hallo Welt", "Hej Verden", "Hei maailma", "Hej världen"]

response = co.detect_language(
  texts=text_lst
)

for language, text in zip(response.results, text_lst):
  print(language.language_name, ':', text)

English : Hello world
Russian : 'Здравствуй, Мир'
German : Hallo Welt
Polish : Hej Verden
Finnish : Hei maailma
Swedish : Hej världen


Feel free to play around with other languages. [Here](https://mixable.blog/hello-world-in-74-natural-languages/) you can find "Hello world!" written in 74 natural languages.

## LangChain
LangChain is a framework that helps you build applications powered by LLMs. It works with a long list of LLMs, including Cohere's models.

You can read more about all the utilities [here](https://python.langchain.com/en/latest/index.html) and see all sorts of other examples. Two really neat examples not shown here are being able to query documents and using chains.

Much of the code below was taken from this GitHub [repo](https://github.com/sophiamyang/tutorials-LangChain/tree/main).

In [None]:
! pip install langchain

In [None]:
import os
import langchain
from langchain.llms import Cohere
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory, CombinedMemory, ConversationSummaryMemory
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

In [None]:
os.environ["COHERE_API_KEY"] = api_key
cohere_llm = Cohere(model='command-xlarge')  # named cohere_llm so the imported cohere module is not shadowed

In [None]:
# Basic text generation
text = "How to be happy as a waterloo student in midterm season?"
print(cohere_llm(text))


As a Waterloo student in midterm season, you can try to be happy by managing your time effectively, getting enough sleep and exercise, and taking breaks when needed. You can also try to stay motivated by setting goals for yourself and rewarding yourself when you achieve them. Additionally, it can be helpful to surround yourself with supportive friends and family members who can help you feel less stressed.


In [None]:
# Chatbot
conv_memory = ConversationBufferWindowMemory(
    memory_key="chat_history_lines",
    input_key="input",
    k=1
)

summary_memory = ConversationSummaryMemory(llm=Cohere(), input_key="input")

memory = CombinedMemory(memories=[conv_memory, summary_memory])
_DEFAULT_TEMPLATE = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Summary of conversation:
{history}
Current conversation:
{chat_history_lines}
Human: {input}
AI:"""
PROMPT = PromptTemplate(
    input_variables=["history", "input", "chat_history_lines"], template=_DEFAULT_TEMPLATE
)
llm = Cohere(temperature=0)
conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=memory,
    prompt=PROMPT
)

In [None]:
conversation.run("Hey! What's up?")



Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Summary of conversation:

Current conversation:

Human: Hey! What's up?
AI:

> Finished chain.


"\nHey! I'm doing well, how about you?"

In [None]:
conversation.run("I'm doing well too, but a bit bored. Do you have any fun ideas for side projects I could work on?")



Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Summary of conversation:

The human and AI greeted each other.
Current conversation:
Human: Hey! What's up?
AI: 
Hey! I'm doing well, how about you?
Human: I'm doing well too, but a bit bored. Do you have any fun ideas for side projects I could work on?
AI:

> Finished chain.


'\nSure! There are a few things you could try. For one, you could try building a small game or app. This would give you something to work on and also be a fun project to show off to your friends. You could also try learning a new skill, like programming or design. This would be a great way to keep yourself busy and also learn something new.'
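
The second formatted prompt above shows the two memories at work: the summary holds a compressed history, while the window memory with `k=1` replays only the most recent exchange. The block below is a small stand-alone sketch of that windowing idea using a `deque`. It is not LangChain's implementation, the class `WindowMemory` is hypothetical, and the summary string is supplied by hand instead of being produced by a model.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent k human/AI exchanges."""

    def __init__(self, k=1):
        # A deque with maxlen silently drops the oldest exchange when full.
        self.turns = deque(maxlen=k)

    def save(self, human, ai):
        self.turns.append((human, ai))

    def render(self):
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


TEMPLATE = """Summary of conversation:
{history}
Current conversation:
{chat_history_lines}
Human: {input}
AI:"""

memory = WindowMemory(k=1)
memory.save("Hey! What's up?", "Hey! I'm doing well, how about you?")
memory.save("Any side project ideas?", "You could try building a small game or app.")

prompt = TEMPLATE.format(
    history="The human and AI greeted each other.",
    chat_history_lines=memory.render(),
    input="Which one should I start with?",
)
print(prompt)
```

With `k=1`, the greeting no longer appears under "Current conversation" and survives only through the summary line, which keeps the prompt short as the conversation grows.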

Thanks for following along with this tutorial. To learn more about LangChain specifically, see this [handbook](https://www.pinecone.io/learn/langchain-intro/). My personal recommendation is the Building Custom Tools for LLM Agents chapter, which shows how to give an LLM access to a simple calculator so that it can do math reliably.


In the workshop session, Luis shared a link to a LangChain lab on the Cohere website, which you can access [here](https://txt.cohere.com/search-cohere-langchain/). It walks you through building a multilingual semantic search application on top of Cohere's multilingual model.