<a href="https://colab.research.google.com/github/paradimes/cohere/blob/main/Cohere_Testing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Notes from learning and trying out the Cohere APIs, following along with https://txt.cohere.com/hello-world-p1/ and https://txt.cohere.com/hello-world-p2/.

# Setup

In [2]:
# Install libraries
! pip install cohere altair umap-learn > /dev/null

In [3]:
# Import libraries
import cohere
import pandas as pd
import numpy as np
import altair as alt
import textwrap as tr

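# Setup Cohere client
# Read the API key from the environment rather than hardcoding it in the notebook.
# COHERE_API_KEY is an assumed variable name; set it before running this cell.
import os
api_key = os.environ["COHERE_API_KEY"]
co = cohere.Client(api_key)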

# 1. Generating Text

In [None]:
# Generating Text

paragraphs = []

# Create a prompt. A prompt provides the context for the text we want the model to generate.
# Prompt engineering: shaping the output by working out how to optimally prompt the model for a particular task.

# base_prompt1 = """
# This program will generate the first paragraph of a blog post given a blog title.
# --
# Blog Title: Best Activities in Toronto
# First Paragraph: Looking for fun things to do in Toronto? When it comes to exploring Canada's
# largest city, there's an ever-evolving set of activities to choose from. Whether you're looking to
# visit a local museum or sample the city's varied cuisine, there is plenty to fill any itinerary. In
# this blog post, I'll share some of my favorite recommendations
# --
# Blog Title: Mastering Dynamic Programming
# First Paragraph: In this piece, we'll help you understand the fundamentals of dynamic programming,
# and when to apply this optimization technique. We'll break down bottom-up and top-down approaches to
# solve dynamic programming problems.
# --
# Blog Title:"""

base_prompt = """
This program will generate the table of contents of a given topic.
--
Topic: Software Engineering
Table of Contents:
1. Introduction to Software Engineering
   1.1 Evolution and Importance of Software Engineering
   1.2 Software Engineering Principles and Practices
   1.3 Software Development Life Cycle Models
   1.4 Software Engineering Ethics and Professionalism

2. Requirements Engineering
   2.1 Gathering and Eliciting Requirements
   2.2 Analyzing and Documenting Requirements
   2.3 Requirements Validation and Verification
   2.4 Requirements Management and Change Control

3. Software Design
   3.1 Principles of Software Design
   3.2 Architectural Design
   3.3 Detailed Design and Object-Oriented Design
   3.4 Design Patterns and Software Reusability

4. Software Construction
   4.1 Coding Practices and Standards
   4.2 Software Development Tools and Environments
   4.3 Testing and Debugging Techniques
   4.4 Software Configuration Management

5. Software Testing and Quality Assurance
   5.1 Software Testing Fundamentals
   5.2 Testing Techniques and Strategies
   5.3 Test Case Design and Execution
   5.4 Software Quality Assurance and Metrics

6. Software Maintenance and Evolution
   6.1 Software Maintenance Process
   6.2 Reverse Engineering and Reengineering
   6.3 Software Evolution and Versioning
   6.4 Legacy System Integration and Migration

7. Software Project Management
   7.1 Project Planning and Estimation
   7.2 Risk Management
   7.3 Agile and Scrum Methodologies
   7.4 Software Project Monitoring and Control

8. Software Engineering for Web Applications
   8.1 Web Development Technologies and Frameworks
   8.2 Client-Server Architecture and Web Services
   8.3 Web Security and Performance Optimization
   8.4 Mobile Application Development

9. Emerging Trends in Software Engineering
   9.1 Cloud Computing and Software as a Service
   9.2 DevOps and Continuous Integration/Delivery
   9.3 Artificial Intelligence and Machine Learning in Software Engineering
   9.4 Blockchain Technology and Smart Contracts

10. Software Engineering Ethics and Professionalism
   10.1 Ethical Issues in Software Engineering
   10.2 Professional Codes of Conduct and Responsibilities
   10.3 Intellectual Property and Legal Considerations
   10.4 Privacy and Data Protection in Software Engineering

Appendix A: Case Studies and Real-World Examples
Appendix B: Glossary of Software Engineering Terms
Appendix C: Recommended Reading and Resources
--
Topic: Evolution
Table of Contents:
1. Introduction to Evolution
   1.1 What is Evolution?
   1.2 The Historical Context of Evolutionary Theory
   1.3 The Scientific Method in Evolutionary Studies
   1.4 The Importance of Understanding Evolution

2. The Mechanisms of Evolution
   2.1 Natural Selection
   2.2 Genetic Drift
   2.3 Gene Flow
   2.4 Mutation
   2.5 Non-Random Mating

3. Evidence for Evolution
   3.1 Fossil Record and Transitional Forms
   3.2 Comparative Anatomy and Homologous Structures
   3.3 Embryology and Developmental Patterns
   3.4 Molecular Biology and DNA Sequencing
   3.5 Biogeography and Distribution of Species

4. Patterns of Evolution
   4.1 Divergent Evolution and Adaptive Radiation
   4.2 Convergent Evolution and Analogous Structures
   4.3 Coevolution and Mutualism
   4.4 Extinction and Mass Extinction Events
   4.5 Patterns of Speciation

5. The Evolution of Life on Earth
   5.1 The Origin of Life
   5.2 Early Evolutionary History
   5.3 Major Transitions in Evolution
   5.4 Human Evolution and Ancestry

6. Evolution and Ecology
   6.1 Evolutionary Adaptations and Ecological Niches
   6.2 Predator-Prey Interactions
   6.3 Coevolutionary Relationships
   6.4 Evolutionary Responses to Environmental Changes

7. Evolutionary Genetics
   7.1 Population Genetics
   7.2 Evolutionary Forces and Genetic Variation
   7.3 Molecular Evolution and Genetic Divergence
   7.4 Evolutionary Developmental Biology (Evo-Devo)

8. Evolutionary Perspectives in Science and Society
   8.1 Evolution and Medicine
   8.2 Evolutionary Psychology
   8.3 Evolution and Conservation Biology
   8.4 Evolutionary Ethics and Societal Implications

9. The Future of Evolutionary Studies
   9.1 Current Research and Frontiers in Evolutionary Biology
   9.2 The Role of Technology in Advancing Evolutionary Studies
   9.3 Ethical Considerations in Evolutionary Research

Appendix A: Glossary of Key Terms
Appendix B: Biographies of Notable Evolutionary Scientists
Appendix C: Recommended Reading and Resources
--
Topic:"""

# topics
topics = [
    "Artificial Intelligence in Healthcare",
    "Sustainable Energy Technologies",
    "Future of Space Exploration",
    # "Impact of Social Media on Society",
]

# Call the Generate endpoint with the model settings below and return the generated text.
def generate_text(base_prompt, current_prompt):
  response = co.generate(
      model='base',
      prompt=base_prompt + current_prompt,
      max_tokens=500,
      temperature=0.4,
      stop_sequences=["--"])
  generation = response.generations[0].text

  return generation


for topic in topics:
  current_prompt = " " + topic + "\n" + "Table of Contents:"
  para = generate_text(base_prompt, current_prompt)
  para = para.strip().replace("--","")
  paragraphs.append(para)


for topic, para in zip(topics, paragraphs):
  print(f"Topic: {topic}")
  print(f"Table of Contents: \n {para}")
  print("-"*100)


Topic: Artificial Intelligence in Healthcare
Table of Contents: 
 1. Introduction to Artificial Intelligence in Healthcare
   1.1 Artificial Intelligence and Healthcare
   1.2 Historical Development of AI in Healthcare
   1.3 AI in Healthcare Today
   1.4 AI in Healthcare of the Future

2. AI in Diagnosis and Treatment
   2.1 Diagnosis and Treatment with AI
   2.2 AI in Medical Imaging
   2.3 AI in Medical Robots
   2.4 AI in Medical Devices
   2.5 AI in Medical Informatics
   2.6 AI in Medical Decision Support Systems
   2.7 AI in Medical Research
   2.8 AI in Medical Education
   2.9 AI in Medical Care Delivery

3. AI in Disease Prevention and Health Promotion
   3.1 AI in Disease Prevention
   3.2 AI in Health Promotion

4. AI in Healthcare Administration
   4.1 AI in Healthcare Administration
   4.2 AI in Healthcare Financing
   4.3 AI in Healthcare Policy

5. AI in Healthcare Ethics
   5.1 AI in Healthcare Ethics
   5.2 AI in Healthcare Law

6. AI in Healthcare Education
   6.1 AI
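
The base prompt above follows a fixed pattern: a one-line task description, then worked examples separated by `--`, ending with an open `Topic:` line for the model to complete. Below is a minimal sketch of assembling such a prompt from a list of examples, so that examples can be swapped without editing one long string. `build_few_shot_prompt` and its arguments are hypothetical names for illustration, not part of the Cohere SDK.

```python
def build_few_shot_prompt(description, examples, separator="--"):
    """Assemble a few-shot prompt in the same layout as base_prompt.

    description: one-line statement of the task
    examples: list of (topic, table_of_contents) string pairs
    """
    # The leading "" mirrors the newline right after the opening triple quote
    parts = ["", description, separator]
    for topic, toc in examples:
        parts.append(f"Topic: {topic}")
        parts.append("Table of Contents:")
        parts.append(toc.strip())
        parts.append(separator)
    # Left open on purpose; the caller appends " <topic>\nTable of Contents:"
    parts.append("Topic:")
    return "\n".join(parts)


demo_prompt = build_few_shot_prompt(
    "This program will generate the table of contents of a given topic.",
    [("Evolution", "1. Introduction to Evolution\n   1.1 What is Evolution?")],
)
print(demo_prompt)
```

The result can be passed to `generate_text` in place of `base_prompt`, since it ends with the same open `Topic:` line.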

# 2. Classifying Text (Sentiment Analysis)

In [None]:
# Create the training examples for the classifier
from cohere.responses.classify import Example

examples = [Example("I’m so proud of you", "positive"),
            Example("What a great time to be alive", "positive"),
            Example("That’s awesome work", "positive"),
            Example("The service was amazing", "positive"),
            Example("I love my family", "positive"),
            Example("They don't care about me", "negative"),
            Example("I hate this place", "negative"),
            Example("The most ridiculous thing I've ever heard", "negative"),
            Example("I am really frustrated", "negative"),
            Example("This is so unfair", "negative"),
            Example("This made me think", "neutral"),
            Example("The good old days", "neutral"),
            Example("What's the difference", "neutral"),
            Example("You can't ignore this", "neutral"),
            Example("That's how I see it", "neutral")
            ]

In [None]:
# Enter the inputs to be classified
inputs = [
    "I absolutely loved the movie!",
    "The food at that restaurant was terrible.",
    "I'm feeling indifferent about the weather today.",
    "The customer service was excellent.",
    "The traffic jam this morning ruined my day.",
    "I have mixed feelings about the new software update.",
    "The performance of the team was outstanding.",
    "The hotel room was disappointing and unclean.",
    "I don't have any strong opinions about the book.",
    "The concert was a complete letdown.",
]

In [None]:
# A function that classifies a list of inputs given the examples
def classify_text(inputs, examples):
  response = co.classify(
    model='embed-english-v2.0',
    inputs=inputs,
    examples=examples)

  classifications = response.classifications

  return classifications

In [None]:
# Classify the inputs
predictions = classify_text(inputs,examples)
# print(f"Predictions: {predictions}")

# Display the classification outcomes
# classes = ["positive","negative","neutral"]
for inp,pred in zip(inputs,predictions):
  # print(pred)
  class_pred = pred.prediction
  # class_idx = classes.index(class_pred)
  class_conf = pred.confidence

  print(f"Input: {inp}")
  print(f"Prediction: {class_pred}")
  print(f"Confidence: {class_conf:.3f}")
  print("-"*100)

Input: I absolutely loved the movie!
Prediction: positive
Confidence: 0.927
----------------------------------------------------------------------------------------------------
Input: The food at that restaurant was terrible.
Prediction: negative
Confidence: 0.988
----------------------------------------------------------------------------------------------------
Input: I'm feeling indifferent about the weather today.
Prediction: neutral
Confidence: 0.739
----------------------------------------------------------------------------------------------------
Input: The customer service was excellent.
Prediction: positive
Confidence: 0.998
----------------------------------------------------------------------------------------------------
Input: The traffic jam this morning ruined my day.
Prediction: negative
Confidence: 0.943
----------------------------------------------------------------------------------------------------
Input: I have mixed feelings about the new software update.
Predi
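
Each classification printed above carries a `prediction` label and a `confidence` score. One way to use the two together is to tally the labels and flag low-confidence inputs for manual review. This is a rough sketch that uses stand-in objects instead of a live API call; the `summarize_predictions` helper and the 0.8 threshold are illustrative assumptions.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_predictions(inputs, predictions, threshold=0.8):
    """Count predicted labels and collect inputs whose confidence is below threshold."""
    counts = Counter(pred.prediction for pred in predictions)
    uncertain = [
        (inp, pred.prediction, pred.confidence)
        for inp, pred in zip(inputs, predictions)
        if pred.confidence < threshold
    ]
    return counts, uncertain


# Stand-ins mirroring three of the results printed above (not a live API response)
sample_inputs = [
    "I absolutely loved the movie!",
    "The food at that restaurant was terrible.",
    "I'm feeling indifferent about the weather today.",
]
sample_predictions = [
    SimpleNamespace(prediction="positive", confidence=0.927),
    SimpleNamespace(prediction="negative", confidence=0.988),
    SimpleNamespace(prediction="neutral", confidence=0.739),
]

label_counts, needs_review = summarize_predictions(sample_inputs, sample_predictions)
print(label_counts)
print(needs_review)
```

With these sample values, only the "indifferent about the weather" input falls below the threshold.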

# 3. Analyzing Text

In [9]:
# Get a list of text and add to a dataframe
df = pd.read_csv("https://github.com/cohere-ai/notebooks/raw/main/notebooks/data/hello-world-kw.csv", names=["search_term"])
df.head()

   search_term
0  how to print hello world in python
1  what is hello world
2  how do you write hello world in an alert box
3  how to print hello world in java
4  how to write hello world in eclipse


In [10]:
# Using Embed endpoint
def embed_text(texts):
  output = co.embed(
      model="embed-english-v2.0",
      texts=texts
  )
  embedding = output.embeddings
  return embedding

In [54]:
# Embeddings of all search terms
df["search_term_embeds"] = embed_text(df["search_term"].tolist())
# print(df["search_term_embeds"])
# print('\n')
embeds = np.array(df["search_term_embeds"].tolist())
print(embeds)


[[ 1.9628906   0.1315918   2.1230469  ... -0.29833984 -1.3574219
   2.3710938 ]
 [-0.04370117 -0.47460938  0.98339844 ... -0.40234375 -1.8457031
   2.9746094 ]
 [ 1.4082031   0.4038086  -0.01279449 ... -0.34277344 -1.5292969
   1.8701172 ]
 ...
 [ 2.5917969  -0.05551147  1.7724609  ... -0.20983887 -1.7636719
   2.2597656 ]
 [ 1.6191406  -0.56591797  1.4794922  ... -0.07458496 -1.9589844
   1.1845703 ]
 [ 1.8681641  -0.47998047  1.7529297  ... -1.5009766  -2.6347656
   1.5849609 ]]


## Semantic Search

In [66]:
# Embeddings of new_query
new_query= "What is the purpose of hello world?"
new_query_embeds = embed_text([new_query])[0]
# print(new_query_embeds)

[-0.10211182, -0.45166016, 2.3144531, -0.15966797, 0.04675293, -0.66796875, -2.1074219, 1.21875, 1.2900391, -0.04638672, 1.5068359, -1.7753906, 2.0664062, 0.68652344, 0.796875, 1.4580078, 2.078125, 0.008880615, -1.5166016, 0.13549805, 1.6289062, 0.7495117, 1.7109375, -2.5605469, 0.039367676, 2.1074219, 0.34570312, -1.5429688, 0.0078125, -1.0644531, 1.0615234, 1.1113281, 2.0546875, 0.8544922, 0.27978516, -0.19238281, -0.4411621, 1.8466797, 0.8330078, 0.071777344, -1.4921875, 1.4394531, -1.3759766, 0.40844727, -1.6787109, -0.4963379, -2.8203125, 0.11071777, 0.4597168, -1.3164062, 0.8666992, -1.5615234, -0.32592773, 0.98095703, 0.1385498, -0.23999023, -1.0917969, -2.6582031, 1.3964844, -0.7421875, -0.035888672, 1.8056641, -0.21801758, -1.0800781, 0.7216797, 1.0859375, -0.2043457, 0.20507812, -1.9902344, -1.1347656, 0.8046875, -1.2099609, 1.1728516, -2.640625, 1.2373047, -0.07208252, -1.2294922, 0.6503906, -1.28125, -0.296875, -1.5976562, -0.30615234, -1.4306641, -1.734375, -0.24060059, -0

In [34]:
# Cosine similarity

from sklearn.metrics.pairwise import cosine_similarity

def get_similarity(target, candidates):
  candidates = np.array(candidates)
  target = np.expand_dims(np.array(target),axis=0)

  sim = cosine_similarity(target, candidates)
  sim = np.squeeze(sim).tolist()

  sim = list(enumerate(sim))
  sim = sorted(sim, key=lambda x:x[1], reverse=True)

  return sim
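
`cosine_similarity` scores two vectors by the cosine of the angle between them: their dot product divided by the product of their lengths. A score of 1.0 means the vectors point in the same direction, and 0.0 means they are orthogonal. Here is a small NumPy-only sketch of the same calculation on toy 2-D vectors; `cosine_rank` is a hypothetical name and the vectors are made up for illustration.

```python
import numpy as np


def cosine_rank(target, candidates):
    """Return (index, score) pairs sorted by cosine similarity, highest first."""
    target = np.asarray(target, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    # dot(target, c) / (|target| * |c|) for every candidate row c
    scores = candidates @ target / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(target)
    )
    return sorted(enumerate(scores.tolist()), key=lambda pair: pair[1], reverse=True)


toy_ranking = cosine_rank([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
for idx, score in toy_ranking:
    print(f"candidate {idx}: {score:.2f}")
```

Candidate 0 ranks first even though it is twice as long as the target, because cosine similarity ignores magnitude and compares direction only.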

In [67]:
# Example: Similarity between existing and new queries. Displays top 5 matches.

similarity = get_similarity(new_query_embeds, embeds)

# Display the top 5 most similar search terms
print("New query:")
print(new_query,'\n')

print("Similar queries:")
for idx,score in similarity[:5]:
  print(f"Similarity: {score:.2f};", df.iloc[idx]["search_term"])

New query:
What is the purpose of hello world? 

Similar queries:
Similarity: 0.76; what is hello world
Similarity: 0.72; why hello world
Similarity: 0.65; how to do hello world
Similarity: 0.64; how did hello world originate
Similarity: 0.63; how to write hello world


## Semantic Exploration