# Welcome to the Practice Task

You have been hired by a 40-year-old news company called FF-NEWS. They have provided you with a list of their news headlines from March 2004. They are seeking 10 headlines from their newspaper specifically related to `climate change and global warming`.

As an AI Engineer, your task is to utilize the OpenAI text embedding model to identify 10 headlines about climate change and global warming from their archive.


----
Run the following block to set up the OpenAI API and import the necessary modules.

**Do not forget to upload your apikey.env file into the Google Colab environment.**

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import pandas as pd
import os
from openai import OpenAI
from dotenv import load_dotenv
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Loading API key and organization ID from a dotenv file
load_dotenv(dotenv_path='/content/drive/MyDrive/Colab Notebooks/Data Scenarios/#1: E-Commerce Product /.env')

# Retrieving API key and organization ID from environment variables
APIKEY = os.getenv("APIKEY")
ORGID = os.getenv("ORGID")

# Creating an instance of the OpenAI client with the provided API key and organization ID
client = OpenAI(
  organization= ORGID,
  api_key=APIKEY
)

client

<openai.OpenAI at 0x7fc05c3d2550>

Here is the list of news headlines:

In [5]:
# DATA
headlines = [
    "Political leaders convene for summit on climate change",
    "New breakthrough in renewable energy technology announced",
    "Global warming activists rally in major cities worldwide",
    "Sports teams unite to raise awareness about climate crisis",
    "Government announces new policy to combat climate change",
    "Tech giant unveils revolutionary AI-powered gadget",
    "Climate scientists warn of irreversible damage to the planet",
    "Political turmoil grips nation as elections approach",
    "Athletes advocate for environmental conservation efforts",
    "Tech startups compete for funding in Silicon Valley",
    "Rising temperatures threaten biodiversity, experts warn",
    "Sports championship marred by controversy over doping allegations",
    "New legislation aims to reduce carbon emissions",
    "Tech company announces plans for expansion into new markets",
    "Global warming effects felt in Arctic region, scientists say",
    "Political leaders clash over proposed climate change regulations",
    "Team achieves victory in sporting event despite odds",
    "Breakthrough in renewable energy research promises brighter future",
    "Climate summit ends with pledges for carbon neutrality",
    "Athlete breaks world record in thrilling sporting event",
    "Tech industry faces criticism over data privacy concerns",
    "Government launches initiative to promote green technology",
    "Climate change impacts highlighted in new scientific report",
    "Political scandal rocks nation's capital",
    "Sports star announces retirement after illustrious career",
    "Tech company introduces innovative solution to environmental challenges",
    "Global warming exacerbates natural disasters, experts warn",
    "Political candidates debate strategies to address climate crisis",
    "Sports league adopts sustainability measures to reduce carbon footprint",
    "Tech expo showcases latest advancements in artificial intelligence",
    "Climate activists stage protest outside government buildings",
    "Political leaders vow to prioritize climate change action",
    "Athlete overcomes injury to win gold medal",
    "Tech company partners with environmental organizations for conservation projects",
    "Global warming threatens food security, scientists caution",
    "Political campaign focuses on environmental policies",
    "Sports event canceled due to extreme weather conditions",
    "Tech startup revolutionizes transportation industry with electric vehicles",
    "Climate change summit attracts attention from around the world",
    "Political unrest leads to protests in major cities",
    "Sports team celebrates victory with championship parade",
    "Tech industry grapples with cybersecurity challenges",
    "Global warming impacts discussed at international conference",
    "Political leaders face scrutiny over handling of climate crisis",
    "Athlete honored with prestigious award for sportsmanship",
    "Tech conference explores the future of artificial intelligence",
    "Climate activists demand immediate action from world leaders",
    "Political debate intensifies ahead of election day",
    "Sports fans rally behind team in championship match",
    "Tech company accused of monopolistic practices",
    "Global warming solutions proposed at climate change forum",
    "Political summit focuses on diplomatic relations",
    "Athlete achieves personal best in sporting competition",
    "Tech industry leaders testify before congressional committee",
    "Climate change effects seen in rising sea levels, researchers find",
    "Political parties clash over environmental policies",
    "Sports league implements measures to promote diversity and inclusion",
    "Tech startup secures funding for groundbreaking project",
    "Global warming awareness campaign gains momentum",
    "Political leaders spar over economic policies",
    "Athlete inspires youth through community outreach programs",
    "Tech company launches new product to improve quality of life",
    "Climate change activists call for divestment from fossil fuels",
    "Political commentator discusses implications of recent events",
    "Sports tournament draws record-breaking viewership",
    "Tech industry grapples with ethical dilemmas of AI",
    "Global warming impact on wildlife habitats documented in new study",
    "Political rally attracts thousands of supporters",
    "Athlete makes comeback after overcoming adversity",
    "Tech conference showcases cutting-edge innovations",
    "Climate change legislation faces opposition in parliament",
    "Political upheaval leads to government reshuffle",
    "Sports team embarks on goodwill tour to promote peace",
    "Tech company releases annual sustainability report",
    "Global warming awareness raised through art and music festival",
    "Political campaign enters final stretch with heated debates",
    "Athlete named ambassador for youth sports program",
    "Tech industry leaders meet to discuss future trends",
    "Climate change protesters disrupt international summit",
    "Political scandal unfolds with leaked documents",
    "Sports star launches foundation to support underprivileged youth",
    "Tech startup awarded for innovation in environmental sustainability",
    "Global warming impact on agriculture highlighted in report",
    "Political leaders negotiate international trade agreements",
    "Athlete honored with induction into Hall of Fame",
    "Tech company invests in renewable energy research",
    "Climate change task force formed to address urgent issues",
    "Political tensions escalate in region, raising concerns",
    "Sports organization partners with charity for humanitarian efforts",
    "Tech expo features demonstrations of virtual reality technology",
    "Global warming debate intensifies with new scientific findings",
    "Political campaign focuses on grassroots activism",
    "Athlete advocates for gender equality in sports",
    "Tech industry pioneers explore potential of blockchain technology",
    "Climate change summit results in historic agreement",
    "Political leaders reach compromise on controversial legislation",
    "Sports team wins championship with dramatic final play",
    "Tech company launches initiative to bridge digital divide",
    "Global warming effects observed in changing weather patterns",
    "Political candidates engage voters in town hall meetings",
    "Athlete inspires next generation through mentorship program",
    "Tech conference showcases breakthroughs in quantum computing",
    "Climate change action plan receives bipartisan support",
    "Political movement gains momentum with widespread support",
    "Sports star honored with prestigious sportsmanship award",
    "Tech industry leaders advocate for diversity and inclusion initiatives",
    "Global warming impact on coastal communities examined in documentary",
    "Political summit addresses refugee crisis and humanitarian aid",
    "Athlete donates winnings to charity for children's education",
    "Tech startup disrupts industry with innovative business model",
    "Climate change awareness campaign launches on social media",
    "Political leaders engage in diplomatic talks to promote peace",
    "Sports league implements strict anti-doping measures",
    "Tech company pledges to reduce carbon footprint with sustainability initiatives",
    "Global warming research expedition uncovers new insights",
    "Political unrest sparks protests and civil unrest",
    "Athlete breaks barriers as first in their sport",
    "Tech industry leaders collaborate on open-source projects",
    "Climate change documentary wins prestigious film award",
    "Political candidates make final push in campaign rallies",
    "Sports team celebrates victory with parade through city streets",
    "Tech conference addresses cybersecurity threats and solutions",
    "Global warming impact on indigenous communities addressed at UN summit"
]

Use the `text-embedding-3-small` text embedding model to generate a 256-dimensional embedding vector for each headline.

In [7]:
# Create a text embedding vector for each headline.
# A text embedding model takes a piece of text and converts it into a vector of numerical values.
# These vectors capture the contextual meaning of the text.

response = client.embeddings.create(
    model = "text-embedding-3-small",
    input = headlines,
    dimensions = 256
)
response

CreateEmbeddingResponse(data=[Embedding(embedding=[-0.011477109044790268, -0.0821317508816719, 0.15208518505096436, 0.005672628525644541, -0.06218615546822548, 0.08395370841026306, -0.09517310559749603, 0.057151809334754944, 0.02670600451529026, -0.029199205338954926, 0.08107693493366241, 0.01631367765367031, -0.08572771400213242, -0.11833109706640244, -0.026873815804719925, 0.029271123930811882, -0.08486468344926834, 0.05964500829577446, 0.028983447700738907, 0.11037203669548035, -0.09953620284795761, -0.021527821198105812, 0.039363786578178406, 0.10701580345630646, -0.049192748963832855, 0.0705767348408699, -0.02900741994380951, -0.030613616108894348, 0.15122215449810028, -0.01639758236706257, 0.12753675878047943, -0.07316582649946213, -0.0010293439263477921, 0.006706467363983393, -0.02903139404952526, 0.00580747751519084, -0.04430224001407623, -0.09560462087392807, -0.0078032356686890125, 0.00146760162897408, 0.07206305861473083, -0.07211101055145264, -0.03732607513666153, 0.0428878

Extract the embedding vectors from the `response`.

In [8]:
vectors = [e.embedding for e in response.data]
vectors

[[-0.011477109044790268,
  -0.0821317508816719,
  0.15208518505096436,
  0.005672628525644541,
  -0.06218615546822548,
  0.08395370841026306,
  -0.09517310559749603,
  0.057151809334754944,
  0.02670600451529026,
  -0.029199205338954926,
  0.08107693493366241,
  0.01631367765367031,
  -0.08572771400213242,
  -0.11833109706640244,
  -0.026873815804719925,
  0.029271123930811882,
  -0.08486468344926834,
  0.05964500829577446,
  0.028983447700738907,
  0.11037203669548035,
  -0.09953620284795761,
  -0.021527821198105812,
  0.039363786578178406,
  0.10701580345630646,
  -0.049192748963832855,
  0.0705767348408699,
  -0.02900741994380951,
  -0.030613616108894348,
  0.15122215449810028,
  -0.01639758236706257,
  0.12753675878047943,
  -0.07316582649946213,
  -0.0010293439263477921,
  0.006706467363983393,
  -0.02903139404952526,
  0.00580747751519084,
  -0.04430224001407623,
  -0.09560462087392807,
  -0.0078032356686890125,
  0.00146760162897408,
  0.07206305861473083,
  -0.07211101055145264
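
Before moving on, it can help to confirm that the API returned one vector per headline and that each vector has the requested number of dimensions. The helper below is a hypothetical sketch: `check_embeddings` is not part of the notebook or of the OpenAI library, and the vectors used here are toy values rather than real embeddings.

```python
def check_embeddings(vectors, expected_count, expected_dims):
    """Return a list of problems found; an empty list means the shapes look right."""
    problems = []
    if len(vectors) != expected_count:
        problems.append(f"expected {expected_count} vectors, got {len(vectors)}")
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dims:
            problems.append(
                f"vector {i} has {len(vec)} dimensions, expected {expected_dims}"
            )
    return problems


# Toy stand-ins for `vectors` (illustration only, not real embeddings).
toy_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(check_embeddings(toy_vectors, expected_count=2, expected_dims=3))
```

In this notebook the equivalent call would be `check_embeddings(vectors, len(headlines), 256)`.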

Use the embedding model to generate a 256-dimensional embedding vector for the phrase `"global warming and climate change"`. Then extract the embedding vector from the OpenAI response and store it in a variable called `search_phrase_vector`.

In [9]:
response = client.embeddings.create(
    model = "text-embedding-3-small",
    input = ["global warming and climate change"],
    dimensions = 256
)
search_phrase_vector = [e.embedding for e in response.data]
search_phrase_vector

[[0.01729126274585724,
  -0.05900658667087555,
  0.14737065136432648,
  0.00675610825419426,
  -0.06464477628469467,
  0.023014511913061142,
  0.042335037142038345,
  -0.09356480836868286,
  -0.04026932269334793,
  -0.06809573620557785,
  0.022431250661611557,
  -0.005777930840849876,
  -0.1502869576215744,
  -0.02695152536034584,
  -0.14970369637012482,
  0.012831744737923145,
  -0.00894941296428442,
  0.10080696642398834,
  0.023634226992726326,
  0.02087588794529438,
  -0.08068446069955826,
  -0.07183833420276642,
  0.025080230087041855,
  0.15398094058036804,
  -0.1505785882472992,
  0.05103535205125809,
  0.01152548287063837,
  0.0519588477909565,
  0.14017710089683533,
  0.12880350649356842,
  0.04950429126620293,
  -0.06770689785480499,
  -0.03577335178852081,
  -0.0010882985079661012,
  0.06372127681970596,
  0.007558092474937439,
  -0.10236233472824097,
  0.10576468706130981,
  0.04367167875170708,
  -0.038155000656843185,
  -0.05020906403660774,
  -0.10537584871053696,
  0.05

Now create a dataframe containing the news headlines and their corresponding 256-dimensional vectors. Your dataframe should have two columns: `vectors` and `headlines`.

In [10]:
df = pd.DataFrame()
df["vectors"] = vectors
df["headlines"] = headlines
df

Unnamed: 0,vectors,headlines
0,"[-0.011477109044790268, -0.0821317508816719, 0...",Political leaders convene for summit on climat...
1,"[-0.019270075485110283, -0.03727833926677704, ...",New breakthrough in renewable energy technolog...
2,"[0.11269595474004745, 0.06312204897403717, 0.1...",Global warming activists rally in major cities...
3,"[-0.035984743386507034, -0.04121216759085655, ...",Sports teams unite to raise awareness about cl...
4,"[-0.01350752916187048, -0.014983558095991611, ...",Government announces new policy to combat clim...
...,...,...
118,"[0.026382433250546455, 0.030329806730151176, -...",Climate change documentary wins prestigious fi...
119,"[0.11015244573354721, -0.005524963606148958, 0...",Political candidates make final push in campai...
120,"[0.1331469565629959, -0.0553782656788826, -0.0...",Sports team celebrates victory with parade thr...
121,"[-0.009627350606024265, -0.009440184570848942,...",Tech conference addresses cybersecurity threat...


Use the `cosine similarity` measure to calculate the similarity between the `search_phrase_vector` and each of the embedding vectors of the headlines.

Store the scores in a new dataframe column called `similarity score`.


In [11]:
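# cosine_similarity expects 2-D inputs: [x] wraps one headline vector as a single row,
# and search_phrase_vector is already a one-row nested list, so [0][0] picks out the single score.
df["similarity score"] = df.vectors.apply(lambda x: cosine_similarity([x], search_phrase_vector)[0][0])
df.head()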

Unnamed: 0,vectors,headlines,similarity score
0,"[-0.011477109044790268, -0.0821317508816719, 0...",Political leaders convene for summit on climat...,0.402299
1,"[-0.019270075485110283, -0.03727833926677704, ...",New breakthrough in renewable energy technolog...,0.156692
2,"[0.11269595474004745, 0.06312204897403717, 0.1...",Global warming activists rally in major cities...,0.482658
3,"[-0.035984743386507034, -0.04121216759085655, ...",Sports teams unite to raise awareness about cl...,0.409744
4,"[-0.01350752916187048, -0.014983558095991611, ...",Government announces new policy to combat clim...,0.423715
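
The row-by-row `apply` call works, but the same scores can be computed for all headlines at once. The sketch below spells out the cosine similarity formula (dot product divided by the product of the vector lengths) with NumPy. The function name `cosine_scores` and the toy 2-dimensional vectors are assumptions made for illustration, and the function assumes no vector is all zeros.

```python
import numpy as np


def cosine_scores(matrix, query):
    """Cosine similarity between each row of `matrix` and a single `query` vector."""
    matrix = np.asarray(matrix, dtype=float)
    # ravel() accepts either a flat vector or a one-row nested list.
    query = np.asarray(query, dtype=float).ravel()
    dots = matrix @ query
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    return dots / norms


# Toy stand-ins for the headline vectors and the search vector.
toy_headline_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [[1.0, 0.0]]  # nested like search_phrase_vector
scores = cosine_scores(toy_headline_vectors, toy_query)
print(scores.round(3))  # values are approximately 1.0, 0.0 and 0.707
```

Applied to this notebook, something like `df["similarity score"] = cosine_scores(df.vectors.tolist(), search_phrase_vector)` should give the same column as the `apply` version.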


Sort the dataframe by the `similarity score` column in descending order and take the 10 headlines most similar to the search phrase.

In [13]:
# Highest scores first, then keep the top 10 headlines
search_results = list(df.sort_values(by="similarity score", ascending=False).head(10).headlines)

for r in search_results:
    print(r)

Global warming effects observed in changing weather patterns
Global warming impacts discussed at international conference
Global warming awareness campaign gains momentum
Climate change task force formed to address urgent issues
Climate change impacts highlighted in new scientific report
Climate change summit results in historic agreement
Global warming effects felt in Arctic region, scientists say
Global warming solutions proposed at climate change forum
Climate change summit attracts attention from around the world
Global warming threatens food security, scientists caution
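
As an alternative to sorting the whole dataframe, pandas offers `DataFrame.nlargest`, which returns the top rows by a column, already ordered from highest to lowest. The sketch below uses the first five headlines and the scores shown in the `df.head()` output above; `toy_df` is a stand-in name, not a variable from the notebook.

```python
import pandas as pd

# First five headlines with the similarity scores from the df.head() output.
toy_df = pd.DataFrame({
    "headlines": [
        "Political leaders convene for summit on climate change",
        "New breakthrough in renewable energy technology announced",
        "Global warming activists rally in major cities worldwide",
        "Sports teams unite to raise awareness about climate crisis",
        "Government announces new policy to combat climate change",
    ],
    "similarity score": [0.402299, 0.156692, 0.482658, 0.409744, 0.423715],
})

# Keep the two highest-scoring rows, highest first.
top_headlines = toy_df.nlargest(2, "similarity score")["headlines"].tolist()
for headline in top_headlines:
    print(headline)
```

On the full dataframe, `df.nlargest(10, "similarity score").headlines` would select the same ten rows as the sort-and-head approach, apart from possible differences in how ties are ordered.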
