## Setup 

This section is for setting up the python environment. 

It loads a few python libraries we'll be using.

You can just run it and move on! 

In [40]:
# Pandas and NumPy settings for data analysis
import pandas as pd
import numpy as np

# Always show all dataframe columns
pd.set_option('display.max_columns', None)

# Some libraries for better displaying in Jupyter
from IPython.display import display, HTML

# TQDM for progress bars
from tqdm.notebook import tqdm
tqdm.pandas()

In [41]:
import os
import openai
import dotenv
dotenv.load_dotenv()

openai.organization = None
openai.api_key = os.getenv("OPENAI_API_KEY")
# openai.Model.list() # see all openai models

## Load Data

In this section I show the ouptput of some data we grabbed from MediaCloud.

In [42]:
 # Load bill summaries for Maryland bills
stories = pd.read_csv('stories_df.csv')

# Display the stories
stories

Unnamed: 0,title,publication_date,capture_time,language,domain,url,original_capture_url,archive_playback_url,article_url,snippet
0,Tom Sandoval invoked George Floyd when talking...,2024-03-01,2024-03-03T13:50:42Z,en,latimes.com,https://www.latimes.com/entertainment-arts/tv/...,https://web.archive.org/web/20240303135042id_/...,https://web.archive.org/web/20240303135042/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Tom Sandoval invoked George Floyd when talking...
1,Water levels plunge at California's biggest re...,2024-03-01,2024-03-02T21:06:48Z,en,newsweek.com,https://www.newsweek.com/water-levels-plunge-c...,https://web.archive.org/web/20240302210648id_/...,https://web.archive.org/web/20240302210648/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Water levels have plunged by nearly 7 feet at ...
2,Why U.S. Troops Should Remain in Syria,2024-03-01,2024-03-02T22:53:20Z,en,newsweek.com,https://www.newsweek.com/why-us-troops-should-...,https://web.archive.org/web/20240302225320id_/...,https://web.archive.org/web/20240302225320/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Few people today recognize the name of Alois B...
3,Winter storm warning issued for 8 states as ex...,2024-03-01,2024-03-02T08:31:19Z,en,newsweek.com,https://www.newsweek.com/winter-storm-warning-...,https://web.archive.org/web/20240302083119id_/...,https://web.archive.org/web/20240302083119/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Winter storm warnings are in place for parts o...
4,Woman lets fiancé help with wedding invitation...,2024-03-01,2024-03-02T06:48:27Z,en,newsweek.com,https://www.newsweek.com/woman-fiance-help-wed...,https://web.archive.org/web/20240302064827id_/...,https://web.archive.org/web/20240302064827/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"In the world of wedding planning, there are bo..."
...,...,...,...,...,...,...,...,...,...,...
9080,We found the best deals on any version of Appl...,2024-02-26,2024-02-27T03:53:43Z,en,cbsnews.com,https://www.cbsnews.com/essentials/best-apple-...,https://web.archive.org/web/20240227035343id_/...,https://web.archive.org/web/20240227035343/htt...,https://wayback-api.archive.org/colsearch/v1/m...,We found the best deals on any version of Appl...
9081,Donald Trump accused of stealing music,2024-02-26,2024-02-27T01:46:28Z,en,newsweek.com,https://www.newsweek.com/donald-trump-accused-...,https://web.archive.org/web/20240227014628id_/...,https://web.archive.org/web/20240227014628/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Donald Trump has once again been accused of st...
9082,Better sleep is yours with these top mattresse...,2024-02-26,2024-02-27T04:58:13Z,en,cbsnews.com,https://www.cbsnews.com/essentials/best-mattre...,https://web.archive.org/web/20240227045813id_/...,https://web.archive.org/web/20240227045813/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Better sleep is yours with these top mattresse...
9083,The 5 best spring cleaning deals on robot vacu...,2024-02-26,2024-02-27T06:13:41Z,en,cbsnews.com,https://www.cbsnews.com/essentials/best-robot-...,https://web.archive.org/web/20240227061341id_/...,https://web.archive.org/web/20240227061341/htt...,https://wayback-api.archive.org/colsearch/v1/m...,The 5 best spring cleaning deals on robot vacu...


# Keywords

This section extracts some keywords from the bills using a library called `yake` (Yet Another Keyword Extractor)

It also uses `pandarallel` to parallelize this process and get it done faster!

In [43]:
from yake import KeywordExtractor
from pandarallel import pandarallel

# Define a function to extract ketwords from a text
kw_extractor = KeywordExtractor()
def get_keywords(text):
    keywords = kw_extractor.extract_keywords(text)
    return [x for x,y in keywords]

# Run the function in parallel to speed things up
pandarallel.initialize(progress_bar=True)
stories['keywords'] = stories['snippet'].parallel_apply(get_keywords)

# display the stories with the keywords attatched
stories

INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.


VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=1136), Label(value='0 / 1136'))), …

Unnamed: 0,title,publication_date,capture_time,language,domain,url,original_capture_url,archive_playback_url,article_url,snippet,keywords
0,Tom Sandoval invoked George Floyd when talking...,2024-03-01,2024-03-03T13:50:42Z,en,latimes.com,https://www.latimes.com/entertainment-arts/tv/...,https://web.archive.org/web/20240303135042id_/...,https://web.archive.org/web/20240303135042/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Tom Sandoval invoked George Floyd when talking...,"[Sandoval invoked George, invoked George Floyd..."
1,Water levels plunge at California's biggest re...,2024-03-01,2024-03-02T21:06:48Z,en,newsweek.com,https://www.newsweek.com/water-levels-plunge-c...,https://web.archive.org/web/20240302210648id_/...,https://web.archive.org/web/20240302210648/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Water levels have plunged by nearly 7 feet at ...,"[Lake Shasta, Lake Shasta water, Lake, Shasta,..."
2,Why U.S. Troops Should Remain in Syria,2024-03-01,2024-03-02T22:53:20Z,en,newsweek.com,https://www.newsweek.com/why-us-troops-should-...,https://web.archive.org/web/20240302225320id_/...,https://web.archive.org/web/20240302225320/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Few people today recognize the name of Alois B...,"[Alois Brunner, Syria, official Adolph Eichman..."
3,Winter storm warning issued for 8 states as ex...,2024-03-01,2024-03-02T08:31:19Z,en,newsweek.com,https://www.newsweek.com/winter-storm-warning-...,https://web.archive.org/web/20240302083119id_/...,https://web.archive.org/web/20240302083119/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Winter storm warnings are in place for parts o...,"[National Weather Service, snow, Sierra Nevada..."
4,Woman lets fiancé help with wedding invitation...,2024-03-01,2024-03-02T06:48:27Z,en,newsweek.com,https://www.newsweek.com/woman-fiance-help-wed...,https://web.archive.org/web/20240302064827id_/...,https://web.archive.org/web/20240302064827/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"In the world of wedding planning, there are bo...","[wedding planning, wedding, Brooke, Brooke Pie..."
...,...,...,...,...,...,...,...,...,...,...,...
9080,We found the best deals on any version of Appl...,2024-02-26,2024-02-27T03:53:43Z,en,cbsnews.com,https://www.cbsnews.com/essentials/best-apple-...,https://web.archive.org/web/20240227035343id_/...,https://web.archive.org/web/20240227035343/htt...,https://wayback-api.archive.org/colsearch/v1/m...,We found the best deals on any version of Appl...,"[Apple AirPods, Apple, Apple AirPods Pro, AirP..."
9081,Donald Trump accused of stealing music,2024-02-26,2024-02-27T01:46:28Z,en,newsweek.com,https://www.newsweek.com/donald-trump-accused-...,https://web.archive.org/web/20240227014628id_/...,https://web.archive.org/web/20240227014628/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Donald Trump has once again been accused of st...,"[Trump, White House leader, Donald Trump, News..."
9082,Better sleep is yours with these top mattresse...,2024-02-26,2024-02-27T04:58:13Z,en,cbsnews.com,https://www.cbsnews.com/essentials/best-mattre...,https://web.archive.org/web/20240227045813id_/...,https://web.archive.org/web/20240227045813/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Better sleep is yours with these top mattresse...,"[mattress, sleep, sleepers, mattresses, memory..."
9083,The 5 best spring cleaning deals on robot vacu...,2024-02-26,2024-02-27T06:13:41Z,en,cbsnews.com,https://www.cbsnews.com/essentials/best-robot-...,https://web.archive.org/web/20240227061341id_/...,https://web.archive.org/web/20240227061341/htt...,https://wayback-api.archive.org/colsearch/v1/m...,The 5 best spring cleaning deals on robot vacu...,"[robot vacuum, robot, vacuum, robot vacuum dea..."


## Remove documents that are too big

OpenAI models have a context window of ~8000 tokens. 

https://platform.openai.com/docs/guides/embeddings/embedding-models

So we'll remove any documents that are longer than that for now.

We can use `tiktoken` to [count the tokens](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb) in each document.

In [44]:
# Import modules
import tiktoken
from openai import OpenAI
client = OpenAI()

# Set embedding model parameters
embedding_model = "text-embedding-3-small" # this is the model we will use to make embeddings
embedding_encoding = "cl100k_base"  # this the encoding for text-embedding-ada-002
max_tokens = 8000  # the maximum for text-embedding-ada-002 is 8191

# Get the encoding for the specified model
encoding = tiktoken.get_encoding(embedding_encoding)

# Make a new column with the combined title and summary
stories["combined"] = (
    "Title: " + stories.title.str.strip() + "; Content: " + stories.snippet.str.strip()
)

# Make a new column with the number of tokens in the combined title and summary
stories["n_tokens"] = stories.combined.apply(lambda x: len(encoding.encode(x)))

# Sort by that column
stories = stories.sort_values(by='n_tokens', ascending=False)

# Display the stories
stories


Unnamed: 0,title,publication_date,capture_time,language,domain,url,original_capture_url,archive_playback_url,article_url,snippet,keywords,combined,n_tokens
1504,"Gemini’s Culture War, Kara Swisher Burns Us an...",2024-03-01,2024-03-04T03:17:00Z,en,nytimes.com,https://www.nytimes.com/2024/03/01/podcasts/ha...,https://web.archive.org/web/20240304031700id_/...,https://web.archive.org/web/20240304031700/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"transcript\nGemini’s Culture War, Kara Swisher...","[kevin roose, casey newton, Kara Swisher, Kevi...","Title: Gemini’s Culture War, Kara Swisher Burn...",23245
2287,A look at the politics behind Trump and Biden’...,2024-02-29,2024-03-02T05:16:54Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/29/us/bid...,https://web.archive.org/web/20240302051654id_/...,https://web.archive.org/web/20240302051654/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"In Dual Border Visits, Biden and Trump Try to ...","[President Biden, Biden, Eagle Pass, Trump, Bo...",Title: A look at the politics behind Trump and...,14287
5785,The Tip Sheet: How Biden Can Flip the Script A...,2024-02-27,2024-03-05T04:31:24Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/27/opinio...,https://web.archive.org/web/20240305043124id_/...,https://web.archive.org/web/20240305043124/htt...,https://wayback-api.archive.org/colsearch/v1/m...,The PointConversations and insights about the ...,"[Trump, Donald Trump, n’t, Biden, President Bi...",Title: The Tip Sheet: How Biden Can Flip the S...,13607
3275,49 Items That'll Get Name Brand Results At A L...,2024-02-29,2024-03-03T02:37:39Z,en,buzzfeed.com,https://www.buzzfeed.com/nataliebrown/products...,https://web.archive.org/web/20240303023739id_/...,https://web.archive.org/web/20240303023739/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"1. Neutrogena Norwegian Formula Hand Cream, wh...","[Promising reviews, Amazon, Promising, love, r...",Title: 49 Items That'll Get Name Brand Results...,13170
2354,“Because of Joe Biden’s policies — and the mor...,2024-02-29,2024-03-02T15:27:17Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/29/us/bid...,https://web.archive.org/web/20240302152717id_/...,https://web.archive.org/web/20240302152717/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"In Dual Border Visits, Biden and Trump Try to ...","[President Biden, Eagle Pass, Biden, Border, T...",Title: “Because of Joe Biden’s policies — and ...,13079
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3009,Lancaster Roasting Company,2024-02-29,2024-03-03T00:55:05Z,en,cbsnews.com,https://www.cbsnews.com/losangeles/video/lanca...,https://web.archive.org/web/20240303005505id_/...,https://web.archive.org/web/20240303005505/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Lancaster Roasting Company | People Making A D...,"[Lancaster Roasting Company, Roasting Company,...",Title: Lancaster Roasting Company; Content: La...,58
3101,Florida grandmother accused of kidnapping sent...,2024-02-29,2024-03-03T08:11:35Z,en,cbsnews.com,https://www.cbsnews.com/miami/video/florida-gr...,https://web.archive.org/web/20240303081135id_/...,https://web.archive.org/web/20240303081135/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Florida grandmother accused of kidnapping sent...,"[Florida grandmother accused, Florida grandmot...",Title: Florida grandmother accused of kidnappi...,58
5356,Lightning bolt flashes over downtown Pittsburgh,2024-02-28,2024-03-01T17:48:09Z,en,cbsnews.com,https://www.cbsnews.com/pittsburgh/video/light...,https://web.archive.org/web/20240301174809id_/...,https://web.archive.org/web/20240301174809/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Lightning bolt flashes over downtown Pittsburg...,"[downtown Pittsburgh, Pittsburgh, Tower Cam ca...",Title: Lightning bolt flashes over downtown Pi...,57
6656,Learning more about Pittsburgh's Citizen Scien...,2024-02-27,2024-02-29T05:29:51Z,en,cbsnews.com,https://www.cbsnews.com/pittsburgh/video/learn...,https://web.archive.org/web/20240229052951id_/...,https://web.archive.org/web/20240229052951/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Learning more about Pittsburgh's Citizen Scien...,"[Citizen Science Lab, Pittsburgh Citizen Scien...",Title: Learning more about Pittsburgh's Citize...,56


In [45]:
# Grab the rows where the text is too big for the context window of the mmodel (>8000 tokens)
too_long = stories.query("n_tokens > @max_tokens") 

# Print how many will be removed
print(f"Removing {len(too_long)} stories that are too long")

# Display the removed stories here in this cell so we can see what we're losing
display(too_long)  

# Remove the rows where the text is too big for the context window of the model
stories = stories.query("n_tokens <= @max_tokens")  

Removing 43 stories that are too long


Unnamed: 0,title,publication_date,capture_time,language,domain,url,original_capture_url,archive_playback_url,article_url,snippet,keywords,combined,n_tokens
1504,"Gemini’s Culture War, Kara Swisher Burns Us an...",2024-03-01,2024-03-04T03:17:00Z,en,nytimes.com,https://www.nytimes.com/2024/03/01/podcasts/ha...,https://web.archive.org/web/20240304031700id_/...,https://web.archive.org/web/20240304031700/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"transcript\nGemini’s Culture War, Kara Swisher...","[kevin roose, casey newton, Kara Swisher, Kevi...","Title: Gemini’s Culture War, Kara Swisher Burn...",23245
2287,A look at the politics behind Trump and Biden’...,2024-02-29,2024-03-02T05:16:54Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/29/us/bid...,https://web.archive.org/web/20240302051654id_/...,https://web.archive.org/web/20240302051654/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"In Dual Border Visits, Biden and Trump Try to ...","[President Biden, Biden, Eagle Pass, Trump, Bo...",Title: A look at the politics behind Trump and...,14287
5785,The Tip Sheet: How Biden Can Flip the Script A...,2024-02-27,2024-03-05T04:31:24Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/27/opinio...,https://web.archive.org/web/20240305043124id_/...,https://web.archive.org/web/20240305043124/htt...,https://wayback-api.archive.org/colsearch/v1/m...,The PointConversations and insights about the ...,"[Trump, Donald Trump, n’t, Biden, President Bi...",Title: The Tip Sheet: How Biden Can Flip the S...,13607
3275,49 Items That'll Get Name Brand Results At A L...,2024-02-29,2024-03-03T02:37:39Z,en,buzzfeed.com,https://www.buzzfeed.com/nataliebrown/products...,https://web.archive.org/web/20240303023739id_/...,https://web.archive.org/web/20240303023739/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"1. Neutrogena Norwegian Formula Hand Cream, wh...","[Promising reviews, Amazon, Promising, love, r...",Title: 49 Items That'll Get Name Brand Results...,13170
2354,“Because of Joe Biden’s policies — and the mor...,2024-02-29,2024-03-02T15:27:17Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/29/us/bid...,https://web.archive.org/web/20240302152717id_/...,https://web.archive.org/web/20240302152717/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"In Dual Border Visits, Biden and Trump Try to ...","[President Biden, Eagle Pass, Biden, Border, T...",Title: “Because of Joe Biden’s policies — and ...,13079
2832,“Now the United States is being overrun by the...,2024-02-29,2024-03-02T19:15:02Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/29/us/bid...,https://web.archive.org/web/20240302191502id_/...,https://web.archive.org/web/20240302191502/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"In Dual Border Visits, Biden and Trump Try to ...","[President Biden, Eagle Pass, Biden, Border, T...",Title: “Now the United States is being overrun...,12896
2838,"‘We built 571 miles of border wall, much more ...",2024-02-29,2024-03-02T20:12:38Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/29/us/bid...,https://web.archive.org/web/20240302201238id_/...,https://web.archive.org/web/20240302201238/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"In Dual Border Visits, Biden and Trump Try to ...","[President Biden, Eagle Pass, Biden, Border, T...","Title: ‘We built 571 miles of border wall, muc...",12889
3278,2024 Netflix TV Shows You Must Watch,2024-02-29,2024-03-01T08:42:10Z,en,buzzfeed.com,https://www.buzzfeed.com/mychalthompson/2024-n...,https://web.archive.org/web/20240301084210id_/...,https://web.archive.org/web/20240301084210/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Trending badgeTrendingTV and MoviesÂ·Posted 5 ...,"[Netflix revealed, Netflix, Courtesy Everett C...",Title: 2024 Netflix TV Shows You Must Watch; C...,12887
2327,An uptick of arrivals in Brownsville reinforce...,2024-02-29,2024-03-02T15:47:19Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/29/us/bid...,https://web.archive.org/web/20240302154719id_/...,https://web.archive.org/web/20240302154719/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"In Dual Border Visits, Biden and Trump Try to ...","[President Biden, Eagle Pass, Biden, Border, T...",Title: An uptick of arrivals in Brownsville re...,12885
2834,Visiting the U.S. border has become a potent f...,2024-02-29,2024-03-02T10:05:15Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/29/us/bid...,https://web.archive.org/web/20240302100515id_/...,https://web.archive.org/web/20240302100515/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"In Dual Border Visits, Biden and Trump Try to ...","[President Biden, Eagle Pass, Biden, Border, T...",Title: Visiting the U.S. border has become a p...,12885


## Embeddings

Now we take the "combined" column, which contains the 

In [46]:
from openai import OpenAI
client = OpenAI()

def get_embeddings(texts, model="text-embedding-3-small"):
    # Replace newlines in each text and ensure it's a list of texts
    texts = [text.replace("\n", " ") for text in texts]
    # OpenAI's embeddings.create can process multiple inputs as a list
    response = client.embeddings.create(input=texts, model=model)
    # Extract embeddings from the response
    embeddings = [item.embedding for item in response.data]
    return embeddings

# Function to process DataFrame in batches and return a list of embeddings
def process_in_batches(df, column_name, batch_size=10):
    # Break the DataFrame into batches of size `batch_size`
    batches = [df[column_name].iloc[i:i + batch_size] for i in range(0, len(df), batch_size)]
    # Process each batch and collect embeddings
    all_embeddings = []
    for batch in tqdm(batches, desc="Processing batches"):
        batch_embeddings = get_embeddings(batch.tolist())
        all_embeddings.extend(batch_embeddings)
    return all_embeddings

# Example usage
batch_size = 100  # Adjust based on your preference and rate limits
stories['embedding'] = process_in_batches(stories, 'combined', batch_size=batch_size)


Processing batches:   0%|          | 0/91 [00:00<?, ?it/s]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  stories['embedding'] = process_in_batches(stories, 'combined', batch_size=batch_size)


In [47]:
# drop combined column since those were only for the purposes of making the embeddings
stories = stories.drop(columns=['combined', 'n_tokens'])

## Dimensionality Reduction (t-SNE)

The embedding is a vector in 1536 dimensions. Viewing data in that many dimensions would break your brain. 🤯

Our brains can only handle 2 or 3 dimensions at a time. We'll use t-SNE to reduce the number of dimensions, flattening the multi-dimensional space into 2 dimensions.

> Here are some things to keep in mind about t-SNE if you use it in the future. You may have to tweak some parameters to fit the needs of your data.
>
> Blog Post: [How to Use t-SNE Effectively](https://distill.pub/2016/misread-tsne/)



In [48]:
# find the one where bill.embedding is nan
stories[stories.embedding.isna()]

Unnamed: 0,title,publication_date,capture_time,language,domain,url,original_capture_url,archive_playback_url,article_url,snippet,keywords,embedding


In [49]:
# remove where embedding is na 
stories = stories.dropna(subset=['embedding'])

In [50]:
from sklearn.manifold import TSNE
import numpy as np


# Convert to a list of lists of floats
matrix = np.array(stories.embedding.to_list())

# Create a t-SNE model and transform the data
tsne = TSNE(n_components=2, perplexity=30, random_state=42, init='random', learning_rate=400)
vis_dims = tsne.fit_transform(matrix)

# add to dataframe and write to csv
stories = stories\
    .assign(
        x = vis_dims[:,0], 
        y = vis_dims[:,1])


In [51]:
# Write the data to a CSV file
# stories.to_csv('../stories-with-embeddings.csv', index=False)
stories.drop(columns=['snippet', 'archive_playback_url', 'original_capture_url', 'embedding']).to_csv('../stories-no-snippet.csv', index=False)

# Display the bills
stories.head()

Unnamed: 0,title,publication_date,capture_time,language,domain,url,original_capture_url,archive_playback_url,article_url,snippet,keywords,embedding,x,y
2095,Uncovering the higher truth of Jay Shetty,2024-02-29,2024-03-01T06:09:03Z,en,theguardian.com,https://www.theguardian.com/lifeandstyle/ng-in...,https://web.archive.org/web/20240301060903id_/...,https://web.archive.org/web/20240301060903/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"On 20 August 2022, Jennifer Lopez and Ben Affl...","[Jay Shetty Certification, Shetty Certificatio...","[-0.006095091812312603, 0.0003404601593501866,...",-24.708408,-41.372944
5783,Biden will meet next month with the Teamsters ...,2024-02-27,2024-02-28T02:38:34Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/27/us/tru...,https://web.archive.org/web/20240228023834id_/...,https://web.archive.org/web/20240228023834/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Michigan Primary Live Updates: Biden Confronts...,"[Donald Trump, President Biden, Trump, Biden, ...","[0.016431955620646477, 0.012380499392747879, 0...",-72.246902,22.965057
4381,Three Powerful Lessons About Love,2024-02-28,2024-03-01T21:54:04Z,en,nytimes.com,https://www.nytimes.com/2024/02/28/podcasts/mo...,https://web.archive.org/web/20240301215404id_/...,https://web.archive.org/web/20240301215404/htt...,https://wayback-api.archive.org/colsearch/v1/m...,transcript\nThree Powerful Lessons About Love\...,"[Daniel Jones, Modern Love, anna martin, Love,...","[0.0265309140086174, 0.016592005267739296, -0....",-8.077727,-35.600754
4742,22 Creepy Real-Life Stories February 2024,2024-02-28,2024-03-01T09:53:11Z,en,buzzfeed.com,https://www.buzzfeed.com/angelicaamartinez/cre...,https://web.archive.org/web/20240301095311id_/...,https://web.archive.org/web/20240301095311/htt...,https://wayback-api.archive.org/colsearch/v1/m...,"Every month, I ask BuzzFeed readers like you t...","[back, room, night, looked, house, phone, âAno...","[0.007826545275747776, 4.250243728165515e-05, ...",0.78314,-31.468397
8644,"On the eve of the Michigan primary, a determin...",2024-02-26,2024-02-27T11:09:37Z,en,nytimes.com,https://www.nytimes.com/live/2024/02/26/us/tru...,https://web.archive.org/web/20240227110937id_/...,https://web.archive.org/web/20240227110937/htt...,https://wayback-api.archive.org/colsearch/v1/m...,Election Updates: Trump and Biden will both vi...,"[Trump, Donald Trump, President Biden, Biden, ...","[0.03690981864929199, -0.0016414514975622296, ...",-71.33371,23.542429
