In [11]:
import pandas as pd
import numpy as np
import os
from dotenv import load_dotenv


from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate
from langchain_community.vectorstores import FAISS, Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import DataFrameLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain import hub



from langchain_core.runnables import (
    ConfigurableField,
    RunnableBinding,
    RunnableLambda,
    RunnablePassthrough,
    RunnableParallel
)

pd.set_option('display.max_colwidth', 300)  # Show up to 300 characters per cell
pd.set_option('display.max_columns', 300)  # Show up to 300 columns
pd.set_option('display.max_rows', 20)  # Truncate displays longer than 20 rows

load_dotenv()
OPENAI_APIKEY = os.environ['OPENAI_APIKEY']

# Instantiate Models

In [12]:
embeddings_model = OpenAIEmbeddings(api_key=OPENAI_APIKEY, model='text-embedding-3-large', max_retries=100, chunk_size=16, show_progress_bar=False)


# Load Data

In [17]:
df_transcripts = pd.read_csv('../data/transcripts_all_2024-04-10_cleaned', usecols=['videoId', 'title', 'description', 'publishedAt', 'date', 'year', 'transcript'])
df_transcripts  

Unnamed: 0,videoId,title,description,publishedAt,date,year,transcript
0,r3PuCQ8CxTc,Benefits and Side Effects of the Pneumonia Vaccine,"Randomized controlled trials have found that pneumonia vaccines significantly reduce the risk of pneumococcal pneumonia in people 65 and older.\n\nIf you missed it, check out the previous video: Benefits and Side Effects of the Flu Vaccine (https://nutritionfacts.org/video/benefits-and-side-effe...",2024-04-10T11:59:52Z,2024-04-10,2024,"""Benefits and Side Effects of the Pneumonia Vaccine"" Vaccines are considered one of the greatest public health achievements of the last century, having eradicated smallpox, a scourge that killed hundreds of millions of people, and greatly reducing other major diseases, such as measles and polio...."
1,oa6UtySJKFE,Benefits and Side Effects of the Flu Vaccine,"Flu shots can prevent more than just the flu. Randomized placebo-controlled trials show that they can be extraordinary lifesavers.\n\nStay tuned for the next two videos, which examine the benefits and side effects of the pneumonia (https://nutritionfacts.org/video/benefits-and-side-effects-of-th...",2024-04-08T11:59:55Z,2024-04-08,2024,"In this 3-video series, I show the science behind the pros and cons of the flu vaccine, pneumonia vaccine, and shingles vaccine. ""Benefits and Side Effects of the Flu Vaccine"" Every year, influenza typically kills between 4,000 and 20,000 Americans, though the death toll for the 2017 to 2018 sea..."
2,wZLgy4dvM1Y,New Sponsorship: Big Broccoli,"Help keep us ad- and sponsorship-free by making a donation today: https://nutritionfacts.app.neoncrm.com/forms/2024-spread-the-health\n\nNutritionFacts.org is an online health and nutrition hub, and our work is focused in the digital arena where we aim to reach as many people as possible with po...",2024-04-07T15:59:56Z,2024-04-07,2024,"I’m Dr. Michael Greger and ever since I started prescribing my patients broccoli, they’ve seen incredible results… More energy. Better bowling. Better budgeting at the grocery store. More social media followers. Better dance moves. Ask your doctor if broccoli is right for you. Side effects inclu..."
3,Et0lozAIbI8,Friday Favorites: Removing Warts with Duct Tape,"Duct tape beat out cryotherapy (freezing) and ten other treatments for removal of warts in a randomized, controlled, head-to-head trial. Why isn’t it standard practice?\n\nThis includes the first and second videos in a three-part series on duct tape for wart removal. Check out the last one: Whic...",2024-04-05T11:59:52Z,2024-04-05,2024,"You can find home remedies for all sorts of ailments, but the science doesn’t always back them up. What does it have to say about using a common household product on warts? ""Removing Warts with Duct Tape"" When I was reviewing the science behind common over-the-counter remedies used in dermatolog..."
4,EoAaCdlK6CU,Podcast: Hot Flashes,Turns out that hot flashes are not inevitable. This episode features audio from:\nhttps://nutritionfacts.org/video/menopausal-hot-flashes-are-not-inevitable/\nhttps://nutritionfacts.org/video/soy-foods-for-menopause-hot-flash-symptoms/\nVisit the video pages for all sources and doctor's notes re...,2024-04-04T12:59:57Z,2024-04-04,2024,"I hear from lots of people every day who are concerned about how their diet is affecting their health. They need answers based on facts. In other words, in the peer-reviewed medical literature, and that is what I’m here for. Welcome to the Nutrition Facts Podcast. I’m your host, Dr. Michael Greg..."
...,...,...,...,...,...,...,...
2564,P_X3exQtuGA,The Healthiest Herbal Tea,New subscribers to our e-newsletter always receive a free gift. Get yours here: https://nutritionfacts.org/subscribe/\n\n\n \n\n\nDonate at \nDESCRIPTION: More than a dozen herbal teas were compared for their antioxidant activity. Have a question about this video? Leave it in the comment section...,2011-02-08T14:59:42Z,2011-02-08,2011,"""The Healthiest Herbal Tea"" Walking through the herbal tea aisle can be daunting. Thankfully, last year, this study was published, comparing the antioxidant activity of more than a dozen different types of herbal tea. They’re all good for you, so the healthiest one is probably the one you’ll..."
2565,ce-pvksbiwM,Update on Yerba Maté,New subscribers to our e-newsletter always receive a free gift. Get yours here: https://nutritionfacts.org/subscribe/\n\n\n \n\nDESCRIPTION: Why this herbal tea may increase cancer risk. Have a question about this video? Leave it in the comment section at http://nutritionfacts.org/videos/update-...,2011-02-08T14:57:18Z,2011-02-08,2011,"""Update on Yerba Maté"" And finally, what about yerba maté, a Latin American herbal tea that’s been linked to cancer? Should we continue to avoid it? Yes, an evaluation of the cytotoxic and genotoxic activity of yerba maté on human white blood cells found it caused a significant increase in th..."
2566,1Yb5MjU38ng,Update on MSG,New subscribers to our e-newsletter always receive a free gift. Get yours here: https://nutritionfacts.org/subscribe/\n\n\n \n\n\nDonate at \nDESCRIPTION: The reactions attributed to monosodium glutamate do not appear to hold up to scientific scrutiny. Have a question about this video? Leave it ...,2011-02-08T14:48:31Z,2011-02-08,2011,"""Update on MSG"" What about MSG? The scientific consensus has been that MSG is generally harmless. But what’s the latest? Monosodium glutamate allergy— menace or myth? Myth. Despite concerns raised in the 1960s, “decades of research have failed to demonstrate a clear and consistent relationship b..."
2567,lD2RzNJYGxQ,Update on Juice Plus+®,New subscribers to our e-newsletter always receive a free gift. Get yours here: https://nutritionfacts.org/subscribe/\n\n\n \n\n\nDonate at \nDESCRIPTION: More evidence linking Juice Plus+® supplements with liver inflammation. Have a question about this video? Leave it in the comment section at ...,2011-02-08T05:41:12Z,2011-02-08,2011,"""Update on Juice Plus+"" What about Juice Plus+ supplements (dubbed in the American Journal of Medicine as “toxicity plus”)? Here are indicators of liver inflammation before Juice Plus+, the huge spike, and then, coming down after stopping it."


# Vectorize and Persist ChromaDb

In [18]:
loader = DataFrameLoader(df_transcripts, 'transcript')
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
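
To make the effect of `chunk_size=1000` and `chunk_overlap=200` concrete, here is a minimal sketch of overlapping chunking. It is a plain fixed-width sliding window, not the separator-aware recursive algorithm that `RecursiveCharacterTextSplitter` applies, and `sliding_window_chunks` is a hypothetical helper written only for illustration.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-width chunks; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
    return chunks


# Synthetic 2,500-character string standing in for a transcript (assumption).
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])
```

Each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.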

In [19]:
# Embed the chunked documents and save the vector store to disk
db2 = Chroma.from_documents(splits, embeddings_model, persist_directory="./chroma_db")
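
Embedding the whole corpus is the slow, billable step, so it can be worth skipping when a persisted store already exists. The sketch below shows one way to do that; `get_or_build_store` is a hypothetical helper, and the check "directory exists and is non-empty" is an assumption about what counts as a usable store, not something Chroma guarantees.

```python
import os
import tempfile


def get_or_build_store(persist_directory, build, load):
    """Call load() if persist_directory already contains files, otherwise build()."""
    if os.path.isdir(persist_directory) and os.listdir(persist_directory):
        return load()
    return build()


# Demonstration with stand-in callables instead of real vector stores.
with tempfile.TemporaryDirectory() as tmp:
    first = get_or_build_store(tmp, build=lambda: "built", load=lambda: "loaded")
    # Simulate a persisted store by writing a file into the directory.
    with open(os.path.join(tmp, "index.bin"), "w") as fh:
        fh.write("x")
    second = get_or_build_store(tmp, build=lambda: "built", load=lambda: "loaded")

print(first, second)

# Hypothetical usage with the objects defined in this notebook:
# db = get_or_build_store(
#     "./chroma_db",
#     build=lambda: Chroma.from_documents(splits, embeddings_model, persist_directory="./chroma_db"),
#     load=lambda: Chroma(persist_directory="./chroma_db", embedding_function=embeddings_model),
# )
```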



In [23]:
# Test it out: retrieve the 5 documents most similar to the query
results = db2.similarity_search("heart disease", k=5)
results

[Document(page_content='"Eliminating the #1 Cause of Death" The #1 cause of death in the United  States, however, is heart disease. Late last year, a landmark review was  published on the cause of our #1 killer, by Dr. William Clifford Roberts. First of all, who is this guy? The head of Baylor\'s  Cardiovascular Institute. he\'s authored a mere  1,387 scientific publications, written more than a dozen  textbooks on cardiology, and has been the Editor in Chief of the American Journal  of Cardiology for 25 years. Well then, what is the  cause of atherosclerosis? Well, first of all, doesn\'t  he mean causes, though? I mean, there are lots of things  that increase a risk of heart disease -- hypertension, diabetes, obesity,  inactivity, cigarette smoking. None of that matters, he says,  unless we have high cholesterol. All those things can speed the  buildup of plaque in our arteries, but if our cholesterol  level is low enough, there\'s nothing our body  can build the plaque with. Accordin

In [31]:
# Load from disk
db3 = Chroma(persist_directory="./chroma_db", embedding_function=embeddings_model)
results = db3.similarity_search("heart disease", k=5)
results

[Document(page_content='"Eliminating the #1 Cause of Death" The #1 cause of death in the United  States, however, is heart disease. Late last year, a landmark review was  published on the cause of our #1 killer, by Dr. William Clifford Roberts. First of all, who is this guy? The head of Baylor\'s  Cardiovascular Institute. he\'s authored a mere  1,387 scientific publications, written more than a dozen  textbooks on cardiology, and has been the Editor in Chief of the American Journal  of Cardiology for 25 years. Well then, what is the  cause of atherosclerosis? Well, first of all, doesn\'t  he mean causes, though? I mean, there are lots of things  that increase a risk of heart disease -- hypertension, diabetes, obesity,  inactivity, cigarette smoking. None of that matters, he says,  unless we have high cholesterol. All those things can speed the  buildup of plaque in our arteries, but if our cholesterol  level is low enough, there\'s nothing our body  can build the plaque with. Accordin

It works! The store reloaded from disk returns the same top result as the one built in memory.
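
For intuition about what a call like `similarity_search("heart disease", k=5)` does, here is a minimal brute-force sketch of top-k retrieval. The three-dimensional vectors are invented toy values, cosine similarity is used as an assumption, and `cosine_similarity` and `top_k` are hypothetical helpers. Chroma itself uses an approximate nearest-neighbour index, and its distance metric depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return the names of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy embeddings (made up for illustration; real ones have thousands of dimensions).
toy_vectors = {
    "heart disease": [0.9, 0.1, 0.0],
    "cholesterol": [0.8, 0.3, 0.1],
    "herbal tea": [0.0, 0.2, 0.9],
}
ranked = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(ranked)
```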