# Lila Sciences Sandbox
This notebook is a sandbox for testing retrieval-augmented generation (RAG) prompts related to a Lila Sciences job posting.

## Initial Setup

In [1]:
# Load environment variables
from dotenv import load_dotenv
import os
load_dotenv(r"C:\Users\rward\Documents\Pycharm\CoverLetterAI\api_keys.env")
os.environ["LANGSMITH_PROJECT"] = 'lila-sciences-sandbox'
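A quick sanity check right after loading the `.env` file can catch a missing or misnamed key before the first model call fails. The key names below are assumptions: `COHERE_API_KEY` is the variable `langchain_cohere` reads by default, and `LANGSMITH_API_KEY` only matters if LangSmith tracing is enabled. Adjust the list to whatever `api_keys.env` actually defines.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Assumed key names; edit to match the variables in api_keys.env
REQUIRED_KEYS = ["COHERE_API_KEY", "LANGSMITH_API_KEY"]

missing = missing_env_vars(REQUIRED_KEYS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```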

In [8]:
from langchain.chat_models import init_chat_model
from langchain_cohere import CohereEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

llm = init_chat_model("command-r-plus", model_provider="cohere")
embeddings = CohereEmbeddings(model="embed-english-v3.0")

In [9]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
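`chunk_size` and `chunk_overlap` are easiest to picture with a fixed-width window. The sketch below is a simplification, not what `RecursiveCharacterTextSplitter` does internally (it first tries to break on separators such as blank lines, newlines, and spaces); it only shows how a 200-character overlap makes neighbouring chunks share text, and how `add_start_index` corresponds to each chunk's offset in the source. `window_chunks` is a hypothetical helper.

```python
from typing import List, Tuple


def window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[Tuple[int, str]]:
    """Split `text` into fixed-width windows, returning (start_index, chunk) pairs.

    Each window starts `chunk_size - chunk_overlap` characters after the previous one,
    so neighbouring chunks share `chunk_overlap` characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "x" * 2500
for start, chunk in window_chunks(sample):
    print(start, len(chunk))
```

With the notebook's settings, a 2,500-character text gives windows starting at 0, 800 and 1600, the last one holding the remaining 900 characters.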

### Load Job Posting

In [10]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep the job title and job description elements from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("job__title", "job__description body"))
loader = WebBaseLoader(
    web_paths=("https://job-boards.greenhouse.io/lila/jobs/7769327002",),
    bs_kwargs={"parse_only": bs4_strainer},
)
posting_doc = loader.load()
posting_vector_store = InMemoryVectorStore(embeddings)
posting_ids = posting_vector_store.add_documents(documents=text_splitter.split_documents(posting_doc))

### Load Info about Lila Sciences

In [11]:
loader = WebBaseLoader(
    web_paths=(
        "https://www.lila.ai/about",
        "https://www.bostonglobe.com/2025/01/14/business/ai-moderna-cofounder-noubar-afeyan-flagship/?event=event12",
        "https://endpts.com/pharma-is-losing-the-ai-talent-war-against-startups-analysis/"
    )
)

lila_docs = loader.load()
lila_vector_store = InMemoryVectorStore(embeddings)
lila_ids = lila_vector_store.add_documents(documents=text_splitter.split_documents(lila_docs))
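`similarity_search` embeds the query and returns the stored chunks whose embeddings are closest to it. My understanding is that `InMemoryVectorStore` scores with cosine similarity and returns the top `k=4` by default; the sketch below reproduces that kind of ranking over tiny hand-made vectors to show what "closest" means. The vectors and chunk ids are made up, standing in for real `CohereEmbeddings` output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], store: Dict[str, Sequence[float]], k: int = 4) -> List[Tuple[str, float]]:
    """Return the k stored ids most similar to `query`, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-d "embeddings" with hypothetical ids; real ones come from CohereEmbeddings
store = {
    "about-mission": [0.9, 0.1, 0.0],
    "news-funding": [0.1, 0.9, 0.1],
    "talent-war": [0.0, 0.3, 0.9],
}
query = [0.8, 0.2, 0.1]
print(top_k(query, store, k=2))
```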

## Define/Test RAG Prompts

### Get info about Lila Sciences

In [15]:
from langchain_core.prompts import PromptTemplate

prompt_1_text = """
You are a helpful assistant. I need to know more about {company}. Please summarize the following context, focusing on the main priorities of the company and what sets them apart from competitors.

Context:
{context_info}
"""

prompt_1 = PromptTemplate.from_template(prompt_1_text)
prompt_1_messages = prompt_1.invoke({"company": "Lila Sciences", "context_info": '\n\n'.join(d.page_content for d in lila_docs)}).to_messages()
prompt_1_response = llm.invoke(prompt_1_messages)
print(prompt_1_response.content)

Lila Sciences, a Flagship Pioneering company, is focused on building scientific superintelligence to address humanity's most pressing issues. They believe that the key to achieving this lies in reinventing the scientific method and creating a laboratory where AI can explore beyond existing knowledge. 

The company emphasizes the importance of scaling scientific experimentation, mirroring the progress seen in AI with larger models unlocking new abilities. By combining AI, human efforts, and an understanding of natural processes, Lila Sciences aims to accelerate scientific progress and enable transformative discoveries in health, sustainability, and other fields. 

What sets them apart is their ambition to create a new form of intelligence, Scientific Superintelligence, where AI designs and conducts thousands of parallel experiments, pushing the boundaries of human understanding and accelerating solutions to global challenges. This approach, referred to as "polyintelligence" by Noubar Af

It may work better to split that into two prompts: one for the company's main priorities and one for what sets it apart from competitors.
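The first prompt stuffs the full text of every loaded page into `{context_info}`, which can grow large because `WebBaseLoader` without a strainer keeps all of each page's visible text. One way to keep that bounded is to cap the stuffed context at a character budget. The helper `stuff_context` and the 12,000-character default are hypothetical choices for illustration, not a limit taken from Cohere's documentation.

```python
from typing import Iterable


def stuff_context(texts: Iterable[str], max_chars: int = 12_000, sep: str = "\n\n") -> str:
    """Join texts with `sep`, never letting the result exceed `max_chars` characters.

    A text that does not fully fit is truncated to the remaining budget, and later texts are dropped.
    """
    parts = []
    used = 0
    for text in texts:
        extra = len(sep) if parts else 0
        remaining = max_chars - used - extra
        if remaining <= 0:
            break
        piece = text[:remaining]
        parts.append(piece)
        used += extra + len(piece)
        if len(piece) < len(text):
            break  # budget exhausted partway through this text
    return sep.join(parts)


print(stuff_context(["first page", "second page", "third page"], max_chars=25))
```

In the `prompt_1` cell this would take the place of the plain join, e.g. `stuff_context(d.page_content for d in lila_docs)`.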

In [28]:
company_info_prompt = PromptTemplate.from_template("""
You are a helpful assistant. I need to know more about {company}. Please summarize the following context, focusing on {focus}. Respond with a list of keywords and short phrases.

Context:
{context_info}
""")
company_name = 'Lila Sciences'

In [25]:
from pydantic import BaseModel, Field
from typing import List

class Priorities(BaseModel):
    """The company's main priorities."""
    priorities: List[str] = Field(..., description="The main priorities of the company.")

priority_focus = f"the main priorities of {company_name} and what problems they are trying to solve"
priority_docs = lila_vector_store.similarity_search(priority_focus)
# Join the retrieved chunks' text so the prompt gets plain context rather than Document reprs
priority_messages = company_info_prompt.invoke({"company": company_name, "focus": priority_focus, "context_info": '\n\n'.join(d.page_content for d in priority_docs)})
structured_llm = llm.with_structured_output(Priorities)
priority_response: Priorities = structured_llm.invoke(priority_messages)
print('\n'.join(priority_response.priorities))


Scientific Superintelligence
Reinventing Scientific Method
AI-Powered Laboratory
Parallel Experiments
Scaling Scientific Experimentation
Accelerating Breakthroughs in Medicine
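`similarity_search` returns `Document` objects, and a prompt template will stringify them as-is, metadata and all, if they are passed straight into `{context_info}`. A small formatting helper keeps the context to plain text and can label each chunk with its source. The sketch uses a minimal stand-in class `Doc` so it runs on its own; in the notebook the objects would be `langchain_core` `Document`s, which expose the same `page_content` and `metadata` attributes, and `WebBaseLoader` records the page URL under the `source` metadata key. `format_docs` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (same two attributes)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_docs(docs: List[Doc]) -> str:
    """Render retrieved chunks as plain text, each prefixed with its source URL if known."""
    blocks = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown source")
        blocks.append(f"[{source}]\n{doc.page_content.strip()}")
    return "\n\n".join(blocks)


docs = [
    Doc("Lila Sciences is building scientific superintelligence.", {"source": "https://www.lila.ai/about"}),
    Doc("Chunk without a recorded source."),
]
print(format_docs(docs))
```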


In [33]:
from cohere import UnprocessableEntityError

class Differentiators(BaseModel):
    """The company's differentiators."""
    differentiators: List[str] = Field(..., description="What sets the company apart from competitors.")

differentiators_focus = f"what sets {company_name} apart from competitors or what makes it special"
differentiators_docs = lila_vector_store.similarity_search(differentiators_focus)
# Join the retrieved chunks' text so the prompt gets plain context rather than Document reprs
differentiators_messages = company_info_prompt.invoke({"company": company_name, "focus": differentiators_focus, "context_info": '\n\n'.join(d.page_content for d in differentiators_docs)})
structured_llm = llm.with_structured_output(Differentiators)
try:
    differentiators_response: Differentiators = structured_llm.invoke(differentiators_messages)
except UnprocessableEntityError:
    # Cohere rejected the request (HTTP 422); fall back to a placeholder so later cells still run
    differentiators_response = Differentiators(differentiators=["None Found"])
print('\n'.join(differentiators_response.differentiators))
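The `try`/`except` above is a per-cell fallback. If the same guard is needed for several structured calls, it can be factored into a helper that takes the call and a default. The sketch is generic and hypothetical: `flaky_call` and `ValueError` stand in for `structured_llm.invoke` and `cohere.UnprocessableEntityError`, so it runs without network access.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def invoke_with_fallback(
    call: Callable[[], T],
    default: T,
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Run `call()`; if it raises one of `errors`, log it and return `default` instead."""
    try:
        return call()
    except errors as exc:
        print(f"Falling back to default after {type(exc).__name__}: {exc}")
        return default


def flaky_call() -> list:
    # Hypothetical stand-in for a structured_llm.invoke(...) call that fails
    raise ValueError("simulated failure")


result = invoke_with_fallback(flaky_call, default=["None Found"], errors=(ValueError,))
print(result)  # ['None Found']
```

In the notebook the call would look like `invoke_with_fallback(lambda: structured_llm.invoke(differentiators_messages), Differentiators(differentiators=["None Found"]), (UnprocessableEntityError,))`.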





### Get info about the job posting

In [45]:
posting_info_prompt = PromptTemplate.from_template("""
You are a professional recruiter who is helping me prepare a job application. Given the following job posting, please give me a list of specific experiences I should focus on while preparing a cover letter and resume for this job. You should take into account the company's main priorities and what differentiates them from competitors. Your response should be a json list of items (experiences and skills) that are most relevant to the job.

Main Priorities:
{priorities}

Differentiators:
{differentiators}

Job Posting:
{posting_text}
""")

class CoverLetterFocuses(BaseModel):
    """The main experiences to focus on."""
    items: List[str] = Field(..., description="The list of experiences and skills.")

focus_messages = posting_info_prompt.invoke({'priorities': '\n'.join(priority_response.priorities), 'differentiators': '\n'.join(differentiators_response.differentiators), 'posting_text': posting_doc[0].page_content})
structured_llm = llm.with_structured_output(CoverLetterFocuses)
focus_response = structured_llm.invoke(focus_messages)
print('\n'.join(focus_response.items))

Experience in materials science, chemistry, or chemical engineering, with a focus on experimental work and PhD-level expertise
Proficiency in project management, especially in handling complex, multidisciplinary projects
Familiarity with automation, lab robotics, and automated/self-driving labs
Ability to work with machine learning models and interpret data
Strong collaboration and communication skills, with an emphasis on empathy and adaptability
Comfort with ambiguity and a track record of breaking down complex goals into achievable tasks
Passion for innovation and a dedication to driving impact and accelerating breakthroughs
Transferable skills and a diverse mindset, aligning with the company's inclusive and creative culture
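The priorities and differentiators are passed into `posting_info_prompt` as newline-joined strings. LLM list outputs can repeat an item with different casing or stray whitespace, so a light clean-up before joining keeps the prompt tidy, and rendering each item as a `- ` bullet makes the list boundaries explicit to the model. This is an optional refinement sketched for illustration; `clean_items` and `to_bullets` are hypothetical helpers the notebook does not currently use.

```python
from typing import Iterable, List


def clean_items(items: Iterable[str]) -> List[str]:
    """Strip whitespace, drop empty strings, and remove case-insensitive duplicates (first wins)."""
    seen = set()
    cleaned = []
    for item in items:
        text = item.strip()
        key = text.casefold()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


def to_bullets(items: Iterable[str]) -> str:
    """Render cleaned items as a bullet list suitable for a prompt variable."""
    return "\n".join(f"- {item}" for item in clean_items(items))


print(to_bullets(["Parallel Experiments", " parallel experiments ", "", "AI-Powered Laboratory"]))
```

Used in the cell above, this would be e.g. `'priorities': to_bullets(priority_response.priorities)`.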


## See how the focuses align with my CV

### Load my CV

In [50]:
loader = WebBaseLoader(
    web_paths=("https://tana.pub/C9FUOKB-_m-F/rob-warden-rothman",)
)
cv_docs = loader.load()
cv_vector_store = InMemoryVectorStore(embeddings)
cv_ids = cv_vector_store.add_documents(documents=text_splitter.split_documents(cv_docs))

In [60]:
from langchain_docling import DoclingLoader
from langchain_docling.loader import ExportType
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers = [("#" * (i+1), f'Header {i+1}') for i in range(3)]
markdown_splitter = MarkdownHeaderTextSplitter(headers)

loader = DoclingLoader(file_path="https://tana.pub/C9FUOKB-_m-F/rob-warden-rothman", export_type=ExportType.MARKDOWN)
cv_docs = loader.load()
cv_vector_store = InMemoryVectorStore(embeddings)
cv_ids = cv_vector_store.add_documents(documents=markdown_splitter.split_text(cv_docs[0].page_content))
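The `headers` comprehension expands to `[('#', 'Header 1'), ('##', 'Header 2'), ('###', 'Header 3')]`, so `MarkdownHeaderTextSplitter` starts a new chunk at each level-1 to level-3 heading and records the enclosing headings as metadata. The sketch below is a simplified pure-Python version of that idea, to show why each CV section becomes its own retrievable chunk; it is not the library's implementation (the real splitter also handles details such as code fences), and `split_by_headers` and the sample text are hypothetical.

```python
import re
from typing import Dict, List, Tuple

HEADERS = [("#" * (i + 1), f"Header {i + 1}") for i in range(3)]


def split_by_headers(markdown: str) -> List[Tuple[Dict[str, str], str]]:
    """Split markdown into (header_metadata, body) pairs at #, ## and ### headings.

    When a heading of level n starts, titles at level n or deeper are dropped from
    the metadata, so each chunk carries only its enclosing section titles.
    """
    chunks: List[Tuple[Dict[str, str], str]] = []
    current: Dict[str, str] = {}
    body: List[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append((dict(current), text))
        body.clear()

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,3})\s+(.*)$", line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop titles at this level or deeper, then record the new one
            for depth, (_, name) in enumerate(HEADERS, start=1):
                if depth >= level:
                    current.pop(name, None)
            current[HEADERS[level - 1][1]] = match.group(2).strip()
        else:
            body.append(line)
    flush()
    return chunks


cv = "# Name\nSummary.\n## Experience\n### Role A\nDid X.\n### Role B\nDid Y.\n## Education\nDegree."
for meta, text in split_by_headers(cv):
    print(meta, "->", text)
```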

### Test prompt with the first focus

In [67]:

cv_focus_prompt = PromptTemplate.from_template("""
You are a professional recruiter who is helping me prepare a job application. We have already identified a set of important skills on which to focus my resume and CV. For each item, we need to determine if I actually have experience that can speak to that focus (it's okay if I don't). The following context documents have been retrieved from my CV as possible matches to the current focus point. If my experience is relevant, please summarize how. Reply in one paragraph.

Current Focus:
{current_focus}

Context Documents:
{context}
""")

focus_responses = []
for cur_focus in focus_response.items:
    # Use a separate name so the loaded CV documents in cv_docs are not overwritten
    matched_docs = cv_vector_store.similarity_search(cur_focus)
    cv_focus_messages = cv_focus_prompt.invoke({'current_focus': cur_focus, 'context': '\n\n'.join(d.page_content for d in matched_docs)})
    cv_focus_response = llm.invoke(cv_focus_messages)
    focus_responses.append(dict(focus=cur_focus, response=cv_focus_response.content))
    print(f"# {cur_focus}\n{cv_focus_response.content}\n\n")

# Experience in materials science, chemistry, or chemical engineering, with a focus on experimental work and PhD-level expertise
Your work experience demonstrates a strong background in materials science, chemistry, and chemical engineering, with a clear focus on experimental work. Your postdoctoral research at MIT involved creating libraries of enzymes and developing a retrosynthesis algorithm for the fermentation of novel compounds, which required a deep understanding of chemistry and chemical engineering principles. Additionally, your role at GRO Biosciences, where you managed a biofoundry and developed automation protocols, further showcases your expertise in laboratory automation, assay development, and fermentation—all of which are highly applicable to the field of chemical engineering. Your dissertation research at the University of Pennsylvania also contributes to your qualifications, as it involved designing and implementing a novel bioconjugation technique, STEPL, which requi