-----

# **Scrape Website Data Using Langchain's WebBaseLoader and Experiment with It**


------

### **How to use groq to query**

In [133]:
# importing Libraries
import os
import json
from langchain_groq import ChatGroq
from dotenv import load_dotenv
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")

llm = ChatGroq(
    temperature=0.3,
    groq_api_key=groq_api_key,
    model_name= "Llama-3.1-8b-Instant",

)

response = llm.invoke("what is the meaning of life?")   
print(response.content)

The meaning of life is a complex and multifaceted question that has been debated by philosophers, theologians, scientists, and thinkers for centuries. There is no one definitive answer, and it may vary depending on cultural, personal, and individual perspectives.

Some possible perspectives on the meaning of life include:

1. **Biological perspective**: From a biological standpoint, the meaning of life is to survive and reproduce. This is the fundamental drive of all living organisms, and it is the basis for the continuation of species.
2. **Psychological perspective**: From a psychological perspective, the meaning of life is to find happiness, fulfillment, and purpose. This can be achieved through personal growth, relationships, and contributing to society.
3. **Philosophical perspective**: Philosophers have proposed various theories about the meaning of life, including:
 * **Existentialism**: Life has no inherent meaning, and it is up to individuals to create their own purpose and me

### **Setting up WebBaseLoader**

In [134]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://jobs.nike.com/job/R-33460")
page_data = loader.load().pop().page_content
print(page_data)



Search JobsSkip navigationSearch JobsNIKE, INC. JOBSContract JobsJoin The Talent CommunityLife @ NikeOverviewBenefitsBrandsOverviewJordanConverseTeamsOverviewAdministrative SupportAdvanced InnovationAir Manufacturing InnovationAviationCommunicationsCustomer ServiceDesignDigitalFacilitiesFinance & AccountingGovernment & Public AffairsHuman ResourcesInsights & AnalyticsLegalManufacturing & EngineeringMarketingMerchandisingPlanningPrivacyProcurementProduct Creation, Development & ManagementRetail CorporateRetail StoresSalesSocial & Community ImpactSports MarketingStrategic PlanningSupply Chain, Distribution & LogisticsSustainabilityTechnologyLocationsOverviewNike WHQNike New York HQEHQ: Hilversum, The NetherlandsELC: Laakdal, BelgiumGreater China HQDiversity, Equity & InclusionOverviewMilitary InclusionDisability InclusionIndigenous InclusionInternshipsGIFT CARDSPROMOTIONSFIND A STORESIGN UP FOR EMAILBECOME A MEMBERNIKE JOURNALSEND US FEEDBACKGET HELPGET HELPOrder StatusShipping and Del

### **Create a Prompt Template**

In [135]:
from langchain_core.prompts import PromptTemplate

prompt_extract = PromptTemplate.from_template(
    """
    ### SCRAPED TEXT FROM WEBSITE:
    {page_data}
    ### INSTRUCTION:
    The scraped text is from the careers page of a website.
    Your job is to extract the job postings and return them in JSON format containing the 
    following keys: `role`, `experience`, `skills`, and `description`.
    Ensure each key contains accurate and relevant information. 
    If some information is not available, leave that field empty but include the key.
    **Do not include any additional text, only return the JSON object.**
    ### VALID JSON OUTPUT (NO PREAMBLE):
    """
)

chain_extract = prompt_extract | llm 
res = chain_extract.invoke(input={'page_data':page_data})
type(res.content)

str

### **Load data in json**

In [136]:
from langchain_core.output_parsers import JsonOutputParser

json_parser = JsonOutputParser()
json_res = json_parser.parse(res.content)
json_res

{'Administrative Support': {'role': 'Administrative Support',
  'experience': '',
  'skills': '',
  'description': 'Supports the day-to-day activities of the team, including administrative tasks, data entry, and other duties as assigned.'},
 'Advanced Innovation': {'role': 'Advanced Innovation',
  'experience': '',
  'skills': '',
  'description': 'Develops and implements innovative solutions to drive business growth and improve operational efficiency.'},
 'Air Manufacturing Innovation': {'role': 'Air Manufacturing Innovation',
  'experience': '',
  'skills': '',
  'description': 'Drives innovation in air manufacturing, including the development of new products, processes, and technologies.'},
 'Aviation': {'role': 'Aviation',
  'experience': '',
  'skills': '',
  'description': 'Supports the development and implementation of aviation-related projects and initiatives.'},
 'Communications': {'role': 'Communications',
  'experience': '',
  'skills': '',
  'description': 'Develops and imp

In [137]:
type(json_res)

dict

### **Load an Example Dataset**

In [138]:
import pandas as pd

df = pd.read_csv(r"E:\Cold Email Generater\my_portfolio.csv")
df

Unnamed: 0,Techstack,Links
0,"React, Node.js, MongoDB",https://example.com/react-portfolio
1,"Angular,.NET, SQL Server",https://example.com/angular-portfolio
2,"Vue.js, Ruby on Rails, PostgreSQL",https://example.com/vue-portfolio
3,"Python, Django, MySQL",https://example.com/python-portfolio
4,"Java, Spring Boot, Oracle",https://example.com/java-portfolio
5,"Flutter, Firebase, GraphQL",https://example.com/flutter-portfolio
6,"WordPress, PHP, MySQL",https://example.com/wordpress-portfolio
7,"Magento, PHP, MySQL",https://example.com/magento-portfolio
8,"React Native, Node.js, MongoDB",https://example.com/react-native-portfolio
9,"iOS, Swift, Core Data",https://example.com/ios-portfolio


### **Insert it into Chromadb**

In [139]:
import uuid
import chromadb

client = chromadb.PersistentClient('vectorstore')
collection = client.get_or_create_collection(name="portfolio")

if not collection.count():
    for _, row in df.iterrows():
        collection.add(documents=row["Techstack"],
                       metadatas={"links": row["Links"]},
                       ids=[str(uuid.uuid4())])

### **Let's Query**

In [140]:
links = collection.query(query_texts=['Experience in Python', "Experties in React"], n_results=2).get("metadatas")
links

[[{'links': 'https://example.com/ml-python-portfolio'},
  {'links': 'https://example.com/python-portfolio'}],
 [{'links': 'https://example.com/react-portfolio'},
  {'links': 'https://example.com/react-native-portfolio'}]]

In [141]:
job

{'Administrative Support': {'role': 'Administrative Support',
  'experience': '',
  'skills': '',
  'description': 'Support the day-to-day operations of the business, ensuring the smooth delivery of administrative tasks and projects.'},
 'Advanced Innovation': {'role': 'Advanced Innovation',
  'experience': '',
  'skills': '',
  'description': 'Drive innovation and experimentation to develop new products, services, and business models that meet the evolving needs of consumers.'},
 'Air Manufacturing Innovation': {'role': 'Air Manufacturing Innovation',
  'experience': '',
  'skills': '',
  'description': 'Develop and implement innovative manufacturing processes and technologies to improve efficiency, quality, and sustainability.'},
 'Aviation': {'role': 'Aviation',
  'experience': '',
  'skills': '',
  'description': ''},
 'Communications': {'role': 'Communications',
  'experience': '',
  'skills': '',
  'description': 'Develop and implement effective communication strategies to engage

In [142]:
job = json_res

### **Define a Prompt Template TO GENERATE A COLD EMAIL**

In [143]:
prompt_email = PromptTemplate.from_template(
        """
        ### JOB DESCRIPTION:
        {job_description}
        
        ### INSTRUCTION:
        You are Raheel, a business development executive at AtliQ. AtliQ is an AI & Software Consulting company dedicated to facilitating
        the seamless integration of business processes through automated tools. 
        Over our experience, we have empowered numerous enterprises with tailored solutions, fostering scalability, 
        process optimization, cost reduction, and heightened overall efficiency. 
        Your job is to write a cold email to the client regarding the job mentioned above describing the capability of AtliQ 
        in fulfilling their needs.
        Also add the most relevant ones from the following links to showcase Atliq's portfolio: {link_list}
        Remember you are Raheel, BDE at AtliQ. 
        Do not provide a preamble.
        ### EMAIL (NO PREAMBLE):
        
        """
        )

chain_email = prompt_email | llm
res = chain_email.invoke({"job_description": str(job), "link_list": links})
print(res.content)

Subject: Unlock Business Efficiency with AtliQ's Expert Solutions

Dear [Client Name],

I hope this email finds you well. My name is Raheel, and I am a Business Development Executive at AtliQ, a leading AI & Software Consulting company. We specialize in streamlining business processes through automated tools, empowering enterprises to achieve scalability, process optimization, cost reduction, and enhanced overall efficiency.

I came across your job description for a [Job Title] role, and I believe AtliQ's expertise can significantly contribute to your organization's success. Our team has a proven track record of delivering tailored solutions that address the unique needs of our clients.

Some of the key areas where we can provide value include:

- **Digital Solutions**: We can help you develop and implement digital solutions to drive business growth and improve operational efficiency. Our expertise in [React](https://example.com/react-portfolio) and [React Native](https://example.com/r