#### Import the LLM from GroqCloud

In [1]:
from langchain_groq import ChatGroq

In [2]:
import os

## read the API key from the GROQ_API_KEY environment variable instead of hardcoding it
llm = ChatGroq(
    model = "llama-3.3-70b-versatile",
    temperature = 0,
    groq_api_key = os.environ["GROQ_API_KEY"]
)

response = llm.invoke("What is Quantum Computing? Explain in a Nutshell")
print(response.content)

**Quantum Computing in a Nutshell:**

Quantum Computing is a revolutionary technology that uses the principles of quantum mechanics to perform calculations and operations on data. It's different from classical computing, which uses bits (0s and 1s) to process information.

**Key aspects:**

1. **Qubits**: Quantum Computing uses quantum bits (qubits) that can exist in multiple states (0, 1, and both) simultaneously.
2. **Superposition**: Qubits can process multiple possibilities at the same time, making them incredibly fast.
3. **Entanglement**: Qubits can be connected, allowing them to affect each other even when separated.
4. **Quantum algorithms**: Special algorithms are designed to take advantage of qubits' unique properties.

**Potential benefits:**

1. **Faster processing**: Quantum Computing can solve complex problems much faster than classical computers.
2. **Cryptography**: Quantum Computing can break certain encryption methods, but it can also create unbreakable ones.
3. **Opt

### Scrape the data from the company's job posting website 
For this we can use LangChain's `WebBaseLoader`. <br>
It requires the `langchain_community` package to be installed.

In [3]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://careers.nike.com/lead-machine-learning-engineer-supply-chain/job/R-40001")

USER_AGENT environment variable not set, consider setting it to identify your requests.
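
The warning above can be silenced by setting `USER_AGENT` before `langchain_community` is imported. This is a minimal sketch that assumes the variable is read when the loader module is first imported. The agent string `job-mail-generator/0.1` is a hypothetical placeholder, not a value from this notebook.

```python
import os

# Set a descriptive User-Agent before importing langchain_community.
# "job-mail-generator/0.1" is a placeholder; use a string that identifies your app.
os.environ.setdefault("USER_AGENT", "job-mail-generator/0.1")

user_agent = os.environ["USER_AGENT"]
print(user_agent)
```

`setdefault` leaves an already exported `USER_AGENT` untouched, so a value set in the shell takes precedence over the placeholder.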


In [4]:
import re

page_data = loader.load().pop().page_content # web scraping
print(re.sub(r"\n+", "\n ", page_data))


 Lead Machine Learning Engineer - Supply Chain
 Skip to main content
 Open Virtual Assistant
 Home
 Career Areas
 Total Rewards
 Life@Nike
 Purpose
 Language
 Select a Language
   Deutsch  
   English  
   Español (España)  
   Español (América Latina)  
   Français  
   Italiano  
   Nederlands  
   Polski  
   Tiếng Việt  
   Türkçe  
   简体中文  
   繁體中文  
   עִברִית  
   한국어  
   日本語  
 Careers
 Close Menu
 Careers
 Chat
                                 Home
                             
                                 Career Areas
                             
                                 Total Rewards
                             
                                 Life@Nike
                             
                                 Purpose
                             
 Jordan Careers
 Converse Careers
 Language
 Menu
 Return to Previous Menu
 Select a Language
   Deutsch  
   English  
   Español (España)  
   Español (América Latina)  
   Français  
   Italiano  
   Neder
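
The scraped text is dominated by navigation entries and padded whitespace, and the `re.sub` call above only collapses repeated newlines. A small helper along the following lines could tidy the text further before it is sent to the LLM. The function name `clean_page_text` and the sample string are illustrative assumptions, not part of the original notebook.

```python
import re


def clean_page_text(text: str) -> str:
    """Strip each line, collapse runs of spaces/tabs, and drop empty lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical sample that mimics the padded menu entries in the scraped page.
sample = "\n\n Home\n                             \n   Career Areas  \n\n Total   Rewards\n"
print(clean_page_text(sample))
```

In the notebook this would be applied as `clean_page_text(page_data)`. It shortens the prompt but does not remove the navigation entries themselves.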

#### Use the LLM to convert the scraped data into the required JSON 
* Pass the job posting to the LLM along with a prompt template
* Invoke the LLM so that it returns only a valid JSON object, without preamble or additional formatting

In [5]:
from langchain_core.prompts import PromptTemplate

## create a prompt to extract info as JSON from the job description (JD)
## the prompt wording was refined over several iterations
prompt_extract = PromptTemplate.from_template(
    """
    ### SCRAPED TEXT FROM WEBSITE:
    {page_data}
    ### INSTRUCTION:
    The scraped text is from the careers page of a website.
    Your job is to extract the job posting and return it in JSON format containing the
    following keys: 'role', 'experience', 'skills' and 'description'
    The JSON must start with `{{` and end with `}}` with no extra text, quotes, or markdown formatting.
    ### STRICTLY VALID JSON OUTPUT(NO PREAMBLE):
"""
)

## using a pipe operation to form a chain - passing the prompt to the llm
chain_extract = prompt_extract | llm

res = chain_extract.invoke(input = {"page_data":page_data})
print(res.content)

{
"role": "Lead Machine Learning Engineer - Supply Chain",
"experience": "5+ years of professional experience in software engineering, data engineering, machine learning, or related field",
"skills": [
"Python",
"algorithms and data structures",
"AWS",
"database technology (e.g. Postgres, Redis)",
"data processing technology (e.g. EMR)",
"agile development",
"test driven development",
"MLOps",
"API development",
"mathematical optimization",
"cloud architecture",
"Amazon Web Services",
"Spark",
"Kubernetes",
"Docker",
"Jenkins",
"Databricks",
"Terraform"
],
"description": "Develop robust advanced analytics and machine learning solutions that have a direct impact on the business. Own projects end-to-end - from conception to operationalization, demonstrating an understanding of the full software development lifecycle. Provide technical vision and guidance to teammates. Design and implement scalable applications that leverage prediction models and optimization programs to deliver data driv

#### Convert the text output to a JSON object
* Using `JsonOutputParser` from LangChain

In [6]:
from langchain_core.output_parsers import JsonOutputParser

json_parser = JsonOutputParser()
json_res = json_parser.parse(res.content)
json_res

{'role': 'Lead Machine Learning Engineer - Supply Chain',
 'experience': '5+ years of professional experience in software engineering, data engineering, machine learning, or related field',
 'skills': ['Python',
  'algorithms and data structures',
  'AWS',
  'database technology (e.g. Postgres, Redis)',
  'data processing technology (e.g. EMR)',
  'agile development',
  'test driven development',
  'MLOps',
  'API development',
  'mathematical optimization',
  'cloud architecture',
  'Amazon Web Services',
  'Spark',
  'Kubernetes',
  'Docker',
  'Jenkins',
  'Databricks',
  'Terraform'],
 'description': 'Develop robust advanced analytics and machine learning solutions that have a direct impact on the business. Own projects end-to-end - from conception to operationalization, demonstrating an understanding of the full software development lifecycle. Provide technical vision and guidance to teammates. Design and implement scalable applications that leverage prediction models and optimiza

In [7]:
type(json_res) ## type should be a dictionary

dict
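
Checking the type confirms that parsing succeeded, but it does not confirm that the four keys requested in the prompt are present. The sketch below uses only the standard library to show one way to guard against a fenced or incomplete reply. The helper `parse_job_json`, the constant `REQUIRED_KEYS`, and the sample reply are assumptions for illustration and do not replace `JsonOutputParser`.

```python
import json

REQUIRED_KEYS = {"role", "experience", "skills", "description"}
FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable


def parse_job_json(text: str) -> dict:
    """Parse an LLM reply into a dict and verify the expected keys exist."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag) ...
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # ... and everything from the closing fence onwards.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    job = json.loads(cleaned)
    if not isinstance(job, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - job.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return job


# Hypothetical reply wrapped in a markdown fence despite the prompt instructions.
raw = (
    FENCE + "json\n"
    '{"role": "ML Engineer", "experience": "5+ years", '
    '"skills": ["Python"], "description": "Build models"}\n' + FENCE
)
job_example = parse_job_json(raw)
print(job_example["skills"])
```

Raising early on a missing key keeps a malformed reply from reaching the ChromaDB query or the email prompt further down.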

#### Prepare ChromaDB
* ChromaDB will store tech stacks and their corresponding portfolio links
* We match the skills from the job posting against the tech stacks in ChromaDB
* We return the top few portfolios that match the skills from the job posting 
* Here we use the persistent client rather than the standard (ephemeral) client
    * Standard client - operates entirely in memory and stores data only for the duration of the session. Once the session ends and the application terminates, all stored data is lost.
    * Persistent client - stores data on disk at a specified local path, so the information remains available across sessions.

In [8]:
import pandas as pd

df = pd.read_csv("my_portfolio.csv")

df.sample(5)

Unnamed: 0,Techstack,Links
11,"Kotlin, Android, Firebase",https://example.com/kotlin-android-portfolio
15,"Backend, Kotlin, Spring Boot",https://example.com/kotlin-backend-portfolio
1,"Angular,.NET, SQL Server",https://example.com/angular-portfolio
6,"WordPress, PHP, MySQL",https://example.com/wordpress-portfolio
18,"Machine Learning, Python, TensorFlow",https://example.com/ml-python-portfolio


In [9]:
## iterate over the dataframe and insert each record into chromadb
import chromadb
import uuid
## uuid (universally unique identifier) - generates a unique identifier for each record added to the chromadb collection

client = chromadb.PersistentClient("vectorstore")
## Persistent client creates a folder and stores the data inside that folder
collection = client.get_or_create_collection(name = "portfolio")

if not collection.count():
    for _,row in df.iterrows():
        collection.add(
            documents=row["Techstack"],
            metadatas={"links": row["Links"]},
            ids = [str(uuid.uuid4())]           
            )
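
Because `uuid.uuid4()` produces a random ID on every call, the `collection.count()` guard is the only thing that prevents duplicate records when the cell is re-run. One alternative is to derive the ID from the record itself, so the same row always maps to the same ID. This sketch assumes that the (`Techstack`, `Links`) pair identifies a record; the helper name `record_id` is hypothetical.

```python
import uuid


def record_id(techstack: str, link: str) -> str:
    """Return a deterministic UUID (version 5) derived from the record content."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{link}|{techstack}"))


# The same row yields the same ID on every run; a different row yields a different ID.
first = record_id("Machine Learning, Python, TensorFlow", "https://example.com/ml-python-portfolio")
again = record_id("Machine Learning, Python, TensorFlow", "https://example.com/ml-python-portfolio")
other = record_id("WordPress, PHP, MySQL", "https://example.com/wordpress-portfolio")
print(first == again, first == other)
```

With stable IDs, a CSV that gains new rows could be re-ingested without emptying the collection first, for example through the collection's `upsert` method instead of `add`.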

In [10]:
## simple collection query check
links = collection.query(
    query_texts=[
        "Expertise in Python",
        "Expertise in React Native"
    ], n_results = 2

).get("metadatas")
links
## for each query we requested the top 2 results and their corresponding metadata

[[{'links': 'https://example.com/ml-python-portfolio'},
  {'links': 'https://example.com/python-portfolio'}],
 [{'links': 'https://example.com/react-native-portfolio'},
  {'links': 'https://example.com/react-portfolio'}]]

In [11]:
job = json_res
job['skills']

['Python',
 'algorithms and data structures',
 'AWS',
 'database technology (e.g. Postgres, Redis)',
 'data processing technology (e.g. EMR)',
 'agile development',
 'test driven development',
 'MLOps',
 'API development',
 'mathematical optimization',
 'cloud architecture',
 'Amazon Web Services',
 'Spark',
 'Kubernetes',
 'Docker',
 'Jenkins',
 'Databricks',
 'Terraform']

In [12]:
## retrieving top portfolios against the job posting skill requirements
links = collection.query(
    query_texts=job["skills"], n_results = 2

).get("metadatas")
links[0:5]
## for each query we requested the top 2 results and their corresponding metadata

[[{'links': 'https://example.com/ml-python-portfolio'},
  {'links': 'https://example.com/python-portfolio'}],
 [{'links': 'https://example.com/ml-python-portfolio'},
  {'links': 'https://example.com/magento-portfolio'}],
 [{'links': 'https://example.com/ios-ar-portfolio'},
  {'links': 'https://example.com/wordpress-portfolio'}],
 [{'links': 'https://example.com/magento-portfolio'},
  {'links': 'https://example.com/vue-portfolio'}],
 [{'links': 'https://example.com/ml-python-portfolio'},
  {'links': 'https://example.com/magento-portfolio'}]]
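
The query returns one list of metadata per skill, so the same portfolio link appears several times in the nested result. Flattening and de-duplicating the links before they go into the email prompt keeps the prompt short. The helper `unique_links` is an illustrative assumption; the sample data reuses the first two result lists shown above.

```python
def unique_links(metadatas):
    """Flatten nested query metadata into a list of unique links, preserving order."""
    seen = set()
    ordered = []
    for per_query in metadatas:       # one list of results per query text
        for meta in per_query:        # one metadata dict per matched document
            link = meta.get("links")
            if link and link not in seen:
                seen.add(link)
                ordered.append(link)
    return ordered


sample_metadatas = [
    [{"links": "https://example.com/ml-python-portfolio"},
     {"links": "https://example.com/python-portfolio"}],
    [{"links": "https://example.com/ml-python-portfolio"},
     {"links": "https://example.com/magento-portfolio"}],
]
flat_links = unique_links(sample_metadatas)
print(flat_links)
```

Order is preserved so that links matched by the earlier skills in the list stay at the front.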

In [13]:
## create a prompt template for generating an Email
prompt_email = PromptTemplate.from_template(
    """
    ### JOB DESCRIPTION
    {job_description}

    ### INSTRUCTION
    You are Mr. James, a business development executive at NewTech. NewTech is an AI & Software
    consulting company dedicated to facilitating the seamless integration of business processes through automated tools.
    Over the years, we have empowered numerous enterprises with tailored solutions, fostering scalability,
    process optimization, cost reduction, and heightened overall efficiency.
    Your job is to write a cold email to the client regarding the job mentioned above, describing the capability of
    NewTech in fulfilling their needs.
    Also add the most relevant ones from the following links to showcase NewTech's portfolio: {link_list}
    Remember you are Mr. James, BDE at NewTech.
    Do not provide a preamble.
    ### EMAIL (NO PREAMBLE)

"""
)

chain_email = prompt_email | llm

res = chain_email.invoke({"job_description" :str(job), "link_list" : links})

print(res.content)

Subject: Unlocking Business Potential with AI-Driven Supply Chain Solutions

Dear Hiring Manager,

I came across the job description for a Lead Machine Learning Engineer - Supply Chain, and I was impressed by the scope of the role. As a Business Development Executive at NewTech, I believe our team can help you develop robust advanced analytics and machine learning solutions that drive business impact.

With our expertise in Python, algorithms, and data structures, we can design and implement scalable applications that leverage prediction models and optimization programs. Our experience with AWS, database technology (e.g., Postgres, Redis), and data processing technology (e.g., EMR) enables us to build efficient and reliable systems. We are well-versed in agile development, test-driven development, MLOps, API development, and mathematical optimization, ensuring that our solutions are tailored to your specific needs.

Our team has a strong background in cloud architecture, with proficien