<a href="https://colab.research.google.com/github/lcbjrrr/genai/blob/main/GCP_RAG_WVec_LLM_GCP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Natural Language Processing (NLP)

## Embeddings and Tokenization

Natural Language Processing (NLP) is a field dedicated to enabling computers to understand, interpret, and generate human language. A cornerstone of modern NLP is the use of **embeddings**, which are numerical representations of words or phrases that capture their semantic meaning, allowing algorithms to process linguistic data effectively. Building upon these foundations, **Large Language Models (LLMs)** represent a significant advancement, utilizing vast datasets and sophisticated architectures to understand context, generate coherent text, and perform a wide array of language-based tasks.

In [None]:
!pip install spacy gensim


Collecting gensim
  Downloading gensim-4.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (8.4 kB)
Downloading gensim-4.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (27.9 MB)
Installing collected packages: gensim
Successfully installed gensim-4.4.0


Word2Vec is a technique used in natural language processing to generate word embeddings. It represents words as dense vectors in a continuous vector space, where words with similar meanings are located closer to each other. This is achieved by analyzing the context in which words appear in a large corpus of text.
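
To make "analyzing the context in which words appear" concrete, here is a minimal sketch of how (center word, context word) training pairs can be extracted with a sliding window, in the style of the skip-gram variant of Word2Vec. The helper name `context_pairs`, the toy sentence, and the window size are illustrative assumptions; real training uses a large corpus and a library such as gensim.

```python
def context_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

# Toy sentence (assumption, for illustration only)
sentence = "the cat sat on the mat".split()
pairs = context_pairs(sentence, window=1)
print(pairs[:4])
```

Words that keep appearing with the same neighbours end up with similar pair statistics, which is what pushes their vectors close together during training.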

In [None]:
%%time
from gensim.models import KeyedVectors
word_vectors = KeyedVectors.load_word2vec_format(
    "http://wikipedia2vec.s3.amazonaws.com/models/en/2018-04-20/enwiki_20180420_100d.txt.bz2",
    binary=False,
    unicode_errors='ignore',
)

**Embeddings** are numerical representations of real-world objects, such as words, images, or entire documents. In the context of natural language processing, word embeddings are vectors that capture semantic relationships between words, allowing algorithms to process and understand human language more effectively.

![](https://towardsdatascience.com/wp-content/uploads/2020/06/1HOvcH2lZXWyOtmcqwniahQ.png)

In [None]:
embed = word_vectors['generative']
embed

array([-0.1944,  0.3173, -0.6029, -0.2289, -0.159 , -0.6233,  0.1637,
        0.4812,  0.3664, -0.4371,  0.4014,  0.356 , -0.4489, -0.1209,
        0.1787, -0.4777,  0.4284, -0.5101,  0.574 , -0.4029,  0.2346,
        0.0616, -0.5732, -0.5495,  0.0648, -0.5697,  0.3056,  0.0199,
        0.4866,  0.5387,  0.1568,  0.2029,  0.9879,  0.3328,  0.6987,
        0.314 , -0.271 , -0.2186,  0.2244,  0.0273, -0.1938,  0.2384,
       -0.3099,  0.5102,  0.2235, -0.1594, -0.8178, -0.014 , -0.4044,
        0.1803,  0.2592,  0.1052, -0.0816,  0.06  , -0.0441, -0.3898,
       -0.6213,  0.2516, -0.2886,  0.7391, -0.2618,  0.4155, -0.4727,
        0.785 ,  0.7197, -0.0754, -0.4997,  0.1545, -0.8258, -0.7265,
        0.3349,  0.1798, -0.5484, -0.2569,  0.0863, -0.4086, -0.8779,
        0.3763,  0.3226,  0.641 , -1.3968,  0.0903,  0.3317, -0.7599,
       -0.1855,  0.7091, -0.2894, -0.1777,  0.3832, -0.3214,  0.0448,
       -0.0806, -0.5196,  0.3394,  0.4149, -0.3964,  0.5066,  0.3642,
       -0.3679,  0.2

In [None]:
word_vectors.most_similar("generative")

[('lexicalist', 0.779882550239563),
 ('chomskyan', 0.7596522569656372),
 ('connectivist', 0.7340349555015564),
 ('neuroaesthetics', 0.729445219039917),
 ('generativist', 0.7284153699874878),
 ('homuncular', 0.7270619869232178),
 ('ENTITY/Glue_semantics', 0.7268419861793518),
 ('meinongian', 0.7229824066162109),
 ('metasystems', 0.720165491104126),
 ('ENTITY/Constraint-based_grammar', 0.7194753289222717)]
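
The scores returned by `most_similar` are cosine similarities between embedding vectors. As a rough sketch of what that number means, the snippet below computes cosine similarity by hand on three made-up 3-dimensional vectors (the values in `toy` are assumptions chosen for illustration, not real embeddings).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (not taken from the model loaded above)
toy = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "banana": [0.1, 0.2, 0.95],
}

print(cosine_similarity(toy["king"], toy["queen"]))   # close to 1: similar direction
print(cosine_similarity(toy["king"], toy["banana"]))  # much lower: different direction
```

A value near 1 means the two vectors point in almost the same direction, and a value near 0 means they are nearly orthogonal.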

**Tokens** are the fundamental building blocks of text data in natural language processing. They are discrete units derived from a larger body of text, typically words, punctuation marks, or subword units (like 'ing' or 'un'). This process of breaking down text into tokens is called tokenization and is a crucial first step for most NLP tasks, as models often operate on these individual tokens rather than raw text.
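
As a minimal sketch of tokenization, the snippet below splits text into word and punctuation tokens with a regular expression and then maps each token to an integer id. The function name `simple_tokenize` and the sample sentence are assumptions for illustration; production LLMs typically rely on learned subword tokenizers rather than a rule like this.

```python
import re

def simple_tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    # \w+ matches a run of letters/digits/underscores; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = simple_tokenize("LLMs don't read text; they read tokens!")
print(tokens)

# Build a tiny vocabulary and convert tokens to integer ids
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[tok] for tok in tokens]
print(ids)
```

Models operate on the integer ids, not on the raw strings, which is why the same word always maps to the same id.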

## GenAI: LLMs

Generative AI is a significant advance: it creates original and diverse content such as text, images, and audio, and promises to transform industries by automating creative processes and offering new tools for innovation. These models are trained by exposing neural network architectures to vast amounts of data relevant to the desired content. Furthermore, prompt engineering, the process of creating and refining the inputs provided to a language model to obtain desired responses, is crucial for effectively directing the generative capacity of AI.

![](https://pbs.twimg.com/media/G5XShNuWsAA_sWo?format=jpg&name=medium)

Large Language Models (LLMs) are a highly advanced class of generative AI models built upon the principles of **Deep Learning**. They primarily utilize sophisticated **neural networks**, which are computational architectures inspired by the human brain, characterized by multiple layers that enable them to learn complex patterns from data. The most prevalent architecture within LLMs today is the **transformer** model, which excels at processing sequential data like text by employing attention mechanisms to weigh the importance of different parts of the input. This combination allows LLMs to process vast amounts of text, understand context, and generate human-like language for a wide array of tasks, from translation and summarization to creative writing and question answering.

![](https://pbs.twimg.com/media/G5XS8P4WUAAYvO2?format=jpg&name=medium)
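
To illustrate how an attention mechanism "weighs the importance of different parts of the input", here is a small sketch of scaled dot-product attention for a single query, written in plain Python. The vectors in `keys`, `values`, and `query` are made-up toy numbers, and the function names are assumptions; real transformers run this over learned matrices, many heads, and many layers.

```python
import math

def softmax(xs):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Score each key by its dot product with the query, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy inputs (assumptions): three positions, 2-dimensional vectors
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
print(weights)
print(output)
```

Positions whose keys align with the query receive larger weights, so their values contribute more to the output.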

## GCP

In [4]:
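import os

# Placeholder: replace 'key' with your own API key (do not commit a real key)
GOOGLE_API_KEY = 'key'
# genai.Client() picks the key up from this environment variable
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
GCP_MODEL = 'gemini-2.5-flash'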


In [5]:

from google import genai
client = genai.Client()
response = client.models.generate_content(
    model=GCP_MODEL,
    contents="Explain large language models in one sentence."
)
print(response.text)

Large language models are AI systems trained on vast amounts of text data that learn to understand, generate, and respond to human language by predicting the most probable next sequence of words.


In [None]:
import os
from google import genai
from google.genai import types

SYSTEM_INSTRUCTION = 'You are a GenAI expert who explains concepts to a lay audience.'

client = genai.Client()
question = 'How do they differ from Deep Learning?'

config = types.GenerateContentConfig(
    system_instruction=SYSTEM_INSTRUCTION,
    temperature=0.9,  # 0.0 = most deterministic, 2.0 = most varied/creative
    max_output_tokens=1000)

initial_history = [
    types.Content(role="user",parts=[types.Part(text="In a sentence, what are LLMs?")]),
    types.Content(role="model",parts=[types.Part(text="LLMs are advanced AI systems trained on vast amounts of text data to understand prompts and generate human-like text.")])
  ]

chat = client.chats.create(
        model=GCP_MODEL,
        history=initial_history,
        config=config)

response = chat.send_message(question)
print(f"Model: {response.text}")
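
The `temperature` setting used above is commonly described as a scaling factor applied to the model's next-token scores before they are turned into probabilities. The sketch below shows that idea on three made-up scores; the function name `softmax_with_temperature` and the `logits` values are assumptions, and the service's actual sampling may differ in detail.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert scores to probabilities after dividing by a positive temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # low temperature: sharply peaked
warm = softmax_with_temperature(logits, 2.0)  # high temperature: flatter
print(cold)
print(warm)
```

At a low temperature almost all probability lands on the top-scoring token, so outputs repeat; at a high temperature the alternatives become more likely, so outputs vary more.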

In [6]:
jobs=['''
Senior Python Developer
Location: [Remote/Hybrid/City]
Type: Full-time
About the Role
We are looking for a Senior Python Developer to join our core engineering team. In this role, you will be responsible for building high-quality, scalable backend services and integrating complex data systems. You’ll work closely with our data scientists and frontend teams to turn complex requirements into elegant, functional code.
Key Responsibilities
•	Architect & Build: Design and maintain scalable, low-latency, and highly available Python applications.
•	API Design: Build and document RESTful or GraphQL APIs using frameworks like FastAPI or Django.
•	Integration: Connect backend services with external web services and third-party APIs.
•	Optimization: Identify and fix performance bottlenecks and bugs.
•	Mentorship: Conduct code reviews and help junior developers grow their technical skills.
Required Skills & Qualifications
•	Expert Python Knowledge: 5+ years of experience with Python and its core libraries.
•	Web Frameworks: Mastery of Django, Flask, or FastAPI.
•	Database Management: Strong experience with PostgreSQL, MySQL, or MongoDB.
•	Asynchronous Programming: Experience with Celery, Redis, or asyncio for background tasks.
•	Testing: Proficiency in writing unit and integration tests using pytest or unittest.
Bonus Points
•	Experience with Data Science libraries (Pandas, NumPy, Scikit-learn).
•	Knowledge of containerization (Docker, Kubernetes).
•	Experience with AWS, GCP, or Azure.
''',
'''
Java Backend Engineer
Location: [Remote/Hybrid/City]
Type: Full-time
About the Role
We are seeking a Java Backend Engineer to help us build and scale our enterprise-grade microservices architecture. You will be a key player in developing robust, secure, and high-performance systems that handle millions of transactions. If you enjoy solving complex architectural puzzles and writing thread-safe, efficient code, we want to meet you.
Key Responsibilities
•	Microservices Development: Build and maintain scalable backend services using the Spring Boot ecosystem.
•	System Reliability: Implement robust security protocols and data protection measures.
•	Collaboration: Work within an Agile team to deliver high-quality software in two-week sprints.
•	Legacy Modernization: Help migrate legacy monolithic components into modern microservices.
•	Documentation: Maintain clear documentation for system architecture and API endpoints.
Required Skills & Qualifications
•	Java Mastery: 5+ years of professional experience with Java (Version 11 or higher preferred).
•	Spring Ecosystem: Deep expertise in Spring Boot, Spring Security, and Spring Data JPA.
•	ORM Frameworks: Strong understanding of Hibernate or MyBatis.
•	Messaging Queues: Experience with Kafka, RabbitMQ, or ActiveMQ.
•	Enterprise Architecture: Solid understanding of design patterns (Singleton, Factory, Observer) and SOLID principles.
Bonus Points
•	Experience with Cloud-native development (Spring Cloud).
•	Familiarity with CI/CD pipelines (Jenkins, GitLab CI).
•	Knowledge of front-end basics (React or Angular) to better collaborate with UI teams.
''']

In [7]:
apps={0:['''
Alex Rivera
Summary: Highly skilled backend architect with over eight years of experience building high-concurrency systems using Python and cloud-native technologies.
Experience
•	At TechStream Solutions, Alex spearheaded the migration to a FastAPI microservices architecture, improving system latency by 40% for five million active users.
•	They consistently mentor junior developers while managing complex integrations with Celery, Redis, and various AWS services.
Education
•	Alex holds a Master of Science in Computer Science from Stanford University, specializing in distributed systems.
•	Their academic background provided a rigorous foundation in algorithmic complexity and large-scale data management.
''',
'''
Jordan Smith
Summary: Dedicated Senior Backend Engineer with six years of experience focusing on RESTful API design and database optimization within the Python ecosystem.
Experience
•	During their tenure at Global Connect, Jordan led the backend development of a flagship e-commerce platform using Django and MySQL.
•	They were responsible for implementing robust testing suites with Pytest and optimizing database queries to handle peak seasonal traffic.
Education
•	Jordan earned a Bachelor’s degree in Software Engineering from the University of Michigan.
•	Their coursework emphasized object-oriented programming and secure software development lifecycle practices.
''',
'''
Sam Taylor
Summary: Professional Python Developer with four years of experience dedicated to writing clean, testable code for scalable web applications.
Experience
•	Sam worked at Startup Hub where they utilized Django and PostgreSQL to build and deploy multiple customer-facing features.
•	They successfully integrated third-party payment gateways and managed CI/CD pipelines to ensure consistent deployment cycles.
Education
•	Sam graduated with a Bachelor of Science in Information Technology from Georgia Tech.
•	They also participated in several specialized coding intensives focused on full-stack development and cloud computing.
''',
'''
Dr. Casey Wong
Summary: Accomplished Data Scientist with five years of professional experience who is currently pivoting into heavy backend engineering roles.
Experience
•	At Insight Analytics, Casey developed complex data pipelines and machine learning models using Pandas, Scikit-learn, and NumPy.
•	While they have built internal tools using Flask, their primary focus has been on data extraction rather than enterprise-scale architecture.
Education
•	Casey earned a Ph.D. in Statistics from the University of California, Berkeley, where they published research on predictive modeling.
•	Their doctoral studies involved extensive mathematical programming and advanced data visualization techniques.
''',
'''
Morgan Lee
Summary: Enthusiastic developer with two years of professional experience building modern web components in fast-paced startup environments.
Experience
•	Working at AppLaunch, Morgan assisted in building lightweight APIs using FastAPI and managed document-based data in MongoDB.
•	They played a key role in developing interfaces between systems while participating in daily agile stand-ups.
Education
•	Morgan holds a Bachelor’s degree in Computer Science from the University of Texas at Austin.
•	During their undergraduate career, they completed a high-impact internship focused on mobile application development and web security.
''',
'''
Jamie Clark
Summary: Seasoned Software Engineer with five years of experience primarily focused on enterprise-level Java and modern frontend frameworks.
Experience
•	At Enterprise Corp, Jamie built robust microservices using Spring Boot and developed dynamic user interfaces with React and JavaScript.
•	Although they have used Python for basic automation scripts, their professional career has been centered almost entirely around the Java ecosystem.
Education
•	Jamie earned a Bachelor of Science in Computer Science from the University of Washington.
•	Their degree program focused heavily on enterprise software architecture, compiler design, and systems-level programming.
'''
],
1:['''
Elena Vance
Summary: Highly accomplished Senior Java Engineer with over nine years of experience designing high-throughput microservices and leading digital transformation projects in the fintech sector.
Experience
•	At FinTech Global, Elena led the transition of a monolithic payment system into a Spring Boot microservices architecture, achieving a 50% improvement in deployment frequency.
•	She serves as a technical lead, mentoring a team of eight developers while maintaining 99.99% uptime for services processing over two million daily transactions.
Education
•	Elena holds a Master of Science in Computer Science from the Massachusetts Institute of Technology (MIT).
•	Her graduate research focused on distributed systems and advanced cryptography for secure financial transactions.
''',
'''
Marcus Chen
Summary: Expert Java Developer with seven years of experience specializing in the Spring ecosystem and high-performance backend systems for enterprise-scale platforms.
Experience
•	At RetailOne, Marcus designed and optimized RESTful APIs using Spring Boot and Hibernate, reducing average query response times by 30% through advanced caching strategies.
•	He managed the integration of Kafka for real-time inventory updates and collaborated with the DevOps team to implement automated CI/CD pipelines.
Education
•	Marcus earned a Bachelor of Science in Software Engineering from the University of Waterloo.
•	His degree included a heavy emphasis on systems design, multithreaded programming, and relational database management.
''',
'''
Sarah Jenkins
Summary: Capable Backend Developer with four years of experience focused on building robust Java applications and implementing secure data protection protocols.
Experience
•	Working at HealthSync, Sarah developed core backend modules using Spring Boot and Spring Security to ensure HIPAA compliance across all medical data services.
•	She successfully migrated several legacy components to Spring Data JPA and contributed to the development of an internal developer portal to streamline API documentation.
Education
•	Sarah graduated with a Bachelor of Science in Computer Science from Purdue University.
•	She also holds an Oracle Certified Professional (OCP) Java SE 11 Programmer certification, demonstrating deep knowledge of the language's core features.
''',
'''
David Miller
Summary: Seasoned Software Developer with 12 years of experience primarily focused on enterprise-level J2EE applications and monolithic architecture within the insurance industry.
Experience
•	At Legacy Life Insurance, David spent over a decade maintaining and scaling large-scale Java EE applications running on IBM WebSphere.
•	While he has deep expertise in Core Java and SOAP-based web services, his experience with modern Spring Boot microservices and cloud-native development is limited to recent internal pilot projects.
Education
•	David holds a Bachelor’s degree in Management Information Systems from the University of Florida.
•	His education focused on the intersection of business logic and software systems, with additional certifications in IBM middleware technologies.
''',
'''
Priya Sharma
Summary: High-potential Junior Developer with a strong academic foundation in Java and a passion for learning modern microservices architecture and cloud technologies.
Experience
•	During a six-month internship at CloudPath, Priya assisted in the development of basic Java services and gained hands-on experience with JUnit testing and Git version control.
•	She developed a personal project—a library management system—using Spring Boot and MySQL, which she currently maintains on GitHub to showcase her coding standards.
Education
•	Priya recently earned her Bachelor of Science in Computer Science from the University of Illinois Urbana-Champaign.
•	Her academic career was highlighted by a senior capstone project that utilized Java to simulate complex network traffic patterns.
''']}

In [8]:
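for i, job in enumerate(jobs):
  # Concatenate every résumé submitted for job i into one block of text
  candidates = ''.join(a + '\n' for a in apps[i])
  # Build a single prompt: the job description followed by all candidates
  prompt = ("FOR THE FOLLOWING POSITION: " + job +
            " \n WHICH CANDIDATE BELOW IS BEST QUALIFIED? PLEASE ANSWER CONCISELY." + candidates)
  print(prompt)
  # Ask the model to pick the best-qualified candidate for this job
  response = client.models.generate_content(model=GCP_MODEL, contents=prompt)
  print("\n =====> :\n", response.text)
  print('------------------------------------------------')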

FOR THE FOLLOWING POSITION: 
Senior Python Developer
Location: [Remote/Hybrid/City]
Type: Full-time
About the Role
We are looking for a Senior Python Developer to join our core engineering team. In this role, you will be responsible for building high-quality, scalable backend services and integrating complex data systems. You’ll work closely with our data scientists and frontend teams to turn complex requirements into elegant, functional code.
Key Responsibilities
•	Architect & Build: Design and maintain scalable, low-latency, and highly available Python applications.
•	API Design: Build and document RESTful or GraphQL APIs using frameworks like FastAPI or Django.
•	Integration: Connect backend services with external web services and third-party APIs.
•	Optimization: Identify and fix performance bottlenecks and bugs.
•	Mentorship: Conduct code reviews and help junior developers grow their technical skills.
Required Skills & Qualifications
•	Expert Python Knowledge: 5+ years of experienc