<a href="https://colab.research.google.com/github/rahulsinghai/Akka-Cookbook/blob/master/AI_deepseek_resume.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Load a CV or resume from a Microsoft Word file and convert it to a Markdown string

!pip install python-docx
import docx
import re

def extract_text_from_docx(file_path):
    """Extracts text from a Microsoft Word (.docx) file, handling tables.
    Args:
        file_path (str): The path to the .docx file.
    Returns:
        str: The extracted text from the document as a single string.
    """
    doc = docx.Document(file_path)
    full_text = []
    for para in doc.paragraphs:
        full_text.append(para.text)
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                full_text.append(cell.text)
    return '\n'.join(full_text)

def convert_text_to_markdown(text):
    """Converts plain text to markdown format with basic structure.
    Args:
        text (str): The input text as a single string.
    Returns:
        str: The converted text in markdown format.
    """
    markdown_text = ""
    lines = text.split('\n')
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.isupper():  # If the line is ALL CAPS, make it a heading
            markdown_text += f"\n## {line}\n"
        elif re.match(r'^- ', line):  # If the line starts with a dash, treat it as a list item
            markdown_text += f"* {line[2:].strip()}\n"
        else:
            markdown_text += f"{line}\n"
    return markdown_text

def convert_word_to_markdown(file_path):
    """Loads a .docx file, extracts text, and converts it to markdown.
    Args:
        file_path (str): The path to the .docx file.
    Returns:
        str: The converted text in markdown format, or None if there is an error
    """
    try:
        extracted_text = extract_text_from_docx(file_path)
        markdown_text = convert_text_to_markdown(extracted_text)
        return markdown_text
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

# Example usage:
# Ensure that the Word file has been uploaded to Colab
file_path = 'your_resume.docx'  # Replace with your .docx file name

markdown_output = convert_word_to_markdown(file_path)
if markdown_output:
    print(markdown_output)

    # Optionally save the markdown output to a file
    with open("output.md", "w", encoding="utf-8") as f:
        f.write(markdown_output)
    print("Markdown output saved to output.md")
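
The heading and list heuristics in `convert_text_to_markdown` can be checked on a small string without a `.docx` file. The sketch below restates the same rules in a compact, self-contained form; the name `text_to_markdown` and the `sample` text are illustrative and not part of the notebook.

```python
import re

def text_to_markdown(text):
    """Compact restatement of the rules used by convert_text_to_markdown."""
    out = []
    for raw in text.split("\n"):
        line = raw.strip()
        if not line:
            continue  # blank lines are dropped
        if line.isupper():
            out.append(f"\n## {line}\n")  # ALL-CAPS line becomes a level-2 heading
        elif re.match(r"^- ", line):
            out.append(f"* {line[2:].strip()}\n")  # "- item" becomes "* item"
        else:
            out.append(f"{line}\n")  # everything else is kept as plain text
    return "".join(out)

# Illustrative input (an assumption, not taken from a real document)
sample = "PROFESSIONAL SUMMARY\n\n- 20 years of experience\nBased in the UK"
result = text_to_markdown(sample)
print(result)
```

One consequence of the `isupper()` rule is that short lines made up only of acronyms, such as `AWS, GCP`, are also promoted to headings, so the output may need a manual pass.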


In [3]:
from openai import OpenAI
from google.colab import userdata

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key=userdata.get('deepseekapikey'),
)

md_resume = """
# Rahul Singhai

## ML Platform | Data Platform | Data Engineering

**Email:** singrahu@gmail.com
**Mobile:** +44-(0)744-896-3495

---

## Professional Summary

- 20 years of experience in:
  - Data Platforms, Data Engineering & Analytics, Big Data & Data Warehouse as designer & developer roles
  - Large-scale distributed processing architectures: CEP, enterprise cache, low-latency data platforms
  - Enterprise search engine servers using Elasticsearch, Splunk, and SolrCloud
  - Developing technical specifications, architectures based on business requirements and functional analysis
  - Coordinating with product owners, functional analysts, data providers, and vendors
- 3 years of experience as Scrum Master, planning & running sprints, backlogs, iterations & daily ceremonies

---

## Educational Qualifications

- **Master of Technology (Information Technology)**, IIT Bombay (2005-07) - **CPI:** 9.54/10
- **Bachelor of Engineering (Computer Science & Eng.)**, HCET Jabalpur (2000-04) - **80.72%**

---

## Technologies

**ML Platform:** AWS SageMaker, Feast, Qdrant, Kubeflow, PyFlink
**Data Platform:** BigQuery, Snowflake, Spark, Kafka, Streams, Glue Schema Registry, Elasticsearch, Logstash, Kibana, Looker, Postgres, MySQL
**Data Engineering:** Airflow, Airbyte, DataHub, Soda Core, dbt
**Cloud:** AWS, GCP, Azure, K8s, Docker, Openshift, Rancher, EKS, Route53, CloudFormation, Rekognition, Cognito, Beanstalk, CloudWatch
**Languages:** Python, Java, Scala, Akka, JRuby, C++, Shell Scripting
**DevOps:** ArgoCD, Terragrunt, Terraform, Helm Chart, Rundeck, Ansible, Jenkins, Gradle, SBT, Maven
**Source Code Analyzers:** SonarQube, Checkmarx, Fortify, Veracode, FindBugs, SCCT, PMD

---

## Certifications & Courses

- **GCP Professional Cloud Architect**, 2023
- **Airflow Developer**, 2023
- **AWS Developer**, 2018
- **Elasticsearch Developer**, 2016
- **Big Data Analysis with Scala and Spark**, 2017
- **Reactive Programming**, 2015
- **Functional Programming Scala**, 2014
- **Python**, 2014
- **Java**, 2005
- **IBM DB2**, 2005

---

## Professional Experience

### **Viator (TripAdvisor) – Principal Data Engineer**
**Duration:** Nov 2023 – Present
**Project:** ML & Data Platform, Data Engineering, Data Analytics

#### Key Responsibilities and Achievements:

- **Strategic Contributions:**
  - Defined ML platform & data platform vision, roadmaps, and long-term strategy
  - Led hiring processes, onboarding senior engineers, and refining technical interview standards

- **Machine Learning Platform Implementation:**
  - Designed PoCs for ML platforms using SageMaker, Feast, PyFlink, Kubeflow, MLflow, and Seldon Core
  - Tested Snowflake & BigQuery for offline feature extraction, Qdrant & ScyllaDB for online feature extraction

- **Data Platform Modernization:**
  - Proposed and gained approval for transitioning to Lakehouse architecture leveraging Apache Iceberg
  - Introduced Medallion patterns for structured data workflows

- **Cross-Team Collaboration:**
  - Provided architectural direction for multiple Scrum teams
  - Integrated real-time data ingestion workflows into dashboards and reports
  - Enhanced observability and governance using Acryl DataHub

- **Mentorship and Knowledge Sharing:**
  - Conducted mentorship sessions, structured knowledge-sharing initiatives, and technical documentation

- **Data Engineering Expertise:**
  - Optimized data pipelines, reducing latencies by 25%
  - Designed high-performance pipelines supporting petabyte-scale workloads using Flink, Kafka, and Airflow
  - Ensured data accuracy and reliability using Soda Core and Flink validation

---

### **Santander Bank, UK – Technical Lead (Contract)**
**Duration:** Dec 2017 – Nov 2023
**Project:** Visibility & Observability Platform

#### Key Responsibilities:

- Designed & developed distributed processing architectures for real-time ingestion, monitoring, and analytics
- Led enterprise-wide data fabric initiatives including data streaming, governance, DevOps & cloud transformations
- Managed Hadoop, Spark, Elasticsearch & Kafka clusters
- Migrated RDBMS to NoSQL using Kafka Streams for real-time CDC
- Built real-time anomaly detection and forecasting using machine learning models
- Developed data pipelines using Airflow, Logstash, NiFi & Flume
- Led data governance efforts using Apache Atlas & ILM policies for data retention

---

### **JD Sports, UK – Lead Engineer (Contract)**
**Duration:** Jul 2021 – Jun 2022
**Project:** Server-side CAPI

#### Key Responsibilities:

- Architected Event Driven Architecture for integrating customer events from 55+ websites/apps to Meta, TikTok, Snapchat & Awin
- Built Kafka pipelines with Schema Registry for validation and evolution
- Developed CI/CD automation using Terraform, Jenkins & ArgoCD
- Configured Kubernetes workloads, secrets, load balancers & Helm charts
- Implemented monitoring & alerting with Prometheus, Grafana, and New Relic

---

### **Barclays Bank, UK – Senior BigData Tech Lead**
**Duration:** Apr 2015 – Dec 2017
**Project:** Fusion (Chief Security Office)

#### Key Responsibilities:

- Managed Big Data security & risk analytics initiatives, reporting to the Director of Chief Data Office
- Led design & implementation of Hadoop, Spark, Kafka & Elasticsearch clusters
- Developed real-time fraud detection systems using Spark Streaming
- Built ETL/data ingestion pipelines using Akka streaming, NiFi & Logstash
- Served as Scrum Master, establishing Agile best practices and process improvements

---

### **Previous Roles**

- **Anomaly42, UK – Scala Developer (Feb 2015 – Apr 2015)**
- **DNV GL, UK – Full Stack Developer (Oct 2012 – Jan 2015)**
- **Creative Virtual, UK – Veracode Security Analyst (Sep 2012 – Oct 2012)**
- **SAP (Sybase) – Senior Software Engineer (Aug 2007 – Sep 2012)**
- **Tata Consultancy Services / Morgan Stanley – J2EE Developer (Oct 2004 – Jul 2005)**

---

## Patents

- **“Methods & Systems for Monitoring Server Cloud”**, US Patent 1933.1280000, 2010
- **“Dynamically Injecting Behaviour in Flex View Components”**, US Patent 1933.1320000, 2010

---

## Publications

- **“Intelligent Vehicular Transportation System (InVeTraS)”**, ATNAC 2007, Christchurch, New Zealand
- **“IntelliCarTS: Intelligent Car Transportation System”**, 15th IEEE LANMAN 2007, Princeton NJ, USA

---

## Awards & Achievements

- **Employee of the Month** – Barclays & Sybase
- **Boss of SOC** – Splunk Live, June 2017
- **Outstanding Achievement Award** – Barclays, Q1 2016
- **Innovator Award** – Sybase, 2011

---

"""


job_description = """
Head of Data Platform

THE ROLE

Head of Platform Data

The Head of Platform Data will work alongside the VP(s) of Product Engineering to create a world-class SaaS platform experience for our clients. The frameworks and solutions they shape will be responsible for driving real-time, embedded platform insights to expand on our extreme observability strategy for our treasury clients. This role is not about traditional Business Intelligence (BI). Instead, it focuses on harnessing data in real time to influence customer/platform behaviour to ensure that the right money is in the right place at the right time.

Our SaaS platform is based on .Net, hosted in Azure. We use Databricks on AWS for our current BI-focused data warehouse.

Applicants with the following systems experience will be preferred:



YOUR RESPONSIBILITIES

Real-Time Data Strategy:
Develop and implement a comprehensive strategy for real-time data processing and analysis to support our product development teams as they work to improve our delivered platform experience with relevant real-time treasury and payment insights.
Collaborate with product squads to embed data insights directly into the platform, enabling real-time decision-making for end-users.
Lead initiatives to integrate advanced analytics and machine learning algorithms into the platform for predictive insights.
Technology Stack Leadership:
Stay abreast of the latest technologies and tools in data analytics, ensuring the platform's data infrastructure remains cutting-edge and scalable.
Collaborate with the engineering team to implement and optimize data pipelines for real-time data processing.
Cross-Functional Collaboration:
Work closely with business stakeholders, product managers, UX/UI designers, and software engineers to understand business requirements and translate them into actionable data insights to support our merchants in their day-to-day interaction with our platform.
Foster a collaborative culture that values data-driven decision-making across all departments.
Performance Monitoring and Optimization:
Deliver production-quality, robust data platforms and pipelines to ensure that our data products are available 24x7 to our global customers.
Establish KPIs for platform performance and user engagement, implementing monitoring systems to track and analyze these metrics in real-time.
Develop strategies to optimize platform performance based on real-time insights and user behaviour.


WHAT YOU NEED :)

Hands-on knowledge and experience of relevant embedded data frameworks. These are examples, but we are keen to hear from candidates who have strong opinions on what a great stack will deliver:
Real-Time Data: Apache Kafka, Flink, Confluent, Redis
Fast Analytics: Amazon Redshift, Azure Functions
ETL: AWS Glue, Azure Data Factory
Data warehouse: Databricks on AWS
Demonstrated success in implementing real-time data solutions and leveraging them to enhance user experiences.
Proven experience (4+ years) in an engineering leadership role, ready to build a small team of experts that will provide enabling frameworks to our product squads.
Excellent stakeholder management and ability to work with entrepreneurs, product owners, sales, and clients.
"""

# prompt (assuming md_resume and job_description have been defined)
prompt = f"""
I have a curriculum vitae (CV) formatted in Markdown and a job description. \
Please generate a two-page resume adapting my CV to better align with the job requirements while \
maintaining a professional tone. Tailor my skills, experiences, and \
achievements to highlight the most relevant points for the position. \
Ensure that my resume still reflects my unique qualifications and strengths \
but emphasizes the skills and experiences that match the job description.

### Here is my resume in Markdown:
{md_resume}

### Here is the job description:
{job_description}

Please generate the resume to:
- Use keywords and phrases from the job description.
- Adjust the bullet points under each role to emphasize relevant skills and achievements.
- Make sure my experiences are presented in a way that matches the required qualifications.
- Maintain clarity, conciseness, and professionalism throughout.

Return the created resume in Markdown format.
"""

# make api call
response = client.chat.completions.create(
  extra_headers={
    "HTTP-Referer": "<YOUR_SITE_URL>", # Optional. Site URL for rankings on openrouter.ai.
    "X-Title": "<YOUR_SITE_NAME>", # Optional. Site title for rankings on openrouter.ai.
  },
  model="deepseek/deepseek-r1:free",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
  ],
  temperature=0.25,
)

# Extract the generated resume from the response
resume = response.choices[0].message.content

# Save the generated Markdown to a file
with open('resume.md', 'w', encoding='utf-8') as f:
    f.write(resume)

KeyboardInterrupt: 
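
Chat models sometimes wrap a Markdown reply in a code fence, which would then appear literally in `resume.md` and in the PDF. The sketch below shows one way to strip a single wrapping fence before saving. It assumes the reply is a string or `None`; `strip_markdown_fence` is a hypothetical helper, not part of the OpenAI client or of OpenRouter.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

def strip_markdown_fence(reply):
    """Remove one wrapping code fence around the whole reply, if present."""
    if reply is None:
        return ""  # treat a missing reply as empty text
    text = reply.strip()
    pattern = r"^" + FENCE + r"[a-zA-Z]*\n(.*?)\n" + FENCE + r"$"
    match = re.match(pattern, text, flags=re.DOTALL)
    # Only the outermost fence is removed; fences inside the reply are kept
    return match.group(1) if match else text

# Illustrative replies (assumptions, not real model output)
fenced = FENCE + "markdown\n# Rahul Singhai\n## Summary\n" + FENCE
plain = "# Rahul Singhai"
print(strip_markdown_fence(fenced))
print(strip_markdown_fence(plain))
```

In the cell above, this could be applied as `resume = strip_markdown_fence(response.choices[0].message.content)` before writing the file.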

In [2]:
# prompt: convert the Markdown text in the "resume" variable to HTML and then to PDF using the markdown and pdfkit libraries, respectively.

!pip install markdown
!pip install pdfkit
import markdown
import pdfkit

# Assuming the 'resume' variable contains the markdown text
html_resume = markdown.markdown(resume)

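# pdfkit is a wrapper around the wkhtmltopdf binary, which must be installed separately
# (on Colab this may require, for example, `!apt-get install -y wkhtmltopdf`).
# Configure pdfkit to use a specific wkhtmltopdf executable path if needed
# config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf')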

# Convert HTML to PDF
pdfkit.from_string(html_resume, 'resume.pdf')  # pass configuration=config if it was set above
print("resume.pdf has been created")



NameError: name 'resume' is not defined

Collecting pdfkit
  Downloading pdfkit-1.0.0-py3-none-any.whl.metadata (9.3 kB)
Downloading pdfkit-1.0.0-py3-none-any.whl (12 kB)
Installing collected packages: pdfkit
Successfully installed pdfkit-1.0.0
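
The `NameError` in the output above is raised when this cell runs before `resume` has been assigned. Since the generation cell also writes `resume.md`, one option is to reload the Markdown from that file and fail with a clearer message if it is missing. The sketch below also wraps the converted HTML in a full document that declares UTF-8, because the resume contains non-ASCII characters such as `–` that can render incorrectly when no encoding is declared. `load_markdown` and `wrap_html` are hypothetical helpers, and the temporary file only stands in for `resume.md`.

```python
import html
import tempfile
from pathlib import Path

def load_markdown(path="resume.md"):
    """Read previously saved Markdown; raise a clear error if it is missing."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"{path} not found; run the generation cell first")
    return file.read_text(encoding="utf-8")

def wrap_html(body_html, title="Resume"):
    """Wrap an HTML fragment in a full document that declares UTF-8."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body_html}\n</body></html>"
    )

# Demonstration with a temporary file standing in for resume.md
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "resume.md"
    demo_path.write_text("# Rahul Singhai\nNov 2023 – Present\n", encoding="utf-8")
    demo_markdown = load_markdown(demo_path)

demo_html = wrap_html("<h1>Rahul Singhai</h1>")
print(demo_html)
```

In the notebook, the result of `wrap_html(markdown.markdown(load_markdown()))` could be passed to `pdfkit.from_string` in place of the bare fragment.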

