### 🧠 **Automated Resume Parser**
**Project Goal:** Automatically extract structured information (name, skills, education, etc.) from resumes (PDF/DOCX) and store it in a PostgreSQL database.

#### 📌 **Technologies Used**
- Python
- spaCy
- pdfplumber
- python-docx
- Flask (for optional API integration)
- PostgreSQL (accessed through psycopg2)

In [None]:
import spacy
import pdfplumber
import docx
import psycopg2
import re

### 📄 **Function to Extract Text from PDF**

In [None]:
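def extract_text_from_pdf(file_path):
    """Return the text of every page in a PDF, joined by newlines."""
    pages = []
    with pdfplumber.open(file_path) as pdf:
        for page in pdf.pages:
            # extract_text() may return None or "" for pages with no extractable text (e.g. scans)
            pages.append(page.extract_text() or "")
    return "\n".join(pages)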

### 📄 **Function to Extract Text from DOCX**

In [None]:
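def extract_text_from_docx(file_path):
    """Return the body paragraphs of a DOCX file, joined by newlines."""
    doc = docx.Document(file_path)
    # doc.paragraphs covers body paragraphs only; text inside tables is not included
    return "\n".join(para.text for para in doc.paragraphs)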

### 🧠 **NLP-Based Info Extraction using spaCy**

In [None]:
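# Small English pipeline; its NER labels include PERSON and ORG but nothing for education or skills
nlp = spacy.load("en_core_web_sm")

def extract_entities(text):
    doc = nlp(text)
    # EDUCATION and SKILLS start empty here and must be filled by separate steps
    entities = {"PERSON": [], "ORG": [], "EDUCATION": [], "SKILLS": []}

    for ent in doc.ents:
        # Keep only the two spaCy labels this parser uses
        if ent.label_ in ("PERSON", "ORG"):
            entities[ent.label_].append(ent.text)
    return entities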
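
The `EDUCATION` list returned by `extract_entities` is never filled by the spaCy step, because `en_core_web_sm` has no education label. One possible way to populate it is a keyword match on degree names. The sketch below is an assumption, not part of the original notebook: the function `extract_education`, the `DEGREE_KEYWORDS` list, and the sample text are all hypothetical.

```python
import re

# Hypothetical, deliberately short keyword list; real resumes need many more variants.
DEGREE_KEYWORDS = ["Bachelor", "Master", "B.Sc", "M.Sc", "B.Tech", "M.Tech", "MBA", "PhD"]

def extract_education(text, degree_keywords=DEGREE_KEYWORDS):
    """Return the lines of `text` that mention at least one degree keyword."""
    # Lookarounds require that no word character touches the keyword on either side,
    # so "Master" does not match inside "Mastered".
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in degree_keywords) + r")(?!\w)",
        re.IGNORECASE,
    )
    education = []
    for line in text.splitlines():
        line = line.strip()
        if line and pattern.search(line) and line not in education:
            education.append(line)
    return education

# Made-up sample text for illustration only
sample = "Jane Doe\nB.Sc in Computer Science, Example University\nSkills: Python, SQL"
print(extract_education(sample))
# -> ['B.Sc in Computer Science, Example University']
```

The result could be assigned to `entities["EDUCATION"]` in the same way the skills list is assigned later in the notebook. Returning whole lines keeps the institution next to the degree, at the cost of some noise.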

### ⚙️ **Custom Skill Extraction (Regex/Keyword Match)**

In [None]:
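def extract_skills(text, skill_set):
    """Return the skills from skill_set that appear in text (case-insensitive)."""
    skills_found = []
    for skill in skill_set:
        # \b needs a word character on one side, so a skill ending in a symbol (e.g. "C++") is usually missed
        if re.search(rf"\b{re.escape(skill)}\b", text, re.IGNORECASE):
            skills_found.append(skill)
    # Sort the de-duplicated result so the output order is deterministic
    return sorted(set(skills_found))

# Small illustrative keyword list; a real parser would need a much larger vocabulary
skills_db = ['Python', 'Machine Learning', 'SQL', 'Flask', 'NLP']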
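
The `\b` anchors in `extract_skills` only behave as intended when a skill starts and ends with a word character. A sketch of a variant that uses lookarounds instead, so that entries such as `C++` can match as well, might look like this. The name `extract_skills_lookaround` and the sample text are illustrative assumptions.

```python
import re

def extract_skills_lookaround(text, skill_set):
    """Case-insensitive keyword match that tolerates skills ending in symbols."""
    skills_found = []
    for skill in skill_set:
        # (?<!\w) and (?!\w): no word character directly before or after the skill
        pattern = rf"(?<!\w){re.escape(skill)}(?!\w)"
        if re.search(pattern, text, re.IGNORECASE) and skill not in skills_found:
            skills_found.append(skill)
    return skills_found

# Made-up sample text for illustration only
sample_text = "Built NLP pipelines in Python and C++; some SQL reporting."
print(extract_skills_lookaround(sample_text, ["Python", "C++", "SQL", "Flask", "NLP"]))
# -> ['Python', 'C++', 'SQL', 'NLP']
```

This version keeps the order of `skill_set` in its result, and it still rejects partial matches such as `Java` inside `JavaScript`.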

### 🛢️ **Store Extracted Data in PostgreSQL**

In [None]:
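def store_in_db(data):
    # Connection settings are placeholders; keep real credentials out of source code
    conn = psycopg2.connect(
        dbname="resume_db",
        user="postgres",
        password="your_password",
        host="localhost",
        port="5432"
    )

    # The resumes table must already exist with these four columns
    insert_query = """
    INSERT INTO resumes (name, skills, education, organization)
    VALUES (%s, %s, %s, %s)
    """
    # spaCy may find no PERSON entity; store NULL instead of raising IndexError
    name = data['PERSON'][0] if data['PERSON'] else None

    try:
        # "with conn" commits on success and rolls back on error; it does not close the connection
        with conn, conn.cursor() as cursor:
            cursor.execute(insert_query, (name,
                                          ", ".join(data['SKILLS']),
                                          ", ".join(data['EDUCATION']),
                                          ", ".join(data['ORG'])))
    finally:
        conn.close()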
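
The `INSERT` statement in `store_in_db` assumes that a `resumes` table already exists. The notebook does not show its definition, so the following is only a minimal sketch of a schema that would accept those four columns. The column types, the surrogate `id` key, and the helper `create_resumes_table` are assumptions. The helper takes any open DB-API connection, such as one returned by `psycopg2.connect`.

```python
# Assumed schema: column names come from the INSERT statement, the types are a guess.
CREATE_RESUMES_TABLE = """
CREATE TABLE IF NOT EXISTS resumes (
    id SERIAL PRIMARY KEY,
    name TEXT,
    skills TEXT,
    education TEXT,
    organization TEXT
)
"""

def create_resumes_table(conn):
    """Create the resumes table if it is missing, using an open DB-API connection."""
    cursor = conn.cursor()
    try:
        cursor.execute(CREATE_RESUMES_TABLE)
        conn.commit()
    finally:
        cursor.close()
```

Storing skills and organizations as comma-joined text matches what `store_in_db` writes. A normalized design with separate tables would make it easier to query by skill, but that goes beyond what the notebook does.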

### ✅ **End-to-End Execution**

In [None]:
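file_path = "sample_resume.pdf"  # or a .docx file

# Pick the extractor that matches the file extension
if file_path.lower().endswith(".docx"):
    text = extract_text_from_docx(file_path)
else:
    text = extract_text_from_pdf(file_path)

entities = extract_entities(text)
entities["SKILLS"] = extract_skills(text, skills_db)

store_in_db(entities)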

### 🔍 **Output Example**

In [None]:
print(entities)