In [1]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


The next cell imports the libraries used for natural language processing (NLP) and text analysis. `pandas` and `numpy` handle data manipulation and numerical operations. `string` supplies the list of punctuation characters. From `nltk` (Natural Language Toolkit) come `word_tokenize` for splitting text into words, `WordNetLemmatizer` for reducing words to their base forms, and `stopwords` for filtering out common words (e.g., "the", "and"). `wordnet` is a lexical database of word meanings and relationships; here it provides the part-of-speech constants the lemmatizer expects. `pos_tag` tags each word with its part of speech (such as noun or verb). `TfidfVectorizer` converts text into numerical features that weight each term by its frequency in a document and its rarity across the collection, and `cosine_similarity` measures how similar two documents are based on these features.

In [2]:
import pandas as pd
import numpy as np
import string
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from nltk.corpus import wordnet
from nltk import pos_tag
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [3]:
df1 = pd.read_excel("/content/drive/MyDrive/LLM phd/Job Description.XLSX")

In [7]:
df1.head()

Unnamed: 0,Job Title,Job Description
0,Business Intelligence (BI) Developer,"Design, develop, and maintain BI solutions by ..."
1,Business Intelligence (BI) Developer,Develop and maintain business intelligence sol...
2,Business Intelligence (BI) Developer,"Design, implement, and maintain BI solutions b..."
3,Business Intelligence (BI) Developer,Develop and optimize BI solutions by analyzing...
4,Business Intelligence (BI) Developer,"Design, develop, and deploy BI solutions by ga..."


In [6]:
df1.head()["Job Description"]

Unnamed: 0,Job Description
0,"Design, develop, and maintain BI solutions by ..."
1,Develop and maintain business intelligence sol...
2,"Design, implement, and maintain BI solutions b..."
3,Develop and optimize BI solutions by analyzing...
4,"Design, develop, and deploy BI solutions by ga..."


### Data preparation

In [8]:
dfp = df1.drop_duplicates(subset='Job Description')
dfp.head()

Unnamed: 0,Job Title,Job Description
0,Business Intelligence (BI) Developer,"Design, develop, and maintain BI solutions by ..."
1,Business Intelligence (BI) Developer,Develop and maintain business intelligence sol...
2,Business Intelligence (BI) Developer,"Design, implement, and maintain BI solutions b..."
3,Business Intelligence (BI) Developer,Develop and optimize BI solutions by analyzing...
4,Business Intelligence (BI) Developer,"Design, develop, and deploy BI solutions by ga..."


In [9]:
dfp.shape

(180, 2)

In [10]:
dfp = dfp.dropna()
dfp.shape

(180, 2)

In [14]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

### Preprocessing and tokenization

The next cell defines a function called `my_tokenizer` that prepares a piece of text for analysis. It first splits the text into individual words and tags each word with its part of speech (such as noun or verb). It then removes common words (stopwords) and punctuation tokens. Finally, it lemmatizes each remaining word, reducing it to its base form (for example, "running" becomes "run"). Because the correct base form depends on how a word is used, the function passes the word’s part of speech to the lemmatizer; tags that are not adjectives, verbs, nouns, or adverbs are treated as nouns. The result is a list of cleaned, lemmatized words ready for further analysis.

In [15]:
stopwords_list = stopwords.words('english')

lemmatizer = WordNetLemmatizer()

def my_tokenizer(doc):
    words = word_tokenize(doc)

    pos_tags = pos_tag(words)

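    # Drop stopwords (case-insensitive match against NLTK's English list)
    non_stopwords = [w for w in pos_tags if w[0].lower() not in stopwords_list]

    # Drop punctuation tokens such as "," or "."
    non_punctuation = [w for w in non_stopwords if w[0] not in string.punctuation]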

    lemmas = []
    for w in non_punctuation:
        if w[1].startswith('J'):
            pos = wordnet.ADJ
        elif w[1].startswith('V'):
            pos = wordnet.VERB
        elif w[1].startswith('N'):
            pos = wordnet.NOUN
        elif w[1].startswith('R'):
            pos = wordnet.ADV
        else:
            pos = wordnet.NOUN

        lemmas.append(lemmatizer.lemmatize(w[0], pos))

    return lemmas

In [18]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [20]:
import nltk
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [24]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...


True
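The tag-to-POS branching inside `my_tokenizer` can also be written as a small lookup. The sketch below is only an illustration: the names `PENN_PREFIX_TO_WORDNET` and `penn_to_wordnet` are hypothetical and not part of the notebook, and it uses the plain letters `'a'`, `'v'`, `'n'`, `'r'`, which are the values of `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN`, and `wordnet.ADV`, so that it runs without any NLTK data.

```python
# Penn Treebank tag prefix -> WordNet POS letter.
# The letters equal NLTK's constants: wordnet.ADJ == 'a', wordnet.VERB == 'v',
# wordnet.NOUN == 'n', wordnet.ADV == 'r'.
PENN_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def penn_to_wordnet(tag, default="n"):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter.

    Tags with any other prefix fall back to noun, as in my_tokenizer.
    """
    return PENN_PREFIX_TO_WORDNET.get(tag[:1], default)


# Hypothetical sample tags, chosen only to exercise each branch
for tag in ["JJ", "VBG", "NNS", "RB", "IN"]:
    print(tag, "->", penn_to_wordnet(tag))
```

With such a helper, the loop in `my_tokenizer` could reduce to a single call like `lemmatizer.lemmatize(word, penn_to_wordnet(tag))` per token.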

### Feature Generation

The next cell creates a `TfidfVectorizer` that uses the custom `my_tokenizer` function and applies it to the job descriptions in the `dfp` DataFrame. `fit_transform` learns the vocabulary from the descriptions and returns a matrix with one row per job description and one column per unique term. Each value is the weight of a term in a description: it rises with the term's frequency in that description and falls the more descriptions contain the term. Finally, `print(tfidf_matrix.shape)` shows the size of the matrix, here 180 job descriptions by 2,373 unique terms.

In [25]:
tfidf_vectorizer = TfidfVectorizer(tokenizer=my_tokenizer)
tfidf_matrix = tfidf_vectorizer.fit_transform(tuple(dfp['Job Description']))
print(tfidf_matrix.shape)

(180, 2373)
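To make the weighting concrete, here is a minimal pure-Python sketch of the calculation on three made-up, already tokenized documents. It assumes the default `TfidfVectorizer` settings (smoothed IDF, `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, followed by L2 normalisation of each row); the function name `tfidf` and the toy data are illustrative only and do not come from the notebook.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (rows, idf) for a list of tokenized documents.

    rows[i] maps each term of document i to its L2-normalised TF-IDF weight.
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

    rows = []
    for doc in docs:
        counts = Counter(doc)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows, idf


# Made-up toy corpus, not taken from the job description data
toy_docs = [
    ["develop", "bi", "dashboard"],
    ["develop", "sql", "query"],
    ["develop", "bi", "report"],
]
rows, idf = tfidf(toy_docs)
print(idf["develop"])  # term in every document -> lowest IDF, 1.0
print(idf["sql"])      # term in one document -> higher IDF
```

A term such as "develop" that appears in every document gets the minimum IDF of 1, while rarer terms receive larger weights, which is why distinctive words dominate the similarity scores computed later.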


In [26]:
import joblib

# Save TF-IDF vectorizer
tfidf_vectorizer_filename = '/content/drive/MyDrive/LLM phd/tfidf_vectorizer_model.joblib'
joblib.dump(tfidf_vectorizer, tfidf_vectorizer_filename)

# Save the TF-IDF matrix of the job descriptions (the reference set for cosine similarity)
tfidf_matrix_filename = '/content/drive/MyDrive/LLM phd/tfidf_matrix_model.joblib'
joblib.dump(tfidf_matrix, tfidf_matrix_filename)

['/content/drive/MyDrive/LLM phd/tfidf_matrix_model.joblib']

In [27]:
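# Load the saved vectorizer and TF-IDF matrix from disk.
# my_tokenizer must already be defined in the session, because the pickled
# vectorizer stores only a reference to it.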
def load_models(tfidf_vectorizer_filename, tfidf_matrix_filename):
    loaded_tfidf_vectorizer = joblib.load(tfidf_vectorizer_filename)
    loaded_tfidf_matrix = joblib.load(tfidf_matrix_filename)
    return loaded_tfidf_vectorizer, loaded_tfidf_matrix

In [31]:
loaded_tfidf_vectorizer, loaded_tfidf_matrix = load_models(tfidf_vectorizer_filename, tfidf_matrix_filename)

In [33]:
def ask_question_using_saved_models(job_description):
    # Vectorize the query with the fitted vocabulary and compare it to every stored description
    query_vect = loaded_tfidf_vectorizer.transform([job_description])
    similarity = cosine_similarity(query_vect, loaded_tfidf_matrix)
    # Row position in dfp of the most similar description
    best_idx = np.argmax(similarity, axis=None)

    print('Your Description of Job:', job_description)
    print('Closest Description found:', dfp.iloc[best_idx]['Job Description'])
    print('Similarity: {:.2%}'.format(similarity[0, best_idx]))
    print('Job Position Recommended:', dfp.iloc[best_idx]['Job Title'])


In [34]:
ask_question_using_saved_models("We are looking for a Business Intelligence (BI) Developer to design, implement, and manage BI solutions that facilitate data-driven decision-making. This role involves translating business requirements into technical specifications, creating and deploying data integration processes and ETL workflows, and developing interactive dashboards and reports using BI tools such as Power BI or Tableau. The BI Developer will also optimize SQL queries for efficient data retrieval, analyze data trends to uncover business opportunities and risks, and collaborate with cross-functional teams to ensure BI solutions align with business needs. A Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field is required, along with strong SQL skills, experience with relational databases, and familiarity with BI tools and data warehousing concepts. The ideal candidate will have excellent analytical, problem-solving, and communication skills, with the ability to work independently and as part of a team.")

Your Description of Job: We are looking for a Business Intelligence (BI) Developer to design, implement, and manage BI solutions that facilitate data-driven decision-making. This role involves translating business requirements into technical specifications, creating and deploying data integration processes and ETL workflows, and developing interactive dashboards and reports using BI tools such as Power BI or Tableau. The BI Developer will also optimize SQL queries for efficient data retrieval, analyze data trends to uncover business opportunities and risks, and collaborate with cross-functional teams to ensure BI solutions align with business needs. A Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field is required, along with strong SQL skills, experience with relational databases, and familiarity with BI tools and data warehousing concepts. The ideal candidate will have excellent analytical, problem-solving, and communication skills, with the abi
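The lookup above relies on cosine similarity followed by an argmax. As a rough illustration of what those two steps compute, the sketch below reimplements them in plain Python on sparse vectors stored as `{term: weight}` dictionaries. The names `cosine` and `best_match` and the sample vectors are hypothetical; the notebook itself uses `sklearn.metrics.pairwise.cosine_similarity` and `np.argmax`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # An all-zero vector has no direction; treat it as similarity 0
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(query, corpus):
    """Return (index, score) of the corpus vector most similar to the query."""
    scores = [cosine(query, doc) for doc in corpus]
    # Like np.argmax, max() returns the first index when scores tie
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Made-up vectors for illustration only
corpus = [
    {"bi": 1.0, "dashboard": 2.0},
    {"sql": 1.0, "query": 1.0},
]
query = {"sql": 3.0, "query": 3.0}
index, score = best_match(query, corpus)
print(index, round(score, 2))
```

Cosine similarity ignores vector length, so the query matches the second document almost perfectly even though its weights are three times larger. Note also that a query with no known terms scores 0 against every document, in which case the argmax simply returns the first row; checking for empty input or a 0% score before trusting the recommendation avoids that.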

### Widget interface for testing

In [53]:
import numpy as np
import ipywidgets as widgets
from IPython.display import display
from sklearn.metrics.pairwise import cosine_similarity

def recommend_job(description):
    if not description:
        return "Please enter a Description of the Job."

    # Transform the input description and compute similarity
    query_vect = loaded_tfidf_vectorizer.transform([description])
    similarity = cosine_similarity(query_vect, loaded_tfidf_matrix)
    max_similarity = np.argmax(similarity, axis=None)

    # Generate the output text
    output_text = f'Closest Description found: {dfp.iloc[max_similarity]["Job Description"]}\n' \
                  f'Similarity: {similarity[0, max_similarity]:.2%}\n' \
                  f'Job Position Recommended: {dfp.iloc[max_similarity]["Job Title"]}'
    return output_text

def on_button_click(b):
    description = text_box.value
    output_text = recommend_job(description)
    output_box.value = output_text

def on_refresh_click(b):
    # Clear the text box and output box
    text_box.value = ""
    output_box.value = ""

# Create widgets
text_box = widgets.Textarea(
    description='Enter Job Description:',
    layout=widgets.Layout(width='70%', height='200px')
)

button = widgets.Button(
    description="Ask for Job Position Recommendation",
    style={'button_color': '#7CF5FF'},
    layout=widgets.Layout(margin='20px 0 20px 0')  # 20px top and bottom margin
)
button.on_click(on_button_click)

output_box = widgets.Textarea(
    description='Output:',
    layout=widgets.Layout(width='70%', height='200px')
)

refresh_button = widgets.Button(
    description="Refresh",
    style={'button_color': '#FFFBE6'},
    layout=widgets.Layout(margin='20px 0 20px 0')  # 20px top and bottom margin
)
refresh_button.on_click(on_refresh_click)

# Display widgets
display(text_box, button, output_box, refresh_button)


Textarea(value='', description='Enter Job Description:', layout=Layout(height='200px', width='70%'))

Button(description='Ask for Job Position Recommendation', layout=Layout(margin='20px 0 20px 0'), style=ButtonS…

Textarea(value='', description='Output:', layout=Layout(height='200px', width='70%'))

Button(description='Refresh', layout=Layout(margin='20px 0 20px 0'), style=ButtonStyle(button_color='#FFFBE6')…

In [63]:
%pip install firebase-admin



In [64]:
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


### Firebase Console Setup

In [65]:
import firebase_admin
from firebase_admin import credentials, db

# Path to your Firebase service account key JSON file
cred = credentials.Certificate('/content/drive/MyDrive/LLM phd/feedback-job-recommendation-firebase-adminsdk-n6zig-3863863b60.json')
firebase_admin.initialize_app(cred, {
    'databaseURL': 'https://feedback-job-recommendation-default-rtdb.firebaseio.com/'
})

# Initialize a reference to the database
ref = db.reference('/feedback')


### Save user feedback

In [80]:
import firebase_admin
from firebase_admin import credentials, db
import ipywidgets as widgets
from IPython.display import display

def initialize_firebase():
    # Check if Firebase is already initialized
    if not firebase_admin._apps:
        cred = credentials.Certificate('/content/drive/MyDrive/LLM phd/feedback-job-recommendation-firebase-adminsdk-n6zig-3863863b60.json')
        firebase_admin.initialize_app(cred, {
            'databaseURL': 'https://feedback-job-recommendation-default-rtdb.firebaseio.com/'
        })
    return db.reference('/feedback')

feedback_ref = initialize_firebase()

def submit_feedback(b):
    user_email = feedback_title_box.value
    user_feedback = feedback_text_box.value

    if user_email and user_feedback:
        feedback_ref.push({
            'user_email': user_email,
            'feedback': user_feedback
        })
        print(f"Feedback for {user_email}: {user_feedback}")
        feedback_title_box.value = ""
        feedback_text_box.value = ""
    else:
        print("Both User email address and Feedback are required.")

feedback_title_box = widgets.Text(
    description='User Email Address:',
    layout=widgets.Layout(width='70%')
)

feedback_text_box = widgets.Textarea(
    description='Feedback:',
    layout=widgets.Layout(width='70%', height='100px')
)

feedback_button = widgets.Button(
    description="Submit Feedback",
    style={'button_color': '#4CAF50'},
    layout=widgets.Layout(margin='20px 0 20px 0')  # 20px top and bottom margin
)
feedback_button.on_click(submit_feedback)

display(feedback_title_box, feedback_text_box, feedback_button)


Text(value='', description='User Email Address:', layout=Layout(width='70%'))

Textarea(value='', description='Feedback:', layout=Layout(height='100px', width='70%'))

Button(description='Submit Feedback', layout=Layout(margin='20px 0 20px 0'), style=ButtonStyle(button_color='#…

Feedback for f: ff


### View previous feedback

In [81]:
import firebase_admin
from firebase_admin import credentials, db
import ipywidgets as widgets
from IPython.display import display

# Initialize Firebase if not already done
def initialize_firebase():
    if not firebase_admin._apps:
        cred = credentials.Certificate('/content/drive/MyDrive/LLM phd/feedback-job-recommendation-firebase-adminsdk-n6zig-3863863b60.json')
        firebase_admin.initialize_app(cred, {
            'databaseURL': 'https://feedback-job-recommendation-default-rtdb.firebaseio.com/'
        })
    return db.reference('/feedback')

# Initialize the feedback reference
feedback_ref = initialize_firebase()

def load_feedback():
    # Retrieve all feedback entries
    feedbacks = feedback_ref.get()

    # Prepare feedbacks for display
    feedback_texts = []
    if feedbacks:
        for key, feedback in feedbacks.items():
            user_email = feedback.get('user_email', 'Unknown')
            feedback_text = feedback.get('feedback', 'No feedback provided')
            feedback_texts.append(f"Email: {user_email}\nFeedback: {feedback_text}\n{'-'*40}")
    else:
        feedback_texts.append("No feedback available.")

    # Update feedback_display_box with the feedbacks
    feedback_display_box.value = "\n".join(feedback_texts)

# Widget to display previous feedbacks
feedback_display_box = widgets.Textarea(
    description='Previous Feedbacks:',
    layout=widgets.Layout(width='70%', height='300px'),
    disabled=True
)

# Load feedbacks when initializing
load_feedback()

# Display the feedback display widget
display(feedback_display_box)


Textarea(value='Email: ssss\nFeedback: aaaaaaaa\n----------------------------------------\nEmail: af\nFeedback…
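The formatting step inside `load_feedback` can be separated from the database call so that it is easy to check without a Firebase connection. The sketch below is one possible way to do that; the function name `format_feedback` and the sample entries are hypothetical, and the input is assumed to have the `{push_id: {"user_email": ..., "feedback": ...}}` shape that the loop in `load_feedback` iterates over.

```python
def format_feedback(feedbacks):
    """Turn a {push_id: entry} mapping into the text shown in the display box."""
    if not feedbacks:
        return "No feedback available."

    blocks = []
    for entry in feedbacks.values():
        email = entry.get("user_email", "Unknown")
        text = entry.get("feedback", "No feedback provided")
        blocks.append(f"Email: {email}\nFeedback: {text}\n{'-' * 40}")
    return "\n".join(blocks)


# Made-up entries for illustration; real keys are generated by push()
sample = {
    "id-1": {"user_email": "a@example.com", "feedback": "Helpful match."},
    "id-2": {"feedback": "Missing email."},
}
print(format_feedback(sample))
```

With this split, `load_feedback` would only need to fetch the data and assign `feedback_display_box.value = format_feedback(feedback_ref.get())`.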

### Testing the app

In [77]:
import firebase_admin
from firebase_admin import credentials, db
import ipywidgets as widgets
from IPython.display import display
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

class FirebaseManager:
    _app_initialized = False
    _db_ref = None

    @staticmethod
    def initialize_firebase():
        if not FirebaseManager._app_initialized:
            try:
                # Initialize Firebase app only if not already initialized
                cred = credentials.Certificate('paste the path of your json file')
                firebase_admin.initialize_app(cred, {
                    'databaseURL': 'paste your database url'
                })
                FirebaseManager._app_initialized = True
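            except ValueError as e:
                # initialize_app raises ValueError when the default app already exists;
                # record that so later calls skip this branch
                FirebaseManager._app_initialized = True
                print("Firebase app already initialized:", e)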

        if FirebaseManager._db_ref is None:
            FirebaseManager._db_ref = db.reference('/job_data')
        return FirebaseManager._db_ref

# Initialize Firebase
job_data_ref = FirebaseManager.initialize_firebase()

def submit_job_data(description, job_title):
    if description and job_title:
        job_data_ref.push({
            'job_description': description,
            'job_title': job_title
        })
        print("Thank You for using our APP")
    else:
        print("Both Job Description and Job Title are required.")

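def ask_question_using_saved_models(job_description):
    # An empty query would produce an all-zero vector and a meaningless 0% match
    if not job_description.strip():
        output_box.value = "Please enter a Description of the Job."
        return

    query_vect = loaded_tfidf_vectorizer.transform([job_description])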
    similarity = cosine_similarity(query_vect, loaded_tfidf_matrix)
    max_similarity = np.argmax(similarity, axis=None)

    closest_description = dfp.iloc[max_similarity]["Job Description"]
    job_title = dfp.iloc[max_similarity]["Job Title"]

    output_prompt = (f'Closest Description found: {closest_description}\n'
                     f'Similarity: {similarity[0, max_similarity]:.2%}\n'
                     f'Job Position Recommended: {job_title}')

    # Save job description and job title to Firebase
    submit_job_data(job_description, job_title)

    # Update output_box with result
    output_box.value = output_prompt

def refresh_fields(b):
    # Clear the text box and output box
    text_box.value = ""
    output_box.value = ""

# Define the widgets
text_box = widgets.Textarea(
    description='Enter Job Description:',
    layout=widgets.Layout(width='70%', height='200px')
)

button = widgets.Button(
    description="Ask for Job Position Recommendation",
    style={'button_color': '#7CF5FF'},
    layout=widgets.Layout(margin='20px 0 20px 0')  # 20px top and bottom margin
)
button.on_click(lambda b: ask_question_using_saved_models(text_box.value))

output_box = widgets.Textarea(
    description='Output:',
    layout=widgets.Layout(width='70%', height='200px')
)

refresh_button = widgets.Button(
    description="Refresh",
    style={'button_color': '#FFFBE6'},
    layout=widgets.Layout(margin='20px 0 20px 0')  # 20px top and bottom margin
)
refresh_button.on_click(refresh_fields)

# Display widgets
display(text_box, button, output_box, refresh_button)


Firebase app already initialized: The default Firebase app already exists. This means you called initialize_app() more than once without providing an app name as the second argument. In most cases you only need to call initialize_app() once. But if you do want to initialize multiple apps, pass a second argument to initialize_app() to give each app a unique name.


Textarea(value='', description='Enter Job Description:', layout=Layout(height='200px', width='70%'))

Button(description='Ask for Job Position Recommendation', layout=Layout(margin='20px 0 20px 0'), style=ButtonS…

Textarea(value='', description='Output:', layout=Layout(height='200px', width='70%'))

Button(description='Refresh', layout=Layout(margin='20px 0 20px 0'), style=ButtonStyle(button_color='#FFFBE6')…

Job data saved successfully.
