<div style="font-size:18px; background-color: #4094f7; border: 1px solid #E1AFD1; padding: 15px; border-radius: 8px">
  <b>Ready to unlock the secrets of Gemini? Let's embark on your API key journey 👀! </b>
</div>


<a href="https://ibb.co/xhg80bN">
    <img src="https://i.ibb.co/6YtJxSk/Build-with-Gemini-dk-16-9-1-width-1200-format-webp.webp" alt="gemini" style="width: 100%; max-width: 900px; border: 0;">
</a>

# <span style="color: #4094f7; font-family:'Handlee', cursive; font-size: 24px; "> Welcome to this notebook ✨ </span>

<div style="font-family: 'Arial', cursive; font-size: 15px; color: #FFAACF;">
  In this notebook we explore the Gemini API for AI and data science interview preparation. You'll see how it can support your interview practice with questions matched to your experience level and drawn from real interview scenarios. Each question comes with a generated answer and recommended courses or articles so you can keep building your skills.<br><br>
<span style="color: #4094f7; font-family: 'Handlee', cursive; font-size: 20px; ">    Enjoy the Ride 🔍! </span>
</div>


# <span style="color:#4094f7; font-size: 22px; ">01. Install required libraries</span>

<div style="color: #FFAACF; font-size: 16px;">
    
The langchain library supports building applications powered by language models by providing tools for chaining large language models together with other services and utilities. <br> 
    
The sentence_transformers library embeds sentences into high-dimensional vectors using pre-trained models, enabling tasks such as semantic search, clustering, and classification.
</div>
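
To make the idea of semantic search over sentence embeddings concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are hand-written stand-ins for real embeddings (an assumption for illustration only; a pre-trained sentence_transformers model would produce much longer vectors), and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, not part of either library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, corpus):
    """Return corpus sentences sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus.items()]
    return [text for _, text in sorted(scored, reverse=True)]

# Toy "embeddings": made-up numbers, not the output of a real model.
corpus = {
    "What is overfitting?": [0.9, 0.1, 0.0],
    "What is clustering?": [0.1, 0.9, 0.1],
    "How do you avoid overfitting?": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedding of a query about overfitting
ranking = rank_by_similarity(query, corpus)
print(ranking)
```

Sentences whose vectors point in nearly the same direction as the query rank first, which is the core of embedding-based semantic search.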


In [1]:
!pip install langchain
!pip install sentence_transformers



# <span style="color:#4094f7; font-size: 22px;">02. Import libraries</span>


In [2]:
# Import required libraries
from kaggle_secrets import UserSecretsClient
from IPython.display import Markdown
import textwrap
import pandas as pd
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer
from langchain.prompts import PromptTemplate
import google.generativeai as genai
import torch


# <span style="color:#4094f7; font-size: 22px;">03. Gemini API Key Initialization</span>

<span style="font-size:16px; color:#FFAACF;">
    
    Just follow the steps ↴ 
    
1. Go to [the Google AI Studio](https://ai.google.dev/aistudio)
2. Sign in with your Google account
3. Navigate to the API section
4. Click on "Get API key"
5. Follow the prompts to create a new project or select an existing one
6. Your API key will be generated and displayed 
</span>

In [3]:
# Initialize API key for Google Gemini API
user_secrets = UserSecretsClient()
apiKey = user_secrets.get_secret("GEMINI_API_KEY")
genai.configure(api_key = apiKey)
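
The cell above relies on `kaggle_secrets`, which is only available inside a Kaggle session. As a hedged sketch for running the notebook elsewhere, the hypothetical helper `load_api_key` below first checks an environment variable and only then falls back to Kaggle secrets. The variable name `GEMINI_API_KEY` is reused from the cell above; reading it from the environment is an assumption, not something the original notebook does.

```python
import os

def load_api_key(name="GEMINI_API_KEY"):
    """Return the API key from the environment, falling back to Kaggle secrets."""
    key = os.environ.get(name)
    if key:
        return key
    try:
        # Only importable inside a Kaggle notebook session.
        from kaggle_secrets import UserSecretsClient
    except ImportError:
        raise RuntimeError(
            f"Set the {name} environment variable or add it as a Kaggle secret."
        ) from None
    return UserSecretsClient().get_secret(name)
```

With this helper, the configuration line could be written as `genai.configure(api_key=load_api_key())`.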

# <span style="color:#4094f7; font-size: 22px;">04. Load Dataset </span>

<div style="font-size:16px; color:#FFAACF;">
    
About Dataset:<br>
    
A data science interview is a rigorous process in which candidates are judged on several aspects, such as technical and programming skills, knowledge of methods, and clarity on basic concepts.<br>
    
In this dataset, the questions are grouped into three categories: 'fresh', 'intermediate', and 'senior'.
</div>


In [4]:
import pandas as pd 
# Load the dataset
train_df = pd.read_csv('/kaggle/input/data-science-interview-question-data/datascience_questions.csv')
train_df.shape

(80, 3)


# <span style="color:#4094f7; font-size: 22px;">05. Display Dataset </span>


In [5]:
train_df.head()

|   | Sr. no | Category | Questions |
|---|--------|----------|-----------|
| 0 | 1 | fresh | What Are the Different Types of Machine Learning? |
| 1 | 2 | fresh | What is Overfitting, and How Can You Avoid It? |
| 2 | 3 | fresh | What is ‘training Set’ and ‘test Set’ in a Mac... |
| 3 | 4 | fresh | How Do You Handle Missing or Corrupted Data in... |
| 4 | 5 | fresh | How Can You Choose a Classifier Based on a Tra... |


In [6]:
# Note: the column name in this CSV includes a trailing space ('Category ')
category_list = train_df['Category '].unique().tolist()
print(category_list)

['fresh', 'intermediate', 'senior']


In [7]:
sampled_df = train_df.sample(n=10)

# Iterate through the sampled DataFrame and print questions and categories
for index, row in sampled_df.iterrows():
    print('Question Level:', row['Category '])
    print(row['Questions'])
    print('-' * 60)

Question Level: intermediate
What are Loss Function and Cost Functions? Explain the key Difference Between them?
------------------------------------------------------------
Question Level: senior
What is a Box-Cox transformation?
------------------------------------------------------------
Question Level: senior
Both being Tree-based Algorithms, how is Random Forest different from Gradient Boosting Machine (GBM)?
------------------------------------------------------------
Question Level: fresh
What are Recommender Systems?
------------------------------------------------------------
Question Level: senior
What is ROC Curve and what does it represent?
------------------------------------------------------------
Question Level: fresh
What is Clustering?
------------------------------------------------------------
Question Level: senior
In Machine Learning, for how many classes can Logistic Regression be used?
------------------------------------------------------------
Question Level: 

# <span style="color:#4094f7; font-size: 22px;">06.  Text Cleaning </span>

<span style="font-size:16px; color:#FFAACF;">
    
1. Removing the specified end-of-sentence token.
2. Removing double asterisks (**).
3. Removing the pad token.
4. Replacing double spaces with single spaces.
5. Stripping leading and trailing spaces. 
</span>

In [8]:
def clean_text(txt, EOS_TOKEN):
    """Clean text by removing specific tokens and redundant spaces."""
    txt = (txt
           .replace(EOS_TOKEN, "")  
           .replace("**", "")      
           .replace("<pad>", "") 
           .replace("  ", " ")    
          ).strip()                
    return txt
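
One detail worth noting: a single `.replace("  ", " ")` pass halves each run of spaces, so a run of four spaces still leaves two behind. The variant below is a sketch, not the notebook's own function. It uses a regular expression to collapse any run of spaces to one, and skips the end-of-sentence replacement when no token is given. The name `clean_text_collapsing` is hypothetical.

```python
import re

def clean_text_collapsing(txt, eos_token=""):
    """Remove special tokens and collapse every run of spaces to a single space."""
    if eos_token:
        txt = txt.replace(eos_token, "")   # drop the end-of-sentence token, if any
    txt = txt.replace("**", "")            # drop Markdown bold markers
    txt = txt.replace("<pad>", "")         # drop the pad token
    txt = re.sub(r" {2,}", " ", txt)       # collapse runs of two or more spaces
    return txt.strip()                     # trim leading and trailing whitespace

cleaned = clean_text_collapsing("  Hello    **world**<pad> </s> ", "</s>")
print(cleaned)
```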

# <span style="color:#4094f7; font-size: 22px;">07. Answer Generation </span>

In [9]:
def generate_answer_with_api(question):
    """Generate an answer using the external API."""
    try:
        model = genai.GenerativeModel('gemini-pro')
        response = model.generate_content(question)
        answer = response.text
        # API responses carry no EOS token to strip, so pass an empty string
        return clean_text(answer, '')
    except Exception as e:
        print(f"Error generating answer with API: {e}")
        return "Sorry, there was an error generating the answer."
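
The function above gives up after the first failure. If transient errors such as rate limits or network hiccups are a concern, one option is to retry with exponential backoff. The helper `call_with_retries` below is a hypothetical sketch and not part of the Gemini client. It would need to wrap the raw API call (for example `model.generate_content`), because `generate_answer_with_api` catches exceptions itself. The `flaky` function is a stand-in used only to demonstrate the behavior.

```python
import time

def call_with_retries(func, *args, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ... base_delay
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)

# Stand-in for an API call that fails twice before succeeding.
calls = {"count": 0}

def flaky(question):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return f"answer to: {question}"

# Pass a no-op sleep so the demonstration does not actually wait.
result = call_with_retries(flaky, "What is clustering?", sleep=lambda seconds: None)
print(result)
```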


# <span style="color:#4094f7;  font-size: 22px;">08. Recommendations Generation </span>

<div style="font-size:16px; color:#FFAACF;">
This function uses the Gemini API to recommend three unique, relevant resources for a given technical question. The recommendations are courses or articles that help in understanding the topic.
</div>


In [10]:
def generate_recommendations_with_api(question):
    """Generate exactly three resource recommendations using the external API."""
    try:
        model = genai.GenerativeModel('gemini-pro')
        prompt = ("Based on the following technical question, recommend exactly three unique and relevant "
                  "resources (courses or articles) that would help in understanding the topic, without any repetition. "
                  "For a course, give only the course name and the website that hosts it; do not include links.\n\n"
                  f"Question: {question}")
        response = model.generate_content(prompt) 
        
        return response.text
    except Exception as e:
        print(f"Error generating recommendations with API: {e}")
        return "Sorry, there was an error generating recommendations."
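
The prompt asks for exactly three items, but the model may not comply. A small post-processing step can enforce the limit. The sketch below assumes the response lists one item per line with an optional list marker. That format is an assumption, since the response layout is not guaranteed, and any introductory line the model adds would be counted as an item. The name `split_recommendations` and the sample text are made up for illustration.

```python
import re

def split_recommendations(text, limit=3):
    """Split a model response into at most `limit` non-empty list items."""
    items = []
    for line in text.splitlines():
        # Strip leading list markers such as "1.", "2)", "-", "*", or "•".
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned:
            items.append(cleaned)
    return items[:limit]

# Made-up response text, not real API output.
sample = "1. Course A on Site X\n2. Course B on Site Y\n\n* Article C\n- Article D"
print(split_recommendations(sample))
```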

# <span style="color:#4094f7; font-size: 22px;">09. Smart Interview Generator</span>


In [11]:
def technical_interview(Category):
    """Conduct an interview based on experience level."""
    # Normalize the input Category
    category_mapping = {
        'fresh': ['fresh', 'junior', 'entry level'],
        'intermediate': ['intermediate', 'mid', 'mid level'],
        'senior': ['senior']
    }

    # Find the matching category
    normalized_category = None
    for key, values in category_mapping.items():
        if Category.lower() in values:
            normalized_category = key
            break

    if normalized_category is None:
        print("Invalid Category. Please choose from 'fresh', 'intermediate', or 'senior'.")
        return
    
    # Filter questions based on the normalized Category
    filtered_questions = train_df[train_df['Category '] == normalized_category]
    
    # Sample a number of questions based on the normalized Category
    if normalized_category == 'fresh':
        num_questions = 3
    elif normalized_category == 'intermediate':
        num_questions = 5
    elif normalized_category == 'senior':
        num_questions = 7
    
    selected_questions = filtered_questions.sample(n=min(num_questions, len(filtered_questions)))
    
    # Conduct the interview
    for index, row in selected_questions.iterrows():
        question = row['Questions']
        print(f"Question: {question}")
        
        # Get the answer from the API
        answer = generate_answer_with_api(question)
        print("\nAnswer:")
        print(answer)
        
        # Provide recommended courses or articles
        recommendations = generate_recommendations_with_api(question)
        print("\nRecommended Resources:")
        print(recommendations)
        
        print("\n" + "-"*50 + "\n")
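
The category lookup and the per-level question count in `technical_interview` can also be expressed as plain dictionary lookups. The sketch below reuses the same aliases and counts as the function above. The names `CATEGORY_ALIASES`, `QUESTIONS_PER_LEVEL`, `ALIAS_TO_LEVEL`, and `normalize_category` are hypothetical, and the extra `.strip()` is an addition the original does not perform.

```python
# Aliases accepted for each experience level (same values as category_mapping).
CATEGORY_ALIASES = {
    "fresh": ["fresh", "junior", "entry level"],
    "intermediate": ["intermediate", "mid", "mid level"],
    "senior": ["senior"],
}

# Number of questions asked per level (same values as the if/elif chain).
QUESTIONS_PER_LEVEL = {"fresh": 3, "intermediate": 5, "senior": 7}

# Invert the mapping once so each alias points directly at its canonical level.
ALIAS_TO_LEVEL = {
    alias: level
    for level, aliases in CATEGORY_ALIASES.items()
    for alias in aliases
}

def normalize_category(raw):
    """Return the canonical level for a user-supplied string, or None if unknown."""
    return ALIAS_TO_LEVEL.get(raw.strip().lower())

level = normalize_category("  Junior ")
print(level, QUESTIONS_PER_LEVEL[level])
```

Keeping the aliases and counts in data rather than control flow means a new level or alias only needs a dictionary entry.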

# <span style="color:#4094f7; font-size: 22px;">10. Interview Time  </span>


<div style="font-size:16px; color:#FFAACF;">
   And Now!!<br>
    
Enter your experience level 👀 and get ready for impactful interview questions and answers ✨
</div>


In [12]:
Category = 'junior'  # input("Enter your experience level (e.g., junior, mid, senior): ")
technical_interview(Category)

Question: What is randonm forest and how can you differentiate with other classification algorithm

Answer:
Random Forest Algorithm

Random Forest is a supervised machine learning algorithm that combines multiple decision trees to improve prediction accuracy and prevent overfitting. It works as follows:

1. Bootstrapping: A random sample (with replacement) is drawn from the original training set multiple times.
2. Decision Tree Construction: A decision tree is built using each bootstrap sample. Tree growth is limited to a random subset of features at each split.
3. Voting: Each data point is predicted by each of the decision trees constructed. The majority prediction becomes the final prediction of the Random Forest.

Differentiation from Other Classification Algorithms

1. Support Vector Machines (SVMs)

* Similarities: Both are supervised classification algorithms that can handle non-linear data.
* Differences:
  * SVMs find the optimal decision boundary using a kernel function, whil