# 1. Text Cosine Similarity

Given two strings, compute the cosine similarity between their word frequency vectors. Cosine similarity is defined as:

$$
\text{Cosine Similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}
$$

Here $A \cdot B$ is the dot product of the two word frequency vectors, and $\|A\|$ and $\|B\|$ are their magnitudes.

Write a function to compute the cosine similarity between two strings. If either string is empty, return 0.

In [9]:
# Import libraries
from collections import Counter
import math

def compute_cosine_similarity(str1, str2):
    """Return the cosine similarity of the word frequency vectors of two strings."""
    freq1 = Counter(str1.lower().split())
    freq2 = Counter(str2.lower().split())

    # An empty (or whitespace-only) string has no words, so return 0 as required
    if not freq1 or not freq2:
        return 0

    # Dot product: only words that appear in both strings contribute
    dot_product = sum(freq1[word] * freq2[word] for word in freq1.keys() & freq2.keys())

    # Magnitude of each frequency vector
    mag1 = math.sqrt(sum(v ** 2 for v in freq1.values()))
    mag2 = math.sqrt(sum(v ** 2 for v in freq2.values()))

    # Cosine similarity, rounded to three decimal places
    return round(dot_product / (mag1 * mag2), 3)


# Example usage
str1 = "TikTok is a popular platform"
str2 = "TikTok platform is gaining popularity"
print("Cosine Similarity:", compute_cosine_similarity(str1, str2))


Cosine Similarity: 0.6
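
The printed value can be checked by hand. After lowercasing, each example string contains five distinct words that occur once each, and the two strings share three of them (`tiktok`, `is`, `platform`):

$$
A \cdot B = 3, \qquad \|A\| = \|B\| = \sqrt{5}, \qquad \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{3}{\sqrt{5} \cdot \sqrt{5}} = \frac{3}{5} = 0.6
$$

Note that `popular` and `popularity` are treated as different words, because the function compares exact tokens and does no stemming.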


# 2. Predict Search Query Result Using Logistic Regression

You are given a dataset of user search queries and whether the user clicked on the top result (0 or 1). Write a Python function to train a logistic regression model to predict if a user will click on the top result. Assume the dataset is provided as a CSV file with two feature columns, `query_length` and `word_frequency`, and a binary label column, `click`.

In [10]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def train_logistic_regression(file_path):
    # Load the dataset
    data = pd.read_csv(file_path)
    
    # Features and labels
    X = data[['query_length', 'word_frequency']]
    y = data['click']
    
    # Split the data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Train logistic regression model
    model = LogisticRegression()
    model.fit(X_train, y_train)
    
    # Predict on test set
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    
    # Output coefficients
    coefficients = model.coef_[0]
    intercept = model.intercept_[0]
    
    print(f"Logistic Regression Coefficients: {coefficients}, Intercept: {intercept}")
    print(f"Test Accuracy: {accuracy}")

# Example usage
train_logistic_regression("./data/dataset.csv")


Logistic Regression Coefficients: [-0.31305418  0.90310142], Intercept: 1.3397208660916742
Test Accuracy: 0.6666666666666666
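
To make the printed coefficients easier to interpret, here is a minimal sketch of how a fitted binary logistic regression turns features into a click probability: it applies the sigmoid to the intercept plus the weighted sum of the features. The helper name `click_probability` and the example query values are hypothetical; the coefficients and intercept are copied from the output above.

```python
import math

def click_probability(features, coefficients, intercept):
    """Sigmoid of the linear score: intercept + sum(w_i * x_i)."""
    z = intercept + sum(w * x for w, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

# Values copied from the printed output of train_logistic_regression
coefficients = [-0.31305418, 0.90310142]
intercept = 1.3397208660916742

# Hypothetical query (assumption): query_length = 4, word_frequency = 0.5
p = click_probability([4, 0.5], coefficients, intercept)
print(f"Predicted click probability: {p:.3f}")
```

Under this reading, the negative first coefficient means that longer queries lower the predicted click probability, while the positive second coefficient means that a higher word frequency raises it.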


# 3. Optimize Search Ranking Using Median of User Scores

Write a function that calculates the median relevance score from a list of user ratings for search results. Your task is to sort the list and compute the median efficiently.

In [11]:
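def calculate_median(ratings):
    # Sort a copy so the caller's list is left unchanged
    ordered = sorted(ratings)

    # The median of an empty list is undefined
    n = len(ordered)
    if n == 0:
        raise ValueError("ratings must not be empty")

    # Odd length: middle element; even length: mean of the two middle elements
    if n % 2 == 1:
        return ordered[n // 2]
    else:
        mid = n // 2
        return (ordered[mid - 1] + ordered[mid]) / 2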

# Example usage
ratings = [4.5, 3.0, 5.0, 2.0, 4.0]
print("Median:", calculate_median(ratings))


Median: 4.0
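
As a cross-check, the standard library's `statistics.median` follows the same rule (middle element for odd lengths, mean of the two middle elements for even lengths). The sketch below compares both cases; the even-length list is a hypothetical example, not part of the original data.

```python
import statistics

odd_ratings = [4.5, 3.0, 5.0, 2.0, 4.0]
even_ratings = [4.5, 3.0, 5.0, 2.0]  # hypothetical even-length example

# Odd length: the middle value of the sorted list
odd_median = statistics.median(odd_ratings)
# Even length: the mean of the two middle values, (3.0 + 4.5) / 2
even_median = statistics.median(even_ratings)

print("Odd-length median:", odd_median)    # 4.0
print("Even-length median:", even_median)  # 3.75
```

`statistics.median` does not modify its input, which can be convenient when the original order of the ratings matters.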


# 4. Neural Network Basics

Implement a simple neural network forward pass for a single layer with the following inputs:

    Input features: [0.5, 1.5]
    Weights: [0.6, 0.2]
    Bias: 0.1
    Activation function: Sigmoid

In [12]:
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def forward_pass(inputs, weights, bias):
    # Compute weighted sum
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    
    # Apply activation function
    output = sigmoid(weighted_sum)
    return output

# Example usage
inputs = [0.5, 1.5]
weights = [0.6, 0.2]
bias = 0.1
print("Output of the neural network:", forward_pass(inputs, weights, bias))


Output of the neural network: 0.6681877721681662
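
One caveat with the sigmoid above: `math.exp(-x)` raises `OverflowError` for large negative inputs such as `x = -1000`. A common workaround, sketched here under the hypothetical name `stable_sigmoid`, is to branch on the sign of `x` so that the exponent passed to `math.exp` is never positive.

```python
import math

def stable_sigmoid(x):
    """Sigmoid that avoids overflow by never exponentiating a positive number."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite 1 / (1 + exp(-x)) as exp(x) / (1 + exp(x))
    z = math.exp(x)
    return z / (1.0 + z)

# Same weighted sum as the example: 0.5 * 0.6 + 1.5 * 0.2 + 0.1 = 0.7
print("Stable sigmoid at 0.7:", stable_sigmoid(0.7))
print("Stable sigmoid at -1000:", stable_sigmoid(-1000))
```

For moderate inputs the two versions agree; the difference only shows up at the extremes.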


# 5. Clustering Search Queries

Implement K-means clustering to group similar search queries based on their length and frequency of specific keywords.

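Example dataset: a CSV file with the columns `query_length` and `keyword_frequency` (its rows appear in the output below).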

In [15]:
from sklearn.cluster import KMeans
import pandas as pd

def kmeans_clustering(file_path, k):
    # Load data
    data = pd.read_csv(file_path)
    
    # Train K-means
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(data)
    
    # Assign clusters
    data['Cluster'] = kmeans.labels_
    return data

# Example usage
file_path = "./data/dataset2.csv"
result = kmeans_clustering(file_path, k=2)
print(result)


   query_length  keyword_frequency  Cluster
0             3               0.25        1
1             5               0.15        0
2             4               0.30        0
3             2               0.40        1
4             6               0.10        0
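
To show what happens inside the library call, here is a from-scratch sketch of the basic K-means loop (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. The names `simple_kmeans` and `squared_distance` are hypothetical, and the random initialisation used here is simpler than scikit-learn's default, so cluster labels may differ from the output above.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def simple_kmeans(points, k, max_iterations=100, seed=42):
    """Cluster numeric tuples with Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct points chosen at random
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        # Stop once the assignments no longer change
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Rows taken from the printed table: (query_length, keyword_frequency)
queries = [(3, 0.25), (5, 0.15), (4, 0.30), (2, 0.40), (6, 0.10)]
labels, centroids = simple_kmeans(queries, k=2)
print("Labels:", labels)
print("Centroids:", centroids)
```

Because `query_length` spans a much larger range than `keyword_frequency`, it dominates the distance calculation; scaling the features before clustering is worth considering if both should carry similar weight.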


# 6. Find Missing Number

Given a list that contains every integer from 1 to n except one, find the missing number.

In [1]:
def find_missing_number(nums):
    n = len(nums) + 1
    total = n * (n + 1) // 2
    return total - sum(nums)

# Test the function
nums = [1, 2, 4, 5, 6]
print("Missing number:", find_missing_number(nums))

Missing number: 3
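
The function above subtracts the actual sum from the expected sum n(n + 1) / 2. An alternative sketch, under the hypothetical name `find_missing_number_xor`, uses XOR instead: XOR-ing every integer from 1 to n with every element of the list cancels each number that appears in both, leaving only the missing one. It assumes the same input contract (distinct integers from 1 to n with exactly one absent).

```python
def find_missing_number_xor(nums):
    """Find the missing number in 1..n using XOR cancellation."""
    n = len(nums) + 1
    result = 0
    # XOR in every value the complete range should contain
    for value in range(1, n + 1):
        result ^= value
    # XOR out every value that is actually present
    for value in nums:
        result ^= value
    return result

# Test the function on the same input as above
nums = [1, 2, 4, 5, 6]
print("Missing number (XOR):", find_missing_number_xor(nums))  # 3
```

In languages with fixed-width integers, the XOR form avoids the overflow risk of summing large values; in Python, where integers are arbitrary precision, the two approaches are equally safe.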
