# CP Recommendation System


## Content-Based Filtering

Let's start with content-based filtering.
We concatenate each problem's tags into a single string and compute cosine similarity on that text to measure how similar a given problem is to every other problem.
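
To make the idea concrete, here is a minimal pure-Python sketch of cosine similarity over tag strings. The helper name `tag_cosine` and the sample tag strings are illustrative assumptions and not part of this notebook, which relies on scikit-learn for the real computation further down.

```python
import math
from collections import Counter

def tag_cosine(text_a, text_b):
    """Cosine similarity between two whitespace-separated tag strings (hypothetical helper)."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    # Dot product over the tokens that both strings share
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Made-up tag strings: identical, partially overlapping, and disjoint
print(tag_cosine("dp greedy math", "dp greedy math"))
print(tag_cosine("dp greedy", "dp graphs"))
print(tag_cosine("dp greedy", "graphs trees"))
```

The score is close to 1 when two problems carry the same tags, falls as the overlap shrinks, and is 0 when they share none.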


In [5]:
#importing stuff
import requests
import json
from pandas import json_normalize

#load data
tag_set = set()
data = requests.get("https://codeforces.com/api/problemset.problems").json()
problems_json = data["result"]["problems"]

problems = json_normalize(problems_json)
print(problems.isnull().sum())

contestId       0
index           0
name            0
type            0
rating        166
tags            0
points       2099
dtype: int64


As we can see, one of the columns is named "index", which clashes with the DataFrame's own `index` attribute: `problems.index` returns the row index rather than this column. Let's rename the column to "ID".

In [6]:
# Rename by name rather than by position, so the cell does not depend on column order
problems = problems.rename(columns={"index": "ID"})
print(problems.isnull().sum())

contestId       0
ID              0
name            0
type            0
rating        166
tags            0
points       2099
dtype: int64


Now, let's find the different tags that are available on Codeforces.

The code cell below collects every distinct tag and writes the list to an output file.

In [4]:
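# Collect every distinct tag; each entry of the "tags" column is a list of strings.
# (Iterating over the DataFrame itself would yield column names, not rows.)
for tags in problems["tags"]:
    tag_set.update(tags)

# Write the tags to a file, one per line, sorted for a stable order
with open("output.txt", "w") as out:
    for tag in sorted(tag_set):
        print(tag, file=out)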

### Get Data From Users
Now it's time to get data from the users.  
The code below fetches the submissions of one particular user.

In [7]:
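def get_user_submissions(handle):
    # "from" is the 1-based index of the first submission to return
    start, count = 1, 999

    # Pass the query as params so that requests handles the URL encoding
    user_url = "https://codeforces.com/api/user.status"
    params = {"handle": handle, "from": start, "count": count}
    user_data = requests.get(user_url, params=params).json()

    submissions = user_data["result"]
    df = json_normalize(submissions)
    # Show the names of the problems with an accepted ("OK") verdict
    print(df.loc[df["verdict"] == "OK", "problem.name"])
    return df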

# df_submission = get_user_submissions("infnite_coder")
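
A recommender usually needs to know which problems a user has already solved. The sketch below shows one way to reduce raw submission records to a set of solved problems. The helper name `solved_problem_keys` and the sample records are made up for illustration; the records only mimic the `verdict` and `problem` fields that `user.status` returns.

```python
def solved_problem_keys(submissions):
    """Return the (contestId, index) pairs that have at least one accepted submission."""
    solved = set()
    for sub in submissions:
        if sub.get("verdict") != "OK":
            continue  # skip wrong answers, time limits, and so on
        problem = sub.get("problem", {})
        solved.add((problem.get("contestId"), problem.get("index")))
    return solved

# Made-up records in the shape of the API's "result" list
sample_submissions = [
    {"verdict": "OK", "problem": {"contestId": 1000, "index": "A", "name": "Problem A"}},
    {"verdict": "WRONG_ANSWER", "problem": {"contestId": 1000, "index": "B", "name": "Problem B"}},
    {"verdict": "OK", "problem": {"contestId": 1000, "index": "A", "name": "Problem A"}},
    {"verdict": "OK", "problem": {"contestId": 1001, "index": "B", "name": "Problem B"}},
]

print(sorted(solved_problem_keys(sample_submissions)))
```

Keying on the pair rather than on the problem name avoids collisions, since the name alone may not identify a problem uniquely.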

Time to get the user dataset. We'll collect the data of all users who have participated in at least one rated contest.

In [8]:
def get_users_into_csv():
    # Despite the name, this returns a DataFrame; nothing is written to disk
    user_url = "https://codeforces.com/api/user.ratedList"
    user_data = requests.get(user_url).json()

    user_data = user_data["result"]

    df = json_normalize(user_data)
    return df

df_user = get_users_into_csv()

In [9]:
df_user_sliced = df_user[['handle', 'country', 'rank', 'rating', 'maxRating']]
print(df_user_sliced.head())

              handle        country                   rank  rating  maxRating
0            tourist        Belarus  legendary grandmaster    3707       3822
1  Retired_MiFaFaOvO          Samoa  legendary grandmaster    3681       3681
2               Benq  United States  legendary grandmaster    3672       3797
3          Radewoosh         Poland  legendary grandmaster    3655       3720
4             ksun48         Canada  legendary grandmaster    3547       3654


In [8]:
# print(df_user_sliced[df_user_sliced['handle'] == 'infnite_coder'])

### Cosine Similarity
Now let's build the content-based filter using cosine similarity.

In [9]:
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

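def combine_features(row):
    # One string per problem: its type followed by all of its tags
    return row["type"] + " " + " ".join(row["tags"])

problems["combined_features"] = problems.apply(combine_features, axis=1)

# Bag-of-words counts per problem, then pairwise cosine similarity between rows
count_matrix = CountVectorizer().fit_transform(problems["combined_features"])
cosine_sim = cosine_similarity(count_matrix)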
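
One caveat worth keeping in mind: with its default settings, `CountVectorizer` lowercases the text and keeps only runs of two or more word characters, so multi-word tags are split into separate tokens. The sketch below imitates that behaviour with the standard `re` module. The helper name `default_tokens` is hypothetical, and the pattern is assumed to match scikit-learn's documented default `token_pattern`.

```python
import re

# Assumed to mirror CountVectorizer's default token_pattern
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

def default_tokens(text):
    """Approximate CountVectorizer's default tokenization (lowercase, 2+ word characters)."""
    return re.findall(DEFAULT_TOKEN_PATTERN, text.lower())

combined = "PROGRAMMING binary search dp 2-sat"
print(default_tokens(combined))
# ['programming', 'binary', 'search', 'dp', 'sat']
```

Under this tokenization "binary search" and "ternary search" would share the token "search", and "2-sat" is reduced to "sat". If whole tags should count as single features, joining them with a separator that cannot occur inside a tag and passing a custom tokenizer is one possible alternative.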

In [None]:
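def index_from_name(name):
    # Row labels of every problem with this name (names may not be unique)
    return problems[problems.name == name].index

problem_name = "Armchairs"
# Take the first match; raises IndexError if no problem has this name
index_of_prob = index_from_name(problem_name).values[0]
tgs = problems.iloc[index_of_prob].tags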

# print(problems[problems["tags"]])
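
The cell above stops after looking up the query problem. A possible next step, sketched here in plain Python, is to rank the other problems by their similarity to it. The helper `top_k_similar` and the toy similarity row are assumptions for illustration and not part of the original notebook.

```python
def top_k_similar(sim_row, query_index, k=5):
    """Indices of the k highest scores in sim_row, skipping the query problem itself."""
    candidates = [(score, i) for i, score in enumerate(sim_row) if i != query_index]
    # Highest score first; ties are broken by the lower index
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in candidates[:k]]

# Toy row of similarities between problem 0 and problems 0..4
toy_row = [1.0, 0.2, 0.9, 0.0, 0.9]
print(top_k_similar(toy_row, query_index=0, k=3))  # [2, 4, 1]
```

In this notebook the row would be `cosine_sim[index_of_prob]`, and the returned positions could be passed to `problems.iloc[...]` to show the recommended problems.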