Scrape Questions and metadata


In [1]:
from leetscrape.GetQuestionsList import GetQuestionsList
from leetscrape.GetQuestionInfo import GetQuestionInfo
from leetscrape.utils import (
    combine_list_and_info,
    get_all_questions_body,
)
import pandas as pd
import pickle
from sqlalchemy import create_engine, update
from dotenv import dotenv_values


Query the question list


In [4]:
ls = GetQuestionsList()
ls.scrape()  # Scrape the list of questions
ls.to_csv(directory_path="../data_test/")  # Save the scraped tables to a directory


Note: The default ALL_JSON_URL might be out-of-date. Please update it by going to https://leetcode.com/problemset/all/ and exploring the Networks tab for a query returning all.json.
Scraping companies ... Done
Scraping questions list ... Done
Extracting question topics ... Done
Getting Categories ... Done
Scraping Topic Tags ... Done
Extracting question category ... Done


Query individual question information: the body, test cases, constraints, hints, code stubs, and so on.


In [1]:
from leetscrape.GetQuestionInfo import GetQuestionInfo
qi = GetQuestionInfo(titleSlug="two-sum")
qi.scrape()

1. two-sum
Hints:
    0. A really brute force way would be to search for all possible pairs of numbers but that would be too slow. Again, it's best to try out brute force solutions for just for completeness. It is from these brute force solutions that you can come up with optimizations.
    1. So, if we fix one of the numbers, say <code>x</code>, we have to scan the entire array to find the next number <code>y</code> which is <code>value - x</code> where value is the input parameter. Can we change our array somehow so that this search becomes faster?
    2. The second train of thought is, without changing the array, can we use additional space somehow? Like maybe a hash map to speed up the search?
SimilarQuestions: [15, 18, 167, 170, 560, 653, 1099, 1679, 1711, 2006, 2023, 2200, 2351, 2354, 2367, 2374, 2399, 2395, 2441, 2465]
Given an array of integers `nums` and an integer `target`, return
*indices of the two numbers such that they add up to `target`*.

You may assume that each input 

In [2]:
# # This table can also be taken from the previous cell using
# # questions_info = ls.questions
# questions_info = pd.read_csv("../data/questions.csv")

# # Scrape question body
# questions_body_list = get_all_questions_body(
#     questions_info["titleSlug"].tolist(),
#     questions_info["paidOnly"].tolist(),
#     save_to="../data/questionBody.pickle",
# )

# # Save to a pandas dataframe
# questions_body = pd.DataFrame(
#     questions_body_list
# ).drop(columns=["titleSlug"])
# questions_body["QID"] = questions_body["QID"].astype(int)


""" Run the commented-out code above once and save the result as a pickle file. Use
that saved data from then on, since the scrape is time-consuming. """

with open("../data/questionBody.pickle", "rb") as f:
    data = pickle.load(f)
questions_body = pd.DataFrame(data).drop(columns=["titleSlug"])
questions_body["QID"] = questions_body["QID"].astype(int)
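
The run-once-then-reuse pattern described above can be sketched as a small helper. This is a minimal sketch, not part of leetscrape: `load_or_compute` and its `compute` argument are hypothetical names, and it assumes the scraped result can be pickled.

```python
import pickle
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the pickled result at cache_path, computing and saving it on first use.

    Hypothetical helper, not part of leetscrape.
    """
    path = Path(cache_path)
    if path.exists():
        # Cache hit: skip the slow step and reuse the saved result.
        with path.open("rb") as f:
            return pickle.load(f)
    # Cache miss: run the slow step once and persist its result.
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

Here `compute` could be a zero-argument function wrapping the `get_all_questions_body(...)` call from the commented-out code, so the scrape only runs when the pickle file does not exist yet.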


Create a new dataframe with all the questions with their metadata and body information.


In [6]:
questions = combine_list_and_info(
    info_df=questions_body, list_df=ls.questions, save_to="../data/all.json"
)


Upload data to Supabase tables


In [7]:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from dotenv import dotenv_values

config = dotenv_values("../../.env")


In [22]:
engine = create_engine(
    f"postgresql://{config['SUPABASE_USERNAME']}:{config['SUPABASE_PASSWORD']}@{config['SUPABASE_HOSTNAME']}:{config['SUPABASE_PORT']}/{config['SUPABASE_DBNAME']}",
    echo=True,
)
questions.to_sql(con=engine, name="questions", if_exists="append", index=False)
ls.topicTags.to_sql(con=engine, name="topic_tags", if_exists="append", index=False)
ls.categories.to_sql(con=engine, name="categories", if_exists="append", index=False)
ls.companies.to_sql(con=engine, name="companies", if_exists="append", index=False)
ls.questionTopics.to_sql(
    con=engine, name="question_topics", if_exists="append", index=True, index_label="id"
)
ls.questionCategory.to_sql(
    con=engine,
    name="question_category",
    if_exists="append",
    index=True,
    index_label="id",
)
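
The connection string above interpolates the credentials directly, which can produce an unparsable URL if the password contains characters such as `@` or `/`. One possible way to guard against that is to percent-encode the credentials first. The sketch below uses only the standard library; `build_postgres_url` is a hypothetical helper, and the values in `example_config` are placeholders, not real settings.

```python
from urllib.parse import quote


def build_postgres_url(config):
    """Build a postgresql:// URL from the SUPABASE_* keys, escaping the credentials.

    Hypothetical helper; the key names match the .env keys used in this notebook.
    """
    # safe="" makes quote() escape every reserved character, including "/" and "@".
    user = quote(config["SUPABASE_USERNAME"], safe="")
    password = quote(config["SUPABASE_PASSWORD"], safe="")
    host = config["SUPABASE_HOSTNAME"]
    port = config["SUPABASE_PORT"]
    dbname = config["SUPABASE_DBNAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"


# Placeholder values for illustration only; real values come from the .env file.
example_config = {
    "SUPABASE_USERNAME": "postgres",
    "SUPABASE_PASSWORD": "p@ss/word",
    "SUPABASE_HOSTNAME": "db.example.com",
    "SUPABASE_PORT": "5432",
    "SUPABASE_DBNAME": "postgres",
}
print(build_postgres_url(example_config))
```

The returned string could then be passed to `create_engine` in place of the inline f-string.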


Generate code stubs and test cases

In [1]:
from leetscrape.GenerateCodeStub import GenerateCodeStub, parse_args

fcs = GenerateCodeStub(qid=10)
fcs.generate_code_stub_and_tests()

('aa', 'a', False), ('aa', 'a*', True), ('ab', '.*', True) s, p, output
Code stub save to q_0010_regularExpressionMatching.py
Test file written to test_q_0010_regularExpressionMatching.py.py


In [43]:
import re
test = """
Input: s = "ab", p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".
"""
m = re.search("Output: (.*)\n", test.replace("true", "True").replace("false", "False"))
parse_args(
    m.group(0).replace("Output: ", "Output= ").replace("\n", "")
)


{'Output': True}
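
For comparison, the same "Output:" extraction can be sketched with the standard library alone. This is not how `parse_args` is implemented; `parse_output_line` is a hypothetical helper, and it assumes the value after "Output:" is a simple literal (boolean, number, string, or list). A string value that itself contains the word true, false, or null would be altered by the substitution step.

```python
import ast
import re


def parse_output_line(example_text):
    """Return the Python value on the 'Output:' line, or None if there is none.

    Hypothetical helper, shown only to illustrate the parsing step.
    """
    match = re.search(r"^Output:\s*(.+)$", example_text, flags=re.MULTILINE)
    if match is None:
        return None
    raw = match.group(1).strip()
    # LeetCode examples use JSON-style literals; map them to Python spellings.
    literals = {"true": "True", "false": "False", "null": "None"}
    raw = re.sub(r"\b(true|false|null)\b", lambda m: literals[m.group(1)], raw)
    # literal_eval accepts only literals, so arbitrary code is never executed.
    return ast.literal_eval(raw)


example = """
Input: s = "ab", p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".
"""
print(parse_output_line(example))
```

Using `ast.literal_eval` rather than `eval` keeps the parsing limited to literal values, which matters when the text comes from a scraped page.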

Extract solutions from a .py file and upload them to the database

In [1]:
from leetscrape.ExtractSolutions import ExtractSolutions, upload_solutions
from sqlalchemy import create_engine
from dotenv import dotenv_values
config = dotenv_values("../../.env")

In [7]:

eng = create_engine(
    f"postgresql://{config['SUPABASE_USERNAME']}:{config['SUPABASE_PASSWORD']}@{config['SUPABASE_HOSTNAME']}:{config['SUPABASE_PORT']}/{config['SUPABASE_DBNAME']}",
    echo=False,
)
# Raw string so the backslashes in the Windows path are not treated as escape sequences
es = ExtractSolutions(r"D:\code\Leetcode\Leetcode-solutions\questions\q_0007_reverseInteger.py")
sols = es.extract()
# upload_solutions(eng, 7, sols)

In [9]:
sols[0]['docs']['args']

[{'args': ['param', 'x (int)'],
  'description': 'A signed 32-bit integer whose digits need to be reversed.',
  'arg_name': 'x',
  'type_name': 'int',
  'is_optional': False,
  'default': None}]