This project was conducted by Anh Ta, Erin McGowan, and Maksat Kuanyshbay as part of the curriculum for the Machine Learning (CSCI-GA.2565-001) course at the New York University Courant Institute of Mathematical Sciences.

- HUPD_metadata_preliminary_analysis.ipynb: The code for our preliminary analysis of the 25,000 patents we sampled from the larger Harvard USPTO Patent Dataset (HUPD). The analysis checks whether the “filing date,” “examiner art unit,” “ipc label,” “foreign,” “small entity indicator,” and “aia first to file” metadata variables are correlated with patent acceptance rate.
- BERT_model_benchmark.ipynb: An implementation of BERT fine-tuned on the Harvard USPTO Patent Dataset, which we used as a benchmark for our PatentLLM model.
- PatentLLM.ipynb: Our hierarchical transformer-based model for patent acceptance prediction, trained on a subset of the Harvard USPTO Patent Dataset.
- PatentLLM_with_Metadata.ipynb: An augmented version of our hierarchical transformer-based model for patent acceptance prediction that incorporates metadata variables, trained on a subset of the Harvard USPTO Patent Dataset.
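
As a rough illustration of the kind of check the preliminary metadata analysis performs, the sketch below computes the acceptance rate for each value of a single metadata variable. It is not taken from the notebook: the helper `acceptance_rate_by`, the field names `small_entity_indicator` and `decision`, the `"ACCEPTED"` label, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def acceptance_rate_by(records, field, label_field="decision", accepted="ACCEPTED"):
    """Return {category: fraction of accepted records} for one metadata field.

    Hypothetical helper: field and label names are assumptions, not the
    notebook's actual column names.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [accepted count, total count]
    for rec in records:
        bucket = counts[rec[field]]
        bucket[0] += rec[label_field] == accepted  # True counts as 1
        bucket[1] += 1
    return {cat: acc / total for cat, (acc, total) in counts.items()}


# Toy records for illustration only, not real HUPD data.
toy = [
    {"small_entity_indicator": "SMALL", "decision": "ACCEPTED"},
    {"small_entity_indicator": "SMALL", "decision": "REJECTED"},
    {"small_entity_indicator": "UNDISCOUNTED", "decision": "ACCEPTED"},
    {"small_entity_indicator": "UNDISCOUNTED", "decision": "ACCEPTED"},
]

rates = acceptance_rate_by(toy, "small_entity_indicator")
print(rates)
```

A large gap between the per-category rates would suggest that the variable carries signal about acceptance; on real data a statistical test would be needed before drawing that conclusion.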