This project searches USPTO patent data to find patents similar to a given patent.
For details, refer to the data approach.
The project is currently about 3.5/5 complete. The data has been normalised, and a Python wrapper class can be used to populate PostgreSQL (PGSQL) directly.
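The wrapper class itself is not shown here. As a rough illustration only, a loader of this kind might take a DB-API 2.0 connection and batch-insert normalised records. The class name `PatentLoader`, its parameters and the record fields are hypothetical and not the project's actual API.

```python
from typing import Any, Iterable, List, Mapping, Sequence, Tuple


class PatentLoader:
    """Hypothetical sketch: bulk-insert normalised patent records through any
    DB-API 2.0 connection (e.g. psycopg2 for PostgreSQL)."""

    def __init__(self, conn: Any, table: str, columns: Sequence[str],
                 placeholder: str = "%s") -> None:
        # psycopg2 uses "%s" placeholders; sqlite3 expects "?".
        # Table and column names are interpolated into the SQL string,
        # so they must be trusted identifiers, never user input.
        self.conn = conn
        self.columns = list(columns)
        cols = ", ".join(self.columns)
        marks = ", ".join([placeholder] * len(self.columns))
        self.sql = f"INSERT INTO {table} ({cols}) VALUES ({marks})"

    def load(self, records: Iterable[Mapping[str, Any]], batch_size: int = 1000) -> int:
        """Insert records in batches, commit once, and return the row count."""
        cur = self.conn.cursor()
        batch: List[Tuple[Any, ...]] = []
        total = 0
        for rec in records:
            # Fields missing from a record are inserted as NULL.
            batch.append(tuple(rec.get(c) for c in self.columns))
            if len(batch) >= batch_size:
                cur.executemany(self.sql, batch)
                total += len(batch)
                batch = []
        if batch:
            cur.executemany(self.sql, batch)
            total += len(batch)
        self.conn.commit()
        return total
```

With psycopg2, pass a connection from `psycopg2.connect(...)` and keep the default `%s` placeholder. Committing once at the end keeps a failed load from leaving a partially populated table.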
A sample has been created with minimal tuning, so accuracy may not be great yet. Refer to get_similar.
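How get_similar scores patents is not described here. One common baseline for text similarity is TF-IDF weighting with cosine similarity. The sketch below shows that technique in plain Python, assuming each patent is represented by a single text field (for example its abstract). `get_similar_sketch` and `tfidf_vectors` are hypothetical names, not the project's implementation.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Build L2-normalised TF-IDF vectors keyed by patent id."""
    tokenized = {pid: tokenize(text) for pid, text in docs.items()}
    n = len(docs)
    df: Counter = Counter()
    for toks in tokenized.values():
        df.update(set(toks))  # document frequency: count each term once per doc
    vectors: Dict[str, Dict[str, float]] = {}
    for pid, toks in tokenized.items():
        tf = Counter(toks)
        # Smoothed IDF: rarer terms get higher weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors[pid] = {t: v / norm for t, v in vec.items()}
    return vectors


def get_similar_sketch(query_id: str, docs: Dict[str, str], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k patents most similar to query_id as (id, score) pairs."""
    vectors = tfidf_vectors(docs)
    q = vectors[query_id]
    scores = []
    for pid, vec in vectors.items():
        if pid == query_id:
            continue
        # Vectors are unit length, so the dot product equals cosine similarity.
        scores.append((pid, sum(w * vec.get(t, 0.0) for t, w in q.items())))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

For the full USPTO corpus, the vectors would be computed once and stored rather than rebuilt on every query.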
To improve accuracy, the following steps are needed:
- Create a validation procedure.
- Create a cluster model, then a topic model, possibly within the existing patent classifications. Iterate to find the approach that best suits the specifics of the patent data.
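A validation procedure needs a set of known-similar patents to compare against. Assuming such a ground truth exists (for example, patents that cite each other, which is an assumption and not part of the current project), mean precision@k is a simple way to compare the approaches above. The function names are hypothetical.

```python
from typing import Dict, List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved patent ids that are known to be relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for pid in retrieved[:k] if pid in relevant) / k


def mean_precision_at_k(results: Dict[str, List[str]],
                        ground_truth: Dict[str, Set[str]], k: int) -> float:
    """Average precision@k over every query patent that has both results and ground truth."""
    scores = [precision_at_k(results[q], rel, k)
              for q, rel in ground_truth.items() if q in results]
    return sum(scores) / len(scores) if scores else 0.0
```

Running the same held-out queries through each candidate model (clustering, topic model, classification-restricted search) and comparing this score gives a consistent basis for the iteration described above.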
To create and activate the development environment:
conda env create -f dev_env.yml
source activate patentsearch