The field of digital identity innovation has grown significantly over the last 30 years, with over 6,000 technology patents registered worldwide. However, many questions remain about who controls and owns our digital identity and intellectual property and, ultimately, where the future of digital identity is heading.
To investigate this further, this research mines digital identity patents and explores core themes such as identity, systems, privacy, and security, as well as emerging fields like blockchain, financial transactions, and biometric technologies. It uses natural language processing (NLP) methods, including part-of-speech tagging, clustering, topic classification, noise reduction, and lemmatisation. Finally, the research employs graph modelling and statistical analysis to discern inherent trends and forecast future developments.
The findings significantly contribute to the digital identity landscape, identifying key players, emerging trends, and technological progress. This research serves as a valuable resource for academia and industry stakeholders, aiding in strategic decision-making and investment in emerging technologies and facilitating navigation through the dynamic realm of digital identity technologies.
To get a local copy up and running, install the prerequisites below and open the notebook.
The following prerequisites are required:
- Jupyter notebook
- Python
- Python packages:
This section outlines the key commands used for the analysis. More comments are available in the notebook itself.
- Output Patent Leaders
The following command will output the top 12 entities by patent count over the given dataframe:
    output_leaders(wd)
- Display Entity
This code will display patent counts for the entire dataset, for the period 1996-2020, and for post-2020. In addition, it will output the first and last patents within the dataframe:
    display_entity(wdfull, "Diebold", [], [])
- Display Entity with Keywords
This code will display patent count metrics for each of the provided key search terms:
    display_entity(wdfull, "Diebold", ["privacy","trust"], [])
- Display Entity with Additional Ranges
This code allows you to specify a subrange, in this case the years that Kim Cameron was on staff:
    display_entity(wdfull, "Microsoft", ["transaction", "game", "privacy", "trust"], [("Kim Cameron", 1999, 2011, [])])
- Output Patents for Entity
This code will output a list of patents, with patent hyperlinks, by entity name:
    print_all_patents(wdfull, "Microsoft")
- Model Keywords
This code will output base statistics for a search term, including percentage movement, mean, standard deviation, and error, and will also output a chart showing the trend over time:
    selected_words = {'certificate','storage','privacy','communication','encryption'} # Example set of words
    colors = generate_dark_colors(len(selected_words))
    plot_graph(selected_words, 'TF-IDF','Security',3, -0.00002)
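To give a feel for the kind of base statistics mentioned for Model Keywords, here is a minimal standalone sketch. It is not the notebook's implementation: `summarize_trend` is a hypothetical helper, the scores are made up, and reading "error" as the standard error of the mean is an assumption.

```python
import math
import statistics


def summarize_trend(yearly_scores):
    """Summarize a {year: score} series.

    Hypothetical helper for illustration only; it is not part of the notebook.
    Assumes at least two years of data and a non-zero first value.
    """
    years = sorted(yearly_scores)
    values = [yearly_scores[y] for y in years]
    first, last = values[0], values[-1]
    std = statistics.stdev(values)  # sample standard deviation
    return {
        # Percentage movement from the first year to the last year.
        "pct_change": (last - first) / first * 100.0,
        "mean": statistics.mean(values),
        "std": std,
        # Assumption: "error" is taken to mean the standard error of the mean.
        "std_error": std / math.sqrt(len(values)),
    }


# Made-up illustrative scores, not taken from the patent dataset.
example = {2018: 0.010, 2019: 0.012, 2020: 0.014, 2021: 0.016, 2022: 0.020}
stats = summarize_trend(example)
print(stats)
```

The sketch uses only the Python standard library so it can be run outside the notebook. The notebook's own `plot_graph` may compute these values differently.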
Matthew Comb - matthew.comb@linacre.ox.ac.uk
Project Link: https://github.com/oxford-mc/patent-mining