This repository contains Python recipes to process and topic model the City-Data.com Corpus (Omizo, 2023).
Ryan M. Omizo
Python code to calculate topical diversity is from Terragni (2023). All rights reserved.
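As a rough illustration of what a topic diversity measure computes, the sketch below implements one common variant: the proportion of unique words among the top-k words of all topics. It is not the code from Terragni (2023). The function name, the `topk` parameter, and the example topics are assumptions made for illustration only.

```python
def proportion_unique_words(topics, topk=10):
    """Return the share of distinct words among the top-k words of all topics.

    `topics` is a list of topics, each a list of words ordered by weight.
    A score near 1.0 means topics share few words; a score near 0 means
    the topics are largely redundant.
    NOTE: hypothetical helper for illustration, not the repository's code.
    """
    if not topics:
        raise ValueError("topics must not be empty")
    if any(len(topic) < topk for topic in topics):
        raise ValueError("every topic needs at least topk words")

    unique_words = set()
    for topic in topics:
        unique_words.update(topic[:topk])

    return len(unique_words) / (topk * len(topics))


# Made-up topics (not drawn from the City-Data.com Corpus).
example_topics = [
    ["city", "move", "house", "rent"],
    ["school", "city", "kids", "district"],
    ["job", "rent", "salary", "commute"],
]
score = proportion_unique_words(example_topics, topk=4)
print(f"topic diversity: {score:.3f}")
```

Here "city" and "rent" each appear in two topics, so 10 of the 12 top words are distinct.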
Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., & Joulin, A. (2017). Advances in pre-training distributed word representations. arXiv preprint arXiv:1712.09405.
Mimno, D., Wallach, H., Talley, E., Leenders, M., & McCallum, A. (2011). Optimizing semantic coherence in topic models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing.
Omizo, R. (2023). City-Data.com Corpus [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10086354.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Řehůřek, R., & Sojka, P. (2011). Gensim—statistical semantics in Python. Retrieved from gensim.org.
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084.
Terragni, S. (2023). A collection of Topic Diversity measures for topic modeling. [Python]. https://github.com/silviatti/topic-model-diversity (Original work published 2020)