This repository contains the code and related resources for a project on the computational representation of cell lines using text mining.
The project derives a computational representation of cell lines from data in the Cellosaurus database and PubMed. The methodology combines text mining, feature extraction, and the construction of a dendrogram that represents the relationships between cell lines.
- data/: Directory containing raw data files from Cellosaurus and PubMed.
- dataret.ipynb: Code for extracting data from the databases.
- dataproc.ipynb: Code for processing retrieved data and constructing the hierarchical representation of cell lines.
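The general idea of going from text to a hierarchy of cell lines can be sketched as follows. This is only an illustrative, standard-library sketch and not the code in the notebooks: it assumes TF-IDF features, cosine distance, and average-linkage clustering, none of which are confirmed by this README. The names `tfidf_vectors`, `cosine_distance`, and `average_linkage`, as well as the cell line labels and their descriptions, are hypothetical placeholders rather than Cellosaurus or PubMed data.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document name to a sparse TF-IDF vector (term -> weight)."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokens)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    vectors = {}
    for name, words in tokens.items():
        counts = Counter(words)
        vectors[name] = {
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(words))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
    return vectors


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(vectors):
    """Merge the two closest clusters until one nested-tuple tree remains."""
    # Each cluster is a pair: (tree built so far, list of member names).
    clusters = [(name, [name]) for name in sorted(vectors)]

    def distance(a, b):
        # Average pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in a[1] for y in b[1]]
        total = sum(cosine_distance(vectors[x], vectors[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical toy descriptions, invented for illustration only.
docs = {
    "line_A": "cervical carcinoma epithelial",
    "line_B": "cervical carcinoma squamous",
    "line_C": "leukemia lymphocyte blood",
}
tree = average_linkage(tfidf_vectors(docs))
print(tree)
```

The nested tuple plays the role of a dendrogram: the two lines with overlapping vocabulary are merged first, and the unrelated one joins at the root. A library such as SciPy could replace the hand-written clustering if plotting is needed.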
- Clone the repository:
git clone https://github.com/ivan-carrera/engproc2023.git