Computational Representation of Cellular Lines: A Text Mining Approach

This repository contains the code and related resources for a project on the computational representation of cell lines using text mining.

Overview

The project derives a computational representation of cell lines from two sources: the Cellosaurus database and PubMed. The method applies text mining and feature extraction to these data, then builds a dendrogram that represents the relationships between cell lines.
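As a rough illustration of that pipeline, the sketch below turns short text descriptions into TF-IDF vectors, measures cosine distance between them, and merges the closest clusters step by step (average linkage), which is the merge order a dendrogram draws. It is a minimal standard-library example under stated assumptions, not the code in the notebooks: the cell line names and descriptions in `docs` are made up, and the choice of TF-IDF, cosine distance, and average linkage is an assumption about one reasonable way to do this.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy descriptions; real input would come from Cellosaurus and PubMed.
docs = {
    "line_A": "breast cancer epithelial estrogen receptor positive",
    "line_B": "breast cancer epithelial triple negative",
    "line_C": "leukemia suspension lymphoblast bone marrow",
}


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf_vectors(documents):
    """Return {name: {term: weight}} using a smoothed TF-IDF weighting."""
    tokens = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokens)
    doc_freq = Counter(t for toks in tokens.values() for t in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        vectors[name] = {
            t: (c / len(toks)) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        }
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def average_linkage(vectors):
    """Agglomerative clustering; returns the list of merges (a, b, distance)."""
    clusters = {name: [name] for name in vectors}  # cluster label -> members
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
            d = sum(cosine_distance(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((a, b, round(d, 3)))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges


merges = average_linkage(tfidf_vectors(docs))
for left, right, dist in merges:
    print(f"merge {left} + {right} at distance {dist}")
```

With this toy input, the two lines that share vocabulary are merged first, and the unrelated one joins last. For real data, a library routine for hierarchical clustering would be the more practical choice than this hand-written loop.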

Contents

  • data/: Directory containing raw data files from Cellosaurus and PubMed.
  • dataret.ipynb: Notebook that retrieves data from Cellosaurus and PubMed.
  • dataproc.ipynb: Notebook that processes the retrieved data and constructs the hierarchical representation of cell lines.
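To give a feel for the retrieval step, here is a minimal sketch of reading cell line records and collecting their PubMed identifiers. It assumes input in the style of the Cellosaurus flat file (a two-letter line code, then the value, with `//` closing each entry, and `RX` lines carrying references); the notebook may obtain the data differently. The entries in `SAMPLE` and the helper names `parse_entries` and `pubmed_ids` are hypothetical.

```python
# Made-up records in an assumed Cellosaurus-style flat-file layout.
SAMPLE = """
ID   Example-1
AC   CVCL_XXXX
RX   PubMed=11111111;
RX   PubMed=22222222;
//
ID   Example-2
AC   CVCL_YYYY
RX   Patent=US0000000;
//
"""


def parse_entries(text):
    """Split flat-file text into entries, each a dict of line code -> list of values."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("//"):
            if current:
                entries.append(current)
            current = {}
            continue
        code, value = line[:2], line[5:].strip()
        current.setdefault(code, []).append(value)
    return entries


def pubmed_ids(entry):
    """Collect the PubMed identifiers found on an entry's RX lines."""
    ids = []
    for reference in entry.get("RX", []):
        for part in reference.split(";"):
            part = part.strip()
            if part.startswith("PubMed="):
                ids.append(part[len("PubMed="):])
    return ids


# Map each cell line name to the PubMed IDs it cites.
refs = {entry["ID"][0]: pubmed_ids(entry) for entry in parse_entries(SAMPLE)}
print(refs)
```

The resulting identifiers could then be used to look up the matching abstracts in PubMed, which is where the text for the mining step would come from.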

Setup

  1. Clone the repository:
    git clone https://github.com/ivan-carrera/engproc2023.git