Exploratory data analysis of the cancer hospital records database, a dataset formed from data generated by several hospital institutions located in the State of São Paulo (Brazil).

skepticalchemist/Cancer-hospital-records

Cancer hospital records database

The cancer hospital records (RHC) database is formed from data generated by several hospital institutions located in the State of São Paulo (Brazil) and is coordinated by the Fundação Oncocentro de São Paulo (FOSP). The dataset is well documented, but the documentation is available only in Portuguese.

The RHC:

  • started its activities on 01/01/2000;
  • is updated quarterly;
  • has had its records anonymized;
  • consists of analytical cases (patients who arrived at the institution without prior treatment, whether or not already diagnosed).

RHC objectives:

  • understand and improve the care provided to cancer patients during all stages of treatment;
  • expand access to the state database and offer the possibility of performing specific tabulations and analyses.

Pediatric tumors differ from adult tumors in location and morphology and can be analyzed according to the International Classification of Childhood Cancer (ICCC; CICI in Portuguese), which defines the main diagnostic groups.

The above information was adapted from the FOSP.

Objective

Understand the data and make it better suited to machine learning.

This joint project was started at the Information Technology Center Renato Archer.

Team members:

Contents

Notebooks are located in the /notebooks directory:

  • 1_Glossary.ipynb: short description in English of the dataset variables
  • 2_EDA.ipynb: presents an exploratory data analysis (EDA) of the dataset

The /src directory contains some Python code:

  • format_conversion.py: helps to convert the dataset from .dbf to .csv
  • utility.py: contains some functions that help to explore the dataset
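The functions in utility.py are not reproduced here. As a hypothetical illustration of the kind of helper that can support exploration, the sketch below counts missing values per column in the converted .csv file using only the standard library. The function name `missing_value_counts`, the set of markers treated as missing, and the UTF-8 encoding are all assumptions, not part of the repository.

```python
import csv
from collections import Counter


def missing_value_counts(csv_path, missing=("", "NA"), encoding="utf-8"):
    """Return {column: number of rows whose value is missing}.

    Hypothetical helper: the name and the `missing` markers are assumptions.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        # Start every column at zero so complete columns are also reported.
        for name in reader.fieldnames or []:
            counts[name] = 0
        for row in reader:
            for name, value in row.items():
                if name is None:
                    # Extra cells beyond the header row are ignored.
                    continue
                # Short rows yield None for the columns they lack.
                if value is None or value.strip() in missing:
                    counts[name] += 1
    return dict(counts)
```

Calling `missing_value_counts("data/some_file.csv")` (the path is a placeholder) returns a plain dictionary, which makes it easy to sort the columns by how incomplete they are before deciding which ones to keep.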

How to use it

  1. Create a virtual environment of your choice.
  2. Clone this GitHub repository.
  3. Install the project dependencies:
pip install -r requeriments.txt
  4. Inside the repository, create a new directory called data.
  5. Download the dataset from the FOSP website and store it in /data.
  6. Convert the dataset from .dbf (original format) to .csv:
    • The Python code and the dataset are located in the subfolders /src and /data, respectively.
    • To convert the file, run the following command in a terminal, adjusting the file paths as needed:
python format_conversion.py --input "dbf file name" --output "csv file name"
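The contents of format_conversion.py are not shown here, so the following is only a minimal sketch of what a .dbf to .csv conversion can look like, not the repository's implementation. It assumes a plain dBase III style file with no memo fields, treats every field as text, and assumes latin-1 encoding for the source data. The function names `read_dbf`, `dbf_to_csv`, and `main` are hypothetical. Only the `--input` and `--output` flags are taken from the command above.

```python
import argparse
import csv
import struct


def read_dbf(path, encoding="latin-1"):
    """Yield the header row, then one list of strings per non-deleted record."""
    with open(path, "rb") as f:
        # Bytes 4-7: record count; 8-9: header size; 10-11: record size.
        n_records, header_len, record_len = struct.unpack("<IHH", f.read(32)[4:12])
        f.seek(0)
        head = f.read(header_len)  # file position is now at the first record

        # Field descriptors are 32 bytes each and end at a 0x0D terminator.
        fields = []
        pos = 32
        while pos < header_len and head[pos] != 0x0D:
            name = head[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
            length = head[pos + 16]
            fields.append((name, length))
            pos += 32

        yield [name for name, _ in fields]

        for _ in range(n_records):
            record = f.read(record_len)
            if len(record) < record_len:
                break  # truncated file
            if record[:1] == b"*":
                continue  # record flagged as deleted
            row, offset = [], 1  # byte 0 is the deletion flag
            for _, length in fields:
                row.append(record[offset:offset + length].decode(encoding).strip())
                offset += length
            yield row


def dbf_to_csv(dbf_path, csv_path, encoding="latin-1"):
    """Write all rows from the .dbf file to a UTF-8 .csv file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        csv.writer(out).writerows(read_dbf(dbf_path, encoding))


def main(argv=None):
    """Command-line entry point mirroring the --input/--output flags."""
    parser = argparse.ArgumentParser(description="Convert a .dbf file to .csv")
    parser.add_argument("--input", required=True, help="path to the .dbf file")
    parser.add_argument("--output", required=True, help="path to the .csv file")
    args = parser.parse_args(argv)
    dbf_to_csv(args.input, args.output)
```

Reading values as stripped text keeps the sketch short and avoids guessing at numeric or date formats. Type conversion can then be done later, during exploration. If the real file uses a different encoding or a DBF variant with memo fields, a dedicated DBF library is likely a safer choice than this hand-written parser.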
