Analysis of Clinical Trial Dataset using Dataframes on PySpark

quadrantofsola/PySpark_Dataframes

Analysis of a clinical trial dataset, implemented with DataFrames on PySpark.

The Clinical Dataset is a 2021 dataset by Mesh and Pharma.

I've answered questions such as:

- The number of studies in the dataset.
- The types of studies in the dataset and the count of each type.
- The top 5 conditions with their frequency.
- The most frequent roots.
- The 10 most common sponsors that are not pharmaceutical companies.
- The number of completed studies each month in a given year.
