Analysis of Clinical Trial Dataset using PySpark RDD implementation.

quadrantofsola/PySpark_RDD

## Analysis of Clinical Trial Dataset using PySpark RDD Implementation

The Clinical Dataset is a 2021 dataset by Mesh and Pharma.

I've answered questions such as:

- The number of studies in the dataset.
- The types of studies in the dataset and the count of each type.
- The top 5 conditions with their frequency.
- The most frequent roots.
- The 10 most common sponsors that are not pharmaceutical companies.
- The number of completed studies each month in a given year.
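As a rough illustration of how several of the questions above could be expressed with RDD operations, here is a minimal sketch. It is not the repository's actual code. It assumes a pipe-delimited text file with a header row, the column names listed in `COLUMNS`, comma-separated conditions, and completion dates written like `Jun 2021`; all of these, along with the helper names `parse_line`, `split_conditions`, `completed_month`, and `analyse`, are hypothetical and should be adjusted to the real data. The "roots" question is left out because it depends on the layout of the hierarchy file, which is not described here.

```python
from operator import add

# Assumed column layout (hypothetical; adjust to the real file header).
COLUMNS = ["Id", "Sponsor", "Status", "Start", "Completion",
           "Type", "Submission", "Conditions", "Interventions"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_line(line, delimiter="|"):
    """Split one delimited record into a dict keyed by column name."""
    return dict(zip(COLUMNS, line.split(delimiter)))


def split_conditions(record):
    """Return the non-empty conditions of a record (assumed comma-separated)."""
    raw = record.get("Conditions", "")
    return [c.strip() for c in raw.split(",") if c.strip()]


def completed_month(record, year):
    """Return the month abbreviation if the study completed in `year`, else None.

    Assumes Status == "Completed" and a Completion value such as "Jun 2021".
    """
    if record.get("Status") != "Completed":
        return None
    parts = record.get("Completion", "").split()
    if len(parts) == 2 and parts[1] == str(year) and parts[0] in MONTHS:
        return parts[0]
    return None


def analyse(lines, pharma_names, year):
    """Answer several of the questions on an RDD of raw lines (header included)."""
    header = lines.first()
    records = lines.filter(lambda l: l != header).map(parse_line).cache()
    pharma = set(pharma_names)

    # Count studies per type, most frequent first.
    types = (records.map(lambda r: (r.get("Type", ""), 1))
             .reduceByKey(add)
             .sortBy(lambda kv: kv[1], ascending=False)
             .collect())

    # Explode the conditions column and keep the five most frequent.
    top_conditions = (records.flatMap(split_conditions)
                      .map(lambda c: (c, 1))
                      .reduceByKey(add)
                      .takeOrdered(5, key=lambda kv: -kv[1]))

    # Drop sponsors that appear in the pharmaceutical company list.
    top_sponsors = (records.map(lambda r: r.get("Sponsor", ""))
                    .filter(lambda s: s and s not in pharma)
                    .map(lambda s: (s, 1))
                    .reduceByKey(add)
                    .takeOrdered(10, key=lambda kv: -kv[1]))

    # Completed studies per month, sorted in calendar order on the driver.
    per_month = (records.map(lambda r: completed_month(r, year))
                 .filter(lambda m: m is not None)
                 .map(lambda m: (m, 1))
                 .reduceByKey(add)
                 .collect())
    per_month.sort(key=lambda kv: MONTHS.index(kv[0]))

    return {
        "studies": records.count(),
        "types": types,
        "top_conditions": top_conditions,
        "top_non_pharma_sponsors": top_sponsors,
        "completed_per_month": per_month,
    }
```

With PySpark installed, this could be called as `analyse(sc.textFile(path), pharma_names, 2021)`, where `sc` is a `SparkContext`, `path` points at the trial file, and `pharma_names` is a list of company names collected from the pharmaceutical file. Keeping the parsing helpers as plain functions makes them easy to check without starting Spark.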
