Data Cleaning & Transformation of Rheeza Pharmaceuticals data with PySpark

Rheeza Clinics & Pharmaceuticals conducted a study on Naproxen, a drug claimed to normalize the blood pressure of teens and young adults. The trial was carried out at three of its clinic branches on over 2000 participants of mixed genders, aged 14 to 22, between February and May 2021. Participants were split into two groups, an inactive (placebo) group and an active (Naproxen) group, to test the effect of the actual drug. Records of all procedures were kept and later extracted from storage so that the ML engineers could develop algorithms for the next stages of the experiment. The dataset turned out to need some data engineering (cleaning and transformation) before the clinicians and ML engineers could access it easily.

Dataset

JSON (dataset.json)

Tools

Python, PySpark

Initial Schema

        root
        |-- ageofparticipant: long (nullable = true)
        |-- clinician: struct (nullable = true)
        |    |-- branch: string (nullable = true)
        |    |-- name: string (nullable = true)
        |    |-- role: string (nullable = true)
        |-- drug_used: string (nullable = true)
        |-- experimentenddate: string (nullable = true)
        |-- experimentstartdate: string (nullable = true)
        |-- noofhourspassedatfirstreaction: long (nullable = true)
        |-- result: struct (nullable = true)
        |    |-- conclusion: string (nullable = true)
        |    |-- sideeffectsonparticipant: string (nullable = true)
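
The `clinician` and `result` fields in the schema above are nested structs, which is one reason the raw data is awkward to query. As a minimal sketch (not the notebook's actual code), the helper below walks a schema in the dict form that PySpark's `df.schema.jsonValue()` returns and lists each leaf field with a flattened column name. The function name `flat_columns` and the underscore-joined aliases (for example `clinician_branch`) are assumptions made for illustration; the schema literal is copied from the listing above with the `nullable` and `metadata` keys left out.

```python
# Initial schema from this README, in the dict layout produced by
# PySpark's StructType.jsonValue() (nullable/metadata keys omitted).
INITIAL_SCHEMA = {
    "type": "struct",
    "fields": [
        {"name": "ageofparticipant", "type": "long"},
        {"name": "clinician", "type": {"type": "struct", "fields": [
            {"name": "branch", "type": "string"},
            {"name": "name", "type": "string"},
            {"name": "role", "type": "string"},
        ]}},
        {"name": "drug_used", "type": "string"},
        {"name": "experimentenddate", "type": "string"},
        {"name": "experimentstartdate", "type": "string"},
        {"name": "noofhourspassedatfirstreaction", "type": "long"},
        {"name": "result", "type": {"type": "struct", "fields": [
            {"name": "conclusion", "type": "string"},
            {"name": "sideeffectsonparticipant", "type": "string"},
        ]}},
    ],
}


def flat_columns(schema, prefix=()):
    """Return (dotted_path, flat_alias) pairs for every leaf field.

    Hypothetical helper: the underscore-joined alias is an assumed
    naming convention, not one taken from the repository.
    """
    pairs = []
    for field in schema["fields"]:
        path = prefix + (field["name"],)
        field_type = field["type"]
        if isinstance(field_type, dict) and field_type.get("type") == "struct":
            # Nested struct: recurse and keep the parent name as a prefix.
            pairs.extend(flat_columns(field_type, path))
        else:
            # Leaf field: dotted path for selection, underscores for the alias.
            pairs.append((".".join(path), "_".join(path)))
    return pairs


columns = flat_columns(INITIAL_SCHEMA)
for path, alias in columns:
    print(f"{path} -> {alias}")
```

With a DataFrame `df` loaded from the JSON file, the pairs could then be applied as `df.select([F.col(p).alias(a) for p, a in flat_columns(df.schema.jsonValue())])`, where `F` is `pyspark.sql.functions`. The two date columns are stored as strings; converting them with `F.to_date` would be a natural next step, but the date format is not stated here, so it would need to be checked against the data first.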


razurpenet/data_cleaning_with_pyspark

Folders and files

Latest commit

History

Repository files navigation

Data Cleaning & Transformation of Rheeza Pharmaceuticals data with PySpark

Dataset

Tools

Initial Schema

About

Resources

Stars

Watchers

Forks

Languages