samuel-ntsua/README.md

Hi there, I'm Samuel Ntsua

Analyst at UNC-Chapel Hill

I am enthusiastic about data. I use it to build reliable, rational arguments and to turn business rules and concepts, which often arrive as ambiguous and incomplete instructions, into working program logic. In my current job, I harvest, move, transform, and store data, and I automate the process. I write scripts in Bash, PowerShell, Python, SQL, and Stata to build multi-panel and hierarchical datasets from administrative and survey-sampling data. I am seeking a mid-career role on a data team as a Data Scientist, Data Engineer, or Machine Learning Engineer, where I can propel the team's efforts and challenge myself in a production environment.

Skills

Deep Learning | R (Programming Language) | A/B Testing | MySQL | PostgreSQL | Python (Programming Language) | Amazon Web Services (AWS) | Amazon DynamoDB | Amazon S3 | Amazon API Gateway | Stata | SAS Programming | Linux/Bash/SSH | Rsync | Globus | Hadoop | Apache Spark | Git | Tableau Desktop | Anaconda (Jupyter, Spyder, Pandas, R/RStudio, dplyr) | Machine Learning | Big Data Analytics | Data Analysis | Linear Regression | Data Collection | Statistical Modeling | Microeconometrics

Connect With Me

github linkedin

Stock Exchange Data Analysis using big-data tools such as Hadoop, Hive, and Sqoop

Objectives

  • Use Hive and Sqoop features for data engineering and analysis, and share the resulting actionable insights.

Technology/Techniques Used

  • python3 mysql hiveQL hue-api hadoop-hdfs sqoop-import

DataScience_Capstone_Project

Objectives

  • Predict whether a patient has diabetes, based on the diagnostic measurements included in the dataset.
  • Build a model that accurately classifies the patients in the dataset as diabetic or non-diabetic.

Technology/Techniques Used

  • Pandas NumPy machine-learning-algorithms scikit-learn xgboost missing-values analysis dimensionality reduction seaborn-plots extratrees GitLab
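As a minimal sketch of the missing-values step listed above, the snippet below replaces missing measurements with the column median before any model is fitted. It uses plain Python rather than Pandas to stay dependency-free. The column names, the sample rows, and the choice of median imputation are illustrative assumptions, not the project's actual data or method.

```python
from statistics import median

# Hypothetical sample rows; None marks a missing diagnostic measurement.
rows = [
    {"glucose": 148, "bmi": 33.6},
    {"glucose": None, "bmi": 26.6},
    {"glucose": 183, "bmi": None},
    {"glucose": 89, "bmi": 28.1},
]


def impute_median(records, columns):
    """Return copies of the records with None replaced by the column median."""
    # Compute each column's median from the observed (non-missing) values only.
    medians = {
        col: median(r[col] for r in records if r[col] is not None)
        for col in columns
    }
    cleaned = []
    for r in records:
        filled = dict(r)  # copy so the input rows stay untouched
        for col in columns:
            if filled[col] is None:
                filled[col] = medians[col]
        cleaned.append(filled)
    return cleaned


clean = impute_median(rows, ["glucose", "bmi"])
```

The median is used here instead of the mean because it is less sensitive to extreme readings; either choice would need to be checked against the real data.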

Mercedes-Benz Greener Manufacturing

Objectives

  • Use XGBoost to narrow down the features while still predicting the vehicle safety standard well, thus reducing the time a Mercedes-Benz spends on the test bench.

Technology/Techniques Used

  • Pandas NumPy machine-learning-algorithms scikit-learn xgboost label encoder dimensionality reduction seaborn-plots GitLab
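The label-encoding step named above can be sketched as follows: each distinct category in a column is mapped to an integer so that a tree-based model such as XGBoost can consume it. This is a dependency-free illustration, not the project's code. The function name and the sample values are hypothetical, and scikit-learn's `LabelEncoder` does the equivalent job in practice.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Hypothetical categorical feature column.
codes, mapping = label_encode(["az", "k", "az", "t", "k"])
```

Returning the mapping alongside the codes makes it possible to apply the same encoding to new data later.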

Data Science with R Programming

Objectives

  • To record patient statistics, the agency wants to find the age category of people who frequent the hospital most and have the maximum expenditure.
  • To rank the severity of diagnoses and treatments and to identify the expensive treatments, the agency wants to find the diagnosis-related group with the maximum hospitalization and expenditure.
  • To make sure that there is no malpractice, the agency needs to analyze whether the race of the patient is related to the hospitalization costs.
  • To allocate resources properly, the agency has to analyze the severity of the hospital costs by age and gender. Since length of stay is the crucial factor for inpatients, the agency also wants to find out whether it can be predicted from age, gender, and race.
  • To complete the analysis, the agency wants to find the variable that mainly affects the hospital costs.

Technology/Techniques Used

  • r-programming-language/rstudio supervised learning linear regression GitLab
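To illustrate the linear-regression technique listed above, here is a minimal ordinary-least-squares sketch with a single predictor. The project itself is written in R, where `lm` would fit such a model; Python is used here only for illustration. The data values are hypothetical, and the real question involves several predictors (age, gender, and race), which this one-variable sketch does not cover.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: patient age (years) and length of stay (days).
age = [1, 5, 9, 13, 17]
length_of_stay = [2, 3, 4, 5, 6]
slope, intercept = fit_line(age, length_of_stay)
```

A slope close to zero would suggest that age alone says little about length of stay.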

DataScience_with_Python

Objectives

Technology/Techniques Used

  • Pandas NumPy supervised learning linear regression scikit-learn xgboost seaborn-plots GitLab

Tableau_project

Objectives

  • Compute and display a country's economic growth indicator as well as the percentage of its population who purchased life insurance.

Technology/Techniques Used

  • Tableau public growth-kpi linear-trend kpi-dashboard data merge statistical measures computation
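The two measures in the objective can be sketched in a few lines. The project computes them in Tableau; the Python below is only an illustration of the arithmetic. It assumes the growth indicator is a period-over-period percentage change (for example, in GDP), and the function names and figures are hypothetical.

```python
def growth_rate_pct(previous, current):
    """Percentage change from the previous period to the current one."""
    return (current - previous) / previous * 100


def insured_share_pct(policy_holders, population):
    """Percentage of the population holding a life insurance policy."""
    return policy_holders / population * 100


# Hypothetical figures for one country.
gdp_growth = growth_rate_pct(previous=200.0, current=210.0)
insured = insured_share_pct(policy_holders=1_500_000, population=10_000_000)
```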

Pinned

  1. DataScience_Capstone_Project (Public)

    Data Science and Machine Learning using Python and R

    Jupyter Notebook

  2. Big-Data_Hadoop_and_Spark_Developer (Public)

    Build a data pipeline (using hadoop-hdfs, sqoop, hiveql) for data analysis from ambiguous and incomplete instructions.

  3. Machine_Learning_with_Python (Public)

    Reduce the time a Mercedes-Benz spends on the test bench.

    Jupyter Notebook

  4. DataScience_with_R-programming (Public)

    Data Science and Machine Learning using Python and R

    Jupyter Notebook 1

  5. Tableau_project (Public)

    Compute and display a country's economic growth indicator as well as the percentage of its population who purchased life insurance.

  6. DataScience_with_Python (Public)

    A data science project written in Python.