I am enthusiastic about data, and I use it to build reliable, rational arguments that turn business rules and concepts, often delivered as ambiguous or incomplete instructions, into working program logic. At my current job, I harvest, move, transform, and store data, and I automate each of these steps. I write scripts in Bash, PowerShell, Python, SQL, and Stata to build multi-panel and hierarchical datasets from administrative data and survey sampling data. I am seeking a mid-career position on a data team as a Data Scientist, Data Engineer, or Machine Learning Engineer, where I can propel the team's efforts and challenge myself in a production environment.
Deep Learning | R (Programming Language) | A/B Testing | MySQL | PostgreSQL | Python (Programming Language) | Amazon Web Services (AWS) | Amazon DynamoDB | Amazon S3 | Amazon API Gateway | Stata | SAS Programming | Linux/Bash/SSH | Rsync | Globus | Hadoop | Apache Spark | Git | Tableau Desktop | Anaconda (Jupyter, Spyder, Pandas, R-RStudio, dplyr) | Machine Learning | Big Data Analytics | Data Analysis | Linear Regression | Data Collection | Statistical Modeling | Microeconometrics
- Use Hive and Sqoop features for data engineering and analysis, and share the actionable insights.
python3
mysql
hiveQL
hue-api
hadoop-hdfs
sqoop-import
- Predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset.
- Build a model to accurately predict whether the patients in the dataset have diabetes or not.
Pandas
NumPy
machine-learning-algorithms
scikit-learn
xgboost
missing-values analysis
dimensionality reduction
seaborn-plots
extratrees
GitLab
- Used XGBoost to narrow down the features while still getting a good prediction of the vehicle safety standard, thus reducing the time a Mercedes-Benz spends on the test bench.
Pandas
NumPy
machine-learning-algorithms
scikit-learn
xgboost
label encoder
dimensionality reduction
seaborn-plots
GitLab
- To record patient statistics, the agency wants to find the age category of people who frequent the hospital and have the maximum expenditure.
- To rank diagnoses and treatments by severity and to identify the expensive treatments, the agency wants to find the diagnosis-related group that has the maximum hospitalization and expenditure.
- To make sure that there is no malpractice, the agency needs to analyze if the race of the patient is related to the hospitalization costs.
- To allocate resources properly, the agency has to analyze the severity of the hospital costs by age and gender. Since the length of stay is the crucial factor for inpatients, the agency also wants to find out whether the length of stay can be predicted from age, gender, and race.
- To perform a complete analysis, the agency wants to find the variable that mainly affects the hospital costs.
r-programming-language/rstudio
supervised learning
linear regression
GitLab
Pandas
NumPy
supervised learning
linear regression
scikit-learn
xgboost
seaborn-plots
GitLab
Compute and display a country's economic growth indicator as well as the percentage of its population who purchased life insurance.
Tableau Public
growth-kpi
linear-trend
kpi-dashboard
data merge
statistical measures computation