Some of the statistics, machine learning, and economics research projects I worked on at BYU and CMU

mpudil/projects

Statistics, Machine Learning, and Economics Projects

This repository is a (more-or-less) comprehensive list of the projects I have worked on as a student in statistics and data science at BYU and CMU since 2017. All the projects are tagged with the following topical designations:

  1. Statistical Computing: algorithms, data structures, recursion, object-oriented programming, web-scraping
  2. Statistical Modeling: projects involving statistical modeling
  3. Machine Learning and NLP: constructing ML algorithms from scratch, dimensionality reduction, and unsupervised and supervised learning
  4. Economics Projects: various projects related to economics at BYU, Cambridge, and CMU
  5. SQL: SQL practice and challenges. Note that most of my experience with SQL comes from my internship with OrderBoard in summer 2019.

Featured projects:

| Project | Language(s) | Method(s) | Description |
| --- | --- | --- | --- |
| English Proficiency | R | NLP, PCA, Random Forest, Shiny | Determine probability of individual passing the TOEFL exam. Includes a GUI for students to write an essay, plus a PowerPoint presentation. |
| Random Forest | Python, SQL | Random Forest, object-oriented programming | Create Python Random Forests and SQL decision trees from scratch |
| Sound of Music | R | Mixed models, hierarchical modeling | Determine factors that affect how people interpret music genre. Includes paper. |

All projects:

Statistical Computing

| Project | Language(s) | Method(s) | Description | Date |
| --- | --- | --- | --- | --- |
| Closest Pair | R | Divide and conquer | Determine closest pair of points from given set | Nov 2019 |
| Das Blinkenlights | R | Data structures, recursion, command line wrapper | Modular arithmetic problem with cows | Oct 2019 |
| Tree Builder | R | Recursion, object-oriented programming | Build binary classification tree | Sep 2019 |
| Web Scraping | R | Web-scraping, regular expressions, automatic e-mail | Web-scrape Walmart and Glassdoor websites | July 2019 |

Statistical Modeling

| Project | Language(s) | Method(s) | Description | Date | Includes Paper |
| --- | --- | --- | --- | --- | --- |
| Sound of Music | R | Mixed models | Determine factors that affect how people interpret music genre | Nov 2019 | Yes |
| Particulate Matter | R | Logistic mixed-effects, ROC | Determine effectiveness of particulate matter detectors | Apr 2019 | Yes |
| Macular Degeneration | R | Longitudinal MLR, optim | Determine causes of age-related macular degeneration | Apr 2019 | |
| Land Analysis | R | Spatial modeling, imputation | Determine effects of increased temperature; create and map temperature at locations impeded by cloud coverage | Mar 2019 | |
| Food Expenditures | R | GLS, fixing heteroskedasticity | Estimate effect of income on eating out | Mar 2019 | |
| Statistics Pedagogy | R | GLS | Determine relevance of class activities on student grades | Feb 2019 | Yes |
| Game of Thrones | R | Time series (SARIMA) | Predict Game of Thrones viewership | Feb 2019 | |
| Greenhouse | R, SAS | Linear regression | Determine effect of various gases on average global temperature | Feb 2019 | |
| Climate Change | R | Time series (SARIMA) | Predict climate change for next 5 years | Feb 2019 | |
| Avalanche | R, SAS | Poisson regression | Model the number of avalanches in Utah | Jan 2019 | |
| Student Grades | SAS | Data summarization in SAS | Create reports for student grades in SAS | Dec 2018 | |
| Myocardial Infarction | R | GLM, ROC/AUC | Determine causes of heart attacks | Nov 2018 | |
| Cardiovascular Health | R | Longitudinal models | Determine causes of tachycardia | Nov 2018 | |
| Birthweights | R | Linear regression, cross validation | Determine factors that lead to a change in baby birthweight | Sep 2018 | |
| STEM | R | Logistic mixed-effects, ROC | Determine what influences whether students remain in STEM majors | Sep 2018 | |

Machine Learning and NLP

| Project | Language(s) | Method(s) | Description | Date |
| --- | --- | --- | --- | --- |
| English Proficiency | R | NLP, PCA, Random Forest, Shiny | Determine probability of individual passing the TOEFL exam. Includes a GUI for students to write an essay. | Jan 2020 |
| Stylometrics | R | NLP, PCA, Random Forest | Determine distinguishability of authors in Book of Mormon | Dec 2019 |
| Information Retrieval | R | NLP, PCA | Use bag of words to search and cluster text data | Oct 2019 |
| Dimensionality Reduction | Python | Hierarchical clustering, t-SNE, clustering | Classify written numbers (MNIST) | Nov 2018 |
| Poverty | Python | Logistic regression, Naive Bayes, Random Forest, K-Nearest Neighbors | Determine causes of poverty in Costa Rica | Nov 2018 |
| Housing Prices | Python | SGD, Lasso, Kernel Ridge, K-Nearest Neighbors, feature engineering, train-test-split | Predict housing prices (supervised learning) | Oct 2018 |
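
To illustrate what "use bag of words to search" in the Information Retrieval row can look like, here is a small sketch using only the Python standard library. It is an assumption about the general technique, not the project's own code (which is in R and also uses PCA); the helper names `bag_of_words`, `cosine_similarity`, and `search` are hypothetical, and the tokenizer is deliberately naive.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count each word, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents):
    """Return (score, document index) pairs, best match first."""
    q = bag_of_words(query)
    scored = [
        (cosine_similarity(q, bag_of_words(doc)), i)
        for i, doc in enumerate(documents)
    ]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Made-up documents for illustration only.
docs = [
    "the cat sat on the mat",
    "stock prices fell sharply",
    "a cat and a dog",
]
ranking = search("cat mat", docs)  # document 0 ranks first
```

Raw counts are used here to keep the sketch short; a weighting scheme such as TF-IDF is a common refinement.
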

Economics Projects

| Project | Language(s) | Method(s) | Description | Date | Includes Paper |
| --- | --- | --- | --- | --- | --- |
| Per Capita Income | R | Linear regression, feature engineering | Determine socioeconomic factors that affect per-capita income | Sep 2019 | Yes |
| Cost of Homeschooling | Stata | Logistic regression, fixed effects | Determine effect of maternal education on odds of child being homeschooled (working paper) | Apr 2018 | Yes |
| Crime and Divorce | Stata | Linear regression, fixed effects | Explore differences in the divorce and crime rate in the U.S. and U.K. (working paper) | July 2017 | Yes (paper only) |

SQL

| Project | Description (all in SQL) | Date |
| --- | --- | --- |
| CRUD | Create, Read, Update, and Delete (“CRUD”) in SQL | Oct 2019 |
| Science Forums Querying | Perform calculations and work with data from ScienceForums.net in SQL | Nov 2019 |
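
As a quick illustration of the four CRUD operations named above, the sketch below runs them against an in-memory SQLite database through Python's built-in `sqlite3` module. The `projects` table, its columns, and the row values are hypothetical and are not taken from the CRUD project itself.

```python
import sqlite3


def crud_demo():
    """Run one Create, Read, Update, and Delete cycle; return the final rows."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # Create: define a (hypothetical) table and insert two rows.
    cur.execute(
        "CREATE TABLE projects ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, language TEXT NOT NULL)"
    )
    cur.executemany(
        "INSERT INTO projects (name, language) VALUES (?, ?)",
        [("alpha", "R"), ("beta", "Python")],
    )

    # Read: fetch the rows that were just inserted.
    cur.execute("SELECT name, language FROM projects ORDER BY id")
    initial_rows = cur.fetchall()
    assert len(initial_rows) == 2

    # Update: change one field of one row.
    cur.execute(
        "UPDATE projects SET language = ? WHERE name = ?", ("Python", "alpha")
    )

    # Delete: remove the other row.
    cur.execute("DELETE FROM projects WHERE name = ?", ("beta",))
    conn.commit()

    cur.execute("SELECT name, language FROM projects ORDER BY id")
    final_rows = cur.fetchall()
    conn.close()
    return final_rows


rows = crud_demo()
```

The `?` placeholders pass values separately from the SQL text, which avoids building queries by string concatenation.
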
