This repository collects selected data analysis projects I have done on real-world data, using machine learning and deep learning models as well as hypothesis tests and confidence intervals to assess statistical significance.
For these projects I mainly use R, MySQL, and Python, together with the following Python libraries: NumPy (array computing), pandas (high-performance, easy-to-use data structures and data analysis tools), SciPy (numerical and scientific computing), Matplotlib (plotting), Seaborn (a higher-level interface to Matplotlib that simplifies many plotting tasks), statsmodels (statistical models and tests), NLTK (natural language processing), and scikit-learn (machine learning). In addition, I implemented a learning algorithm "from scratch" with the TensorFlow framework, written in an object-oriented style.
- DNA Sequences Analysis - genomics (Text Mining, Algorithms, Python)
- Predicting house pricing in Seattle - sales and marketing (Regression, TensorFlow, Object-Oriented Programming, Hypothesis Testing, Python)
- Regression analysis with National Health and Nutrition Examination Survey data - health science (Regression Analysis, statsmodels, Hypothesis Testing, Python)
- Predicting sentiment from product reviews - sales (Binary Classification, Text Mining, scikit-learn, Python)
- Study Whether Housing Prices Are Affected By Recessions - marketing (Python, Hypothesis Testing)
- Birth Study - health science (R, Hypothesis Testing)
- Studying Test Relationships from Dognition Database (MySQL Queries, Database Modeling)
- Spelling Recommender - product (Natural Language Processing, NLTK, Python)

This work by Hsuan-Hao Fan is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.