Here are some short posts I've authored on basic aspects of data science.
Stacked generalization in scikit-learn
H-bGBT (Say that three times quickly!): New in scikit-learn 0.21.0
reddit2rehab: Binary Classification Model: Using requests, Beautiful Soup and the PushShift API, I captured thousands of Reddit posts from subreddits devoted to either active substance use or recovery from addiction, then trained multiple machine-learning (ML) models to predict which group (active use vs. addiction recovery) a piece of writing came from. The best model had a 96% success rate. Use case: Addiction medicine and counselling.
Fighting Fire with Firepower: Multinomial Classification Model: An academic project with two peers, Mariam Javed and Wayne Chan. We explored a popular forest-fire dataset and the literature surrounding it, and developed a model to classify wildfires by size (a binning of acres burned per fire). The project is an example of building multiple models and tuning their hyperparameters using grid search; our best model performed at a rate matching related experiments found in the literature. Use case: Disaster preparedness, resource allocation.
#readMoreCanlit: Recommender System (work-in-progress): A recommender system intended to promote the reading of Canadian literature. It takes input from the user regarding preferred books and book types, and returns recommendations of similar Canadian books, including author, title, description and cover art. Use case: Publishing industry marketing, online and bricks-and-mortar bookstores.