All the files used in my NCAA Men's Basketball modeling, predictions, bracketology, and Ivy League simulations.

Model_3.0.R: Control panel for all of the NCAA Hoops work I do for the Yale Undergraduate Sports Analytics Group (YUSAG). This script controls everything, including creating the prediction model, making the YUSAG Bracket, and simulating Ivy League games. Note that large chunks of code are commented out. These chunks correspond to the upcoming 2017-18 college basketball season and will be uncommented as soon as the NCAA releases the official 2017-18 schedule (and the appropriate data is obtained). Be sure to follow my NCAA Basketball coverage this season on Twitter @YaleSportsGroup and @recspecs730.

ncaa_hoop_scraper.R: An algorithm to scrape game schedule/result data from the NCAA website. This script is mostly copied (with slight modifications) from a scraper written by Prof. Jay Emerson and used in STAT 230: Introductory Data Analysis (Spring 2016).

3.0_Files/: A collection of files that are essentially the "inner workings" of everything done in this project. Every script in this directory contains functions. Those functions are executed in the master file, Model_3.0.R.

  • Ivy_Sims.R: Simulates the Ivy League Basketball season to estimate playoff odds and calculates the "Playoff Swing Factor" of each conference game.
  • bracketology.R: Assembles the YUSAG Bracket.
  • helpers.R: A file with miscellaneous functions used throughout the project.
  • powerrankings.R: Computes the YUSAG NCAA Power Rankings.
  • season_shift.R: Code used to create ncaa_season_shifts.pdf, which is used to generate the GIF seen here.
  • record_evaluator.R: Examines the quality of each team's resume by computing Quality Wins (as recently redefined by the NCAA tournament selection committee), Strength of Record, and Wins Above Bubble.
  • rpi.R: Predicts end-of-season RPI for each team.
  • tourney_sim.R: Function for simulating college basketball tournaments, with parameters left to the user. The user specifies teams (ordered from best seed to worst seed), along with a vector of seeds. Once games have been played, the seeds vector must be entered in order of the highest possible seed for each slot. For example, if the quarterfinal matchups are 1 vs. 9, 2 vs. 7, 3 vs. 14, and 4 vs. 12, set seeds = (1, 2, 3, 4, 12, 14, 7, 9), because 12, 14, 7, and 9 occupy the slots of the "chalk" seeds 5, 6, 7, and 8 in this hypothetical 15-team tournament. The user must also enter the number of single byes, the number of double byes (double_byes), the number of simulations to run (nsims), and a home court advantage parameter (hca). If the tournament is played at a neutral site, set hca = NA. If the higher seed always gets home court advantage, set hca = "seed". If one team hosts the tournament (even if it is not the top seed), set hca = INSERT_TEAM_NAME.
  • Bracketology/: Collection of .csv files used in YUSAG Bracketology.
    • bids.csv: Table of tournament bids broken down by conference.
    • bracket.csv: The final bracket produced for YUSAG Bracketology.
    • bracket_math.csv: Table of bracket metrics for all 351 Division I teams. See YUSAG Bracket Math for more.
    • bubble.csv: Bracket metrics for the first 16 teams missing the field as at-large bids.
    • resumes.csv: Subset of bracket metrics (resume evaluation, strength of record, wins above bubble) produced by record_evaluator.R.
    • rpi.csv: Projected end-of-season RPI for each team. Produced by rpi.R.
    • historical/: A collection of files used to predict NCAA Tournament seed from the various metrics in this directory.
  • Info/: A collection of information used to adjust model weights and determine postseason status.
    • conferences.csv: List of teams with their conference, postseason eligibility status, and elimination status from automatic bid contention.
    • mins.csv: Percentage of each team's 2016-17 minutes returning for the 2017-18 season. Acquired from Bart Torvik.
    • recruiting.csv: 247Sports recruiting scores for each team's incoming freshman class.
    • transfers.csv: Data on transfers eligible to play in the 2017-18 season, pulled from
  • History/: Model prediction history, with actual score differentials and predicted score differentials from a week in advance.
    • 2017_18_history.csv: 2017-18 prediction history.
  • Power_Rankings/: Collection of .csv files produced by powerrankings.R.
    • Powerrankings.csv: Ranking of all 351 teams by YUSAG Coefficient.
    • conf_summary.csv: Ranking of the 32 Division I conferences by median YUSAG Coefficient.
    • pr_by_conf.csv: Ranking of teams by YUSAG Coefficient, sorted by conference. For more, click here.
  • Predictions/:
    • playoffs.csv: Ivy League playoff odds.
    • psf.csv: Playoff Swing Factor for most recent week of Ivy League conference games.
  • Results/: Complete NCAA Basketball schedule/results through a given date. Sub-directories indicate the year/season, with .csv files named in NCAA_Hoops_results_month_day_year.csv format. NCAA_Hoops_results_6_29_2017.csv is the complete results file for the 2016-17 season. NCAA_Hoops_results_10_30_2017.csv gives the preseason schedule for the 2017-18 season. There is also a folder of results for the 2015-16 season, with one .csv of results from the end of that season.
  • Takes power ranking and bracketology .csv files and converts them to the HTML seen in html for use on the YUSAG Website.
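
The ordering rule for the seeds argument of tourney_sim.R (described above) can be easy to get wrong once upsets have happened, so here is a minimal sketch of one way to build that vector. It assumes the usual bracket pairing in which slot i plays slot n + 1 - i, which is consistent with the quarterfinal example in the tourney_sim.R entry. The helper chalk_order and its arguments top_slots and opponents are hypothetical names used for illustration only; they are not part of this repository.

```r
# Hypothetical helper (not part of this repository): build the `seeds` vector
# for a round whose matchups are already set, assuming slot i plays
# slot (n + 1 - i), where n is the number of teams remaining.
chalk_order <- function(top_slots, opponents) {
  stopifnot(length(top_slots) == length(opponents))
  # The top-half slots keep their order; the bottom half is filled in reverse,
  # so the opponent of slot 1 lands in the last slot.
  c(top_slots, rev(opponents))
}

# Quarterfinals from the example: 1 vs. 9, 2 vs. 7, 3 vs. 14, 4 vs. 12
seeds <- chalk_order(top_slots = c(1, 2, 3, 4), opponents = c(9, 7, 14, 12))
print(seeds)
```

For the example matchups this gives the same ordering as the README, c(1, 2, 3, 4, 12, 14, 7, 9), which could then be passed as the seeds argument.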