
rocioxl/nba-data-insights


NBA data insights

Task description

  1. Create a Python 3.* script that parses the NBA statistics provided in the attached files
  2. Dump the statistics into a MySQL database in a normalized format
  3. Create a user facing functionality to retrieve the following data points:
    1. The best player in terms of productivity for each week of the selected season (each point/rebound/assist counts the same)
    2. Prediction of a match result between two teams (the prediction model is up to you to create; the more interesting, the better)
  4. The program can be web facing (FLASK) or command line only
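The first data point in item 3 scores productivity as points + rebounds + assists, each weighted equally. A minimal pure-Python sketch of the weekly ranking follows. It assumes hypothetical per-game stat rows of the form `(player, game_date, points, rebounds, assists)` (not necessarily the columns of the provided files), that weeks are ISO calendar weeks, and that a player's weekly productivity is the sum over the games played that week.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Hypothetical per-game stat line: (player, game_date, points, rebounds, assists)
StatRow = Tuple[str, date, int, int, int]


def best_player_per_week(rows: Iterable[StatRow]) -> Dict[Tuple[int, int], Tuple[str, int]]:
    """Return {(iso_year, iso_week): (player, productivity)} for each week.

    Productivity = points + rebounds + assists (each counts the same),
    summed over all games the player played in that ISO week.
    """
    totals: Dict[Tuple[int, int], Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for player, game_date, pts, reb, ast in rows:
        iso_year, iso_week, _ = game_date.isocalendar()
        totals[(iso_year, iso_week)][player] += pts + reb + ast

    best = {}
    for week, by_player in sorted(totals.items()):
        # max() returns the first player (in insertion order) among ties
        player = max(by_player, key=by_player.get)
        best[week] = (player, by_player[player])
    return best


rows = [
    ("Player A", date(2019, 10, 22), 30, 5, 4),
    ("Player B", date(2019, 10, 23), 20, 15, 10),
    ("Player A", date(2019, 10, 24), 10, 2, 1),
]
print(best_player_per_week(rows))
```

Summing per week rewards players who appear in more games that week; averaging per game instead would be an equally valid reading of the task.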

Project Skeleton

As a first approach, our ML models are consumed through a REST API or through a CLI. Based on this idea, the file structure is as follows:

├─ requirements.txt   <- Python library dependencies
├─ README.md          <- The top-level README for this project
├─ makefile           <- Shortcuts
├─ src                <- Implemented Python modules
├─ models             <- AI-generated models
├─ eda                <- Generated notebooks for exploratory data analysis
└─ data               <- Used data

How to replicate the environment?

  • Python Version: Python 3.8.*
  • Environment:
  1. Replicate the Python environment from requirements.txt
    # using pip
    $ pip install -r requirements.txt
    
    # using conda
    $ conda create --name <env_name> --file requirements.txt
    
  2. Export environment variables
    export MYSQL_USER=?
    export MYSQL_PASSWORD=?
    export MYSQL_ROOT_PASSWORD=?
    export MYSQL_DATABASE=?
    export MYSQL_PORT=?
    export MYSQL_HOST=?
    
    NOTE: These variables are defined in the database.conf file.
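Once exported, the variables can be read from the process environment. A minimal sketch, assuming a hypothetical helper `mysql_config()` (not necessarily how the modules in `src/` load them) that fails fast on missing variables and returns keyword arguments in the shape accepted by MySQL drivers such as mysql-connector-python's `mysql.connector.connect(**config)`. `MYSQL_ROOT_PASSWORD` is left out on the assumption that only the MySQL container uses it.

```python
import os
from typing import Dict, Mapping, Optional, Union

# Variables the application needs to open a connection (MYSQL_ROOT_PASSWORD excluded by assumption)
REQUIRED_VARS = ("MYSQL_USER", "MYSQL_PASSWORD", "MYSQL_DATABASE", "MYSQL_HOST", "MYSQL_PORT")


def mysql_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, Union[str, int]]:
    """Build connection kwargs from the MYSQL_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "user": env["MYSQL_USER"],
        "password": env["MYSQL_PASSWORD"],
        "database": env["MYSQL_DATABASE"],
        "host": env["MYSQL_HOST"],
        "port": int(env["MYSQL_PORT"]),  # shell variables are strings; drivers expect an int port
    }
```

Passing an explicit mapping instead of `os.environ` keeps the helper easy to test.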

Database

  1. Copy the database.conf file to the src/mysql_db folder

  2. Create a MySQL container

    $ docker-compose --file src/mysql_db/docker-compose.yml up --build -d

  3. Dump the data into the database

    $ cd src/server
    $ python populate_database.py
    
  • MAKE shortcuts
        $  make start-db
        $  make drop-db
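Task item 2 asks for the statistics to be stored in a normalized format. The schema actually created by `populate_database.py` is not shown here, so the sketch below only illustrates one plausible normalized layout (teams, players, games, per-player game stats) with hypothetical table and column names. It uses Python's built-in `sqlite3` in memory as a stand-in for MySQL so it runs without a server; the DDL may need small adjustments for MySQL.

```python
import sqlite3

# Hypothetical normalized schema: each fact is stored once and linked through keys
SCHEMA = """
CREATE TABLE teams (
    team_id      INTEGER PRIMARY KEY,
    abbreviation TEXT NOT NULL
);
CREATE TABLE players (
    player_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE games (
    game_id      INTEGER PRIMARY KEY,
    game_date    TEXT NOT NULL,
    home_team_id INTEGER NOT NULL REFERENCES teams(team_id),
    away_team_id INTEGER NOT NULL REFERENCES teams(team_id),
    home_pts     INTEGER,
    away_pts     INTEGER
);
CREATE TABLE player_game_stats (
    game_id   INTEGER NOT NULL REFERENCES games(game_id),
    player_id INTEGER NOT NULL REFERENCES players(player_id),
    team_id   INTEGER NOT NULL REFERENCES teams(team_id),
    pts INTEGER, reb INTEGER, ast INTEGER,
    PRIMARY KEY (game_id, player_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO teams VALUES (1610612759, 'SAS'), (1610612747, 'LAL')")
conn.execute("INSERT INTO players VALUES (1, 'Player A')")
conn.execute("INSERT INTO games VALUES (20400425, '2003-12-30', 1610612759, 1610612747, NULL, NULL)")
conn.execute("INSERT INTO player_game_stats VALUES (20400425, 1, 1610612759, 20, 10, 5)")

# Productivity (pts + reb + ast) per player per game, resolved through the foreign keys
rows = conn.execute("""
    SELECT p.name, g.game_date, s.pts + s.reb + s.ast AS productivity
    FROM player_game_stats s
    JOIN players p ON p.player_id = s.player_id
    JOIN games g ON g.game_id = s.game_id
""").fetchall()
print(rows)
```

Keeping team and player names in their own tables means a rename touches a single row, and weekly productivity queries become joins plus a `GROUP BY`.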
    

Server

  1. Run flask server application
    $ export FLASK_APP=src/server/server.py
    $ flask run
    
  2. Postman collection with examples: https://www.getpostman.com/collections/5d81d74ebf90f6a7649b
  • MAKE shortcuts
        $  make run-server
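By default, `flask run` serves the application at http://127.0.0.1:5000. The actual routes are defined in `src/server/server.py` and listed in the Postman collection. The sketch below assumes a hypothetical `/predict` path and uses only the standard library to build, without sending, a JSON POST request in the INPUT format described under Prediction Models.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # default address of `flask run`


def build_prediction_request(game_id: str, game_date: str, home_id: int, away_id: int,
                             path: str = "/predict") -> urllib.request.Request:
    """Build a JSON POST request; `/predict` is a hypothetical route, not confirmed by the project."""
    payload = {
        "GAME_ID": [game_id],
        "GAME_DATE_EST": [game_date],
        "TEAM_ID_home": [home_id],
        "TEAM_ID_away": [away_id],
    }
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_prediction_request("20400425", "2003-12-30", 1610612759, 1610612747)
```

With the server running and the path adjusted to the real route, `urllib.request.urlopen(req)` would send the request and return the JSON response.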
    

NOTE: The contents of the models and data folders, and the database.conf file, are available on request: rocio.x.linares95@gmail.com.

Prediction Models

The goal of this model is to predict whether the home team of a game will win:

```
INPUT:{
        "GAME_ID": [<autogenerated_str>],     #  "20400425"
        "GAME_DATE_EST": [<date>],            #  "2003-12-30"
        "TEAM_ID_home": [<team_id>],          #  1610612759
        "TEAM_ID_away": [<team_id>],          #  1610612747
    }
```

```
OUTPUT:{
        "HOME_TEAM_WINS_PREDICTION": <prediction>     #  1 - YES | 0 - NO
    }
```
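The INPUT is column-oriented: each key maps to a list, so several games can share one request. A minimal sketch of turning that payload into one record per game and of producing the OUTPUT, assuming the model yields a home-win probability that is thresholded at 0.5 (the threshold and the helper names `to_records` and `to_output` are assumptions, not the project's code):

```python
from typing import Any, Dict, List

INPUT_KEYS = ("GAME_ID", "GAME_DATE_EST", "TEAM_ID_home", "TEAM_ID_away")


def to_records(payload: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert the column-oriented INPUT into one dict per game."""
    missing = [key for key in INPUT_KEYS if key not in payload]
    if missing:
        raise ValueError("Missing keys: " + ", ".join(missing))
    if len({len(payload[key]) for key in INPUT_KEYS}) != 1:
        raise ValueError("All INPUT lists must have the same length")
    columns = (payload[key] for key in INPUT_KEYS)
    return [dict(zip(INPUT_KEYS, values)) for values in zip(*columns)]


def to_output(home_win_probability: float, threshold: float = 0.5) -> Dict[str, int]:
    """Map a probability to the OUTPUT format: 1 - home team wins, 0 - it does not."""
    return {"HOME_TEAM_WINS_PREDICTION": int(home_win_probability >= threshold)}
```

Validating list lengths up front avoids `zip` silently truncating a malformed request.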

Features

Two sets of features were tested. All of them were extracted from the rankings and game statistics.

  • 40 features
  • 41 features (extended)

See more details in this notebook: nba_features_extraction.ipynb
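The actual 40 and 41 features are documented in the notebook and are not reproduced here. As an illustration only, a common way to turn team statistics into features for a home-versus-away game is to compare each team's averages over its previous games. The sketch below computes hypothetical home-minus-away differences of average points and win rate, using only games played strictly before the game date so that the result of the game itself does not leak into its features.

```python
from datetime import date
from typing import Dict, List, NamedTuple


class TeamGame(NamedTuple):
    """One past game from a team's perspective (hypothetical record)."""
    team_id: int
    game_date: date
    points: int
    won: bool


def team_averages(history: List[TeamGame], team_id: int, before: date) -> Dict[str, float]:
    """Average points and win rate of `team_id` over games strictly before `before`."""
    past = [g for g in history if g.team_id == team_id and g.game_date < before]
    if not past:
        return {"avg_points": 0.0, "win_rate": 0.0}  # no history yet: default both to 0.0
    return {
        "avg_points": sum(g.points for g in past) / len(past),
        "win_rate": sum(g.won for g in past) / len(past),
    }


def game_features(history: List[TeamGame], home_id: int, away_id: int,
                  game_date: date) -> Dict[str, float]:
    """Home-minus-away differences of the team averages (illustrative features only)."""
    home = team_averages(history, home_id, game_date)
    away = team_averages(history, away_id, game_date)
    return {"diff_" + name: home[name] - away[name] for name in home}
```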

Experiment #1. Sklearn Classifiers Benchmarking

Tested sklearn classifiers:

  • Naive Bayes - Bernoulli
  • Nearest Neighbors
  • Decision Tree
  • Random Forest
  • Neural Net
  • AdaBoost
  • Stochastic Gradient Descent - log
  • Stochastic Gradient Descent - modified_huber

Classifiers benchmark:

  • Initial features (figure)
  • Extended features (figure)

See more details in these notebooks: nba_sklearn_model.ipynb, nba_sklearn_model_extended.ipynb
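All scikit-learn classifiers share the `fit(X, y)` / `score(X, y)` interface, where a classifier's `score` returns mean accuracy, which makes a simple benchmark loop easy to write. A minimal sketch written against that interface only: `MajorityClass` is a hypothetical baseline included so the snippet runs without scikit-learn, and in practice the dictionary would hold factories such as `BernoulliNB`, `KNeighborsClassifier` or `lambda: SGDClassifier(loss="modified_huber")`.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Sequence


class MajorityClass:
    """Baseline that always predicts the most frequent training label (hypothetical stand-in)."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "MajorityClass":
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [self.label_ for _ in X]

    def score(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
        # Mean accuracy, the same quantity sklearn classifiers return from score()
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


def benchmark(factories: Dict[str, Callable[[], Any]],
              X_train: Sequence, y_train: Sequence,
              X_test: Sequence, y_test: Sequence) -> Dict[str, float]:
    """Fit a fresh model per name and return test accuracies, best first."""
    results = {}
    for name, make_model in factories.items():
        model = make_model()
        model.fit(X_train, y_train)
        results[name] = model.score(X_test, y_test)
    return dict(sorted(results.items(), key=lambda item: item[1], reverse=True))
```

A majority-class baseline is worth keeping in the table: a classifier that cannot beat "the home team always wins" is not learning much. Note that recent scikit-learn releases renamed the `log` loss of `SGDClassifier` to `log_loss`.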

Experiment #2. PyTorch Classifier Model

See more details in these notebooks: nba_pytorch_model.ipynb, nba_pytorch_model_extended.ipynb
