- Create a Python 3.* program that parses the NBA statistics provided in the attached files
- Dump the statistics into a MySQL database in a normalized format
- Create user-facing functionality to retrieve the following data points:
  - The most productive player of each week of the selected season (each point, rebound, and assist counts the same)
  - A prediction of the result of a match between two teams (the prediction model is up to you; the more interesting, the better)
- The program can be web-facing (Flask) or command-line only
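The weekly productivity metric from the requirements can be sketched in plain Python. This is a minimal illustration, not the project's implementation: it assumes box-score rows shaped as hypothetical `(player_name, game_date, points, rebounds, assists)` tuples that are already filtered to the selected season, and it groups games by ISO calendar week, which may not match how the project defines a week.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Hypothetical box-score row: (player_name, game_date, points, rebounds, assists).
BoxScoreRow = Tuple[str, date, int, int, int]


def best_player_per_week(
    rows: Iterable[BoxScoreRow],
) -> Dict[Tuple[int, int], Tuple[str, int]]:
    """Return {(iso_year, iso_week): (player_name, productivity)}.

    Productivity = points + rebounds + assists, each with weight 1.
    """
    # Accumulate productivity per week and per player.
    totals: Dict[Tuple[int, int], Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for player, game_date, pts, reb, ast in rows:
        iso_year, iso_week, _ = game_date.isocalendar()
        totals[(iso_year, iso_week)][player] += pts + reb + ast

    best = {}
    for week, per_player in totals.items():
        # max() keeps the first player encountered when scores tie.
        player, score = max(per_player.items(), key=lambda item: item[1])
        best[week] = (player, score)
    return best
```

Ties within a week go to the first player seen in the input; a real implementation would also need to decide how to count players traded mid-week.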
In a first approach, our ML models will be consumed through a REST API or through the CLI. Based on this idea, the file structure is as follows:
├─ requirements.txt <- Python library dependencies
├─ README.md <- The top-level README for this project
├─ makefile <- Shortcuts
├─ src <- Implemented Python modules
├─ models <- Trained AI models
├─ eda <- Notebooks for exploratory data analysis
└─ data <- Data used by the project
- Python version: Python 3.8.*
- Environment:
  - Replicate the Python environment from `requirements.txt`:
    - Using pip: `$ pip install -r requirements.txt`
    - Using conda: `$ conda create --name <env_name> --file requirements.txt`
  - Export the environment variables (NOTE: these variables are defined in the `database.conf` file):
    - `$ export MYSQL_USER=?`
    - `$ export MYSQL_PASSWORD=?`
    - `$ export MYSQL_ROOT_PASSWORD=?`
    - `$ export MYSQL_DATABASE=?`
    - `$ export MYSQL_PORT=?`
    - `$ export MYSQL_HOST=?`
- Copy the `database.conf` file to the `src/mysql_db` folder.
- Create a MySQL container:
  - `$ docker-compose --file src/mysql_db/docker-compose.yml up --build -d`
- Dump the data into the database:
  - `$ cd src/server`
  - `$ python populate_database.py`
- Make shortcuts:
  - `$ make start-db`
  - `$ make drop-db`
- Run the Flask server application:
  - `$ export FLASK_APP=src/server/server.py`
  - `$ flask run`
- Postman collection with examples: https://www.getpostman.com/collections/5d81d74ebf90f6a7649b
- Make shortcuts:
  - `$ make run-server`
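Once the MYSQL_* variables are exported, a Python process can assemble a database connection URL from them. This is a minimal sketch, assuming a SQLAlchemy-style URL and the `pymysql` driver (neither is confirmed by this README); `mysql_url_from_env` is a hypothetical helper. `MYSQL_ROOT_PASSWORD` is left out because the official MySQL Docker image uses it to initialise the container, not for application connections.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus

# Variables the application needs to connect (MYSQL_ROOT_PASSWORD is container-only).
REQUIRED_VARS = ("MYSQL_USER", "MYSQL_PASSWORD", "MYSQL_HOST", "MYSQL_PORT", "MYSQL_DATABASE")


def mysql_url_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a SQLAlchemy-style MySQL URL from the exported MYSQL_* variables.

    The 'mysql+pymysql' driver prefix is an assumption; use whichever MySQL
    driver requirements.txt actually installs.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    # Percent-encode credentials so characters such as '@' or ':' cannot break the URL.
    user = quote_plus(env["MYSQL_USER"])
    password = quote_plus(env["MYSQL_PASSWORD"])
    return (
        f"mysql+pymysql://{user}:{password}"
        f"@{env['MYSQL_HOST']}:{env['MYSQL_PORT']}/{env['MYSQL_DATABASE']}"
    )
```

Failing fast on a missing variable gives a clearer error than a connection timeout when `database.conf` was not sourced.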
NOTE: The contents of the `models` and `data` folders, and the `database.conf` file, are available on request: rocio.x.linares95@gmail.com.
The goal of this model is to predict whether the home team of a game is going to win:
```
INPUT:{
"GAME_ID": [<autogenerated_str>], # "20400425"
"GAME_DATE_EST": [<date>], # "2003-12-30"
"TEAM_ID_home": [<team_id>], # 1610612759
"TEAM_ID_away": [<team_id>], # 1610612747
}
```
```
OUTPUT:{
"HOME_TEAM_WINS_PREDICTION": <prediction> # 1 - YES | 0 - NO
}
```
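A client can build the INPUT body above and read back the prediction using only the standard library. This is a sketch under assumptions: the `/predict` route is hypothetical (check `src/server/server.py` or the Postman collection for the real endpoint), `http://127.0.0.1:5000` is only Flask's default development address, and since the exact response shape is not specified, the prediction is accepted either as a scalar or as a one-element list.

```python
import json
import urllib.request
from typing import Dict, List, Union


def build_prediction_payload(
    game_id: str, game_date: str, home_id: int, away_id: int
) -> Dict[str, List[Union[str, int]]]:
    """Build the INPUT body: every field is wrapped in a one-element list."""
    return {
        "GAME_ID": [game_id],
        "GAME_DATE_EST": [game_date],
        "TEAM_ID_home": [home_id],
        "TEAM_ID_away": [away_id],
    }


def build_prediction_request(
    payload: dict, base_url: str = "http://127.0.0.1:5000", path: str = "/predict"
) -> urllib.request.Request:
    # '/predict' is a hypothetical route, not taken from the project.
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def home_team_wins(response_body: bytes) -> bool:
    """Interpret the OUTPUT body: 1 means the home team is predicted to win."""
    value = json.loads(response_body)["HOME_TEAM_WINS_PREDICTION"]
    if isinstance(value, list):  # tolerate a list-wrapped prediction
        value = value[0]
    return value == 1
```

With the server running, `urllib.request.urlopen(req)` sends the request, and `home_team_wins(response.read())` interprets the answer.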
Two sets of features were tested. All of them were extracted from the rankings and games statistics.
- 40 features
- 41 features (extended)
See more details in this notebook: nba_features_extraction.ipynb
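The exact 40 and 41 features are documented in the notebook. As an illustration of the kind of feature that can be derived from the games statistics, here is a hypothetical one: a team's win percentage over its last N games played before the game date. The `Game` record, the `recent_win_pct` helper, and the neutral 0.5 default are assumptions made for this sketch.

```python
from datetime import date
from typing import List, NamedTuple


class Game(NamedTuple):
    # Hypothetical minimal game record; the real games data has many more columns.
    game_date: date
    home_id: int
    away_id: int
    home_team_wins: int  # 1 if the home team won, else 0


def recent_win_pct(games: List[Game], team_id: int, before: date, last_n: int = 10) -> float:
    """Win percentage of team_id over its last_n games played strictly before `before`.

    Using only past games keeps the feature from leaking the result being predicted.
    Returns 0.5 (a neutral prior, an assumption) when the team has no prior games.
    """
    past = sorted(
        (g for g in games if g.game_date < before and team_id in (g.home_id, g.away_id)),
        key=lambda g: g.game_date,
    )[-last_n:]
    if not past:
        return 0.5
    # The team won if it was home and the home side won, or away and the home side lost.
    wins = sum(1 for g in past if (g.home_id == team_id) == bool(g.home_team_wins))
    return wins / len(past)
```

Computing the same value for both `TEAM_ID_home` and `TEAM_ID_away` yields a pair of features per game.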
Tested scikit-learn classifiers:
- Naive Bayes - Bernoulli
- Nearest Neighbors
- Decision Tree
- Random Forest
- Neural Net
- AdaBoost
- Stochastic Gradient Descent - log
- Stochastic Gradient Descent - modified_huber
See more details in these notebooks: nba_sklearn_model.ipynb , nba_sklearn_model_extended.ipynb
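The two Stochastic Gradient Descent variants differ only in the loss they minimize. scikit-learn's `SGDClassifier` exposes them as `loss="log"` (renamed `"log_loss"` in newer releases) and `loss="modified_huber"`. The sketch below writes out those two loss functions for a label y in {-1, +1} and a raw decision score, following scikit-learn's documented definitions; it is not the notebooks' code.

```python
import math


def log_loss(y: int, score: float) -> float:
    """Logistic loss log(1 + exp(-y * score)) for y in {-1, +1}."""
    z = -y * score
    # Numerically stable form: avoids overflow in exp() for large positive z.
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def modified_huber_loss(y: int, score: float) -> float:
    """Modified Huber loss: squared hinge for y*score >= -1, linear below."""
    margin = y * score
    if margin >= -1:
        return max(0.0, 1.0 - margin) ** 2
    return -4.0 * margin
```

The linear tail makes `modified_huber` less sensitive to badly misclassified games than a squared loss, while, unlike the plain hinge loss, it still supports probability estimates.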
For the PyTorch models, see more details in these notebooks: nba_pytorch_model.ipynb , nba_pytorch_model_extended.ipynb