-
Notifications
You must be signed in to change notification settings - Fork 1
/
project outline.txt
19 lines (15 loc) · 1.26 KB
/
project outline.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Project: Anomaly Detection
What is it?
Assemble daily player stats from NBA. From that, we find anomalies, or outliers of stats and highlight them.
Full definition of outliers
The "interquartile range", abbreviated "IQR", is just the width of the box in the box-and-whisker plot. That is, IQR = Q3 – Q1.
An outlier is any value that lies more than one and a half times the length of the box from either end of the box. That is, if a data point is below Q1 – 1.5×IQR or above Q3 + 1.5×IQR, it is viewed as being too far from the central values to be reasonable. (http://www.purplemath.com/modules/boxwhisk3.htm)
How do we accomplish the whole thing?
0. Get the base data.
1. Scrape daily stats from ESPN.
2. Store it in the updated local csv files, which contain the aggregated daily stats of each player.
3. For each player, and for each stat category, determine if the current data is an outlier/anomaly by calculating the IQR.
4. Record players that have outlier(s) and the specific stats.
Remaining problems that need to be considered:
Treat players lacking game log urls on ESPN differently. fixed. Right now we just skip them.
Players' teams change! Think of a better way to represent historical data. (But then it doesn't matter for now since they're "historical"?)