# Variables in Statistics

In [1]:
import pandas as pd

In [3]:
wnba = pd.read_csv('datasets/wnba.csv')

In [4]:
wnba.head()

Unnamed: 0,Name,Team,Pos,Height,Weight,BMI,Birth_Place,Birthdate,Age,College,...,OREB,DREB,REB,AST,STL,BLK,TO,PTS,DD2,TD3
0,Aerial Powers,DAL,F,183,71.0,21.200991,US,"January 17, 1994",23,Michigan State,...,6,22,28,12,3,6,12,93,0,0
1,Alana Beard,LA,G/F,185,73.0,21.329438,US,"May 14, 1982",35,Duke,...,19,82,101,72,63,13,40,217,0,0
2,Alex Bentley,CON,G,170,69.0,23.875433,US,"October 27, 1990",26,Penn State,...,4,36,40,78,22,3,24,218,0,0
3,Alex Montgomery,SAN,G/F,185,84.0,24.543462,US,"December 11, 1988",28,Georgia Tech,...,35,134,169,65,20,10,38,188,2,0
4,Alexis Jones,MIN,G,175,78.0,25.469388,US,"August 5, 1994",23,Baylor,...,3,9,12,12,7,0,14,50,0,0


In [5]:
wnba.columns

Index(['Name', 'Team', 'Pos', 'Height', 'Weight', 'BMI', 'Birth_Place',
       'Birthdate', 'Age', 'College', 'Experience', 'Games Played', 'MIN',
       'FGM', 'FGA', 'FG%', '15:00', '3PA', '3P%', 'FTM', 'FTA', 'FT%', 'OREB',
       'DREB', 'REB', 'AST', 'STL', 'BLK', 'TO', 'PTS', 'DD2', 'TD3'],
      dtype='object')

### Print only the object columns

In [9]:
for column in wnba:
    if wnba[column].dtype == 'object':
        print(column)

Name
Team
Pos
Birth_Place
Birthdate
College
Experience


### Print only the numeric columns

In [10]:
for column in wnba:
    if wnba[column].dtype != 'object':
        print(column)

Height
Weight
BMI
Age
Games Played
MIN
FGM
FGA
FG%
15:00
3PA
3P%
FTM
FTA
FT%
OREB
DREB
REB
AST
STL
BLK
TO
PTS
DD2
TD3
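
The two loops above can also be written with `DataFrame.select_dtypes`. The sketch below is only an illustration: it uses a small hand-built frame (`sample`, a hypothetical stand-in for `wnba` with a few values copied from the first two rows) so it runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for wnba, built from a few values in the first two rows.
sample = pd.DataFrame({
    'Name': ['Aerial Powers', 'Alana Beard'],
    'Team': ['DAL', 'LA'],
    'Height': [183, 185],
    'Weight': [71.0, 73.0],
})

# Columns with a numeric dtype (integers and floats).
numeric_columns = sample.select_dtypes(include='number').columns.tolist()

# Everything else, which here means the text columns.
other_columns = sample.select_dtypes(exclude='number').columns.tolist()

print(numeric_columns)
print(other_columns)
```

Note that the dtype only describes how pandas stores a column. It does not by itself decide whether a variable is quantitative or qualitative.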


### Quantitative vs. Qualitative variables

Variables that describe qualities are called qualitative variables or **categorical variables**.

In [11]:
variables = {'Name': 'qualitative',
             'Team': 'qualitative',
             'Pos': 'qualitative',
             'Height': 'quantitative',
             'BMI': 'quantitative',
             'Birth_Place': 'qualitative',
             'Birthdate': 'quantitative',
             'Age': 'quantitative',
             'College': 'qualitative',
             'Experience': 'quantitative',
             'Games Played': 'quantitative',
             'MIN': 'quantitative',
             'FGM': 'quantitative',
             'FGA': 'quantitative',
             '3PA': 'quantitative',
             'FTM': 'quantitative',
             'FTA': 'quantitative',
             'FT%': 'quantitative',
             'OREB': 'quantitative',
             'DREB': 'quantitative',
             'REB': 'quantitative',
             'AST': 'quantitative',
             'PTS': 'quantitative'}

### The Nominal Scale

The system of rules that defines how each variable is measured is called the **scale of measurement** or, less often, the **level of measurement**. There are four scales of measurement:
- nominal 
- ordinal
- interval
- ratio

In [16]:
nominal_scale = sorted(['Name','Team','Birth_Place','Pos','College'])
nominal_scale

['Birth_Place', 'College', 'Name', 'Pos', 'Team']

### The Ordinal Scale

Common examples of variables measured on an ordinal scale include ranks. Other common examples include measurements of subjective evaluations that are generally difficult or nearly impossible to quantify with precision.
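
One way to see the difference between a nominal and an ordinal variable is with pandas categoricals. This is a minimal sketch under assumptions: the labels `'short'`, `'medium'` and `'tall'` are hypothetical values for a `Height_labels`-style variable, not values taken from the dataset.

```python
import pandas as pd

# Hypothetical ordered labels; the real Height_labels values may differ.
height_order = ['short', 'medium', 'tall']
height_labels = pd.Series(
    pd.Categorical(['tall', 'short', 'medium', 'tall'],
                   categories=height_order, ordered=True)
)

# An ordinal variable supports "greater than / less than" comparisons...
taller_than_short = height_labels > 'short'
print(taller_than_short.tolist())

# ...and therefore has a meaningful maximum.
print(height_labels.max())

# A nominal variable such as Team has categories but no order.
teams = pd.Series(pd.Categorical(['DAL', 'LA', 'CON']))
print(teams.cat.ordered)
```

The ordered labels tell us which value ranks higher, but not by how much. That is why an ordinal variable cannot be used to measure the size of a difference.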

**True or False?**

1. Using the Height_labels variable only, we can tell whether player Kiah Stokes is taller than Riquna Williams. 
2. We can measure the height difference between Kiah Stokes and Riquna Williams using the Height_labels variable. 
3. The Height_labels and the College variables are both measured on an ordinal scale.
4. The Games Played variable is not measured on an ordinal scale.
5. The Experience variable is measured on an ordinal scale. 
6. The Height_labels variable is qualitative because it is measured using words.

**Answers:**

1. True
2. False
3. False
4. True
5. False
6. False

In [10]:
wnba['Experience'].head()

0     2
1    12
2     4
3     6
4     R
Name: Experience, dtype: object
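
The `object` dtype above comes from the `'R'` entry mixed in with the numbers. If `'R'` marks a rookie with zero years of experience (an assumption here, not something the output itself confirms), the column could be converted to numbers roughly like this. The series `experience` is a hand-built copy of the five values shown, so the sketch runs on its own.

```python
import pandas as pd

# Hand-built copy of the five values printed above (read from CSV as strings).
experience = pd.Series(['2', '12', '4', '6', 'R'], name='Experience')

# Assumption: 'R' stands for rookie, i.e. 0 years of experience.
experience_years = pd.to_numeric(experience.replace('R', '0'))

print(experience_years.tolist())
print(experience_years.dtype)
```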

### Interval Scale vs. Ratio Scale

A variable measured on a scale that preserves the order between values, and has well-defined intervals using real numbers, is an example of a variable measured either on an **interval scale**, or on a **ratio scale**. 
What sets apart ratio scales from interval scales is the nature of the zero point. On a ratio scale, the zero point means no quantity. On an interval scale, however, the zero point doesn't indicate the absence of a quantity. It actually indicates the presence of a quantity.
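
A small numeric sketch can make the role of the zero point concrete. It assumes that a `Weight_deviation`-style variable is each weight minus the mean weight, and it uses only the five weights shown in the `head()` output, so the mean here is the mean of those five rows, not of the full dataset.

```python
import pandas as pd

# The five weights (kg) from the head() output above.
weight = pd.Series([71.0, 73.0, 69.0, 84.0, 78.0], name='Weight')

# Assumed definition: deviation of each weight from the mean of these rows.
weight_deviation = weight - weight.mean()
print(weight_deviation.tolist())

# Ratio scale: 0 kg means no weight, so ratios are meaningful.
weight_ratio = weight[3] / weight[2]
print(round(weight_ratio, 3))

# Interval scale: 0 means "equal to the mean", so the ratio has no
# sensible interpretation, even though differences still do.
deviation_ratio = weight_deviation[3] / weight_deviation[2]
print(deviation_ratio)
```

The difference between two players is the same on both variables (84 - 69 = 15 and 9 - (-6) = 15). Only the ratio changes, which is what separates the two scales.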

In [16]:
interval = sorted(['Birthdate','Weight_deviation'])

In [17]:
ratio = sorted(['Height', 'Weight', 'BMI', 'Age', 'Experience', 'Games Played', 'MIN', 'FGM', 'FGA', 'FG%', '15:00','3PA', '3P%', 'FTM', 'FTA', 'FT%', 'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TO', 'PTS', 'DD2', 'TD3'])

### Continuous variables vs. Discrete variables

In [18]:
ratio_interval_only = {'Height': 'continuous',
                       'Weight': 'continuous',
                       'BMI': 'continuous',
                       'Age': 'continuous',
                       'Games Played': 'discrete',
                       'MIN': 'continuous',
                       'FGM': 'discrete',
                       'FGA': 'discrete',
                       'FG%': 'continuous',
                       '3PA': 'discrete',
                       '3P%': 'continuous',
                       'FTM': 'discrete',
                       'FTA': 'discrete',
                       'FT%': 'continuous',
                       'OREB': 'discrete',
                       'DREB': 'discrete',
                       'REB': 'discrete',
                       'AST': 'discrete',
                       'STL': 'discrete',
                       'BLK': 'discrete',
                       'TO': 'discrete',
                       'PTS': 'discrete',
                       'DD2': 'discrete', 
                       'TD3': 'discrete',
                       'Weight_deviation': 'continuous'}

### Real Limits

In [19]:
bmi = {21.200991370000001: [21.2009913700000005, 21.2009913700000015],
       21.329437550000002: [21.3294375500000015, 21.3294375500000025],
       23.875432530000001: [23.8754325300000005, 23.8754325300000015],
       24.543462380000001: [24.5434623800000005, 24.5434623800000015],
       25.46938776: [25.469387755, 25.469387765]}
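
The limits above follow one rule: take half a unit of the last recorded digit below and above the value. The helper below, `real_limits` (a hypothetical name, not part of the original notebook), sketches that rule with the standard `decimal` module. Values are passed as strings so that every recorded digit is kept exactly, which matters because binary floats carry only about 15 to 17 significant digits.

```python
from decimal import Decimal

def real_limits(value: str):
    """Return (lower, upper) real limits for a value written as a string."""
    d = Decimal(value)
    # One unit of the last recorded digit, e.g. 1E-8 for 8 decimal places.
    unit = Decimal(1).scaleb(d.as_tuple().exponent)
    half_unit = unit / 2
    return d - half_unit, d + half_unit

lower, upper = real_limits('25.46938776')
print(lower, upper)

# A height recorded in whole centimetres.
print(real_limits('175'))
```

For `'25.46938776'` this reproduces the last entry of the dictionary above.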

### Summary

In [20]:
from IPython.display import Image

In [21]:
Image("Levels of measurement.JPG")

<IPython.core.display.Image object>

In [22]:
Image("Interval vs Ratio.JPG")

<IPython.core.display.Image object>