## Quantitative and Qualitative Variables

Variables in statistics can describe either quantities or qualities.

Quantitative variables: Variables that describe how much there is of something.

- Describe quantities
- Use numbers that are actual quantities
- Use words that express a quantity

Example: Height (6 feet, tall, short)
    
Qualitative (or categorical) variables: Variables that describe what something is or how it is.
    
- Describe qualities
- Can use numbers, but the numbers are NOT actual quantities
- Use words
    
Example: Name, Team, College    
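The pandas dtype of a column cannot tell a quantitative variable from a qualitative one that happens to be stored as numbers. A minimal sketch, assuming a hypothetical `Team_code` column in which the teams DAL, LA and CON are encoded as the arbitrary numbers 1, 2 and 3 (this encoding is not in the dataset):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Heights (cm) of the first three players in the wnba.head() output, plus their
# teams (DAL, LA, CON) re-encoded as arbitrary numbers: a hypothetical encoding.
players = pd.DataFrame({
    "Height": [183, 185, 170],
    "Team_code": [1, 2, 3],  # 1 = DAL, 2 = LA, 3 = CON; the numbers are only labels
})

# pandas treats both columns as numeric...
both_numeric = all(is_numeric_dtype(players[col]) for col in players.columns)

# ...but only Height is quantitative: its mean (about 179.3 cm) is meaningful,
# whereas a "mean team code" would only reflect the arbitrary encoding.
mean_height = players["Height"].mean()

# Declaring Team_code as categorical records that it is qualitative.
players["Team_code"] = players["Team_code"].astype("category")
team_is_numeric_now = is_numeric_dtype(players["Team_code"])

print(both_numeric, round(mean_height, 1), team_is_numeric_now)  # True 179.3 False
```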


In [1]:
import pandas as pd
wnba = pd.read_csv('/Users/brindhamanivannan/Desktop/data-projects/datasets/wnba.csv')
wnba.head()

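|   | Name | Team | Pos | Height | Weight | BMI | Birth_Place | Birthdate | Age | College | ... | OREB | DREB | REB | AST | STL | BLK | TO | PTS | DD2 | TD3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aerial Powers | DAL | F | 183 | 71.0 | 21.200991 | US | January 17, 1994 | 23 | Michigan State | ... | 6 | 22 | 28 | 12 | 3 | 6 | 12 | 93 | 0 | 0 |
| 1 | Alana Beard | LA | G/F | 185 | 73.0 | 21.329438 | US | May 14, 1982 | 35 | Duke | ... | 19 | 82 | 101 | 72 | 63 | 13 | 40 | 217 | 0 | 0 |
| 2 | Alex Bentley | CON | G | 170 | 69.0 | 23.875433 | US | October 27, 1990 | 26 | Penn State | ... | 4 | 36 | 40 | 78 | 22 | 3 | 24 | 218 | 0 | 0 |
| 3 | Alex Montgomery | SAN | G/F | 185 | 84.0 | 24.543462 | US | December 11, 1988 | 28 | Georgia Tech | ... | 35 | 134 | 169 | 65 | 20 | 10 | 38 | 188 | 2 | 0 |
| 4 | Alexis Jones | MIN | G | 175 | 78.0 | 25.469388 | US | August 5, 1994 | 23 | Baylor | ... | 3 | 9 | 12 | 12 | 7 | 0 | 14 | 50 | 0 | 0 |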


In [2]:
wnba.columns

Index(['Name', 'Team', 'Pos', 'Height', 'Weight', 'BMI', 'Birth_Place',
       'Birthdate', 'Age', 'College', 'Experience', 'Games Played', 'MIN',
       'FGM', 'FGA', 'FG%', '15:00', '3PA', '3P%', 'FTM', 'FTA', 'FT%', 'OREB',
       'DREB', 'REB', 'AST', 'STL', 'BLK', 'TO', 'PTS', 'DD2', 'TD3'],
      dtype='object')

In [3]:
variables = {'Name': 'qualitative', 
             'Team': 'qualitative', 
             'Pos': 'qualitative', 
             'Height': 'quantitative', 
             'BMI': 'quantitative',
             'Birth_Place': 'qualitative', 
             'Birthdate': 'quantitative', 
             'Age': 'quantitative', 
             'College': 'qualitative', 
             'Experience': 'quantitative',
             'Games Played': 'quantitative', 
             'MIN': 'quantitative', 
             'FGM': 'quantitative', 
             'FGA': 'quantitative',
             '3PA': 'quantitative', 
             'FTM': 'quantitative', 
             'FTA': 'quantitative', 
             'FT%': 'quantitative', 
             'OREB': 'quantitative', 
             'DREB': 'quantitative',
             'REB': 'quantitative', 
             'AST': 'quantitative', 
             'PTS': 'quantitative'}

## Scales of measurement

There are four scales of measurement: nominal, ordinal, interval, and ratio.

The characteristics of each scale pivot around three main questions:

- Can we tell whether two individuals are different?
- Can we tell the direction of the difference?
- Can we tell the size of the difference?

## The Nominal Scale

The Team variable is an example of a variable measured on a nominal scale. For any variable measured on a nominal scale:

- We can tell whether two individuals are different or not (with respect to that variable).
- We can't say anything about the direction and the size of the difference.
- The variable can only describe qualities.

When a qualitative variable is described with numbers, the principles of the nominal scale still hold. We can tell whether there's a difference or not between individuals, but we still can't say anything about the size and the direction of the difference.
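pandas enforces the same rules for unordered (nominal) categoricals. Equality checks work, but ordering comparisons raise a `TypeError`. A minimal sketch using the team abbreviations from the `wnba.head()` output:

```python
import pandas as pd

# Team abbreviations from the wnba.head() output, stored as an
# unordered (nominal) categorical.
teams = pd.Series(["DAL", "LA", "CON", "SAN", "MIN"], dtype="category")

# Difference: we can tell whether two players are on the same team.
on_dal = teams == "DAL"

# Direction: ordering comparisons are undefined for unordered categoricals,
# so pandas refuses to answer "does LA come before DAL?".
try:
    _ = teams < "LA"
    ordering_allowed = True
except TypeError:
    ordering_allowed = False

print(on_dal.tolist())   # [True, False, False, False, False]
print(ordering_allowed)  # False
```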

In [5]:
nominal_scale = ['Name', 'Team', 'Pos', 'Birth_Place', 'College']
nominal_scale = sorted(nominal_scale)
print(nominal_scale)

['Birth_Place', 'College', 'Name', 'Pos', 'Team']


In [6]:
print(wnba["Height"].unique())

[183 185 170 175 188 178 180 196 193 198 173 191 206 201 168 165]


## The Ordinal Scale

Suppose we derive a new variable, Height_labels, from Height. Its unique values are 'tall', 'short' and 'medium': instead of numbers, Height_labels shows labels like "short", "medium", or "tall".

By examining the values of this new variable, we can tell whether two individuals are different or not. But, unlike in the case of a nominal scale, we can also tell the direction of the difference. Someone who is assigned the label "tall" is taller than someone assigned the label "short".

However, we still can't determine the size of the difference. This is an example of a variable measured on an ordinal scale.
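One way to build a variable like Height_labels is with `pd.cut`, which returns an ordered categorical by default. This is a minimal sketch: the bin edges (175 cm and 185 cm) are hypothetical cutoffs chosen for illustration, not the ones behind the original Height_labels.

```python
import pandas as pd

heights = pd.Series([183, 185, 170, 175, 206, 165])  # cm, from the unique heights above

# Hypothetical cutoffs: (0, 175] -> short, (175, 185] -> medium, (185, 250] -> tall.
height_labels = pd.cut(heights, bins=[0, 175, 185, 250], labels=["short", "medium", "tall"])

# Direction is preserved: ordered categoricals support <, > comparisons.
taller_than_short = height_labels > "short"

# Size is lost: 176 cm and 185 cm would both be "medium", while 175 cm and 176 cm
# would get different labels despite being only 1 cm apart.
print(height_labels.tolist())      # ['medium', 'medium', 'short', 'short', 'tall', 'short']
print(taller_than_short.tolist())  # [True, True, False, False, True, False]
```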

Common examples of variables measured on ordinal scales include ranks: ranks of athletes, of horses in a race, of people in various competitions, etc.
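Ranks show the same loss of information. The sketch below ranks the five players from the `wnba.head()` output by total points (PTS). Consecutive ranks are always one step apart, but the point gaps behind them are very different:

```python
import pandas as pd

# PTS (total points) of the five players in the wnba.head() output.
pts = pd.Series(
    [93, 217, 218, 188, 50],
    index=["Aerial Powers", "Alana Beard", "Alex Bentley", "Alex Montgomery", "Alexis Jones"],
)

# Rank 1 = most points. Without ties, rank() returns whole numbers (as floats).
ranks = pts.rank(ascending=False).astype(int)

# Points separating each rank from the next one down.
ordered = pts.sort_values(ascending=False)
gaps = (-ordered.diff()).dropna().tolist()

print(ranks.tolist())  # [4, 2, 1, 3, 5]
print(gaps)            # [1.0, 29.0, 95.0, 43.0]
```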

Other common examples include measurements of subjective evaluations that are generally difficult or nearly impossible to quantify precisely. For instance, when answering a survey about how much they like a new product, people may have to choose one of these labels: "It's a disaster, I hate it", "I don't like it", "I like it a bit", "I really like it", "I simply love it".
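Survey answers like these can be stored as an ordered categorical, so that sorting and min/max follow the labels' meaning rather than alphabetical order. A minimal sketch; the four responses are hypothetical:

```python
import pandas as pd

# The answer labels from the survey example, declared in their natural order.
levels = [
    "It's a disaster, I hate it",
    "I don't like it",
    "I like it a bit",
    "I really like it",
    "I simply love it",
]
survey_scale = pd.CategoricalDtype(categories=levels, ordered=True)

# Hypothetical responses from four people.
responses = pd.Series(
    ["I like it a bit", "I simply love it", "I don't like it", "I like it a bit"],
    dtype=survey_scale,
)

# Order-based operations are meaningful on an ordinal scale...
print(responses.max())                         # I simply love it
print(responses.sort_values().tolist())        # least to most favourable
print((responses >= "I like it a bit").sum())  # 3
# ...but the underlying codes (0 to 4) are evenly spaced only as an encoding
# detail, not because the feelings they stand for are equally far apart.
```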

## The Interval and Ratio Scales

A variable measured on a scale that preserves the order between values and has well-defined intervals expressed in real numbers is measured either on an interval scale or on a ratio scale. For such a variable, we can tell whether two individuals are different, the direction of the difference, and its size.

Example: height measured in real numbers, such as 70 cm.

In practice, variables measured on interval or ratio scales are very common.

Examples:

- Height measured with a numerical unit of measurement (e.g., 6 inches, 70 cm)
- Weight measured with a numerical unit of measurement (e.g., 50 grams)
- Time measured with a numerical unit of measurement (e.g., 60 seconds)
- The price of various products measured with a numerical unit of measurement (e.g., 600 dollars)

## The Difference Between Ratio and Interval Scales

What sets apart ratio scales from interval scales is the nature of the zero point.

On a ratio scale, the zero point means no quantity. For example, the Weight variable is measured on a ratio scale, which means that 0 grams indicate the absence of weight.

On an interval scale, however, the zero point doesn't indicate the absence of a quantity. It actually indicates the presence of a quantity.

In [None]:
# Weight_deviation: by how many kilograms each player's weight differs from
# the average weight of the players in the dataset
wnba['Weight_deviation'] = wnba['Weight'] - wnba['Weight'].mean()

![Screen%20Shot%202022-05-26%20at%208.28.17%20PM.png](attachment:Screen%20Shot%202022-05-26%20at%208.28.17%20PM.png)

If a player had a value of 0 for our Weight_deviation variable (which is measured on an interval scale), that wouldn't mean the player has no weight. Rather, it'd mean that her weight is exactly the same as the mean. The mean of the Weight variable is roughly 78.98 kg, which means that the zero point in the Weight_deviation variable is equivalent to 78.98 kg.

On the other hand, a value of 0 for the Weight variable, which is measured on a ratio scale, indicates the complete absence of weight.

Another important difference between the two scales lies in how we can measure the size of the differences.

On a ratio scale, we can quantify the difference in two ways. One way is to measure a distance between any two points by simply subtracting one from another. The other way is to measure the difference in terms of ratios.

For example, by doing a simple subtraction using the data in the table above, we can tell that the difference (the distance) in weight between Clarissa dos Santos and Alex Montgomery is 5 kg. In terms of ratios, however, Clarissa dos Santos is roughly 1.06 times (89 kg divided by 84 kg) as heavy as Alex Montgomery. To give a more straightforward example, if player A weighed 90 kg and player B weighed 45 kg, we could say that player A is twice as heavy (90 kg divided by 45 kg) as player B.
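Both ways of comparing values on a ratio scale amount to plain arithmetic. A minimal sketch using the weights quoted in the paragraph above:

```python
# Weights (kg) quoted above; Weight is measured on a ratio scale.
clarissa = 89.0
alex = 84.0

difference = clarissa - alex  # distance: 5 kg
ratio = clarissa / alex       # ratio: about 1.06

# The simpler example: 90 kg vs 45 kg.
player_a, player_b = 90.0, 45.0

print(difference, round(ratio, 2), player_a / player_b)  # 5.0 1.06 2.0
```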

On an interval scale, however, we can meaningfully measure the difference between two points only by finding the distance between them (by subtracting one point from another). If we look at the Weight_deviation variable, we can say there's a difference of 5 kg between Clarissa dos Santos and Alex Montgomery. However, if we took the ratio of their deviations (roughly 10 kg and 5 kg), we'd conclude that Clarissa dos Santos is two times heavier than Alex Montgomery, which is not true.
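The same two players on the Weight_deviation (interval) scale show the contrast. This sketch uses the approximate mean of 78.98 kg stated earlier:

```python
mean_weight = 78.98  # approximate mean of Weight (kg), as stated above

# Weight_deviation values for the two players: about 10.02 kg and 5.02 kg.
clarissa_dev = 89.0 - mean_weight
alex_dev = 84.0 - mean_weight

# Subtraction still works: the gap is the same 5 kg as on the Weight scale.
gap = clarissa_dev - alex_dev

# Division does not: the ratio (about 2) depends on where the arbitrary
# zero point (the mean) happens to sit, not on the players' actual weights.
misleading_ratio = clarissa_dev / alex_dev

print(round(gap, 2), round(misleading_ratio, 2))  # 5.0 2.0
```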

In [8]:
interval = ['Birthdate', 'Weight_deviation']
ratio = ['15:00', '3P%', '3PA', 'AST', 'Age', 'BLK', 'BMI', 'DD2', 'DREB', 'Experience', 
         'FG%', 'FGA', 'FGM', 'FT%', 'FTA', 'FTM', 'Games Played', 'Height', 'MIN', 
         'OREB', 'PTS', 'REB', 'STL', 'TD3', 'TO', 'Weight']

- Interval: the zero point indicates the presence of a quantity.
- Birthdate is on an interval scale because its zero point, the start of year 1 in the Anno Domini system, is set arbitrarily: it marks a point in time, not the absence of time.
- Weight_deviation: a value of 0 means the player's weight equals the mean weight, so the zero point indicates the presence of a quantity.
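The questions used throughout this notebook, plus the zero-point test that separates ratio from interval, can be summarised as a small decision function. This is a simplified sketch. The function `measurement_scale` and its parameter names are hypothetical helpers, not part of any library.

```python
def measurement_scale(can_tell_difference: bool, can_tell_direction: bool,
                      can_tell_size: bool, zero_means_absence: bool) -> str:
    """Map the scale questions to a scale of measurement (simplified sketch)."""
    if not can_tell_difference:
        raise ValueError("every scale can at least tell whether two values differ")
    if not can_tell_direction:
        return "nominal"
    if not can_tell_size:
        return "ordinal"
    # Interval and ratio scales differ only in the meaning of the zero point.
    return "ratio" if zero_means_absence else "interval"


examples = {
    "Team": measurement_scale(True, False, False, False),
    "Height_labels": measurement_scale(True, True, False, False),
    "Weight_deviation": measurement_scale(True, True, True, False),
    "Weight": measurement_scale(True, True, True, True),
}
print(examples)
```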

## Common Examples of Interval Scales

In practice, variables measured on an interval scale are relatively rare. Below we discuss two of the more common examples.

Generally, points in time are indicated by variables measured on an interval scale. Let's say we want to indicate the point in time of the first manned mission on the Moon. If we want to use a ratio scale, our zero point must be meaningful and denote the absence of time. For this reason, we'd basically have to begin the counting at the very beginning of time.

There are many problems with this approach. One of them is that we don't know with precision when time began (assuming time actually has a beginning), which means we don't know how far away in time we are from that zero point.

To overcome this, we can set an arbitrary zero point and measure the distance in time from there. Customarily, we use the Anno Domini system, where the zero point is arbitrarily set at the moment Jesus was born. Using this system, we can say that the first manned mission on the Moon happened in 1969. This means that the event happened 1968 years after Jesus' birth (1968 because there's no year 0 in the Anno Domini system).

![Screen%20Shot%202022-05-26%20at%208.34.47%20PM.png](attachment:Screen%20Shot%202022-05-26%20at%208.34.47%20PM.png)
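Because the Anno Domini system has no year 0, counting the years between two dates is easiest after mapping them onto astronomical year numbering (1 BC becomes 0, 2 BC becomes -1, and so on). Subtraction is then meaningful, while ratios such as 1969 / 1 are not. A minimal sketch; the helpers `to_astronomical` and `years_between` are hypothetical.

```python
from typing import Tuple


def to_astronomical(year: int, era: str) -> int:
    """Convert an AD/BC year to astronomical numbering (1 BC -> 0, 2 BC -> -1)."""
    if year < 1:
        raise ValueError("AD/BC years start at 1; there is no year 0")
    if era == "AD":
        return year
    if era == "BC":
        return 1 - year
    raise ValueError(f"unknown era: {era!r}")


def years_between(start: Tuple[int, str], end: Tuple[int, str]) -> int:
    """Whole years from the start of `start` to the start of `end`."""
    return to_astronomical(*end) - to_astronomical(*start)


moon_landing = years_between((1, "AD"), (1969, "AD"))  # 1968
across_eras = years_between((1, "BC"), (1, "AD"))      # 1: no year 0 in between
print(moon_landing, across_eras)
```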

Another common example has to do with measuring temperature. In day-to-day life, we usually measure temperature on the Celsius or the Fahrenheit scale. These scales are examples of interval scales.

Because temperature is measured on an interval scale, we need to avoid quantifying differences in terms of ratios. For example, 0°C and 0°F are arbitrarily set zero points and don't indicate the absence of temperature. If 0°C or 0°F were meaningful zero points, temperatures below them wouldn't be possible. But we know that temperatures can go well below 0°C or 0°F.

If yesterday was 10°C and today is 20°C, we can't say that today is twice as hot as yesterday. We can say, however, that today is 10°C warmer than yesterday.

Temperature can be measured on a ratio scale too, using the Kelvin scale. 0 K (0 kelvin, or absolute zero) is not set arbitrarily: it is the lowest temperature physically possible, so it works as a true zero point, and the temperature can't drop below 0 K.

![Screen%20Shot%202022-05-26%20at%208.35.40%20PM.png](attachment:Screen%20Shot%202022-05-26%20at%208.35.40%20PM.png)
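Converting the 10°C / 20°C example to kelvin (by adding 273.15) shows why the Celsius ratio is misleading. A minimal sketch:

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Shift a Celsius reading onto the Kelvin (ratio) scale."""
    return celsius + 273.15


yesterday_c, today_c = 10.0, 20.0

naive_ratio = today_c / yesterday_c  # 2.0, but meaningless on an interval scale
kelvin_ratio = celsius_to_kelvin(today_c) / celsius_to_kelvin(yesterday_c)  # about 1.035
difference = today_c - yesterday_c   # 10.0: differences are meaningful

print(naive_ratio, round(kelvin_ratio, 3), difference)  # 2.0 1.035 10.0
```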

## Discrete and Continuous Variables

Generally, if there's no possible intermediate value between any two adjacent values of a variable, we call that variable discrete (countable).

Examples: counts of people in a class, a room, an office, a country, a house
  
Generally, if there's an infinity of values between any two values of a variable, we call that variable continuous (measurable).

Example: weight, height
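Whether a variable is discrete or continuous depends on what it measures, not on how the values happen to be recorded. Height is stored in whole centimeters in this dataset, so a naive "all values are whole numbers" check cannot distinguish it from a true count such as PTS. A minimal sketch; `looks_whole_numbered` is a hypothetical heuristic, not a pandas function.

```python
import pandas as pd


def looks_whole_numbered(values: pd.Series) -> bool:
    """Heuristic: True if every non-missing value is a whole number."""
    non_missing = values.dropna()
    return bool((non_missing == non_missing.round()).all())


# Height (cm) and PTS of the five players in the wnba.head() output.
height = pd.Series([183, 185, 170, 185, 175])  # continuous, recorded to the nearest cm
pts = pd.Series([93, 217, 218, 188, 50])       # discrete counts of points

# Both pass the check, so the data alone can't tell us which variable is continuous.
print(looks_whole_numbered(height), looks_whole_numbered(pts))  # True True
```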

In [9]:
ratio_interval_only = {'Height':'continuous', 
                       'Weight': 'continuous', 
                       'BMI': 'continuous', 
                       'Age': 'continuous', 
                       'Games Played': 'discrete', 
                       'MIN': 'continuous', 
                       'FGM': 'discrete',
                       'FGA': 'discrete', 
                       'FG%': 'continuous', 
                       '3PA': 'discrete', 
                       '3P%': 'continuous', 
                       'FTM': 'discrete', 
                       'FTA': 'discrete', 
                       'FT%': 'continuous',
                       'OREB': 'discrete', 
                       'DREB': 'discrete', 
                       'REB': 'discrete', 
                       'AST': 'discrete', 
                       'STL': 'discrete', 
                       'BLK': 'discrete', 
                       'TO': 'discrete',
                       'PTS': 'discrete', 
                       'DD2': 'discrete', 
                       'TD3': 'discrete', 
                       'Weight_deviation': 'continuous'}