### Variable Statistics

### Introduction & Definition
We'll focus on understanding the structural parts of a dataset, and how they're measured.
Whether it's a sample or a population, a dataset is generally an attempt to correctly describe a relatively small part of the world.
Our dataset, for instance, describes basketball players from the WNBA. Other datasets might attempt to describe the stock market, patient symptoms, stars from galaxies other than ours, movie ratings, customer purchases, and all sorts of other things.

The things we want to describe usually have a myriad of properties. A human, for instance, besides the property of being a human, can also have properties like height, weight, age, name, hair color, gender, nationality, whether they're married or not, whether they have a job or not, etc.
In practice, we limit ourselves to the properties relevant to the questions we want to answer, and to the properties that we can actually measure.

In our dataset, each row describes an individual (a player) who has a series of properties: name, team, position on the field, height, etc. For most properties, the values vary from row to row. All players have a height, for example, but the height values vary from player to player.

The properties with varying values we call **variables**. The height property in our dataset is an example of a variable. In fact, all the properties described in our dataset are variables.

A row in our dataset describes the actual values that each variable takes for a given individual.
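
As a minimal sketch of this idea, the snippet below builds a tiny table in plain Python and checks which properties actually vary from row to row. The players, teams, and values are hypothetical and are not taken from `wnba.csv`; the constant `Species` property is included only to show a property that is not a variable in this toy table.

```python
# Hypothetical toy data: three made-up players, not rows from wnba.csv.
players = [
    {'Name': 'Player A', 'Team': 'Team X', 'Height': 190, 'Species': 'human'},
    {'Name': 'Player B', 'Team': 'Team Y', 'Height': 192, 'Species': 'human'},
    {'Name': 'Player C', 'Team': 'Team X', 'Height': 185, 'Species': 'human'},
]

# A property counts as a variable when it takes more than one distinct value.
variable_names = [
    prop for prop in players[0]
    if len({row[prop] for row in players}) > 1
]

print(variable_names)  # properties whose values vary between rows
print(players[0])      # the values each property takes for one individual
```

Here `Name`, `Team`, and `Height` vary between rows, while `Species` has the same value everywhere, so it would not be treated as a variable.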

### Quantitative & Qualitative Variables
Variables in statistics can describe either **quantities**, or **qualities**.
Generally, a variable that describes how much there is of something describes a quantity, and, for this reason, it's called a **quantitative variable**.
Usually, quantitative variables describe a quantity using real numbers, but there are also cases when words are used instead. Height, for example, can be described using real numbers, like in our dataset, but it can also be described using labels like "tall" or "short".

The Name, Team, and College variables describe for each individual a quality, that is, a property that is not quantitative. Variables that describe qualities are called **qualitative variables** or **categorical variables**. Generally, qualitative variables describe what or how something is.
Usually, qualitative variables describe qualities using words, but numbers can also be used. For instance, the number on a player's shirt or the number of a racing car is described using numbers. These numbers don't carry any quantitative meaning, though: they are just names, not quantities.
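
One way to see the difference is to look at which operations make sense on each kind of value. The sketch below uses hypothetical shirt numbers and heights (not values from the dataset): averaging heights is meaningful, while shirt numbers are better treated as labels that we can only compare and count.

```python
from collections import Counter
from statistics import mean

# Hypothetical shirt numbers and heights (cm) for four made-up players.
shirt_numbers = [7, 23, 7, 11]
heights_cm = [190, 192, 185, 201]

# Heights are quantities, so arithmetic on them is meaningful.
average_height = mean(heights_cm)

# Shirt numbers are labels: convert them to strings and count them
# instead of averaging them.
shirt_labels = [str(n) for n in shirt_numbers]
label_counts = Counter(shirt_labels)

print(average_height)
print(label_counts)
```

Python would happily compute `mean(shirt_numbers)` too, but the result would not describe anything about the players, which is why the labels are converted to strings here.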

![title](./img/1_V.png)

You can find useful documentation about each variable [here](https://www.basketball-reference.com/about/glossary.html) and [here](https://www.kaggle.com/jinxbe/wnba-player-stats-2017).

```python
# For each of the variables selected, indicate whether it's quantitative or qualitative.
import pandas as pd
wnba = pd.read_csv('wnba.csv')

variables = {'Name': 'qualitative', 'Team': 'qualitative', 'Pos': 'qualitative', 'Height': 'quantitative', 'BMI': 'quantitative',
             'Birth_Place': 'qualitative', 'Birthdate': 'quantitative', 'Age': 'quantitative', 'College': 'qualitative', 'Experience': 'quantitative',
             'Games Played': 'quantitative', 'MIN': 'quantitative', 'FGM': 'quantitative', 'FGA': 'quantitative',
             '3PA': 'quantitative', 'FTM': 'quantitative', 'FTA': 'quantitative', 'FT%': 'quantitative', 'OREB': 'quantitative', 'DREB': 'quantitative',
             'REB': 'quantitative', 'AST': 'quantitative', 'PTS': 'quantitative'}
```
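
Once such a mapping exists, it can be used to group the variable names by type. The sketch below is one possible way to do that; it uses a shortened copy of the mapping so that it runs on its own, without `wnba.csv`.

```python
from collections import Counter

# A shortened copy of the mapping above, so the snippet runs on its own.
variables = {'Name': 'qualitative', 'Team': 'qualitative', 'Pos': 'qualitative',
             'Height': 'quantitative', 'Age': 'quantitative', 'PTS': 'quantitative',
             'College': 'qualitative'}

# Tally how many variables fall into each group.
type_counts = Counter(variables.values())

# Collect the variable names that belong to each group.
qualitative = [name for name, kind in variables.items() if kind == 'qualitative']
quantitative = [name for name, kind in variables.items() if kind == 'quantitative']

print(type_counts)
print(qualitative)
print(quantitative)
```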

### Scales Of Measurement
The amount of information a variable provides depends on its nature (whether it's quantitative or qualitative), and on the way it's measured.

For instance, if we analyze the Team variable for any two individuals:

- We can tell whether or not the two individuals are different from each other with respect to the team they play for.
- But if there's a difference:
    - We can't tell the size of the difference.
    - We can't tell the direction of the difference: we can't say that team A is greater or less than team B.

On the other hand, if we analyze the Height variable:

- We can tell whether or not two individuals are different.
- If there's a difference:
    - We can tell the size of the difference. If player A is 190 cm tall and player B is 192 cm tall, then the difference between the two is 2 cm.
    - We can tell the direction of the difference from each perspective: player A is 2 cm shorter than player B, and player B is 2 cm taller than player A.

![title](./img/2_Variables.png)
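
The contrast between the two variables can be sketched in a few lines of Python. The two players and their teams below are hypothetical; the heights reuse the 190 cm and 192 cm values from the example above.

```python
# Two hypothetical players; the heights match the 190 cm / 192 cm example.
player_a = {'Team': 'Team X', 'Height': 190}
player_b = {'Team': 'Team Y', 'Height': 192}

# Team: the only thing we can tell is whether the two values differ.
teams_differ = player_a['Team'] != player_b['Team']

# Height: we can also tell the size and the direction of the difference.
height_diff = player_b['Height'] - player_a['Height']  # signed difference in cm
size = abs(height_diff)
if height_diff > 0:
    direction = 'B is taller'
elif height_diff < 0:
    direction = 'A is taller'
else:
    direction = 'same height'

print(teams_differ, size, direction)
```

Subtracting one team name from another would have no meaning, which is exactly the information the Team variable cannot provide.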

The Team and Height variables provide different amounts of information because they have a different **nature** (one is qualitative, the other quantitative), and because they are measured differently.

The system of rules that defines how each variable is measured is called a **scale of measurement** or, less often, a level of measurement.

In the next sections, we'll learn about a system of measurement made up of four different scales of measurement: **nominal, ordinal, interval, and ratio**. As we'll see, the characteristics of each scale pivot around three main questions:

- Can we tell whether two individuals are different?
- Can we tell the direction of the difference?
- Can we tell the size of the difference?


### The Nominal Scale
