# Demoing

Data structures for ordinal/ranked pairs of variables.

In [18]:
import numpy as np
import pandas as pd
import rankingscompare as rc

## Conjoint, no ties

Though some of these options involve storing ranks instead of values, I would prefer to store values and let this package do the conversion of values into ranks. This list is not exhaustive; these are just some of the methods that I thought of:

1. Two separate lists, of equal length, where position x in both lists refers to the same paired item. Elements in the lists may be either values to be ranked, or the ranks themselves. **Two lists (or vectors) of values is the most common layout for statistics on two variables.**
2. A list of lists or tuples, where each list/tuple in the list represents a paired item. Elements in the lists/tuples may be values or ranks.
3. Two separate lists of items, of equal length, where position x in each list holds the item at rank x + 1 (Python lists are 0-indexed).
4. Same as Option 3, but as a single list of lists/tuples, one per rank.
5. List of lists/tuples that contain 3 entries: the item, the first value/rank, and the second value/rank (basically a table)
6. The above, but as a Pandas DataFrame, which is more commonly used to store tabular data for data analysis than a list of lists/tuples (this is just a guess)
7. Dictionary of item (key) to a list/tuple of values for Variable1, Variable2 (values)
8. Dictionary of rank (key) to a list/tuple of items for Variable1, Variable2 (values)

Option 1 is my favorite. Options 5 and 6 make a lot of sense because values for ordinal variables are typically stored in tables.

In [15]:
# Option 1, elements are values to be converted to ranks
a = [15, 10, 5, 6]
b = [23, 24, 25, 26]

# Option 2, elements are values to be converted to ranks
ranks = [
    [15, 23],  # item a
    [10, 24],  # item b
    [5, 25],   # item c
    [6, 26]    # item d
]

# Option 3
a = ['a', 'b', 'd', 'c']
b = ['d', 'c', 'b', 'a']

# Option 4
ranks = [
    ['a', 'd'],  # rank 1
    ['b', 'c'],  # rank 2
    ['d', 'b'],  # rank 3
    ['c', 'a']   # rank 4
]

# Option 5
ranks = [
    ['a', 15, 23],  # item a
    ['b', 10, 24],  # item b
    ['c', 5, 25],   # item c
    ['d', 6, 26]    # item d
]

# Option 6
ranks = pd.DataFrame(ranks, columns=['Item', 'Variable1', 'Variable2'])

# Option 7
ranks = {
    'a': [15, 23],  # item a
    'b': [10, 24],
    'c': [5, 25],
    'd': [6, 26]
}

# Option 8
ranks = {
    1: ['a', 'd'],  # rank 1
    2: ['b', 'c'],
    3: ['d', 'b'],
    4: ['c', 'a']
}
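
Once item labels are attached, Options 1 and 3 encode the same information. A minimal sketch of the conversion, assuming there are no ties and that the largest value receives rank 1 (which is what the Option 3 lists imply when compared with the Option 1 values). `values_to_ordering` is a hypothetical helper for illustration, not a function from `rankingscompare`.

```python
def values_to_ordering(items, values):
    """Return items ordered so that position x holds the item at rank x + 1.

    Assumes no ties and that the largest value gets rank 1.
    """
    paired = sorted(zip(values, items), key=lambda pair: pair[0], reverse=True)
    return [item for _, item in paired]


items = ['a', 'b', 'c', 'd']
var1 = [15, 10, 5, 6]    # Option 1, first list
var2 = [23, 24, 25, 26]  # Option 1, second list

order1 = values_to_ordering(items, var1)  # Option 3, first list
order2 = values_to_ordering(items, var2)  # Option 3, second list
```

Going the other way (from an ordering back to values) is not possible, which is one more reason to store values rather than ranks.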

## Conjoint, ties

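The above options all work with ties, except for Options 3, 4 and 8, where each rank position (or key) holds exactly one item per variable, so two items cannot share a rank.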

The question then becomes how to represent ties. For statistics like `rc.tau_b`, we can convert values to ranks using `rc.to_rank` with the midrank approach, which assigns every member of a group of ties the mean of the ranks that group occupies, and then use those lists of ranks for computation. This becomes a little more complicated later on when we consider non-conjoint data.
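
The midrank approach can be sketched in plain Python as follows. This is an illustration only: `midranks` is a hypothetical helper, not the implementation of `rc.to_rank`, and it assumes the largest value gets rank 1, in line with the examples in this notebook.

```python
def midranks(values):
    """Rank values so the largest gets rank 1; ties share the mean of their ranks."""
    # Indices of the values, ordered from largest to smallest value.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` to cover the whole run of equal values.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end are 0-based, so the ranks are start+1..end+1.
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


var1_ranks = midranks([15, 10, 5, 10])   # items a, b, c, d; b and d tied
var2_ranks = midranks([23, 23, 25, 26])  # items a, b, c, d; a and b tied
```

Here `b` and `d` would occupy ranks 2 and 3 in the first variable, so each receives 2.5; `a` and `b` would occupy ranks 3 and 4 in the second, so each receives 3.5.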

We can modify Options 3, 4 and 8 to incorporate lists of ties, but I hate these:

9. Option 9 - Modify Option 3 by creating separate lists for each variable specifying which items are tied for each variable.

10. Option 10 - Modify Option 4 by creating separate lists for each variable specifying which items are tied for each variable.

11. Option 11 - Modify Option 8 by keeping ranks as keys, but make each dictionary value a pair of lists/tuples: element 0 holds the items at that rank for Variable1, and element 1 holds the items at that rank for Variable2 (or `None` when no item has that rank for that variable).

In [13]:
# Option 9
a = ['a', 'b', 'd', 'c']
b = ['d', 'c', 'b', 'a']
a_ties = [['b', 'd']]  # b and d are tied in variable a
b_ties = [['b', 'a']]  # b and a are tied in variable b

# Option 10
ranks = [
    ['a', 'd'],  # higher ranks (1, 2, ...)
    ['b', 'c'],  
    ['d', 'b'],  
    ['c', 'a']   # lower ranks (4, 3, ...)
]
a_ties = [['b', 'd']]  # b and d are tied in variable a
b_ties = [['b', 'a']]  # b and a are tied in variable b

# Option 11
ranks = {
    1:   [['a']     , ['d']],
    2:   [None      , ['c']],
    2.5: [['b', 'd'], None],
    3.5: [None      , ['b', 'a']],
    4:   [['c']     , None]
}
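
To use Option 11 in a computation, the dictionary has to be turned back into one pair of ranks per item. A rough sketch of that inversion, where `ranks_by_item` is a hypothetical helper and not part of `rankingscompare`:

```python
ranks = {
    1: [['a'], ['d']],
    2: [None, ['c']],
    2.5: [['b', 'd'], None],
    3.5: [None, ['b', 'a']],
    4: [['c'], None],
}


def ranks_by_item(rank_table):
    """Map each item to [rank in Variable1, rank in Variable2]."""
    per_item = {}
    for rank, groups in rank_table.items():
        for variable_index, group in enumerate(groups):
            for item in group or []:  # None means no item at this rank
                per_item.setdefault(item, [None, None])[variable_index] = rank
    return per_item


per_item = ranks_by_item(ranks)
```

The extra pass needed just to recover per-item ranks is part of what makes this layout unattractive.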

## Non-conjoint, no ties

Option 3 works as-is.

The other options can be modified to work with non-conjoint data by adding `None` values for cases where an item appears in one set but not the other. Adding `None` is slightly awkward. Its only benefit over using Option 3 as-is is that Option 3 stops working once we consider ties, while these modified options still work.

In [11]:
# Modified Option 1 - adding an item in a and not b, and vice versa
a = [15, 10, 5,  6,  None, 2]
b = [23, 24, 25, 26, 4,    None]

# Modified Option 2
ranks = [
    [15, 23],   # item a
    [10, 24],   # item b
    [5, 25],    # item c
    [6, 26],    # item d
    [None, 4],  # item e, only in Variable2
    [2, None]   # item f, only in Variable1
]

# etc ...
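
One way to consume the `None`-padded lists is to separate the paired observations from the items that appear in only one variable. A small sketch under that assumption; the names `both`, `only_a` and `only_b` are illustrative and do not come from `rankingscompare`.

```python
a = [15, 10, 5, 6, None, 2]
b = [23, 24, 25, 26, 4, None]

# Pairs observed in both variables (the conjoint part of the data).
both = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]

# Values observed in only one of the two variables.
only_a = [x for x, y in zip(a, b) if x is not None and y is None]
only_b = [y for x, y in zip(a, b) if x is None and y is not None]
```

The checks use `is None` rather than truthiness so that a legitimate value of `0` is not mistaken for a missing entry.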

## Non-conjoint, ties

The best choices...

In [None]:
# Modified Option 1 - adding an item in a and not b, and vice versa
a = [15, 10, 5,  6,  None, 2]
b = [23, 24, 25, 26, 4,    None]

# Modified Option 5
ranks = [
    ['a', 15, 23],  # item a
    ['b', 10, 24],  # item b
    ['c', 5, 25],   # item c
    ['d', 6, 26],   # item d
    ['e', None, 4], # item e - var2 only
    ['f', 2, None]  # item f - var1 only
]

# Modified Option 6
ranks = pd.DataFrame(ranks, columns=['Item', 'Variable1', 'Variable2'])