# Crosstabulation challenge

In [1]:
# You may want to use this one.
from itertools import product

# Don't change this cell; just run it.
import numpy as np  # The array library.

import pandas as pd
# Safe setting for Pandas.  Needs Pandas version >= 1.5.
pd.set_option('mode.copy_on_write', True)

# The OKpy testing system.
from client.api.notebook import Notebook
ok = Notebook('crosstab_challenge.ok')

Assignment: crosstab challenge
OK, version v1.18.1



## The challenge

Write a function that accepts a crosstabulation DataFrame and returns the corresponding observations DataFrame, with one row per observation.

Here's an example crosstabulation:

In [2]:
bystander_tab = pd.DataFrame([[64, 165],
                              [7, 44]],
                             columns=['bystander', 'rescuer'],
                             index=['No', 'Yes'])
bystander_tab

|     | bystander | rescuer |
|-----|-----------|---------|
| No  | 64        | 165     |
| Yes | 7         | 44      |


This crosstabulation was built from an *observations* table (DataFrame).  That observations table had one row per person and two columns: the first has "Yes" or "No" for whether that person was a member of a political party, and the second has "bystander" or "rescuer".

Here is the long-hand reconstruction of that observations table, from the "noble_politics" page:

In [3]:
label_pairs = pd.DataFrame([['No', 'bystander'],
                            ['Yes', 'bystander'],
                            ['No', 'rescuer'],
                            ['Yes', 'rescuer']],
                           columns=['party_yn', 'respondent'])
both_cols = np.concatenate([bystander_tab['bystander'],
                            bystander_tab['rescuer']])
label_indices = np.repeat([0, 1, 2, 3], both_cols)
people = label_pairs.loc[label_indices].reset_index(drop=True)
people

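|     | party_yn | respondent |
|-----|----------|------------|
| 0   | No       | bystander  |
| 1   | No       | bystander  |
| 2   | No       | bystander  |
| 3   | No       | bystander  |
| 4   | No       | bystander  |
| ... | ...      | ...        |
| 275 | Yes      | rescuer    |
| 276 | Yes      | rescuer    |
| 277 | Yes      | rescuer    |
| 278 | Yes      | rescuer    |
| 279 | Yes      | rescuer    |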
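The key step above is `np.repeat`, which copies each label-pair index as many times as that pair's count; `.loc` then turns one row per pair into one row per observation.  A minimal sketch with made-up counts (including a zero, which simply drops that pair) and hypothetical labels `'a'` to `'d'`:

```python
import numpy as np
import pandas as pd

# Made-up counts for four label pairs, numbered 0 to 3.
counts = np.array([2, 1, 0, 3])

# Each index appears as many times as its count; pair 2 (count 0) vanishes.
label_indices = np.repeat([0, 1, 2, 3], counts)
print(label_indices)  # [0 0 1 3 3 3]

# Selecting rows by those indices expands one row per pair
# into one row per observation.
pairs = pd.DataFrame({'label': ['a', 'b', 'c', 'd']})
expanded = pairs.loc[label_indices].reset_index(drop=True)
print(list(expanded['label']))  # ['a', 'a', 'b', 'd', 'd', 'd']
```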


In [7]:
# This corresponds to the input crosstab data frame.
pd.crosstab(people['party_yn'], people['respondent'])

| respondent | bystander | rescuer |
|------------|-----------|---------|
| party_yn   |           |         |
| No         | 64        | 165     |
| Yes        | 7         | 44      |


Your job is to write a general function, `xtab2obs`, that accepts a crosstabulation data frame and returns the corresponding observations data frame.

The output columns should be called 'label0' and 'label1': 'label0' holds the row (index) labels of the crosstab, and 'label1' holds its column labels.

Your function should work with any number of rows (labels) and columns in the input data frame.  The example above has 2 row labels and 2 column labels, but your function should work for $m$ rows and $n$ columns, where $m$ and $n$ can be any positive integers.
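Generalizing to $m$ rows and $n$ columns hinges on reading the labels and the counts in the same order.  A minimal sketch with a made-up 2 × 3 table: `product(columns, index)` yields the $m \times n$ label pairs column by column, which is the same order you get by concatenating the columns, or by flattening the values in NumPy's column-major (`order='F'`) order.

```python
from itertools import product

import numpy as np
import pandas as pd

# A made-up crosstab with m = 2 row labels and n = 3 column labels.
tab = pd.DataFrame([[1, 2, 3],
                    [4, 5, 6]],
                   columns=['c0', 'c1', 'c2'],
                   index=['r0', 'r1'])

# product(columns, index) walks down each column in turn ...
pairs = list(product(tab.columns, tab.index))
# ... so the counts must be read in the same column-by-column order.
counts = np.concatenate([tab[col] for col in tab.columns])

# Every (column, row) pair lines up with its own cell count.
for (col, row), count in zip(pairs, counts):
    assert tab.loc[row, col] == count

print(len(pairs))  # 6, that is m * n
print(counts)      # [1 4 2 5 3 6]
# Column-major flattening of the values gives the same order.
assert np.array_equal(counts, tab.to_numpy().ravel(order='F'))
```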

In [5]:
def xtab2obs(tab_df):
    # Your instructor's solution was 5 lines.
    # One (column label, row label) pair per cell, walking down each column in turn.
    pairs = list(product(tab_df.columns, tab_df.index))
    df = pd.DataFrame(pairs, columns=["label1", "label0"])
    # Cell counts in the same column-by-column order as `pairs`.
    counts = np.concatenate([tab_df[col] for col in tab_df.columns])
    # Repeat each pair's row number by that cell's count.
    label_indices = np.repeat(np.arange(len(pairs)), counts)
    obs_df = df.loc[label_indices].reset_index(drop=True)

    return obs_df
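The same observations can be built without `itertools`.  The sketch below is a hypothetical alternative (the name `xtab2obs_np` is made up here, not part of the assignment): it reads the counts row by row, and builds matching label arrays with `np.repeat` for the row labels and `np.tile` for the column labels.

```python
import numpy as np
import pandas as pd


def xtab2obs_np(tab_df):
    """Hypothetical NumPy-only variant of xtab2obs (rows in row-major order)."""
    m, n = tab_df.shape
    # Counts read row by row: row 0's n cells, then row 1's n cells, ...
    counts = tab_df.to_numpy().ravel()
    # Row label for each cell: each index label repeated n times.
    row_labels = np.repeat(tab_df.index.to_numpy(), n)
    # Column label for each cell: the column labels cycled m times.
    col_labels = np.tile(tab_df.columns.to_numpy(), m)
    # Repeat each cell's labels by that cell's count.
    return pd.DataFrame({'label0': np.repeat(row_labels, counts),
                         'label1': np.repeat(col_labels, counts)})


bystander_tab = pd.DataFrame([[64, 165],
                              [7, 44]],
                             columns=['bystander', 'rescuer'],
                             index=['No', 'Yes'])
obs = xtab2obs_np(bystander_tab)
print(len(obs))  # 280
print(pd.crosstab(obs['label0'], obs['label1']))
```

Reading row by row groups each row label's observations together, whereas `xtab2obs` groups them by column label.  Because `test_tab` only compares crosstabs, either row order should pass.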

In [8]:
x = xtab2obs(bystander_tab)
print(x)
pd.crosstab(x["label0"], x["label1"])

        label1 label0
0    bystander     No
1    bystander     No
2    bystander     No
3    bystander     No
4    bystander     No
..         ...    ...
275    rescuer    Yes
276    rescuer    Yes
277    rescuer    Yes
278    rescuer    Yes
279    rescuer    Yes

[280 rows x 2 columns]


| label1 | bystander | rescuer |
|--------|-----------|---------|
| label0 |           |         |
| No     | 64        | 165     |
| Yes    | 7         | 44      |
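`pd.crosstab` is one way to count the observations back into a table; a `groupby` gives the same counts.  A minimal sketch with made-up observations in the `label0` / `label1` layout: `size()` counts the rows in each label combination, and `unstack` moves the `label1` values into columns (`fill_value=0` covers combinations that never occur).

```python
import numpy as np
import pandas as pd

# Made-up observations, one row per person, in the layout xtab2obs returns.
obs = pd.DataFrame({'label0': ['No', 'No', 'Yes', 'No', 'Yes'],
                    'label1': ['bystander', 'rescuer', 'rescuer',
                               'bystander', 'bystander']})

# Count rows per (label0, label1) pair, then turn label1 into columns.
by_group = obs.groupby(['label0', 'label1']).size().unstack(fill_value=0)
by_crosstab = pd.crosstab(obs['label0'], obs['label1'])
print(by_group)

# Same labels and counts either way.
assert list(by_group.index) == list(by_crosstab.index)
assert list(by_group.columns) == list(by_crosstab.columns)
assert np.array_equal(by_group.to_numpy(), by_crosstab.to_numpy())
```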


In [9]:
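def test_tab(in_tab):
    obs = xtab2obs(in_tab)
    # One observation per counted item.
    assert len(obs) == np.sum(np.array(in_tab))
    # Crosstabulating the observations should give back the input table.
    xtab = pd.crosstab(obs['label0'], obs['label1'])
    assert xtab.equals(in_tab)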

In [10]:
test_tab(bystander_tab)

In [11]:
fake_tab = pd.DataFrame([[10, 15, 20],
                         [3, 7, 9],
                         [1, 12, 19],
                         [2, 22, 9]],
                        columns=['col0', 'col1', 'col2'],
                        index=list('ABCD'))
fake_tab

|   | col0 | col1 | col2 |
|---|------|------|------|
| A | 10   | 15   | 20   |
| B | 3    | 7    | 9    |
| C | 1    | 12   | 19   |
| D | 2    | 22   | 9    |


In [16]:
test_tab(fake_tab)
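Since the function should work for any $m$ and $n$, it may be worth checking more than two shapes.  A sketch of a randomized round-trip check, assuming made-up labels (`row0`, `col0`, ...) and counts drawn with `np.random.default_rng`.  Counts start at 1 so that no label vanishes from the crosstab, and each axis has fewer than 10 labels so that the sorted label order matches the original order:

```python
from itertools import product

import numpy as np
import pandas as pd


def xtab2obs(tab_df):
    # Same approach as the solution in the notebook.
    pairs = list(product(tab_df.columns, tab_df.index))
    df = pd.DataFrame(pairs, columns=['label1', 'label0'])
    counts = np.concatenate([tab_df[col] for col in tab_df.columns])
    label_indices = np.repeat(np.arange(len(pairs)), counts)
    return df.loc[label_indices].reset_index(drop=True)


rng = np.random.default_rng(42)  # fixed seed, so reruns give the same tables
for m, n in [(1, 1), (2, 5), (4, 3), (6, 6)]:
    # Made-up counts from 1 to 19 inclusive; no zeros, so no label vanishes.
    counts = rng.integers(1, 20, size=(m, n))
    tab = pd.DataFrame(counts,
                       index=[f'row{i}' for i in range(m)],
                       columns=[f'col{j}' for j in range(n)])
    obs = xtab2obs(tab)
    assert len(obs) == counts.sum()
    xtab = pd.crosstab(obs['label0'], obs['label1'])
    # Same labels, in the same order, and the same counts.
    assert list(xtab.index) == list(tab.index)
    assert list(xtab.columns) == list(tab.columns)
    assert np.array_equal(xtab.to_numpy(), counts)
print('All shapes passed')
```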

## Done.

Congratulations, you're done with the assignment!  Be sure to:

- **run all the tests** (the next cell has a shortcut for that).
- **Save and Checkpoint** from the `File` menu.

In [17]:
# For your convenience, you can run this cell to run all the tests at once!
import os
_ = [ok.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q')]