# Crosstabulation, but more general

In [None]:
# Don't change this cell; just run it.
import numpy as np  # The array library.

import pandas as pd
# Safe setting for Pandas.  Needs Pandas version >= 1.5.
pd.set_option('mode.copy_on_write', True)

# The OKpy testing system.
from client.api.notebook import Notebook
ok = Notebook('crosstab_for.ok')

## A more general form

The challenge is to write a general cell that will go from a counts DataFrame
to an observations DataFrame.

Here's our example Counts DataFrame.

In [None]:
bystander_tab = pd.DataFrame([[64, 165],
                              [7, 44]],
                             columns=['bystander', 'rescuer'],
                             index=['No', 'Yes'])
bystander_tab

This is a cross-tabulation for which the raw material was an *observations*
DataFrame.  That observations table had one row per person and two columns.
The first column has "Yes" or "No" for whether that person was a member of a
political party, and the second has "bystander" or "rescuer".

Here is the long-hand version of that, from the "noble_politics" page:

In [None]:
row_lists = (
    # The No rows
    [['No', 'bystander']] * bystander_tab.loc['No', 'bystander'] +
    [['No', 'rescuer']] * bystander_tab.loc['No', 'rescuer'] +
    # The Yes rows
    [['Yes', 'bystander']] * bystander_tab.loc['Yes', 'bystander'] +
    [['Yes', 'rescuer']] * bystander_tab.loc['Yes', 'rescuer']
)
people = pd.DataFrame(row_lists,
                      columns=['party_yn', 'respondent'])
people
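
The long-hand cell above relies on two list operations: multiplying a list by a whole number repeats its elements, and adding two lists joins them end to end. Here is a minimal sketch of both, using made-up repeat counts rather than the values from the table:

```python
# Repeating a list with * and joining lists with +.
pair = ['No', 'bystander']

# [pair] is a list with one element (which is itself a list).
# Multiplying by 3 gives a list with that element three times.
repeated = [pair] * 3
print(repeated)

# Adding two lists joins them end to end.
rows = [['No', 'bystander']] * 2 + [['Yes', 'rescuer']] * 1
print(rows)

# Multiplying by 0 gives an empty list, so a zero count adds no rows.
empty = [pair] * 0
print(empty)
```

Each element of the resulting list becomes one row when the list is passed to `pd.DataFrame`.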

We showed that, indeed, crosstabulating the observations DataFrame replicates
the original Counts DataFrame.

In [None]:
# This corresponds to the input crosstab data frame.
pd.crosstab(people['party_yn'], people['respondent'])
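
Rather than comparing the two tables by eye, you could compare the counts as arrays. The sketch below does this on a small made-up table; the names `small_counts`, `small_obs` and `small_tab` and the values in them are for illustration only, and are not part of the exercise.

```python
import numpy as np
import pandas as pd

# A small, made-up counts table (hypothetical values).
small_counts = pd.DataFrame([[2, 1],
                             [0, 3]],
                            columns=['bystander', 'rescuer'],
                            index=['No', 'Yes'])

# The matching observations, written out by hand.
small_obs = pd.DataFrame(
    [['No', 'bystander']] * 2 +
    [['No', 'rescuer']] * 1 +
    [['Yes', 'rescuer']] * 3,
    columns=['party_yn', 'respondent'])

small_tab = pd.crosstab(small_obs['party_yn'], small_obs['respondent'])

# Compare the counts as arrays; this ignores the names on the axes.
same_counts = np.array_equal(small_tab.to_numpy(), small_counts.to_numpy())
print(same_counts)
```

`pd.crosstab` returns its row and column labels in sorted order, so this comparison assumes the labels of the counts table are already sorted, as they are here.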

Your job is to make a general cell that works on a counts DataFrame called
`counts_df`.

From `counts_df`, it should generate an observations DataFrame called
`observations`.

The columns of `observations` should be called 'label0' and 'label1'.

Your cell should work with any number of rows (labels) and columns in the
input data frame.  The example above has 2 row labels and 2 column labels, but
your cell should work for $m$ rows and $n$ columns, where $m$ and $n$ can be
any positive whole numbers.

To do this, you may want to use *nested* `for` loops.   For example, consider
the following:

In [None]:
for row_label in ['one', 'two']:
    for col_label in ['A', 'B']:
        pair = [row_label, col_label]
        print(pair)

Then have a look at the structure of the code we used above, to see if you can
use a for loop like this to work on any counts table.
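
As a hint, the labels for the loops do not have to be typed in by hand; a DataFrame stores its row labels in `.index` and its column labels in `.columns`. The sketch below loops over both for a made-up table called `tiny` (a hypothetical name and values, for illustration only) and looks up each count with `.loc`. It is one possible starting point, not the full answer.

```python
import pandas as pd

# A made-up table, for illustration only.
tiny = pd.DataFrame([[1, 2], [3, 4]],
                    columns=['A', 'B'],
                    index=['one', 'two'])

found = []
for row_label in tiny.index:
    for col_label in tiny.columns:
        # Look up the count for this row label and column label.
        count = tiny.loc[row_label, col_label]
        print(row_label, col_label, count)
        found.append((row_label, col_label, int(count)))
```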

Let's start by setting `counts_df` to have our example `bystander_tab`.

In [None]:
counts_df = bystander_tab

When you are ready (see below), try uncommenting the code cell below (by
deleting the `#` signs) to make a different, more difficult `counts_df`.

In [None]:
# counts_df = pd.DataFrame([[10, 15, 20], [3, 7, 9], [1, 12, 19], [2, 22, 9]],
#                        columns=['col0', 'col1', 'col2'],
#                        index=list('ABCD'))

Now, in the next cell, write some code that will work for any $m$ by $n$ counts
table.


In [None]:
#- Your code here
...
observations = ...
# Show the first rows of the result
observations[:10]

Check the result with a crosstab of the two columns:

In [None]:
pd.crosstab(observations['label0'], observations['label1'])

When that is working for the original 2 x 2 table, try uncommenting the more
difficult table above, and rerunning.

## Done.

Congratulations, you're done with the assignment!  Be sure to:

- **run all the tests** (the next cell has a shortcut for that).
- **Save and Checkpoint** from the `File` menu.

In [None]:
# For your convenience, you can run this cell to run all the tests at once!
import os
_ = [ok.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q')]