<div class='alert alert-warning'>

SciPy's interactive examples with JupyterLite are experimental and may not always work as expected. Executing cells that contain imports may trigger large downloads (up to 60MB of content for the first import from SciPy), and importing from SciPy may take roughly 10-20 seconds. If you notice any problems, feel free to open an [issue](https://github.com/scipy/scipy/issues/new/choose).

</div>

In [None]:
from scipy.stats.contingency import crosstab

Given the lists `a` and `x`, create a contingency table that counts the
frequencies of the corresponding pairs.


In [None]:
a = ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B']
x = ['X', 'X', 'X', 'Y', 'Z', 'Z', 'Y', 'Y', 'Z', 'Z']
res = crosstab(a, x)
avals, xvals = res.elements
avals

array(['A', 'B'], dtype='<U1')

In [None]:
xvals

array(['X', 'Y', 'Z'], dtype='<U1')

In [None]:
res.count

array([[2, 3, 0],
       [1, 0, 4]])

So ``('A', 'X')`` occurs twice, ``('A', 'Y')`` occurs three times, etc.
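The row and column labels in `res.elements` line up with the axes of `res.count`, so individual counts can be looked up by label. The following is a minimal sketch of that idea; the dictionary name `pair_counts` is introduced here for illustration only and is not part of the SciPy API.

```python
from scipy.stats.contingency import crosstab

a = ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B']
x = ['X', 'X', 'X', 'Y', 'Z', 'Z', 'Y', 'Y', 'Z', 'Z']
res = crosstab(a, x)
avals, xvals = res.elements

# Map each (a, x) label pair to its count in the table.
# `pair_counts` is a hypothetical helper name used only in this sketch.
pair_counts = {
    (str(av), str(xv)): int(res.count[i, j])
    for i, av in enumerate(avals)
    for j, xv in enumerate(xvals)
}

print(pair_counts[('A', 'X')], pair_counts[('A', 'Y')])
```

Pairs that never occur in the data, such as ``('A', 'Z')``, still get an entry, with a count of zero.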

Higher-dimensional contingency tables can be created by passing more than two sequences.


In [None]:
p = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
res = crosstab(a, x, p)
res.count

array([[[2, 0],
        [2, 1],
        [0, 0]],
       [[1, 0],
        [0, 0],
        [1, 3]]])

In [None]:
res.count.shape

(2, 3, 2)
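Each input sequence corresponds to one axis of the result, in the order the sequences were passed. As a quick consistency check (a sketch, not something the SciPy documentation prescribes), summing the three-way table over the axis for `p` should reproduce the two-way table of `a` and `x`. The names `res3`, `res2` and `collapsed` are chosen here for illustration.

```python
import numpy as np
from scipy.stats.contingency import crosstab

a = ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B']
x = ['X', 'X', 'X', 'Y', 'Z', 'Z', 'Y', 'Y', 'Z', 'Z']
p = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]

res3 = crosstab(a, x, p)   # three-way table, axes ordered as (a, x, p)
res2 = crosstab(a, x)      # two-way table of a and x only

# Sum out the last axis (the one for p) to collapse back to two dimensions.
collapsed = res3.count.sum(axis=2)
print(np.array_equal(collapsed, res2.count))
```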

The values to be counted can be set by using the `levels` argument.
It allows the elements of interest in each input sequence to be
given explicitly instead of finding the unique elements of the sequence.

For example, suppose one of the arguments is an array containing the
answers to a survey question, with integer values 1 to 4.  Even if the
value 1 does not occur in the data, we want an entry for it in the table.


In [None]:
q1 = [2, 3, 3, 2, 4, 4, 2, 3, 4, 4, 4, 3, 3, 3, 4]  # 1 does not occur.
q2 = [4, 4, 2, 2, 2, 4, 1, 1, 2, 2, 4, 2, 2, 2, 4]  # 3 does not occur.
options = [1, 2, 3, 4]
res = crosstab(q1, q2, levels=(options, options))
res.count

array([[0, 0, 0, 0],
       [1, 1, 0, 1],
       [1, 4, 0, 1],
       [0, 3, 0, 3]])

If `levels` is given, but an element of `levels` is None, the unique values
of the corresponding argument are used. For example,


In [None]:
res = crosstab(q1, q2, levels=(None, options))
res.elements

[array([2, 3, 4]), [1, 2, 3, 4]]

In [None]:
res.count

array([[1, 1, 0, 1],
       [1, 4, 0, 1],
       [0, 3, 0, 3]])

If we want to ignore the pairs where 4 occurs in ``q2``, we can
give just the values [1, 2] as the levels for ``q2``, and the 4 will be ignored:


In [None]:
res = crosstab(q1, q2, levels=(None, [1, 2]))
res.elements

[array([2, 3, 4]), [1, 2]]

In [None]:
res.count

array([[1, 1],
       [1, 4],
       [0, 3]])
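Because pairs containing an excluded value are not counted, the table total can be smaller than the number of observations. A minimal sketch of checking how many pairs were left out (the names `n_counted` and `n_dropped` are introduced here for illustration):

```python
from scipy.stats.contingency import crosstab

q1 = [2, 3, 3, 2, 4, 4, 2, 3, 4, 4, 4, 3, 3, 3, 4]
q2 = [4, 4, 2, 2, 2, 4, 1, 1, 2, 2, 4, 2, 2, 2, 4]

res = crosstab(q1, q2, levels=(None, [1, 2]))

# Total of the table versus the number of input pairs.
n_counted = int(res.count.sum())
n_dropped = len(q1) - n_counted
print(n_counted, n_dropped)
```

Here the dropped pairs are exactly those where ``q2`` is 4.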

Finally, let's repeat the first example, but return a sparse matrix:


In [None]:
res = crosstab(a, x, sparse=True)
res.count

<COOrdinate sparse matrix of dtype 'int64'
    with 4 stored elements and shape (2, 3)>

In [None]:
res.count.toarray()

array([[2, 3, 0],
       [1, 0, 4]])