In [2]:
!pip install pysubgroup

Collecting pysubgroup
  Obtaining dependency information for pysubgroup from https://files.pythonhosted.org/packages/c5/43/25c7f35aa880c00b8662dd5a9fb588e0bcbce54a45231f567a45d182ddf9/pysubgroup-0.8.0-py3-none-any.whl.metadata
  Downloading pysubgroup-0.8.0-py3-none-any.whl.metadata (11 kB)
Downloading pysubgroup-0.8.0-py3-none-any.whl (70 kB)
Installing collected packages: pysubgroup
Successfully installed pysubgroup-0.8.0


In [3]:
import pysubgroup as ps

## Explanation
The first line imports the pysubgroup package. The following lines load an example dataset (the popular Titanic dataset).

Thereafter, we define a target, i.e., the property we are mainly interested in ('Survived'). Then, we define the search space as a list of basic selectors. Descriptions are built from this search space. We can create this list manually or use a utility function. Next, we create a SubgroupDiscoveryTask object that encapsulates what we want to find in our search. In particular, it comprises the target, the search space, the number of subgroups to return (result_set_size), the depth of the search (the maximum number of selectors combined in a subgroup description), and the interestingness measure for candidate scoring (here, the Weighted Relative Accuracy measure).

The last line executes the defined task by performing a search with an algorithm, in this case depth-first search. The result of this algorithm execution is stored in a SubgroupDiscoveryResult object.

In [4]:
# Load the example dataset
from pysubgroup.datasets import get_titanic_data
data = get_titanic_data()

target = ps.BinaryTarget('Survived', True)
searchspace = ps.create_selectors(data, ignore=['Survived'])
task = ps.SubgroupDiscoveryTask(
    data,
    target,
    searchspace,
    result_set_size=5,
    depth=2,
    qf=ps.WRAccQF())
result = ps.DFS().execute(task)

In [5]:
print(result.to_dataframe())

    quality                          subgroup  size_sg  size_dataset  \
0  0.132150                     Sex=='female'       56           156   
1  0.101331        Parch==0 AND Sex=='female'       41           156   
2  0.079142    Sex=='female' AND SibSp: [0:1[       25           156   
3  0.077663  Cabin.isnull() AND Sex=='female'       43           156   
4  0.071746   Embarked=='S' AND Sex=='female'       37           156   

   positives_sg  positives_dataset  size_complement  relative_size_sg  \
0            40                 54              100          0.358974   
1            30                 54              115          0.262821   
2            21                 54              131          0.160256   
3            27                 54              113          0.275641   
4            24                 54              119          0.237179   

   relative_size_complement  coverage_sg  coverage_complement  \
0                  0.641026     0.740741             0.259259  
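
The quality column above can be checked by hand. The sketch below assumes the standard definition of Weighted Relative Accuracy, i.e. the relative size of the subgroup multiplied by the difference between the target share inside the subgroup and the target share in the whole dataset. The function name `wracc` and its parameters are illustrative and not part of pysubgroup; the counts are copied from the first two rows of the result table.

```python
def wracc(size_sg, positives_sg, size_dataset, positives_dataset):
    """Weighted Relative Accuracy under the standard definition (assumed here):
    (n_sg / N) * (p_sg - p_0)."""
    relative_size = size_sg / size_dataset           # n_sg / N
    share_in_subgroup = positives_sg / size_sg       # p_sg
    share_in_dataset = positives_dataset / size_dataset  # p_0
    return relative_size * (share_in_subgroup - share_in_dataset)


# Counts taken from the result table: (size_sg, positives_sg, size_dataset, positives_dataset)
female = wracc(56, 40, 156, 54)
female_no_parch = wracc(41, 30, 156, 54)

print(f"Sex=='female':              {female:.6f}")
print(f"Parch==0 AND Sex=='female': {female_no_parch:.6f}")
```

Both values agree with the reported quality column to six decimal places, which suggests that WRAccQF scores a subgroup highly only when it is both reasonably large and has a clearly higher survival share than the dataset as a whole.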

## Key classes

Here is an outline of the most important classes:

- Selector: A Selector represents an atomic condition over the data, e.g., age < 50. There are several subtypes of Selectors, e.g., NominalSelector (color==BLUE), NumericSelector (age < 50), and NegatedSelector (a wrapper such as not selector1).
- SubgroupDiscoveryTask: As mentioned before, it encapsulates the specification of how an algorithm should search for interesting subgroups.
- SubgroupDiscoveryResult: This is the main outcome of a subgroup discovery run. You can obtain a list of subgroups using to_subgroups() or a dataframe using to_dataframe().
- Conjunction: A Conjunction is the most widely used SubgroupDescription and indicates which data instances are covered by the subgroup. It can be seen as the left-hand side of a rule.
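
To make the relationship between selectors and a conjunction concrete, here is a toy model in plain Python. It is not pysubgroup's implementation: the `Toy*` classes and their `covers` method are hypothetical names invented for this illustration, and the half-open interval is an assumption based on the `SibSp: [0:1[` notation in the result table. The rows are the first five passengers from the dataset printout.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class ToyEquality:
    """Atomic condition: attribute == value."""
    attribute: str
    value: Any

    def covers(self, row: Row) -> bool:
        return row.get(self.attribute) == self.value


@dataclass(frozen=True)
class ToyInterval:
    """Atomic condition: lower <= attribute < upper (half-open, assumed)."""
    attribute: str
    lower: float
    upper: float

    def covers(self, row: Row) -> bool:
        value = row.get(self.attribute)
        return value is not None and self.lower <= value < self.upper


@dataclass(frozen=True)
class ToyNegated:
    """Wrapper that inverts another selector."""
    inner: Any

    def covers(self, row: Row) -> bool:
        return not self.inner.covers(row)


@dataclass(frozen=True)
class ToyConjunction:
    """A description that covers a row only if every selector covers it."""
    selectors: Tuple[Any, ...]

    def covers(self, row: Row) -> bool:
        return all(selector.covers(row) for selector in self.selectors)


# First five passengers from the dataset printout (subset of columns).
rows = [
    {"Sex": "male", "Age": 22.0, "SibSp": 1},
    {"Sex": "female", "Age": 38.0, "SibSp": 1},
    {"Sex": "female", "Age": 26.0, "SibSp": 0},
    {"Sex": "female", "Age": 35.0, "SibSp": 1},
    {"Sex": "male", "Age": 35.0, "SibSp": 0},
]

description = ToyConjunction((ToyEquality("Sex", "female"), ToyInterval("SibSp", 0, 1)))
covered = [index for index, row in enumerate(rows) if description.covers(row)]
print(covered)
```

Of these five rows, only the passenger at index 2 is female and has a SibSp value inside the interval, so that is the only instance the description covers. A depth of 2 in the task corresponds to conjunctions of at most two such selectors.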

In [6]:
print(data)

     PassengerId  Survived  Pclass  \
0              1         0       3   
1              2         1       1   
2              3         1       3   
3              4         1       1   
4              5         0       3   
..           ...       ...     ...   
151          152         1       1   
152          153         0       3   
153          154         0       3   
154          155         0       3   
155          156         0       1   

                                                  Name     Sex   Age  SibSp  \
0                              Braund, Mr. Owen Harris    male  22.0      1   
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                               Heikkinen, Miss. Laina  female  26.0      0   
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                             Allen, Mr. William Henry    male  35.0      0   
..                                                 ...     ...   ... 