In [1]:
#!pip install ssbkonf

In [2]:
# If the R packages GaussSuppression, SSBtools, and SmallCountRounding are not installed,
# they are installed locally the first time you import this package.
import ssbkonf as ssb
ssb.__version__

'0.0.9'

## Hierarchies
The package exposes some basic functionality from the SSBtools R package, including example data sets and automatic detection of hierarchical relationships in data sets.

In [3]:
data = ssb.example_data("z2")
data.head()


| | region | fylke | kostragr | hovedint | ant |
|---|---|---|---|---|---|
| 1 | A | 1.0 | 300.0 | annet | 11.0 |
| 2 | B | 4.0 | 300.0 | annet | 7.0 |
| 3 | C | 5.0 | 300.0 | annet | 5.0 |
| 4 | D | 5.0 | 300.0 | annet | 13.0 |
| 5 | E | 6.0 | 300.0 | annet | 9.0 |


The package can also detect hierarchies in a data set with `find_hierarchies`. The output is a dictionary in which each key is a variable name and each value is a `pandas.DataFrame`. Each row of such a DataFrame represents a hierarchical relationship: the entry in `mapsFrom` is a child of the entry in `mapsTo`. The function automatically detects that the columns `region`, `fylke` and `kostragr` are hierarchically related. It also handles non-nested hierarchies, i.e., different ways of dividing the same units into geographical regions.
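To make the `mapsFrom`/`mapsTo` layout concrete, the following sketch rebuilds a few rows in the same shape as the `hiers['region']` table shown in this section and looks up parents and children with plain pandas. It does not call `ssbkonf`; storing the codes as strings is an assumption made for illustration.

```python
import pandas as pd

# Rows in the same layout as the hierarchy table (codes stored as strings: an assumption).
hier = pd.DataFrame(
    {
        "mapsFrom": ["A", "A", "A", "B", "B", "B"],
        "mapsTo": ["1", "Total", "300", "4", "Total", "300"],
        "sign": [1, 1, 1, 1, 1, 1],
        "level": [1.0] * 6,
    }
)

# All parents of each child code, in the order the rows appear.
parents = hier.groupby("mapsFrom")["mapsTo"].apply(list).to_dict()

# All children that map to the code "300".
children_of_300 = hier.loc[hier["mapsTo"] == "300", "mapsFrom"].tolist()

print(parents)
print(children_of_300)
```

Each child appears once per parent, so a region that belongs to several groupings simply has several rows.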

In [4]:
hiers = ssb.util.find_hierarchies(data.loc[:,["region", "fylke", "kostragr"]])
hiers['region']

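| | mapsFrom | mapsTo | sign | level |
|---|---|---|---|---|
| 1 | A | 1 | 1 | 1.0 |
| 2 | A | Total | 1 | 1.0 |
| 3 | A | 300 | 1 | 1.0 |
| 4 | B | 4 | 1 | 1.0 |
| 5 | B | Total | 1 | 1.0 |
| 6 | B | 300 | 1 | 1.0 |
| 7 | C | 5 | 1 | 1.0 |
| 8 | C | Total | 1 | 1.0 |
| 9 | C | 300 | 1 | 1.0 |
| 10 | D | 5 | 1 | 1.0 |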


The `find_hierarchies` function also handles several hierarchies at once. As before, it detects that `region`, `fylke` and `kostragr` are different breakdowns of the same structure, while `hovedint` forms a separate hierarchy.
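One way to build intuition for why some columns are grouped together and others are not is to check whether each value of one column always occurs with a single value of another. The sketch below does this with pandas on toy data; `is_child_of` is a hypothetical helper, and this is not a description of how `find_hierarchies` is implemented.

```python
import pandas as pd


def is_child_of(df: pd.DataFrame, child: str, parent: str) -> bool:
    """Return True if every value of `child` occurs with exactly one value of `parent`."""
    parents_per_child = df.groupby(child)[parent].nunique()
    return bool((parents_per_child == 1).all())


# Toy data (illustrative only): each region lies in one fylke,
# but every region occurs with several hovedint categories.
toy = pd.DataFrame(
    {
        "region": ["A", "A", "B", "B"],
        "fylke": [1, 1, 4, 4],
        "hovedint": ["annet", "arbeid", "annet", "arbeid"],
    }
)

print(is_child_of(toy, "region", "fylke"))     # region nests within fylke
print(is_child_of(toy, "region", "hovedint"))  # region and hovedint are crossed
```

In the toy data `region` nests within `fylke`, whereas `region` and `hovedint` are crossed, so they would belong to separate hierarchies.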

In [5]:
hiers = ssb.util.find_hierarchies(data.loc[:,["region", "fylke", "kostragr", "hovedint"]])

In [6]:
hiers['region'].head()

| | mapsFrom | mapsTo | sign | level |
|---|---|---|---|---|
| 1 | A | 1 | 1 | 1.0 |
| 2 | A | Total | 1 | 1.0 |
| 3 | A | 300 | 1 | 1.0 |
| 4 | B | 4 | 1 | 1.0 |
| 5 | B | Total | 1 | 1.0 |


In [7]:
hiers['hovedint']

| | mapsFrom | mapsTo | sign | level |
|---|---|---|---|---|
| 1 | annet | Total | 1 | 1.0 |
| 2 | arbeid | Total | 1 | 1.0 |
| 3 | soshjelp | Total | 1 | 1.0 |
| 4 | trygd | Total | 1 | 1.0 |


## Suppression
The `suppress` submodule contains the following functions for suppressing tables:
- `suppress_small_counts`: a function for primary and secondary suppression of small counts in frequency tables
- `suppress_few_contributors`: a function for primary and secondary suppression of cells with few contributors in magnitude tables
- `suppress_dominant_cells`: a function for primary and secondary suppression of dominant cells in magnitude tables.
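As a rough illustration of the primary step for small counts, the sketch below flags cells whose frequency is at or below a threshold. `flag_small_counts` is a hypothetical helper, the `<=` rule is an assumption, and the secondary suppression that `suppress_small_counts` also performs is deliberately left out. The four rows are the region `K` cells from the output shown in this section.

```python
import pandas as pd


def flag_small_counts(table: pd.DataFrame, freq_var: str, max_n: int) -> pd.DataFrame:
    """Add a boolean `primary` column marking cells with a count at or below `max_n`."""
    out = table.copy()
    out["primary"] = out[freq_var] <= max_n
    return out


cells = pd.DataFrame(
    {
        "region": ["K", "K", "K", "K"],
        "hovedint": ["Total", "annet", "arbeid", "soshjelp"],
        "ant": [35.0, 4.0, 2.0, 18.0],
    }
)

flagged = flag_small_counts(cells, freq_var="ant", max_n=3)
print(flagged)
```

Only the `arbeid` cell is flagged here. In the package output the `annet` cell is suppressed as well, which is the effect of secondary suppression: additional cells are hidden so that the primary cell cannot be recovered from published totals.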

Tables can be defined using `dim_var`, `formula`, or `hierarchies`. See the ["Defining Tables" vignette](https://cran.r-project.org/web/packages/GaussSuppression/vignettes/define_tables.html) in the R package GaussSuppression for more details and examples of how these can be used.
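To give a feel for what a formula such as `~(region + fylke) * hovedint` asks for, the sketch below computes the corresponding marginal sums with pandas on toy data. It assumes the usual R formula expansion (grand total, `region`, `fylke`, `hovedint`, `region:hovedint`, `fylke:hovedint`); `margins` is a hypothetical helper and the counts are illustrative, not taken from the example data set.

```python
import pandas as pd


def margins(df: pd.DataFrame, freq_var: str, by: list) -> pd.DataFrame:
    """Sum `freq_var` over the columns in `by`; an empty list gives the grand total."""
    if not by:
        return pd.DataFrame({freq_var: [df[freq_var].sum()]})
    return df.groupby(by, as_index=False)[freq_var].sum()


# Toy micro table (illustrative values only).
toy = pd.DataFrame(
    {
        "region": ["A", "A", "B", "B"],
        "fylke": [1, 1, 4, 4],
        "hovedint": ["annet", "arbeid", "annet", "arbeid"],
        "ant": [11, 3, 7, 2],
    }
)

# One entry per term of the expanded formula; [] stands for the intercept (grand total).
terms = [[], ["region"], ["fylke"], ["hovedint"], ["region", "hovedint"], ["fylke", "hovedint"]]
tables = {":".join(t) or "Total": margins(toy, "ant", t) for t in terms}

for name, tab in tables.items():
    print(name)
    print(tab, end="\n\n")
```

The package returns all of these cells stacked in a single table, which is why the output of the `formula` call contains `Total` rows alongside `region` and `fylke` codes.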

In [8]:
ssb.suppress.suppress_small_counts(data, max_n = 3, freq_var = "ant", dim_var = ["region", "fylke", "kostragr", "hovedint"])

[extend0 44*5->44*5]
GaussSuppression_anySum: ............................


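| | region | hovedint | ant | primary | suppressed |
|---|---|---|---|---|---|
| 1 | 1 | Total | 127.0 | False | False |
| 2 | 1 | annet | 14.0 | False | False |
| 3 | 1 | arbeid | 11.0 | False | False |
| 4 | 1 | soshjelp | 64.0 | False | False |
| 5 | 1 | trygd | 38.0 | False | False |
| ... | ... | ... | ... | ... | ... |
| 96 | K | Total | 35.0 | False | False |
| 97 | K | annet | 4.0 | False | True |
| 98 | K | arbeid | 2.0 | True | True |
| 99 | K | soshjelp | 18.0 | False | False |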


In [9]:
ssb.suppress.suppress_small_counts(data, max_n = 3, freq_var = "ant", hierarchies = hiers)

[extend0 44*3->44*3]
GaussSuppression_anySum: ............................


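| | region | hovedint | ant | primary | suppressed |
|---|---|---|---|---|---|
| 1 | 1 | Total | 127.0 | False | False |
| 2 | 1 | annet | 14.0 | False | False |
| 3 | 1 | arbeid | 11.0 | False | False |
| 4 | 1 | soshjelp | 64.0 | False | False |
| 5 | 1 | trygd | 38.0 | False | False |
| ... | ... | ... | ... | ... | ... |
| 96 | K | Total | 35.0 | False | False |
| 97 | K | annet | 4.0 | False | True |
| 98 | K | arbeid | 2.0 | True | True |
| 99 | K | soshjelp | 18.0 | False | False |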


In [10]:
ssb.suppress.suppress_small_counts(data, max_n = 3, freq_var = "ant", formula = "~(region + fylke) * hovedint")

[extend0 44*4->44*4]
GaussSuppression_anySum: .........................


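| | region | hovedint | ant | primary | suppressed |
|---|---|---|---|---|---|
| 1 | Total | Total | 706.0 | False | False |
| 2 | A | Total | 113.0 | False | False |
| 3 | B | Total | 55.0 | False | False |
| 4 | C | Total | 73.0 | False | False |
| 5 | D | Total | 45.0 | False | False |
| ... | ... | ... | ... | ... | ... |
| 86 | 8 | trygd | 23.0 | False | False |
| 87 | 10 | annet | 13.0 | False | True |
| 88 | 10 | arbeid | 2.0 | True | True |
| 89 | 10 | soshjelp | 50.0 | False | False |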


## Perturbation

The `perturb` submodule currently contains only one function for perturbing frequency tables:
- `small_count_rounding`: a function for applying the small count rounding algorithm to frequency tables. It returns a pandas DataFrame containing the original and rounded frequencies. For more details, see the documentation of the `PLSroundingPublish` function in the `SmallCountRounding` R package.

All functions share the same interface for defining tables: `dim_var`, `formula`, or `hierarchies`. Future versions of `ssbkonf` will support cell-key perturbation as well.
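The basic idea of small count rounding can be sketched without the R backend: counts strictly between 0 and a rounding base are replaced by either 0 or the base, and all other counts are left alone. The version below picks the direction at random, which is a simplification and not the PLS-based selection used by `SmallCountRounding`. The name `naive_small_count_rounding`, the default base of 3 and the seed are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def naive_small_count_rounding(counts: pd.Series, base: int = 3, seed: int = 0) -> pd.DataFrame:
    """Round counts in (0, base) to 0 or `base` at random; leave other counts unchanged."""
    rng = np.random.default_rng(seed)
    values = counts.to_numpy(dtype=float)
    small = (values > 0) & (values < base)
    # Round up with probability count / base, so the expected value is preserved.
    go_up = rng.random(values.size) < values / base
    rounded = np.where(small, np.where(go_up, float(base), 0.0), values)
    return pd.DataFrame(
        {"original": values, "rounded": rounded, "difference": rounded - values},
        index=counts.index,
    )


result = naive_small_count_rounding(pd.Series([14.0, 2.0, 1.0, 0.0, 5.0]))
print(result)
```

The returned columns mirror the `original`, `rounded` and `difference` columns in the package output. In the package, aggregated cells such as totals are then computed from the rounded inner cells, which is why a large total can also change slightly.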

In [11]:
ssb.perturb.small_count_rounding(data = data, freq_var = "ant", dim_var = ["region", "fylke", "kostragr", "hovedint"])

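| | region | hovedint | original | rounded | difference |
|---|---|---|---|---|---|
| 1 | 1 | Total | 127.0 | 125.0 | -2.0 |
| 2 | 1 | annet | 14.0 | 14.0 | 0.0 |
| 3 | 1 | arbeid | 11.0 | 11.0 | 0.0 |
| 4 | 1 | soshjelp | 64.0 | 64.0 | 0.0 |
| 5 | 1 | trygd | 38.0 | 36.0 | -2.0 |
| ... | ... | ... | ... | ... | ... |
| 96 | K | Total | 35.0 | 36.0 | 1.0 |
| 97 | K | annet | 4.0 | 4.0 | 0.0 |
| 98 | K | arbeid | 2.0 | 3.0 | 1.0 |
| 99 | K | soshjelp | 18.0 | 18.0 | 0.0 |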


In [12]:
ssb.perturb.small_count_rounding(data = data, freq_var = "ant", hierarchies = hiers)

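| | region | hovedint | original | rounded | difference |
|---|---|---|---|---|---|
| 1 | 1 | Total | 127.0 | 125.0 | -2.0 |
| 2 | 1 | annet | 14.0 | 14.0 | 0.0 |
| 3 | 1 | arbeid | 11.0 | 11.0 | 0.0 |
| 4 | 1 | soshjelp | 64.0 | 64.0 | 0.0 |
| 5 | 1 | trygd | 38.0 | 36.0 | -2.0 |
| ... | ... | ... | ... | ... | ... |
| 96 | K | Total | 35.0 | 36.0 | 1.0 |
| 97 | K | annet | 4.0 | 4.0 | 0.0 |
| 98 | K | arbeid | 2.0 | 3.0 | 1.0 |
| 99 | K | soshjelp | 18.0 | 18.0 | 0.0 |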


In [13]:
ssb.perturb.small_count_rounding(data = data, freq_var = "ant", formula = "~(region + fylke) * hovedint")

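| | region | hovedint | original | rounded | difference |
|---|---|---|---|---|---|
| 1 | Total | Total | 706.0 | 705.0 | -1.0 |
| 2 | A | Total | 113.0 | 113.0 | 0.0 |
| 3 | B | Total | 55.0 | 54.0 | -1.0 |
| 4 | C | Total | 73.0 | 73.0 | 0.0 |
| 5 | D | Total | 45.0 | 43.0 | -2.0 |
| ... | ... | ... | ... | ... | ... |
| 86 | 8 | trygd | 23.0 | 23.0 | 0.0 |
| 87 | 10 | annet | 13.0 | 13.0 | 0.0 |
| 88 | 10 | arbeid | 2.0 | 3.0 | 1.0 |
| 89 | 10 | soshjelp | 50.0 | 50.0 | 0.0 |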
