This script reproduces the statistics published at trygghetsundersokelsen.no 

See the following page for how to access data from the National Crime Survey:
https://trygghetsundersokelsen.no/data.html#en

For data documentation: https://data.trygghetsundersokelsen.no

Content: 
Frequencies for the whole population are calculated for all variables.
However, not every variable is plotted against every demographic variable.
For each variable, the script applies three demographic variables,
and every demographic variable is used at some point.
The demographic variables are as follows: 

- Age
- Gender
- Education
- Immigration
- Married
- Children
- County

**Weighting**: All frequencies are weighted using `weight_edu`. The weight is based on demographic variables (age, gender and geography) and education level.
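
As an illustration of what weighting does to a frequency, the sketch below compares an unweighted and a weighted share on a tiny made-up table. The data frame `toy` and its values are hypothetical and are not taken from the survey; only the column name `weight_edu` mirrors the real data.

```python
import pandas as pd

# Hypothetical toy data: five respondents, one answer each, plus a survey weight.
toy = pd.DataFrame({
    "answer": ["Yes", "Yes", "No", "No", "No"],
    "weight_edu": [0.5, 0.5, 1.0, 1.5, 1.5],
})

# Unweighted share: every respondent counts once.
unweighted = toy["answer"].value_counts(normalize=True)

# Weighted share: sum the weights per answer, then divide by the total weight.
weighted = toy.groupby("answer")["weight_edu"].sum() / toy["weight_edu"].sum()

print(unweighted)
print(weighted)
```

In this toy example the respondents answering "Yes" carry small weights, so their weighted share is lower than their unweighted share.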

**DISCLAIMER**: OsloMet, ideas2evidence, Frischsenteret and the Ministry of Justice and Public Security do not take any responsibility for the use and interpretation of data from the National Crime Survey. The above-mentioned parties further take no responsibility for any negative consequences that may arise as a result of the use of this data.

In [140]:

# Data import 

import pandas as pd
import numpy as np

with open("path.txt", "r", encoding="utf8") as file:
    path = file.read().replace("\n", "")

print(path)

df = pd.read_spss(path + "Norwegian Crime Survey NSD v1.sav")

# Grouping features (the demographic variables listed above)

groups = ['aldersgruppe_vuttrekk_dsf', 'kjonn_dsf', 'utdanning_kort', 'innvandret', 'gift', 'barn', 'fylkenr_dsf']


C:\Users\ØivindSkjervheim\ideas2evidence AS\i2e - Data\Justisdepartementet\Nasjonal trygghetsundersøkelse\Data og datadokumentasjon\Datafiler\NSD\


### Recoding

All features used are recoded (mapped) so that missing-value categories, such as "Not answered", become NaN. In addition, some response alternatives are merged.
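
The recoding relies on how `pandas.Series.map` treats a dictionary: any value that is not a key in the dictionary becomes NaN. A minimal sketch on made-up answers (the series `answers` is hypothetical, and a plain string series is assumed rather than the categorical columns an SPSS import may produce):

```python
import pandas as pd

# Hypothetical answers, including a missing-value category.
answers = pd.Series(["Yes", "No", "Not answered", "Yes"])

# "Not answered" is deliberately left out of the mapping.
mapping = {"Yes": "Has children", "No": "Does not have children"}

# Values without a key in the mapping are turned into NaN.
recoded = answers.map(mapping)

print(recoded.value_counts())   # NaN is excluded from the counts by default
print(recoded.isna().sum())     # number of answers that were recoded to NaN
```

Because `value_counts()` drops NaN by default, the counts printed after each recoding cell only cover valid answers.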

In [214]:
immigration_mapping = {
    "No": "No",
    "I myself have immigrated to Norway": "Have immigrated to Norway",
    "Both my parents have immigrated to Norway, but I was born in Norway": "Born in Norway, but both or one of the parents immigrated",
    "Mother has immigrated to Norway, but my father and I have not immigrated": "Born in Norway, but both or one of the parents immigrated",
    "Father has immigrated to Norway, but my mother and I have not immigrated": "Born in Norway, but both or one of the parents immigrated",
    #"Not answered": NaN
}

df = df.assign(innvandret = df.innvandret.map(immigration_mapping))

df.innvandret.value_counts()

No                                                           20398
Have immigrated to Norway                                     1986
Born in Norway, but both or one of the parents immigrated     1736
Name: innvandret, dtype: int64

In [215]:
married_mapping = {
    "Yes, married/registered partner": "Married or cohabitant",
    "No, living alone": "Living alone",
    "Yes, cohabitant": "Married or cohabitant"
}

df = df.assign(gift = df.gift.map(married_mapping))

df.gift.value_counts()


Married or cohabitant    15773
Living alone              8323
Name: gift, dtype: int64

In [143]:
df['utdanning_kort'] = df['utdanning_kort'].map({
    "Higher education": "Higher education",
    "Vocational school / High school": "Vocational school / High school",
    "No education / Primary school": "No education / Primary school"
})

df.utdanning_kort.value_counts()

Higher education                   11681
Vocational school / High school     9205
No education / Primary school       2393
Name: utdanning_kort, dtype: int64

In [207]:
df = df.assign(barn = df.barn.map({
    "Yes": "Has children",
    "No": "Does not have children"
}))

df.barn.value_counts()

Has children              15252
Does not have children     8854
Name: barn, dtype: int64

In [145]:
safety1 = ['urtrygg1', 'urtrygg3']
safety2 = ['urtrygg2']
worry = ['urkrim_1', 'urkrim_2', 'urkrim_5', 'urkrim_7', 'urkrim_8', 'urkrim_9', 'urkrim_10']
exposure = ['utsibilde1', 'utsiident1', 'uttbol1', 'uttlom1', 'uttran1', 'uttruss1', 'uttsyk1'] #utvrist, utvslag

# Recode safety variables

df['urtrygg1'] = df['urtrygg1'].map({
    'Very safe': 'Very safe',
    'Pretty safe': 'Pretty safe',
    'Pretty insecure': 'Pretty insecure',
    'Very insecure': 'Very insecure'
    
})

df['urtrygg2'] = df['urtrygg2'].map({
    'Yes, I always plan so that I am as safe as possible': 'Always plan ahead',
    'Sometimes / it depends on the situation': 'Depends on the situation/Does not think about it',
    'No, I\'m not thinking about whether I could be the victim of a crime': 'Depends on the situation/Does not think about it'
    
})

df['urtrygg3'] = df['urtrygg3'].map({
    'To a large degree': 'To a large degree',
    'To some degree': 'To some degree',
    'To a small degree': 'To a small degree',
    'Not at all': 'Not at all'
})

# Recode urkrim-variables
urkrim_mapping = {
    "Quite often": "Quite often",
    "Pretty often": "Pretty often",
    "Pretty rare": "Pretty rare",
    "Quite rare": "Quite rare",
    "Never": "Never"
}

for feature in worry:
    df[feature] = df[feature].map(urkrim_mapping)

# Recode exposure-variables
## utvslag and utvrist use a slightly different response scale

df['utvslag'] = df['utvslag'].map({
    "Yes, (number of times this happened to you in 2020)": "Yes",
    "No": "No"
})

df['utvrist'] = df['utvrist'].map({
    "Yes, (number of times this happened to you in 2020)": "Yes",
    "No": "No"
})

# same mapping for the rest of them
exposure_mapping = {
    "Yes, (number of times)": "Yes",
    "No": "No"
}

for feature in exposure:
    df[feature] = df[feature].map(exposure_mapping)

df['uttran1'].value_counts().to_clipboard()

### Functions

`weighted_frequency` calculates the frequency of `feature` for the whole population, using the education weight `weight_edu`.

`weighted_frequency_by` calculates the frequency of `feature` within each category of `group`, using the education weight `weight_edu`.
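
The by-group calculation can be sketched on a small made-up table. This is a simplified illustration under assumed data, not the notebook's own function: the frame `toy` and the column names `group` and `answer` are hypothetical, and the rounding step is left out.

```python
import pandas as pd

# Hypothetical toy data: two groups, one answer per respondent, plus a weight.
toy = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B"],
    "answer": ["Yes", "No", "No", "Yes", "No"],
    "weight_edu": [1.0, 2.0, 1.0, 3.0, 1.0],
})

# Weighted count per (group, answer) combination.
counts = (
    toy.groupby(["group", "answer"])["weight_edu"]
    .sum()
    .reset_index(name="count")
)

# Total weighted count per group, repeated on every row of that group.
counts["group_count"] = counts.groupby("group")["count"].transform("sum")

# Share of each answer within its group; shares sum to 1 per group.
counts["pct"] = counts["count"] / counts["group_count"]

print(counts)
```

The percentages are computed within each group, so they describe how answers are distributed among, for example, women or men, rather than each cell's share of the whole population.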

In [231]:
def weighted_frequency(df, feature, print_ = False):
    series = df.groupby([feature])['weight_edu'].sum()

    # convert pd.Series to pd.DataFrame
    x = series.to_frame()

    # move the index (row names) into a regular column
    x.index.name = feature
    x.reset_index(inplace=True)

    # rename the weight column to 'count'
    x.rename(columns={'weight_edu': 'count'}, inplace = True)

    # remove decimals
    x = x.round(0)

    # calculate percentage
    x['pct'] = x['count'] / sum(x['count'])

    if(print_):
        print(x)
    
    return x



In [232]:
def weighted_frequency_by(df, group, feature, print_ = False):
    series = df.groupby([group, feature])['weight_edu'].sum()

    # convert pd.series to pd.dataframe
    x = series.to_frame()

    # rownames to new columns
    x.index.name = feature
    x.reset_index(inplace=True)

    # rename column
    x.rename(columns={'weight_edu': 'count'}, inplace = True)

    # remove decimals
    x = x.round(0)

    # Calculate group count and percent per group
    # select 'count' explicitly so the result is a single Series, not a DataFrame
    x['group_count'] = x.groupby([group])['count'].transform('sum')
    x['pct'] = x['count'] / x['group_count']

    if(print_):
        print(x)

    return x


#weighted_frequency_by(df, 'kjonn_dsf', 'urtrygg1')

#df.groupby('utdanning_kort').apply(weighted_average, 'urtrygg1', 'weight_edu')

#df.groupby('urtrygg1')['weight_edu'].sum()

#df.groupby(['utdanning_kort', 'urtrygg1'])['weight_edu'].sum()


### Analysis

In the following, the data visualized on www.trygghetsundersokelsen.no is reproduced.

In [204]:

# ------------------------------------------------------------------------------------------------ #
# urtrygg1: Percentage that feels unsafe if they go out alone at night in the area where they live #
# ------------------------------------------------------------------------------------------------ #

weighted_frequency(df, 'urtrygg1')

weighted_frequency_by(df, 'utdanning_kort', 'urtrygg1')

weighted_frequency_by(df, 'aldersgruppe_vuttrekk_dsf', 'urtrygg1')

weighted_frequency_by(df, 'fylkenr_dsf', 'urtrygg1')


Unnamed: 0,fylkenr_dsf,urtrygg1,count,group_count,pct
0,Agder,Pretty insecure,54.0,1324.0,0.040785
1,Agder,Pretty safe,526.0,1324.0,0.397281
2,Agder,Very insecure,36.0,1324.0,0.02719
3,Agder,Very safe,708.0,1324.0,0.534743
4,Inland,Pretty insecure,84.0,1624.0,0.051724
5,Inland,Pretty safe,567.0,1624.0,0.349138
6,Inland,Very insecure,42.0,1624.0,0.025862
7,Inland,Very safe,931.0,1624.0,0.573276
8,Møre and Romsdal,Pretty insecure,40.0,1151.0,0.034752
9,Møre and Romsdal,Pretty safe,404.0,1151.0,0.350999


In [211]:
# ------------------------------------------------------------------- #
# urtrygg2: Percentage that always plan ahead to avoid crime exposure #
# ------------------------------------------------------------------- #
weighted_frequency(df, 'urtrygg2')

weighted_frequency_by(df, 'barn', 'urtrygg2')

weighted_frequency_by(df, 'kjonn_dsf', 'urtrygg2')

weighted_frequency_by(df, 'fylkenr_dsf', 'urtrygg2')

Unnamed: 0,fylkenr_dsf,urtrygg2,count,group_count,pct
0,Agder,Always plan ahead,154.0,1377.0,0.111837
1,Agder,Depends on the situation/Does not think about it,1223.0,1377.0,0.888163
2,Inland,Always plan ahead,201.0,1706.0,0.117819
3,Inland,Depends on the situation/Does not think about it,1505.0,1706.0,0.882181
4,Møre and Romsdal,Always plan ahead,127.0,1191.0,0.106633
5,Møre and Romsdal,Depends on the situation/Does not think about it,1064.0,1191.0,0.893367
6,Nordland,Always plan ahead,128.0,1084.0,0.118081
7,Nordland,Depends on the situation/Does not think about it,956.0,1084.0,0.881919
8,Oslo,Always plan ahead,602.0,3125.0,0.19264
9,Oslo,Depends on the situation/Does not think about it,2523.0,3125.0,0.80736


In [217]:
# ----------------------------------------------------------------- #
# urtrygg3: Percentage that is worried about being exposed to crime #
# ----------------------------------------------------------------- #

weighted_frequency(df, 'urtrygg3')

weighted_frequency_by(df, 'innvandret', 'urtrygg3')

weighted_frequency_by(df, 'gift', 'urtrygg3')


Unnamed: 0,gift,urtrygg3,count,group_count,pct
0,Living alone,Not at all,1684.0,7852.0,0.214468
1,Living alone,To a large degree,269.0,7852.0,0.034259
2,Living alone,To a small degree,4374.0,7852.0,0.557056
3,Living alone,To some degree,1525.0,7852.0,0.194218
4,Married or cohabitant,Not at all,2864.0,15624.0,0.183308
5,Married or cohabitant,To a large degree,299.0,15624.0,0.019137
6,Married or cohabitant,To a small degree,9804.0,15624.0,0.627496
7,Married or cohabitant,To some degree,2657.0,15624.0,0.170059


In [220]:
# ------------------------------------------------------------ #
# urkrim_1: Percentage that worries about exposure to burglary #
# ------------------------------------------------------------ #

weighted_frequency(df, 'urkrim_1')

weighted_frequency_by(df, 'barn', 'urkrim_1')

weighted_frequency_by(df, 'barn', 'urkrim_2')

Unnamed: 0,barn,urkrim_2,count,group_count,pct
0,Does not have children,Never,2917.0,7643.0,0.381656
1,Does not have children,Pretty often,524.0,7643.0,0.068559
2,Does not have children,Pretty rare,1589.0,7643.0,0.207903
3,Does not have children,Quite often,121.0,7643.0,0.015831
4,Does not have children,Quite rare,2492.0,7643.0,0.32605
5,Has children,Never,4569.0,16026.0,0.285099
6,Has children,Pretty often,1383.0,16026.0,0.086297
7,Has children,Pretty rare,4220.0,16026.0,0.263322
8,Has children,Quite often,218.0,16026.0,0.013603
9,Has children,Quite rare,5636.0,16026.0,0.351679


In [225]:

# ------- #
# Worries #
# ------- #

for feature in worry:
    # print_ = True is needed for the tables to be shown inside a loop
    weighted_frequency(df, feature, print_ = True)


       urkrim_1   count       pct
0         Never  8872.0  0.373652
1  Pretty often  1690.0  0.071176
2   Pretty rare  5198.0  0.218918
3   Quite often   380.0  0.016004
4    Quite rare  7604.0  0.320249
       urkrim_2   count       pct
0         Never  7511.0  0.316439
1  Pretty often  1917.0  0.080763
2   Pretty rare  5821.0  0.245239
3   Quite often   339.0  0.014282
4    Quite rare  8148.0  0.343276
       urkrim_5    count       pct
0         Never  12797.0  0.543766
1  Pretty often    889.0  0.037775
2   Pretty rare   3505.0  0.148933
3   Quite often    213.0  0.009051
4    Quite rare   6130.0  0.260474
       urkrim_7    count       pct
0         Never  12821.0  0.546621
1  Pretty often    869.0  0.037050
2   Pretty rare   3425.0  0.146024
3   Quite often    287.0  0.012236
4    Quite rare   6053.0  0.258069
       urkrim_8    count       pct
0         Never  16114.0  0.711152
1  Pretty often    704.0  0.031069
2   Pretty rare   2031.0  0.089633
3   Quite often    275.0  0.0121

In [233]:

# -------------------------------------- #
# Worries by demographic characteristics #
# -------------------------------------- #

for feature in worry:
    for group in groups:
        weighted_frequency_by(df, group, feature, print_ = True)

   aldersgruppe_vuttrekk_dsf      urkrim_1   count  group_count       pct
0                16-29 years         Never  2576.0       5002.0  0.514994
1                16-29 years  Pretty often   255.0       5002.0  0.050980
2                16-29 years   Pretty rare   764.0       5002.0  0.152739
3                16-29 years   Quite often    91.0       5002.0  0.018193
4                16-29 years    Quite rare  1316.0       5002.0  0.263095
5                30-59 years         Never  4026.0      11609.0  0.346800
6                30-59 years  Pretty often   829.0      11609.0  0.071410
7                30-59 years   Pretty rare  2596.0      11609.0  0.223620
8                30-59 years   Quite often   188.0      11609.0  0.016194
9                30-59 years    Quite rare  3970.0      11609.0  0.341976
10               60-84 years         Never  2271.0       7132.0  0.318424
11               60-84 years  Pretty often   606.0       7132.0  0.084969
12               60-84 years   Pretty 

In [226]:
# -------- #
# Exposure #
# -------- #

for feature in exposure:
    weighted_frequency(df, feature, print_ = True)


  utsibilde1    count       pct
0         No  23470.0  0.982214
1        Yes    425.0  0.017786
  utsiident1    count       pct
0         No  23510.0  0.987483
1        Yes    298.0  0.012517
  uttbol1    count       pct
0      No  23871.0  0.991238
1     Yes    211.0  0.008762
  uttlom1    count       pct
0      No  23607.0  0.985267
1     Yes    353.0  0.014733
  uttran1    count       pct
0      No  23864.0  0.994209
1     Yes    139.0  0.005791
  uttruss1    count       pct
0       No  22770.0  0.952441
1      Yes   1137.0  0.047559
  uttsyk1    count       pct
0      No  20572.0  0.934242
1     Yes   1448.0  0.065758
  utvrist    count  pct
0      No  23269.0  1.0


In [234]:

# --------------------------------------- #
# Exposure by demographic characteristics #
# --------------------------------------- #

for feature in exposure:
    for group in groups:
        weighted_frequency_by(df, group, feature, print_ = True)

  aldersgruppe_vuttrekk_dsf utsibilde1    count  group_count       pct
0               16-29 years         No   4836.0       5073.0  0.953282
1               16-29 years        Yes    237.0       5073.0  0.046718
2               30-59 years         No  11472.0      11626.0  0.986754
3               30-59 years        Yes    154.0      11626.0  0.013246
4               60-84 years         No   7162.0       7197.0  0.995137
5               60-84 years        Yes     35.0       7197.0  0.004863
  kjonn_dsf utsibilde1    count  group_count       pct
0    Female         No  11736.0      11917.0  0.984812
1    Female        Yes    181.0      11917.0  0.015188
2      Male         No  11733.0      11978.0  0.979546
3      Male        Yes    245.0      11978.0  0.020454
                    utdanning_kort utsibilde1   count  group_count       pct
0                 Higher education         No  8011.0       8101.0  0.988890
1                 Higher education        Yes    90.0       8101.0  0.0111