Network visualisation of first names given in Belgium between 1995 and 2015

Similarity Network of first names given in Belgium between 1995 and 2015

Data source

Belgian Federal Public Service for Economy, SMEs, Self-Employed and Energy:

Download these *.xls files into your local data folder, next to the src folder.


$ python src/ -h

usage: [-h] --inFileXls INFILEXLS [INFILEXLS ...]
                                  [--sheetName SHEETNAME]
                                  [--partOfCountry PARTOFCOUNTRY]
                                  [--startNumber STARTNUMBER]
                                  [--maxNrNames MAXNRNAMES]
                                  [--simThreshold SIMTHRESHOLD]
                                  [--degreeThreshold DEGREETHRESHOLD]
                                  [--rankThreshold RANKTHRESHOLD]
                                  [--bonusMultiplier BONUSMULTIPLIER]
                                  --outFileGraphML OUTFILEGRAPHML

optional arguments:
  -h, --help            show this help message and exit
  --inFileXls INFILEXLS [INFILEXLS ...]
                        input file(s) with Belgian first names in MS Excel
  --sheetName SHEETNAME
                        name of the sheet - i.e. year - of interest, e.g. 2000
                        (default is 1995 through 2015)
  --partOfCountry PARTOFCOUNTRY
                        1 = whole of Belgium (DEFAULT); 2 = Brussels only;
                        3=Flanders only; 4=Wallonia only)
  --startNumber STARTNUMBER
                        rank number of the highest-ranked first name to
                        include in output graph
  --maxNrNames MAXNRNAMES
                        number of names to store in output network
  --simThreshold SIMTHRESHOLD
                        minimum inter-name similarity for a link to be created
  --degreeThreshold DEGREETHRESHOLD
                        minimum degree required for a node to be included in
                        the output graph
  --rankThreshold RANKTHRESHOLD
                        all nodes below this rank are guaranteed to be
                        included in the output graph
  --bonusMultiplier BONUSMULTIPLIER
                        the edge weights of the nodes below rankThreshold get
                        multiplied with this bonus to increase their chances
                        of survival
  --outFileGraphML OUTFILEGRAPHML
                        output file with Belgian first names in GraphML format
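
For readers who want to script against the same options, here is a minimal argparse sketch that mirrors the flags in the help text above. It is not the author's actual source: the function name build_parser, the argument types and every default except partOfCountry (which the help text gives as 1) are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the help text; types and defaults are guesses."""
    parser = argparse.ArgumentParser()
    # One or more Excel files, as the usage line "INFILEXLS [INFILEXLS ...]" suggests.
    parser.add_argument("--inFileXls", nargs="+", required=True,
                        help="input file(s) with Belgian first names in MS Excel")
    parser.add_argument("--sheetName", default=None,
                        help="name of the sheet, i.e. year, of interest")
    # 1 = whole of Belgium (default per the help text), 2 = Brussels,
    # 3 = Flanders, 4 = Wallonia.
    parser.add_argument("--partOfCountry", type=int, default=1)
    parser.add_argument("--startNumber", type=int, default=None)
    parser.add_argument("--maxNrNames", type=int, default=None)
    parser.add_argument("--simThreshold", type=float, default=None)
    parser.add_argument("--degreeThreshold", type=int, default=None)
    parser.add_argument("--rankThreshold", type=int, default=None)
    parser.add_argument("--bonusMultiplier", type=float, default=None)
    parser.add_argument("--outFileGraphML", required=True,
                        help="output file with Belgian first names in GraphML format")
    return parser
```

Calling build_parser().parse_args([...]) with a list of strings makes it possible to check an option combination without touching any data files.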

How the GraphML files were built

$ mkdir out
$ python src/ --inFileXls data/Voornamen_Jongens_1995-2015_tcm325-239464.xls \
                                        --outFileGraphML out/firstname.graphml \
                                        --startNumber 1 --maxNrNames 3751 \
                                        --simThreshold 0.55 \
                                        --degreeThreshold 2 \
                                        --rankThreshold 100

$ python src/ --inFileXls data/Voornamen_meisjes_1995-2015_tcm325-239448.xls \
                                        --outFileGraphML out/firstname.graphml \
                                        --startNumber 1 \
                                        --maxNrNames 4135 \
                                        --simThreshold 0.55 \
                                        --degreeThreshold 2 \
                                        --rankThreshold 100
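
To give a feel for how simThreshold, degreeThreshold, rankThreshold and bonusMultiplier could interact, here is a rough sketch of such a filtering pipeline. It is an illustration under assumptions, not the code in src: the similarity measure (difflib's SequenceMatcher ratio) is a stand-in, the function names are hypothetical, the bonus is assumed to be applied before the similarity test, and "below this rank" is read as "rank number up to and including rankThreshold".

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    # Stand-in measure (assumption): difflib ratio in [0, 1], case-insensitive.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_network(names, sim_threshold, degree_threshold, rank_threshold,
                  bonus_multiplier=1.0):
    """Build a thresholded similarity network.

    names is ordered by popularity: index 0 holds the rank 1 name.
    Returns (kept_nodes, kept_edges), where kept_edges maps (a, b) to a weight.
    """
    rank = {name: i + 1 for i, name in enumerate(names)}

    edges = {}
    for a, b in combinations(names, 2):
        weight = similarity(a, b)
        # Assumption: edges touching a top-ranked name get the bonus
        # before the similarity threshold is applied.
        if min(rank[a], rank[b]) <= rank_threshold:
            weight *= bonus_multiplier
        if weight >= sim_threshold:
            edges[(a, b)] = weight

    # Count how many surviving edges touch each name.
    degree = {name: 0 for name in names}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    # Keep well-connected names, plus top-ranked names regardless of degree.
    kept = {n for n in names
            if degree[n] >= degree_threshold or rank[n] <= rank_threshold}
    kept_edges = {e: w for e, w in edges.items()
                  if e[0] in kept and e[1] in kept}
    return kept, kept_edges
```

With the toy list ["Noah", "Emma", "Emmy", "Ema", "Xavier"], sim_threshold 0.55, degree_threshold 2 and rank_threshold 1, Noah stays in the network only because of the rank guarantee, while Xavier has no sufficiently similar neighbour and is dropped.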

Visualization with Gephi

  • Load the produced GraphML file into Gephi
  • Color the nodes via Appearance - Nodes - Attributes - Choose an attribute - community
  • Arrange the nodes via Layout - Fruchterman Reingold with the following settings:
    • Area: 100000.0
    • Gravity: 0.5
    • Speed: 10.0
  • Export to PDFs via Preview - Export PDF
  • Export to interactive web page via File - Export - Sigma.js template
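
Before opening a file in Gephi, it can help to confirm that the node attribute used for colouring is actually declared in it. The sketch below lists the node attributes declared in a GraphML file using only the standard library. The function name node_attribute_names is hypothetical, and it assumes the file uses the standard GraphML namespace and declares its attributes with key elements.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, in ElementTree's "{uri}tag" notation.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def node_attribute_names(source):
    """Return the attr.name of every <key> declared for nodes (or for all elements).

    source may be a file path or a binary file object.
    """
    root = ET.parse(source).getroot()
    return [key.get("attr.name")
            for key in root.iter(GRAPHML_NS + "key")
            if key.get("for") in ("node", "all")]
```

For example, node_attribute_names("out/firstname.graphml") should include "community" if the attribute mentioned in the colouring step is present.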

Result files and full story

See my blog article