Belgian Federal Public Service for Economy, SMEs, Self-Employed and Energy:
Download these *.xls files into your local data folder, next to the src folder
$ python src/build_firstname_network.py -h
usage: build_firstname_network.py [-h] --inFileXls INFILEXLS [INFILEXLS ...]
[--sheetName SHEETNAME]
[--partOfCountry PARTOFCOUNTRY]
[--startNumber STARTNUMBER]
[--maxNrNames MAXNRNAMES]
[--simThreshold SIMTHRESHOLD]
[--degreeThreshold DEGREETHRESHOLD]
[--rankThreshold RANKTHRESHOLD]
[--bonusMultiplier BONUSMULTIPLIER]
--outFileGraphML OUTFILEGRAPHML
optional arguments:
-h, --help show this help message and exit
--inFileXls INFILEXLS [INFILEXLS ...]
input file(s) with Belgian first names in MS Excel
format
--sheetName SHEETNAME
name of the sheet - i.e. year - of interest, e.g. 2000
(default is 1995 through 2015)
--partOfCountry PARTOFCOUNTRY
1 = whole of Belgium (DEFAULT); 2 = Brussels only;
3=Flanders only; 4=Wallonia only)
--startNumber STARTNUMBER
rank number of the highest-ranked first name to
include in output graph
--maxNrNames MAXNRNAMES
number of names to store in output network
--simThreshold SIMTHRESHOLD
minimum inter-name similarity for a link to be created
--degreeThreshold DEGREETHRESHOLD
minimum degree required for a node to be included in
the output graph
--rankThreshold RANKTHRESHOLD
all nodes below this rank are guaranteed to be
included in the output graph
--bonusMultiplier BONUSMULTIPLIER
the edge weights of the nodes below rankThreshold get
multiplied with this bonus to increase their chances
of survival
--outFileGraphML OUTFILEGRAPHML
output file with Belgian first names in GraphML format
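To make the interplay between `--simThreshold` and `--degreeThreshold` concrete, here is a minimal sketch of the filtering idea described in the help text. It is not the code from `build_firstname_network.py`: the similarity measure (`difflib.SequenceMatcher`) is a stand-in chosen for illustration, the function names are hypothetical, and the rank bonus (`--rankThreshold`, `--bonusMultiplier`) is left out.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; the script's real measure may differ."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_network(names, sim_threshold=0.55, degree_threshold=2):
    """Link sufficiently similar names, then keep nodes with enough links."""
    # Create an edge for every pair whose similarity reaches the threshold.
    edges = {}
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= sim_threshold:
            edges[(a, b)] = score

    # Count how many edges touch each name.
    degree = {name: 0 for name in names}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    # Single pruning pass: drop names with too few links and their edges.
    kept = {name for name in names if degree[name] >= degree_threshold}
    kept_edges = {
        pair: weight
        for pair, weight in edges.items()
        if pair[0] in kept and pair[1] in kept
    }
    return kept, kept_edges


kept, kept_edges = build_network(["Anna", "Anne", "Hanna", "Zyx"])
print(sorted(kept), len(kept_edges))
```

Raising the similarity threshold produces fewer links, which in turn lets the degree threshold remove more names, so the two options are best tuned together.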
$ mkdir out
$ python src/build_firstname_network.py --inFileXls data/Voornamen_Jongens_1995-2015_tcm325-239464.xls \
--outFileGraphML out/firstname.graphml \
--startNumber 1 --maxNrNames 3751 \
--simThreshold 0.55 \
--degreeThreshold 2 \
--rankThreshold 100
$ python src/build_firstname_network.py --inFileXls data/Voornamen_meisjes_1995-2015_tcm325-239448.xls \
--outFileGraphML out/firstname.graphml \
--startNumber 1 \
--maxNrNames 4135 \
--simThreshold 0.55 \
--degreeThreshold 2 \
--rankThreshold 100
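Before opening the result in a graph tool, it can be useful to check how many names and links survived the thresholds. The sketch below counts nodes and edges using only the Python standard library. It assumes the output file declares the standard GraphML namespace; the function name is hypothetical and not part of this repository.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace (assumed to be used by the output file).
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def count_nodes_and_edges(path):
    """Return (number of nodes, number of edges) found in a GraphML file."""
    root = ET.parse(path).getroot()
    nodes = sum(1 for _ in root.iter(f"{{{GRAPHML_NS}}}node"))
    edges = sum(1 for _ in root.iter(f"{{{GRAPHML_NS}}}edge"))
    return nodes, edges


# Example call, once the script has produced the file:
# print(count_nodes_and_edges("out/firstname.graphml"))
```

If the counts look too small or too large, adjust `--simThreshold` or `--degreeThreshold` and rerun the script.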
Visualization with Gephi
- Load the produced GraphML file into Gephi
- Color the nodes via Appearance - Nodes - Attributes - Choose an attribute - community
- Arrange the nodes via Layout - Fruchterman Reingold with the following settings:
- Area: 100000.0
- Gravity: 0.5
- Speed: 10.0
- Export to PDF via Preview - Export PDF
- Export to interactive web page via File - Export - Sigma.js template
See my blog article