Countries with (C/X)DR studies

Leo Ferres (& others) UDD & Telefónica I+D lferres@udd.cl

Introduction

People have been asking me why CDR investigations are only carried out in “third world” countries like Chile (??!!). There’s even a somewhat more formal (but limited) study that comes to the same conclusion. Having seen a lot of work in the area and knowing this wasn’t right, I asked this Twitter question, took the NetMob 2010, 2011, 2013, 2015 and 2017 booklets of abstracts (which can be found here), and ran the following code on them:

Processing

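#!/bin/bash
# Print progress to stderr so it stays out of any redirected results.
echoerr() { echo "$@" 1>&2; }
# Search every booklet for each country name (one name per input line).
while IFS= read -r line; do
  echoerr "$line"
  echo "$line"
  # -n prefixes each match with its page number, -i ignores case
  pdfgrep --color=always --cache -n -i "$line" netmob_abstracts_*.pdf
done < listofcountries.txt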

I only worked with the oral presentation booklets (not posters or the D4D challenge). This produced

wc -l bib/countriesfound.txt

66135 bib/countriesfound.txt

using a listofcountries.txt file that I found on the internet. Many of these lines are false positives, though, either because of substring matches (“Mali” also matches “Nor*mali*zed”) or other similar effects. I then spent some time checking the files by hand and recording the lines that actually mention a specific country’s CDR dataset.
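As a rough illustration of the “Mali” vs. “Nor*mali*zed” problem, here is a minimal sketch that compares a substring match with a whole-word match. It assumes the text has already been extracted to plain lines; the two sample sentences are made up for the example, and it uses plain grep rather than the pdfgrep loop above.

```shell
#!/bin/bash
# Hypothetical sample lines standing in for text extracted from the booklets.
sample='We use normalized call volumes.
A CDR dataset from Mali was analyzed.'

# Substring match: "mali" is found inside "normalized" as well as in "Mali".
substring_hits=$(printf '%s\n' "$sample" | grep -i -c 'mali')

# Whole-word match (-w): only the line that actually names the country counts.
word_hits=$(printf '%s\n' "$sample" | grep -i -w -c 'mali')

echo "substring: $substring_hits, whole word: $word_hits"
# prints "substring: 2, whole word: 1"
```

Whole-word matching would not remove every false positive (a country name can still appear in an affiliation or a reference), so the manual check is still needed.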

Results

The following table is simply an “existence” table. It is not exhaustive; rather, I would like to show which countries have been studied using these methods at least once.

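| Continent | Country | Page | Subscribers | Length (mo) | Year | Contributor |
|-----------+---------+------+-------------+-------------+------+-------------|
| Europe | Andorra | 105 | 1264292 | 1 | 2017 | NetMob |
| America (South) | Argentina | 104 | 40000000? | 5 | 2013 | NetMob |
| Europe | Austria | 55 | ? | ? | 2013 | NetMob |
| Europe | Belgium | source | 2500000 | 6 | 2009 | @leoferres |
| America (South) | Brazil | 46 | ? | 0.6 | 2013 | NetMob |
| America (South) | Chile | 126 | 142988 | 0.5 | 2017 | NetMob |
| Asia (East) | China | 85 | ? | 6 | 2017 | NetMob |
| Europe | Estonia | 87 | 48871 | 0.3 | 2015 | NetMob |
| Europe | France | source | 48500000 | 5 (2007) | 2018 | @Metti_Hoof |
| Europe | Portugal | 72 | ? | 6 | 2013 | NetMob |
| Europe | Spain | 72 | ? | 6 | 2013 | NetMob |
| America (Central) | Haiti | 74 | 2900000 | 2 | 2015 | NetMob |
| Asia (South) | India | 109 | 4000000 | 3 | 2015 | NetMob |
| Europe | Ireland | 57 | 500000 | ? | 2011 | NetMob |
| Europe | Italy | 56 | ? | 10 | 2010 | NetMob |
| America (Central) | Mexico | 42 | 1000000s | 6 | 2015 | NetMob |
| Africa | Namibia | 117 | 4500000 | 50 | 2017 | NetMob |
| Africa | Senegal | 117 | 9500000 | 12 | 2017 | NetMob |
| Asia | Nepal | 58 | 12900000 | ? | 2017 | NetMob |
| Europe | Netherlands | 103 | ? | 36 | 2011 | NetMob |
| Europe | Norway | 145 | 509 | ? | 2013 | NetMob |
| Africa | Rwanda | source | 400000/1500000 | 56 | 2015 | @deaneckles |
| Europe | Slovenia | 173 | 5000 | 1 | 2013 | NetMob |
| Asia | Sri Lanka | 75 | ? | ? | 2017 | NetMob |
| Europe | Switzerland | 118 | 38 | 10 | 2013 | NetMob |
| Africa | Tanzania | 146 | 415000 | 4 | 2017 | NetMob |
| America (North) | United States | 47 | 475000 | 2 | 2011 | NetMob |
| Asia | Bangladesh | source | 5100000 | 3 | 2016 | @arutherfordium |
| Europe? | England | source | 65000000 | 1 | 2010 | @arutherfordium |
| Asia | Pakistan | source | 39000000 | 7 | 2015 | @arutherfordium |
| Asia | Turkey | source | 3500000 | ? | 2017 | @arutherfordium |
| Europe | Switzerland | source | 2700000 | 12 | 2019 | @ProfDiegoPuga |
| America (South) | Colombia | source | 7000000 | 6 | 2018 | @danielapaolotti |
| Africa | Cote D’Ivoire | source | 5000000 | 5 | 2012 | @lbravoc |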

Conclusions

These are some general conclusions I glean from the table above. They are, alas, anecdotal rather than scientific at this point, and I would be happy to discuss them. In fact, someone should do a much more in-depth, serious study and let the community know. For now, this should suffice so that I can redirect some of the questions I get to this website.

  1. There’s not really a preference for non-European/developing countries, at least not in this “there is (at least one) dataset for country X” review table,
  2. that said, CDR work does seem to prioritize certain countries (Haiti being the foremost example), but it seems to do so for humanitarian reasons rather than because of less strict privacy laws (people will do whatever they can to help, including giving out otherwise sensitive information… these are not leaks),
  3. most of these studies analyze mobile data from the authors’ own countries rather than taking data from other countries, except maybe the D4R Challenge and Haiti datasets, which were designed for external help.

Notes

  1. This is just one conference (albeit the most prominent one, NetMob), and even so, not all papers have been included, so I’m completely sure that there are many, many other countries/regions that have been studied using C/XDR datasets. [ NB: As more submissions trickle in, I will have to add other sources. ]
  2. Sometimes, there may be little information about a dataset in a given country, but then it has been studied further in some other paper. I have recorded the page and edition of NetMob with the most information.
  3. There might also be places where I’ve missed a piece of information, or even a better dataset from the same region. This should not undermine the fact that a dataset exists for that region.
  4. This is, of course and by necessity, quick and dirty. Pull requests are welcome; it’d be fantastic to have a reasonably complete list of datasets that have been published. I might come back to this and run a more exhaustive search of the NetMob pages, or I might not. One thing that could be done is to search for all instances of the word “data” and see whether there are other countries (or, more likely, cities) that were not picked up by the restrictive country-name regular expressions.
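The “search for data” idea in the last note could be sketched roughly as follows. This is only an illustration under stated assumptions: the booklet text is taken to be already extracted to plain lines, and the country list and sample sentences are hypothetical stand-ins, not the real listofcountries.txt or NetMob text.

```shell
#!/bin/bash
# Hypothetical stand-in for listofcountries.txt (one name per line).
countries=$(mktemp)
printf '%s\n' 'Chile' 'Mali' > "$countries"

# Hypothetical stand-in for lines extracted from a booklet.
text='We analyze CDR data from Chile.
Mobility data for Santiago and Lima is compared.
Results are normalized per tower.'

# Keep lines that mention "data", then drop the ones that already name a
# listed country as a whole word (-F fixed strings, -w whole words,
# -f read patterns from a file, -v invert the match).
unmatched=$(printf '%s\n' "$text" | grep -i 'data' | grep -v -i -w -F -f "$countries")

echo "$unmatched"
# prints "Mobility data for Santiago and Lima is compared."
rm -f "$countries"
```

The lines that survive are candidates for manual review: they talk about data but mention no country on the list, which is where cities or unlisted regions would show up.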

Acknowledgements

I’d like to thank the following people for their Twitter replies: Esteban Moro, Marta Gonzalez, Jari Saramaki, Nuria Oliver, Erki Saluveer, Yves-Alexandre de Montjoye, Alex Rutherford.

Hope it’s useful.