Challenge: Tele2 invites you to try your hand at Location-Based Analysis! #78

Closed
infokujur opened this Issue May 8, 2017 · 4 comments

3 participants

infokujur commented May 8, 2017

Tele2 will supply hackathon participants with anonymized VLR (Visitor Location Register) data. This data records cell switches in Tele2's 2G and 3G network over a 12-month period, i.e. the events when a device connects to a Tele2 network station. By combining this data with a map of Tele2 station locations, it is possible to perform approximate location-based analysis (LBA).
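To illustrate the idea, here is a minimal Java sketch that turns one subscriber's time-ordered cell switches into an approximate route and estimates its length with the haversine formula. It assumes that a cell ID alone identifies a station and that a lookup from cell ID to station coordinates is available; `RouteSketch`, `CellEvent`, and the station map are hypothetical names, not part of Tele2's examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: approximate route from one subscriber's cell switches.
final class RouteSketch {

    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** One VLR event: time and serving cell (illustrative type, not from the examples). */
    static final class CellEvent {
        final long epochMillis;
        final int cellId;

        CellEvent(long epochMillis, int cellId) {
            this.epochMillis = epochMillis;
            this.cellId = cellId;
        }
    }

    /**
     * Maps time-ordered events to station positions {lat, lon}, emitting a point only
     * when the cell changes. Cells missing from the lookup are skipped.
     */
    static List<double[]> route(List<CellEvent> events, Map<Integer, double[]> stationLatLon) {
        List<double[]> path = new ArrayList<>();
        Integer previousCell = null;
        for (CellEvent e : events) {
            if (previousCell != null && previousCell == e.cellId) {
                continue; // same cell as before: no switch, no new point
            }
            previousCell = e.cellId;
            double[] position = stationLatLon.get(e.cellId);
            if (position != null) {
                path.add(position);
            }
        }
        return path;
    }

    /** Sum of great-circle (haversine) distances between consecutive route points, in km. */
    static double lengthKm(List<double[]> path) {
        double total = 0.0;
        for (int i = 1; i < path.size(); i++) {
            total += haversineKm(path.get(i - 1), path.get(i));
        }
        return total;
    }

    static double haversineKm(double[] a, double[] b) {
        double lat1 = Math.toRadians(a[0]);
        double lat2 = Math.toRadians(b[0]);
        double dLat = lat2 - lat1;
        double dLon = Math.toRadians(b[1] - a[1]);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(lat1) * Math.cos(lat2) * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
    }
}
```

Because station positions stand in for the device's true position, the result is only as precise as the cell size, which suits questions like commuting patterns rather than exact paths.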

The data provided is obscured to protect the privacy of our customers, yet this is an exclusive chance to build services on a mobile operator's location data. Maybe you'd like to build an app that suggests kindergartens or schools along a parent's daily route, so they can save commuting time? Or try your hand at traffic analysis to optimize road planning? Combining the VLR data with other data available at the event and with open data available online is bound to yield interesting results.

The amount of data is large and will be accessible to participants through a Hadoop framework. More details will be revealed prior to the event.

ardih commented May 15, 2017

Hi,

Question: does the data also include foreign roaming data, i.e. data showing whether Tele2 customers have been outside Estonia and connected to mobile masts there?

Or, if no such data is included, does Tele2 in principle have this dataset, or does it belong to the foreign operators so that there is no access to it?

miljin commented May 15, 2017

“Does the data also include foreign roaming data, i.e. data showing whether Tele2 customers have been outside Estonia and connected to mobile masts there?”

The dataset contains no data on that; only movements within Estonia are included.

miljin commented May 19, 2017

About the cluster

  • A Hortonworks cluster with 4 nodes (node1, node2, node3, node4).
  • The cluster is accessible from a gateway machine, which has the HBase and Hive shells installed for manual interactive access. Custom Java code also needs to be run there.
  • You will be given access during the hackathon.

Start the HBase command-line client with: hbase shell
Start the Hive command-line client with: beeline -u jdbc:hive2://node1:10000/ -n hive

About the data and its format

Two kinds of data are available: VLR data in the cluster and geographic coverage maps of the mobile cells.

Data in the cluster

  • The cluster contains one month of data: September 2016.
  • There are two HBase tables: vlr-by-imsi, searchable by IMSI (the subscriber identity stored on the SIM card), and vlr-by-cell, searchable by mobile cell ID.
  • Information is encoded in the row keys and column keys as well as in the cell values.

Format details (the Java code contains even more):

Table vlr-by-imsi (access entries by IMSI and date)

  • Row key: IMSI hash (1 byte), IMSI (8 bytes), YYYYMM (3 bytes)
  • Column key: message type (1 byte), day (1 byte), hour (1 byte), minute (1 byte), second (1 byte), millisecond (2 bytes)
  • Value: MSISDN (8 bytes), IMSI (8 bytes), IMEI (7 bytes), location area code (2 bytes), cell ID (2 bytes), event type (1 byte)
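A minimal Java sketch of decoding these vlr-by-imsi keys with java.nio.ByteBuffer. It assumes multi-byte fields are big-endian (the convention of HBase's Bytes utility), that the IMSI is a plain 8-byte long, that YYYYMM occupies three bytes holding the integer's low-order bytes, and that one-byte fields are unsigned. `VlrImsiKeys` and its field names are hypothetical; check the assumptions against the supplied Java code.

```java
import java.nio.ByteBuffer;

// Hypothetical decoder for vlr-by-imsi keys; field widths follow the layout above,
// byte order (big-endian) and signedness are assumptions.
final class VlrImsiKeys {

    /** Row key: IMSI hash (1 byte) + IMSI (8 bytes) + YYYYMM (3 bytes) = 12 bytes. */
    static final class RowKey {
        final int imsiHash;
        final long imsi;
        final int yyyymm;

        RowKey(int imsiHash, long imsi, int yyyymm) {
            this.imsiHash = imsiHash;
            this.imsi = imsi;
            this.yyyymm = yyyymm;
        }
    }

    /** Column key: message type, day, hour, minute, second (1 byte each) + millisecond (2 bytes). */
    static final class ColumnKey {
        final int messageType, day, hour, minute, second, millisecond;

        ColumnKey(int messageType, int day, int hour, int minute, int second, int millisecond) {
            this.messageType = messageType;
            this.day = day;
            this.hour = hour;
            this.minute = minute;
            this.second = second;
            this.millisecond = millisecond;
        }
    }

    static RowKey decodeRowKey(byte[] key) {
        if (key.length != 12) {
            throw new IllegalArgumentException("expected 12 bytes, got " + key.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(key); // ByteBuffer reads big-endian by default
        int hash = buf.get() & 0xFF;           // treat as unsigned
        long imsi = buf.getLong();
        int yyyymm = ((buf.get() & 0xFF) << 16) | ((buf.get() & 0xFF) << 8) | (buf.get() & 0xFF);
        return new RowKey(hash, imsi, yyyymm);
    }

    static ColumnKey decodeColumnKey(byte[] qualifier) {
        if (qualifier.length != 7) {
            throw new IllegalArgumentException("expected 7 bytes, got " + qualifier.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(qualifier);
        // Java evaluates arguments left to right, matching the field order in the key.
        return new ColumnKey(buf.get() & 0xFF, buf.get() & 0xFF, buf.get() & 0xFF,
                buf.get() & 0xFF, buf.get() & 0xFF, buf.getShort() & 0xFFFF);
    }
}
```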

Table vlr-by-cell (access entries by Cell ID and date)

  • Row key: cell ID (2 bytes), year (2 bytes), month (1 byte), day (1 byte)
  • Column key: message type (1 byte), hour (1 byte), minute (1 byte), second (1 byte), millisecond (2 bytes)
  • Value: MSISDN (8 bytes), IMSI (8 bytes), IMEI (7 bytes), location area code (2 bytes), cell ID (2 bytes), event type (1 byte)
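Both tables share the same 28-byte value layout. The following sketch decodes that value and builds a vlr-by-cell row key, under hypothetical assumptions: big-endian fields, unsigned 2-byte and 1-byte fields, and the 7-byte IMEI read as an unsigned integer. `VlrCellCodec`, `Event`, and `cellRowKey` are illustrative names only.

```java
import java.nio.ByteBuffer;

// Hypothetical codec for the shared 28-byte value and the vlr-by-cell row key.
// Big-endian byte order and unsigned fields are assumptions, not documented facts.
final class VlrCellCodec {

    /** Value: MSISDN (8) + IMSI (8) + IMEI (7) + LAC (2) + cell ID (2) + event type (1) = 28 bytes. */
    static final class Event {
        final long msisdn, imsi, imei;
        final int lac, cellId, eventType;

        Event(long msisdn, long imsi, long imei, int lac, int cellId, int eventType) {
            this.msisdn = msisdn;
            this.imsi = imsi;
            this.imei = imei;
            this.lac = lac;
            this.cellId = cellId;
            this.eventType = eventType;
        }
    }

    static Event decodeValue(byte[] value) {
        if (value.length != 28) {
            throw new IllegalArgumentException("expected 28 bytes, got " + value.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(value);
        long msisdn = buf.getLong();
        long imsi = buf.getLong();
        long imei = 0;
        for (int i = 0; i < 7; i++) {
            imei = (imei << 8) | (buf.get() & 0xFF); // accumulate 7 bytes as unsigned big-endian
        }
        int lac = buf.getShort() & 0xFFFF;
        int cellId = buf.getShort() & 0xFFFF;
        int eventType = buf.get() & 0xFF;
        return new Event(msisdn, imsi, imei, lac, cellId, eventType);
    }

    /** vlr-by-cell row key: cell ID (2) + year (2) + month (1) + day (1) = 6 bytes. */
    static byte[] cellRowKey(int cellId, int year, int month, int day) {
        return ByteBuffer.allocate(6)
                .putShort((short) cellId) // values up to 65535 keep their unsigned bit pattern
                .putShort((short) year)
                .put((byte) month)
                .put((byte) day)
                .array();
    }
}
```

HBase sorts row keys by unsigned byte order, and under these assumptions big-endian non-negative fields sort numerically, so a scan from cellRowKey(id, 2016, 9, 1) (inclusive) to cellRowKey(id, 2016, 10, 1) (exclusive) would cover one cell for all of September 2016.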

Dates in row keys are stored by first converting a string such as “201609” into the integer 201609 and then saving the bytes that make up that integer. In column keys, each date/time part is stored separately in its own field.
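The date encoding described above can be sketched in Java as follows, for example to build start and stop rows for a scan. It assumes the integer's bytes are stored big-endian and that the 3-byte YYYYMM field of the vlr-by-imsi row key keeps the three low-order bytes (201609 fits in 24 bits). The imsiHash argument is passed in because how that hash byte is computed is not described here; `YyyymmCodec` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of the "201609" -> integer -> bytes row-key encoding.
final class YyyymmCodec {

    /** "201609" -> 201609 -> its three low-order bytes, big-endian (assumed). */
    static byte[] encode(String yyyymm) {
        int value = Integer.parseInt(yyyymm);
        byte[] four = ByteBuffer.allocate(4).putInt(value).array(); // big-endian int
        if (four[0] != 0) {
            throw new IllegalArgumentException("does not fit in 3 bytes: " + yyyymm);
        }
        return Arrays.copyOfRange(four, 1, 4);
    }

    /** Inverse of encode: three big-endian bytes back to the integer YYYYMM. */
    static int decode(byte[] threeBytes) {
        return ((threeBytes[0] & 0xFF) << 16) | ((threeBytes[1] & 0xFF) << 8) | (threeBytes[2] & 0xFF);
    }

    /** Full 12-byte vlr-by-imsi row key; the hash byte must come from the supplied code's logic. */
    static byte[] imsiRowKey(int imsiHash, long imsi, String yyyymm) {
        return ByteBuffer.allocate(12)
                .put((byte) imsiHash)
                .putLong(imsi)
                .put(encode(yyyymm))
                .array();
    }
}
```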

MSISDN, IMSI, and IMEI are anonymised, but the part of the IMEI that identifies the phone model (the Type Allocation Code, TAC) was left unchanged, so you can use it.
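A hypothetical helper for using the preserved part of the IMEI: the first 8 digits of an IMEI are the TAC. The sketch assumes the 7-byte IMEI field holds the IMEI as a plain decimal number, with or without the check digit (hence the totalDigits parameter), and restores leading zeros that numeric storage would drop. Verify this encoding against real values before relying on it; `ImeiTac` is an illustrative name.

```java
import java.util.Locale;

// Hypothetical helper: extract the Type Allocation Code (first 8 digits) from an IMEI
// assumed to be stored as a plain decimal number in the 7-byte field.
final class ImeiTac {

    /**
     * @param imei        the decoded IMEI value
     * @param totalDigits 15 if the stored value includes the check digit, 14 if not (assumption)
     * @return the 8-digit TAC, with leading zeros restored
     */
    static String tac(long imei, int totalDigits) {
        if (imei < 0 || (totalDigits != 14 && totalDigits != 15)) {
            throw new IllegalArgumentException("unexpected IMEI " + imei + " / digits " + totalDigits);
        }
        // Left-pad with zeros; Locale.ROOT keeps plain ASCII digits.
        String digits = String.format(Locale.ROOT, "%0" + totalDigits + "d", imei);
        if (digits.length() != totalDigits) {
            throw new IllegalArgumentException("IMEI has more than " + totalDigits + " digits: " + imei);
        }
        return digits.substring(0, 8);
    }
}
```

Grouping decoded events by the returned TAC gives per-model counts; translating TACs into model names would need an external TAC list, which the hackathon materials do not mention.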

Mobile coverage maps

  • The data is in GeoJSON format, in the WGS84 coordinate system.
  • The files are located on the gateway machine in the /var/coverage_maps directory. There is also a Python script for converting WGS84 coordinates to longitude/latitude.
  • Each file corresponds to a particular technology (GSM, 3G, 4G) and transmitter frequency.
  • The files are slightly obfuscated, so cell borders are not precise.
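A common step when combining events with these maps is finding whether a coverage polygon contains a given point. Below is a minimal sketch of the standard ray-casting (even-odd) point-in-polygon test. It assumes the polygon's outer ring has already been extracted from the GeoJSON (the JDK has no built-in JSON parser), uses GeoJSON's longitude-first coordinate order, treats coordinates as planar, and ignores holes and MultiPolygons. Given the obfuscated borders, treat results near an edge as approximate; `CoverageLookup` is a hypothetical name.

```java
// Hypothetical sketch: even-odd ray-casting test against one GeoJSON polygon ring.
final class CoverageLookup {

    /**
     * Returns true if (lon, lat) lies inside the ring, where ring[i] = {lon, lat}.
     * A horizontal ray is cast from the point; each edge it crosses toggles "inside".
     */
    static boolean contains(double[][] ring, double lon, double lat) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1];
            double xj = ring[j][0], yj = ring[j][1];
            // The edge must straddle the ray's latitude; short-circuiting also avoids
            // dividing by zero on horizontal edges (yi == yj).
            boolean crosses = (yi > lat) != (yj > lat)
                    && lon < (xj - xi) * (lat - yi) / (yj - yi) + xi;
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }
}
```

GeoJSON rings repeat the first position as the last one; the resulting zero-length closing edge never straddles the ray, so the test works on rings as stored.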

About the code examples

hbase_examples.zip

  • The examples are written in Java.
  • Most of the code is self-explanatory, and there are also several helpful comments and Javadocs.
  • You can build a JAR with Maven by running “mvn clean package”.
  • HBase properties in createConnection() are preconfigured for use in our cluster.

What should you prepare to build on top of the examples?

(Of course, you are free to use any language or technology you like.)

  • Java 8
  • Maven
  • Java IDE (such as IntelliJ IDEA)
  • SSH client (on Windows, for example PuTTY)
  • SCP client (on Windows, for example WinSCP)
infokujur commented Feb 2, 2018

No open data here.

infokujur closed this Feb 2, 2018
