<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">


# Telecomm EDA Challenge Lab

_Author: Alex Combs (NYC)_

---

Let's do some Exploratory Data Analysis (EDA)! As a data scientist, you may often be handed a dataset you've never seen before and asked to analyze it quickly. That's today's goal.

# Prompt

You work for a telecommunications company. The company has been storing metadata about customer phone usage in the regular course of business. Currently, this data sits in an unsecured database. The company doesn't want to pay for stronger database security, because it doesn't think there's anything to be learned from the metadata.

They are under pressure from "right to privacy" organizations to beef up the database security. These organizations argue that you can learn a lot about a person from their cell phone metadata.

The telecom company wants to understand if this is true, and they want your help. They will give you one person's metadata for 2014 and want to see what you can learn from it.

Working in teams, create a report revealing everything you can about the person. Prepare a presentation, with slides, showcasing your findings.


# The Data

The [person's metadata](./datasets/metadata.csv) has the following fields:

| Field Name               | Description
| ---                      | ---
| **Cell Cgi**             | cell phone tower identifier
| **Cell Tower Location**  | cell phone tower location
| **Comm Identifier**      | de-identified recipient of communication
| **Comm Timedate String** | time of communication
| **Comm Type**            | type of communication (e.g., `Phone`)
| **Latitude**             | latitude of communication
| **Longitude**            | longitude of communication

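The **Comm Timedate String** field is plain text such as `4/1/14 9:40`, so it has to be converted before any time-of-day analysis. Below is a minimal sketch; `parse_comm_times` is a hypothetical helper, and it assumes month/day/two-digit-year order by default. The sample rows alone don't settle the order (Australian dates are often written day-first), so the helper takes a `dayfirst` switch.

```python
import pandas as pd


def parse_comm_times(raw, dayfirst=False):
    """Convert strings like '4/1/14 9:40' into pandas Timestamps.

    ASSUMPTION: month/day/year order unless dayfirst=True. Verify against
    the data before trusting either reading.
    """
    fmt = '%d/%m/%y %H:%M' if dayfirst else '%m/%d/%y %H:%M'
    return pd.to_datetime(raw, format=fmt)


raw = pd.Series(['4/1/14 9:40', '4/1/14 13:13'])
print(parse_comm_times(raw).dt.month.tolist())                 # [4, 4]
print(parse_comm_times(raw, dayfirst=True).dt.month.tolist())  # [1, 1]
```

A quick way to check the order is `raw.str.split('/').str[0].astype(int).max()`: a value above 12 means the first field must be the day.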

# Hints

This is totally open-ended! Only if you're stumped should you look below for prompts. As a starting point, since you have geolocations, consider ways to display this type of information (e.g., on a map).

<font color='white'>
Well, for starters, he's in Australia!

Ideas for things to look into:
- where does he work?
- where does he live?
- who does he contact most often?
- what hours does he work?
- did he move?
- did he go on holiday?  If so, where did he go?
- did he get a new phone?

Challenges:
- how does he get to work?
- where does his family live?
- if he went on holiday, can you find which flights he took?
- can you guess who some of his contacts are, based on the frequency, location, time and mode (phone/text) of communications?


If you're stuck on how to map the data, you can try "basemap" or "gmplot", or anything else you find online.
</font>

In [1]:
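# folium is not on the default conda channels, so this conda install fails (see output).
# It is published on conda-forge (`conda install -c conda-forge folium`); here the pip
# install on the next line is used instead.
!conda install folium --yes
!pip install folium

!pip install geopy

# pygmaps

!pip install gpxpy
import gpxpy.geo  # geographic helpers, e.g. distances between coordinates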

Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - folium

Current channels:

  - https://repo.continuum.io/pkgs/main/osx-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/osx-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/osx-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/osx-64
  - https://repo.continuum.io/pkgs/pro/noarch


Collecting folium
  Downloading folium-0.5.0.tar.gz (79kB)
    100% |████████████████████████████████| 81kB 2.7MB/s
Collecting branca (from folium)
  Downloading branca-0.2.0-py2-none-any.whl
Building wheels for collected packages: folium
  Running setup.py bdist_wheel for folium ... done
  Stored in directory: /Users/sjohnston/Library/Caches/pip/wheels/04/d0/a0/b2b8356443364ae79743fce0b9b6a5b045f7560742129fde22
Successfully built folium
Install
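
The install cell pulls in `gpxpy.geo` for distance calculations. To show the idea without relying on that package's exact API, here is a plain-Python haversine sketch (`haversine_km` is a hypothetical name) that assumes a spherical Earth with a mean radius of 6371 km. Distances like these help with questions such as "did he travel?": the Redfern and Haymarket towers in the first rows are about 1.4 km apart, while a Sydney tower and one of the coordinates near latitude -42.8 in the data are on the order of 1,000 km apart.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Redfern vs. Haymarket towers (first rows of the data)
print(round(haversine_km(-33.89293336, 151.2022962, -33.88032891, 151.2056904), 2))
# A Sydney tower vs. a location near latitude -42.8
print(round(haversine_km(-33.88417103, 151.20235, -42.84338, 147.29569)))
```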

In [13]:
import numpy as np
import pandas as pd

person_data = pd.read_csv('./datasets/metadata.csv')
person_data.head()

|   | Cell Cgi | Cell Tower Location | Comm Identifier | Comm Timedate String | Comm Type | Latitude | Longitude
| --- | --- | --- | --- | --- | --- | --- | ---
| 0 | 50501015388B9 | REDFERN TE | f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e | 4/1/14 9:40 | Phone | -33.892933 | 151.202296
| 1 | 50501015388B9 | REDFERN TE | 62157ccf2910019ffd915b11fa037243b75c1624 | 4/1/14 9:42 | Phone | -33.892933 | 151.202296
| 2 | 505010153111F | HAYMARKET # | c8f92bd0f4e6fb45ed7fce96fc831b283db2b642 | 4/1/14 13:13 | Phone | -33.880329 | 151.20569
| 3 | 505010153111F | HAYMARKET # | f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e | 4/1/14 13:13 | Phone | -33.880329 | 151.20569
| 4 | 5.05E+106 | HAYMARKET # | f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e | 4/1/14 17:27 | Phone | -33.880329 | 151.20569


In [14]:
person_data.shape

(10476, 7)

In [15]:
person_data['Comm Identifier'].unique().size

131
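
With 131 distinct contacts, a natural next step is to rank them and split each one by communication type, which speaks to the hint about guessing contacts from frequency and mode. A minimal sketch using `pd.crosstab`; `top_contacts` is a hypothetical helper, and the toy rows (including the `SMS` label) are illustrative only, since the rows shown so far contain just `Phone`.

```python
import pandas as pd


def top_contacts(df, n=5):
    """Communications per contact, one column per Comm Type, busiest first."""
    table = pd.crosstab(df['Comm Identifier'], df['Comm Type'])
    table['Total'] = table.sum(axis=1)  # row total across all types
    return table.sort_values('Total', ascending=False).head(n)


# Toy rows; 'SMS' is an assumed label for text messages.
sample = pd.DataFrame({
    'Comm Identifier': ['a', 'a', 'b', 'a', 'c', 'b'],
    'Comm Type': ['Phone', 'SMS', 'Phone', 'Phone', 'SMS', 'SMS'],
})
print(top_contacts(sample, n=2))
```

On the real data this would be `top_contacts(person_data, n=10)`.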

In [20]:
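# Check for missing coordinates. A cell only displays its last expression,
# so only the Longitude result appears below.
person_data['Latitude'].isnull()
person_data['Longitude'].isnull()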

0        False
1        False
2        False
3        False
4        False
5        False
6        False
7        False
8        False
9        False
10       False
11       False
12       False
13       False
14       False
15       False
16       False
17       False
18       False
19       False
20       False
21       False
22       False
23       False
24       False
25       False
26       False
27       False
28       False
29       False
         ...  
10446    False
10447    False
10448    False
10449    False
10450    False
10451    False
10452    False
10453    False
10454    False
10455    False
10456    False
10457    False
10458    False
10459    False
10460    False
10461    False
10462    False
10463    False
10464    False
10465    False
10466    False
10467    False
10468    False
10469    False
10470    False
10471    False
10472    False
10473    False
10474    False
10475    False
Name: Longitude, Length: 10476, dtype: bool
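
Because a cell only shows its last expression and long Boolean Series are truncated, summing the masks gives a compact check of both columns at once. A sketch on toy values (not the real data):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value in each coordinate column.
sample = pd.DataFrame({
    'Latitude': [-33.892933, np.nan, -33.880329],
    'Longitude': [151.202296, 151.20569, np.nan],
})

coords = sample[['Latitude', 'Longitude']]
missing_per_column = coords.isnull().sum()             # NaN count in each column
rows_missing_any = coords.isnull().any(axis=1).sum()   # rows lacking either coordinate
print(missing_per_column)
print(rows_missing_any)
```

On the real data, `person_data[['Latitude', 'Longitude']].isnull().sum()` reports zero for both columns if no coordinates are missing.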

In [93]:
import folium

print(person_data.at[0, 'Latitude'])
print(person_data.at[0, 'Longitude'])
-33.89293336
151.2022962


In [108]:
latitude = person_data.at[0,'Latitude']
longitude = person_data.at[0,'Longitude']

person_locations = folium.Map(location=[latitude, longitude])

person_locations

In [70]:
person_data['LatLong'] = list(zip(person_data.Latitude,person_data.Longitude))
person_data.head()


|   | Cell Cgi | Cell Tower Location | Comm Identifier | Comm Timedate String | Comm Type | Latitude | Longitude | LatLong
| --- | --- | --- | --- | --- | --- | --- | --- | ---
| 0 | 50501015388B9 | REDFERN TE | f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e | 4/1/14 9:40 | Phone | -33.892933 | 151.202296 | (-33.89293336, 151.2022962)
| 1 | 50501015388B9 | REDFERN TE | 62157ccf2910019ffd915b11fa037243b75c1624 | 4/1/14 9:42 | Phone | -33.892933 | 151.202296 | (-33.89293336, 151.2022962)
| 2 | 505010153111F | HAYMARKET # | c8f92bd0f4e6fb45ed7fce96fc831b283db2b642 | 4/1/14 13:13 | Phone | -33.880329 | 151.20569 | (-33.88032891, 151.2056904)
| 3 | 505010153111F | HAYMARKET # | f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e | 4/1/14 13:13 | Phone | -33.880329 | 151.20569 | (-33.88032891, 151.2056904)
| 4 | 5.05E+106 | HAYMARKET # | f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e | 4/1/14 17:27 | Phone | -33.880329 | 151.20569 | (-33.88032891, 151.2056904)
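
Locations become more telling when combined with time of day: where the phone usually is late at night is a candidate for home, and where it is during weekday business hours is a candidate for work. Below is a minimal sketch of that heuristic, not a rule. It assumes a parsed datetime column named `timestamp` (hypothetical; e.g. built with `pd.to_datetime` from **Comm Timedate String**), and the hour windows and toy rows are illustrative assumptions, not the real data.

```python
import pandas as pd


def most_common_location(df, hours, weekdays_only=False):
    """Return the (Latitude, Longitude) pair seen most often during `hours`.

    ASSUMPTION: df has a datetime column 'timestamp' plus 'Latitude'/'Longitude'.
    Raises ValueError if no rows fall inside the window.
    """
    mask = df['timestamp'].dt.hour.isin(hours)
    if weekdays_only:
        mask &= df['timestamp'].dt.dayofweek < 5  # Monday=0 ... Friday=4
    counts = df[mask].groupby(['Latitude', 'Longitude']).size()
    return counts.idxmax()


# Toy rows for illustration only (2014-04-01 was a Tuesday).
sample = pd.DataFrame({
    'timestamp': pd.to_datetime(['2014-04-01 07:00', '2014-04-01 10:00',
                                 '2014-04-01 11:00', '2014-04-01 22:00']),
    'Latitude': [-33.79, -33.88, -33.88, -33.79],
    'Longitude': [151.27, 151.20, 151.20, 151.27],
})

night_hours = list(range(0, 7)) + [22, 23]  # hypothetical "at home" window
office_hours = list(range(9, 18))           # hypothetical working window
print(most_common_location(sample, night_hours))                       # candidate home
print(most_common_location(sample, office_hours, weekdays_only=True))  # candidate work
```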


In [74]:
person_data['LatLong'].unique()

array([(-33.892933360000001, 151.20229619999998),
       (-33.880328910000003, 151.20569040000001),
       (-33.884171029999997, 151.20235), (-33.880240000000001, 151.20569),
       (-33.861129999999996, 151.21293),
       (-33.796610000000001, 151.27756000000002),
       (-33.796678999999997, 151.285293), (-33.88964, 151.21142),
       (-33.87829, 151.20345), (-33.886400000000002, 151.2088),
       (-33.934159999999999, 151.17938000000001),
       (-42.837620000000001, 147.50575000000001),
       (-42.843379999999996, 147.29568999999998),
       (-42.859840000000005, 147.29214999999999),
       (-42.853070000000002, 147.31531999999999),
       (-42.860599999999998, 147.45419999999999),
       (-33.866549999999997, 151.21033), (-33.87932, 151.23802000000001),
       (-33.896050000000002, 151.17963999999998),
       (-33.892330000000001, 151.21653000000001),
       (-33.878140000000002, 151.21360000000001),
       (-33.870550000000001, 151.20793), (-33.8703, 151.21010000000001),
       

In [75]:
person_data['LatLong'].nunique()

70
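
Some of the 70 distinct pairs look like the same place recorded at different precision, e.g. `(-33.88032891, 151.2056904)` and `(-33.88024, 151.20569)`, which are only about 10 m apart. One way to merge such near-duplicates is to round before counting. The sketch below assumes 3 decimal places (about 110 m of latitude) is an acceptable tolerance; `counts_by_rounded_location` is a hypothetical helper. Note that rounding can still split two close points that fall on either side of a rounding boundary.

```python
import pandas as pd


def counts_by_rounded_location(df, decimals=3):
    """Count records per location after rounding coordinates to `decimals` places."""
    rounded = df[['Latitude', 'Longitude']].round(decimals)
    return rounded.groupby(['Latitude', 'Longitude']).size().sort_values(ascending=False)


sample = pd.DataFrame({
    'Latitude': [-33.88032891, -33.88024, -33.89293336],
    'Longitude': [151.2056904, 151.20569, 151.2022962],
})
print(counts_by_rounded_location(sample))
```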

In [78]:
top_locations = person_data['LatLong'].value_counts().head(13)
top_locations

(-33.78815, 151.26654)         4301
(-33.88417103, 151.20235)      1084
(-42.84338, 147.29569)          723
(-33.89293336, 151.2022962)     712
(-33.88032891, 151.2056904)     563
(-42.85984, 147.29215)          501
(-33.779333, 151.276901)        465
(-33.79661, 151.27756)          454
(-33.796679, 151.285293)        231
(-42.85307, 147.31532)          197
(-33.87829, 151.20345)          161
(-36.3567, 146.7136)            112
(-33.793648, 151.263934)        106
Name: LatLong, dtype: int64
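
Raw counts are easier to compare as shares of all 10,476 records; for instance, the top location's 4,301 records are about 41% of the total. `value_counts(normalize=True)` returns those fractions directly, and a cumulative sum shows how much of the data the top few locations cover. A sketch on toy values:

```python
import pandas as pd

# Toy LatLong column: three records at one location, one at another.
latlong = pd.Series([(-33.78815, 151.26654)] * 3 + [(-42.84338, 147.29569)])

shares = latlong.value_counts(normalize=True)  # fraction of records per location
print(shares)
print(shares.cumsum())  # coverage of the top-k locations
```

On the real data, the equivalent is `person_data['LatLong'].value_counts(normalize=True).head(13)`.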

In [105]:
top_locations.iloc[1]  # record count of the second most frequent location (by position)

1084

In [106]:
# Re-create the map, centered on the first record
latitude = person_data.at[0, 'Latitude']
longitude = person_data.at[0, 'Longitude']

person_locations = folium.Map(location=[latitude, longitude])

# Add a green marker for each of the 13 most frequent locations;
# the popup shows how many records were logged there.
for (latitude, longitude), count_of_visits in top_locations.items():
    folium.Marker([latitude, longitude],
                  popup=str(count_of_visits),
                  icon=folium.Icon(color='green')).add_to(person_locations)

person_locations
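
The map is centered on the first record with folium's default zoom, so markers far from Sydney may start off-screen. One option is to compute the markers' bounding box and pass it to folium's `Map.fit_bounds`. Below is a sketch of the bounds computation only; `bounds_of` is a hypothetical helper, it assumes the points don't straddle the 180° meridian, and the sample points are taken from the top-locations list above.

```python
def bounds_of(latlongs):
    """Return [[south, west], [north, east]] for an iterable of (lat, lon) pairs."""
    lats = [lat for lat, _ in latlongs]
    lons = [lon for _, lon in latlongs]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


points = [(-33.78815, 151.26654), (-42.84338, 147.29569), (-36.3567, 146.7136)]
print(bounds_of(points))  # [[-42.84338, 146.7136], [-33.78815, 151.26654]]
```

In the notebook, `person_locations.fit_bounds(bounds_of(top_locations.index))` would then zoom the map to cover every marker.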

