<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">


# Telecom EDA Challenge Lab

_Author: Alex Combs (NYC)_

---

Let's do some Exploratory Data Analysis (EDA)! As a data scientist, you will often be handed a data set you've never seen before and asked to analyze it quickly. That is today's goal.

# Prompt

You work for a telecommunications company. The company has been storing metadata about customer phone usage, as part of the regular course of business. Currently, this data is sitting in an unsecured database. The company doesn't want to pay to increase their database security, because they don't think there's really anything to be learned from the metadata.

They are under pressure from "right to privacy" organizations to beef up the database security. These organizations argue that you can learn a lot about a person from their cell phone metadata.

The telecom company wants to understand if this is true, and they want your help. They will give you one person's metadata for 2014 and want to see what you can learn from it.

Working in teams, create a report revealing everything you can about the person. Prepare a presentation, with slides, showcasing your findings.


# The Data

The [person's metadata](./datasets/metadata.csv) has the following fields:

| Field Name               | Description |
| ---                      | --- |
| **Cell Cgi**             | cell phone tower identifier |
| **Cell Tower Location**  | cell phone tower location |
| **Comm Identifier**      | de-identified recipient of communication |
| **Comm Timedate String** | time of communication |
| **Comm Type**            | type of communication |
| **Latitude**             | latitude of communication |
| **Longitude**            | longitude of communication |
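
Most of the questions in this lab depend on time, so one useful first step is turning `Comm Timedate String` into real timestamps. The sketch below assumes the strings are month/day/two-digit year with a 24-hour clock (the sample rows run 4/1/14, 4/2/14, 4/3/14, which suggests the month comes first). The three inline rows and the `when`, `hour`, and `weekday` column names are illustrative only and are not part of the dataset.

```python
import pandas as pd

# Illustrative rows in the same shape as the dataset (not the full file).
sample = pd.DataFrame({
    "Comm Timedate String": ["4/1/14 9:40", "4/1/14 17:27", "4/7/14 21:22"],
    "Comm Type": ["Phone", "Phone", "SMS"],
})

# Assumed format: month/day/two-digit year, 24-hour clock.
sample["when"] = pd.to_datetime(sample["Comm Timedate String"],
                                format="%m/%d/%y %H:%M")

# Derived columns that make "what hours?" and "which days?" easy to ask.
sample["hour"] = sample["when"].dt.hour
sample["weekday"] = sample["when"].dt.day_name()

print(sample[["when", "hour", "weekday"]])
```

If the real file turns out to use day-first dates, only the `format` string would need to change.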


# Hints

This is totally open-ended! Look at the prompts below only if you're completely stumped. As a starting point, given that you have geolocations, consider investigating ways to display this type of information (e.g. mapping functionality).

<font color='white'>
Well, for starters, he's in Australia!

Ideas for things to look into:
- where does he work?
- where does he live?
- who does he contact most often?
- what hours does he work?
- did he move?
- did he go on holiday?  If so, where did he go?
- did he get a new phone?

Challenges:
- how does he get to work?
- where does his family live?
- if he went on holiday, can you find which flights he took?
- can you guess who some of his contacts are, based on the frequency, location, time and mode (phone/text) of communications?


If you're stuck on how to map the data, you can try "basemap" or "gmplot", or anything else you find online.
</font>
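
Questions about where someone lives or works can often be approached by asking which tower shows up most at different times of day. The sketch below is one possible starting point, not a definitive method: the six rows are made up (tower names are borrowed from the sample output), the hour cut-offs are hypothetical, and `likely_work` and `likely_home` are illustrative names.

```python
import pandas as pd

# Illustrative records only; they say nothing about the real person.
calls = pd.DataFrame({
    "Cell Tower Location": ["REDFERN TE", "HAYMARKET #", "HAYMARKET #",
                            "REDFERN TE", "CHIPPENDALE", "REDFERN TE"],
    "Comm Timedate String": ["4/1/14 9:40", "4/1/14 13:13", "4/1/14 15:27",
                             "4/2/14 22:18", "4/3/14 14:35", "4/3/14 23:05"],
})

# Assumed format: month/day/two-digit year, 24-hour clock.
calls["when"] = pd.to_datetime(calls["Comm Timedate String"],
                               format="%m/%d/%y %H:%M")
hour = calls["when"].dt.hour

# Hypothetical cut-offs: 10:00-16:59 counts as "work", 21:00-05:59 as "home".
work_hours = hour.between(10, 16)
night_hours = (hour >= 21) | (hour < 6)

# The most frequent tower in each window is the candidate location.
likely_work = calls.loc[work_hours, "Cell Tower Location"].value_counts().idxmax()
likely_home = calls.loc[night_hours, "Cell Tower Location"].value_counts().idxmax()
print("work:", likely_work, "| home:", likely_home)
```

On the real data it would probably be worth restricting the "work" window to weekdays and checking how sensitive the answer is to the cut-offs.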

In [16]:
# Install folium for interactive maps (only needed once per environment).
!pip install folium

import folium
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Load the metadata for the one person under study.
meta = pd.read_csv('./datasets/metadata.csv')


            Cell Cgi                 Cell Tower Location  \
0      50501015388B9                          REDFERN TE   
1      50501015388B9                          REDFERN TE   
2      505010153111F                         HAYMARKET #   
3      505010153111F                         HAYMARKET #   
4          5.05E+106                         HAYMARKET #   
5      5050101532B23                         CHIPPENDALE   
6      5050101536E5E                         CHIPPENDALE   
7      5050101531F08                          REDFERN TE   
8      505010153111F                         HAYMARKET #   
9      505010153111F                         HAYMARKET #   
10     50501015388BC                          REDFERN TE   
11     50501015388BC                          REDFERN TE   
12     50501015388BC                          REDFERN TE   
13     5050101537A4A                         CHIPPENDALE   
14     505010153111F                         HAYMARKET #   
15         5.05E+106                    

[10476 rows x 7 columns]


Unnamed: 0,Cell Cgi,Cell Tower Location,Comm Identifier,Comm Timedate String,Comm Type,Latitude,Longitude
0,50501015388B9,REDFERN TE,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 9:40,Phone,-33.892933,151.202296
1,50501015388B9,REDFERN TE,62157ccf2910019ffd915b11fa037243b75c1624,4/1/14 9:42,Phone,-33.892933,151.202296
2,505010153111F,HAYMARKET #,c8f92bd0f4e6fb45ed7fce96fc831b283db2b642,4/1/14 13:13,Phone,-33.880329,151.20569
3,505010153111F,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 13:13,Phone,-33.880329,151.20569
4,5.05E+106,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 17:27,Phone,-33.880329,151.20569


In [18]:

# Print the raw frame, then show the first five rows.
print(meta)
meta.head()


In [27]:

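# Base map from the folium quickstart, centered on the continental US.
# The records in `meta` are located around Sydney (roughly -33.9, 151.2).
m = folium.Map([43, -100], tiles='cartodbpositron', zoom_start=4)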


In [29]:
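import json

# NOTE: adapted from the folium GeoJson example. `us_states` (path to a GeoJSON
# file) and `step` (a colormap) are not defined anywhere in this notebook, so
# this cell cannot run until both are supplied.
with open(us_states) as f:
    geo_json_data = json.load(f)

folium.GeoJson(
    geo_json_data,
    style_function=lambda feature: {
        'fillColor': step([feature['id']]),
        'color': 'black',
        'weight': 2,
        'dashArray': '5, 5'
    }
).add_to(m)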

In [30]:
meta

Unnamed: 0,Cell Cgi,Cell Tower Location,Comm Identifier,Comm Timedate String,Comm Type,Latitude,Longitude
0,50501015388B9,REDFERN TE,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 9:40,Phone,-33.892933,151.202296
1,50501015388B9,REDFERN TE,62157ccf2910019ffd915b11fa037243b75c1624,4/1/14 9:42,Phone,-33.892933,151.202296
2,505010153111F,HAYMARKET #,c8f92bd0f4e6fb45ed7fce96fc831b283db2b642,4/1/14 13:13,Phone,-33.880329,151.205690
3,505010153111F,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 13:13,Phone,-33.880329,151.205690
4,5.05E+106,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 17:27,Phone,-33.880329,151.205690
5,5050101532B23,CHIPPENDALE,6bbc17070aa91e2dab7909b96c6eecbd6109ba56,4/1/14 17:36,Phone,-33.884171,151.202350
6,5050101536E5E,CHIPPENDALE,6bbc17070aa91e2dab7909b96c6eecbd6109ba56,4/1/14 17:40,Phone,-33.884171,151.202350
7,5050101531F08,REDFERN TE,7cb96eadd3ff95e25406d24794027c443c0661c5,4/2/14 19:18,Phone,-33.892933,151.202296
8,505010153111F,HAYMARKET #,de40c5c1f9249f95f7fb216931db58747afef74f,4/3/14 14:35,Phone,-33.880329,151.205690
9,505010153111F,HAYMARKET #,66f32c1163d0e597983b65c51f5a477070ad3785,4/3/14 14:36,Phone,-33.880329,151.205690


In [31]:
import folium
map_osm = folium.Map(location=[45.5236, -122.6750])

In [43]:
map_osm

In [51]:
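# Select columns by name with square brackets (a DataFrame is not callable).
location = meta[['Latitude', 'Longitude']]
comm_type = meta['Comm Type']

# Iterating over a DataFrame yields column names, so loop over row tuples;
# head() keeps the printed output short.
for lat, lon in location.head().itertuples(index=False):
    print(lat, lon)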

In [116]:
meta.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10476 entries, 0 to 10475
Data columns (total 8 columns):
Cell Cgi                10476 non-null object
Cell Tower Location     10476 non-null object
Comm Identifier         1374 non-null object
Comm Timedate String    10476 non-null object
Comm Type               10476 non-null object
Latitude                10476 non-null float64
Longitude               10476 non-null float64
LatLong                 10476 non-null object
dtypes: float64(2), object(6)
memory usage: 654.8+ KB


In [88]:
# List the distinct values in the 'Comm Type' column.
meta['Comm Type'].unique()

# Count the distinct values; nunique() returns just the number.
meta['Comm Type'].nunique()

3
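
Knowing there are three types is more useful alongside how often each one occurs. The sketch below uses a small made-up Series rather than the real column, and assumes the three values are `Phone`, `SMS`, and `Internet` (the names used in the filters later in this notebook).

```python
import pandas as pd

# Illustrative values only; the real column has 10476 entries.
comm_type = pd.Series(
    ["Phone", "Phone", "SMS", "Internet", "Internet", "Internet"],
    name="Comm Type",
)

# Absolute counts per type, most frequent first.
counts = comm_type.value_counts()

# Relative shares per type; each value is a fraction of all records.
shares = comm_type.value_counts(normalize=True)

print(counts)
print(shares.round(2))
```

On the real frame the same two calls would be made on `meta['Comm Type']`.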

In [59]:
meta[meta['Comm Type'] == 'Phone'].head()

Unnamed: 0,Cell Cgi,Cell Tower Location,Comm Identifier,Comm Timedate String,Comm Type,Latitude,Longitude
0,50501015388B9,REDFERN TE,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 9:40,Phone,-33.892933,151.202296
1,50501015388B9,REDFERN TE,62157ccf2910019ffd915b11fa037243b75c1624,4/1/14 9:42,Phone,-33.892933,151.202296
2,505010153111F,HAYMARKET #,c8f92bd0f4e6fb45ed7fce96fc831b283db2b642,4/1/14 13:13,Phone,-33.880329,151.20569
3,505010153111F,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 13:13,Phone,-33.880329,151.20569
4,5.05E+106,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 17:27,Phone,-33.880329,151.20569


In [74]:
meta['LatLong'] = list(zip(meta.Latitude,meta.Longitude))

In [79]:
meta.head()

Unnamed: 0,Cell Cgi,Cell Tower Location,Comm Identifier,Comm Timedate String,Comm Type,Latitude,Longitude,LatLong
0,50501015388B9,REDFERN TE,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 9:40,Phone,-33.892933,151.202296,"(-33.89293336, 151.2022962)"
1,50501015388B9,REDFERN TE,62157ccf2910019ffd915b11fa037243b75c1624,4/1/14 9:42,Phone,-33.892933,151.202296,"(-33.89293336, 151.2022962)"
2,505010153111F,HAYMARKET #,c8f92bd0f4e6fb45ed7fce96fc831b283db2b642,4/1/14 13:13,Phone,-33.880329,151.20569,"(-33.88032891, 151.2056904)"
3,505010153111F,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 13:13,Phone,-33.880329,151.20569,"(-33.88032891, 151.2056904)"
4,5.05E+106,HAYMARKET #,f1a6836c0b7a3415a19a90fdd6f0ae18484d6d1e,4/1/14 17:27,Phone,-33.880329,151.20569,"(-33.88032891, 151.2056904)"


In [94]:
# Build one subset for each pair of communication types.
phone_sms = meta[meta['Comm Type'].isin(['Phone', 'SMS'])]
phone_internet = meta[meta['Comm Type'].isin(['Phone', 'Internet'])]
sms_internet = meta[meta['Comm Type'].isin(['SMS', 'Internet'])]

# Only the last expression in a cell is displayed, so this shows sms_internet.
sms_internet

Unnamed: 0,Cell Cgi,Cell Tower Location,Comm Identifier,Comm Timedate String,Comm Type,Latitude,Longitude,LatLong
10,50501015388BC,REDFERN TE,bc0b01860486b0f0a240ce8419d3d7553fe404ab,4/4/14 9:12,SMS,-33.892933,151.202296,"(-33.89293336, 151.2022962)"
24,50501015388BC,REDFERN TE,bc0b01860486b0f0a240ce8419d3d7553fe404ab,4/7/14 12:31,SMS,-33.892933,151.202296,"(-33.89293336, 151.2022962)"
25,50501015388BC,REDFERN TE,70e1f163d854d4e9b63e9a3f4056ced467567d85,4/7/14 18:18,SMS,-33.892933,151.202296,"(-33.89293336, 151.2022962)"
28,5050101531F08,REDFERN TE,70e1f163d854d4e9b63e9a3f4056ced467567d85,4/7/14 18:39,SMS,-33.892933,151.202296,"(-33.89293336, 151.2022962)"
29,5050101531F08,REDFERN TE,70e1f163d854d4e9b63e9a3f4056ced467567d85,4/7/14 18:39,SMS,-33.892933,151.202296,"(-33.89293336, 151.2022962)"
30,5050101531F05,REDFERN TE,70e1f163d854d4e9b63e9a3f4056ced467567d85,4/7/14 18:54,SMS,-33.892933,151.202296,"(-33.89293336, 151.2022962)"
31,5050101531F05,REDFERN TE,bc0b01860486b0f0a240ce8419d3d7553fe404ab,4/7/14 19:13,SMS,-33.892933,151.202296,"(-33.89293336, 151.2022962)"
32,50501015388BB,REDFERN TE,bc0b01860486b0f0a240ce8419d3d7553fe404ab,4/7/14 21:22,SMS,-33.892933,151.202296,"(-33.89293336, 151.2022962)"
33,5050101531F08,REDFERN TE,70e1f163d854d4e9b63e9a3f4056ced467567d85,4/8/14 20:34,SMS,-33.892933,151.202296,"(-33.89293336, 151.2022962)"
34,5050101531F08,REDFERN TE,70e1f163d854d4e9b63e9a3f4056ced467567d85,4/8/14 20:35,SMS,-33.892933,151.202296,"(-33.89293336, 151.2022962)"


In [114]:
phone_sms['LatLong'].value_counts()

#phone_internet['LatLong'].value_counts()

#sms_internet['LatLong'].value_counts()

(-33.89293336, 151.2022962)    392
(-33.78815, 151.26654)         372
(-33.88032891, 151.2056904)    159
(-33.88417103, 151.20235)      103
(-33.779333, 151.276901)        57
(-33.87829, 151.20345)          38
(-33.79661, 151.27756)          35
(-33.8864, 151.2088)            20
(-42.84338, 147.29569)          19
(-33.796679, 151.285293)        14
(-42.85984, 147.29215)          11
(-33.88058, 151.20046)          10
(-33.793648, 151.263934)        10
(-33.86113, 151.21293)           9
(-33.884603, 151.195643)         8
(-42.83762, 147.50575)           8
(-33.93416, 151.17938)           7
(-42.85307, 147.31532)           6
(-33.88024, 151.20569)           6
(-33.86633, 151.20469)           4
(-33.87055, 151.20793)           4
(-33.75949, 151.28135)           4
(-33.87513, 151.20584)           4
(-33.86655, 151.21033)           3
(-36.3567, 146.7136)             3
(-33.79948, 151.28934)           3
(-33.791965, 151.286589)         3
(-33.88964, 151.21142)           3
(-33.6038, 151.1639)
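
Several coordinates in these counts sit near latitude -42.8, well south of the main cluster near -33.9, which hints at travel. One way to make that concrete is to measure each location's great-circle distance from the most frequent one. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. The `haversine_km` helper, the choice of "home", and the 100 km threshold are assumptions made for illustration.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Most frequent location in the counts above, used as a stand-in for "home".
home = (-33.89293336, 151.2022962)

# A few other coordinates copied from the same output.
others = np.array([
    (-33.78815, 151.26654),
    (-42.84338, 147.29569),
    (-36.3567, 146.7136),
])

# Vectorised distance from home to every other location.
dist = haversine_km(home[0], home[1], others[:, 0], others[:, 1])

# Hypothetical 100 km threshold for "away from home".
far = others[dist > 100]
print(np.round(dist, 1))
print(far)
```

Pairing the far-away rows with their timestamps would then show when the trips happened.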

In [99]:
# Count records per location and draw one bar per location.
counts = phone_sms['LatLong'].value_counts()
x = range(len(counts))
width = 1 / 1.5
plt.bar(x, counts.values, width, color="blue")

In [118]:
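import numpy as np

# Synthetic demo data from the folium heatmap example: 100 random
# [lat, lon, weight] rows scattered around (48, 5). Not taken from `meta`.
data = (np.random.normal(size=(100, 3)) *
        np.array([[1, 1, 1]]) +
        np.array([[48, 5, 1]])).tolist()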

In [119]:
import os

from folium.plugins import HeatMap

m = folium.Map([48., 5.], tiles='stamentoner', zoom_start=6)

HeatMap(data).add_to(m)

# Create the output folder if needed before saving the map.
os.makedirs('results', exist_ok=True)
m.save(os.path.join('results', 'Heatmap.html'))

m
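
The heatmap cell above draws random demo points, not the phone records. To heat-map the actual data, the coordinates first need to be collapsed into a list of `[lat, lon, weight]` entries. The sketch below shows only that preparation step with pandas. The four rows are illustrative (coordinates copied from the sample output), and `weighted`, `points`, and `center` are names chosen here.

```python
import pandas as pd

# Illustrative rows only, with coordinates copied from the sample output.
towers = pd.DataFrame({
    "Latitude": [-33.892933, -33.892933, -33.880329, -33.884171],
    "Longitude": [151.202296, 151.202296, 151.205690, 151.202350],
})

# One row per distinct location, with the number of records as the weight.
weighted = (towers.groupby(["Latitude", "Longitude"])
                  .size()
                  .reset_index(name="weight"))

# Convert to a plain list of [lat, lon, weight] entries.
points = weighted[["Latitude", "Longitude", "weight"]].values.tolist()

# Center of the data, usable as the initial map location.
center = [towers["Latitude"].mean(), towers["Longitude"].mean()]
print(points)
print(center)
```

With folium installed, passing `center` to `folium.Map` and `points` to `HeatMap(points).add_to(m)` should then place the hot spots over the real tower locations.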

In [121]:
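# Take the ten most common locations. Note that value_counts(10) would pass 10
# as `normalize` and return relative frequencies for every location instead.
top = phone_sms['LatLong'].value_counts().head(10)
top.plot.bar()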

<matplotlib.axes._subplots.AxesSubplot at 0x10d18bfd0>
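
The same counting pattern applies to contacts. The `meta.info()` output shows that only 1374 of the 10476 rows have a `Comm Identifier`, so missing values need handling. The sketch below uses made-up, shortened identifiers and assumes the rows without a recipient are the `Internet` records. That assumption is worth checking on the real data.

```python
import pandas as pd

# Illustrative identifiers (shortened); None marks records with no recipient.
records = pd.DataFrame({
    "Comm Identifier": ["f1a6", "6215", "f1a6", None, None, "bc0b", "f1a6", "bc0b"],
    "Comm Type": ["Phone", "Phone", "Phone", "Internet", "Internet",
                  "SMS", "Phone", "SMS"],
})

# value_counts() drops missing identifiers by default.
top_contacts = records["Comm Identifier"].value_counts().head(3)

# Break each contact's total down by communication type.
by_type = (records.dropna(subset=["Comm Identifier"])
                  .groupby(["Comm Identifier", "Comm Type"])
                  .size()
                  .unstack(fill_value=0))

print(top_contacts)
print(by_type)
```

A contact reached mostly by SMS late in the evening likely plays a different role from one called during office hours, so the per-type breakdown is a natural next step toward guessing who the contacts are.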