## Communication Graph

This notebook explores my cell phone bills, which are in PDF format, and looks for patterns in them. The ultimate goal is to build a graph from the data.

Once a pattern emerges, I'll wrap the steps in a function or a class that does everything for me.

#### Exploration and Pattern Finding

The first section is just exploration.

In [1]:
# Set up.
%matplotlib inline

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

import matplotlib.pyplot as plt  # the plotting API lives in pyplot, not the top-level package
import numpy as np
import os
import pandas as pd
import seaborn
import PyPDF2

from tmobile_bill_parser import (
    parse_bill,
    parse_multiple_bills
)

In [2]:
bill_directory = parse_multiple_bills('bills')

### Introduction

- [ ] Make all data the proper datatype.
- [ ] Separate destination into city and state.
- [ ] Numbers must be in roughly the same format.
- [x] Treat Text, Data, and Talk as separate tables or graphs.
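
Two of the items above (splitting the destination and bringing numbers into one format) can be sketched with plain Python. This is a minimal sketch, not the parser's actual behavior: `normalize_number` and `split_destination` are hypothetical helper names, and the sketch assumes US numbers only, so anything that does not reduce to 10 digits (short codes, for example) comes back as `None`.

```python
import re

def normalize_number(raw):
    """Return the 10-digit form of a US phone number, or None if it does not look like one."""
    digits = re.sub(r'\D', '', str(raw))
    # Drop a leading US country code: 13347031602 -> 3347031602.
    if len(digits) == 11 and digits.startswith('1'):
        digits = digits[1:]
    return digits if len(digits) == 10 else None

def split_destination(destination):
    """Split 'City, ST' into (city, state); return (None, None) for placeholders like '-'."""
    parts = [part.strip() for part in str(destination).rsplit(',', 1)]
    if len(parts) != 2 or not all(parts):
        return (None, None)
    return (parts[0], parts[1])
```

Applied to the frames built in this notebook, this might look like `text['Number'].map(normalize_number)` and `text['Destination'].map(split_destination)`.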

In [3]:
text_dfs = [pd.DataFrame(bill_directory[bill_period]['Text']) for bill_period in bill_directory]
data_dfs = [pd.DataFrame(bill_directory[bill_period]['Data']) for bill_period in bill_directory]
talk_dfs = [pd.DataFrame(bill_directory[bill_period]['Talk']) for bill_period in bill_directory]

text_df = pd.concat(text_dfs).reset_index()
data_df = pd.concat(data_dfs).reset_index()
talk_df = pd.concat(talk_dfs).reset_index()

In [6]:
text_df['Amount'].value_counts()
text = text_df.drop(['Amount'], axis=1)

-    14526
Name: Amount, dtype: int64

In [7]:
data_df.info()
data_df.head()
data = data_df.drop(['Amount', 'Origin', 'Type', 'Service'], axis=1)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4632 entries, 0 to 4631
Data columns (total 7 columns):
index            4632 non-null int64
Amount           4632 non-null object
Date and time    4632 non-null object
MB               4632 non-null object
Origin           4632 non-null object
Service          4632 non-null object
Type             4632 non-null object
dtypes: int64(1), object(6)
memory usage: 253.4+ KB


|   | index | Amount | Date and time | MB | Origin | Service | Type |
|---|---|---|---|---|---|---|---|
| 0 | 0 | - | 12/19/16, 2:11 AM | 0.0087 | - | Mobile Internet | - |
| 1 | 1 | - | 12/19/16, 6:01 AM | 0.0097 | - | Mobile Internet | - |
| 2 | 2 | - | 12/19/16, 7:04 AM | 0.4482 | - | Mobile Internet | - |
| 3 | 3 | - | 12/19/16, 8:28 AM | 0.0185 | - | Mobile Internet | - |
| 4 | 4 | - | 12/19/16, 11:33 AM | 0.0097 | - | Mobile Internet | - |


In [8]:
talk_df['Amount'].value_counts()
talk = talk_df.drop(['Amount', 'Type'], axis=1)

-    1774
Name: Amount, dtype: int64

In [27]:
talk.head()
text.head()
data.head()

|   | index | Date and time | Description | Min | Number |
|---|---|---|---|---|---|
| 0 | 0 | 2016-12-19 07:19:00 | Incoming | 8 | (334) 703-1602 |
| 1 | 1 | 2016-12-19 07:32:00 | Incoming | 1 | (334) 833-1465 |
| 2 | 2 | 2016-12-19 08:11:00 | Incoming | 1 | (205) 225-9848 |
| 3 | 3 | 2016-12-19 08:29:00 | Incoming | 33 | (334) 728-0615 |
| 4 | 4 | 2016-12-19 13:01:00 | Incoming | 22 | (415) 727-6703 |


|   | index | Date and time | Destination | Direction | Number | Type |
|---|---|---|---|---|---|---|
| 0 | 0 | 2016-12-19 07:11:00 | Auburn, AL | Incoming | (334) 703-1602 | Text |
| 1 | 1 | 2016-12-19 07:14:00 | Auburn, AL | Outgoing | (334) 703-1602 | Text |
| 2 | 2 | 2016-12-19 07:15:00 | Auburn, AL | Incoming | (334) 703-1602 | Text |
| 3 | 3 | 2016-12-19 07:17:00 | Auburn, AL | Outgoing | (334) 703-1602 | Text |
| 4 | 4 | 2016-12-19 08:41:00 | - | Incoming | 13347031602 | Picture |


|   | index | Date and time | MB |
|---|---|---|---|
| 0 | 0 | 2016-12-19 02:11:00 | 0.0087 |
| 1 | 1 | 2016-12-19 06:01:00 | 0.0097 |
| 2 | 2 | 2016-12-19 07:04:00 | 0.4482 |
| 3 | 3 | 2016-12-19 08:28:00 | 0.0185 |
| 4 | 4 | 2016-12-19 11:33:00 | 0.0097 |


In [29]:
data['Date and time'] = pd.to_datetime(data['Date and time'])
text['Date and time'] = pd.to_datetime(text['Date and time'])
talk['Date and time'] = pd.to_datetime(talk['Date and time'])
data['MB'] = pd.to_numeric(data['MB'])

I think the Data set is good for seeing usage over a period of time: whether there's a pattern in my activity over the course of a day, or which days of the week I'm more active. Otherwise, I may shelve it for later.
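
As a rough sketch of that idea, usage can be summed per hour of day and per day of the week with a `groupby` on the datetime accessor. The `sample` frame below is made-up data in the shape of the cleaned `data` frame, and `usage_profile` is a hypothetical helper name; the sketch assumes `Date and time` is already a datetime column and `MB` is numeric.

```python
import pandas as pd

# Hypothetical sample in the shape of the cleaned `data` frame.
sample = pd.DataFrame({
    'Date and time': pd.to_datetime([
        '2016-12-19 02:11:00',
        '2016-12-19 06:01:00',
        '2016-12-20 06:30:00',
    ]),
    'MB': [0.0087, 0.0097, 0.4482],
})

def usage_profile(df):
    """Return total MB per hour of day and per day of week."""
    timestamps = df['Date and time']
    by_hour = df.groupby(timestamps.dt.hour)['MB'].sum()
    by_weekday = df.groupby(timestamps.dt.day_name())['MB'].sum()
    return by_hour, by_weekday

by_hour, by_weekday = usage_profile(sample)
```

Calling `usage_profile(data)` on the real frame would give two Series that can be plotted directly, for example with `by_hour.plot(kind='bar')`.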

I think there are a number of graphs to be made from the Text and Talk sets.

#### Text 

- A graph between me and identifiable phone numbers, outgoing.
- A graph between me and identifiable phone numbers, incoming.
- A graph between me (Seattle) and destinations, though this may not be accurate since the destination seems to be based on the area code of the phone number.
- Activity over a day, week, or month.
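
The incoming and outgoing graphs listed above could start from a directed edge count. The following is a minimal sketch in plain Python: the node label `ME`, the helper `text_edges`, and the sample rows with fictional 555 numbers are all assumptions for illustration, and it assumes the `Direction` column only needs the values `Incoming` and `Outgoing` handled.

```python
from collections import Counter

ME = 'me'  # placeholder node label for the account owner (an assumption)

def text_edges(rows):
    """Count messages per directed edge from (direction, number) pairs."""
    edges = Counter()
    for direction, number in rows:
        if direction == 'Outgoing':
            edges[(ME, number)] += 1
        elif direction == 'Incoming':
            edges[(number, ME)] += 1
        # Any other direction value is ignored in this sketch.
    return edges

# Hypothetical rows; the numbers are fictional.
rows = [
    ('Incoming', '(555) 010-0001'),
    ('Outgoing', '(555) 010-0001'),
    ('Incoming', '(555) 010-0001'),
    ('Outgoing', '(555) 010-0002'),
]
edges = text_edges(rows)
```

With the real frame, the pairs could come from `zip(text['Direction'], text['Number'])`, and the resulting counts can serve as edge weights in whichever graph library is used later.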

#### Talk
- A weighted graph showing calls between phone numbers (people) and time talking.
- A graph between me and identifiable phone numbers, outgoing.
- A graph between me and identifiable phone numbers, incoming.
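
For the weighted Talk graph, one possible starting point is the total number of minutes per phone number, which can then be used as the weight of the edge between me and that number. This is only a sketch: `talk_weights` is a hypothetical helper, the sample rows use fictional numbers, and it assumes the `Min` values may still be strings at this point.

```python
from collections import defaultdict

def talk_weights(rows):
    """Sum call minutes per phone number from (number, minutes) pairs."""
    weights = defaultdict(int)
    for number, minutes in rows:
        weights[number] += int(minutes)  # minutes may be stored as strings
    return dict(weights)

# Hypothetical rows; the numbers are fictional.
rows = [
    ('(555) 010-0001', '8'),
    ('(555) 010-0002', '1'),
    ('(555) 010-0001', '33'),
]
weights = talk_weights(rows)
```

With the real frame, the pairs could come from `zip(talk['Number'], talk['Min'])`.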