## Communication Graph

This notebook explores my cell phone bills, which are PDF files, and looks for patterns in them. The ultimate goal is to build a graph from the data.

Once a pattern emerges, I'll wrap the steps in a function or a class that does everything for me.

#### Exploration and Pattern Finding

The first section is just exploration.

In [1]:
# Set up.
%matplotlib inline
%pdb on

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

import matplotlib.pyplot as plt  # plt conventionally refers to the pyplot submodule
import numpy as np
import os
import pandas as pd
import seaborn
import PyPDF2

from tmobile_bill_parser import (
    parse_bill,
    parse_multiple_bills
)

Automatic pdb calling has been turned ON


In [2]:
bill_directory = parse_multiple_bills('bills')

### Introduction

- [ ] Make all data the proper datatype.
- [ ] Separate destination into city and state.
- [ ] Normalize phone numbers to a single format.
- [x] Treat Text, Data, and Talk as separate tables or graphs.
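
The open items on destinations and numbers could be handled with small helpers along these lines. This is a minimal sketch, not something the parser already does: `normalize_number` and `split_destination` are hypothetical names, and the rules (drop a leading US country code `1`, treat anything that is not `City, ST` as having no location) are assumptions based on the sample rows in this notebook.

```python
import re


def normalize_number(raw):
    """Reduce a phone number string to its digits (hypothetical helper)."""
    digits = re.sub(r"\D", "", raw)
    # Assumption: an 11-digit number starting with 1 carries the US country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


# Assumption: a real location looks like "City, ST" with a two-letter state.
_DESTINATION = re.compile(r"^(?P<city>.+), (?P<state>[A-Z]{2})$")


def split_destination(raw):
    """Split 'City, ST' into (city, state); return (None, None) otherwise."""
    match = _DESTINATION.match(raw)
    if match is None:
        return None, None
    return match.group("city"), match.group("state")
```

Applied with something like `df['Number'].map(normalize_number)`, this would put `(334) 703-1602` and `13347031602` in the same form, while placeholders such as `-` end up without a city or state.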

In [3]:
text_dfs = [pd.DataFrame(bill_directory[bill_period]['Text']) for bill_period in bill_directory]
data_dfs = [pd.DataFrame(bill_directory[bill_period]['Data']) for bill_period in bill_directory]
talk_dfs = [pd.DataFrame(bill_directory[bill_period]['Talk']) for bill_period in bill_directory]

text_df = pd.concat(text_dfs).reset_index()
data_df = pd.concat(data_dfs).reset_index()
talk_df = pd.concat(talk_dfs).reset_index()
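
The three list comprehensions above follow the same pattern, so they could be folded into one helper that also records which bill each row came from. This is only a sketch: `combine_tables` and the `Bill period` column are hypothetical, it assumes `bill_directory` maps each bill period to a dict of tables that `pd.DataFrame` accepts, and the stand-in dictionary below is invented for illustration. Unlike the cell above, it uses `ignore_index=True`, so no extra `index` column is created.

```python
import pandas as pd


def combine_tables(bill_directory, kind):
    """Stack one kind of table ('Text', 'Data' or 'Talk') across all bills."""
    frames = [
        # Tag every row with the bill period it was parsed from.
        pd.DataFrame(tables[kind]).assign(**{"Bill period": period})
        for period, tables in bill_directory.items()
    ]
    return pd.concat(frames, ignore_index=True)


# Invented stand-in for the parser output, used only to show the shape.
fake_directory = {
    "2016-12": {"Text": [{"Number": "(334) 703-1602"}]},
    "2017-01": {"Text": [{"Number": "13347031602"}, {"Number": "(205) 225-9848"}]},
}
combined_text = combine_tables(fake_directory, "Text")
```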

In [4]:
text_df.head()

|   | index | Amount | Date and time | Destination | Direction | Number | Type |
|---|---|---|---|---|---|---|---|
| 0 | 0 | - | 12/19/16, 7:11 AM | Auburn, AL | Incoming | (334) 703-1602 | Text |
| 1 | 1 | - | 12/19/16, 7:14 AM | Auburn, AL | Outgoing | (334) 703-1602 | Text |
| 2 | 2 | - | 12/19/16, 7:15 AM | Auburn, AL | Incoming | (334) 703-1602 | Text |
| 3 | 3 | - | 12/19/16, 7:17 AM | Auburn, AL | Outgoing | (334) 703-1602 | Text |
| 4 | 4 | - | 12/19/16, 8:41 AM | - | Incoming | 13347031602 | Picture |


In [5]:
text_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14526 entries, 0 to 14525
Data columns (total 7 columns):
index            14526 non-null int64
Amount           14526 non-null object
Date and time    14526 non-null object
Destination      14526 non-null object
Direction        14526 non-null object
Number           14526 non-null object
Type             14526 non-null object
dtypes: int64(1), object(6)
memory usage: 794.5+ KB
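
`info()` reports every column except `index` as `object`, so the timestamps are still plain strings. One way to start on the datatype item is to parse `Date and time` with an explicit format. The format string below is inferred from the rows shown by `head()`, so it is an assumption about the rest of the data, and `sample` is a small stand-in frame rather than the real table.

```python
import pandas as pd

# Stand-in rows copied from the outputs in this notebook.
sample = pd.DataFrame({"Date and time": ["12/19/16, 7:11 AM", "12/19/16, 1:01 PM"]})

# Assumed layout: month/day/two-digit year, then a 12-hour clock with AM/PM.
sample["Date and time"] = pd.to_datetime(
    sample["Date and time"], format="%m/%d/%y, %I:%M %p"
)
```

Passing the format explicitly avoids relying on pandas to guess between day-first and month-first dates.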


In [8]:
text_df['Amount'].value_counts()
text = text_df.drop(['Amount'], axis=1)

-    14526
Name: Amount, dtype: int64

|   | index | Date and time | Destination | Direction | Number | Type |
|---|---|---|---|---|---|---|
| 0 | 0 | 12/19/16, 7:11 AM | Auburn, AL | Incoming | (334) 703-1602 | Text |
| 1 | 1 | 12/19/16, 7:14 AM | Auburn, AL | Outgoing | (334) 703-1602 | Text |
| 2 | 2 | 12/19/16, 7:15 AM | Auburn, AL | Incoming | (334) 703-1602 | Text |
| 3 | 3 | 12/19/16, 7:17 AM | Auburn, AL | Outgoing | (334) 703-1602 | Text |
| 4 | 4 | 12/19/16, 8:41 AM | - | Incoming | 13347031602 | Picture |


In [10]:
data_df.info()
data_df.head()
data = data_df.drop(['Amount', 'Origin', 'Type'], axis=1)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4632 entries, 0 to 4631
Data columns (total 7 columns):
index            4632 non-null int64
Amount           4632 non-null object
Date and time    4632 non-null object
MB               4632 non-null object
Origin           4632 non-null object
Service          4632 non-null object
Type             4632 non-null object
dtypes: int64(1), object(6)
memory usage: 253.4+ KB


|   | index | Amount | Date and time | MB | Origin | Service | Type |
|---|---|---|---|---|---|---|---|
| 0 | 0 | - | 12/19/16, 2:11 AM | 0.0087 | - | Mobile Internet | - |
| 1 | 1 | - | 12/19/16, 6:01 AM | 0.0097 | - | Mobile Internet | - |
| 2 | 2 | - | 12/19/16, 7:04 AM | 0.4482 | - | Mobile Internet | - |
| 3 | 3 | - | 12/19/16, 8:28 AM | 0.0185 | - | Mobile Internet | - |
| 4 | 4 | - | 12/19/16, 11:33 AM | 0.0097 | - | Mobile Internet | - |
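
The `MB` column is also reported as `object`, so it cannot be summed or plotted as is. A possible conversion is sketched below with `pd.to_numeric`. It assumes the values are numeric strings and that a `-` placeholder could show up in this column as it does in others; neither is confirmed for the full data set.

```python
import pandas as pd

# Stand-in values; the "-" entry is a hypothetical placeholder.
sample = pd.DataFrame({"MB": ["0.0087", "0.4482", "-"]})

# errors="coerce" turns anything unparseable into NaN instead of raising.
sample["MB"] = pd.to_numeric(sample["MB"], errors="coerce")
total_mb = sample["MB"].sum()  # NaN values are skipped by default
```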


In [22]:
talk_df['Amount'].value_counts()
talk = talk_df.drop(['Amount', 'Type'], axis=1)

-    1774
Name: Amount, dtype: int64

In [23]:
talk.head()
text.head()
data.head()

|   | index | Date and time | Description | Min | Number |
|---|---|---|---|---|---|
| 0 | 0 | 12/19/16, 7:19 AM | Incoming | 8 | (334) 703-1602 |
| 1 | 1 | 12/19/16, 7:32 AM | Incoming | 1 | (334) 833-1465 |
| 2 | 2 | 12/19/16, 8:11 AM | Incoming | 1 | (205) 225-9848 |
| 3 | 3 | 12/19/16, 8:29 AM | Incoming | 33 | (334) 728-0615 |
| 4 | 4 | 12/19/16, 1:01 PM | Incoming | 22 | (415) 727-6703 |


|   | index | Date and time | Destination | Direction | Number | Type |
|---|---|---|---|---|---|---|
| 0 | 0 | 12/19/16, 7:11 AM | Auburn, AL | Incoming | (334) 703-1602 | Text |
| 1 | 1 | 12/19/16, 7:14 AM | Auburn, AL | Outgoing | (334) 703-1602 | Text |
| 2 | 2 | 12/19/16, 7:15 AM | Auburn, AL | Incoming | (334) 703-1602 | Text |
| 3 | 3 | 12/19/16, 7:17 AM | Auburn, AL | Outgoing | (334) 703-1602 | Text |
| 4 | 4 | 12/19/16, 8:41 AM | - | Incoming | 13347031602 | Picture |


|   | index | Date and time | MB | Service |
|---|---|---|---|---|
| 0 | 0 | 12/19/16, 2:11 AM | 0.0087 | Mobile Internet |
| 1 | 1 | 12/19/16, 6:01 AM | 0.0097 | Mobile Internet |
| 2 | 2 | 12/19/16, 7:04 AM | 0.4482 | Mobile Internet |
| 3 | 3 | 12/19/16, 8:28 AM | 0.0185 | Mobile Internet |
| 4 | 4 | 12/19/16, 11:33 AM | 0.0097 | Mobile Internet |
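
With the three tables trimmed, a first step toward the graph could be an edge list: one row per contact, with message counts in each direction. The sketch below is one possible shape for that, not a settled design. `edges` is a hypothetical name, the rows are stand-ins in the layout of the `text` table, and the numbers are assumed to be normalized already.

```python
import pandas as pd

# Stand-in rows in the shape of the text table; not real bill data.
sample = pd.DataFrame({
    "Number": ["3347031602", "3347031602", "3347031602", "2052259848"],
    "Direction": ["Incoming", "Outgoing", "Incoming", "Incoming"],
})

# One row per contact, one column per direction, values are message counts.
edges = (
    sample.groupby(["Number", "Direction"])
    .size()
    .unstack(fill_value=0)
    .reset_index()
)
```

Each row can then be read as an edge between my phone and the contact, with the `Incoming` and `Outgoing` counts as candidate edge weights.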


In [21]:
text['Destination'].value_counts()

Auburn, AL                    10008
-                               629
Columbia, MO                    390
Hilo, HI                        332
Portsmouth, VA                  250
Indianapls, IN                  242
Kalispell, MT                   226
Seattle, WA                     224
Opelika, AL                     211
Troy, AL                        210
Syracuse, NY                    191
Directtoconsumer Shortcode      149
Tacoma, WA                      128
Fernndnbch, FL                  120
Walnut Crk, CA                  110
Alexandria, VA                  103
Jacksonvl, FL                    83
Seattle Sr, WA                   65
Phila, PA                        60
Atlanta Ne, GA                   55
Bellevue, WA                     47
Carbondale, PA                   43
Newark, DE                       42
Premium Msg                      42
Gainesvl, FL                     41
Orlando, FL                      36
Beaumont, TX                     33
Sebastian, FL               