# Blue Coat log analysis notebook

## Loading the logs from parquet files in HDFS

The following code loads the logs from the 'fw/log' directory in the user's HDFS directory and registers them as the table 'fwlog'.

'data' is a DataFrame that contains the rows selected by the SQL query, here every record whose protocol is TCP.

In [36]:
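from pyspark import SparkContext
from pyspark.sql import SQLContext

if __name__ == '__main__':
    # 'sc' is the SparkContext that the notebook environment provides
    sqlContext = SQLContext(sc)

    # load() reads the default data source format (Parquet unless configured otherwise)
    df = sqlContext.load('fw/log')
    # Register the DataFrame so it can be queried by name in SQL
    sqlContext.registerDataFrameAsTable(df, "fwlog")
    data = sqlContext.sql("SELECT * FROM fwlog WHERE proto = 'TCP'")
    data.show()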

action date       dstip        dstport len proto source       srcip          srcport time           ttl
ACCEPT 2015-06-16 216.98.57.48 443     52  TCP   msr-off-fw03 73.137.160.105 58424   00:00:00+00:00 113
ACCEPT 2015-06-16 216.98.57.48 443     60  TCP   msr-off-fw03 70.196.133.122 35087   00:00:00+00:00 48 
ACCEPT 2015-06-16 216.98.57.48 443     64  TCP   msr-off-fw03 166.170.50.82  5367    00:00:00+00:00 51 
ACCEPT 2015-06-16 216.98.57.61 636     60  TCP   msr-off-fw03 54.243.163.67  36818   00:00:00+00:00 44 
ACCEPT 2015-06-16 216.98.57.48 443     52  TCP   msr-off-fw03 65.94.67.221   49962   00:00:00+00:00 115
DENY   2015-06-16 24.105.29.40 443     52  TCP   msr-off-fw03 10.3.20.28     52928   00:00:00+00:00 125
ACCEPT 2015-06-16 216.98.57.48 443     52  TCP   msr-off-fw03 37.55.105.237  52298   00:00:00+00:00 116
ACCEPT 2015-06-16 216.98.57.48 443     64  TCP   msr-off-fw03 74.141.115.23  56777   00:00:00+00:00 50 
ACCEPT 2015-06-16 216.98.57.53 443     52  TCP   msr-off-fw03 19
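
The plotting cells in this notebook read 'host' and 'hits' fields from each row, which the `SELECT *` query above does not produce, and the notebook does not show the query that was actually used. The following is a minimal sketch of one aggregation that would yield such columns, assuming 'host' means the destination address ('dstip') and 'hits' is a row count. It runs against an in-memory `sqlite3` table instead of Spark so that it works without a cluster. The sample pairs are copied from the eight complete rows of output above, and the names `TOP_DESTINATIONS_SQL`, `SAMPLE_ROWS`, and `top_destinations` are hypothetical.

```python
import sqlite3

# Hypothetical aggregation: only the column names 'action' and 'dstip'
# come from the notebook output; the aliases match what the plots read.
TOP_DESTINATIONS_SQL = """
    SELECT dstip AS host, COUNT(*) AS hits
    FROM fwlog
    WHERE action = ?
    GROUP BY dstip
    ORDER BY hits DESC, host
    LIMIT 4
"""

# (action, dstip) pairs copied from the eight complete rows shown above
SAMPLE_ROWS = [
    ("ACCEPT", "216.98.57.48"),
    ("ACCEPT", "216.98.57.48"),
    ("ACCEPT", "216.98.57.48"),
    ("ACCEPT", "216.98.57.61"),
    ("ACCEPT", "216.98.57.48"),
    ("DENY", "24.105.29.40"),
    ("ACCEPT", "216.98.57.48"),
    ("ACCEPT", "216.98.57.48"),
]


def top_destinations(action):
    """Return up to four (host, hits) tuples for the given action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fwlog (action TEXT, dstip TEXT)")
    conn.executemany("INSERT INTO fwlog VALUES (?, ?)", SAMPLE_ROWS)
    rows = conn.execute(TOP_DESTINATIONS_SQL, (action,)).fetchall()
    conn.close()
    return rows


print(top_destinations("DENY"))
```

The `?` placeholder is specific to `sqlite3`. With Spark, a similar statement with the literal written inline (for example `WHERE action = 'DENY'`) could presumably be passed to `sqlContext.sql`, though that has not been checked against the cluster used here.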

## Plotting a pie chart from the data
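The following code uses 'matplotlib' to draw a pie chart from 'data'. It reads the 'host' and 'hits' fields of each row, so 'data' must come from a query that produces those columns.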

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

hosts = []
hits = []
# Collect the label and value of the first five rows
for i in data.take(5):
    hits.append(i.hits)
    hosts.append(i.host)

plt.pie(hits, labels=hosts)
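
The loop above splits each row into a label and a value before plotting. The same step can be sketched as a small helper, shown here with a `namedtuple` standing in for the row objects so that it runs without Spark. `Row` and `split_rows` are hypothetical names, and the counts are the per-destination totals of the eight complete rows of output shown earlier.

```python
from collections import namedtuple

# Stand-in for the row objects returned by data.take(); the field names
# 'host' and 'hits' are the ones the plotting cells read.
Row = namedtuple("Row", ["host", "hits"])


def split_rows(rows):
    """Return (hosts, hits) as two parallel lists, in row order."""
    hosts = [row.host for row in rows]
    hits = [row.hits for row in rows]
    return hosts, hits


# Destination counts taken from the eight complete rows of output shown earlier
rows = [Row("216.98.57.48", 6), Row("216.98.57.61", 1), Row("24.105.29.40", 1)]
hosts, hits = split_rows(rows)
print(hosts, hits)
```

Building both lists in one place keeps labels and values aligned, which is what `plt.pie(hits, labels=hosts)` relies on.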

## Plotting a bar graph from the data
The following code uses 'matplotlib' to draw a bar graph from 'data'. As with the pie chart, each row must provide 'host' and 'hits' fields.

In [None]:
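%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

hosts = []
hits = []
for i in data.take(4):
    hits.append(i.hits)
    hosts.append(i.host)

N = len(hits)       # at most 4; fewer if the query returned fewer rows
ind = np.arange(N)  # the x locations for the bars
width = 0.35        # the width of the bars
fig = plt.figure()
ax = fig.add_subplot(111)
rects1 = ax.bar(ind, hits, width, color='r', align='center')

# add the axis label, title, and tick labels
ax.set_ylabel('hits')
ax.set_title('Top 4 denied destinations')
ax.set_xticks(ind)  # bars are centered on ind, so the ticks go there too
ax.set_xticklabels(hosts)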


In [None]:
!ls