# NVD Data Curation and Analysis Documentation

### Introduction

This notebook documents the process of parsing the available NVD data from XML to CSV, merging the results into one master dataset, and creating visualizations to gain initial insights. The NVD data is a dataset of vulnerabilities, each tied to a CVE ID, references, and various scores that indicate the threat. To run the code below you will need the NVD datasets. Files from 2002 to 2017 are available on the NVD website (https://nvd.nist.gov/download.cfm). NVD provides these datasets only in XML format, so we need to parse the data into CSV format.

### Parsing XML files to CSV

We need only a few fields (CVE-ID, CWE-ID, Timestamp) for our initial analysis, so we parse only those tags. The code below shows the procedure for the year 2017 only; the file names have to be changed to match the file currently being read. Repeat this process for each file individually from 2002 to 2017, including the file named recent. Save the downloaded files in the same folder as this notebook.
If you choose to skip this step and access the CSV files directly, you can find them at https://drive.google.com/open?id=0B-NONBqqQBznYlRLUU5zS0lLZU0.

In [1]:
#Parsing using ElementTree
import xml.etree.ElementTree as ET
import csv

In [2]:
#Find root node
CVE_tree = ET.parse("nvdcve-2.0-2017.xml")
CVE_root= CVE_tree.getroot()

In [3]:
#Create new CSV file to write the extracted fields
f = open('NVD_2017.csv', 'w')

In [4]:
#Extracting attributes and tags
#Don't write a header if you will be using the merged dataset
#Head = "CVE ID,CWE ID,Timestamp\n"
#f.write(Head)
CVE_count = 0
for entry in CVE_root:
    cve_id = ""
    cwe_id = ""
    published_date = ""
    cvss = ""
    for child in entry:

        #print (child.tag) #Printing child.tag helps you identify further child nodes to extract

        if (child.tag == '{http://scap.nist.gov/schema/vulnerability/0.4}cve-id'):
            cve_id = child.text
        if (child.tag == '{http://scap.nist.gov/schema/vulnerability/0.4}cwe'):
            cwe_id = child.attrib['id']
        #Use when CVSS is needed
        #if (child.tag == '{http://scap.nist.gov/schema/vulnerability/0.4}cvss'):
            #cvss = child.text
        if (child.tag == '{http://scap.nist.gov/schema/vulnerability/0.4}published-datetime'):
            #the Timestamp column holds the published date
            published_date = child.text

    #Write one row per entry: CVE ID, CWE ID, Timestamp
    vuln = '{o1},{o2},{o3}\n'.format(o1=cve_id,o2=cwe_id,o3=published_date)
    f.write(vuln)
    CVE_count = CVE_count + 1

In [5]:
#This is to ensure that the file has been written into
print (CVE_count)
f.close();

484
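
Since the cells above have to be repeated for every yearly file, the same steps can be wrapped in a function. The following is a minimal sketch, not the notebook's original code: the function name `parse_nvd_xml` is hypothetical, and it assumes every file uses the same vulnerability namespace as the 2017 file. Unlike the loop above, it takes the first `cwe` child of an entry rather than the last one.

```python
import csv
import xml.etree.ElementTree as ET

# Namespace copied from the tags used in the cells above
VULN_NS = "{http://scap.nist.gov/schema/vulnerability/0.4}"


def parse_nvd_xml(xml_path, csv_path):
    """Write one 'CVE ID, CWE ID, Timestamp' row per entry and return the row count.

    Hypothetical helper: assumes the entries are direct children of the root node.
    """
    root = ET.parse(xml_path).getroot()
    count = 0
    with open(csv_path, "w") as out:
        writer = csv.writer(out, lineterminator="\n")
        for entry in root:
            cve_id = entry.findtext(VULN_NS + "cve-id", default="")
            cwe = entry.find(VULN_NS + "cwe")
            # Entries without a CWE tag get an empty CWE ID
            cwe_id = cwe.get("id", "") if cwe is not None else ""
            published = entry.findtext(VULN_NS + "published-datetime", default="")
            writer.writerow([cve_id, cwe_id, published])
            count += 1
    return count
```

Calling `parse_nvd_xml("nvdcve-2.0-2017.xml", "NVD_2017.csv")` once per downloaded year would then replace the manual renaming step.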


## File Merge

Now that we have an individual CSV file for each year, we merge the files into one usable master dataset.

In [6]:
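#Open for writing; append mode would duplicate rows if this cell is run twice
#The file names follow the NVD_<year>.csv pattern written by the parsing step
fout = open("Merged_2002-17.csv", "w")
# first file:
for line in open("NVD_2002.csv"):
    fout.write(line)
# now the rest (range excludes its end value, so 2018 is needed to include 2017):
for num in range(2003, 2018):
    f = open("NVD_" + str(num) + ".csv")
    #next(f) # skip the header, if the yearly files were written with one
    for line in f:
        fout.write(line)
    f.close()
fout.close()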
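
The merge step can also be written as a small reusable function. This is a sketch under stated assumptions: `merge_csv_files` is a hypothetical name, the yearly files are assumed to carry no header row of their own, and the optional `header` argument is an addition that the cell above does not have.

```python
def merge_csv_files(paths, merged_path, header=None):
    """Concatenate the CSV files in `paths` into `merged_path`; return the row count.

    Hypothetical helper: assumes the input files have no header row.
    """
    rows = 0
    with open(merged_path, "w") as fout:
        if header:
            fout.write(header + "\n")
        for path in paths:
            with open(path) as fin:
                for line in fin:
                    # Skip blank lines so the merged file has one record per line
                    if not line.strip():
                        continue
                    # Guard against a last line that lacks a trailing newline
                    fout.write(line if line.endswith("\n") else line + "\n")
                    rows += 1
    return rows
```

With the yearly files in place, `merge_csv_files(["NVD_%d.csv" % y for y in range(2002, 2018)], "Merged_2002-17.csv")` would cover 2002 through 2017.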

## Preparing dataset for visualization

To gain insights we need to know the number of vulnerabilities that have a valid CWE ID. We therefore create a new dataset that makes this visualization easy to produce. Once we know that count, we can calculate the percentage of vulnerabilities that have a valid CWE ID. This is calculated for each month of every year.
The new dataset contains the year, the month, and the percentage ratio of the number of CWE IDs to the number of CVE IDs. A new dataset is created to ease data formatting during visualization. We use a hash map keyed on each month and year combination to store the CVE-ID and CWE-ID counts.
For this analysis we used the individual year files and not the merged dataset, so this process has to be repeated for each CSV file that you wish to visualize; the cells below show the merged file name, which should be changed accordingly. The NVD 2002 database has entries starting from the year 1988. In an analysis of the merged dataset of all years, we found significant percentage values only from the year 1996 onward. Therefore, all analysis from here on covers only the years from 1996. Separating the data for the years 1996-2001 has to be done manually, by copying the text into separate CSV files.
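
The monthly percentage described above can be sketched directly from the three parsed columns. This is an illustration, not the notebook's own method: `monthly_cwe_percentage` is a hypothetical name, the timestamp is assumed to start with `YYYY-MM` as in the NVD `published-datetime` values, and a CWE ID is assumed to count as missing when it is empty or `"0"`.

```python
from collections import Counter


def monthly_cwe_percentage(rows):
    """Return {'MM/01/YYYY': percentage of CVE IDs that have a CWE ID}.

    `rows` is an iterable of (cve_id, cwe_id, timestamp) tuples.
    Hypothetical helper; the timestamp is assumed to begin with 'YYYY-MM'.
    """
    cve_count = Counter()
    cwe_count = Counter()
    for cve_id, cwe_id, timestamp in rows:
        year, month = timestamp[0:4], timestamp[5:7]
        key = month + "/01/" + year
        cve_count[key] += 1
        # Assumption: an empty string or "0" marks a missing CWE ID
        if cwe_id not in ("", "0"):
            cwe_count[key] += 1
    return {
        key: round(100.0 * cwe_count[key] / total, 2)
        for key, total in cve_count.items()
    }
```

The keys use the first day of each month so that they can later be parsed as dates for the time series plots.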

In [7]:
#using pandas
import pandas as pd
import csv

In [8]:
#Open merged file to calculate values
File= pd.read_csv("Merged_2002-17.csv")

In [9]:
#using a hashmap, we store a key-value pair for every month and year combination
#this cell expects a header row and the columns CVE ID, CWE ID, year, month,
#with "0" in the CWE ID column when an entry has no CWE ID
with open('Merged_2002-17.csv','r') as f:
    r = csv.reader(f, delimiter=',')
    cve_count = {}
    cwe_count = {}
    index = 0
    for row in r:
        #skip the header row
        if index != 0:
            year = row[2]
            month = row[3]
            CWE = row[1]
            CVE = row[0]
            key = month+"/"+"01"+"/"+year
            #check whether that combination exists and increment the CVE ID and CWE ID counts
            if key in cve_count:
                curr = cve_count[key]
                cve_count[key] = curr+1
            else:
                cve_count[key] = 1

            if CWE != "0":
                if key in cwe_count:
                    curr = cwe_count[key]
                    cwe_count[key] = curr+1
                else:
                    cwe_count[key] = 1
        index = index+1

In [10]:
#creating a hashmap for the ratio values
ratioMap = {}
#items() works on both Python 2 and Python 3, unlike iteritems()
for k,v in cve_count.items():
    cve_c = v
    #months without any CWE ID keep a count of 0
    cwe_c = 0
    if k in cwe_count:
        cwe_c = cwe_count[k]
    ratioMap[k] = round(100 * float(cwe_c)/float(cve_c),2)

In [11]:
#writing hashmap values to a file
f = open('ratio_merged.csv', 'w')
Header = "Date,Percentage\n"
f.write(Header)
#items() works on both Python 2 and Python 3, unlike iteritems()
for k,v in ratioMap.items():
    outline = "{o1},{o3}\n".format(o1=k, o3=v)
    #print(outline)
    f.write(outline)
f.close()
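
The visualization cells read one `ratio_<year>.csv` file per year, so it can help to write the ratio values grouped by year and in chronological order. The sketch below is one possible way to do that; `write_ratio_files_by_year` and its `out_dir` parameter are hypothetical, and the keys are assumed to have the `MM/01/YYYY` form built above.

```python
import os


def write_ratio_files_by_year(ratio_map, out_dir="."):
    """Write ratio_<year>.csv files from a {'MM/01/YYYY': percentage} map.

    Hypothetical helper; returns the list of file paths it wrote.
    """
    by_year = {}
    for key, pct in ratio_map.items():
        # The last four characters of the key are the year
        by_year.setdefault(key[-4:], []).append((key, pct))

    written = []
    for year in sorted(by_year):
        path = os.path.join(out_dir, "ratio_{0}.csv".format(year))
        with open(path, "w") as out:
            out.write("Date,Percentage\n")
            # Within one year, MM/01/YYYY keys sort chronologically as strings
            for key, pct in sorted(by_year[year]):
                out.write("{0},{1}\n".format(key, pct))
        written.append(path)
    return written
```

Sorting matters mainly for the line charts, since a dictionary does not guarantee that months come out in calendar order.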

## Creating Visualization using Bokeh

In [13]:
from bokeh.charts import TimeSeries, show, output_file, vplot
import numpy as np
import pandas as pd
from bokeh.layouts import gridplot
from dateutil import parser
from bokeh.layouts import column
from bokeh.io import output_notebook
from bokeh.models import FixedTicker, NumeralTickFormatter

In [14]:
output_notebook()

In [15]:
#Helper that builds the point chart and the line chart for one year's ratio file
def make_year_charts(year):
    #Open file for visualization
    Plot_File = pd.read_csv("ratio_{0}.csv".format(year))
    #store columns in series
    date = Plot_File['Date']
    per = Plot_File['Percentage']
    #at this point the Date column is read as strings, so parse each value into a date
    dt = []
    for d in date:
        try:
            dt.append(parser.parse(d))
        except ValueError:
            #report the value that could not be parsed and keep it as a string
            print(d)
            dt.append(d)
    #No exceptions? We can proceed
    #create a dictionary
    data = dict(
        Date=dt,
        PERCENTAGE=per)
    #create a timeseries graph with points
    tspoint = TimeSeries(data,
        x='Date', y=['PERCENTAGE'], title=str(year),
        color=['PERCENTAGE'], dash=['PERCENTAGE'], builder_type='point',
        ylabel='Percentage', legend=True, width=900, height=400)
    #create a timeseries graph with a line, use this when required
    tsline = TimeSeries(data,
        x='Date', y=['PERCENTAGE'], title=str(year),
        color=['PERCENTAGE'], dash=['PERCENTAGE'], builder_type='line',
        ylabel='Percentage', legend=True, width=900, height=400)
    return tspoint, tsline

#tspoint1/tsline1 belong to 1996 and tspoint22/tsline22 belong to 2017
tspoint1, tsline1 = make_year_charts(1996)
tspoint2, tsline2 = make_year_charts(1997)
tspoint3, tsline3 = make_year_charts(1998)
tspoint4, tsline4 = make_year_charts(1999)
tspoint5, tsline5 = make_year_charts(2000)
tspoint6, tsline6 = make_year_charts(2001)
tspoint7, tsline7 = make_year_charts(2002)
tspoint8, tsline8 = make_year_charts(2003)
tspoint9, tsline9 = make_year_charts(2004)
tspoint10, tsline10 = make_year_charts(2005)
tspoint11, tsline11 = make_year_charts(2006)
tspoint12, tsline12 = make_year_charts(2007)
tspoint13, tsline13 = make_year_charts(2008)
tspoint14, tsline14 = make_year_charts(2009)
tspoint15, tsline15 = make_year_charts(2010)
tspoint16, tsline16 = make_year_charts(2011)
tspoint17, tsline17 = make_year_charts(2012)
tspoint18, tsline18 = make_year_charts(2013)
tspoint19, tsline19 = make_year_charts(2014)
tspoint20, tsline20 = make_year_charts(2015)
tspoint21, tsline21 = make_year_charts(2016)
tspoint22, tsline22 = make_year_charts(2017)

# Visualization of Percentage of vulnerabilities with CWE IDs

In [17]:
#opens visualization
show(column(tspoint1,tspoint2,tspoint3,tspoint4,tspoint5))
#add any of the parameters below depending on how many years you want to view; tsline6 belongs to the year 2001 and tsline22 to the year 2017
#tsline6,tspoint6,tsline7,tspoint7,tsline8,tspoint8,tsline9,tspoint9,tsline10,tspoint10,tsline11,tspoint11,tsline12,tspoint12,tsline13,tspoint13,tsline14,tspoint14,tsline15,tspoint15,tsline16,tspoint16,tsline17,tspoint17,tsline18,tspoint18,tsline19,tspoint19,tsline20,tspoint20,tsline21,tspoint21,tsline22,tspoint22