# Introduction

The CWE catalog (`cwec_v2.9.xml`) consists of four top-level tables, each a subtree of the XML root.

In [29]:
import lxml.etree
tree = lxml.etree.parse('cwec_v2.9.xml')
root = tree.getroot()
for table in root:
    print(table.tag)

Views
Categories
Weaknesses
Compound_Elements


The table we are concerned with is the **Weaknesses** table; I will discuss the others in a later commit. The first thing to observe is that the number of "columns" varies from "row" to "row" in this table. For instance, consider row 0:

In [30]:
weakness_table = root[2]
# Each child element of a row is one "column" of that row
for column in weakness_table[0]:
    print(column.tag)

Description
Relationships
Weakness_Ordinalities
Applicable_Platforms
Time_of_Introduction
Common_Consequences
Potential_Mitigations
Causal_Nature
Demonstrative_Examples
Taxonomy_Mappings
Content_History
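
Indexing with `root[2]` depends on the order of the tables in the file. Below is a minimal sketch of looking the table up by tag instead. The miniature document is made up and stands in for `cwec_v2.9.xml`, and the standard library's `xml.etree.ElementTree` is used only so the snippet runs without the data file; `lxml.etree` offers the same `find` method.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the catalog: the same four top-level
# tables as in the output above, but with no real data.
SAMPLE = """<Weakness_Catalog>
  <Views/>
  <Categories/>
  <Weaknesses>
    <Weakness ID="1"><Description/><Content_History/></Weakness>
  </Weaknesses>
  <Compound_Elements/>
</Weakness_Catalog>"""

root = ET.fromstring(SAMPLE)

# find() returns the first direct child with the given tag, or None.
weakness_table = root.find('Weaknesses')

print(weakness_table.tag)
for column in weakness_table[0]:
    print(column.tag)
```

Looking the table up by name keeps working if the tables are ever reordered, and the `None` return value makes a missing table easy to detect.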


And now let's consider row 20:

In [31]:
for column in weakness_table[20]:
    print(column.tag)

Description
Relationships
Relationship_Notes
Weakness_Ordinalities
Applicable_Platforms
Alternate_Terms
Terminology_Notes
Time_of_Introduction
Likelihood_of_Exploit
Common_Consequences
Detection_Methods
Potential_Mitigations
Causal_Nature
Demonstrative_Examples
Observed_Examples
Functional_Areas
Affected_Resources
References
Taxonomy_Mappings
White_Box_Definitions
Related_Attack_Patterns
Content_History
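
To make the difference between the two rows explicit, here is a small sketch that compares the two tag lists. The lists are copied by hand from the two outputs above rather than read from the XML file, so the snippet runs on its own; the variable names are only illustrative.

```python
# Tags printed for row 0 and row 20, copied from the outputs above.
row_0 = [
    'Description', 'Relationships', 'Weakness_Ordinalities',
    'Applicable_Platforms', 'Time_of_Introduction', 'Common_Consequences',
    'Potential_Mitigations', 'Causal_Nature', 'Demonstrative_Examples',
    'Taxonomy_Mappings', 'Content_History',
]
row_20 = [
    'Description', 'Relationships', 'Relationship_Notes',
    'Weakness_Ordinalities', 'Applicable_Platforms', 'Alternate_Terms',
    'Terminology_Notes', 'Time_of_Introduction', 'Likelihood_of_Exploit',
    'Common_Consequences', 'Detection_Methods', 'Potential_Mitigations',
    'Causal_Nature', 'Demonstrative_Examples', 'Observed_Examples',
    'Functional_Areas', 'Affected_Resources', 'References',
    'Taxonomy_Mappings', 'White_Box_Definitions',
    'Related_Attack_Patterns', 'Content_History',
]

# Keep document order by filtering the lists instead of using set().
only_in_row_0 = [tag for tag in row_0 if tag not in row_20]
only_in_row_20 = [tag for tag in row_20 if tag not in row_0]

print(len(row_0))       # 11
print(len(row_20))      # 22
print(only_in_row_0)    # []
for tag in only_in_row_20:
    print(tag)
```

Row 20 carries every field of row 0 plus eleven more, which is why a per-field count over the whole table is useful.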


Since we are interested in creating a histogram of all the fields used, we simply loop through every row of the table, then through every column label in that row, counting each label as it occurs:

In [32]:
histogram = {}
for row in weakness_table:
    for column in row:
        # The first occurrence counts as 1, not 0
        if column.tag not in histogram:
            histogram[column.tag] = 1
        else:
            histogram[column.tag] += 1
print(histogram)

{'Relationships': 706, 'Affected_Resources': 51, 'Time_of_Introduction': 665, 'Detection_Methods': 77, 'White_Box_Definitions': 30, 'Common_Consequences': 702, 'Background_Details': 42, 'Potential_Mitigations': 524, 'Taxonomy_Mappings': 598, 'Relationship_Notes': 123, 'Description': 719, 'Applicable_Platforms': 557, 'Likelihood_of_Exploit': 185, 'Relevant_Properties': 16, 'Other_Notes': 24, 'Weakness_Ordinalities': 131, 'Observed_Examples': 358, 'Functional_Areas': 28, 'Causal_Nature': 75, 'Related_Attack_Patterns': 207, 'Research_Gaps': 75, 'Alternate_Terms': 66, 'Maintenance_Notes': 87, 'Enabling_Factors_for_Exploitation': 23, 'Theoretical_Notes': 27, 'References': 282, 'Terminology_Notes': 27, 'Demonstrative_Examples': 386, 'Content_History': 719, 'Modes_of_Introduction': 33}
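
The same tally can also be written with `collections.Counter` from the standard library, which treats a missing key as zero, so no first-occurrence branch is needed. Here is a minimal sketch on a made-up three-row table. The sample XML is hypothetical, and `xml.etree.ElementTree` stands in for `lxml.etree` only so the snippet runs without the data file.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical three-row table with a varying number of columns per row.
SAMPLE = """<Weaknesses>
  <Weakness ID="1"><Description/><Relationships/><Content_History/></Weakness>
  <Weakness ID="2"><Description/><References/><Content_History/></Weakness>
  <Weakness ID="3"><Description/><Relationships/></Weakness>
</Weaknesses>"""

weakness_table = ET.fromstring(SAMPLE)

# One pass over every column of every row; Counter adds 1 per occurrence.
histogram = Counter(column.tag for row in weakness_table for column in row)

print(histogram['Description'])    # 3
print(histogram['Relationships'])  # 2
print(histogram['References'])     # 1
```

A `Counter` is a `dict` subclass, so it can be passed on to pandas in the same way as the hand-built dictionary.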


Next, we plot the histogram: 

In [34]:
import pandas as pd
from bokeh.plotting import figure, show

# Sort ascending so the most frequent field ends up at the top of the chart
series = pd.Series(histogram, name='Entries').sort_values(ascending=True)
fields = series.index.tolist()

p = figure(width=800, y_range=fields, title="Weaknesses Histogram")

p.xaxis.axis_label = 'Frequency'
p.xaxis.axis_label_text_font_size = '10pt'
p.xaxis.major_label_text_font_size = '8pt'

p.yaxis.axis_label = 'Field'
p.yaxis.axis_label_text_font_size = '10pt'
p.yaxis.major_label_text_font_size = '8pt'

# One horizontal bar per field, from 0 to its count, placed by field name
p.hbar(y=fields, left=0, right=series.values.tolist(), height=0.4)

show(p)

And that's it! :-)