## Explore NVD Feeds easily -- [nvdlib](https://github.com/fabric8-analytics/nvdlib) basic usage
---

This tutorial introduces the basic usage of nvdlib.

The demonstration will cover:

- **Fetching** NVD Feeds
- Creating a **Collection of Documents** from JSONFeeds
- [Optional] Getting familiar with the NVD Document model
- **Iterating** over a collection of documents using a **Cursor**
- **Querying** collection of documents using query selectors

In [1]:
import os
import sys

import tempfile  # temporary directory


In [2]:
import nvdlib
from nvdlib.manager import FeedManager

---

In [3]:
FEED_NAMES = [2002, 2003, 2004, 2005]  # explicitly choose feeds to work with

In [4]:
nvdlib.set_logging_level('DEBUG')  # by default 'WARNING' logging level is used

---

#### 1) Fetch NVD Feeds

We will fetch NVD Feeds from the [NVD](https://nvd.nist.gov/) database using the FeedManager defined in https://github.com/fabric8-analytics/nvdlib/blob/master/nvdlib/manager.py. FeedManager is a context manager that manages asynchronous calls using an event loop.

    This will store JSON feeds locally for future usage.
   
*NOTE: This tutorial does not cover JSONFeed-related operations, since working with raw feeds is not the main purpose of nvdlib. However, nvdlib can handle raw JSONFeeds and their metadata should a user need that level of control.*
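
The idea of a context manager that owns an event loop and hides asynchronous calls behind a synchronous method can be sketched as follows. This is a minimal illustration, not nvdlib's implementation: `LoopManager`, `fetch` and `_download` are hypothetical names, and the network request is replaced by a short `asyncio.sleep`.

```python
import asyncio


class LoopManager:
    """Hypothetical stand-in for a manager that owns an event loop."""

    def __init__(self, n_workers=5):
        self._n_workers = n_workers
        self._loop = None

    def __enter__(self):
        # Create a private event loop when the `with` block starts.
        self._loop = asyncio.new_event_loop()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always close the loop, even if the body raised.
        self._loop.close()
        self._loop = None
        return False  # do not swallow exceptions

    async def _download(self, name, semaphore):
        async with semaphore:  # at most `n_workers` downloads at once
            await asyncio.sleep(0.01)  # placeholder for real network I/O
            return f"feed-{name}"

    async def _fetch_all(self, names):
        semaphore = asyncio.Semaphore(self._n_workers)
        tasks = [self._download(name, semaphore) for name in names]
        return await asyncio.gather(*tasks)  # results keep input order

    def fetch(self, names):
        # Synchronous facade: callers never touch the loop directly.
        return self._loop.run_until_complete(self._fetch_all(names))


with LoopManager(n_workers=2) as manager:
    fetched = manager.fetch([2002, 2003, 2004, 2005])
```

The semaphore plays the role that a worker limit such as `n_workers` presumably plays in the real manager: it caps how many downloads run concurrently while the caller still sees a plain blocking call.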
    
    
#### 2) Create collection from those feeds

Creating a collection from feeds parses each feed into its [Document](https://github.com/fabric8-analytics/nvdlib/blob/master/nvdlib/model.py#L425) form (get familiar with our [model](https://github.com/fabric8-analytics/nvdlib/blob/master/nvdlib/model.py)) and produces a [Collection](https://github.com/fabric8-analytics/nvdlib/blob/master/nvdlib/collection.py) object.

    Collection is a user facade that acts as a proxy to a set of documents and makes querying and operating on documents much easier.
   
Collections can use different [adapters](https://github.com/fabric8-analytics/nvdlib/blob/master/nvdlib/adapters) based on user choice.

    The default adapter, despite being very lightweight, provides limited functionality and lower performance.

In [5]:
tmp_dir = tempfile.mkdtemp(prefix='nvdlib_')  # create temporary directory in order to simulate clean environment

In [6]:
%%time

with FeedManager(data_dir=tmp_dir, n_workers=5) as feed_manager:
    
    feeds = feed_manager.fetch_feeds(FEED_NAMES)
    collection = feed_manager.collect(feeds)  # create collection; optionally, custom feeds can be specified
    
    # [OPTIONAL] step
    collection.set_name('Tutorial')  # choose whatever name you want for future identification

2018-08-24 11:27:49,475 [INFO]: Fetching feeds...
2018-08-24 11:27:49,478 [DEBUG]: Local feeds found: []
2018-08-24 11:27:49,479 [DEBUG]: Remote feeds found: [2002, 2003, 2004, 2005]
2018-08-24 11:28:05,353 [INFO]: Downloading feed `2005`...
2018-08-24 11:28:05,434 [INFO]: Downloading feed `2004`...
2018-08-24 11:28:05,441 [INFO]: Downloading feed `2003`...
2018-08-24 11:28:06,229 [INFO]: Downloading feed `2002`...
2018-08-24 11:28:12,059 [INFO]: Writing feed `2003`...
2018-08-24 11:28:12,065 [INFO]: Finished downloading feed `2003`
2018-08-24 11:28:12,708 [INFO]: Writing feed `2004`...
2018-08-24 11:28:12,717 [INFO]: Finished downloading feed `2004`
2018-08-24 11:28:13,422 [INFO]: Writing feed `2005`...
2018-08-24 11:28:13,437 [INFO]: Finished downloading feed `2005`
2018-08-24 11:28:14,467 [INFO]: Writing feed `2002`...
2018-08-24 11:28:14,478 [INFO]: Finished downloading feed `2002`
2018-08-24 11:28:14,503 [INFO]: Collecting entries...
2018-08-24 11:28:14,505 [DEBUG]: Collecting ent

CPU times: user 7.71 s, sys: 814 ms, total: 8.53 s
Wall time: 38.9 s


    TIP: Hide debug output of jupyter cells by pressing `shift+v` while the cell is selected

In [7]:
collection  # a visual representation of the collection proxy


Collection: {
   _id: 140718403696904
   name: 'Tutorial'
   adapter: 'DEFAULT',
   documents: 15743
}

---

#### 3) [OPTIONAL] NVD Document model exploration (skip if already familiar with it)

    It is very important to get familiar with the [nvdlib customized NVD model](https://github.com/fabric8-analytics/nvdlib/blob/master/nvdlib/model.py) before doing any other work.
    
Spend some time exploring the Document model. Although it behaves somewhat like a dict, each attribute is accessible via dot notation, and Python attribute hints help with exploration, so accessing and browsing attributes should be comfortable.

In [8]:
doc, = collection.sample(sample_size=1)  # load a single random document from the collection

##### Model

In [9]:
doc.pretty()

{
    'id_': 'CVE-2005-4432',
    'cve': {
        'id_': 'CVE-2005-4432',
        'year': 2005,
        'assigner': 'cve@mitre.org',
        'data_version': '4.0',
        'affects': {
            'data': [
                {
                    'vendor_name': 'playsms',
                    'product_name': 'playsms',
                    'versions': ['0.8']
                }
            ]
        },
        'references': {
            'data': [
                {
                    'url':
                        'http://marc.info/?l=full-disclosure&m='
                        '113478814326427&w=2',
                    'name': '20051217 XSS Vuln in PlaySmS',
                    'refsource': 'FULLDISC'
                },
                {
                    'url':
                        'http://marc.info/?l=full-disclosure&m='
                        '113970096305873&w=2',
                    'name': '20060211 XSS in PlaySMS',
                    'refsource': 'FULLDISC'
                

Document attributes can be accessed either in the standard 'dot' way

In [10]:
doc.cve.descriptions

DescriptionEntry(data=[DescriptionNode(lang='en', value='Cross-site scripting (XSS) vulnerability in index.php in PlaySMS 0.8 allows remote attackers to inject arbitrary web script or HTML via the err parameter.')]

    TIP: Every entry also has a `pretty` method

In [11]:
doc.cve.descriptions.pretty()

{
    'data': [
        {
            'lang': 'en',
            'value':
                'Cross-site scripting (XSS) vulnerability in index.php in '
                'PlaySMS 0.8 allows remote attackers to inject arbitrary web '
                'script or HTML via the err parameter.'
        }
    ]
}


or by using the `rgetattr()` function from our [utils](https://github.com/fabric8-analytics/nvdlib/blob/master/nvdlib/utils.py) module

In [12]:
from nvdlib.utils import rgetattr

rgetattr(doc, 'impact.cvss.base_score')

4.3
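
For readers curious how a recursive `getattr` over a dotted path can work, here is a minimal sketch built on `functools.reduce`. It is an illustration under assumptions, not the code from nvdlib's utils module: `rgetattr_sketch` and `toy_doc` are hypothetical names, and the toy object only imitates the shape of a Document.

```python
from functools import reduce
from types import SimpleNamespace


def rgetattr_sketch(obj, path):
    """Resolve a dotted attribute path such as 'impact.cvss.base_score'."""
    # Apply getattr once per path segment, feeding each result into the next.
    return reduce(getattr, path.split('.'), obj)


# Toy object standing in for a Document (not the real nvdlib model).
toy_doc = SimpleNamespace(
    impact=SimpleNamespace(cvss=SimpleNamespace(base_score=4.3))
)

score = rgetattr_sketch(toy_doc, 'impact.cvss.base_score')
```

A missing segment raises `AttributeError` in this sketch, exactly as a chain of plain attribute accesses would.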

or by using the convenient `project()` method provided for this purpose

In [13]:
# this function can take a dictionary of multiple attributes to be projected
projection = doc.project({'id_': 0, 'cve.descriptions': 1, 'impact.cvss.base_score': 1})
projection

{'cve': {'descriptions': DescriptionEntry(data=[DescriptionNode(lang='en', value='Cross-site scripting (XSS) vulnerability in index.php in PlaySMS 0.8 allows remote attackers to inject arbitrary web script or HTML via the err parameter.')]}, 'impact': {'cvss': {'base_score': 4.3}}}

Projection returns an `AttrDict`, an augmented dictionary that provides attribute-level access via dot notation.

    TIP: Each projection also defines a `pretty()` method
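
The idea of a dictionary with attribute-level access can be sketched in a few lines. This is only an illustration of the concept, assuming nothing about how nvdlib's own `AttrDict` is written; `AttrDictSketch` and `toy_projection` are hypothetical names.

```python
class AttrDictSketch(dict):
    """Dictionary whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so dot access keeps working at every level.
        if isinstance(value, dict) and not isinstance(value, AttrDictSketch):
            value = AttrDictSketch(value)
        return value


toy_projection = AttrDictSketch({'impact': {'cvss': {'base_score': 4.3}}})
base_score = toy_projection.impact.cvss.base_score
```

Because the class subclasses `dict`, ordinary key access such as `toy_projection['impact']` continues to work alongside dot notation.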

One more cool thing is possible with the `project` method.

Note that `doc.cve.descriptions.data` is an array-like object that can hold multiple description entries.

    We can use selectors and projections even on the array elements.

In [14]:
projection = doc.project({'cve.descriptions.data.value': 1})
projection.pretty()

{
    'id_': 'CVE-2005-4432',
    'cve': {
        'descriptions': {
            'data': {
                'value': [
                    'Cross-site scripting (XSS) vulnerability in '
                    'index.php in PlaySMS 0.8 allows remote attackers to '
                    'inject arbitrary web script or HTML via the err '
                    'parameter.'
                ]
            }
        }
    }
}
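
The way a dotted path can reach "through" an array and collect one value per element can be sketched with plain dicts and lists. This is a simplified model of the behaviour seen in the output above, not nvdlib's projection code; `pluck` and the toy document are hypothetical.

```python
def pluck(node, path):
    """Follow a dotted path through nested dicts, fanning out over lists."""
    if not path:
        return node
    if isinstance(node, list):
        # Apply the remaining path to every element of the array.
        return [pluck(item, path) for item in node]
    key, _, rest = path.partition('.')
    return pluck(node[key], rest)


toy_doc = {
    'cve': {
        'descriptions': {
            'data': [
                {'lang': 'en', 'value': 'first description'},
                {'lang': 'es', 'value': 'second description'},
            ]
        }
    }
}

values = pluck(toy_doc, 'cve.descriptions.data.value')
```

Here `values` holds one entry per element of `data`, which mirrors how the projected `value` field above comes back as a list.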


---

#### 4) Iterating over Collection using Cursor

Collection defines a method called `cursor()`, which instantiates an iterator over the collection and preserves its state.

Elements can then be accessed either with the `next()` method or with the `next_batch()` method, which returns a batch of documents of a given size.

    Iterating over a collection of documents is as easy as creating a Cursor.

In [15]:
cursor = collection.cursor()

In [16]:
next_doc = cursor.next()
next_doc.project({'cve': 1}).pretty()

{
    'id_': 'CVE-1999-0001',
    'cve': {
        'id_': 'CVE-1999-0001',
        'year': 1999,
        'assigner': 'cve@mitre.org',
        'data_version': '4.0',
        'affects': {
            'data': [
                {
                    'vendor_name': 'bsdi',
                    'product_name': 'bsd_os',
                    'versions': ['3.1']
                },
                {
                    'vendor_name': 'freebsd',
                    'product_name': 'freebsd',
                    'versions': [
                        '1.0',
                        '1.1',
                        '1.1.5.1',
                        '1.2',
                        '2.0',
                        '2.0.1',
                        '2.0.5',
                        '2.1.5',
                        '2.1.6',
                        '2.1.6.1',
                        '2.1.7',
                        '2.1.7.1',
                        '2.2',
                        '2.2.2',
                       

In [17]:
next_batch = cursor.next_batch()  # default batch size is 20
f"Batch contains {len(next_batch)} documents."

'Batch contains 20 documents.'
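
A stateful cursor with single-item and batched access can be sketched on top of a regular Python iterator. This is a minimal illustration rather than nvdlib's Cursor: `CursorSketch` is a hypothetical name, and returning `None` once the data is exhausted is a choice made for this sketch only.

```python
from itertools import islice


class CursorSketch:
    """Stateful iterator offering single-item and batched access."""

    def __init__(self, items):
        self._iterator = iter(items)

    def next(self):
        # Return the next item, or None once the data is exhausted.
        return next(self._iterator, None)

    def next_batch(self, batch_size=20):
        # Take up to `batch_size` items; the final batch may be shorter.
        return list(islice(self._iterator, batch_size))


cursor_sketch = CursorSketch(range(45))
first = cursor_sketch.next()          # consumes item 0
batch = cursor_sketch.next_batch()    # consumes items 1..20
rest = cursor_sketch.next_batch(50)   # only 24 items remain
```

Because both methods draw from the same underlying iterator, a batch always continues where the previous call stopped.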

---

#### 5) Querying collection of documents

To make full use of nvdlib's capabilities, we will demonstrate querying collections of documents using various [selectors](https://github.com/fabric8-analytics/nvdlib/blob/master/nvdlib/query_selectors.py) and the `find()` method, which should feel somewhat familiar to MongoDB users.

    Even the lightweight version of [nvdlib](https://github.com/fabric8-analytics/nvdlib) makes it easy to query and handle collections of documents using query selectors.

In [18]:
from nvdlib.query_selectors import match, search    # basic match and search (regex-like) selectors
from nvdlib.query_selectors import ge, gt, le, lt   # comparison operators greater/lower (and equal) than
from nvdlib.query_selectors import in_, in_range    # array handling selectors

##### Querying by exact match

In [19]:
collection.find({'cve.affects.data.vendor_name': 'microsoft'})


Collection: {
   _id: 140719104683760
   name: 'None'
   adapter: 'DEFAULT',
   documents: 1210
}
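
An exact-match query over a dotted path that crosses an array can be modelled with plain dicts. The sketch below assumes "any element matches" semantics for arrays, which is an assumption and not something confirmed by nvdlib's source; `resolve`, `find_sketch` and the toy documents are hypothetical.

```python
def resolve(node, path):
    """Return every value reachable at a dotted path, flattening lists."""
    if isinstance(node, list):
        found = []
        for item in node:
            found.extend(resolve(item, path))
        return found
    if not path:
        return [node]
    key, _, rest = path.partition('.')
    if not isinstance(node, dict) or key not in node:
        return []
    return resolve(node[key], rest)


def find_sketch(documents, query):
    """Keep documents where every queried path contains the expected value."""
    return [
        doc for doc in documents
        if all(expected in resolve(doc, path)
               for path, expected in query.items())
    ]


toy_docs = [
    {'id_': 'A', 'cve': {'affects': {'data': [
        {'vendor_name': 'microsoft', 'product_name': 'windows_me'},
    ]}}},
    {'id_': 'B', 'cve': {'affects': {'data': [
        {'vendor_name': 'bsdi', 'product_name': 'bsd_os'},
        {'vendor_name': 'freebsd', 'product_name': 'freebsd'},
    ]}}},
]

matches = find_sketch(toy_docs, {'cve.affects.data.vendor_name': 'freebsd'})
```

Document `B` is kept because one of its two `affects` entries names the vendor, even though the other does not.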

##### Querying by pattern matches

The `match` and `search` selectors serve this exact purpose.

In [20]:
win_collection = collection.find({'cve.affects.data.product_name': search('windows')})
win_collection.set_name('Windows CVEs')
win_collection


Collection: {
   _id: 140717913649048
   name: 'Windows CVEs'
   adapter: 'DEFAULT',
   documents: 522
}

In [21]:
win_doc, = win_collection.sample(1)
win_doc.cve.pretty()

{
    'id_': 'CVE-2001-1552',
    'year': 2001,
    'assigner': 'cve@mitre.org',
    'data_version': '4.0',
    'affects': {
        'data': [
            {
                'vendor_name': 'microsoft',
                'product_name': 'windows_me',
                'versions': ['*']
            }
        ]
    },
    'references': {
        'data': [
            {
                'url':
                    'http://archives.neohapsis.com/archives/bugtraq/2001-10/'
                    '0133.html',
                'name': '20011017 Ssdpsrv.exe in WindowsME',
                'refsource': 'BUGTRAQ'
            },
            {
                'url': 'http://www.iss.net/security_center/static/7318.php',
                'name': 'winme-ssdp-dos(7318)',
                'refsource': 'XF'
            },
            {
                'url': 'http://www.securityfocus.com/bid/3442',
                'name': '3442',
                'refsource': 'BID'
            }
        ]
    },
    'descriptions': {
 

In [22]:
collection.find({'cve.year': match("200[1-3]{1}")})


Collection: {
   _id: 140717913713240
   name: 'None'
   adapter: 'DEFAULT',
   documents: 5482
}
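
The difference between matching and searching is easiest to see with Python's own `re` module. The sketch assumes that the nvdlib selectors follow the same convention as `re.match` and `re.search`, which their names suggest but this tutorial does not state; the product names and years are toy values.

```python
import re

products = ['windows_me', 'ms_windows_nt', 'bsd_os']

# re.match only succeeds if the pattern matches at the START of the string.
anchored = [p for p in products if re.match('windows', p)]

# re.search succeeds if the pattern matches ANYWHERE in the string.
anywhere = [p for p in products if re.search('windows', p)]

# Integers have to be converted to text before a regex can be applied.
years = [1999, 2001, 2002, 2003, 2004]
selected = [y for y in years if re.match(r'200[1-3]$', str(y))]
```

Note that the `{1}` quantifier in the year query above is redundant: `200[1-3]` matches the same strings.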

##### Querying by range of values

The regex query above, although possible, is not very intuitive. For this purpose, we provide the selectors `in_` and `in_range`.

In [23]:
collection.find({'cve.year': in_range(2001, 2003)})


Collection: {
   _id: 140717859742888
   name: 'None'
   adapter: 'DEFAULT',
   documents: 5482
}

In this context (as years are always integer values), the same query can be expressed with the `in_` selector

In [24]:
collection.find({'cve.year': in_([2001, 2002, 2003])})


Collection: {
   _id: 140718003599008
   name: 'None'
   adapter: 'DEFAULT',
   documents: 5482
}
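
Range and membership selectors can be thought of as small predicate factories. The sketch below is an assumption about the concept, not nvdlib's code: `in_range_sketch` and `in_sketch` are hypothetical names, and the inclusive upper bound is inferred from the identical document counts (5482) in the three outputs above.

```python
def in_range_sketch(low, high):
    """Build a predicate that accepts values with low <= value <= high."""
    return lambda value: low <= value <= high


def in_sketch(allowed):
    """Build a predicate that accepts values contained in `allowed`."""
    allowed = set(allowed)  # set membership is O(1) on average
    return lambda value: value in allowed


years = [1999, 2000, 2001, 2002, 2003, 2004]

by_range = [y for y in years if in_range_sketch(2001, 2003)(y)]
by_membership = [y for y in years if in_sketch([2001, 2002, 2003])(y)]

# The two only agree for integers: 2001.5 is in the range but not in the list.
half_in_range = in_range_sketch(2001, 2003)(2001.5)
half_in_list = in_sketch([2001, 2002, 2003])(2001.5)
```

The last two lines show why the equivalence holds only "in this context": for non-integer values, a range and an explicit list select different things.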

##### Querying by value comparisons

In [25]:
collection.find({'impact.cvss.base_score': gt(9)})


Collection: {
   _id: 140718003699840
   name: 'None'
   adapter: 'DEFAULT',
   documents: 1355
}

A more complex query:

In [26]:
pre_release_december_collection = collection.find({
    'published_date.month': 12,
    'impact.cvss.base_score': ge(9.0),
    'cve.affects.data.versions': le('1.0.0')
})

# NOTE: comparing versions in this way is very unreliable, here only for demonstration purposes

pre_release_december_collection


Collection: {
   _id: 140718012048216
   name: 'None'
   adapter: 'DEFAULT',
   documents: 93
}
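
Why comparing versions this way is unreliable can be shown with plain Python. The sketch assumes the version values are compared as ordinary strings, which is an assumption about how `le` treats them; `version_key` is a hypothetical helper that handles only purely numeric dotted versions.

```python
def version_key(version):
    """Turn '1.10.2' into (1, 10, 2) so comparison is numeric per component."""
    return tuple(int(part) for part in version.split('.'))


# Plain string comparison works character by character.
string_says = '0.10' <= '0.9'  # True, because '1' sorts before '9'

# Comparing parsed components gives the expected ordering.
parsed_says = version_key('0.10') <= version_key('0.9')  # False, 10 > 9

# A wildcard is not a version at all, yet as a string it still compares.
wildcard_says = '*' <= '1.0.0'  # True, '*' sorts before any digit
```

Under that assumption, a wildcard entry such as `'*'` would also satisfy `le('1.0.0')`, so the result of such a query should be read with care.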

In [27]:
pre_release_december_collection.pretty(sample_size=2)

{
    'id_': 'CVE-2002-2268',
    'cve': {
        'id_': 'CVE-2002-2268',
        'year': 2002,
        'assigner': 'cve@mitre.org',
        'data_version': '4.0',
        'affects': {
            'data': [
                {
                    'vendor_name': 'netdave',
                    'product_name': 'webster_http_server',
                    'versions': ['*']
                }
            ]
        },
        'references': {
            'data': [
                {
                    'url':
                        'http://seclists.org/lists/bugtraq/2002/Dec/0013.html',
                    'name': '20021201 Advisory: Webster HTTP Server',
                    'refsource': 'BUGTRAQ'
                },
                {
                    'url':
                        'http://www.securiteam.com/windowsntfocus/6R0030A6AY.'
                        'html',
                    'name':
                        'http://www.securiteam.com/windowsntfocus/6R0030A6AY.'
                      

---