How To Add A Collector

jfuruness edited this page Nov 29, 2020 · 5 revisions

lib_bgp_data

  1. To start, make sure everything is installed properly
  2. Move to a new branch with git checkout -b my_branch_name
  3. Run the ROAs Parser
  4. Get up to speed on the following topics:

Now that you have a general sense of how to run a collector, let's take a look at the ROAs Parser. First, copy this directory into your own directory. You can do this by typing: cp -R roas my_collectors

Now cd into your new folder. We will work from here to create your new collector.

Let's first look at the __init__.py file. Please format your Python files the way this one is formatted, which follows the PEP 8 style guide. To check PEP 8 compliance easily, you can use flake8; if you are using vim, you can install a flake8 plugin. Note that your docstring should link back to your wiki page. The wiki page should have the following:

  1. Short Description
  2. Long Description
  3. Usage (both command line and in a script)
  4. Table Schemas (if any)
  5. Design Choices

If you are not familiar with __init__.py, it is the file that marks a folder as a Python package and determines which names code outside the folder can import directly from it. You can have hundreds of files in your folder, but if you only import one class in this file, then only that class is exposed at the package level. The rest can still be reached through their own modules, but by convention they are treated as internal. Anything that should be used outside your specific folder should be imported in the __init__.py file. For example, ROAs_Parser might be used outside its folder, but the other classes inside the folder are not, so they are not in the __init__.py file.
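To make the role of __init__.py concrete, here is a minimal sketch that builds a throwaway package in a temporary directory and imports it. The names my_collector_demo, My_Parser, and _Internal_Helper are made up for illustration and are not part of lib_bgp_data.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk; every name here is hypothetical.
root = Path(tempfile.mkdtemp())
pkg = root / "my_collector_demo"
pkg.mkdir()

# A module with one class meant for outside use and one internal helper.
(pkg / "my_parser.py").write_text(
    "class My_Parser:\n"
    "    pass\n"
    "\n"
    "class _Internal_Helper:\n"
    "    pass\n"
)

# __init__.py imports only the class that other folders should use.
(pkg / "__init__.py").write_text("from .my_parser import My_Parser\n")

sys.path.insert(0, str(root))
demo = importlib.import_module("my_collector_demo")

# My_Parser is available directly from the package...
exposed = hasattr(demo, "My_Parser")
# ...while the helper is not part of the package namespace.
helper_exposed = hasattr(demo, "_Internal_Helper")
print(exposed, helper_exposed)
```

Note that the helper is still importable through my_collector_demo.my_parser, so leaving something out of __init__.py is a convention rather than an access restriction.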

Now let's take a look at the roas_parser. Aside from the material at the top, which is similar to the __init__.py file, the imports are very different. You'll notice that ordinary packages are imported normally, such as the re module. To import classes from files outside of your current folder (in the folder above), you need to write:

from ..<above folder> import <stuff you want to import>

Note that the number of dots denotes how many directories to go up: one dot means the current directory, two dots mean the directory above, three dots mean two directories above, and so on.

You can see an example of this in:

from ...utils import utils

This imports the utils file from the utils folder, which is outside of our current folder (two directories up). To import classes and other things from the current folder, do the same as above but with only one period. Example below.

from .tables import ROAs_Table
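If you want to see the dot rules in action without touching the real code base, the sketch below writes a small nested package to a temporary directory and imports from it. The layout (dots_demo, my_collector, My_Table, greet) is hypothetical and only mirrors the shape of the imports described above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout, written to a temp dir so the example is runnable:
#   dots_demo/utils/utils.py
#   dots_demo/collectors/my_collector/tables.py
#   dots_demo/collectors/my_collector/my_parser.py
files = {
    "dots_demo/__init__.py": "",
    "dots_demo/utils/__init__.py": "",
    "dots_demo/utils/utils.py": (
        "def greet():\n"
        "    return 'hello from utils'\n"
    ),
    "dots_demo/collectors/__init__.py": "",
    "dots_demo/collectors/my_collector/__init__.py": "",
    "dots_demo/collectors/my_collector/tables.py": (
        "class My_Table:\n"
        "    name = 'my_table'\n"
    ),
    "dots_demo/collectors/my_collector/my_parser.py": (
        # Three dots: two directories up, then into utils.
        "from ...utils import utils\n"
        # One dot: the current directory.
        "from .tables import My_Table\n"
        "\n"
        "def describe():\n"
        "    return utils.greet(), My_Table.name\n"
    ),
}

root = Path(tempfile.mkdtemp())
for rel_path, source in files.items():
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

sys.path.insert(0, str(root))
my_parser = importlib.import_module(
    "dots_demo.collectors.my_collector.my_parser")
result = my_parser.describe()
print(result)
```

Relative imports only work inside a package, which is why every folder in the sketch has its own __init__.py and the module is loaded by its full dotted name.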

After that we have the class. Also notice the use of __slots__. This is not required, but it makes the class behave somewhat like a named tuple: attributes that are not listed in __slots__ cannot be added. It also speeds up attribute access significantly, I think by about a third. You probably won't need this, and it can cause lots of errors if you don't understand it, so you can probably just delete it.
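Here is a small sketch of what __slots__ does in practice. Slotted_Point is a made-up class, not something from lib_bgp_data.

```python
class Slotted_Point:
    """Hypothetical class that only allows the attributes x and y."""

    __slots__ = ["x", "y"]

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Slotted_Point(1, 2)

# Assigning an attribute that is not listed in __slots__ fails.
try:
    p.z = 3
    added = True
except AttributeError:
    added = False

# Instances of a slotted class have no per-instance __dict__.
has_dict = hasattr(p, "__dict__")
print(added, has_dict)
```

This is the kind of error you run into if you keep __slots__ and later add a new attribute in __init__ without listing it.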

Note that the ROAs_Parser inherits from the Parser class. You should use this class. All subclasses of Parser are automatically added to the command line arguments (as the lowercase name of your class). They must also have a _run function, which acts as the main function that runs your code. All errors are also caught and logged. So for example, do the following:

  1. Change the name of your class to be something else
  2. Import your class in the __init__.py of your folder, and of all folders above
  3. Run your class from the command line, as you would for the ROAs Parser (https://github.com/jfuruness/lib_bgp_data/wiki/ROAs-Parser)

Now you can see that just by inheriting from the Parser class, your class can be run from the command line.
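If you are curious how inheriting from a base class can be enough to get a command line flag, here is one way it could work. This is a sketch under assumptions, not the actual lib_bgp_data implementation: the Parser class below is a stand-in, and the run wrapper, the build_arg_parser function, and the --my_parser flag format are all made up for illustration.

```python
import argparse
import logging


class Parser:
    """Stand-in base class; NOT the real lib_bgp_data Parser."""

    def run(self, *args, **kwargs):
        # Hypothetical wrapper: call _run and log any error it raises.
        try:
            return self._run(*args, **kwargs)
        except Exception:
            logging.exception("%s failed", type(self).__name__)

    def _run(self):
        raise NotImplementedError


class My_Parser(Parser):
    """Hypothetical collector; _run acts as its main function."""

    def _run(self):
        return "collected"


def build_arg_parser():
    """Add one flag per direct Parser subclass, named after the class."""
    arg_parser = argparse.ArgumentParser()
    for cls in Parser.__subclasses__():
        arg_parser.add_argument(f"--{cls.__name__.lower()}",
                                dest="parser_cls",
                                action="store_const",
                                const=cls)
    return arg_parser


# Simulate running the tool with the flag for My_Parser.
args = build_arg_parser().parse_args(["--my_parser"])
result = args.parser_cls().run()
print(result)
```

The point of the sketch is only that a base class can discover its subclasses, which is why the class has to be imported in the __init__.py files before its flag shows up.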

Let's take a look at some of the other functions.

Inside the _run function we have a Database context manager. This opens a connection to the database with the table specified, and also creates that table by automatically calling the _create_tables function. Check out the Database wiki for more information on database functionality. Change this _run function to download and store the information you need.
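To show the general shape of a _run function built around a context manager, here is a minimal sketch. It is an assumption-heavy stand-in: it uses sqlite3 from the standard library instead of the project's database, and Demo_Table, its columns, and the sample rows are invented. See the Database wiki for the real classes.

```python
import sqlite3


class Demo_Table:
    """Hypothetical table class; sqlite3 stands in for the real database."""

    name = "demo_roas"

    def __init__(self):
        # Connect, then create the table right away.
        self.conn = sqlite3.connect(":memory:")
        self._create_tables()

    def _create_tables(self):
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.name} "
            "(asn INTEGER, prefix TEXT)")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always close the connection; do not suppress exceptions.
        self.conn.close()
        return False


def _run():
    """Sketch of a collector's main function: get data, then store it."""
    # Made-up rows standing in for downloaded data.
    rows = [(64496, "192.0.2.0/24"), (64497, "198.51.100.0/24")]
    with Demo_Table() as table:
        table.conn.executemany(
            f"INSERT INTO {table.name} VALUES (?, ?)", rows)
        count = table.conn.execute(
            f"SELECT COUNT(*) FROM {table.name}").fetchone()[0]
    return count


stored = _run()
print(stored)
```

Using a with block means the connection is closed even if the download or insert step raises an error.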

Now let's look at tables.py. This file typically contains the database side of things: all the tables that get generated. Try to name your tables in a similar fashion. These tables inherit from a Generic_Table class. When this class is initialized, it connects to the database and calls the _create_tables function if it exists. The real dict cursor is used so that all return values are a list of dictionaries, which makes them easier to use. Note that this is not the most memory-efficient option, and other cursor factories should be used if memory consumption is a problem. Again, see the Database wiki for more information on how to use these classes.
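The pattern described above can be sketched as follows. This is not the real Generic_Table: it uses sqlite3 with a dictionary row factory so it runs without a database server, where the real code uses a real dict cursor. Demo_Generic_Table, Demo_ROAs_Table, the execute helper, and the sample row are assumptions for illustration.

```python
import sqlite3


def dict_factory(cursor, row):
    """Turn each result row into a dictionary keyed by column name."""
    return {col[0]: value for col, value in zip(cursor.description, row)}


class Demo_Generic_Table:
    """Stand-in base class; NOT the real Generic_Table."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.row_factory = dict_factory
        # Only call _create_tables if the subclass defines it.
        if hasattr(self, "_create_tables"):
            self._create_tables()

    def execute(self, sql, params=()):
        """Run a statement and return all rows as a list of dicts."""
        return self.conn.execute(sql, params).fetchall()


class Demo_ROAs_Table(Demo_Generic_Table):
    """Hypothetical table definition living in tables.py."""

    name = "demo_roas"

    def _create_tables(self):
        self.execute(
            f"CREATE TABLE IF NOT EXISTS {self.name} "
            "(asn INTEGER, prefix TEXT)")


table = Demo_ROAs_Table()
# Made-up sample row.
table.execute(f"INSERT INTO {table.name} VALUES (?, ?)",
              (64496, "192.0.2.0/24"))
rows = table.execute(f"SELECT asn, prefix FROM {table.name}")
print(rows)
```

Each row carries its own dictionary with the column names repeated, which is the memory cost mentioned above.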

There you have it. Please let me know any questions you might have. Take a look at the Utils section for things you may want to use in your submodule.

Ah, and don't forget to document! See the template for documenting a collector here: https://github.com/jfuruness/lib_bgp_data/wiki/Template-Collector-Wiki-Page