Use dictionary for dorks storage with regular dumps to json file #29

Closed
wants to merge 2 commits

Conversation

johnnykv
Member

While rewriting the dork_db with SQLAlchemy I realized that the dorks database does not in any way justify using a full-blown RDBMS... So here goes my take on a much simpler and more maintainable approach.

From commit message:

  • Complete rewrite of dork_db.py.
  • Dorks are now completely stored in memory.
  • Every 10th update the memory representation gets dumped to a JSON file.

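A minimal sketch of what the dictionary-based approach described above could look like (class name, dump path, and structure are illustrative, not the actual code in the two commits):

import collections
import json


class InMemoryDorkDB(object):
    """Keeps dorks in a plain dict and dumps them to a JSON file
    every DUMP_INTERVAL'th update."""

    DUMP_INTERVAL = 10

    def __init__(self, dump_file="db/dorks.json"):
        self.dump_file = dump_file
        # {table name: {dork content: hit count}}
        self.dorks = collections.defaultdict(dict)
        self.updates = 0

    def insert(self, tablename, content):
        table = self.dorks[tablename]
        table[content] = table.get(content, 0) + 1
        self.updates += 1
        if self.updates % self.DUMP_INTERVAL == 0:
            self.dump()

    def dump(self):
        with open(self.dump_file, "w") as dump_file:
            json.dump(self.dorks, dump_file)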
@ghost ghost assigned glaslos Jan 18, 2013
@glaslos
Member

glaslos commented Jan 18, 2013

I don't agree with your solution, but at the same time I agree that this has to change.
Let me explain: the SQLite dork db was implemented for set-ups without a full-blown database. For example, my sensors don't report into a local database but send the events via hpfeeds into a central database. This is also quite useful if you run Glastopf on "weak" hardware like virtual servers or Raspberry Pi-like systems, which is why I think the in-memory solution is not a good general approach.

Other options to solve this:

  • If you are logging events to a local database, the dork db is basically a smaller copy of the events db. Instead of having separate DBs, we could also leverage the data in the event database or create a linked table with just the unique request paths. This would also reduce the size of the events database.
  • If you are running Glastopf on a low-performance machine, we have to come up with something less heavy than keeping the whole thing in memory. I have had dork databases with close to 100k entries, which shouldn't be kept in memory. I'd go with keeping them in some format in a file and, when we need dorks, reading X random dorks from it (see the sketch after this list). The downside is that we lose run-time selection algorithms like "give me the 200 most attacked paths as dorks"...
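A minimal sketch of that second option, assuming the dorks live one per line in a plain text file (the file format and function name are illustrative, not part of this PR):

import random


def sample_dorks(path, x):
    """Pick x random dorks from a flat file without loading the whole
    file into memory (reservoir sampling)."""
    reservoir = []
    seen = 0
    with open(path) as dork_file:
        for line in dork_file:
            dork = line.strip()
            if not dork:
                continue
            seen += 1
            if len(reservoir) < x:
                reservoir.append(dork)
            else:
                # replace an existing pick with probability x / seen
                j = random.randrange(seen)
                if j < x:
                    reservoir[j] = dork
    return reservoir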

What do you think? (Going to look at your code after lunch...)

@johnnykv
Member Author

I see your point, I did not consider use cases involving embedded systems.
Losing the capability to do things like "200 most attacked paths..." would also be a shame. Actually I was in doubt myself, might just be me getting tired of SQL :-)

Take a look at the implementation (pretty much finished) done with SQLAlchemy. If you think that is better, no biggie - I'll just push that instead.

import datetime
import threading
import logging
from sqlalchemy import Table, Column, Integer, String, MetaData
from sqlalchemy import create_engine, select


logger = logging.getLogger(__name__)


class DorkDB(object):
    """
    Responsible for communication with the dork database.
    """

    sqlite_lock = threading.Lock()

    def __init__(self, dork_connection_string="sqlite:///db/dork.db"):
        meta = MetaData()
        self.tables = self.create(meta)
        self.engine = create_engine(dork_connection_string)
        # Create the tables if they do not exist
        meta.create_all(self.engine)
        self.conn = self.engine.connect()

    def create(self, meta):
        tables = {}
        tablenames = ["intitle", "intext", "inurl", "filetype", "ext", "allinurl"]
        for table in tablenames:
            tables[table] = Table(table, meta,
                                  Column('content', String, primary_key=True),
                                  Column('count', Integer),
                                  Column('firsttime', String),
                                  Column('lasttime', String),
                                  )
        return tables

    def insert(self, insert_list):
        if len(insert_list) == 0:
            return
        # TODO: exception handling - or fail hard?
        with DorkDB.sqlite_lock:
            conn = self.engine.connect()
            for item in insert_list:
                tablename = item['table']
                table = self.tables[tablename]
                content = item['content']

                #skip empty
                if not content:
                    continue

                dt_string = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                # check whether the content already exists - content is the primary key
                db_content = conn.execute(
                    select([table]).
                    where(table.c.content == content)).fetchone()
                if db_content is None:
                    conn.execute(
                        table.insert().values({'content': content,
                                               'count': 1,
                                               'firsttime': dt_string,
                                               'lasttime': dt_string}))
                else:
                    #update existing entry
                    conn.execute(
                        table.update().
                        where(table.c.content == content).
                        values(lasttime=dt_string,
                               count=table.c.count + 1))
        #TODO: Clean up db?


    def get_dork_list(self, tablename, starts_with=None):
        with DorkDB.sqlite_lock:
            table = self.tables[tablename]

            if starts_with is None:
                result = self.conn.execute(select([table]))
            else:
                # match entries whose content begins with starts_with
                result = self.conn.execute(
                    table.select().
                    where(table.c.content.like('{0}%'.format(starts_with))))

            # fetch the rows while still holding the lock
            rows = result.fetchall()

        return [row[0] for row in rows]
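Hypothetical usage of the class above (the dork values are made up):

dork_db = DorkDB("sqlite:///db/dork.db")
dork_db.insert([
    {'table': 'inurl', 'content': '/admin/login.php'},
    {'table': 'intitle', 'content': 'index of'},
])
admin_dorks = dork_db.get_dork_list('inurl', starts_with='/admin')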

@glaslos
Member

glaslos commented Jan 18, 2013

I think that is the way to go... But still, feel free to criticize it; I haven't put a lot of thought into it. Also, what do you think about having the events and the dorks in the same database in different tables? Instead of inserting events and dorks separately, we can insert the dorks as soon as we insert the event. Also, request_url in the events.db and content in the dork.db are the same, right? We could use the content column in the dorks.db as request_url in the events.db and save some space.
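A rough sketch of that single-database layout; the table names, the extra event columns, and the database file name are assumptions, only the shared content/request_url column comes from the discussion above:

from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        ForeignKey, create_engine)

meta = MetaData()

# One table of unique request paths, shared by event logging and dork generation.
paths = Table('paths', meta,
              Column('content', String, primary_key=True),
              Column('count', Integer),
              Column('firsttime', String),
              Column('lasttime', String))

# Events reference the shared path instead of storing request_url again.
events = Table('events', meta,
               Column('id', Integer, primary_key=True),
               Column('time', String),
               Column('source', String),
               Column('request_url', String, ForeignKey('paths.content')))

engine = create_engine("sqlite:///db/glastopf.db")
meta.create_all(engine)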

@johnnykv
Member Author

I like that idea, and it's also easy to implement. The only problem would be that if your sensor only uses hpfeeds for logging, you can't use the dork stuff.

@glaslos
Member

glaslos commented Jan 18, 2013

Just start by assuming we can create a SQLite database. I'll talk to Mark during the workshop to figure out if we can get a query interface in HPFeeds. Like, my sensor sends a request for dorks to hpfeeds and a "machine-learning-uber-beast" selects the perfect dorks for me and publishes them to a channel. This could mean that instead of showing the same top 10 dorks on all sensors, we can distribute the attack surface. If you think that's too slow, we can cache the dorks for X minutes on every sensor (see the sketch below). The uber-brain also gets the events and is able to evaluate the effectiveness of the used domain, the dorks, various configuration settings, and the location of the sensor.
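The per-sensor caching could look roughly like this; fetch_func and the default TTL are assumptions:

import time


class DorkCache(object):
    """Caches dorks fetched from a central service for a fixed number of
    minutes, so a sensor does not have to query it on every request."""

    def __init__(self, fetch_func, ttl_minutes=30):
        self.fetch_func = fetch_func
        self.ttl = ttl_minutes * 60
        self.dorks = []
        self.fetched_at = 0

    def get(self):
        if not self.dorks or time.time() - self.fetched_at > self.ttl:
            self.dorks = self.fetch_func()
            self.fetched_at = time.time()
        return self.dorks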

@johnnykv
Member Author

I will give it a shot, should be pretty easy to implement.

I like the idea of being able to query hpfeeds data - actually I like it so much that I already made an API for just that :-) It would be a no-brainer to extend that to output dorks.

@glaslos
Member

glaslos commented Jan 18, 2013

Well that's very cool! We might skip the request via HPFeeds and go directly via HTTP to your API.

@glaslos
Member

glaslos commented Jan 18, 2013

What do you think about using your RAPI service as bootstrapping for Glastopf sensors? So instead of loading the same dorks from the same database for every Glastopf sensor, let them ask your service for 1k mixed dorks (or let them provide some parameters if they are interested in something specific), which they then use to create the first dork pages.
You could also provide an RAPI call that delivers a special configuration with an identifier for every sensor and an HPFeeds key, and then we can track what kind of data we get back from them.
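Such a bootstrap call could be as simple as the following sketch; the endpoint URL, the count parameter, and the JSON response format are all made up:

import json
import urllib2  # Python 2, matching the rest of the code base


def bootstrap_dorks(api_url="http://example.org/rapi/dorks", count=1000):
    """Ask a (hypothetical) dork service for a mixed batch of dorks to
    seed a new sensor's first dork pages."""
    response = urllib2.urlopen("{0}?count={1}".format(api_url, count))
    return json.load(response)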

@johnnykv
Member Author

Yeah, entirely doable. It would require some interaction with the hpfeeds auth system. I'll get started with a simple, unauthorized dork service for a tryout. OK, we're way off topic on this issue. Closing :)
