Use dictionary for dorks storage with regular dumps to json file #29
Conversation
- Complete rewrite of dork_db.py.
- Dorks are now stored entirely in memory.
- Every 10th update, the in-memory representation is dumped to a JSON file.
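As a rough illustration of that approach (not the actual dork_db.py code), here is a minimal sketch of an in-memory dictionary store that dumps to JSON on every 10th update; the class name, file path, and dict layout are made up for the example:

```python
import json
import threading

DUMP_INTERVAL = 10  # dump to disk on every 10th update, per the commit message


class InMemoryDorkStore(object):
    """Illustrative sketch: dorks live in a dict, periodically dumped to JSON."""

    def __init__(self, dump_file="db/dorks.json"):
        self.dump_file = dump_file
        self.lock = threading.Lock()
        self.update_count = 0
        try:
            # reload the last dump if one exists
            with open(self.dump_file) as f:
                self.dorks = json.load(f)
        except (IOError, ValueError):
            # no dump yet (or unreadable): start empty,
            # e.g. {"inurl": {"/admin.php": {"count": 3}}}
            self.dorks = {}

    def insert(self, table, content):
        with self.lock:
            entry = self.dorks.setdefault(table, {}).setdefault(content, {"count": 0})
            entry["count"] += 1
            self.update_count += 1
            if self.update_count % DUMP_INTERVAL == 0:
                with open(self.dump_file, "w") as f:
                    json.dump(self.dorks, f)
```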
I don't agree with your solution, but at the same time I agree that this has to change. Other options to solve this:
What do you think? (Going to look at your code after lunch...)
I see your point; I did not consider use cases involving embedded systems. Take a look at the implementation (pretty much finished) done with sqlalchemy; if you think that is better, no biggie - I'll just push that instead.

```python
import datetime
import threading
import logging

from sqlalchemy import Table, Column, Integer, String, MetaData
from sqlalchemy import create_engine, select

logger = logging.getLogger(__name__)


class DorkDB(object):
    """
    Responsible for communication with the dork database.
    """
    sqlite_lock = threading.Lock()

    def __init__(self, dork_connection_string="sqlite:///db/dork.db"):
        meta = MetaData()
        self.tables = self.create(meta)
        self.engine = create_engine(dork_connection_string)
        # Create the database if it does not exist
        meta.create_all(self.engine)
        self.conn = self.engine.connect()

    def create(self, meta):
        tables = {}
        tablenames = ["intitle", "intext", "inurl", "filetype", "ext", "allinurl"]
        for table in tablenames:
            tables[table] = Table(table, meta,
                                  Column('content', String, primary_key=True),
                                  Column('count', Integer),
                                  Column('firsttime', String),
                                  Column('lasttime', String),
                                  )
        return tables

    def insert(self, insert_list):
        if len(insert_list) == 0:
            return
        # TODO: exception handling - or fail hard?
        with DorkDB.sqlite_lock:
            conn = self.engine.connect()
            for item in insert_list:
                tablename = item['table']
                table = self.tables[tablename]
                content = item['content']
                # skip empty
                if not content:
                    continue
                dt_string = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                # check the table if content exists - content is the primary key
                db_content = conn.execute(
                    select([table]).
                    where(table.c.content == content)).fetchone()
                if db_content is None:
                    conn.execute(
                        table.insert().values({'content': content,
                                               'count': 1,
                                               'firsttime': dt_string,
                                               'lasttime': dt_string}))
                else:
                    # update existing entry
                    conn.execute(
                        table.update().
                        where(table.c.content == content).
                        values(lasttime=dt_string,
                               count=table.c.count + 1))
        # TODO: Clean up db?

    def get_dork_list(self, tablename, starts_with=None):
        with DorkDB.sqlite_lock:
            table = self.tables[tablename]
            if starts_with is None:
                result = self.conn.execute(select([table]))
            else:
                result = self.conn.execute(
                    table.select().
                    where(table.c.content.like('{0}%'.format(starts_with))))
            return_list = []
            for entry in result:
                return_list.append(entry[0])
        return return_list
```
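For context, a quick usage sketch of the class above; the insert_list format follows the keys that insert() reads ('table' and 'content'), and 'inurl'/'intitle' are among the table names defined in create(). The example dork strings are invented:

```python
db = DorkDB("sqlite:///db/dork.db")
db.insert([
    {'table': 'inurl', 'content': '/phpmyadmin/index.php'},
    {'table': 'intitle', 'content': 'index of /backup'},
])
# prefix search over the stored dorks
print(db.get_dork_list('inurl', starts_with='/php'))
```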
I think that is the way to go... But still, feel free to criticize it; I haven't put a lot of thought into it. Also, what do you think of having the events and the dorks in the same database, in different tables? Instead of inserting events and dorks separately, we can insert the dorks as soon as we insert the event. Also, request_url in events.db and content in dork.db are the same, right? We could use the content column in dorks.db as request_url in events.db and save some space.
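To make that concrete, here is a hypothetical sketch of the shared layout: the events table references the dorks' content column via a foreign key, so the URL string is stored only once. Table and column names are illustrative, not the actual Glastopf schema:

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        ForeignKey, create_engine)

meta = MetaData()

# dorks table: 'content' doubles as the request URL
dorks = Table('dorks', meta,
              Column('content', String, primary_key=True),
              Column('count', Integer))

# events reference the dork content instead of duplicating the URL string
events = Table('events', meta,
               Column('id', Integer, primary_key=True),
               Column('request_url', String, ForeignKey('dorks.content')),
               Column('timestamp', String))

engine = create_engine("sqlite:///db/glastopf.db")
meta.create_all(engine)
```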
I like that idea; it's also easy to implement. The only problem would be that if your sensor only uses hpfeeds for logging, you can't use the dork stuff.
Just start by assuming we can create a sqlite database. I'll talk to Mark during the workshop to figure out if we can get a query interface in HPFeeds: my sensor sends a request for dorks to hpfeeds, and a "machine-learning-uber-beast" selects the perfect dorks for me and publishes them to a channel. This could mean that instead of showing the same top 10 dorks on all sensors, we can distribute the attack surface. If you think that's too slow, we can cache the dorks for X minutes on every sensor. The uber-brain also gets the events and is able to evaluate the effectiveness of the used domain, the dorks, various configuration settings, and the location of the sensor.
I will give it a shot; it should be pretty easy to implement. I like the idea of being able to query hpfeeds data - actually, I like it so much that I already made an API for just that :-) It would be a no-brainer to extend that to output dorks.
Well, that's very cool! We might skip the request via HPFeeds and go directly via HTTP to your API.
What do you think about using your RAPI service as bootstrapping for Glastopf sensors? So instead of loading the same dorks from the same database for every Glastopf sensor, let them ask your service for 1k mixed dorks (or let them provide some parameters if they are interested in something specific), which they then use to create the first dork pages.
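A hypothetical client-side sketch of that bootstrapping step, assuming the RAPI service exposed an HTTP endpoint returning dorks as JSON; the URL, query parameters, and response format are invented for illustration:

```python
import requests

# hypothetical endpoint and parameters - the real RAPI interface may differ
resp = requests.get("http://rapi.example.org/dorks",
                    params={"limit": 1000, "type": "mixed"})
resp.raise_for_status()
dorks = resp.json()  # assumed format: [{"table": "inurl", "content": "..."}, ...]

# seed the local dork database to build the first dork pages
db = DorkDB()
db.insert(dorks)
```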
Yeah, entirely doable. It would require some interaction with the hpfeeds auth system. I'll get started with a simple unauthorized dork service for tryout. OK, way off topic on this issue. Closing :)
While rewriting the dork_db with sqlalchemy, I realized that the dorks database does not in any way justify using a full-blown RDBMS... So here goes my take on a much simpler and more maintainable approach.
From the commit message:

- Complete rewrite of dork_db.py.
- Dorks are now stored entirely in memory.
- Every 10th update, the in-memory representation is dumped to a JSON file.