Project dead? Need takeover? #72
Comments
Hi @deadbits! Thank you very much for your offer. It's really appreciated! I definitely believe in Aleph, and the only reason it seems abandoned is indeed the lack of developers/time. Are you interested in leading the development here? What exactly do you have in mind? Bringing up this discussion already helps. =) |
Well, initially I have some ideas for a bit of everything. Is this the type of work you see as in line with the project? If so, I can take over leading development here, or at the very least implement some new features and help with PRs and issues.
General:
Collectors (updated):
Parsing & Enrichment (updated):
Exporting:
|
Some more feature thoughts (updated):
|
Hi folks. These are all great ideas!
I am the original developer of Aleph, and indeed I started reworking the whole code from scratch using Celery to make it more scalable. Unfortunately a bunch of stuff happened in the meantime and I had to put it on hold. Having more people working on the code is always appreciated.
I'll work with @merces to upload this code into a new branch and we can start from there. Sounds good?
Cheers
|
That sounds good to me. Thanks!
|
I know we'll likely move this to another ticket, or several, but I put all my ideas into one list so it's easier to view instead of my comments above:
- General
- Samples Object
- Collectors
- Plugins
- Decoders (subset of plugin to run under certain conditions?)
- Export Options |
Wow. This is a lot of good ideas indeed! Thanks for that, @deadbits! @jseidl We can leverage the "Development" branch, as it is not being used by anyone. Would you be able to upload this new code there by the end of next week? I think we should leverage the energy @deadbits is willing to put into it and start as soon as possible. 🙂 Thank you all! |
@merces There's also a handful of open-source Python libraries I have in mind to lean on for some of the ideas, so it's not all code written from scratch. I think a decent amount of them are quick wins, while others require more major work. Regardless, I'm definitely up for helping out in any way I can, and for working with you and @jseidl to figure out what should be kept or scrapped, what should be prioritized, etc. |
There are a lot of good ideas; some of them are actually part of the current Aleph state, such as keeping track of the sources (different filenames) and such. I have all my V2 code in a private Bitbucket repo because I was toying with having the core libs as a separate project, linked to the processor-nodes app and collector-nodes app via Git, but I'm still not sure about this approach. I think I have a presentation I did about my ideas for V2; I'll take a look over here and add the link to this topic.
The main idea of using Celery is benefiting from its inherent scalability and pluggability, which lets you use many different backends for (inter-process) messaging and lets you add nodes as needed (e.g. more collector nodes, or more processor nodes).The idea of having pluggable inputs/outputs and processor plugins is the core idea behind Aleph.
IIRC I had almost all of the core functionality already ported to the V2 scheme, but it isn't polished AT ALL. I have craved since the beginning to have TDD (unit tests) for every single module that we can.
I'm not a GitHub ninja, but I'm afraid we need something else to keep track of all the ideas, what is going into each work "round", etc. Any ideas? Can we use GitHub by itself without polluting the current release? I don't want the issues or ideas for the dev version showing up on the main project page.
Off the top of my head, the idea for V2 was having the core rely on Celery for distributed processing with an MQ backend, which would also serve as the transport channel between collectors and the processing cluster. This would allow anyone to develop a collector simply by having it, in the end, post the collected sample to the MQ on a given channel/topic. Also, a document database gathering all our extracted data (such as Elasticsearch) enables us to further datamine the crap out of it :). V2 does have pluggable output planned (like storing the samples in a sample vault, a simple filesystem, or Amazon S3).
In the end, the idea is for Aleph to be a framework, with pluggable/interchangeable parts that share the same common interface.
I'll gather all the resources I have for V2 and get back on this thread so we can schedule a Hangouts call or something to discuss.
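The collector/processor split over a channel/topic described above can be mocked in a few lines. In the real design the transport would be Celery over an MQ broker; here `queue.Queue` stands in so the sketch is runnable, and all names (`collector_publish`, `processor_consume`, the `samples` topic) are illustrative, not Aleph's actual API:

```python
# Broker-agnostic mock of the collector -> MQ -> processor flow.
# queue.Queue stands in for the real MQ (Celery/RabbitMQ) transport.
import hashlib
import queue

mq = {'samples': queue.Queue()}  # one channel/topic per sample stream

def collector_publish(raw_bytes, source):
    """Any collector only needs to know the topic name to participate."""
    sample_id = hashlib.sha256(raw_bytes).hexdigest()
    mq['samples'].put({'id': sample_id, 'data': raw_bytes, 'source': source})
    return sample_id

def processor_consume():
    """Processor nodes pull from the same topic; add nodes to scale out."""
    msg = mq['samples'].get()
    return msg['id'], len(msg['data'])

sid = collector_publish(b'MZ\x90\x00', source='filesystem')
got_id, size = processor_consume()
assert got_id == sid and size == 4
```

The point of the indirection is that collectors and processors never see each other, only the topic, which is what lets either side scale independently.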
Cheers!
Jan Seidl
http://wroot.org
http://www.linkedin.com/in/janseidl
This message contains confidential information and is intended only for the
individual named. If you are not the named addressee you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately by e-mail if you have received this e-mail by mistake and
delete this e-mail from your system. E-mail transmission cannot be
guaranteed to be secure or error-free as information could be intercepted,
corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
The sender therefore does not accept liability for any errors or omissions
in the contents of this message, which arise as a result of e-mail
transmission. If verification is required please request a hard-copy
version.
|
@jseidl As far as tracking "sprints", work progress, tasks, etc.:
- We could use GitHub's built-in "Projects" feature that is available for each repo (I think you just need to enable it in the repo settings page)
- Or something like Waffle.io, which does very much the same thing but is more full-featured, hooks into GitHub's API, and has some other neat integrations.
GitHub Projects (https://help.github.com/articles/about-project-boards/) can be restricted by the repo owner as to who can view/edit the issues on the board.
On Waffle.io I believe your project board can be private (don't quote me on that), and there's a free tier: https://waffle.io/pricing
Celery and an MQ is definitely a perfect approach for what you're talking about too. I was even going to suggest ZMQ before I finished reading your post. I like everything I hear so far :)
A hangout would be great to link up and figure out the planning/issues workflow for sure. I'm pretty flexible with my schedule, so anytime that's good for all of you. I'm in the US on Eastern Time; not sure where you all are, but I'm sure we can figure something out.
--
Adam M. Swanda
PGP: https://keybase.io/deadbits
On Jan 2, 2019 at 7:36 PM, <Jan Seidl ***@***.***)> wrote:
There are a lot of good ideas, some of them are actually part of the
current aleph state such as keeping track of the sources (different
filenames) and stuff. I have all my V2 code in a bitbucket private repo
because I was playing on having the core libs as a separate project, linked
to the processor nodes app and collector nodes app via GIT but I'm still
not sure on this approach. I think I have a presentation I did about my
ideas for V2, I'll take a look over here and add the link to this topic.
The main idea of using celery is benefiting from celery's inherent
scalability and pluggability which allows you to use many different
backends for messaging (inter-process) and allows you to add nodes as per
need (eg. add more collector nodes, or add more processor nodes as you
need). The idea of having pluggable inputs/outputs and processor plugins is
the core idea behind Aleph.
IIRC I had almost everything of the core functionality already ported to
the V2 scheme but it isn't polished AT ALL. I indeed crave since the
beginning for having TDD (unit tests) for every single module that we can.
I'm not a Github ninja but I'm afraid we need something else to keep track
of all the ideas, what is going into each work "round" and etc. Any ideas?
Can we use github by itself without polluting the current release? Don't
want like the issues or ideas for the dev version showing up on the main
project page etc.
At the top of my head, the idea for V2 was having the core relying on
celery for distributed processing with a MQ backend, which would also serve
as transport channel between collectors and the processing cluster. This
would allow anyone to develop a collector, simply by having it in the end,
posting the collected sample to the MQ in a given channel/topic. Also with
a document database gatheing all our extracted data (such as
elasticsearch), enables us to further datamine the crap out of it :). V2
does have pluggable output (like storing the samples in a sample vault, or
simple filesystem, or amazon S3) planned.
In the end, the idea is for aleph to be a framework, with
pluggable/interchangeable parts that share the same common interface.
I'll gather all the resources I have for V2 and get back on this thread so
we can schedule a hangouts or something to discuss.
Cheers!
Jan Seidl
http://wroot.org
http://www.linkedin.com/in/janseidl
This message contains confidential information and is intended only for the
individual named. If you are not the named addressee you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately by e-mail if you have received this e-mail by mistake and
delete this e-mail from your system. E-mail transmission cannot be
guaranteed to be secure or error-free as information could be intercepted,
corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
The sender therefore does not accept liability for any errors or omissions
in the contents of this message, which arise as a result of e-mail
transmission. If verification is required please request a hard-copy
version.
On Wed, Jan 2, 2019 at 4:18 PM Adam M. Swanda ***@***.***>
wrote:
> @merces <https://github.com/merces>
> Definitely a lot of ideas, indeed heh I don't know how many fit with the
> direction of this project and imho some would be higher priority than
> others. Not to mention implementing *all* those would take quite some
> time. Though, I think a decent amount of them are quick-wins while others
> require more major work.
>
> Regardless, I'm definitely up to helping out in any way I can, and working
> with you and @jseidl <https://github.com/jseidl> to figure out what
> should be kept or scraped, what should be prioritized, etc., etc.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#72 (comment)>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AARRoT8sGyXmlkMkfyyhrzWu_Ec2u-3oks5u_Uw-gaJpZM4ZiM-K>
> .
>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub (#72 (comment)), or mute the thread (https://github.com/notifications/unsubscribe-auth/ABRWFSRecpcFIJf6Lm5safPgdSqSw6glks5u_VBwgaJpZM4ZiM-K).
|
Right on! Actually, it is planned to support ZMQ for standalone applications.
Thanks for the suggestions on Projects and Waffle; I will definitely take a look.
I'm in the US on Pacific Time and have a pretty flexible schedule too. The other folks?
Jan Seidl
http://wroot.org
http://www.linkedin.com/in/janseidl
|
Hi folks,
After extensive digging, I've found the presentation. I think the thought process hasn't changed from when this document was created, except that instead of using Celery just for the scheduled tasks, the processor daemon itself will run on Celery to ease deployment and worker control, possibly with auto-scaling. Also, I'd like all the collectors to first save the sample locally and then consume the local file into the transport, to avoid losing the sample in case the connection fails abruptly or something else weird happens during collection.
On the processor side: on startup, reprocess samples left in the temp dir, and delete from the temp dir only after making sure all data is stored on the backends.
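That write-locally-first handoff could be sketched roughly like this; the spool directory, the `publish` callable, and the function names are all illustrative assumptions, not the actual V2 code:

```python
# Crash-safe handoff: persist the sample locally before publishing it to
# the transport; on startup, re-submit anything left behind; delete the
# local copy only once the backends have confirmed storage.
import os
import tempfile

SPOOL = tempfile.mkdtemp(prefix='aleph-spool-')

def collect(raw_bytes, name, publish):
    path = os.path.join(SPOOL, name)
    with open(path, 'wb') as f:   # 1. save locally first
        f.write(raw_bytes)
    publish(path)                 # 2. then hand off to the transport
    return path

def recover(publish):
    """On startup, reprocess samples left in the spool from a crash."""
    for name in os.listdir(SPOOL):
        publish(os.path.join(SPOOL, name))

def confirm_stored(path):
    """Called only once all backends acknowledged the sample."""
    os.remove(path)

sent = []
p = collect(b'\x00\x01', 'sample.bin', sent.append)
confirm_stored(p)
assert sent == [p] and not os.listdir(SPOOL)
```

The invariant is that a sample exists either in the spool or on the backends (or both, transiently), so an abrupt connection failure never loses it.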
I've never attached a PDF to a GitHub thread over email, so if the attachment doesn't upload I'll host it somewhere and put the link in a later comment.
Cheers!
Jan Seidl
http://wroot.org
http://www.linkedin.com/in/janseidl
|
Ok, attaching PDF from mail didn't work. Uploaded to my Drive here: https://drive.google.com/open?id=1lvNFhJcguHfLgXHm865XXWVnfahTQcOA |
I built a project similar to this architecture and this is definitely the best approach. I'm guessing you're already planning this, but storing locally by hash is a solid way to avoid collisions (instead of uuid4 or whatnot). Basically:
- Receive sample from wherever
- Store locally with a unique file name
- Put the file into transport
- When you're sure it's stored on the backend DB (or at least accepted by the Consumer as an Object), delete it locally
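A minimal sketch of that hash-addressed storage idea (the vault path and function name are hypothetical): naming the file by its SHA-256 digest means duplicate submissions land on the same path instead of piling up under random UUIDs.

```python
# Content-addressed local storage: the file name is the SHA-256 digest
# of the file's bytes, so identical samples deduplicate by construction.
import hashlib
import os
import tempfile

VAULT = tempfile.mkdtemp(prefix='aleph-vault-')

def store(raw_bytes):
    digest = hashlib.sha256(raw_bytes).hexdigest()
    path = os.path.join(VAULT, digest)
    if not os.path.exists(path):   # same content, same path: a no-op
        with open(path, 'wb') as f:
            f.write(raw_bytes)
    return path

a = store(b'malware sample')
b = store(b'malware sample')       # duplicate maps to the same file
assert a == b and len(os.listdir(VAULT)) == 1
```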
@jseidl We can schedule some time to sync up next week maybe? Or even this weekend, if that works for you. My weekday evenings are typically open; tomorrow I'm out most of the afternoon. Outside of that, I'm ready to get rolling 🚀 |
Sure Adam, anytime after 11am Pacific works for me.
Using UUID4 for the ID was something I was testing and I'm quite regretful, hehehe. Yes, we should name the local files by their hash.
Jan Seidl
http://wroot.org
http://www.linkedin.com/in/janseidl
|
Read through your presentation last night - good stuff! Overall it sounds like a really solid framework, and the ideas on how to scale it, create the components separately, etc., are all awesome.
I saw you noted that plugins would "run in order". I might have misread or skipped a part, but is the idea for plugins to run one at a time on any given Processor, or would plugins for a MIME type run in parallel via threading/multiprocessing?
These are thoughts for way down the road, but I had them on my mind after reading your PDF: one idea could be to give plugins an order of execution per MIME type, so each plugin can act on the results of the last. For example, a Zip file comes in, so it hits the "brute_zip" plugin; inside is an executable, so the "yara_scan" plugin runs; the results of "yara_scan" say the executable is Trojan ABC, so the "malware_decoder" plugin runs; and then "extract_iocs" runs on the results of "malware_decoder", and so on. That way you still get the results of all the plugins, but you also get deeper levels of context, as opposed to a flat "if file == EXE, run strings and extract_iocs" sort of thing. Basically, files would go down different plugin "paths" depending on their MIME type and any useful information from the previous plugin.
Also, the malware framework FAME has a pretty cool feature where a plugin inheriting the base class can declare "acts_on", "generates", "triggered_by", and a few others. It might be worth thinking about how to implement something similar: a plugin could "generate" alerts of various types, or be "triggered_by" another module, as in my example above. https://github.com/certsocietegenerale/fame/blob/ab0e9cc3640b2337dbd873a41e03987ba1ba8035/docs/modules.rst#scope |
Hi Adam.
In its current state, the plugin system has its own "acts_on": the MIME types of the files it should, well, act on. The "triggered_by" idea is very good; we should use that as well.
The plugin-chaining idea was for future-proofing, since having a separate class of plugins for intelligence already solves the immediate need for chaining.
As everything will be Celery, plugins could call the depending plugin directly, async, and it would be run by another worker node (IIRC), thus not hogging the pipeline that much. I don't know. This particular feature definitely needs more thinking.
Ping me directly so we can schedule our talk,
Cheers!
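The "acts_on" / "triggered_by" dispatch being discussed could be sketched roughly like this. This is a hypothetical illustration in the spirit of FAME's scope declarations, not Aleph's actual plugin API:

```python
from dataclasses import dataclass, field

@dataclass
class Plugin:
    """Hypothetical plugin descriptor (names are illustrative)."""
    name: str
    acts_on: set = field(default_factory=set)       # MIME types handled; empty = any
    triggered_by: set = field(default_factory=set)  # upstream plugins that must run first

def select_plugins(plugins, mimetype, completed):
    """Pick plugins whose MIME filter matches and whose triggers (if any)
    are already in `completed`, enabling chains like yara_scan -> malware_decoder."""
    runnable = []
    for p in plugins:
        if p.acts_on and mimetype not in p.acts_on:
            continue
        if p.triggered_by and not (p.triggered_by & completed):
            continue
        runnable.append(p.name)
    return runnable
```

With Celery, each selected plugin could be dispatched as its own async task, and a plugin finishing would re-run the selection with its name added to `completed`, so dependents fire on other worker nodes.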
|
Jan,
How's tomorrow sometime between 1PM and 4PM Eastern?
Otherwise I'm open essentially anytime after 5-6pm on weekdays. I'm kind of packed today being out and about.
Wire or Google Meet (or Duo) works for me, or there's https://appear.in/ which is pretty solid too. Just lmk what time works best for you and we can exchange invite details directly by email.
If @merces wants to be in on the call too, that'd be great. You both have much more knowledge of the current state of things than me heh
Thanks! Hope to chat soon!
--
Adam M. Swanda
PGP: https://keybase.io/deadbits
|
Hi Adam. 3pm works best for me (12pm here, PDT). I'm up for Duo.
|
Great! I'm driving at the moment, but once I'm stationary I'll shoot you an email with an invite.
|
@jseidl we'll have to use Meet since Duo is mobile only and doesn't support screen share etc. |
We can probably close this at this point 😏 |
Just want to know if this is officially dead. If so, is it deprecated in favor of another project? Is it just a lack of developers/time? Is there anything the community can do to help?
The initial concept here has really solid potential and I'd hate to see it just disappear to time on GitHub. Lmk how I can help!
CC:
@merces @jseidl @turicas