Message level processing #119

Closed
ashubhumca opened this issue Aug 14, 2015 · 5 comments

Comments

@ashubhumca
Contributor

Hi,

I started looking into Secor 2-3 days ago and found it really useful. Now I have a requirement to process individual messages before dumping them into Amazon S3. The processing could be a transformation, a projection, etc. After exploring the source code, I didn't find any place where we can plug in custom transformation logic.
My first question is: am I missing something in the source code? If yes, please let me know where I can do it. Otherwise, I've created another package (in my local repo) that can be used for this purpose, and I would love to contribute my extension to the Secor repo.

Thanks,
Ashish

@zackdever
Copy link
Contributor

We handle this by setting secor.message.parser.class in the configs to a custom parser we wrote. As I understand it, the preference is for really generic parsers to be included in Secor, but it might be nice if there were some place outside of Secor to collect everyone's more specific parsers.
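To illustrate the pattern, here is a minimal, self-contained sketch of how a class named in a config property can be loaded by reflection. It is not Secor's code: the `MessageParser` interface, `FirstFieldParser`, and the `Properties`-taking constructor are simplified stand-ins assumed for illustration; only the `secor.message.parser.class` key comes from this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical sketch of the "class name in config" plugin pattern; not Secor's actual code.
final class ParserPluginSketch {
    // Simplified stand-in for a message parser; Secor's real parser type has a different API.
    interface MessageParser {
        String extractPartition(byte[] message);
    }

    // Example custom parser: uses the first comma-separated field as the partition.
    static final class FirstFieldParser implements MessageParser {
        FirstFieldParser(Properties config) {
            // Config is unused here; the constructor shape is what the loader expects.
        }

        @Override
        public String extractPartition(byte[] message) {
            String text = new String(message, StandardCharsets.UTF_8);
            int comma = text.indexOf(',');
            return comma < 0 ? text : text.substring(0, comma);
        }
    }

    // Reads the class name from the config and instantiates it via its Properties constructor.
    static MessageParser load(Properties config) {
        String className = config.getProperty("secor.message.parser.class");
        if (className == null) {
            throw new IllegalArgumentException("secor.message.parser.class is not set");
        }
        try {
            Class<?> clazz = Class.forName(className);
            return (MessageParser) clazz.getDeclaredConstructor(Properties.class).newInstance(config);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

In a properties file, the same setting is a single line such as `secor.message.parser.class=com.example.FirstFieldParser` (hypothetical class name).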

@HenryCaiHaiying
Contributor

Secor's design philosophy is to have a simple data ingestion pipeline that gets the Kafka data into S3 as fast as possible and acts as the source of truth for Kafka data on S3. The more data transformation you add to Secor, the more delay you add and the more potential points of failure you introduce. Once the data is on S3, you have a variety of options for writing data transformation logic to post-process the data.


@pgarbacki
Contributor

+1 to what @HenryCaiHaiying wrote. There are better tools for stream processing, such as Storm or Samza.

@zackdever, a while back I created the secor-contrib repo, and I think it is a good place for less generic parsers: https://github.com/pinterest/secor-contrib

@ashubhumca
Contributor Author

Thanks to all of you for your replies. I agree with all the points you have mentioned.
The code I've added for my custom requirement won't cause any problems for the overall design of Secor. I understand this is a general tool for data ingestion, but sometimes there are very basic transformation requirements, such as projection or formatting, that won't be very CPU intensive. By default, no transformation is applied unless the user specifies one. Users should be aware of the performance impact of whatever transformation logic they choose to apply. This is just an extension of Secor's features.

Here is the basic idea:

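A transformation interface:

```java
package com.pinterest.secor.transformer;

// Transforms a single Kafka message before it is written out.
public interface MessageTransformer {

    public byte[] transform(byte[] message);

}
```

Then a default transformation class:

```java
package com.pinterest.secor.transformer;

import com.pinterest.secor.common.SecorConfig;

// Pass-through transformer: returns every message unchanged.
public class DefaultMessageTransformer implements MessageTransformer {

    protected SecorConfig mConfig;

    public DefaultMessageTransformer(SecorConfig config) {
        mConfig = config;
    }

    @Override
    public byte[] transform(byte[] message) {
        return message;
    }

}
```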

This default class does nothing (it returns each message unchanged) and will be set as the default in all the config properties files. For any custom transformation, users will have to plug in their own transformation class.
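A minimal sketch of how a custom transformer could plug into this proposal, assuming the configured class is instantiated by reflection and the pass-through default is used when nothing is configured. To keep it self-contained, `Properties` stands in for `SecorConfig`, the interface is redeclared locally, and the property names `example.message.transformer.class` and `example.projection.columns`, as well as `CsvProjectionTransformer`, are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical sketch: a projection transformer selected through configuration.
final class TransformerSketch {
    // Local copy of the proposed MessageTransformer interface.
    interface MessageTransformer {
        byte[] transform(byte[] message);
    }

    // Pass-through default (Properties stands in for SecorConfig).
    static final class DefaultMessageTransformer implements MessageTransformer {
        DefaultMessageTransformer(Properties config) {
            // Config unused; all transformers share one constructor shape.
        }

        @Override
        public byte[] transform(byte[] message) {
            return message;
        }
    }

    // Keeps only the configured columns of a comma-separated message, e.g. "0,2".
    static final class CsvProjectionTransformer implements MessageTransformer {
        private final int[] columns;

        CsvProjectionTransformer(Properties config) {
            String[] parts = config.getProperty("example.projection.columns", "0").split(",");
            columns = new int[parts.length];
            for (int i = 0; i < parts.length; i++) {
                columns[i] = Integer.parseInt(parts[i].trim());
            }
        }

        @Override
        public byte[] transform(byte[] message) {
            String[] fields = new String(message, StandardCharsets.UTF_8).split(",", -1);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < columns.length; i++) {
                if (i > 0) {
                    out.append(',');
                }
                out.append(fields[columns[i]]);
            }
            return out.toString().getBytes(StandardCharsets.UTF_8);
        }
    }

    // Instantiates the configured class, falling back to the pass-through default.
    static MessageTransformer create(Properties config) {
        String className = config.getProperty("example.message.transformer.class",
                DefaultMessageTransformer.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            return (MessageTransformer) clazz.getDeclaredConstructor(Properties.class).newInstance(config);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create transformer " + className, e);
        }
    }
}
```

Because the default returns the input array untouched, leaving the property unset adds essentially no work per message, which matches the point that no transformation happens unless the user asks for one.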
Please share your thoughts.

Thanks and Regards,
Ashish

@ashubhumca
Contributor Author

Support has been added.
