Cascalog should support nested tuples #22

Open
sritchie opened this issue Dec 6, 2011 · 8 comments
@sritchie
Collaborator

sritchie commented Dec 6, 2011

e.g.

(<- [?blarg ?pivot] (src ?blarg ?blurg) (pivot ?blurg :>> ?pivot))

Using :>> with a single var would capture the operation's output into a nested tuple (just a seq of fields).

Unclear how to handle nested serialization. Perhaps Cascading can handle nested tuples?
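
To make the proposal concrete, here is a minimal plain-Clojure sketch of the tuples such a query might emit. It does not use Cascalog at all: `pivot-stub` and `nested-query` are hypothetical stand-ins for the `pivot` operation and the query above, and the multiplication is arbitrary filler.

```clojure
;; Hypothetical stand-in for the `pivot` operation: emits two output fields.
(defn pivot-stub [blurg]
  [blurg (* 2 blurg)])

;; Emulates the proposed query over an in-memory source of [blarg blurg]
;; tuples: all of pivot's output fields land in one nested field.
(defn nested-query [src]
  (for [[blarg blurg] src]
    [blarg (vec (pivot-stub blurg))]))

(nested-query [["a" 1] ["b" 2]])
;; => (["a" [1 2]] ["b" [2 4]])
```

Each result tuple has two fields, the second of which is itself a seq of fields. That nested field is what raises the serialization question.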

@sritchie
Collaborator Author

sritchie commented Dec 6, 2011

Nested serialization is now trivial with Kryo.

@sritchie
Collaborator Author

I wonder what the efficiency (and syntax) of arbitrary destructuring forms in a query might be:

(let [src [[1 2 [[3] 4]]]]
  (<- [?three]
      (src [_ _ [[?three] ?four]])))
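
For comparison, ordinary Clojure destructuring already handles this shape outside of a query. The sketch below uses only core Clojure; `threes` is a hypothetical name, and this says nothing about how Cascalog would implement or optimize the same pattern.

```clojure
;; Same in-memory source as above: one tuple with three fields,
;; the last of which is nested.
(def src [[1 2 [[3] 4]]])

;; Plain Clojure sequential destructuring over each tuple.
(defn threes [tuples]
  (for [[_ _ [[three] _four]] tuples]
    three))

(threes src)
;; => (3)
```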

@isaiah

isaiah commented Mar 8, 2012

Arbitrary destructuring syntax is essential for coping with schemes such as elephant-bird's protobuf messages. The "flatten" keyword in Pig Latin works quite well in this situation.
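
As a rough illustration of the flattening idea, here is a plain-Clojure sketch that un-nests one level of a tuple, loosely analogous to applying Pig's FLATTEN to a tuple-valued field (bags behave differently in Pig). `flatten-one-level` is a hypothetical helper and the sample values are made up.

```clojure
;; Splices the fields of any nested sequential value into the enclosing
;; tuple; non-sequential fields are kept as they are. Only one level deep.
(defn flatten-one-level [tuple]
  (vec (mapcat #(if (sequential? %) % [%]) tuple)))

(flatten-one-level [1325376000 "user-1" [1980 5 17]])
;; => [1325376000 "user-1" 1980 5 17]
```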

@sritchie
Collaborator Author

sritchie commented Mar 8, 2012

Interesting, can you give me an example of how this would look in Cascalog, with a protobuf message?

@Quantisan
Collaborator

We sort of talked about this on the forum already.

message DateOfBirth {
  message Date {
    required int32 year = 1;
    required int32 month = 2;
    required int32 day = 3;
  }
  required int64 timestamp = 1;
  optional string user_id = 2;
  required Date date = 3;
}

To access this, we're currently using:

(defn to-dob-y-m-d [x]
  (let [y  (.getInteger x 0)
        m  (.getInteger x 1)
        d  (.getInteger x 2)]
    [y m d]))

(defn dob-generator [dir]
  (let [src    (hfs-protobuf dir Customer$DateOfBirth     ;; custom tap
                             :outfields customer-date-of-birth-names)]
    (<- customer-date-of-birth-fields
        (src :>> (to-cascalog-fields customer-date-of-birth-names))
        (to-dob-y-m-d ?date :> ?dob-year ?dob-month ?dob-day))))

Ideally, a destructuring approach (suggested by @pingles) would let us skip to-dob-y-m-d, so that we could write:

(defn dob-generator [dir]
  (let [src    (hfs-protobuf dir Customer$DateOfBirth     ;; custom tap
                             :outfields customer-date-of-birth-names)]
    (<- customer-date-of-birth-fields
        (src ?timestamp ?user_id [?dob-year ?dob-month ?dob-day]))))

@sritchie
Collaborator Author

sritchie commented Mar 8, 2012

I bet we could do this through something like a Cascalog destructuring protocol. Cascalog could provide implementations for the sequential data structures in Clojure, and users could extend the protocol to Thrift and Protobuf objects. This would be awesome.
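
A minimal sketch of what such a protocol might look like, in plain Clojure with no Cascalog dependency. Everything here is an assumption for illustration: the `Destructurable` protocol, its `fields` function, the `DateStub` record (standing in for a Thrift or Protobuf object), and the `destructure-into` helper are hypothetical names, not existing Cascalog APIs.

```clojure
;; Hypothetical protocol: anything that can expose its fields positionally.
(defprotocol Destructurable
  (fields [this] "Returns the object's fields as a seq, in positional order."))

;; Default implementation for Clojure's sequential data structures.
(extend-protocol Destructurable
  clojure.lang.Sequential
  (fields [this] (seq this)))

;; Stand-in for a user-supplied message type such as a Protobuf Date.
(defrecord DateStub [year month day])

;; A user extending the protocol to their own type.
(extend-protocol Destructurable
  DateStub
  (fields [this] [(:year this) (:month this) (:day this)]))

(defn destructure-into
  "Walks a nested pattern of symbols alongside obj and returns a map of
  symbol -> value. Vectors in the pattern recurse via `fields`; the
  symbol _ is ignored."
  [pattern obj]
  (cond
    (vector? pattern) (apply merge (map destructure-into pattern (fields obj)))
    (= '_ pattern)    {}
    :else             {pattern obj}))

(destructure-into '[?timestamp ?user_id [?dob-year ?dob-month ?dob-day]]
                  [100 "u1" (->DateStub 1980 5 17)])
;; => {?timestamp 100, ?user_id "u1", ?dob-year 1980, ?dob-month 5, ?dob-day 17}
```

The pattern walker only ever calls `fields`, so supporting a new object type would mean extending the protocol rather than touching the query syntax.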

I think it's time for me to buckle down and learn a bit more about core.logic so we can start pulling more ideas and syntax from that project. (Cascalog's Datalog is different from the Prolog in core.logic, but I'd like to follow their lead with destructuring, at least.)

I'm not sure when I'll be able to get to this, but I'd be happy to accept a pull request with some initial work. What do you think?

@Quantisan
Collaborator

Sounds good to me. Will start looking into it.

@pingles
Contributor

pingles commented Mar 9, 2012

Nice idea, Sam. I'll see if Paul and I can take a look some time today. I haven't looked through much of Cascalog yet, so any pointers to the current destructuring/var binding from tuples would be cool.

Time to dive into core.logic then :)
