Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support all value types #1580

Merged
merged 19 commits into from
Feb 17, 2015
Merged

Support all value types #1580

merged 19 commits into from
Feb 17, 2015

Conversation

otoolep
Copy link
Contributor

@otoolep otoolep commented Feb 12, 2015

No description provided.

@otoolep
Copy link
Contributor Author

otoolep commented Feb 12, 2015

WIP.

binary.BigEndian.PutUint64(buf[1:9], math.Float64bits(v))
case bool:
buf = make([]byte, 2)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benbjohnson -- from implementing this function, it seems to me we need to code more information in a (field ID, value) pair -- we must also encode the type. Can you confirm? unmarshalValues just assumes that it's a float type in the database, but that will no longer be true with these changes.

If I am correct, a single byte after the field ID would be sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't have to encode the type as that's kept in the metastore. Field id, followed by value (if a fixed size type) or in the case of bytes and strings, field id followed by length, followed by value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why is this code casting it to a float before sending it up? Is that redundant? It did occur to me that the info is in the metastore, and we could decode at a higher level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear -- the conversion is happening in unmarshalValues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like I could preform the type conversion using the metastore around here:

https://github.com/influxdb/influxdb/blob/master/server.go#L1839

so the low-level conversion to float (and support for other types at this low-level as implied by the comments) may be unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah -- unless your distinction is just between numeric (floats) with a fixed length and variable length like strings? I interpreted the comments to mean we needed a breakdown by all supported types at this low-level of the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ReadSeries function is actually cruft left over from stuff that was in there for debugging purposes. The real way to read raw series data is through the query engine, which now actually works.

There's some other spot in the code where you can get a value based on the field ID. At some point someone needs to cast this to the correct data type. That should be done in the query engine before it gets yielded to the mappers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah -- OK. I do recall @benbjohnson remarking that ReadSeries was debug. If casting strictly happens at the higher level, yeah, then at this low-level it really is just a bunch of bytes.

Let me take another pass.

@otoolep
Copy link
Contributor Author

otoolep commented Feb 12, 2015

Just for the record, unmarshalValues is also part of the production code, and is called while iterating over the shard (I had assumed after the discussion above that it was actually just used by ReadSeries). So right now all type "conversion" is being done at the low-level, and no reference is made to the Metastore when performing the conversion, that I can see.

@otoolep
Copy link
Contributor Author

otoolep commented Feb 12, 2015

Obviously using a single source of truth for field types would be best, of course.

@otoolep otoolep force-pushed the support_all_value_types branch 3 times, most recently from c99b16a to 335bad2 Compare February 13, 2015 01:21
@otoolep
Copy link
Contributor Author

otoolep commented Feb 13, 2015

@pauldix @benbjohnson -- please review these latest changes and let me know what you think. This is still WIP, but I think it's close. I'd like some feedback at this point. Points to note:

  • I basically just lifted the "marshal" and "unmarshal" code from shard.go. There are some potential optimizations there -- and we should do them -- but wanted to get this working first.
  • I just added a new interface to the cursor, such that it expects a field decoder. I think this is fine, but for now I just slammed the Measurement object in there. Doesn't feel great.
  • @benbjohnson suggested a dedicated field codec, with a static of set of fields, presumably to avoid races. But we seem to have the same race with mapValues and there is no locking there. Is this code special, or we just haven't fixed up mapValues yet?
  • All integers are still stored internally as floats.
  • While the build is green, I haven't actually added any tests for booleans and strings yet.

@pauldix
Copy link
Member

pauldix commented Feb 13, 2015

We need to also support raw bytes as a type.

On the mapValues part of things, almost all the aggregate functions look at only one field. It would be nice to have a function that takes this field ID and returns the raw value without marshaling it into a map. We should be able to just cast the raw bytes from the DB into the type.

This would save us a ton when aggregating a through a bunch of data by not allocating a bunch of maps.

@otoolep
Copy link
Contributor Author

otoolep commented Feb 13, 2015

How would a user insert raw bytes?

@benbjohnson
Copy link
Contributor

Materializing into a map was just to get it working. It definitely need to get optimized. As for the mapValues(), the lack of locking is a bug. Queries can potentially run for a while so we need to be able to copy the field metadata to the codec to be used during query execution.

@pauldix
Copy link
Member

pauldix commented Feb 13, 2015

I'm looking at some other stuff on the query side in terms of raw data queries. I think it would be useful to have the iterators yield the raw bytes to the map functions. Then have the iterator have a method that will pull a field value from those bytes.

This will be useful for queries that need to access multiple fields in the map portion of the query.

@otoolep otoolep force-pushed the support_all_value_types branch 2 times, most recently from b3cdf28 to 7ace67a Compare February 17, 2015 00:23
@otoolep
Copy link
Contributor Author

otoolep commented Feb 17, 2015

Field codec split into its own type, which allows the locking to be decoupled from the server lock. I still need to add support for other types, and have questions about interactions with the shard iterators.

@otoolep
Copy link
Contributor Author

otoolep commented Feb 17, 2015

Unit tests added for string and boolean types.

@pauldix
Copy link
Member

pauldix commented Feb 17, 2015

LGTM. Drop it like it's hot. Where's the Snoop Dogg emoji when you need it?

@otoolep
Copy link
Contributor Author

otoolep commented Feb 17, 2015

Thanks for the review @pauldix -- merging now.

otoolep added a commit that referenced this pull request Feb 17, 2015
@otoolep otoolep merged commit cd87583 into master Feb 17, 2015
@otoolep otoolep deleted the support_all_value_types branch February 17, 2015 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants