Alter MODE to work on all datatypes and return multiple values #521

Closed
wants to merge 3 commits into from

richthegeek
Contributor

This alteration allows MODE to be used to get, for example, the "10 most common strings" in a dataset.

I've never written in Go before, so if I've made some silly choices let me know!

I assume it needs some tests attached, although I'm not sure how to write these.

…rgument for how many modal values it will return
@richthegeek
Contributor Author

Old function signature:

MODE(column{int, float})

New function signature:
MODE(column, [size = 1])

@jvshahid jvshahid added this to the 0.6.2 milestone May 8, 2014
@jvshahid jvshahid self-assigned this May 8, 2014
@maxd
maxd commented May 9, 2014

I want to clarify how the size option works.

At present, the MODE function returns only the most frequent values. For example, for this dataset:

count of city_id   city_id
1439               498817
1439               524901
1416               472459
1416               500096
1416               520555
1416               1486209
1416               1497543
1355               2013348
1344               542374
1338               542420

The MODE function returns only the city_ids where count = 1439, i.e. 498817 and 524901.

Which values will be returned if I set size = 10 for this dataset?

Thanks

@richthegeek
Contributor Author

It will return the 10 most frequent values, in order of frequency. For this dataset, that is all ten rows in the order shown (the list is already sorted by count). It does this by taking items from the list in descending order of count until it has size values.

I have in fact just realised that this code changes how mode works: if you called mode(city_id) now, it would default to size=1 and thus return only 498817. I've commented on the commit with the fixes required; I'll commit those in a minute.

I think allowing any data-type in mode is a good thing, but perhaps the size option is better suited to a top(column, n) or similarly-named function?

@maxd

maxd commented May 9, 2014

I read an article about MODE, and I think that the size option is appropriate only for a top(column, size) function.

@jvshahid What do you think?

@richthegeek
Contributor Author

I've implemented the suggested change on a branch in my own repository (richthegeek/influxdb@dbaaacf55b)

This still allows mode(column) to work on all column types, whilst top(column, size = 1) is also available.

There's a lot of duplicated code in there, but it works.

@jvshahid jvshahid modified the milestones: Next release, 0.6.2 May 12, 2014
@jvshahid jvshahid modified the milestones: 0.7.2, Next release May 31, 2014
@jvshahid jvshahid closed this in cc5a5e8 May 31, 2014