Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional technical analysis algorithms #9260

Merged
merged 1 commit into from
Apr 24, 2018
Merged

Add additional technical analysis algorithms #9260

merged 1 commit into from
Apr 24, 2018

Conversation

phemmer
Copy link
Contributor

@phemmer phemmer commented Dec 24, 2017

This adds several technical analysis algorithms:

The exponential_moving_average algorithm also has a PR open at #8827, but this implementation has a few differences:

  1. The holdback period is customizable. Defaults to the algorithm's warmup period.
  2. Starts the algorithm using a exponential moving average with scaling alpha instead of a simple average. This results in a faster time in which the algorithm can start producing accurate values. Starting the algorithm with a simple average is still offered as a parameter.

The algorithms themselves are pulled in from an external library of mine. I have a personal closed-source application which uses them. I pulled the algorithms out of the app and dumped them into their own repo with MIT license. However I can instead copy them into the InfluxDB source if desired.

Will write documentation & tests if this clears initial review.

Example usage

Sample data

> select v from test where time >= 0 and time <= 8
name: test
time v
---- -
0    1
1    2
2    3
3    4
4    5
5    4
6    3
7    2
8    1

exponential_moving_average:

> select last(v),exponential_moving_average(last(v),3) from test where time >= 0 and time <= 8 group by time(1ns)
name: test
time last exponential_moving_average
---- ---- --------------------------
0    1    
1    2    
2    3    2.333333333333333
3    4    3.1666666666666665
4    5    4.083333333333333
5    4    4.041666666666666
6    3    3.520833333333333
7    2    2.7604166666666665
8    1    1.8802083333333333

> select last(v),exponential_moving_average(last(v),3,0) from test where time >= 0 and time <= 8 group by time(1ns) fill(none)
name: test
time last exponential_moving_average
---- ---- --------------------------
0    1    1
1    2    1.6666666666666665
2    3    2.333333333333333
3    4    3.1666666666666665
4    5    4.083333333333333
5    4    4.041666666666666
6    3    3.520833333333333
7    2    2.7604166666666665
8    1    1.8802083333333333

> select last(v),exponential_moving_average(last(v),3,0,'simple') from test where time >= 0 and time <= 8 group by time(1ns) fill(none)
name: test
time last exponential_moving_average
---- ---- --------------------------
0    1    1
1    2    1.5
2    3    2
3    4    3
4    5    4
5    4    4
6    3    3.5
7    2    2.75
8    1    1.875

double_exponential_moving_average

> select last(v),double_exponential_moving_average(last(v),3) from test where time >= 0 and time <= 8 group by time(1ns)
name: test
time last double_exponential_moving_average
---- ---- ---------------------------------
0    1    
1    2    
2    3    2.7777777777777777
3    4    3.8055555555555554
4    5    4.861111111111111
5    4    4.409722222222221
6    3    3.444444444444444
7    2    2.342013888888889
8    1    1.2309027777777777

triple_exponential_moving_average

> select last(v),triple_exponential_moving_average(last(v),3) from test where time >= 0 and time <= 8 group by time(1ns)
name: test
time last triple_exponential_moving_average
---- ---- ---------------------------------
0    1    
1    2    
2    3    2.9259259259259256
3    4    3.976851851851852
4    5    5.016203703703704
5    4    4.2824074074074066
6    3    3.158564814814815
7    2    2.0280671296296306
8    1    0.9584780092592595

relative_strength_index

> select last(v),relative_strength_index(last(v),3)from test where time >= 0 and time <= 8 group by time(1ns)
name: test
time last relative_strength_index
---- ---- -----------------------
0    1    
1    2    
2    3    
3    4    100
4    5    100
5    4    66.66666666666667
6    3    44.44444444444445
7    2    29.629629629629633
8    1    19.75308641975309

triple_exponential_average

> select last(v),triple_exponential_average(last(v),3)from test where time >= 0 and time <= 8 group by time(1ns)
name: test
time last triple_exponential_average
---- ---- --------------------------
0    1    
1    2    
2    3    
3    4    29.360465116279077
4    5    30.224719101123586
5    4    18.46419327006039
6    3    6.591405680990525
7    2    -3.0833618038947574
8    1    -11.478804970476785

kaufmans_efficiency_ratio

> select last(v),kaufmans_efficiency_ratio(last(v),3)from test where time >= 0 and time <= 8 group by time(1ns)
name: test
time last kaufmans_efficiency_ratio
---- ---- -------------------------
0    1    
1    2    
2    3    
3    4    1
4    5    1
5    4    0.3333333333333333
6    3    0.3333333333333333
7    2    1
8    1    1

kaufmans_adaptive_moving_average

> select last(v),kaufmans_adaptive_moving_average(last(v),3)from test where time >= 0 and time <= 8 group by time(1ns)
name: test
time last kaufmans_adaptive_moving_average
---- ---- --------------------------------
0    1    
1    2    
2    3    
3    4    3.4444444444444446
4    5    4.135802469135803
5    4    4.126248964928667
6    3    4.047019004728395
7    2    3.1372327804046636
8    1    2.1873515446692573

chande_momentum_oscillator

> select last(v),chande_momentum_oscillator(last(v),3) from test where time >= 0 and time <= 8 group by time(1ns)
name: test
time last chande_momentum_oscillator
---- ---- --------------------------
0    1    
1    2    
2    3    100
3    4    100
4    5    100
5    4    33.33333333333333
6    3    -33.33333333333333
7    2    -100
8    1    -100

> select last(v),chande_momentum_oscillator(last(v),3,-1,'exponential') from test where time >= 0 and time <= 8 group by time(1ns)
name: test
time last chande_momentum_oscillator
---- ---- --------------------------
0    1    
1    2    
2    3    
3    4    100
4    5    100
5    4    33.33333333333334
6    3    -11.111111111111105
7    2    -40.74074074074073
8    1    -60.493827160493815
Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)
Required only if applicable

You can erase any checkboxes below this note if they are not applicable to your Pull Request.

@ghost ghost added the proposed label Dec 24, 2017
@phemmer
Copy link
Contributor Author

phemmer commented Dec 24, 2017

test failure looks unrelated to changes in PR

@phemmer
Copy link
Contributor Author

phemmer commented Jan 9, 2018

Ping?

@e-dard
Copy link
Contributor

e-dard commented Jan 12, 2018

@phemmer these look really interesting, and I'm sorry for the lack of response. We've all been quite busy for the last couple of weeks 😄.

Pinging @rbetts to make sure that we don't lose track of this PR as it deserves some attention. Thanks again @phemmer for your continued contributions to the project!

@nathanielc
Copy link
Contributor

@phemmer I have given this a quick review and it looks good. I still haven't gone over the actual implementations yet.

As for whether you should bring the implementations into this repo directly I think that would make the most sense, but I don't have a strong opinion. I'll defer to @e-dard and other on that point.

@jsternberg
Copy link
Contributor

As an update, we are triaging this internally as we plan to review it in the future and attempt to merge whatever we can. I just wanted to comment because it appears as if we are doing nothing with this, but we are currently at the end of the 1.5 cycle and this would go into 1.6.

While I haven't looked at it yet, I will say in advance that I am going to ask the code used for the algorithms is moved into the influxdb repository itself either under a subpackage of the query package or somewhere under the tree.

I can't give a timeline of when we're going to get to a formal review at the moment and I apologize for the delay. We really appreciate your contribution and we have not forgotten about it.

@phemmer
Copy link
Contributor Author

phemmer commented Feb 7, 2018

That's fine. I'll work on getting the code moved.
One question I have is how to handle the copyright. For all external dependencies the code is stored in https://github.com/influxdata/influxdb/blob/master/LICENSE_OF_DEPENDENCIES.md. But in this case it's no longer a dependency. How do you want this addressed?

@jsternberg
Copy link
Contributor

Since the code will be moved into the repository itself, I think it would be covered under the CLA that you signed so I don't think it should be a legal issue.

@phemmer
Copy link
Contributor Author

phemmer commented Feb 7, 2018

The CLA just says that I authorize you to take the code I contribute and modify it (which is the part I'm assuming you're referring to). Which is valid. Just that the code contributed is coming from an external project with MIT license which requires that license to be referenced.
It could be as simple as just dropping a LICENSE file in the directory with an additional copyright line on top attributing it to Errplane Inc.

@jsternberg
Copy link
Contributor

jsternberg commented Feb 7, 2018

Is the referenced project yours? If it is, you own the copyright and I think the CLA would apply even without the MIT license since you're giving us permission to distribute and modify the code. It looks like the referenced library is coming from your GitHub.

If it's coming from a project that you don't own the full copyright to then we'll likely have to copy the copyright headers and isolate it into a subpackage within query. I might suggest copying it into a subpackage anyway since it'll probably be easier to manage the code than to just dump it into the query package.

I'll ask internally just to double check. We're not going to get to reviewing/merging it until after we start the 1.6 cycle, so updating the code to move it won't end up with this merged this week. I'm going to add this to the 1.6 milestone for now so we can come back to this when 1.6 starts.

@phemmer
Copy link
Contributor Author

phemmer commented Feb 7, 2018

It is a project I fully own, but the CLA doesn't grant the ability to strip the copyright of contributed code. If this were the case then it would essentially be saying that the code in InfluxDB is an original work and was not derived from an external source, and that my repo would be in copyright violation since it claims original copyright and does not claim to be a work derived from InfluxDB.

@jsternberg
Copy link
Contributor

I'll just double check with someone since I'm not a lawyer.

@jsternberg jsternberg added this to the 1.6.0 milestone Feb 7, 2018
@rbetts
Copy link
Contributor

rbetts commented Feb 8, 2018

@phemmer I don't fully understand your concern about licensing. The CLA allows us to modify, redistribute, and sub-license contributed code. You maintain full copyright control and can maintain, evolve, or contribute the code elsewhere as you wish.

Can you re-state what you are asking us to do with respect to copyright so that I can be sure I understand your request accurately?

@rbetts
Copy link
Contributor

rbetts commented Feb 8, 2018

@phemmer I was talking to @jsternberg some more about this and I think I might understand your question. When you author code, you can license that code to different people under different terms. For example, if you write a program, you can license that program to some users under MIT terms, to other users under a commercial license, and I suppose to yet more users under GPL terms if you really wanted.

When you contribute code you wrote to InfluxData - you contribute the code to us under the licensing terms of the CLA (contributor license agreement). That license doesn't restrict your copyright at all -- you are still free to re-license that code to others under the terms of your choice.

Does that help at all? You're such a great contributor and community member - we really appreciate the thought you give to InfluxDB. Thank you.

@phemmer
Copy link
Contributor Author

phemmer commented Feb 8, 2018

Ok, let me try putting it as a scenario:
If the code is copied into the influxdb repo without the LICENSE file stating that I was the original author, then the effective license on the code becomes the repo LICENSE, which attributes copyright to "Errplane Inc." There is no reference to where the code came from, and if you wanted to you could claim that you are the author of the code and that I copied it.

Not that I expect this will ever happen.

@pauldix
Copy link
Member

pauldix commented Feb 9, 2018

@phemmer any code contributed to this repo by external contributors (not InfluxData employees) is covered by the CLA. It doesn't give us copyright to the code, but it does give us a non-exclusive perpetual license to do whatever we want with it after it is contributed. That could be using it in this project, using it in a future closed source project of ours. The copyright notice is for the project as a whole and all of the code that our people contribute.

Since you're the author of the original code, you can do whatever you want with it. Meaning, you don't have to drag a license over from something else. Individuals and companies dual license code all the time. Most companies that have GPL or AGPL code offer it both under those terms or under a commercial license. You have the original in your closed source project, but since you own it you can contribute it to this repo without any license or copyright notice. You don't need to relicense it because the CLA covers that (as long as you're the original author and have ownership of the code). If you wrote this code for your employer in their closed source project then it would be more problematic.

Your authorship of the code you contribute is documented from the commit history in the repo and our mentions and thanks in the changelog.

I hope that makes sense. Please let me know if I'm misunderstanding your concern or what you'd like to have happen. We'd love to include this in Influx and appreciate all of your contributions to our projects.

@timhallinflux
Copy link
Contributor

@phemmer -- wanted to follow-up and confirm that your question was resolved by @pauldix 's response. If so, our plan is to get this integrated with our 1.6 release. Thank you!...we appreciate all of your efforts and contributions across the stack.

@phemmer
Copy link
Contributor Author

phemmer commented Apr 2, 2018

Did some research and found one stackexchange question which summarizes the issue fairly well: https://softwareengineering.stackexchange.com/q/188966/106782 . Just substitute "employer" with "Errplane, Inc".

Bottom line, as long as I keep my repo around which shows proof that I'm the original author of the code, then there's no question regarding my ability to do whatever I want to the code in the future (use, modify, re-license, sell, etc).

Now, that that's out of the way, I'll work on copying the code over. Probably won't get to it until next weekend at the earliest though.

@jsternberg
Copy link
Contributor

@phemmer hi, have you had any time to make the changes requested? If you need any help, I can move the code from your library directly into influxdb and make sure that I give you credit. Before it is merged, I will make sure that we have your approval and that any credit is given to you for your code.

Thanks for your contribution. We want to try and advance this so it gets into the 1.6 cycle if at all possible.

@timhallinflux
Copy link
Contributor

We are looking to cut the first release candidate in about a week.

@jsternberg
Copy link
Contributor

@phemmer I did the work to adapt your library with no (extra) dependencies in #9733. I don't really care which PR we work in, but it had enough changes I didn't want to clobber your personal branch without permission.

I'm going to go do a review over there since most of the work is yours. We'll need test cases on everything before merging, but we're much closer to merging.

query/compile.go Outdated
@@ -550,6 +556,161 @@ func (c *compiledField) compileMovingAverage(args []influxql.Expr) error {
}
}

func (c *compiledField) compileExponentialMovingAverage(name string, args []influxql.Expr) error {
if got := len(args) - 1; got < 1 || got > 3 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not use len(args) - 1 as the error message will be very confusing.

This applies to every new function.

query/compile.go Outdated
switch arg3 := args[3].(type) {
case *influxql.StringLiteral:
switch arg3.Val {
case "exponential_moving_average":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this, you can put the two accepted strings on the same line.

case "exponential_moving_average", "average":

Can we also name these "simple" and "exponential"? I don't like the length of exponential moving average and I would like it if we used the prefix for the warmup type. I notice that EMA stands for "exponential moving average" and SMA stands for "simple moving average" so labeling the field as "this is the moving average type" would mean that we don't need to use the longer name and I think it would reduce confusion.

I understand your concern so please push back on this if you feel that the longer name is very important.

}

// newChandeMomentumOscillatorIterator returns an iterator for operating on a triple_exponential_moving_average() call.
func newChandeMomentumOscillatorIterator(input Iterator, n int, nHold int, warmupType string, opt IteratorOptions) (Iterator, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like it if warmup type was passed in as the type rather than a string and it was parsed before this, but I also don't care enough to block the PR since I can change this myself. I'm just making this comment as a note that I'll probably change the argument for this at some point.

Copy link
Contributor Author

@phemmer phemmer Apr 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this, but there's one somewhat messy situation: select.go and functions.go.
Basically there's 2 ways of doing a CMO. One method uses a smoothing technique, and the other uses no smoothing. In order to handle this, I had to change gota.WarmEMA to int8 so I could set it to -1 in the case of "none" so that we could differentiate the two types of CMO.

The alternative is to name these functions completely differently. E.G. chande_momentum_oscillator and chande_momentum_oscillator_smoothed.

@jsternberg
Copy link
Contributor

Just a few simple comments. If you make all of the changes that I have suggested (minus the string comment which you can do or not do, I don't really care), then please squash this into a single commit with the necessary commit message detail and I'll have this merged by the end of today.

This adds numerous technical analysis algorithms:

* exponential_moving_average
* double_exponential_moving_average
* triple_exponential_moving_average
* relative_strength_index
* triple_exponential_average
* kaufmans_efficiency_ratio (commonly referred to as just "Efficiency Ratio")
* kaufmans_adaptive_moving_average
* chande_momentum_oscillator (both the common 'smoothed' version, and the ta-lib version)
@phemmer
Copy link
Contributor Author

phemmer commented Apr 24, 2018

Comments address & squashed. See note about CMO/CMOS: #9260 (comment)

@jsternberg
Copy link
Contributor

So I think the proper way to do what you said with the warmup type is to add a default type that acts as the zero value instead of using -1. But, I'm going to consider the code at this moment complete so this is ready to merge if we need to.

I'll put together unit tests and get them merged either today or later this week. I'm going to use the same inputs/outputs that the sub-library uses so this will just be a check to make sure they use them correctly.

@jsternberg jsternberg merged commit ba8dabb into influxdata:master Apr 24, 2018
@ghost ghost removed the proposed label Apr 24, 2018
@phemmer
Copy link
Contributor Author

phemmer commented Apr 24, 2018

So I think the proper way to do what you said with the warmup type is to add a default type that acts as the zero value instead of using -1

I thought of this, and decided against it. If there were a use for a zero-value warmup type, I might be inclined. But within the gota code, there isn't one. The only use is outside the code, in the query parser, to decide which algorithm to invoke. It's more an abuse of the parameter than anything. A zero-value would still be an abuse as it it only means something to the parser, not gota.

@phemmer phemmer deleted the ta_algs branch April 24, 2018 16:12
@phemmer
Copy link
Contributor Author

phemmer commented May 15, 2018

@jsternberg I was talking with @sanderson the other day about these functions, and naming the TRIX function triple_exponential average is really bugging me. I think the name is too likely to be confused with triple_exponential_moving_average. The problem is that everyone refers to it as "TRIX", and there's no real consensus on the long name. I'm thinking it might be a good idea to rename it before a full release and we're stuck with it. Maybe triple_exponential_change_rate, or triple_exponential_oscillator

@jsternberg
Copy link
Contributor

@phemmer we're not really sure what that means because we're not very familiar with the functions. Is this algorithm implemented in any other database for us to copy from?

Whatever you think is the better name, we'll likely use. Just make a PR for the name you think is most appropriate and I'll review it before we release 1.6.

@phemmer
Copy link
Contributor Author

phemmer commented May 16, 2018

The algorithm is implemented by lots of other tools, but they literally call it "trix". However InfluxDB's function names are all long names, and don't use abbreviations, so I'm trying to be consistent in that regard.

I'll ponder a name and put up a PR today.

@jsternberg
Copy link
Contributor

If the function doesn't really have a long name, it might be better for us to use short names for all of the algorithms where short names are the "common name". It's likely best to choose the function name that would be the first to pop into the head of a person who is reasonably familiar with the problem space, especially with these kinds of technical analysis algorithms that are probably frequently used by people in statistics, but not commonly known outside of that.

We can always document the full name in the documentation with a link to the entry in Wikipedia or somewhere else.

@sanderson
Copy link
Contributor

@phemmer Any chance you could give me a quick rundown of required and optional arguments for each of these functions?

@phemmer
Copy link
Contributor Author

phemmer commented May 18, 2018

Provided the answer over on influxdata/docs.influxdata.com-ARCHIVE#1553

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants