# Matching #136

Merged
merged 17 commits into from Oct 17, 2014

## Conversation

Projects
None yet
2 participants
Contributor

### AlexisEidelman commented Aug 31, 2014

 No description provided.

### AlexisEidelman added some commits Apr 8, 2014

``` more matching functions ```
``` 37c1982 ```
``` Ranking Matching ```
``` 6dcec12 ```

### gdementen commented on `src/matching.py` in `6dcec12`Apr 9, 2014

 the test is not needed IMO since it is a new function
Member Author

 OK

### gdementen commented on `src/matching.py` in `6dcec12`Apr 9, 2014

 This "takes" the individuals with the lowest scores. Is that intended? Taking those with the highest scores seems more logical to me.

### gdementen replied Apr 9, 2014

 Forget that... These are ranks not scores, so it is pretty logical as is.
Member Author

### AlexisEidelman replied Apr 9, 2014

 In fact, both can be right, we just have to chose one and define it as a norm. For example we can rank people from the youngest to the oldest or from the oldest to the youngest. It's not obvious wether "rank1 = age" should set the youngest or the oldest in first position. That raise the idea of having a 'decreasing' boolean option, as in sorting function in general.

### gdementen replied Apr 9, 2014

 True, it would be better to have either score here, or change other methods to have rank instead of score. This is an arbitrary choice indeed. For me, the naming "implicitly" sets the default order: taking the highest "score" and smallest "rank" seem logical, but using the same vocabulary everywhere seems like a good idea. Having a "reverse" argument like in the sorted() Python builtin wouldn't hurt...

### gdementen replied Apr 9, 2014

 Hmm, changing other methods to have rank instead of score is not really an option because in most cases, the score is actually a probability, which means it is much more practical to take the highest first.

### gdementen commented on `src/matching.py` in `37c1982`Apr 9, 2014

 typo in class name: Neighbour !

### gdementen replied Apr 9, 2014

 Unless you have some other matching function in mind that would inherit directly from Score_Matching, I do not think it makes sense to separate Score_Matching and Closest_Neighbourg.

### gdementen replied Apr 9, 2014

 and by the way, I am a bit puzzled why you named it Closest_Neighbour, is it a standard name for the technique in the literature? I fail to see what is the relation to a "closest neighbour" algorithm...
Member Author

### AlexisEidelman replied Apr 9, 2014

 Sorry for the typo. :) Yes I've something else in mind. We can do everything we want with what I called closest neighbour algorithm it's not the lonely solution. There is some other ways of matching (see for example the discussion on liam-dev google groups "Probability-weighted option for order of decreasing difficulty (ODD) matching in LIAM2"). As I told you in past I'm thinking of implemented an optimal matching method where the goal is minimazing for example the sum of distance of all pairs. I wasn't really confident with the name I wrote. First, nearest neighbour is better. But I didn't find a better name. I wanted to explicit that the method is based on a list and an iteration. Why did I reject Ordered_Matching ? Because there can be some confusion with Ranking matching. Sequential_matching ?

### gdementen replied Apr 9, 2014

 SequentialMatching sounds good to me

### gdementen commented on `src/matching.py` in `37c1982`Apr 9, 2014

 missing "raise" statement

### gdementen commented on `src/matching.py` in `37c1982`Apr 9, 2014

 the orderby field and argument (and the related traverse and collect_variables methods) should probably be moved to Classic_CN_Matching since Closest_Neighbourg is apparently meant as a "generic"/abstract class, it should be general.
Member Author

### AlexisEidelman replied Apr 9, 2014

 I did that because both Classic_CN_Matching and ODD_CN_Matching share these methods so I thought it was general in that way but I may not use well the heritance in python.

### gdementen replied Apr 9, 2014

 Well, that's the whole point of my comment: ODD_CN_Matching does not use the orderby field, nor needs the overridden traverse and collect_variables methods. The only common bit is the evaluate method which can stay where it is.
Member Author

### AlexisEidelman replied Apr 9, 2014

 Yes, you're right. It makes me realize that ODD_CN_Matching is in fact a Classic_CN_Matching where the orderby is calculated (as you told below). So, now I support the "argument option" more than two function. I'll change that.

### gdementen commented on `src/matching.py` in `6dcec12`Apr 9, 2014

 uh? AFAIK, this is an invalid import statement

### gdementen commented on `src/matching.py` in `37c1982`Apr 9, 2014

 I wonder if it wouldn't be better to have both under the same "matching" function with an "algorithm" argument, akin to the "method" argument you did for "sidewalk" alignment. That is what I would have done anyway, but I am unsure it is better. If you have an opinion about this, please share it. ;-)
Member Author

### AlexisEidelman replied Apr 9, 2014

 In general, I think it more user-friendly to have option more than different function name when the deep goal of the function is the same. But I did it that way because of the "difficulty" argument which is useless in Classic_CN_Matching. So I preferred to explicit it rather than having a argument for Classic_CN_Matching with no interest. That's my opinion but I think there are good reasons for both options but no decisive one.

### gdementen replied Apr 9, 2014

 It would make sense to rename difficulty to orderby, as it is really what it is anyway. We would then have two "predefined" string values for it, or the most general way: the user passes the orderby vector directly.
Member Author

### AlexisEidelman replied Apr 9, 2014

 Agree, cf above.

### gdementen commented on `src/matching.py` in `37c1982`Apr 9, 2014

 invalid import?
Member Author

### AlexisEidelman replied Apr 9, 2014

 It works for me. What do you mean ?

### gdementen replied Apr 9, 2014

 ah well, it works indeed, but you should rather use "import itertools"

### gdementen commented on `src/matching.py` in `37c1982`Apr 9, 2014

 missing raise

### gdementen commented on `6dcec12`Apr 9, 2014

 since "Ranking_Matching" is not present in "functions", it is not available in models!
Member Author

### AlexisEidelman replied Apr 9, 2014

 True but as I didn't tested it, I don't consider it as finished :) That why I didn't push it to liam2/liam2. But your comments are welcomed right now !

### gdementen replied Apr 9, 2014

 You did send a pull request though... Maybe you didn't intent to have your PR "updated" when you committed your second change to TaxIPP-Life:master but that's how github works, and that's why it is usually recommended to create a specific branch per PR. By the way, I was about to write that your PR needs tests before I merge it ;-) … On Wed, Apr 9, 2014 at 11:44 AM, AlexisEidelman ***@***.***: True but as I didn't tested it, I don't consider it as finished :) That why I didn't push it to liam2/liam2. But your comments are welcomed right now ! ## Reply to this email directly or view it on GitHubhttps://github.com/TaxIPP-Life/liam2/commit/6dcec1276f3b64b990192d947d5a646cc3867743#commitcomment-5953630 .

### gdementen commented on `src/matching.py` in `37c1982`Apr 9, 2014

 'square distance to the other mean' looks like a misnomer: there is nothing squared about it, see above for some thoughts.

### gdementen commented on `src/matching.py` in `37c1982`Apr 9, 2014

 Those two seem too long to be practical to type.

### gdementen replied Apr 9, 2014

 "sum of squares" is actually "Sum of Squared Difference to Mean" => we could call it "SSDM". But since this is very close to (and in fact would result in the same results than) the variance (over all the fields/variables of the expression), we could call it something like "var" or "fieldsvar" or even "varvar" ;-) "square distance to the other mean" is actually "Distance to Other Mean" => call it "DOM"?

### AlexisEidelman added some commits Apr 13, 2014

``` matching after Gaëtan's comments ```
``` ca41a95 ```
``` matching: doc and changelog ```
```No test because, there is nothing to do except adding other matching
options in simulation.yml```
``` 2fe472b ```
``` add arguments to the function in loop_wh_progress ```
``` 7efd233 ```
``` add a pool_size parameter in matching function ```
```Oops, there is something wrong with change/version_0_8_2.rst, I'll fix
it in an incoming commit.```
``` 32af7f8 ```
``` matching changes ```
``` 078e584 ```
``` change rank for score ```
``` f3222e0 ```
``` step for more general MatchingClass ```
``` 2cd0b0e ```
``` new organisation of matching functions ```
``` 1c2d3b9 ```
``` small correction ```
``` e0b00a8 ```
``` oups ```
``` 711d227 ```
``` test for matching functions ```
``` 405b105 ```
``` small improvment in doc about matching ```
``` 34d7c66 ```
``` small corrections ```
``` 5d59937 ```

### gdementen reviewed Sep 1, 2014

 from expr import expr_eval, collect_variables, traverse_expr from exprbases import EvaluableExpression from context import context_length, context_subset, context_delete from utils import loop_wh_progress # TODO: a quoi sert le collect variable ?

#### gdementen Sep 1, 2014

Member

for various reasons, including testing whether all columns/variables needed for an expressions are present in the context. However, in the devel branch, it is not needed anymore to define one for each function you add as there is a generic one which works on all functions.

### gdementen reviewed Sep 1, 2014

 *generic setup* :: matching(set1filter=boolean_expr,

#### gdementen Sep 1, 2014

Member

should be rank_matching

### gdementen reviewed Sep 1, 2014

 def __init__(self, set1filter, set2filter, score, orderby): class ScoreMatching(EvaluableExpression): ''' General framework for a Matching based on score That kind of matching doesn't

Member

doesn't what?

Member

### gdementen commented Sep 1, 2014

 Thanks for following up on this! The tests you added are a good start: they ensure the functions are actually callable and do not crash... But I was hoping for "real" tests with asserts which check whether the results are correct/meaningful or not. The reason I insist so much that you write those, instead of writing them myself is that I am unsure what it should return and understanding the functions deeply enough takes more time than I have available. I know that the normal "matching" function does not currently have those. This is because that part dates back to when the asserts were not implemented. All new code should have proper tests. That is the only rule I will be inflexible about. My other comments are all informational only... PS: Why didn't you update your previous PR?

### AlexisEidelman added some commits Sep 1, 2014

``` more documentation ```
``` 2eafc60 ```
``` matching test ```
``` 1ad7679 ```

### gdementen added a commit that referenced this pull request Oct 17, 2014

``` merged PR #136 (more matching functions) ```
``` 89c7024 ```
Member

### gdementen commented Oct 17, 2014

 this is in 89c7024 ! Sorry it took so long, and thanks for the great contribution!
Member

### gdementen commented Oct 17, 2014

 as a follow-up, it would be nice to tackle #145