Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making custom (ternary) util.classification #49

Closed
mtb-za opened this issue Feb 17, 2021 · 10 comments · Fixed by #60
Closed

Making custom (ternary) util.classification #49

mtb-za opened this issue Feb 17, 2021 · 10 comments · Fixed by #60
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@mtb-za
Copy link
Contributor

mtb-za commented Feb 17, 2021

Is your feature request related to a problem? Please describe.
Is it possible to create a ternary classification (such as the USDA's sand/silt/clay, or QAP(F) one)? Looking at the source code, it seems to only have the x and y axes. Being able to add such classifier polygons to a plot would be quite nice for visualisation, as well as allowing for 'easy' classification of data that is often on a ternary plot.

Describe alternatives you've considered
This might be possible using some of the sklearn bits, but the current TAS example is very nice and clean, and it would be nice to be able to do something similar.

I started with the following to create the USDA soil classification scheme (at the bottom of this page), modelled on the tas .json file.

usda_poly = {"name":"USDA",
 "axes":{"x":"Clay", "y":"Sand", "z":"Silt"},
 "fields":
     # just doing the one field as a test.
     {'clay':{"name":'clay'},"poly":[[100,0,0], [55,45,0], [40,45,15], [40,20,40], [60,0,40]]}
}

This should be called using something like usda = PolygonClassifier(name=usda_poly['name'], axes=usda_poly['axes'], fields=usda_poly['fields']['clay']), from my understanding of the PolygonClassifier constructor, but obviously this does not work, because I have three axes, not two.

This would possibly link quite nicely with #26?

@mtb-za mtb-za added the enhancement New feature or request label Feb 17, 2021
@morganjwilliams
Copy link
Owner

Hey @mtb-za! I think this would be a good feature to add, it's currently not possible off-the-shelf based on what's there at the moment.

There's a little bit of complexity in it, as the current implementation is based on some matplotlib 2D-only components (as you note). To make this possible, there are a few options:

  1. re-implement matplotlib.patches.Polygon and make a Polygon (or even TernaryPolygon) which is able to be added to a ternary axis. Polygons haven't been implemented explicitly for mpltern (which pyrolite uses for ternary plots), but there is the ability to 'fill within point bounds'. If we did implement something, I'd suspect Yuji (the author of mpltern) would b interested in integrating it there.
  2. do some internal ternary-to-xy conversions and use 2D polygons which are anchored in axes coordinates rather than data coordinates. This might limit the use of some of the available mpltern features (axes re-limiting, for example).
  3. sacrifice some generalisability and added functionality which comes with matplotlib out of the box, and instead just plot some ternary lines (maybe together with a ternary fill, after dealing with formatting/configuration separately) - in which case it's relatively straightforward. As this would be in data coordinates, I think the relimiting etc should still work as expected.

My preference would generally be for the first, but I'll need to look into what dots need to be connected in order to make it work. If it's going to be fairly involved, we could implement the last idea in the interim.

@morganjwilliams
Copy link
Owner

morganjwilliams commented Jun 28, 2021

Hey @mtb-za! I've added an implementation for these ternary classification diagrams and added the USDA soil texture classification as an example. There's an issue with the docs builds at the moment but will also look to add an example diagram there for this, and think about adding a few more (e.g. QAP, potentially with a second upside-down axis for AFP..).

For the moment this uses the second solution above; there will likely be a few limitations but it's working for now and allows some generalization of these diagrams regardless of whether they're x-y or ternary.

@mtb-za
Copy link
Contributor Author

mtb-za commented Jun 28, 2021

Sounds really promising! I am happy to add a couple of standard examples (QAP, AFP, maybe the USDA soil plot although that is not geochemical so let me know if you would rather not have it) to the docs once I know how the basics works. Is it pretty much the same process as for the 2D case?

@morganjwilliams
Copy link
Owner

morganjwilliams commented Jun 28, 2021

I added the USDA soil one as an example on develop - you can check out the config file here or the classifier here (note how it's much less work than TAS!), but it's basically the same as the 2D case with a few modifications to axes labels etc.

As for the docs - there's a third party package issue for the moment; it might take some debugging to sort out (or it might sort out when the next version of that package is released..).

@morganjwilliams
Copy link
Owner

@mtb-za I've addressed the build problem with the docs for now; let me know if you'd like to add some classifiers and/or examples before I release v0.3.1.

@mtb-za
Copy link
Contributor Author

mtb-za commented Jul 7, 2021

Just noticed that you already implemented the USDA one. 😅

It looks like it is much less work because you are not adding the fields to the plot? The add_to_axes() method is the bulk of the TAS code.

Thanks for this though, I am sure that it will be very useful.

I think it is worth adding the QAP ones (both for volcanic and plutonic rocks). I will look at doing a PR with those tomorrow.

@mtb-za
Copy link
Contributor Author

mtb-za commented Jul 8, 2021

Been working on this, and I have a possible concern that should not hold up getting the functionality released if you are close, because it might be a pain to fix properly.

The QAP diagram has both the left and right sides of the triangle going up to 100 at the top. This is not a standard ternary plot though.

cf, amongst many examples:
and

This can be compared to the one that I have started creating:
Figure_1

I think that this is different enough that it will not actually plot things correctly given how I am currently making the boundaries, unless I have missed an option to create a base plot like this?

@morganjwilliams
Copy link
Owner

Hey @mtb-za! The notation for the QAP plot is nonstandard and likely arranged this way to some degree because of the common QAPF format, but doesn't affect the actual mineralogical compositions - so overall it should be fine. The main thing to watch out for is that the lines projecting from the quartz axis represent plag/alkali-feldspar ratios, and aren't constant proportions of either - so the line on the left from (0, 0.9, 0.1) to quartz at (1.0, 0, 0) represents A/P = 9, and passes through (0.6, 0.36, 0.04) rather than (0.6, 0.3, 0.1). The only other issue is that the left axis is alkali feldspar rather than Anorthosite. Hope that makes sense, let me know if you run into any other issues! The past week has been pretty busy so the release is a little while off yet.

@mtb-za
Copy link
Contributor Author

mtb-za commented Jul 21, 2021

The only other issue is that the left axis is alkali feldspar rather than Anorthosite.

Whoops! Not sure how I did that....

I am taking a look at the other stuff now. Certainly more complex than I anticipated, but I think that I have the hang of it now.

Do you have a preference for how to handle the granite/syenogranite/monzogranite field? I assume (but have yet to test) that fields can not overlap, but syenogranite and monzogranite combine to be the granite area.

@morganjwilliams morganjwilliams linked a pull request Jul 22, 2021 that will close this issue
8 tasks
@morganjwilliams
Copy link
Owner

Thanks for starting a PR - I'll add some comments over there.

For the granite/syenogranite/monzogranite field, you're correct that it won't work as expected if you have them all individually (I think if granite is listed last, nothing would get classified as granite..). There's a few options for dealing with it - from only including the more specific terms through to doing something like TAS with two sets of labels or having a similar 'specificity' toggle (with each polygon potentially having multiple labels). These solutions simply swap out labels, not polygons - if for some reason you wanted to use different sets of polygons depending on the situation then it'd be a little more complicated - likely involving adding tags to polygons and modifying how the diagram is constructed based on these tags; this would ensure the visualization side of things was flexible and accurate. From the classification side of things, I don't see significant downsides to just having the more specific labels - you could just postprocess the classification to aggregate these two, with e.g.:

df.loc[df.QAP.isin(["syenogranite", "monzogranite"]), "QAP"] = "granite"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants