Predicted .tif file assigns a class to zero #193

willieseun · 2022-09-05T09:07:02Z

I plotted the resulting .tif file in ArcMap and found two problems. First, the raster contains fewer predicted classes than my training data. Second, I wasn't able to build unique values for the raster, and the black bands around the raster do not disappear when I set the nodata value to display with no colour.

jgrss · 2022-09-14T03:59:49Z

@mmann1123 I didn't intentionally close this. Do we need to reopen?

willieseun · 2022-09-14T08:35:38Z

Please reopen. It even brings additional errors now.

jgrss · 2022-09-14T14:22:10Z

@willieseun can you paste a code snippet to show us how you created your predictions?

mmann1123 · 2022-09-14T14:40:07Z

I think the latest ML updates resolved the issue of one of the prediction classes being dropped. I'm not sure about the missing values.

willieseun · 2022-09-14T15:00:53Z

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
from glob import glob
import rioxarray
import os
import re
le = LabelEncoder()

psearlst = ['B1','B2','B3','B4','B5','B6','B7','B8','B9','B10']

labels = gpd.read_file(r'Z:\Projects\Project2022\Tif file Oct\trainingsamplesoct.shp')
labels['Classvalue'] = le.fit(labels.Classname).transform(labels.Classname)
print(labels)
predictors = ['something.tif']
with gw.open(predictors, resampling='bilinear', stack_dim="band", band_names=psearlst) as src:
    pl = Pipeline([
        ('scaler', StandardScaler()),
        ('pca', PCA()),
        ('clf', RandomForestClassifier(n_estimators=100, max_features=9))
    ])
    fig, ax = plt.subplots(dpi=200, figsize=(5, 5))

    X, Xy, clf = fit(src, pl, labels, col="Classvalue")
    print('Starting to predict')
    y = predict(src, X, clf)
    y.plot(robust=True, ax=ax)
    y.sel(time="t1").gw.to_raster("wom_RF.tif")
    plt.tight_layout(pad=1)
    plt.show()

willieseun · 2022-09-14T15:06:42Z

I hope you can follow it. The code is not rendering well here.

willieseun · 2022-09-14T16:12:01Z

Just to add, because I don't want to open another issue.
It seems that the gauss resampling method is not available as a parameter when resampling.

jgrss · 2022-09-14T23:38:59Z

@willieseun I think I am getting close to the issue. Before we make any changes, can you check whether the following creates the output that you are hoping for?

Note that I used geowombat data because I don't have access to the data you used.

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
from geowombat.core import ndarray_to_xarray

import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt

le = LabelEncoder()

# psearlst = ['B1','B2','B3','B4','B5','B6','B7','B8','B9','B10']
psearlst = ['blue', 'green', 'red', 'nir', 'swir1', 'swir2']

labels = gpd.read_file(l8_224078_20200518_polygons) #gpd.read_file('Z:\Projects\Project2022\Tif file Oct\trainingsamplesoct.shp')
# Added 1 here
labels['Classvalue'] = le.fit(labels.name).transform(labels.name) + 1

fig, ax = plt.subplots(dpi=200, figsize=(5,5))

# Resampling for faster processing/testing
with gw.config.update(ref_res=500):
    with gw.open(l8_224078_20200518, resampling='bilinear', stack_dim="band", band_names=psearlst) as src:
        pl = Pipeline(
            [
                ('scaler', StandardScaler()),
                ('pca', PCA()),
                ('clf', RandomForestClassifier(n_estimators=100))
            ]
        )

        X, Xy, clf = fit(src, pl, labels, col="name")
        y = predict(src, X, clf)
        # Convert the numpy array to a DataArray and add the 'no data' value
        y = ndarray_to_xarray(
            src,
            y.astype('uint8'),
            band_names=['estimates'],
            row_chunks=64,
            col_chunks=64,
            attrs={
                'crs': src.crs,
                'res': src.res,
                'transform': src.transform,
                'nodatavals': (0,)
            }
        )
        print(y)
        y.gw.imshow(robust=True, ax=ax)
        y.gw.save("wom_RF.tif", overwrite=True)
        plt.tight_layout(pad=1)
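
For context on the `+ 1` added to the encoded labels above: scikit-learn's `LabelEncoder` numbers the sorted unique classes starting at 0, so without the offset the first class shares the value 0 with the 'no data' value. Below is a minimal plain-Python sketch of that numbering. The helper `encode_labels` and the class names are made up for illustration; it is a stand-in for the encoder, not geowombat or scikit-learn code.

```python
def encode_labels(names, offset=0):
    """Number the sorted unique names from `offset` and encode each input name."""
    classes = sorted(set(names))
    lookup = {name: code for code, name in enumerate(classes, start=offset)}
    return [lookup[name] for name in names], lookup


NODATA = 0
names = ["water", "crop", "tree", "crop", "developed"]  # hypothetical class names

# Zero-based codes: the first class in sorted order gets 0, the same as NODATA
codes, lookup = encode_labels(names)
print(lookup, NODATA in lookup.values())

# Shifting by one keeps 0 free for 'no data'
shifted, shifted_lookup = encode_labels(names, offset=1)
print(shifted_lookup, NODATA in shifted_lookup.values())
```

With the offset, every class code is at least 1, so a 0 in the output raster can only mean 'no data'.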

willieseun · 2022-09-15T06:55:37Z

Ok, Let me check...

willieseun · 2022-09-15T08:02:27Z

It returns a TypeError traceback:
TypeError: not all arguments converted during string formatting
from this line:
X, Xy, clf = fit(src, pl, labels, col="name")

Note: I updated to the latest release

willieseun · 2022-09-15T08:12:36Z

The save function seems to be working well though.

jgrss · 2022-09-15T13:05:54Z

Apologies, it should be col='Classvalue' not 'name'.

willieseun · 2022-09-19T10:12:37Z

It is still not working properly.

willieseun · 2022-09-19T10:13:26Z

The resulting tif file is looking bad.

willieseun · 2022-09-19T12:31:06Z

I think you should try using it with classes more than 10. The shapefile I am using has 12 classes. Maybe that is why.

jgrss · 2022-09-19T13:15:37Z

@willieseun Can you elaborate on what you mean by bad? Do you mean that it is still not rendering the nodata value properly, that the classified values are not correct, or that the classification does not look accurate? On the latter, if you are using the test data then I would not expect it to look good because the data being used are just test data, which are meant to show the utility of the function but not to produce a good map.

The only things we are addressing with this open issue are 1) nodata rendering in the classification and 2) whether the classified values are correct. Can you confirm whether either of these is still not as expected?

If you would like us to reproduce any errors with your data then you will need to post a link to your dataset.

willieseun · 2022-09-19T14:16:11Z

I have attached the predicted map

jgrss · 2022-09-19T14:29:29Z

Can you point out the issue? It looks like there are ~10 classes, so is this from your dataset? The pixels also look to be resampled, so did you keep the gw.config.update(ref_res=500)?

jgrss · 2022-09-19T14:32:05Z

It does look like zeros are being displayed, so if you attempted to set them as your nodata value then something is still not working. However, we will have some updates to how nodata values are handled coming up in #204.

willieseun · 2022-09-19T14:33:53Z

No. I changed it to 50.

jgrss · 2022-09-19T14:35:40Z

Okay, if you've modified anything then we need to see your code snippet in order to reproduce the results that you shared.

willieseun · 2022-09-19T14:42:58Z

What does 500 mean for clarification, 500m?

jgrss · 2022-09-19T14:44:43Z

> What does 500 mean for clarification, 500m?

ref_res=value <CRS units>, so in this case 500 is in meters.
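
To make the effect of the cell size concrete, here is a rough sketch of how the number of rows and columns changes with the resolution. The helper `grid_shape` and the 30 km extent are hypothetical, and geowombat's own grid alignment may round differently.

```python
import math


def grid_shape(left, bottom, right, top, res):
    """Return (rows, cols) for an extent given in CRS units at cell size `res`."""
    cols = math.ceil((right - left) / res)
    rows = math.ceil((top - bottom) / res)
    return rows, cols


# Hypothetical 30 km x 30 km extent in a metre-based CRS
extent = (500_000, 7_000_000, 530_000, 7_030_000)

for res in (30, 50, 500):
    print(res, grid_shape(*extent, res))
```

A larger `ref_res` means fewer, coarser pixels, which is why the test snippets run faster but the output looks blocky.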

willieseun · 2022-09-19T14:54:42Z

Ok

willieseun · 2022-09-19T14:57:55Z

import matplotlib.pyplot as plt
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
from geowombat.core import ndarray_to_xarray
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
import rioxarray
from pyspatialml import Raster
from pyspatialml.datasets import nc
from copy import deepcopy

le = LabelEncoder()
labels = gpd.read_file("trainingsamplesextentNEW.shp")
# Added 1 here
labels['Classvalues'] = le.fit(labels.Classname).transform(labels.Classname) + 1

predictors = ["MSSEP2020_RED.tif", "MSSEP2020_NIR.tif", "MSSEP2020_GREEN.tif", "MSSEP2020_REDEDGE.tif", "RGBSEP2020_BLUE.tif", "RGBSEP2020_GREEN.tif", "RGBSEP2020_GRAY.tif", "RGBSEP2020_RED.tif", "MSSEP2020_HEIGHT.tif", "RGBSEP2020_HEIGHT.tif",
"SEP_CTVI.tif", "SEP_GNDVI.tif", "SEP_KNDVI.tif", "SEP_MSAVI.tif", "SEP_MSAVI2.tif", "SEP_NDVI.tif", "SEP_NDWI.tif", "SEP_NRVI.tif", "SEP_SAVI.tif", "SEP_TTVI.tif"]
band_names = ["redms", "nirms", "greenms", "rededgems", "bluergb", "greenrgb", "grayrgb", "redrgb", "heightms", "heightrgb", "ctvi", "gndvi", "kndvi", "msavi", "msavi2", "ndvi", "ndwi", "nrvi", "savi", "ttvi"]
b_n = []
pred = ["MSSEP2020_RED.tif"]
stack1 = Raster(pred)
stack1.write('SEP_all.tif')
for x in range(1, len(predictors)+1):
    b_n.append(x)
# Use a data pipeline

fig, ax = plt.subplots(dpi=200, figsize=(5,5))

# Resampling for faster processing/testing
with gw.config.update(ref_res=50):
    with gw.open("SEP_all.tif", resampling='bilinear') as src:
        pl = Pipeline(
            [
                ('scaler', StandardScaler()),
                ('pca', PCA()),
                ('clf', RandomForestClassifier(n_estimators=100,  max_features=3))
            ]
        )
        print('Starting to fit')
        X, Xy, clf = fit(src, pl, labels, col="Classvalues")
        print('Starting to predict')
        y = predict(src, X, clf)
        # Convert the numpy array to a DataArray and add the 'no data' value
        y = ndarray_to_xarray(
            src,
            y.astype('uint8'),
            band_names=['estimates'],
            row_chunks=64,
            col_chunks=64,
            attrs={
                'crs': src.crs,
                'res': src.res,
                'transform': src.transform,
                'nodatavals': (0,)
            }
        )
        print(y)
        y.gw.imshow(robust=True, ax=ax)
        y.gw.save("wom_RF1.tif", overwrite=True)
        plt.tight_layout(pad=1)
plt.show()

jgrss · 2022-09-20T00:38:51Z

@willieseun below is a code snippet that masks the nodata values in the predictions, plots the data and hides nodata, and correctly saves the nodata values to file. Note that I still cannot reproduce your specific errors because I don't have access to your data. If you install the latest version (geowombat==2.0.6 from #204) you should be able to reproduce the code 👇 . We still need another PR to make some changes in the ML module (@mmann1123) so that you don't have to modify the DataArray like below. But at least this hopefully helps address your issue. Let us know if it doesn't.

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict

import geopandas as gpd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt


def main():
    predictors = ['blue', 'green', 'red', 'nir', 'swir1', 'swir2']
    labels = gpd.read_file(l8_224078_20200518_polygons)

    fig, ax = plt.subplots(dpi=200, figsize=(5,5))

    # Resampling for faster processing/testing
    with gw.config.update(ref_res=500):
        with gw.open(
            l8_224078_20200518,
            resampling='bilinear',
            stack_dim='band',
            band_names=predictors
        ) as src:
            pl = Pipeline(
                [
                    ('scaler', StandardScaler()),
                    ('pca', PCA()),
                    ('clf', RandomForestClassifier(n_estimators=100))
                ]
            )

            X, Xy, clf = fit(src, pl, labels, col="name")
            y = predict(src, X, clf)
            y = (
                y.astype('uint8')
                # Coerce from numpy array to dask array (for gw.save()).
                # You could borrow chunks from src.gw.row_chunks, but the gw.save()
                # method and rasterio require block sizes to be multiples of 16
                .chunk({'band': -1, 'y': 64, 'x': 64})
                # Assign geo-attributes
                .assign_attrs(**src.attrs)
                # Set the 'no data' attribute
                .gw.assign_nodata_attrs(0)
                # Convert 'no data' values to nans
                .gw.mask_nodata()
            )
            print(y)
            y.gw.imshow(robust=True, ax=ax)
            y.gw.save("wom_RF.tif", overwrite=True)
            plt.tight_layout(pad=1)
            plt.savefig('test.png')


if __name__ == '__main__':
    main()
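
The chunking comment in the snippet above says the block sizes need to go in steps of 16. If you do want to start from an existing chunk size, a small helper along these lines could round it to a usable value. `round_chunk` is a hypothetical name, not part of geowombat.

```python
def round_chunk(size, multiple=16, minimum=16):
    """Round a chunk size down to the nearest multiple, never below `minimum`."""
    return max(minimum, (size // multiple) * multiple)


for size in (10, 64, 100, 256):
    print(size, round_chunk(size))
```

The rounded value could then be used for the 'y' and 'x' sizes passed to `.chunk(...)`.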

willieseun · 2022-09-20T14:02:54Z

Testing this with your data, this is my result.

mmann1123 · 2022-09-20T14:33:40Z

Yeah sorry, that's my fault. Something went sideways. Please bear with me.

willieseun · 2022-09-20T14:56:13Z

Alright. You have all the time.

mmann1123 · 2022-09-20T16:34:35Z

Ok this should be resolved soon.

mmann1123 · 2022-09-21T13:29:35Z

@willieseun This should be resolved, and the new build has been pushed to conda-forge as well. Please note that you need to specify your missing data value (if it's not in the tif profile) by setting nodata in gw.open.

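import geowombat as gw
from geowombat.data import l8_224078_20200518
from geowombat.ml import fit_predict
import matplotlib.pyplot as plt

# pl_wo_feat (the sklearn Pipeline) and aoi_poly (training polygons with an
# "lc" column) are assumed to be defined beforehand
fig, ax = plt.subplots(dpi=200)

with gw.config.update(ref_res=300):
    # Set nodata here if it is not stored in the tif profile
    with gw.open(l8_224078_20200518, nodata=0) as src:
        y1 = fit_predict(src, pl_wo_feat, aoi_poly, col="lc")

        y1.sel(band=["targ"]).gw.imshow(robust=True, ax=ax)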

Closing this issue unless I hear otherwise.

mmann1123 · 2022-09-25T18:50:19Z

@willieseun Is this resolved?

willieseun · 2022-09-26T10:14:27Z

I am unable to update to the latest release currently.

mmann1123 · 2022-09-26T11:31:23Z

Do you have anaconda installed?

willieseun · 2022-09-26T12:04:45Z

No I use pip

mmann1123 · 2022-09-29T14:23:51Z

@willieseun I would recommend trying out miniconda and installing from conda-forge. Everything should be working now. Sorry that took a while to resolve, but I am going to close this.

willieseun · 2022-10-07T08:39:27Z

Still not working

mmann1123 · 2022-10-07T13:34:30Z

Can you share some data and your script? I can't replicate on this side.

jgrss · 2022-10-07T14:01:37Z

@willieseun from the original post

> I plotted the resulting .tif file in ArcMap and found two problems. First, the raster contains fewer predicted classes than my training data. Second, I wasn't able to build unique values for the raster, and the black bands around the raster do not disappear when I set the nodata value to display with no colour.

Issue #1: predicted classes do not match the input training classes
Issue #2: 'No data' values are not hidden in a GIS

Can you please describe what is not working? Is it still both of the issues that you raised? For predictions, unless we can replicate the issue with our test data, we need an example of your data if you are able to share somehow.
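
One way to report on both issues without sharing the full dataset is to compare the class codes used for training with the unique values found in the predicted raster. The sketch below uses plain Python lists. Reading the pixel values out of the .tif is left out, and `class_report` and the sample values are made up for illustration.

```python
def class_report(training_classes, predicted_values, nodata=0):
    """Compare training class codes with the values found in a prediction."""
    trained = set(training_classes)
    predicted = set(predicted_values) - {nodata}
    return {
        # Classes that were trained on but never predicted (issue 1)
        "missing": sorted(trained - predicted),
        # Values in the output that are neither a class nor nodata
        "unexpected": sorted(predicted - trained),
        # True when a real class shares its code with nodata (issue 2)
        "nodata_collides": nodata in trained,
    }


# Hypothetical example: 12 training classes coded 1..12, output holds 0..10
training = list(range(1, 13))
predicted = [0, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0]
report = class_report(training, predicted)
print(report)

# Zero-based class codes collide with nodata=0
zero_based = class_report(list(range(12)), predicted)
print(zero_based["nodata_collides"])
```

Posting a report like this alongside the script would show which of the two issues is still present.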

willieseun · 2022-10-09T15:46:31Z

Sorry for being so ambiguous. I meant I wasn't able to install with anaconda.

mmann1123 · 2022-10-11T14:19:24Z

I might be able to help interpret if you send the error from the install. In your terminal window try:

conda create -n geowom_env python=3.9
conda activate geowom_env
conda config --add channels conda-forge
conda config --set channel_priority strict
conda install geowombat
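
After installing, it may help to confirm which geowombat version the active environment actually picks up. The sketch below uses only the standard library. The minimum of 2.0.6 is the version mentioned earlier in this thread and is an assumption here, since the release carrying the later ML fix is not stated. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn '2.0.6' into (2, 0, 6), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


MINIMUM = (2, 0, 6)  # assumed minimum, taken from earlier in the thread

found = installed_version("geowombat")
if found is None:
    print("geowombat is not installed in this environment")
elif version_tuple(found) < MINIMUM:
    print(f"geowombat {found} is older than 2.0.6")
else:
    print(f"geowombat {found} is installed")
```

Run it with the environment activated (for example after `conda activate geowom_env`) so that it checks the right interpreter.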

jgrss · 2022-10-11T14:31:56Z

Thanks @mmann1123 and @willieseun, if this remains an issue, please open a new issue with your conda install blocker.

willieseun · 2022-12-15T11:06:44Z

Thanks, I have been able to install the new version

This was referenced Sep 7, 2022

Fix: mmann1123/mldata #194

Closed

Fix: mmann1123/mldata #195

Closed

fix: mmann1123/mldata #196

Merged

jgrss closed this as completed in #196 Sep 14, 2022

jgrss reopened this Sep 14, 2022

mmann1123 mentioned this issue Sep 20, 2022

fix: Origin/mmann1123 ml nodata #207

Merged

jgrss closed this as completed in #207 Sep 21, 2022

jgrss reopened this Sep 21, 2022

mmann1123 closed this as completed Sep 21, 2022

mmann1123 self-assigned this Sep 21, 2022

mmann1123 reopened this Sep 21, 2022

mmann1123 closed this as completed Sep 29, 2022

mmann1123 reopened this Oct 7, 2022

mmann1123 closed this as completed Oct 11, 2022
