Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predicted .tif file assigns a class to zero #193

Closed
willieseun opened this issue Sep 5, 2022 · 43 comments · Fixed by #196 or #207
Closed

Predicted .tif file assigns a class to zero #193

willieseun opened this issue Sep 5, 2022 · 43 comments · Fixed by #196 or #207
Assignees

Comments

@willieseun
Copy link

I tried plotting the resulting .tif file in ArcMap and first, the predicted classes were not up to the number of classes in my training data. Second, I wasn't able to build unique values for the raster and the black bands around the raster are not disappearing when I set to display nodata value with no colour.

@jgrss
Copy link
Owner

jgrss commented Sep 14, 2022

@mmann1123 I didn't intentionally close this. Do we need to reopen?

@willieseun
Copy link
Author

Please reopen. It even brings additional errors now.

@jgrss jgrss reopened this Sep 14, 2022
@jgrss
Copy link
Owner

jgrss commented Sep 14, 2022

@willieseun can you paste a code snippet to show us how you created your predictions?

@mmann1123
Copy link
Collaborator

mmann1123 commented Sep 14, 2022 via email

@willieseun
Copy link
Author

willieseun commented Sep 14, 2022

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
from glob import glob
import rioxarray
import os
import re
le = LabelEncoder()

psearlst = ['B1','B2','B3','B4','B5','B6','B7','B8','B9','B10']

labels = gpd.read_file('Z:\Projects\Project2022\Tif file Oct\trainingsamplesoct.shp')
labels['Classvalue'] = le.fit(labels.Classname).transform(labels.Classname)
print(labels)
predictors = ['something.tif']
with gw.open(predictors, resampling='bilinear', stack_dim="band", band_names=psearlst) as src:
pl = Pipeline([ ('scaler', StandardScaler()),
('pca', PCA()),
('clf', RandomForestClassifier(n_estimators=100, max_features=9))])
fig, ax = plt.subplots(dpi=200,figsize=(5,5))

        X, Xy, clf = fit(src, pl, labels, col="Classvalue")
	print('Starting to predict')
	y = predict(src, X, clf)
	y.plot(robust=True, ax=ax)
	y.sel(time="t1").gw.to_raster("wom_RF.tif")
	plt.tight_layout(pad=1)
	plt.show()

@willieseun
Copy link
Author

I hope you get it. It is not rendering it well enough.

@willieseun
Copy link
Author

Just to add, because I don't want to open another issue.
It seems that the gauss resampling method is not available as a parameter when resampling.

@jgrss
Copy link
Owner

jgrss commented Sep 14, 2022

@willieseun I think I am getting close to the issue. Before we make any changes, can you check if the following creates the output that you are hoping for.

Note that I used geowombat data because I don't have access to the data you used.

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
from geowombat.core import ndarray_to_xarray

import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
le = LabelEncoder()

# psearlst = ['B1','B2','B3','B4','B5','B6','B7','B8','B9','B10']
psearlst = ['blue', 'green', 'red', 'nir', 'swir1', 'swir2']

labels = gpd.read_file(l8_224078_20200518_polygons) #gpd.read_file('Z:\Projects\Project2022\Tif file Oct\trainingsamplesoct.shp')
# Added 1 here
labels['Classvalue'] = le.fit(labels.name).transform(labels.name) + 1

fig, ax = plt.subplots(dpi=200, figsize=(5,5))

# Resampling for faster processing/testing
with gw.config.update(ref_res=500):
    with gw.open(l8_224078_20200518, resampling='bilinear', stack_dim="band", band_names=psearlst) as src:
        pl = Pipeline(
            [
                ('scaler', StandardScaler()),
                ('pca', PCA()),
                ('clf', RandomForestClassifier(n_estimators=100))
            ]
        )

        X, Xy, clf = fit(src, pl, labels, col="name")
        y = predict(src, X, clf)
       # Convert the numpy array to a DataArray and add the 'no data' value
        y = ndarray_to_xarray(
            src,
            y.astype('uint8'),
            band_names=['estimates'],
            row_chunks=64,
            col_chunks=64,
            attrs={
                'crs': src.crs,
                'res': src.res,
                'transform': src.transform,
                'nodatavals': (0,)
            }
        )
        print(y)
        y.gw.imshow(robust=True, ax=ax)
        y.gw.save("wom_RF.tif", overwrite=True)
        plt.tight_layout(pad=1)

@willieseun
Copy link
Author

Ok, Let me check...

@willieseun
Copy link
Author

willieseun commented Sep 15, 2022

It is returning a type error traceback.
TypeError: not all arguments converted during string formatting
from this line
X, Xy, clf = fit(src, pl, labels, col="name")

Note: I updated to the latest release

@willieseun
Copy link
Author

The save function seems to be working well though.

@jgrss
Copy link
Owner

jgrss commented Sep 15, 2022

Apologies, it should be col='Classvalue' not 'name'.

@willieseun
Copy link
Author

It is still not working properly.

@willieseun
Copy link
Author

The resulting tif file is looking bad.

@willieseun
Copy link
Author

I think you should try using it with classes more than 10. The shapefile I am using has 12 classes. Maybe that is why.

@jgrss
Copy link
Owner

jgrss commented Sep 19, 2022

@willieseun Can you elaborate on what you mean by bad? Do you mean that it is still not rendering the nodata value properly, that the classified values are not correct, or that the classification does not look accurate? On the latter, if you are using the test data then I would not expect it to look good because the data being used are just test data, which are meant to show the utility of the function but not to produce a good map.

The only things we are addressing with this open issue are 1) nodata rendering in the classification and 2) the classified values are correct. Can you confirm that either of these are still not as expected?

If you would like us to reproduce any errors with your data then you will need to post a link to your dataset.

@willieseun
Copy link
Author

geowom

I have attached the predicted map

@jgrss
Copy link
Owner

jgrss commented Sep 19, 2022

Can you point out the issue? It looks like there are ~10 classes, so is this from your dataset? The pixels also look to be resampled, so did you keep the gw.config.update(ref_res=500)?

@jgrss
Copy link
Owner

jgrss commented Sep 19, 2022

It does look like zeros are being displayed, so if you attempted to set them as your nodata value then something is still not working. However, we will have some updates to how nodata values are handled coming up in #204.

@willieseun
Copy link
Author

No. I changed it to 50.

@jgrss
Copy link
Owner

jgrss commented Sep 19, 2022

Okay, if you've modified anything then we need to see your code snippet in order to reproduce the results that you shared.

@willieseun
Copy link
Author

What does 500 mean for clarification, 500m?

@jgrss
Copy link
Owner

jgrss commented Sep 19, 2022

What does 500 mean for clarification, 500m?

ref_res=value <CRS units>, so in this case 500 is in meters.

@willieseun
Copy link
Author

Ok

@willieseun
Copy link
Author

import matplotlib.pyplot as plt
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict, fit_predict
from geowombat.core import ndarray_to_xarray
import geopandas as gpd
from sklearn_xarray.preprocessing import Featurizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
import rioxarray
from pyspatialml import Raster
from pyspatialml.datasets import nc
from copy import deepcopy


le = LabelEncoder()
labels = gpd.read_file("trainingsamplesextentNEW.shp")
# Added 1 here
labels['Classvalues'] = le.fit(labels.Classname).transform(labels.Classname) + 1

predictors = ["MSSEP2020_RED.tif", "MSSEP2020_NIR.tif", "MSSEP2020_GREEN.tif", "MSSEP2020_REDEDGE.tif", "RGBSEP2020_BLUE.tif", "RGBSEP2020_GREEN.tif", "RGBSEP2020_GRAY.tif", "RGBSEP2020_RED.tif", "MSSEP2020_HEIGHT.tif", "RGBSEP2020_HEIGHT.tif",
"SEP_CTVI.tif", "SEP_GNDVI.tif", "SEP_KNDVI.tif", "SEP_MSAVI.tif", "SEP_MSAVI2.tif", "SEP_NDVI.tif", "SEP_NDWI.tif", "SEP_NRVI.tif", "SEP_SAVI.tif", "SEP_TTVI.tif"]
band_names = ["redms", "nirms", "greenms", "rededgems", "bluergb", "greenrgb", "grayrgb", "redrgb", "heightms", "heightrgb", "ctvi", "gndvi", "kndvi", "msavi", "msavi2", "ndvi", "ndwi", "nrvi", "savi", "ttvi"]
b_n = []
pred = ["MSSEP2020_RED.tif"]
stack1 = Raster(pred)
stack1.write('SEP_all.tif')
for x in range(1, len(predictors)+1):
    b_n.append(x)
# Use a data pipeline

fig, ax = plt.subplots(dpi=200, figsize=(5,5))

# Resampling for faster processing/testing
with gw.config.update(ref_res=50):
    with gw.open("SEP_all.tif", resampling='bilinear') as src:
        pl = Pipeline(
            [
                ('scaler', StandardScaler()),
                ('pca', PCA()),
                ('clf', RandomForestClassifier(n_estimators=100,  max_features=3))
            ]
        )
        print('Starting to fit')
        X, Xy, clf = fit(src, pl, labels, col="Classvalues")
        print('Starting to predict')
        y = predict(src, X, clf)
       # Convert the numpy array to a DataArray and add the 'no data' value
        y = ndarray_to_xarray(
            src,
            y.astype('uint8'),
            band_names=['estimates'],
            row_chunks=64,
            col_chunks=64,
            attrs={
                'crs': src.crs,
                'res': src.res,
                'transform': src.transform,
                'nodatavals': (0,)
            }
        )
        print(y)
        y.gw.imshow(robust=True, ax=ax)
        y.gw.save("wom_RF1.tif", overwrite=True)
        plt.tight_layout(pad=1)
plt.show()

@jgrss
Copy link
Owner

jgrss commented Sep 20, 2022

@willieseun below is a code snippet that masks the nodata values in the predictions, plots the data and hides nodata, and correctly saves the nodata values to file. Note that I still cannot reproduce your specific errors because I don't have access to your data. If you install the latest version (geowombat==2.0.6 from #204) you should be able to reproduce the code 👇 . We still need another PR to make some changes in the ML module (@mmann1123) so that you don't have to modify the DataArray like below. But at least this hopefully helps address your issue. Let us know if it doesn't.

import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
from geowombat.ml import fit, predict

import geopandas as gpd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt


def main():
    predictors = ['blue', 'green', 'red', 'nir', 'swir1', 'swir2']
    labels = gpd.read_file(l8_224078_20200518_polygons)

    fig, ax = plt.subplots(dpi=200, figsize=(5,5))

    # Resampling for faster processing/testing
    with gw.config.update(ref_res=500):
        with gw.open(
            l8_224078_20200518,
            resampling='bilinear',
            stack_dim='band',
            band_names=predictors
        ) as src:
            pl = Pipeline(
                [
                    ('scaler', StandardScaler()),
                    ('pca', PCA()),
                    ('clf', RandomForestClassifier(n_estimators=100))
                ]
            )

            X, Xy, clf = fit(src, pl, labels, col="name")
            y = predict(src, X, clf)
            y = (
                y.astype('uint8')
                # Coerce from numpy array to dask array (for gw.save())
                # you could borrow chunks from src.gw.row_chunks, but the gw.save()
                # method and rasterio require the blocks to be in intervals of 16
                .chunk({'band': -1, 'y': 64, 'x': 64})
                # Assign geo-attributes
                .assign_attrs(**src.attrs)
                # Set the 'no data' attribute
                .gw.assign_nodata_attrs(0)
                # Convert 'no data' values to nans
                .gw.mask_nodata()
            )
            print(y)
            y.gw.imshow(robust=True, ax=ax)
            y.gw.save("wom_RF.tif", overwrite=True)
            plt.tight_layout(pad=1)
            plt.savefig('test.png')


if __name__ == '__main__':
    main()

@willieseun
Copy link
Author

Testing this with your data, this is my result.
test

@mmann1123
Copy link
Collaborator

Yeah sorry, that's my fault. Something went sideways. I will need some patience.

@willieseun
Copy link
Author

Alright. You have all the time.

@mmann1123
Copy link
Collaborator

Ok this should be resolved soon.

@jgrss jgrss reopened this Sep 21, 2022
@mmann1123
Copy link
Collaborator

@willieseun This should be resolved and the new build pushed to conda-forge as well. Please note you need to specify your missing data value (if its not in the tif profile) by setting nodata in gw.open.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(dpi=200)

with gw.config.update(
    ref_res=300,
):
    with gw.open(l8_224078_20200518, nodata=0) as src:
        y1 = fit_predict(src, pl_wo_feat, aoi_poly, col="lc")

        y1.sel(band=["targ"]).gw.imshow(robust=True, ax=ax)

Closing this issue unless I hear otherwise.

@mmann1123 mmann1123 self-assigned this Sep 21, 2022
@mmann1123 mmann1123 reopened this Sep 21, 2022
@mmann1123
Copy link
Collaborator

@willieseun Is this resolved?

@willieseun
Copy link
Author

I am unable to update to the latest release currently.

@mmann1123
Copy link
Collaborator

mmann1123 commented Sep 26, 2022 via email

@willieseun
Copy link
Author

No I use pip

@mmann1123
Copy link
Collaborator

@willieseun I would recommend trying out miniconda and doing an install from conda-forge. Everything should be working now sorry that took a while to resolve but I am going to close this.

@willieseun
Copy link
Author

Still not working

@mmann1123 mmann1123 reopened this Oct 7, 2022
@mmann1123
Copy link
Collaborator

Can you share some data and your script? I can't replicate on this side.

@jgrss
Copy link
Owner

jgrss commented Oct 7, 2022

@willieseun from the original post

I tried plotting the resulting .tif file in ArcMap and first, the predicted classes were not up to the number of classes in my training data. Second, I wasn't able to build unique values for the raster and the black bands around the raster are not disappearing when I set to display nodata value with no colour.

Issue #1: predicted classes do not match the input training classes
Issue #2: 'No data' values are not hidden in a GIS

Can you please describe what is not working? Is it still both of the issues that you raised? For predictions, unless we can replicate the issue with our test data, we need an example of your data if you are able to share somehow.

@willieseun
Copy link
Author

Sorry for being so ambiguous. I meant I wasn't able to install with anaconda.

@mmann1123
Copy link
Collaborator

I might be able to help interpret if you send the error from the install. In your terminal window try:

conda create geowom_env python=3.9
conda activate geowom_env
conda config --add channels conda-forge
conda config --set channel_priority strict
conda install geowombat 

@jgrss
Copy link
Owner

jgrss commented Oct 11, 2022

Thanks @mmann1123 and @willieseun, if this remains an issue, please open a new issue with your conda install blocker.

@willieseun
Copy link
Author

Thanks, I have been able to install the new version

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants