Random Extract duplicates features #52114

Closed
2 tasks done
pinilla66 opened this issue Mar 6, 2023 · 4 comments · Fixed by #52135
Labels: Bug, Processing

Comments

@pinilla66

What is the bug or the crash?

I have a vector layer with 2705 features, from which I want to randomly extract 1000. The Random Extract tool returns a set of 1000 features, but some of them are duplicated (up to four times). I've checked the id attribute and it is correctly set.
To be on the safe side, I also ran the Random selection tool to check whether it returns a correct set of features, and it worked well.
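
A quick way to confirm the duplication is to count how often each id appears in the extracted output. The following is a minimal sketch in plain Python; `duplicated_ids` and the sample id values are hypothetical and stand in for the id attribute read from the extracted layer.

```python
from collections import Counter


def duplicated_ids(ids):
    """Return {id: count} for every id that appears more than once."""
    counts = Counter(ids)
    return {fid: n for fid, n in counts.items() if n > 1}


# Hypothetical id values, standing in for the extracted layer's id column.
extracted_ids = [12, 7, 12, 33, 7, 12, 5]
print(duplicated_ids(extracted_ids))  # {12: 3, 7: 2}
```

An empty dictionary means every extracted feature is unique.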

Steps to reproduce the issue

  1. Go to Toolbox
  2. Click on Random extract
  3. Select vector layer
  4. Set 1000 features

Versions

QGIS version: 3.22.16-Białowieża
QGIS code revision: 6f08e4d
Qt version: 5.15.3
Python version: 3.9.5
GDAL/OGR version: 3.6.2
PROJ version: 9.1.1
EPSG Registry database version: v10.076 (2022-08-31)
GEOS version: 3.11.1-CAPI-1.17.1
SQLite version: 3.39.4
PDAL version: 2.4.3
PostgreSQL client version: 14.3
SpatiaLite version: 5.0.1
QWT version: 6.1.6
QScintilla2 version: 2.13.1
OS version: Windows 10 Version 2009

Active Python plugins
active_fire: 0.3
DataPlotly: 3.9.2
DEMto3D: 3.51
HCMGIS: 23.2.1
latlontools: 3.6.7
OSMDownloader: 1.0.3
pg_raster_import: 3.1.0
pointsamplingtool: 0.5.4
ProjectPackager: 0.5.1
qdraw: 3.0.2
qgisnetworklogger: 0.2.0
QuickOSM: 2.1.1
quick_map_services: 0.19.32
sigpac_downloader: 0.3
db_manager: 0.1.20
grassprovider: 2.12.99
MetaSearch: 0.3.5
processing: 2.12.99
sagaprovider: 2.12.99

Supported QGIS version

  • I'm running a supported QGIS version according to the roadmap.

New profile

  • I tried with a new QGIS profile

Additional context

No response

@pinilla66 added the Bug label Mar 6, 2023
@YoannQDQ added the Processing label Mar 7, 2023
@roya0045
Contributor

roya0045 commented Mar 7, 2023

The algorithm clearly wants to select duplicates; I doubt this is a bug in any way.

@pinilla66
Author

Well, I can't imagine why anyone would want to extract the same feature two or more times for geographical analyses, but that's probably my ignorance. Should I close this thread, then?

@roya0045
Contributor

roya0045 commented Mar 7, 2023

I wager that this ticket could be converted into a feature request for an option to not generate duplicates.

In the meantime the select & export option can't generate duplicates.

As for why duplication is allowed, I'm not sure.

@agiudiceandrea
Contributor

agiudiceandrea commented Mar 7, 2023

@roya0045 anyway, it seems to me there is an inconsistency: the "Random extract" and "Random selection" algorithms have the same description in the short help and in the documentation, but they behave differently.

Moreover, "Random extract" doesn't respect the number of features to extract, even counting the duplicates.

With a simple layer of 10 features (testpoints.zip), extracting e.g. 7 features does not always produce a layer with 7 features; the result can also contain fewer than 7.

The "Random extract within subsets" algorithm also behaves differently from "Random extract": it seems to always extract the exact requested number of features without duplicates, as the "Random selection" and "Random selection within subsets" algorithms do.

So it seems we have three algs ("Random selection", "Random selection within subsets" and "Random extract within subsets") that behave one way, and only one algorithm ("Random extract") that behaves differently...
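
The difference described above resembles the general distinction between sampling with and without replacement. The sketch below is plain Python and is not the QGIS implementation: the thread does not state how "Random extract" picks features, and the helper names are hypothetical. It only shows that independent random draws can repeat an id, so the number of distinct results can fall below the requested count, while sampling without replacement always returns exactly the requested number of unique ids.

```python
import random


def sample_with_replacement(ids, n, rng):
    """Draw n ids independently; the same id may be drawn more than once."""
    return [rng.choice(ids) for _ in range(n)]


def sample_without_replacement(ids, n, rng):
    """Draw n distinct ids; requires n <= len(ids)."""
    return rng.sample(ids, n)


rng = random.Random(0)        # fixed seed so the run is repeatable
ids = list(range(1, 11))      # a layer with 10 features, as in the test case

with_repl = sample_with_replacement(ids, 7, rng)
without_repl = sample_without_replacement(ids, 7, rng)

# Total draws versus distinct ids for each approach.
print(len(with_repl), len(set(with_repl)))
print(len(without_repl), len(set(without_repl)))
```

With replacement, the second number on the first line can be lower than 7 whenever an id repeats; without replacement, both numbers on the second line are always 7.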

Maybe @alexbruy could shed some light on this.

YoannQDQ added a commit to YoannQDQ/QGIS that referenced this issue Mar 7, 2023
YoannQDQ added a commit to YoannQDQ/QGIS that referenced this issue Mar 16, 2023
YoannQDQ added a commit to YoannQDQ/QGIS that referenced this issue Mar 21, 2023
YoannQDQ added a commit to YoannQDQ/QGIS that referenced this issue Mar 30, 2023