TST: Proof of concept reducing IO dependence in tests #3250
In #3237 @jorisvandenbossche left the following comment re the new tests I added there:
Originally posted by @jorisvandenbossche in #3237 (comment)
I decided to have a look and see whether it is possible to do this in a more generic way, since we use the nybb dataset quite a lot throughout the tests. (It's also possible Joris meant this from a test runtime perspective, in which case I expect moving things from fiona / pyogrio to JSON will actually slow things down, but it might be possible to load the files through an IO engine instead when one is present.)
Here I've looked at storing a copy of nybb as GeoJSON (which we can read without fiona/pyogrio) for use in the non-IO tests. We then also need to pass down the CRS and the dtype schema of the columns, since GeoJSON is not strongly typed. From a quick look this seems fairly feasible. So far I've only converted the usages in `test_geodataframe`, to get some thoughts.

Just realised now: it may be better to target geoarrow instead of GeoJSON? Pyarrow is currently an optional dependency, but pandas will make it required in 3.0 and thus it will transitively be required.
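To make the GeoJSON idea concrete, here is a minimal sketch of loading a fixture without an IO engine. It is not the code in this PR: `load_fixture`, the inline `FIXTURE` string and its values are hypothetical stand-ins, and the CRS (`EPSG:2263`) and column dtypes are assumptions passed in by the caller, because the GeoJSON text itself carries neither.

```python
import json

import geopandas

# Tiny stand-in for a stored fixture file (made-up values, not real nybb data).
FIXTURE = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "properties": {"BoroCode": 5, "BoroName": "Staten Island", "Shape_Area": 1.5},
   "geometry": {"type": "Point", "coordinates": [1.0, 2.0]}}
]}
"""


def load_fixture(text, crs, dtypes):
    """Hypothetical helper: build a GeoDataFrame from GeoJSON text."""
    # Parsing uses only the standard library, so no fiona/pyogrio is needed.
    collection = json.loads(text)
    # from_features takes an iterable of GeoJSON features; the CRS has to be
    # supplied by the caller because it is not stored with the fixture.
    gdf = geopandas.GeoDataFrame.from_features(collection["features"], crs=crs)
    # Pin the column dtypes explicitly instead of relying on inference.
    return gdf.astype(dtypes)


df = load_fixture(
    FIXTURE,
    crs="EPSG:2263",  # assumed CRS for the fixture
    dtypes={"BoroCode": "int64", "Shape_Area": "float64"},
)
```

Pinning explicit widths such as `int64` in the schema is one way to keep the resulting frame identical across platforms, whatever integer width inference would pick by default.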
One thing I am not sure about is whether this needs to handle platform-specific int widths; I'm not sure what shapefile does. (I generated the GeoJSON files on Windows, so if all the Linux tests fail, I expect that is why.)